From QIIME 2 to R — Visualizing Your 16S Amplicon Results

From QIIME 2 to R — visualizing your 16S amplicon results

🧬 𝐷𝑎𝑦 73 𝑜𝑓 𝐷𝑎𝑖𝑙𝑦 𝐵𝑖𝑜𝑖𝑛𝑓𝑜𝑟𝑚𝑎𝑡𝑖𝑐𝑠 𝑓𝑟𝑜𝑚 𝐽𝑜𝑗𝑦’𝑠

You ran your 16S samples through QIIME 2. You have feature tables, taxonomy assignments, a phylogenetic tree, and alpha and beta diversity results. You opened a few .qzv files in the browser, which is great for exploration — but when it comes to making figures for a paper, or for priliminary analysis you want full control in R.

This post is the bridge between your QIIME 2 outputs and your first set of publication-quality figures. We will export every artifact you need, convert it into formats R can read, and then build the figures step by step.

Unlike my earlier 16S series that relied on microeco for downstream visualization, this workflow uses simple, direct R plotting from exported QIIME 2 outputs, which makes it easier to customize figures for papers, reports, blog posts, and LinkedIn.

What we are building: reads retained per sample, ASV counts, alpha diversity (Shannon, observed features, Faith’s PD), beta diversity (Bray-Curtis PCoA + PERMANOVA), and stacked taxonomy barplots at phylum, family, and genus level — all combined into a single multi-panel figure.


What you need before starting

This guide assumes you have already completed the QIIME 2 analysis (import → trim → DADA2 → taxonomy → filtering → core metrics). Your working directory should contain files like these:

table-no-contam_240_200.qza
taxonomy_240_200.qza
denoising-stats_240_200.qza
rooted-tree_240_200.qza
core-metrics-results_50000/
my_metadata.txt

File names will vary by project — adapt them throughout.


see_your_Genome

Part 1: Exporting from QIIME 2

QIIME 2 stores everything in .qza and .qzv files, which are essentially zip archives with provenance tracking. Before R can read them, you need to export the underlying data files.

Step 1: Create a clean export directory

Keep all exported files together in one folder. This makes paths in R much simpler.

mkdir -p qiime2_R_export

Step 2: Export the feature table

qiime tools export \
  --input-path table-no-contam_240_200.qza \
  --output-path qiime2_R_export/table

This creates qiime2_R_export/table/feature-table.biom. BIOM is a compressed binary format — R cannot read it directly, so we convert it to TSV:

biom convert \
  -i qiime2_R_export/table/feature-table.biom \
  -o qiime2_R_export/table/feature-table.tsv \
  --to-tsv

Step 3: Export taxonomy

qiime tools export \
  --input-path taxonomy_240_200.qza \
  --output-path qiime2_R_export/taxonomy

This produces taxonomy.tsv — a two-column file with feature IDs and their SILVA taxonomy strings.

Step 4: Export denoising statistics

qiime tools export \
  --input-path denoising-stats_240_200.qza \
  --output-path qiime2_R_export/denoising_stats

This gives you stats.tsv — the per-sample read counts at every DADA2 step (input → filtered → denoised → merged → non-chimeric). This is important for quality checking.

Step 5: Export alpha diversity vectors

qiime tools export \
  --input-path core-metrics-results_50000/shannon_vector.qza \
  --output-path qiime2_R_export/shannon

qiime tools export \
  --input-path core-metrics-results_50000/observed_features_vector.qza \
  --output-path qiime2_R_export/observed_features

qiime tools export \
  --input-path core-metrics-results_50000/faith_pd_vector.qza \
  --output-path qiime2_R_export/faith_pd

Each of these exports a simple two-column TSV: sample ID and diversity value.

Step 6: Export beta diversity outputs

qiime tools export \
  --input-path core-metrics-results_50000/bray_curtis_distance_matrix.qza \
  --output-path qiime2_R_export/bray_distance

qiime tools export \
  --input-path core-metrics-results_50000/bray_curtis_pcoa_results.qza \
  --output-path qiime2_R_export/bray_pcoa

Step 7: Collapse taxonomy and convert to relative abundance

For stacked barplots, you want the feature table aggregated at phylum, family, and genus level. QIIME 2 does this cleanly with taxa collapse.

SILVA taxonomy levels:

  • Level 2 = Phylum
  • Level 5 = Family
  • Level 6 = Genus
# Collapse
qiime taxa collapse \
  --i-table table-no-contam_240_200.qza \
  --i-taxonomy taxonomy_240_200.qza \
  --p-level 2 \
  --o-collapsed-table phylum.qza

qiime taxa collapse \
  --i-table table-no-contam_240_200.qza \
  --i-taxonomy taxonomy_240_200.qza \
  --p-level 5 \
  --o-collapsed-table family.qza

qiime taxa collapse \
  --i-table table-no-contam_240_200.qza \
  --i-taxonomy taxonomy_240_200.qza \
  --p-level 6 \
  --o-collapsed-table genus.qza

# Convert to relative abundance
qiime feature-table relative-frequency --i-table phylum.qza --o-relative-frequency-table phylum_rel.qza
qiime feature-table relative-frequency --i-table family.qza --o-relative-frequency-table family_rel.qza
qiime feature-table relative-frequency --i-table genus.qza  --o-relative-frequency-table genus_rel.qza

# Export
qiime tools export --input-path phylum_rel.qza --output-path qiime2_R_export/phylum
qiime tools export --input-path family_rel.qza --output-path qiime2_R_export/family
qiime tools export --input-path genus_rel.qza  --output-path qiime2_R_export/genus

# Convert BIOM to TSV for each
cd qiime2_R_export
biom convert -i phylum/feature-table.biom -o phylum/feature-table.tsv --to-tsv
biom convert -i family/feature-table.biom -o family/feature-table.tsv --to-tsv
biom convert -i genus/feature-table.biom  -o genus/feature-table.tsv  --to-tsv

Step 8: Copy your metadata

cp my_metadata.txt qiime2_R_export/

What your export folder should look like

qiime2_R_export/
├── table/
│   ├── feature-table.biom
│   └── feature-table.tsv
├── taxonomy/
│   └── taxonomy.tsv
├── denoising_stats/
│   └── stats.tsv
├── shannon/
│   └── alpha-diversity.tsv
├── observed_features/
│   └── alpha-diversity.tsv
├── faith_pd/
│   └── alpha-diversity.tsv
├── bray_distance/
│   └── distance-matrix.tsv
├── bray_pcoa/
│   └── ordination.txt
├── phylum/
│   └── feature-table.tsv
├── family/
│   └── feature-table.tsv
├── genus/
│   └── feature-table.tsv
└── my_metadata.txt

Part 2: Visualization in R

Set your working directory to the export folder first:

setwd("path/to/qiime2_R_export")

Libraries

library(tidyverse)   # data manipulation + ggplot2
library(vegan)       # PERMANOVA (adonis2)
library(biomformat)  # read BIOM files if needed
library(viridis)     # colour palettes
library(patchwork)   # combine multiple plots
library(cowplot)     # draw_label for text panels

A: Read metadata

Your metadata file is the anchor that connects every other file. Every sample ID in every TSV must match the IDs here exactly.

meta <- read.delim("my_metadata.txt", sep = "\t", check.names = FALSE)
colnames(meta)[1] <- "SampleID"

# Set a sensible factor order for treatments
meta$Treatment <- factor(meta$Treatment, levels = c("NC", "NPK", "nOB9"))

B: Denoising statistics — how many reads did DADA2 keep?

denoise <- read.delim("denoising_stats/stats.tsv",
                      sep = "\t", check.names = FALSE, stringsAsFactors = FALSE)

# Clean column names (QIIME2 uses hyphens and spaces)
colnames(denoise) <- gsub("[-  ]", "_", colnames(denoise))
colnames(denoise)[1] <- "SampleID"

# Remove the QIIME2 type annotation row
denoise <- denoise[denoise$SampleID != "#q2:types", ]

# Convert counts to numeric
num_cols <- setdiff(colnames(denoise), "SampleID")
denoise[num_cols] <- lapply(denoise[num_cols], as.numeric)

Why the #q2:types row? QIIME 2 metadata files include a second header row that describes data types (e.g. numeric, categorical). It looks like data but it is metadata about the metadata. Always remove it before any numeric conversion.

reads_plot <- ggplot(denoise, aes(x = SampleID, y = non_chimeric, fill = SampleID)) +
  geom_col() +
  theme_bw() +
  labs(title = "Reads retained after DADA2", x = NULL, y = "Non-chimeric reads") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none")

C: ASV counts per sample

library(biomformat)

table_biom <- read_biom("table/feature-table.biom")
asv_mat    <- as.data.frame(as.matrix(biom_data(table_biom)))

asv_per_sample <- colSums(asv_mat)
asv_df         <- data.frame(SampleID = names(asv_per_sample), ASV_count = asv_per_sample)

asv_plot <- ggplot(asv_df, aes(x = SampleID, y = ASV_count, fill = SampleID)) +
  geom_col() +
  theme_bw() +
  labs(title = "Total ASV abundance per sample", x = NULL, y = "Total ASV abundance") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none")

D: Alpha diversity — Shannon, observed features, Faith’s PD

shannon <- read.delim("shannon/alpha-diversity.tsv",          sep = "\t")
obs     <- read.delim("observed_features/alpha-diversity.tsv", sep = "\t")
faith   <- read.delim("faith_pd/alpha-diversity.tsv",          sep = "\t")

colnames(shannon) <- c("SampleID", "Shannon")
colnames(obs)     <- c("SampleID", "Observed")
colnames(faith)   <- c("SampleID", "FaithPD")

alpha_df <- meta %>%
  left_join(shannon, by = "SampleID") %>%
  left_join(obs,     by = "SampleID") %>%
  left_join(faith,   by = "SampleID")

Plot all three metrics on one faceted panel:

alpha_long <- alpha_df %>%
  pivot_longer(cols = c(Shannon, Observed, FaithPD),
               names_to = "Metric", values_to = "Value")

alpha_plot <- ggplot(alpha_long, aes(x = Treatment, y = Value, fill = Treatment)) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  geom_jitter(width = 0.1, size = 2) +
  facet_wrap(~ Metric, scales = "free_y") +
  scale_fill_viridis_d(option = "plasma") +
  theme_bw() +
  labs(title = "Alpha diversity", x = NULL, y = NULL) +
  theme(strip.text = element_text(face = "bold"), legend.position = "none")

Optional: statistical testing

kruskal.test(Shannon ~ Treatment, data = alpha_df)
kruskal.test(Observed ~ Treatment, data = alpha_df)
kruskal.test(FaithPD  ~ Treatment, data = alpha_df)

# Pairwise comparisons
pairwise.wilcox.test(alpha_df$Shannon, alpha_df$Treatment, p.adjust.method = "BH")

What these metrics mean:

  • Shannon index — accounts for both richness (how many taxa) and evenness (how evenly distributed). Higher = more diverse.
  • Observed features — simply the count of unique ASVs detected. Sensitive to sampling depth.
  • Faith’s PD — phylogenetic diversity. Sums the branch lengths of the phylogenetic tree covered by the detected ASVs. Higher = more evolutionary breadth.

E: Beta diversity — Bray-Curtis PCoA + PERMANOVA

Reading the PCoA ordination file requires a small trick: QIIME 2 writes a header with metadata about eigenvalues above the coordinate data. We skip 4 lines to get to the actual coordinates.

pcoa <- read.delim("bray_pcoa/ordination.txt",
                   sep = "\t", skip = 4, check.names = FALSE,
                   row.names = 1)

# Keep only sample rows (not the "Site" or "Species" annotation rows)
coords <- pcoa[!(rownames(pcoa) %in% c("Species", "Site")), 1:2, drop = FALSE]
coords$SampleID <- rownames(coords)
colnames(coords) <- c("PC1", "PC2", "SampleID")

coords$PC1 <- as.numeric(coords$PC1)
coords$PC2 <- as.numeric(coords$PC2)

ord_df <- left_join(coords, meta, by = "SampleID")

PERMANOVA tests whether treatment groups have significantly different community composition:

dist_mat <- read.delim("bray_distance/distance-matrix.tsv",
                       sep = "\t", row.names = 1, check.names = FALSE)
dist_obj <- as.dist(dist_mat)

adon <- adonis2(dist_obj ~ Treatment, data = meta, permutations = 999)
pval <- adon$`Pr(>F)`[1]
r2   <- adon$R2[1]

beta_plot <- ggplot(ord_df, aes(PC1, PC2, color = Treatment)) +
  geom_point(size = 3) +
  stat_ellipse() +
  theme_bw() +
  labs(title = paste0("Bray-Curtis PCoA\nPERMANOVA p=", signif(pval, 3),
                      ", R²=", round(r2, 3)),
       x = "PC1", y = "PC2")

What PERMANOVA R² means: the proportion of community variation explained by the grouping variable. R² = 0.35 means 35% of the variation between samples is associated with treatment.

Pairwise PERMANOVA (which specific treatments differ from each other?):

pairwise_adonis <- function(dist, factors) {
  comb <- combn(unique(factors), 2)
  results <- data.frame()
  for (i in 1:ncol(comb)) {
    group <- comb[, i]
    idx   <- factors %in% group
    sub_dist <- as.dist(as.matrix(dist)[idx, idx])
    sub_meta <- data.frame(Treatment = factors[idx])
    ad <- adonis2(sub_dist ~ Treatment, data = sub_meta)
    results <- rbind(results, data.frame(Group1=group[1], Group2=group[2],
                                         R2=ad$R2[1], p=ad$`Pr(>F)`[1]))
  }
  results$p_adj <- p.adjust(results$p, method = "BH")
  results
}

pairwise_adonis(dist_obj, meta$Treatment)

F: Taxonomy helper functions

Two small functions make all the taxonomy plots reusable.

Helper 1: Read a QIIME 2 TSV

read_qiime_tsv <- function(path) {
  x <- read.delim(path, sep = "\t", header = TRUE,
                  comment.char = "", check.names = FALSE,
                  stringsAsFactors = FALSE, skip = 1)
  colnames(x)[1] <- "Taxon"
  x
}

The skip = 1 is important — BIOM-converted TSVs start with a comment line (# Constructed from biom file) that is not data.

Helper 2: Clean SILVA taxonomy strings

SILVA taxonomy strings look like: d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;.... The function below pulls out the level you want and strips the prefix.

extract_tax_label <- function(taxon_string, level = c("phylum","family","genus")) {
  level <- match.arg(level)
  sapply(taxon_string, function(x) {
    parts <- trimws(unlist(strsplit(x, ";")))
    val <- switch(level,
                  phylum = parts[2],
                  family = parts[5],
                  genus  = parts[6])
    if (length(val) == 0 || is.na(val) || val == "") return("Unassigned")
    val <- sub("^[a-z]__", "", val)
    if (val %in% c("", "uncultured", "metagenome", "Ambiguous_taxa")) return("Unassigned")
    val
  })
}

Helper 3: Build a long-format taxonomy table

make_long_taxa <- function(path, meta, level = c("phylum","family","genus")) {
  level <- match.arg(level)
  tab   <- read_qiime_tsv(path)
  tab %>%
    pivot_longer(-Taxon, names_to = "SampleID", values_to = "Abundance") %>%
    mutate(Abundance = as.numeric(Abundance),
           Label = extract_tax_label(Taxon, level = level)) %>%
    group_by(SampleID, Label) %>%
    summarise(Abundance = sum(Abundance, na.rm = TRUE), .groups = "drop") %>%
    left_join(meta, by = "SampleID")
}

G: Stacked taxonomy barplots

plot_taxa_stacked <- function(long_df, title_txt, top_n = 15) {

  # Identify top N taxa by total abundance
  top_taxa <- long_df %>%
    group_by(Label) %>%
    summarise(Total = sum(Abundance, na.rm = TRUE), .groups = "drop") %>%
    arrange(desc(Total)) %>%
    slice_head(n = top_n) %>%
    pull(Label)

  # Merge everything else into "Other"
  plot_df <- long_df %>%
    mutate(Label = ifelse(Label %in% top_taxa, Label, "Other")) %>%
    group_by(SampleID, Treatment, Label) %>%
    summarise(Abundance = sum(Abundance, na.rm = TRUE), .groups = "drop")

  # Order fill by overall abundance
  levs <- plot_df %>%
    group_by(Label) %>%
    summarise(Total = sum(Abundance), .groups = "drop") %>%
    arrange(desc(Total)) %>% pull(Label)
  plot_df$Label     <- factor(plot_df$Label, levels = rev(unique(levs)))
  plot_df$SampleID  <- factor(plot_df$SampleID, levels = levels(meta$SampleID))
  plot_df$Treatment <- factor(plot_df$Treatment, levels = levels(meta$Treatment))

  ggplot(plot_df, aes(x = SampleID, y = Abundance, fill = Label)) +
    geom_col(width = 0.9, colour = "white", linewidth = 0.1) +
    scale_fill_viridis_d(option = "turbo") +
    scale_y_continuous(labels = function(x) paste0(round(x * 100), "%")) +
    facet_grid(. ~ Treatment, scales = "free_x", space = "free_x") +
    labs(title = title_txt, x = "Sample", y = "Relative abundance", fill = NULL) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
          panel.grid = element_blank(),
          strip.background = element_rect(fill = "white"),
          strip.text = element_text(face = "bold"))
}

# Build the three tables
phylum_long <- make_long_taxa("phylum/feature-table.tsv", meta, "phylum")
family_long <- make_long_taxa("family/feature-table.tsv", meta, "family")
genus_long  <- make_long_taxa("genus/feature-table.tsv",  meta, "genus")

# Build the three plots
phylum_plot <- plot_taxa_stacked(phylum_long, "Phylum-level relative abundance", top_n = 12)
family_plot <- plot_taxa_stacked(family_long, "Family-level relative abundance",  top_n = 20)
genus_plot  <- plot_taxa_stacked(genus_long,  "Genus-level relative abundance",   top_n = 20)

H: Text summary panel

This creates a plain-text summary card that works well in the top-left corner of a multi-panel figure.

total_input   <- sum(denoise$input,        na.rm = TRUE)
total_nonchim <- sum(denoise$non_chimeric, na.rm = TRUE)
avg_nonchim   <- mean(denoise$non_chimeric, na.rm = TRUE)

summary_text <- ggdraw() +
  draw_label(
    paste0(
      "16S Amplicon Summary\n\n",
      "Total input reads:  ", format(total_input,   big.mark = ","), "\n",
      "Retained reads:     ", format(total_nonchim, big.mark = ","), "\n",
      "Average per sample: ", round(avg_nonchim, 0), "\n",
      "Samples:            ", nrow(meta), "\n",
      "ASVs detected:      ", nrow(asv_mat)
    ),
    x = 0.05, hjust = 0, vjust = 1, size = 12
  )

I: Combine everything into one figure

top_row    <- summary_text | reads_plot | asv_plot | alpha_plot | beta_plot
bottom_row <- phylum_plot  | family_plot | genus_plot

final_plot <- top_row / bottom_row + plot_layout(heights = c(1, 1.1))

ggsave("reference_style_figure.png", final_plot, width = 22, height = 12, dpi = 300)
ggsave("reference_style_figure.pdf", final_plot, width = 22, height = 12)

Part 3: Bonus — taxon-focused plots

Sometimes you want to zoom into a specific group. In the Bacillus inoculant study, the question was whether different treatments altered the abundance of rhizobia — the nitrogen-fixing bacteria.

Rhizobia genera plot

rhizobia_genera <- c(
  "Rhizobium", "Bradyrhizobium", "Mesorhizobium", "Ensifer",
  "Sinorhizobium", "Azorhizobium", "Allorhizobium", "Neorhizobium",
  "Pararhizobium", "Shinella", "Devosia", "Microvirga"
)

# Keep only rhizobia rows
rhizobia_only <- genus_long %>%
  filter(Label %in% rhizobia_genera) %>%
  group_by(SampleID, Treatment, Label) %>%
  summarise(Abundance = sum(Abundance, na.rm = TRUE), .groups = "drop")

rhizobia_only$SampleID  <- factor(rhizobia_only$SampleID,  levels = levels(meta$SampleID))
rhizobia_only$Treatment <- factor(rhizobia_only$Treatment, levels = levels(meta$Treatment))

# Custom colours so "Other" is always grey
taxa_no_other <- unique(rhizobia_only$Label)
cols <- setNames(viridis::viridis(length(taxa_no_other), option = "turbo"), taxa_no_other)

rhizobia_plot <- ggplot(rhizobia_only,
                        aes(x = SampleID, y = Abundance, fill = Label)) +
  geom_col(width = 0.9, colour = "white", linewidth = 0.1) +
  scale_fill_manual(values = cols) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.01)) +
  facet_grid(. ~ Treatment, scales = "free_x", space = "free_x") +
  labs(title = "Rhizobia-related genera (zoomed)", x = "Sample",
       y = "Relative abundance", fill = NULL) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
        panel.grid = element_blank(),
        strip.text = element_text(face = "bold"))

Total rhizobia abundance per treatment (boxplot)

rhizobia_total <- genus_long %>%
  mutate(is_rhizobia = Label %in% rhizobia_genera) %>%
  group_by(SampleID, Treatment) %>%
  summarise(Rhizobia = sum(Abundance[is_rhizobia]), .groups = "drop")

ggplot(rhizobia_total, aes(Treatment, Rhizobia, fill = Treatment)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.1, size = 2) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw() +
  labs(title = "Total rhizobia abundance by treatment")

Common beginner mistakes

Metadata IDs do not match sample IDs in the table

This is the most common silent failure. Your left_join(coords, meta, by = "SampleID") will produce rows full of NA if IDs differ even by a trailing space or capital letter. Always run:

setdiff(colnames(asv_mat), meta$SampleID)  # in table but not in metadata
setdiff(meta$SampleID, colnames(asv_mat))  # in metadata but not in table

Both should return character(0). If not, fix the metadata file or use trimws() on your IDs.

The #q2:types row breaks numeric conversion

QIIME 2 metadata files include a second header row describing types. It looks like a data row but contains strings like "numeric" or "categorical". Always filter it out before calling as.numeric().

denoise <- denoise[denoise$SampleID != "#q2:types", ]

BIOM-to-TSV conversion produces a comment line

The first line of a BIOM-converted TSV is # Constructed from biom file. The read_qiime_tsv() helper above uses skip = 1 to ignore it. If you use read.delim() directly without skipping, R will read this comment as the header — and all your column names will be wrong.

PCoA ordination file has non-sample rows

The QIIME 2 ordination file has rows called "Species" and "Site" below the coordinate data. Always filter them out:

coords <- pcoa[!(rownames(pcoa) %in% c("Species", "Site")), ]

Stacked barplots with raw counts instead of relative abundance

Make sure you used phylum_rel.qza (after qiime feature-table relative-frequency) and not the raw phylum.qza. Raw counts make barplots uninterpretable when samples have different total read depths.


Summary: what does each output file contain?

File What is in it R use
table/feature-table.tsv ASVs × samples raw counts Count checks, richness
taxonomy/taxonomy.tsv Feature ID → SILVA taxonomy Taxonomy parsing
denoising_stats/stats.tsv Per-sample DADA2 read retention QC plot
shannon/alpha-diversity.tsv Shannon index per sample Alpha plot
observed_features/alpha-diversity.tsv ASV count per sample (rarefied) Alpha plot
faith_pd/alpha-diversity.tsv Phylogenetic diversity per sample Alpha plot
bray_distance/distance-matrix.tsv Pairwise Bray-Curtis distances PERMANOVA
bray_pcoa/ordination.txt PCoA coordinates Beta diversity plot
phylum/feature-table.tsv Relative abundance collapsed to phylum Stacked barplot
family/feature-table.tsv Relative abundance collapsed to family Stacked barplot
genus/feature-table.tsv Relative abundance collapsed to genus Stacked barplot

Statistics

You can also do simple statistics on the diversity analysis like below.

############################################################
# Simple stats for alpha and beta diversity
############################################################

library(tidyverse)
library(vegan)

# ==========================
# 1. Load data
# ==========================
# Replace with your paths
meta <- read.delim("my_metadata.txt", sep = "\t")
shannon <- read.delim("shannon/alpha-diversity.tsv", sep = "\t")
observed <- read.delim("observed_features/alpha-diversity.tsv", sep = "\t")
faith <- read.delim("faith_pd/alpha-diversity.tsv", sep = "\t")

colnames(shannon) <- c("SampleID", "Shannon")
colnames(observed) <- c("SampleID", "Observed")
colnames(faith) <- c("SampleID", "FaithPD")

# Merge
alpha_df <- meta %>%
  left_join(shannon, by = "SampleID") %>%
  left_join(observed, by = "SampleID") %>%
  left_join(faith, by = "SampleID")

alpha_df$Treatment <- factor(alpha_df$Treatment)

# ==========================
# 2. Alpha diversity stats
# ==========================

# ---- Kruskal-Wallis (global)
kw_shannon <- kruskal.test(Shannon ~ Treatment, data = alpha_df)
kw_observed <- kruskal.test(Observed ~ Treatment, data = alpha_df)
kw_faith <- kruskal.test(FaithPD ~ Treatment, data = alpha_df)

kw_shannon
kw_observed
kw_faith

# ---- Pairwise Wilcoxon
pw_shannon <- pairwise.wilcox.test(alpha_df$Shannon, alpha_df$Treatment, p.adjust.method = "BH")
pw_observed <- pairwise.wilcox.test(alpha_df$Observed, alpha_df$Treatment, p.adjust.method = "BH")
pw_faith <- pairwise.wilcox.test(alpha_df$FaithPD, alpha_df$Treatment, p.adjust.method = "BH")

pw_shannon
pw_observed
pw_faith

# ==========================
# 3. Beta diversity stats
# ==========================

# Load Bray-Curtis distance matrix
dist_mat <- read.delim("bray_distance/distance-matrix.tsv",
                       sep = "\t", row.names = 1)

dist_obj <- as.dist(dist_mat)

# ---- PERMANOVA
adon <- adonis2(dist_obj ~ Treatment, data = meta, permutations = 999)
adon

# Extract values
pval <- adon$`Pr(>F)`[1]
r2 <- adon$R2[1]

cat("PERMANOVA: R2 =", r2, "p =", pval, "\n")

# ==========================
# 4. Pairwise PERMANOVA
# ==========================

pairwise_adonis <- function(dist, factors) {
  comb <- combn(unique(factors), 2)
  results <- data.frame()

  for(i in 1:ncol(comb)){
    group <- comb[,i]
    idx <- factors %in% group

    sub_dist <- as.dist(as.matrix(dist)[idx, idx])
    sub_meta <- data.frame(Treatment = factors[idx])

    ad <- adonis2(sub_dist ~ Treatment, data = sub_meta)

    results <- rbind(results, data.frame(
      Group1 = group[1],
      Group2 = group[2],
      R2 = ad$R2[1],
      p = ad$`Pr(>F)`[1]
    ))
  }

  results$p_adj <- p.adjust(results$p, method = "BH")
  results
}

pairwise_results <- pairwise_adonis(dist_obj, meta$Treatment)
pairwise_results

# ==========================
# 5. Dispersion test (IMPORTANT)
# ==========================

disp <- betadisper(dist_obj, meta$Treatment)
anova_disp <- anova(disp)

anova_disp

# ==========================
# 6. Save summary tables
# ==========================

alpha_summary <- data.frame(
  Metric = c("Shannon", "Observed", "FaithPD"),
  p_value = c(kw_shannon$p.value, kw_observed$p.value, kw_faith$p.value)
)

write.csv(alpha_summary, "alpha_stats.csv", row.names = FALSE)
write.csv(pairwise_results, "pairwise_permanova.csv", row.names = FALSE)

Repository

For detailed R scripts and other files see this Repository

This workflow covers the Bacillus inoculant soil microbiome project (16S V4 amplicons, 9 samples across NC/NPK/nOB9 treatments, SILVA 138, DADA2 pseudo-pooling, sampling depth 50,000 reads). The same pipeline applies to any QIIME 2 amplicon dataset — adapt file names and taxonomy level labels to your study.


Tags: QIIME2 16S amplicon R ggplot2 alpha-diversity beta-diversity taxonomy DADA2 PERMANOVA microbiome beginner

see_your_plot