Metadata & Diversity — Interactive Explainer
How metadata drives every diversity analysis
A metadata file links each sample ID to its context: treatment, season, location, measured variables. Without it, QIIME 2 and R cannot group or compare your samples.
my_metadata.txt
#SampleID · Treatment · Replicate · Season · Shannon
What each column does
#SampleID: must match the feature table IDs exactly
Treatment: categorical → boxplot groups
Replicate: numeric or categorical
Season: categorical → Kruskal-Wallis
Shannon: numeric (calculated separately)
How to use in QIIME 2
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table-no-contam.qza \
  --p-sampling-depth 50000 \
  --m-metadata-file my_metadata.txt \
  --output-dir core-metrics/
Alpha diversity measures how diverse each individual sample is. The metadata file's grouping column (Treatment) is what lets you compare distributions — without it, you just have 9 numbers with no biological meaning.
[Figure: Shannon index by treatment. Groups: NC (negative control), NPK fertiliser, nOB9 bioinoculant. Panel: Group statistics.]
Three metrics explained
Shannon index: richness + evenness combined (most common)
Observed features: raw ASV count after rarefaction (simple count)
Faith's PD: phylogenetic branch length covered (phylogenetic)
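As a minimal sketch of the first two metrics, the snippet below computes observed features and the Shannon index for two samples with the same richness but different evenness. The counts are made-up illustrative values and `shannon_index` is a hypothetical helper. It uses the natural log, which is also the default in vegan's `diversity()`; other tools may use a different log base, so compare values only within one tool.

```r
# Hypothetical helper: Shannon index H = -sum(p * log(p)) over non-zero taxa
shannon_index <- function(counts) {
  counts <- counts[counts > 0]   # zero counts contribute nothing
  p <- counts / sum(counts)      # relative abundances
  -sum(p * log(p))               # natural log
}

# Made-up ASV counts: same richness, different evenness
even_sample   <- c(25, 25, 25, 25)
uneven_sample <- c(97, 1, 1, 1)

# Observed features = number of ASVs with a non-zero count
observed_even   <- sum(even_sample > 0)
observed_uneven <- sum(uneven_sample > 0)

h_even   <- shannon_index(even_sample)
h_uneven <- shannon_index(uneven_sample)

cat("Observed features:", observed_even, observed_uneven, "\n")
cat("Shannon:", round(h_even, 3), round(h_uneven, 3), "\n")
```

Both samples have four observed features, but the uneven sample gets a much lower Shannon value because one ASV dominates it.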
R code to plot
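library(ggplot2)

# alpha_df: one row per sample, with Treatment and Shannon columns
ggplot(alpha_df,
  aes(Treatment, Shannon,
      fill = Treatment)) +
  geom_boxplot(alpha = .5,
               outlier.shape = NA) +  # jitter layer already draws every point
  geom_jitter(width = .1, size = 2) +
  theme_bw()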
Beta diversity asks how different communities are between samples. The PCoA plot below is coloured by the Treatment column from the metadata; colouring by a different metadata column can reveal different structure. PERMANOVA tests whether the grouping explains more of the between-sample distance than expected by chance.
[Figure: Bray-Curtis PCoA, points coloured by a selected metadata column]
PERMANOVA (Treatment): R²=0.68, p=0.001 — treatment explains 68% of community variation
Three beta metrics
Bray-Curtis: relative abundance differences (most used)
Unweighted UniFrac: presence/absence only, phylogenetic
Weighted UniFrac: abundance + phylogeny combined
PERMANOVA in R
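library(vegan)

# Read the exported distance matrix; check.names = FALSE keeps sample IDs unchanged
dist_mat <- read.delim(
  "bray_distance/distance-matrix.tsv",
  row.names = 1, check.names = FALSE)
dist_obj <- as.dist(as.matrix(dist_mat))

# meta must already be loaded, with rows in the same sample order as dist_mat
adon <- adonis2(
  dist_obj ~ Treatment,
  data = meta,
  permutations = 999)
print(adon)

# R2 column = proportion of variance explained
# Pr(>F) < 0.05 → significant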
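To make R² and the p-value less of a black box, here is a base-R sketch of a one-way PERMANOVA using the usual partitioning of squared distances into total and within-group sums of squares. `permanova_sketch` is a hypothetical helper, not vegan's implementation, and the points are made-up coordinates with Euclidean distances standing in for a real Bray-Curtis matrix.

```r
# Hypothetical helper: one-way PERMANOVA on a distance matrix, base R only
permanova_sketch <- function(d, groups, permutations = 999) {
  d2 <- as.matrix(d)^2
  n  <- nrow(d2)

  # Sum of squared pairwise distances within a set of samples, divided by set size
  ss <- function(idx) sum(d2[idx, idx]) / (2 * length(idx))

  pseudo_f <- function(g) {
    ss_total  <- ss(seq_len(n))
    ss_within <- sum(vapply(split(seq_len(n), g), ss, numeric(1)))
    a <- length(unique(g))                       # number of groups
    f <- ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))
    c(F = f, R2 = (ss_total - ss_within) / ss_total)
  }

  observed <- pseudo_f(groups)
  # Null distribution: recompute F after shuffling the group labels
  permuted <- replicate(permutations, pseudo_f(sample(groups))["F"])
  p <- (sum(permuted >= observed["F"]) + 1) / (permutations + 1)
  list(R2 = unname(observed["R2"]), F = unname(observed["F"]), p = p)
}

# Made-up coordinates: two tight, well-separated groups of five samples
set.seed(1)
group_a <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1), c(0.05, 0.05))
group_b <- group_a + 1
pts <- rbind(group_a, group_b)
grp <- rep(c("A", "B"), each = 5)

res <- permanova_sketch(dist(pts), grp, permutations = 999)
print(res)
```

Because the p-value comes from shuffling labels, its smallest possible value depends on how many distinct relabelings exist, so very small designs cannot reach very small p-values.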
What the PCoA shows
Clusters close together: similar communities
Clusters far apart: different communities
Mixed/overlapping: groups not clearly separated; treatment effect likely weak or not significant
A minimal lab metadata file has Treatment, Replicate, and Season. A real environmental study adds physical and chemical measurements. Each extra column unlocks a new class of analysis. Below is a real-world example from an estuarine microbiome study.
Environmental metadata example
#SampleID    Bay  Season  Fraction  Salinity  Temp °C
CP_Sp_FL_01  CP   Spring  FL        8.2       14.3
CP_Su_PA_01  CP   Summer  PA        12.7      26.8
DE_Su_FL_01  DE   Summer  FL        28.6      25.4
DE_Fa_PA_01  DE   Fall    PA        30.2      17.8
FL = Free Living · PA = Particle Attached
FL (0.2–0.8 µm): planktonic bacteria
PA (>0.8 µm): particle-attached bacteria
What each variable enables
Salinity: continuous, 0–40 PSU → Spearman ρ with Shannon
Temperature: continuous, °C → dbRDA gradient analysis
Bay: Chesapeake / Delaware → PERMANOVA factor
Season: Spring / Summer / Fall → Kruskal-Wallis
Size fraction: FL / PA → Wilcoxon test
Fraction × Season: interaction term → two-way PERMANOVA
pH, DO, depth: continuous drivers → indicator species analysis
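The three univariate tests named above are all available in base R. The sketch below runs them on a small made-up data frame; the values are purely illustrative and are not taken from the estuarine study, and the data frame name `toy` is an assumption.

```r
# Made-up illustrative data: 12 samples, NOT values from the study above
toy <- data.frame(
  Fraction = factor(rep(c("FL", "PA"), times = 6)),
  Season   = factor(rep(c("Spring", "Summer", "Fall"), each = 4)),
  Salinity = c(8.2, 9.1, 10.4, 12.7, 14.9, 17.3,
               20.8, 23.5, 25.4, 27.0, 28.6, 30.2),
  Shannon  = c(5.9, 5.7, 5.8, 5.4, 5.5, 5.1,
               5.2, 4.8, 4.9, 4.6, 4.7, 4.3)
)

# Continuous variable vs alpha diversity: Spearman rank correlation
sp <- cor.test(toy$Salinity, toy$Shannon, method = "spearman")

# Categorical variable with three or more levels: Kruskal-Wallis
kw <- kruskal.test(Shannon ~ Season, data = toy)

# Categorical variable with exactly two levels: Wilcoxon rank-sum test
wx <- wilcox.test(Shannon ~ Fraction, data = toy)

cat("Spearman rho:", round(unname(sp$estimate), 2), "\n")
cat("Kruskal-Wallis p:", signif(kw$p.value, 2), "\n")
cat("Wilcoxon p:", signif(wx$p.value, 2), "\n")
```

The column type decides the test: a numeric column goes to a correlation, a categorical column to a rank-based group comparison.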
Complexity comparison
Lab experiment: 3 treatments × 3 reps = 9 rows
Environmental study: 2 bays × 3 seasons × 2 fractions × 3 reps = 36 rows
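The row counts above follow directly from crossing the factor levels. As a rough sketch, `expand.grid()` enumerates both designs; the ID pattern (bay, two-letter season, fraction, two-digit replicate) is inferred from the example table and is an assumption, as are the variable names.

```r
# Lab design: every treatment crossed with every replicate
lab <- expand.grid(
  Replicate = 1:3,
  Treatment = c("NC", "NPK", "nOB9"),
  stringsAsFactors = FALSE
)

# Field design: bays x seasons x fractions x replicates
field <- expand.grid(
  Replicate = 1:3,
  Fraction  = c("FL", "PA"),
  Season    = c("Spring", "Summer", "Fall"),
  Bay       = c("CP", "DE"),
  stringsAsFactors = FALSE
)

# Assumed ID pattern, e.g. CP_Sp_FL_01
ids <- sprintf("%s_%s_%s_%02d",
               field$Bay, substr(field$Season, 1, 2),
               field$Fraction, field$Replicate)

nrow(lab)    # 9
nrow(field)  # 36
head(ids, 3)
```

Generating the IDs from the design table, rather than typing them by hand, is one way to avoid the ID mismatches that break downstream steps.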
QIIME 2 has strict requirements for metadata files. Most analysis failures trace back to a formatting mistake here. Read these once and save yourself hours of debugging.
Required format
  • First column header must be a recognised ID header such as #SampleID, sample-id, or id
  • Tab-separated (.tsv), not comma-separated (.csv)
  • UTF-8 encoding. Use Google Sheets or Excel "Tab Delimited Text" export
  • Column names must be unique regardless of case
  • No spaces in column names — use underscores: Sample_ID not Sample ID
  • IDs cannot start with # — rows starting with # are treated as comments
Common mistakes
  • Sample IDs differ between metadata and feature table (extra space, underscore, capital)
  • Forgot to remove the #q2:types row before converting columns with as.numeric() in R
  • Numeric treatment IDs (1, 2, 3) inferred as continuous — declare as categorical
  • Saved as CSV instead of TSV from Excel
  • Stale metadata rows for samples removed from the feature table
Validation tips
  • Use Keemei in Google Sheets to validate before importing
  • In R: setdiff(meta$SampleID, colnames(table)) should return character(0)
  • Use qiime metadata tabulate to check column types after import
  • Keep a backup of the original metadata before editing
  • When in doubt, collect more metadata in the field — you cannot retroactively measure salinity
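Several of the checks listed above can be automated before the file ever reaches QIIME 2. The following is a minimal sketch in base R; `check_metadata` is a hypothetical helper, it covers only a subset of the rules in this section, and it does not replace Keemei or QIIME 2's own validation. The demo file deliberately contains a column name with a space and an ID with a trailing space.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_metadata <- function(path, feature_ids) {
  lines  <- readLines(path, encoding = "UTF-8")
  header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
  body   <- lines[-1]
  body   <- body[!startsWith(body, "#") & nzchar(body)]  # skip comment and #q2:types rows
  ids    <- vapply(strsplit(body, "\t", fixed = TRUE), `[`, character(1), 1)

  problems <- character(0)
  if (length(header) < 2)
    problems <- c(problems, "header is not tab-separated")
  if (!header[1] %in% c("#SampleID", "sample-id", "id"))
    problems <- c(problems, "first column header is not a recognised ID header")
  if (anyDuplicated(tolower(header)) > 0)
    problems <- c(problems, "column names are not unique ignoring case")
  if (any(grepl(" ", header, fixed = TRUE)))
    problems <- c(problems, "column names contain spaces")
  if (length(setdiff(ids, feature_ids)) > 0)
    problems <- c(problems, "metadata IDs missing from feature table")
  if (length(setdiff(feature_ids, ids)) > 0)
    problems <- c(problems, "feature table IDs missing from metadata")
  problems
}

# Demo file with two deliberate mistakes
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "#SampleID\tTreatment\tSoil pH",
  "#q2:types\tcategorical\tnumeric",
  "S1a\tNC\t6.8",
  "S1b \tNC\t6.7"                       # trailing space in the ID
), tmp)

problems <- check_metadata(tmp, feature_ids = c("S1a", "S1b"))
print(problems)
```

The trailing space is reported twice, once from each direction of the ID comparison, which is usually the quickest way to spot which side holds the typo.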
The #q2:types row
#SampleID  Treatment  pH
#q2:types  categorical  numeric
S1a        NC          6.8
S1b        NC          6.7
This row tells QIIME 2 the column type explicitly. Always filter it out before calling as.numeric() in R.
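One way to do that filtering, sketched in base R under the assumption that the file looks like the example above: read every column as text, drop the #q2:types row, then convert the numeric columns. If the row is left in, the pH column contains the word "numeric" and as.numeric() turns that entry into NA with a warning. The temporary file and variable names here are for illustration only.

```r
# Inline copy of the example above, written to a temporary file for the demo
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "#SampleID\tTreatment\tpH",
  "#q2:types\tcategorical\tnumeric",
  "S1a\tNC\t6.8",
  "S1b\tNC\t6.7"
), tmp)

# Read everything as text; check.names = FALSE keeps the "#SampleID" header as-is
raw <- read.delim(tmp, check.names = FALSE, comment.char = "",
                  colClasses = "character")

# Drop the #q2:types row, then convert numeric columns
meta <- raw[raw[["#SampleID"]] != "#q2:types", ]
meta$pH <- as.numeric(meta$pH)

str(meta)
```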