Amplicon Week - Day 4: Functional Prediction from Amplicon Data (PICRUSt2, Tax4Fun2, FAPROTAX)
🌱 Amplicon Week - Day 4
Functional Prediction from Amplicon Data
➡️ This tutorial focuses on 16S rRNA, the most commonly used marker for functional prediction.
➡️ PICRUSt2 can predict functions from 16S, 18S, and ITS, but not from COI or 12S (these are mitochondrial markers used mainly for animals, so they do not reflect microbial metabolic functions).
➡️ **Tax4Fun2 and FAPROTAX** are specifically designed for **bacterial/archaeal** 16S and cannot be applied to COI, 12S, or 18S.
So yes, many amplicon datasets can be used for functional inference, but the right tool depends on the marker gene.
PICRUSt2 • Tax4Fun2 • FAPROTAX
Amplicon sequencing reveals who is present, but often we also want to know:
- What functions can this microbial community potentially perform?
- What metabolic pathways may be active?
- How similar or redundant are communities in their functional potential?
Today's post introduces three major tools used to infer function from 16S/ITS/18S amplicon data:
- PICRUSt2: KEGG orthologs, EC numbers, pathways
- Tax4Fun2: KEGG pathways using SILVA-based nearest neighbors
- FAPROTAX: ecological roles (nitrification, methanogenesis, sulfate reduction, etc.)

"These methods do not replace shotgun metagenomics, but they provide meaningful functional hypotheses when only 16S/18S amplicons are available."
1️⃣ Input Files Required for All Functional Tools
You can use the following files generated on Day 2 or Day 3 with QIIME 2:
- feature-table.tsv
- taxonomy.tsv
- dna-sequences.fasta
- phylogenetic tree (optional for PICRUSt2)

Required formats:

| Tool | Input Files Needed |
| --- | --- |
| PICRUSt2 | ASV sequences (dna-sequences.fasta), feature table (feature-table.tsv) |
| Tax4Fun2 | Feature table + SILVA taxonomy + representative sequences |
| FAPROTAX | Feature table + taxonomy table |

You can find a complete example dataset in the GitHub Day 3 folder. https://github.com/jojyjohn28/AmpliconWeek_2025
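
Before running any of the tools, it can help to confirm that the same ASV IDs appear in all of the inputs. The sketch below uses small in-memory stand-ins for the feature table, the taxonomy table, and the FASTA headers. The IDs, counts, and the helper name `check_ids` are hypothetical; with real data you would read the IDs from the files listed above.

```r
# Toy stand-ins for the three inputs (hypothetical IDs and counts)
feature_table <- data.frame(
  SampleA = c(10, 0, 5),
  SampleB = c(3, 7, 0),
  row.names = c("ASV1", "ASV2", "ASV3")
)
taxonomy <- data.frame(
  Taxon = c("d__Bacteria; p__Proteobacteria", "d__Bacteria; p__Firmicutes"),
  row.names = c("ASV1", "ASV2")
)
fasta_ids <- c("ASV1", "ASV2", "ASV3")  # FASTA headers without the leading ">"

# Hypothetical helper: list feature-table IDs missing from the other inputs
check_ids <- function(table_ids, tax_ids, seq_ids) {
  list(
    missing_taxonomy = setdiff(table_ids, tax_ids),
    missing_sequence = setdiff(table_ids, seq_ids)
  )
}

id_report <- check_ids(rownames(feature_table), rownames(taxonomy), fasta_ids)
print(id_report)
```

In this toy case ASV3 has counts and a sequence but no taxonomy row, which is the kind of mismatch that would matter for the taxonomy-based tools.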
2️⃣ PICRUSt2: KEGG-Based Functional Prediction
PICRUSt2 predicts the functional profile of a community by:
- Placing ASVs into a reference phylogeny
- Inferring gene families (KOs, ECs)
- Predicting metabolic pathways
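
Once gene copy numbers have been predicted for each ASV, the gene-family step comes down to simple arithmetic. The toy example below is a sketch of that idea, not PICRUSt2's actual code. All numbers are made up, and it assumes that each ASV's read count is divided by its predicted 16S copy number and then multiplied by its predicted KO copy numbers.

```r
# ASV read counts (rows = ASVs, columns = samples); made-up numbers
asv_counts <- matrix(c(100, 50,
                        20, 80),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("ASV1", "ASV2"), c("S1", "S2")))

# Predicted 16S rRNA gene copies per genome (made up)
copies_16s <- c(ASV1 = 2, ASV2 = 4)

# Predicted KO copies per genome (rows = ASVs, columns = KOs; made up)
ko_copies <- matrix(c(1, 0,
                      2, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("ASV1", "ASV2"), c("K00001", "K00002")))

# Step 1: correct read counts for 16S copy number
# (the vector is recycled down the rows, so row i is divided by copies_16s[i])
asv_norm <- asv_counts / copies_16s

# Step 2: sum each KO's contribution across ASVs (result: KOs x samples)
ko_by_sample <- t(ko_copies) %*% asv_norm
print(ko_by_sample)
```

For example, K00001 in S1 is 1 × (100 / 2) + 2 × (20 / 4) = 60.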
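For installation, please see installation_and_run.md in the GitHub Day 3 folder. https://github.com/jojyjohn28/AmpliconWeek_2025
🧬 2.1 Run PICRUSt2
```bash
# -s: ASV sequences, -i: feature table, -o: output folder, -p: number of processes
picrust2_pipeline.py \
  -s dna-sequences.fasta \
  -i feature-table.tsv \
  -o picrust2_out \
  -p 6
```
Main outputs (PICRUSt2 gzips these tables by default, so they carry a .gz suffix until you decompress them):

| File | Description |
| --- | --- |
| KO_metagenome_out/pred_metagenome_unstrat.tsv | KO abundance table |
| EC_metagenome_out/pred_metagenome_unstrat.tsv | Enzyme Commission (EC) number abundances |
| pathways_out/path_abun_unstrat.tsv | MetaCyc pathway abundances |
| marker_predicted_and_nsti.tsv | Predicted 16S copy numbers and NSTI scores (quality) |
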
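
NSTI (nearest sequenced taxon index) tells you how far each ASV sits from its closest reference genome, so it is worth a quick look before trusting the predictions. Here is a minimal sketch on toy data. The column names `sequence` and `metadata_NSTI` are my assumption about the layout of the NSTI table, the values are made up, and the cutoff of 2 mirrors PICRUSt2's default `--max_nsti`.

```r
# Toy NSTI table (column names assumed, values made up)
nsti <- data.frame(
  sequence = c("ASV1", "ASV2", "ASV3"),
  metadata_NSTI = c(0.05, 0.40, 2.50)
)

# Toy ASV counts (rows = ASVs, columns = samples)
counts <- matrix(c(90, 10,
                    9, 60,
                    1, 30),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2")))

# ASVs at or below the cutoff are kept for prediction
nsti_cutoff <- 2
keep <- nsti$sequence[nsti$metadata_NSTI <= nsti_cutoff]

# Abundance-weighted mean NSTI per sample: high values flag poorly covered samples
rel_abund <- sweep(counts, 2, colSums(counts), "/")
nsti_per_asv <- nsti$metadata_NSTI[match(rownames(counts), nsti$sequence)]
weighted_nsti <- colSums(rel_abund * nsti_per_asv)

print(keep)
print(weighted_nsti)
```

In this toy case S2 is dominated by poorly represented ASVs, so its predictions deserve more caution than those for S1.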
2.2 Basic Visualization in R (ggplot2)
Here is a simple ggplot2 example for visualizing the 20 most abundant predicted KOs.
```r
library(tidyverse)

# PICRUSt2 gzips its tables by default: gunzip first, or point to the .tsv.gz file
ko <- read.table("picrust2_out/KO_metagenome_out/pred_metagenome_unstrat.tsv",
                 sep = "\t", header = TRUE, row.names = 1, check.names = FALSE)

# Long format: one row per KO x sample
ko_long_all <- ko %>%
  rownames_to_column("KO") %>%
  pivot_longer(-KO, names_to = "Sample", values_to = "Abundance")

# The 20 KOs with the highest mean predicted abundance across samples
ko_top20 <- ko_long_all %>%
  group_by(KO) %>%
  summarise(meanA = mean(Abundance), .groups = "drop") %>%
  slice_max(meanA, n = 20, with_ties = FALSE) %>%
  pull(KO)

ko_long <- ko_long_all %>%
  filter(KO %in% ko_top20)

# Dodged bars: one bar per sample within each KO
ggplot(ko_long, aes(x = KO, y = Abundance, fill = Sample)) +
  geom_col(position = "dodge") +
  coord_flip() +
  theme_bw() +
  labs(title = "Top 20 KO Functions (PICRUSt2)", x = "KEGG Ortholog", y = "Predicted Abundance")
```
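
One optional step before comparing samples: raw predicted abundances scale with sequencing depth, so you may prefer to plot relative abundances. This is a small sketch on a toy KO table with made-up values, not part of the original script.

```r
# Toy KO table (rows = KOs, columns = samples); made-up values
ko_toy <- data.frame(
  S1 = c(60, 30, 10),
  S2 = c(600, 100, 300),
  row.names = c("K00001", "K00002", "K00003")
)

# Divide every column by its total and express as a percentage
ko_rel <- sweep(as.matrix(ko_toy), 2, colSums(ko_toy), "/") * 100
print(round(ko_rel, 1))
```

S2 has ten times more predicted counts than S1, yet K00001 makes up 60% of the total in both samples.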
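3️⃣ Tax4Fun2: KEGG Pathways Based on SILVA
Tax4Fun2 maps 16S ASVs to KEGG functions using BLAST-based nearest-neighbor matches to SILVA-derived genomes.
It is fast and integrates well with microeco. As with any amplicon-based prediction, accuracy depends on how closely your ASVs match the reference genomes.
🧬 3.1 Run Tax4Fun2 in R
```r
library(Tax4Fun2)  # Tax4Fun2 package (not the original Tax4Fun)
library(microeco)

t4f <- trans_func$new(meco_physeq_ra)  # microeco object from Day 3
t4f$for_what <- "prok"                 # treat taxa as prokaryotes

# Needs a local BLAST installation and the downloaded Tax4Fun2 reference data
t4f$cal_tax4fun2(
  blast_tool_path = "/path/to/blast/",
  path_to_reference_data = "Tax4Fun2_ReferenceData_v2"
)

# In the microeco versions I have seen, the redundancy indices (aFRI, rFRI) are a separate call
t4f$cal_tax4fun2_FRI()
```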
Outputs:
- res_tax4fun2_KO: KO table
- res_tax4fun2_pathway: pathway table
- res_tax4fun2_aFRI: absolute functional redundancy
- res_tax4fun2_rFRI: relative functional redundancy

💡 Functional Redundancy
Functional redundancy measures whether multiple taxa share similar functional potential. Tax4Fun2 outputs both aFRI and rFRI, and Microeco can visualize and compare them.
Why it matters:
- High redundancy → resilient, stable communities
- Low redundancy → specialized, vulnerable to disturbance

However, more genome- and MAG-based modeling is needed before functional redundancy (FRed) can be treated as a proxy for ecosystem functioning or resilience.
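
For a rough intuition of what "redundancy" means, here is a toy sketch. It is not the aFRI/rFRI calculation used by Tax4Fun2. It simply counts how many taxa carry each function and how much of the community's abundance they represent. The taxa, traits, and abundances are made up.

```r
# Presence/absence of functions per taxon (made-up traits)
traits <- matrix(c(1, 1, 0,
                   1, 0, 0,
                   1, 0, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("TaxonA", "TaxonB", "TaxonC"),
                                 c("denitrification", "fermentation", "sulfate_reduction")))

# Relative abundance of each taxon in one sample (made up, sums to 1)
abund <- c(TaxonA = 0.5, TaxonB = 0.3, TaxonC = 0.2)

# How many taxa can perform each function?
n_taxa_per_function <- colSums(traits)

# What fraction of the community carries each function?
# (abund is recycled down the rows, so row i is weighted by abund[i])
abund_per_function <- colSums(traits * abund)

print(n_taxa_per_function)
print(abund_per_function)
```

Here denitrification is shared by all three taxa (high redundancy), while fermentation and sulfate reduction each depend on a single taxon and would be lost if that taxon disappeared.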
🎨 Visualization
All plotting scripts will be included in https://github.com/jojyjohn28/AmpliconWeek_2025 under day4_functional_prediction/tax4fun_plot.R
4️⃣ FAPROTAX: Ecological Function Annotation
FAPROTAX is a curated database linking taxa → ecological processes:

| Category | Examples |
| --- | --- |
| Carbon | methanogenesis, photoautotrophy |
| Nitrogen | nitrification, denitrification |
| Sulfur | sulfate reduction |
| Hydrocarbon | degradation pathways |

It is taxonomy-based, so accuracy depends on classification quality.
🧬 4.1 Running FAPROTAX via Microeco
```r
library(microeco)

t_func <- trans_func$new(meco_physeq_ra)  # microeco object from Day 3
t_func$for_what <- "prok"                 # treat taxa as prokaryotes
t_func$cal_spe_func(prok_database = "FAPROTAX")  # match taxa to FAPROTAX functions
t_func$cal_func()           # abundance
t_func$cal_func_FR()        # redundancy
t_func$cal_spe_func_perc()  # percentage
```
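
Conceptually, a taxonomy-based annotation like this is a lookup from taxon names to functions, followed by summing abundances per function. The sketch below illustrates that idea in base R. The genus-to-function table is a hand-made illustration, not copied from the FAPROTAX database, and the counts are made up.

```r
# Hand-made lookup table (illustrative only)
genus_function <- data.frame(
  Genus = c("Nitrosomonas", "Methanobacterium", "Desulfovibrio"),
  Function = c("aerobic_ammonia_oxidation", "methanogenesis", "sulfate_respiration")
)

# Toy ASV table with genus-level taxonomy and counts in two samples
asv_table <- data.frame(
  Genus = c("Nitrosomonas", "Desulfovibrio", "Desulfovibrio", "Unclassified"),
  S1 = c(10, 5, 15, 70),
  S2 = c(0, 40, 10, 50)
)

# Inner join: ASVs without a match in the lookup table get no function
annotated <- merge(asv_table, genus_function, by = "Genus")

# Sum counts per function and sample
func_abund <- aggregate(cbind(S1, S2) ~ Function, data = annotated, FUN = sum)
print(func_abund)
```

Note how the unclassified ASV, which holds most of the reads, contributes nothing. This is why classification quality matters so much for this approach.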
Outputs include:
- Ecological function presence/absence
- Function abundance
- Functional redundancy

🎨 Visualizing FAPROTAX
ggplot scripts will be in https://github.com/jojyjohn28/AmpliconWeek_2025 under day4_functional_prediction/faprotax_plot.R
📦 What's Included in the Day 4 GitHub Folder
- installation_and_run.md: step-by-step installation + execution for PICRUSt2, Tax4Fun2, and FAPROTAX
- picrust2_plot.R: ggplot visualizations for KO and pathway output
- tax4fun_plot.R: visualization of pathways + redundancy
- faprotax_plot.R: visualization of ecological functions

If you would like to try this yourself, I've included all the required base files in my repository, along with a fully running R script. You can find feature-table.tsv, taxonomy.tsv, tree.nwk, and metadata.txt in the Day 3 folder and all code in the Day 4 folder of https://github.com/jojyjohn28/AmpliconWeek_2025
Today's image features functional predictions I created myself using Microeco.
Thank you to the Campbell Lab's graduate student seminar series, especially Dinu and Nichole, for the presentation on Microeco.
