Repositories
Research code, bioinformatics pipelines, and reproducible workflows.
🧬 Research Repositories
This page provides an overview of code repositories, pipelines, and computational resources I maintain or actively develop. My repositories emphasize reproducibility, scalability, and clarity, with workflows designed for HPC environments and real-world biological datasets.
Most projects are hosted on GitHub:
GitHub: https://github.com/jojyjohn28
Blog (more details): https://jojyjohn28.github.io/blog/
🧪 Amplicon Analysis
https://github.com/jojyjohn28/AmpliconWeek_2025
A structured, day-by-day workflow for amplicon sequencing analysis, developed as both a training resource and a reproducible analysis guide.
Focus areas:
- QIIME2-based amplicon processing
- Diversity analysis and visualization
- Ecological interpretation of ASVs
- Best practices for reproducible workflows
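
As a small illustration of the diversity-analysis step listed above, the sketch below computes the Shannon index, a common alpha-diversity metric, from raw ASV counts. It is not taken from the repository: the function name `shannon_index` and the sample counts are hypothetical, and a QIIME2-based workflow would normally compute this with its own diversity tooling rather than by hand.

```python
import math
from collections.abc import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(c for c in counts if c > 0)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this ASV
            h -= p * math.log(p)
    return h


# Hypothetical ASV count vectors for two samples (illustrative only).
even_sample = [25, 25, 25, 25]   # four equally abundant ASVs
skewed_sample = [97, 1, 1, 1]    # one dominant ASV

print(f"even:   {shannon_index(even_sample):.3f}")
print(f"skewed: {shannon_index(skewed_sample):.3f}")
```

The evenly distributed sample yields the maximum value for four taxa, ln(4), while the sample dominated by a single ASV scores much lower.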

🧬 Genome-Resolved Metagenomics
https://github.com/jojyjohn28/semibin2-soil-mag-workflow
A Snakemake-based workflow for SemiBin2-assisted MAG recovery, optimized for soil and complex environmental metagenomes.
Focus areas:
- Co-assembly and binning strategies
- Semi-supervised MAG recovery
- Genome quality assessment
- Scalable execution on HPC systems
🧬 Metagenome Analysis Series
https://github.com/jojyjohn28/metagenome-analysis-series
A comprehensive, hands-on tutorial series for analyzing metagenomic data from raw reads to biological insights. This series covers the complete workflow used in modern metagenomics research, from quality control to advanced downstream analyses.
Focus areas:
- Day 1: QC & Taxonomic Profiling
- Day 2: Genome Assembly
- Day 3: Genome Binning
- Day 4: Dereplication & Taxonomy
- Day 5: Genome Annotation
- Day 6: Specialized Functions
- Day 7: Comparative Genomics
- Day 8: Workflow Platforms
- Day 9: Visualization
- Day 10: Multi-Omics Integration
Files, toy data, and code are available via the Course Materials GitHub, the 10-Day Series GitHub, and the 10-Day Series blog.

🧬 Whole Genome Sequencing Analysis: From Raw Reads to Biological Insight
https://github.com/jojyjohn28/whole-genome-sequencing-analysis
This repository documents a step-by-step, reproducible whole-genome sequencing (WGS) analysis workflow, developed and applied to bacterial genomes generated from both Illumina (short-read) and Oxford Nanopore (long-read) sequencing.
Focus areas:
- Day 1: Raw Reads to Clean, Analysis-Ready Data
- Day 2: Genome Assembly & Assembly Quality Assessment
- Day 3: Taxonomy, Phylogeny & Genome Similarity
- Day 4: Genome Annotation & Functional Potential
- Day 5: Comparative & Downstream Genomic Analyses
🧬 ML-heart-to-ecosystem
https://github.com/jojyjohn28/ML-heart-to-ecosystem
A machine learning project that moves from clinical heart disease prediction to ecosystem multifunctionality analysis. Random Forest and ensemble methods are first validated on medical data, then applied to predict soil ecosystem functions from microbial communities.
Focus areas:
- Heart Disease Prediction (Clinical ML): complete
  - 303 patients, 14 clinical features
  - 5 algorithms compared: Logistic Regression, Random Forest, XGBoost, SVM, KNN
  - Best result: 88.52% accuracy, 91.18% recall (Logistic Regression)
- Soil Ecosystem Multifunctionality (Ecological ML): in progress
  - Several soil samples across different cover crops
  - Predicting ecosystem functions from microbial communities
  - Applying validated Random Forest methods
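
For readers less familiar with the two metrics reported for the heart disease model, the sketch below shows how accuracy and recall follow from a binary confusion matrix. The counts are hypothetical and are not taken from the repository: they assume a 61-patient hold-out set (roughly a 20% split of 303 patients, itself an assumption) and were chosen only because they happen to reproduce the reported percentages.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp: int, fn: int) -> float:
    """Fraction of true positive cases the model detects (sensitivity)."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts (illustrative, NOT the repository's results).
tp, fn = 31, 3   # patients with disease: correctly flagged vs missed
tn, fp = 23, 4   # patients without disease: correctly cleared vs false alarms

print(f"accuracy: {accuracy(tp, fp, tn, fn):.2%}")  # 54 of 61 correct
print(f"recall:   {recall(tp, fn):.2%}")            # 31 of 34 positives found
```

In a clinical screening setting, recall is often weighted more heavily than accuracy, because a missed case (false negative) is usually costlier than a false alarm.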
🧬 Functional Redundancy and Metabolic Flexibility of Estuarine Microbiomes
https://github.com/jojyjohn28/FRed-estuarine-microbiome
This repository contains data, code, and figures supporting the manuscript "Functional redundancy and metabolic flexibility of estuarine microbiomes". Read more at: https://academic.oup.com/ismecommun/article/6/1/ycag021/8454622

Size_Fractionated_Microbiome_Analysis
https://github.com/jojyjohn28/Size_Fractionated_Microbiome_Analysis
A comprehensive analysis framework for free-living and particle-associated microbial communities, integrating metagenomics and metatranscriptomics.
Focus areas:
- Read-based taxonomic and functional profiling
- DNA-RNA comparisons (total vs active communities)
- Resource partitioning and substrate uptake
- Functional redundancy (FRed) analysis
- Co-occurrence and ecological interpretation
Status: ongoing work.
Across all repositories, I emphasize:
- Reproducibility over one-off scripts
- Clear documentation and modular design
- Compatibility with HPC environments (SLURM)
- Teaching workflows that users can reproduce and troubleshoot independently
These repositories are used in research projects, student mentoring, and collaborative studies.
Notes
- Repositories may change as the work progresses.
- Please visit https://github.com/jojyjohn28 for updates.
