About This Blog
Welcome to Daily Bioinformatics from Jojy’s Desk — my living notebook of daily bioinformatics work, microbial ecology, MAGs, MTX/MG, HPC troubleshooting, and coding.
- Metagenome & metatranscriptome analysis
- MAGs, viruses, CAZymes, energy metabolism markers
- Functional redundancy (FRed) modeling
- Hybrid-assembly & whole-genome workflows
- Machine-learning MAG binning (GPU/CPU)
- Daily troubleshooting, R/Python tips, figures
I post short updates daily. Scroll down for the latest posts ↓
-
Size-Fractionated Microbiome Analysis — Day 4: Gene Catalog Construction & Functional Annotation
Day 4 of the size-fractionated microbiome series: Building a non-redundant prokaryotic gene catalog from metagenomes using Prodigal, CD-HIT clustering, and eggNOG functional annotation.
-
Bridging Clinical and Ecological ML: From Heart Disease to Soil Ecosystem Multifunctionality
Revisiting my master's in data science project on heart disease prediction to explore Random Forest applications in microbial ecosystem multifunctionality research. A comprehensive comparison of 5 ML algorithms with real results and lessons learned.
-
Building Custom HUMAnN3 Databases with Struo2 and GTDB: A Realistic Guide from the Trenches
A comprehensive, honest guide to building custom HUMAnN3 databases using Struo2 with GTDB r220, including all the troubleshooting, workarounds, and lessons learned from 40 hours in the trenches.
-
Visualize Your Data — Day 5: Bubble Plots in Bioinformatics
Day 5 of the Visualize Your Data series: using bubble plots to visualize multi-dimensional relationships in genomics and functional ecology.
-
Visualize Your Data — Day 4: Volcano Plots in Bioinformatics
Day 4 of the Visualize Your Data series: mastering volcano plots for differential expression and abundance analysis in bioinformatics.
-
Visualize Your Data — Day 3: Ordination Plots (PCA, PCoA, NMDS) in Bioinformatics
Day 3 of the Visualize Your Data series: exploring PCA, PCoA, and NMDS to visualize sample relationships and community structure in bioinformatics and microbial ecology.
-
Visualize Your Data — Day 2: Heatmaps in Bioinformatics
Day 2 of the Visualize Your Data series: understanding heatmaps, scaling, clustering, and interactive heatmaps as commonly used in bioinformatics and molecular genomics.
-
Visualize Your Data — Day 1: Box Plot vs Violin Plot in Bioinformatics
Day 1 of the Visualize Your Data series: understanding when and how to use box plots and violin plots to visualize data distributions in bioinformatics and molecular genomics.
-
Whole Genome Sequencing — Day 5: Comparative & Downstream Genomic Analyses
Day 5 of the WGS series: moving from individual genomes to comparative insights through pangenome analysis, secondary metabolite screening, and pathogen detection.
-
Whole Genome Sequencing — Day 4: Genome Annotation & Functional Potential
Day 4 of the WGS series: gene prediction and functional annotation using Prokka, Prodigal, InterProScan, and DRAM to reconstruct metabolic pathways and genome-scale functional potential.
-
Whole Genome Sequencing — Day 3: Taxonomy, Phylogeny & Genome Similarity
Day 3 of the WGS series: genome-based taxonomy using GTDB-Tk, phylogenomic tree construction, and species/genus boundary testing using ANI, AAI, and dDDH across large genome sets.
-
Whole Genome Sequencing — Day 2: Genome Assembly, Quality Assessment, and Topology
Day 2 of the WGS series: short- and long-read genome assembly using SPAdes, MEGAHIT, Shovill, and Flye, followed by assembly evaluation with QUAST, GC content visualization, topology assessment, and graph inspection with Bandage.
-
Whole Genome Analysis — Day 1: From Raw Reads to Clean, Assembly-Ready Data
Day 1 of a whole-genome analysis series covering raw Illumina and Nanopore reads, integrity checks, quality control with FastQC and NanoPlot, and read trimming using Trimmomatic and Cutadapt.
-
Exploring the Pan-Genome with panX: A Practical Workflow for DARPA Isolates
A complete walkthrough using panX to analyze the pan-genome of 20 DARPA isolates: annotation, pipeline execution, interactive visualization, and core/pan gene extraction.
-
Extracting Full-Length 16S rRNA from Bacterial Genomes Using Barrnap
A quick and reproducible workflow to extract 16S rRNA genes from assembled genomes and validate taxonomy using NCBI BLAST.
-
From Genome Assembly to Novel Species: A Step-by-Step Genomic Workflow
How do we confidently determine whether a newly sequenced genome represents a novel bacterial species?
-
Using Syncthing, tmux, and Git to Sync My Bioinformatics Workflow Across Devices
How I synchronize files, terminal sessions, and version-controlled projects across my Linux desktop and laptop.
-
Size-Fractionated Microbiome Analysis — Day 3: Species-Level Profiling with mOTUs
Day 3 of the size-fractionated microbiome series: mOTUs-based species-level profiling, batch processing, manual merging of profiles, and visualization.
-
Genome Topology and Genome Announcement Reports: A Practical, Reviewer-Safe Workflow
A step-by-step guide to answering NCBI genome topology questions and generating a complete, automated genome report table for Genome Resource Announcements.
-
Size-Fractionated Microbiome Analysis — Day 2: Kaiju Classification and Extraction of Bacterial & Archaeal Reads
Day 2 of a metagenomics series: Kaiju-based taxonomic classification, generation of heatmap-ready summary tables, and extraction of bacterial and archaeal reads for downstream analyses.
-
Size-Fractionated Microbiome Analysis — Day 1: From Raw Reads to Clean Data
Day 1 of a new metagenomics series focusing on raw read quality control and preprocessing for total community profiling.
-
Amplicon Week — Day 6: Wrap-Up: From Reads to Insights
Day 6 of amplicon sequencing and analysis.
-
Amplicon Week — Day 5: Web-based microbiome analysis with easy16S
Day 5 of amplicon sequencing and analysis.
-
Amplicon Week — Day 4: Functional Prediction from Amplicon Data (PICRUSt2, Tax4Fun2, FAPROTAX)
Day 4 of amplicon sequencing and analysis.
-
Amplicon Week — Day 3: Visualization & Ecological Analysis Using QIIME2 + Microeco (R)
Day 3 of amplicon sequencing and analysis.
-
Amplicon Week — Day 2: QIIME2 Setup, Importing Data & Classifier Training
Day 2 of amplicon sequencing and analysis.
-
Amplicon Week — Day 1: Introduction to 16S, ITS, 18S, 12S, and COI Metabarcoding
Day 1 of amplicon sequencing and analysis.
-
Cleaning and Preparing Genomes for NCBI Submission — A Complete Workflow
Step-by-step guide for fixing adapter contamination, removing unwanted contigs, renaming FASTA headers, renumbering contigs, and preparing high-quality genomes for NCBI submission.
-
Machine-learning MAG binning with SemiBin2 and Snakemake (soil metagenomes)
How I used SemiBin2 and a Snakemake workflow to recover high-quality MAGs from fragmented, high-diversity soil metagenomes.
-
Installing & Setting Up My New Linux Laptop: A Real Journey Through Dual-Boot, Partitions & Persistence
A complete walkthrough of dual-booting Ubuntu with Windows on a modern Dell NVMe laptop — from partition problems to corrupted USB drives and manual GParted rescue.
-
GTDB-Tk Complete Workflow
A practical walkthrough of the complete GTDB-Tk workflow.
-
Genome Assembly Day: Shovill for Illumina & Flye for Nanopore Reads
A practical walkthrough of assembling bacterial genomes using Shovill (Illumina short reads) and Flye (Nanopore long reads) on the Palmetto HPC cluster.
-
Visualizing Genome Assemblies with Bandage: Building from Source on Linux
A practical guide to installing Bandage from source using Qt and visualizing assembly graphs generated by Flye and Shovill.
-
Re-working Figure 5: Visualizing Functional Redundancy (FRed) Across Bays, Seasons, and Salinity
Final version of my R-based figure for the FRed manuscript revision, including data cleaning, outlier filtering, and multi-layered ggplot styling.
-
My Journey into Microbiology, the Deep Sea, and Bioinformatics
From traditional medicine to deep-sea expeditions to becoming a computational microbial ecologist — this is my story.