Day 1 — Laptop vs HPC: Why Bioinformatics Needs Both
This is Day 1 of a 6-part series: From Laptop to HPC: Scaling Computational Biology Workflows. Each post covers a real challenge you’ll face when moving from local development to high-performance computing. No prior HPC experience needed.
It Started With a Crash
I was processing a microbiome dataset — 50 samples, nothing extraordinary. I fired up my script, walked to the kitchen to make coffee, came back, and saw this:
```
Killed: out of memory
RAM used: 31.8 GB / 32 GB
```
My laptop had run out of memory doing bioinformatics work that, in this field, counts as routine.
Most computational biologists hit that moment eventually. Your laptop is powerful, but genomics data doesn’t care about your RAM. That’s exactly why you need to understand both worlds: your local machine and an HPC cluster.
By the end of this post, you’ll know exactly what each one is, when to use which, and why they work together — not against each other.
What You’ll Need for This Series
Before we dive in, here’s what’s assumed:
- You know basic command line (you can `cd`, `ls`, and run a script)
- You have some bioinformatics background (you know what FASTQ, FASTA, or BAM files are)
- You do not need any prior HPC or cluster experience — that’s what this series is for
What Is a Local Machine?
Your laptop or workstation is your personal computing environment. When you type a command, it runs immediately. When you install software with sudo apt-get install, it installs on your system. You own it completely.
Here’s a realistic picture of what a modern research laptop looks like:
| Resource | Typical Range |
|---|---|
| RAM (Memory) | 8 – 32 GB |
| CPU cores | 4 – 16 |
| Storage | 500 GB – 4 TB |
| Root access (sudo) | ✅ Yes |
| Job scheduler | ❌ None — you run commands directly |
| Cost | One-time, yours forever |
The laptop is great for:
- Writing and testing scripts
- Exploring small datasets
- Developing analysis pipelines
- Running quick quality checks
The laptop struggles with:
- Datasets larger than your available RAM
- Jobs that take days to run
- Processing dozens or hundreds of samples at once
- Analyses that need specialized hardware (high-memory nodes, GPU)
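It’s worth knowing where those limits sit on your own machine before a big run. On Linux, a quick check looks like this (on macOS, `sysctl -n hw.ncpu` and `sysctl -n hw.memsize` are the rough equivalents):

```bash
# How many CPU cores do I have?
nproc

# How much RAM is installed and currently free? (Linux)
free -h

# How much disk space is left in my home directory?
df -h "$HOME"
```

If `free -h` reports less available memory than your largest input file, you already know how the story ends.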
💡 Think of your laptop like a personal car. Fast, flexible, goes wherever you point it. But you can’t move 10,000 packages with it.
What Is an HPC Cluster?
An HPC (High-Performance Computing) cluster is a collection of many powerful computers — called nodes — connected together via a high-speed network and managed by a central system.
When you log into an HPC, you’re not logging into one computer. You’re logging into a shared ecosystem of computing resources that can be allocated to your jobs on demand.
Here’s what a typical academic HPC cluster looks like:
| Resource | Typical Range |
|---|---|
| RAM per node | 256 GB – 4 TB |
| CPU cores per node | 32 – 128 |
| Number of nodes | 100 – 1,000+ |
| Total storage | Petabytes (shared filesystem) |
| Root access (sudo) | ❌ No |
| Job scheduler | ✅ Yes (SLURM, PBS, LSF) |
| Cost | Shared institutional resource |
💡 Think of an HPC like a shipping fleet. You don’t own it. You book capacity. It can move massive loads in parallel that no personal car ever could.
How Does Access Work?
You typically access HPC clusters through:
- SSH — a secure terminal connection: `ssh username@hpc.university.edu`
- A login node — a shared entry point where you prepare and submit jobs (not where you run them — more on this in Day 3!)
- A job scheduler — software like SLURM that manages who gets what resources and when
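To make the scheduler idea concrete, here is a minimal SLURM job script (a sketch with illustrative resource values and a hypothetical filename; Day 3 walks through every line):

```bash
# Write a minimal SLURM job script (resource values are illustrative)
cat > fastqc_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=fastqc        # name shown in the queue
#SBATCH --cpus-per-task=4        # CPU cores requested
#SBATCH --mem=8G                 # RAM requested
#SBATCH --time=01:00:00          # wall-clock limit (1 hour)

fastqc sample_01.fastq.gz -o results/
EOF

# On the cluster, you would submit it from the login node with:
#   sbatch fastqc_job.sh
```

The key mental shift: you describe the resources you need up front, and the scheduler decides when and where the job runs.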
Why Does Bioinformatics Specifically Need HPC?
Not every field runs into this problem. A data scientist might work comfortably with gigabytes of tabular data for years. Bioinformatics is different because of the nature of the data itself.
The Data Is Enormous
| Data Type | Typical Size per Sample |
|---|---|
| Whole Genome Sequencing (WGS) | 50 – 100+ GB raw reads |
| Metagenomics (gut microbiome) | 5 – 20 GB per sample |
| RNA-seq (transcriptomics) | 1 – 5 GB per sample |
| Single-cell RNA-seq | 10 – 50 GB per experiment |
Now multiply any of those by 100 samples. You’re looking at terabytes of data, and processing each sample can require gigabytes of RAM simultaneously.
Reference Databases Are Huge
Tools like Kraken2 (metagenomic classification) or STAR (RNA-seq alignment) require loading enormous reference databases into memory:
- Kraken2 standard database: ~100 GB RAM just to load
- STAR human genome index: ~30 GB RAM
- DIAMOND protein database: ~200+ GB RAM
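A practical habit: before launching one of these tools, compare the database’s on-disk size (a rough proxy for its memory footprint) against what you actually have free. The database path below is a placeholder:

```bash
DB=/path/to/kraken2_db   # placeholder — point this at your actual database

# On-disk size of the database directory
if [ -d "$DB" ]; then
    du -sh "$DB"
else
    echo "database not found at $DB"
fi
```

Pair that with `free -h` and you’ll know in seconds whether a run is doomed before it starts.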
Your laptop never had a chance.
Many Samples = Many Jobs
In a real study, you don’t process one sample. You process cohorts:
- 50 patients? That’s 50 alignment jobs.
- 200 microbiome samples? 200 assembly + annotation pipelines.
- 1,000-sample GWAS study? You need parallel computing or you’ll be waiting months.
Running these sequentially on a laptop isn’t just slow — it’s often practically impossible.
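The arithmetic makes the point. Assuming a hypothetical 2 hours per sample and a cluster that runs 50 jobs at once:

```bash
samples=200
hours_each=2                          # hypothetical per-sample runtime
concurrent=50                         # jobs the cluster runs simultaneously

sequential=$((samples * hours_each))  # laptop: one sample after another
parallel=$((sequential / concurrent)) # HPC: 50 samples at a time

echo "Sequential: ${sequential} hours (~$((sequential / 24)) days)"
echo "Parallel:   ${parallel} hours"
```

Sixteen days versus a single working day, for the same analysis.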
Head-to-Head Comparison
Here’s the full picture side by side:
| Feature | Your Laptop | HPC Cluster |
|---|---|---|
| RAM | 16–32 GB | 256 GB – 4 TB per node |
| CPUs | 4–16 cores | 32–128+ cores per node |
| Storage | 1–4 TB | Petabyte-scale shared FS |
| Root access | ✅ sudo works | ❌ No sudo |
| Install software | apt-get, pip freely | module load or conda |
| Run a command | Just type it | Submit a job script |
| Run 200 samples | Days (sequential) | Hours (parallel) |
| Internet access | Full | Often restricted |
| Cost | Your machine | Shared institutional |
| Best for | Development, testing | Production runs |
You Need Both — Here’s Why
A common mistake beginners make is thinking HPC replaces their laptop. It doesn’t. They serve fundamentally different purposes.
Use your laptop for:
- Writing scripts and code
- Testing on a small subset of data (5–10 samples)
- Visualizing results in R or Python
- Quick data exploration
Use HPC for:
- Running validated pipelines on full datasets
- Any job needing >32 GB RAM
- Parallelizing across many samples
- Long-running jobs (hours to days)
The workflow that actually works in practice looks like this:
```
Write script on laptop
        ↓
Test on 2–3 samples locally
        ↓
Transfer data + script to HPC
        ↓
Submit job to scheduler
        ↓
Download/analyze results locally
```
This back-and-forth is not a limitation — it’s the correct workflow.
A Note on What Changes (and What Doesn’t)
Moving to HPC doesn’t mean learning an entirely new way to do bioinformatics. The tools are the same: samtools, bowtie2, fastqc, kraken2. The concepts are the same. What changes is:
- How you install software — no `sudo`, but there are solutions (Day 2)
- How you run jobs — not directly, but through a scheduler (Day 3)
- How you scale — not for loops, but job arrays (Day 4)
- How you structure pipelines — towards reproducibility (Day 5)
Each of those changes is the topic of the next four posts in this series.
Common Beginner Mistakes on Day One
Watch out for these when you first log into an HPC:
- Running compute jobs on the login node — this is like blocking the office entrance to do your work. Don’t do it. (Covered in Day 3)
- Trying to `sudo apt-get install` — you’ll get “Permission denied”. There’s a better way. (Covered in Day 2)
- Expecting interactive terminal behavior — HPC jobs run in the background. You submit and wait. (Day 3)
- Not checking available storage quotas — most HPC systems have per-user storage limits. Check early.
Connecting from Windows: PowerShell and MobaXterm
If you’re on a Mac or Linux laptop, SSH just works in your terminal. If you’re on Windows, you have two excellent options — and you only need to pick one.
Option 1: Windows PowerShell (Built-in, No Install Needed)
Modern Windows 10 and 11 include a built-in SSH client. Open PowerShell (search for it in the Start menu) and connect exactly like you would on Linux:
```bash
# Connect to your HPC
ssh username@hpc.yourinstitution.edu

# Transfer a file to HPC
scp myfile.fastq.gz username@hpc.yourinstitution.edu:~/data/

# Transfer a folder from HPC back to your Windows machine
scp -r username@hpc.yourinstitution.edu:~/results/ C:\Users\YourName\Desktop\
```
PowerShell SSH is clean, fast, and requires zero setup. For most day-to-day HPC work, it’s all you need.
💡 Check your SSH version: open PowerShell and type `ssh -V`. If you see OpenSSH_8 or higher, you’re good to go. If not, update Windows or install OpenSSH from Settings → Optional Features.
Option 2: MobaXterm (Recommended for Beginners)
MobaXterm is a free Windows application that bundles SSH, file transfer, and a graphical interface all in one. It’s the most popular HPC client among Windows-based researchers for good reason.
Why researchers love MobaXterm:
- Built-in file browser — drag and drop files between your laptop and HPC
- Graphical display (X11) — run GUI tools on the HPC and see them on your Windows screen
- Saved sessions — save your HPC login details so you don’t retype them every time
- Multiple tabs — open several HPC connections simultaneously
- SFTP sidebar — automatically shows your HPC files in a panel on the left
Getting started with MobaXterm:
- Download the free Home Edition from mobaxterm.mobatek.net
- Open it and click Session → SSH
- Enter your HPC hostname (e.g.,
hpc.yourinstitution.edu) and username - Click OK — you’re in
```bash
# Once connected in MobaXterm, everything works the same as Linux/Mac
ssh username@hpc.yourinstitution.edu   # or use the GUI session manager

# The left panel automatically opens an SFTP browser
# Drag files from your Windows desktop straight into ~/data/
```
Which Should You Use?
| Feature | PowerShell | MobaXterm |
|---|---|---|
| Install required | ❌ No | ✅ Yes (free) |
| File transfer GUI | ❌ Command only | ✅ Drag and drop |
| X11 / GUI forwarding | ⚠️ Needs extra setup | ✅ Built-in |
| Best for | Quick connections, scripting | Beginners, file management |
If you’re just getting started and you’re on Windows, download MobaXterm. It removes a lot of friction in the early days. Once you’re comfortable with the command line, PowerShell works great for quick connections.
Try It Yourself
If you already have HPC access, try these commands right now:
```bash
# Log in (replace with your actual HPC address)
ssh username@hpc.yourinstitution.edu

# Check the system
hostname                                        # What node am I on?
cat /proc/cpuinfo | grep "model name" | head -1 # CPU info
free -h                                         # Available memory
df -h $HOME                                     # Your storage quota

# See what compute nodes look like
sinfo                                           # Show available partitions (SLURM)
```
Don’t worry if `sinfo` shows a lot of unfamiliar output. By Day 3, it’ll make complete sense.
Summary
- Your laptop is fast, flexible, and you have full control — but it has hard limits on RAM and can’t run hundreds of jobs in parallel
- An HPC cluster gives you enormous compute power but requires learning new tools and workflows
- Bioinformatics specifically hits laptop limits because of large reference databases, multi-GB sample files, and the need to process large cohorts in parallel
- You need both: laptop for development and testing, HPC for production runs
- Moving to HPC changes how you run things — not the bioinformatics tools themselves
Up Next
Day 2: Software Installation — sudo vs module load vs conda
We’ll tackle the first real surprise of HPC life: you can’t install software the way you’re used to. No sudo. No apt-get. But there are elegant solutions — and one of them works identically on both your laptop and HPC.
Questions or corrections? Drop a comment below — I read every one.
