Visualize Your Data, Day 1: Box Plot vs Violin Plot in Bioinformatics
After completing my Whole Genome Sequencing analysis series, I'm starting a new daily blog series on the data visualizations commonly used in bioinformatics and molecular genomics papers.
Many learners can generate results but struggle with:
- Choosing the right plot
- Interpreting distributions correctly
- Producing publication-ready figures
This new series, "Visualize Your Data", focuses on common plots seen in bioinformatics papers, using reproducible toy data and practical tips from real manuscript preparation.
Today's focus
Box Plot vs Violin Plot
Both plots visualize data distributions, but they emphasize different aspects of the data.
When should you use a box plot?
A box plot is ideal when you want to:
- Compare medians across groups
- Highlight quartiles (Q1 to Q3) and outliers
- Keep figures simple and reviewer-friendly
Common in:
- Gene expression comparisons
- MAG abundance across conditions
- DNA:RNA ratio analyses
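To make the box plot's ingredients concrete, here is a minimal sketch of the numbers it draws: the quartiles, the whisker ends, and the outliers. It uses only the Python standard library and assumes the common 1.5 × IQR (Tukey) rule for whiskers. The function name `box_plot_summary` and the expression values are hypothetical, made up for illustration.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles, whisker ends, and outliers a box plot would draw."""
    data = sorted(values)
    # "inclusive" interpolates between data points (same as NumPy's default percentile method)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed rule: points farther than 1.5 * IQR from the box are outliers
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # most extreme point still inside the fences
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical expression values with one unusually high sample
expression = [8.9, 9.5, 9.8, 10.0, 10.2, 10.4, 10.9, 11.3, 18.0]
summary = box_plot_summary(expression)
print(summary)
```

In this toy example the value 18.0 falls above the upper fence, so a box plot would draw it as a separate outlier point rather than stretching the whisker to reach it.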
When should you use a violin plot?
A violin plot is useful when:
- You have larger datasets
- Distribution shape matters (e.g., multimodal data)
- You want to visualize density, not just summary statistics
Common in:
- Transcript abundance distributions
- Functional gene abundance
- Single-cell or large-sample datasets
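Why does distribution shape matter? The sketch below builds a deliberately bimodal toy sample and compares what a box plot would report (median and quartiles) with what a density estimate sees. Everything here is an assumption for illustration: the data are evenly spaced normal quantiles rather than real measurements, the helper names are hypothetical, and the bandwidth is fixed at 1.0, whereas plotting libraries choose it automatically.

```python
import math
from statistics import NormalDist, quantiles

def normal_sample(mean, sd, n):
    """Deterministic toy sample: n evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
        for x in grid
    ]

def count_modes(density):
    """Count strict local maxima in a density curve sampled on a grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )

# Two hypothetical sub-populations, centered at 5 and at 15
bimodal = normal_sample(5, 1, 200) + normal_sample(15, 1, 200)

grid = [i * 0.25 for i in range(81)]  # 0 to 20 in steps of 0.25
density = gaussian_kde(bimodal, grid, bandwidth=1.0)

q1, median, q3 = quantiles(bimodal, n=4, method="inclusive")
print(f"median = {median:.1f}, quartiles = {q1:.1f} and {q3:.1f}")
print("modes in the density estimate:", count_modes(density))
```

The median of this sample lands near 10, a region with no observations at all, while the density estimate shows two separate peaks. A box plot alone would hide that structure; a violin plot would show it.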
Toy dataset (example)
We'll use a simple toy dataset representing gene expression across two conditions.
Create toy data (R)
set.seed(123)  # fixed seed so the toy data are reproducible
toy_data <- data.frame(
  Condition = rep(c("Control", "Treatment"), each = 30),
  Expression = c(
    rnorm(30, mean = 10, sd = 2),
    rnorm(30, mean = 15, sd = 3)
  )
)
R visualization (ggplot2)
Box plot (R)
library(ggplot2)
# Shared publication theme: no grid lines, black border, black 12 pt text
theme_pub <- theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  # linewidth replaces size for lines and borders (size is deprecated here since ggplot2 3.4.0)
  panel.border = element_rect(color = "black", fill = NA, linewidth = 0.8),
  axis.line = element_blank(),
  axis.text = element_text(color = "black", size = 12),
  axis.title = element_text(color = "black", size = 12),
  plot.title = element_text(color = "black", size = 12, face = "bold"),
  legend.text = element_text(color = "black", size = 12),
  legend.title = element_text(color = "black", size = 12)
)
ggplot(toy_data, aes(x = Condition, y = Expression, fill = Condition)) +
  geom_boxplot(outlier.shape = 16, alpha = 0.7) +
  labs(
    title = "Box Plot of Gene Expression",
    y = "Expression level",
    x = ""
  ) +
  theme_pub +
  theme(legend.position = "none")  # the x axis already names the groups
Violin plot (R)
ggplot(toy_data, aes(x = Condition, y = Expression, fill = Condition)) +
  # trim = FALSE lets the density tails extend beyond the observed range
  geom_violin(trim = FALSE, alpha = 0.7) +
  # Narrow white box plot inside each violin adds the median and quartiles
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  labs(
    title = "Violin Plot of Gene Expression",
    y = "Expression level",
    x = ""
  ) +
  theme_pub +
  theme(legend.position = "none")

Python equivalents (matplotlib + seaborn)
Many bioinformatics workflows rely on Python, so here are the equivalent plots in Python.
Create toy data (Python)
import numpy as np
import pandas as pd
np.random.seed(123)  # fixed seed so the toy data are reproducible
toy_data = pd.DataFrame({
    "Condition": ["Control"] * 30 + ["Treatment"] * 30,
    "Expression": np.concatenate([
        np.random.normal(10, 2, 30),
        np.random.normal(15, 3, 30),
    ]),
})
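Box plot (Python)
import matplotlib.pyplot as plt
import seaborn as sns
# One fixed color per condition, reused for both plots
palette = {
    "Control": "#4C72B0",    # blue
    "Treatment": "#DD8452",  # orange
}
sns.boxplot(
    data=toy_data,
    x="Condition",
    y="Expression",
    hue="Condition",  # passing palette without hue is deprecated since seaborn 0.13
    palette=palette,
    legend=False,     # the x axis already names the groups (needs seaborn >= 0.13)
)
plt.title("Box Plot of Gene Expression")
plt.ylabel("Expression level")
plt.xlabel("")
plt.show()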
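Violin plot (Python)
import seaborn as sns
import matplotlib.pyplot as plt
sns.violinplot(
    data=toy_data,
    x="Condition",
    y="Expression",
    hue="Condition",  # passing palette without hue is deprecated since seaborn 0.13
    palette=palette,  # the color dict defined for the box plot
    legend=False,     # needs seaborn >= 0.13
    inner="box",      # miniature box plot inside each violin
    cut=0,            # stop the density at the observed min and max (the R version, trim = FALSE, extends past them)
)
plt.title("Violin Plot of Gene Expression")
plt.ylabel("Expression level")
plt.xlabel("")
plt.show()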

The code and themes above reflect my own preferences; I use them routinely in my manuscripts, posters, and presentations.
Common mistakes to avoid
- Using violin plots for very small datasets
- Showing density without summary statistics
- Overusing colors or inconsistent palettes
- Forgetting to label axes clearly
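The first mistake is easy to catch before plotting. Below is a minimal sketch that counts observations per group and suggests a plot type. The helper `suggest_plot` and the cutoff of 20 points per group are hypothetical: the cutoff is an arbitrary illustration, not a published rule, so adjust it to your own data and field.

```python
from collections import Counter

MIN_POINTS_FOR_VIOLIN = 20  # hypothetical cutoff, chosen only for illustration

def suggest_plot(conditions, min_points=MIN_POINTS_FOR_VIOLIN):
    """Suggest 'violin' only if every group has at least min_points observations."""
    sizes = Counter(conditions)
    smallest = min(sizes.values())
    plot = "violin" if smallest >= min_points else "box"
    return plot, dict(sizes)

# Same group sizes as the toy dataset: 30 samples per condition
large = ["Control"] * 30 + ["Treatment"] * 30
print(suggest_plot(large))   # ('violin', {'Control': 30, 'Treatment': 30})

# A small pilot experiment
small = ["Control"] * 5 + ["Treatment"] * 6
print(suggest_plot(small))   # ('box', {'Control': 5, 'Treatment': 6})
```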
Note on editing figures in Adobe Illustrator
Even powerful tools like R and Python don't always produce exactly what journals or reviewers expect.
In real-world manuscript preparation:
- Fonts may need adjustment
- Panel alignment may require manual tweaking
- Figure spacing, labels, or symbols may need fine control
- You may need to combine two separate figures into a single figure
Best practice
- Export plots in vector formats (.pdf or .svg)
- Perform final polishing in Adobe Illustrator (or Inkscape)
- Examples of common Illustrator edits:
  - Aligning multi-panel figures
  - Standardizing font sizes across panels
  - Adjusting legend placement
  - Adding panel labels (A, B, C)
This hybrid workflow (R/Python → Illustrator) is very common in published bioinformatics papers.
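Part of this workflow can be prepared in code before the file reaches Illustrator. The sketch below is one possible setup, not a required recipe: it draws a two-panel figure with plain matplotlib, adds panel labels, and exports both vector formats. The file names `figure1.pdf` and `figure1.svg` are placeholders, and the toy values are regenerated here so the block runs on its own. The two `rcParams` settings ask matplotlib to keep text as editable text instead of converting it to outlines.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Keep text editable when the file is opened in a vector editor
plt.rcParams["pdf.fonttype"] = 42      # embed TrueType fonts in the PDF
plt.rcParams["svg.fonttype"] = "none"  # write SVG text as text, not as paths

rng = np.random.default_rng(123)
control = rng.normal(10, 2, 30)
treatment = rng.normal(15, 3, 30)

fig, axes = plt.subplots(1, 2, figsize=(7, 3.5), sharey=True)

# Panel A: box plot
axes[0].boxplot([control, treatment])
axes[0].set_ylabel("Expression level")

# Panel B: violin plot with the median marked
axes[1].violinplot([control, treatment], showmedians=True)

for label, ax in zip("AB", axes):
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["Control", "Treatment"])
    # Panel label just above the top-left corner of each panel
    ax.text(-0.1, 1.05, label, transform=ax.transAxes,
            fontsize=12, fontweight="bold")

fig.tight_layout()

# Placeholder output names; both are vector formats
outputs = [Path("figure1.pdf"), Path("figure1.svg")]
for path in outputs:
    fig.savefig(path)
```

Doing the panel layout and labels in code means a regenerated figure needs less manual rework, and Illustrator is left for the final touches.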
Box plot vs violin plot: quick summary
| Feature | Box plot | Violin plot |
|---|---|---|
| Shows median & quartiles | Yes | Only with an inner box plot |
| Shows distribution shape | No | Yes |
| Suitable for small datasets | Yes | No |
| Reviewer-friendly | Yes | With caution |
Take-home message
- Box plots are ideal for clean, simple comparisons
- Violin plots reveal distribution structure
- Combining both often provides the clearest interpretation
- Final figure polishing is often done outside R/Python
Coming next in the series
- Heatmaps (scaling & clustering)
- PCA vs UMAP
- Volcano plots
- Bubble plots
- Presence-absence plots
If you're learning bioinformatics or preparing figures for a manuscript, I hope this series helps you visualize your data with confidence.