How to Interpret DESeq2 Results


Running DESeq2 is the straightforward part. Understanding what the output actually means — and avoiding the mistakes that lead to wrong conclusions — is where most researchers struggle.

This guide explains every column in the DESeq2 results table, what the numbers mean biologically, and how to make defensible decisions about which genes are truly differentially expressed.

If you need end-to-end support with RNA-seq analysis, from raw FASTQ files to differential expression and pathway interpretation, explore our Transcriptomics Services.

What does a DESeq2 results table contain?

After running results() in DESeq2, you get a table with one row per gene and typically six columns:

Column	What it measures	What to look for
baseMean	Average normalised read count across all samples	Very low baseMean (<10) → treat results with caution regardless of p-value
log2FoldChange	Estimated expression ratio (condition B vs A) on a log2 scale	Positive = higher in condition B; negative = lower in condition B
lfcSE	Standard error of the log2FC estimate	Large SE relative to log2FC = uncertain estimate; consider lfcShrink()
stat	Wald test statistic (log2FC / lfcSE)	Used internally to compute p-values
pvalue	Raw p-value from the Wald test	Do not filter on this alone; always use padj
padj	BH-adjusted p-value (False Discovery Rate correction)	Standard threshold: padj < 0.05. NA = no adjusted p-value was assigned to the gene
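
To make the link between the log2FoldChange, lfcSE, stat, and pvalue columns concrete, here is a minimal Python sketch of a Wald test against a standard normal distribution. It only reproduces the arithmetic for a single hypothetical gene; the function name and input values are illustrative assumptions, not part of DESeq2.

```python
import math

def wald_test(log2_fold_change, lfc_se):
    """Return (stat, two-sided p-value) for a Wald test against a standard normal."""
    stat = log2_fold_change / lfc_se
    # Two-sided tail probability of a standard normal: P(|Z| >= |stat|)
    pvalue = math.erfc(abs(stat) / math.sqrt(2))
    return stat, pvalue

# Hypothetical gene: log2FC of 1.5 with a standard error of 0.5
stat, pvalue = wald_test(1.5, 0.5)
print(f"stat = {stat:.2f}, pvalue = {pvalue:.4f}")  # stat = 3.00, pvalue = 0.0027
```

The same log2FC with a standard error twice as large would give a statistic of 1.5 and a much weaker p-value, which is why lfcSE deserves as much attention as the fold change itself.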

Understanding log2FoldChange

The log2 fold change describes how much expression differs between conditions. A value of 1 means the gene is 2× more expressed; a value of 2 means 4× more expressed; −1 means half as expressed.

Three things to keep in mind:

Direction depends on the reference level. DESeq2 always computes the fold change relative to the denominator of your contrast (usually the control condition). If you flip the contrast, the sign of log2FC flips too.
Large fold changes in lowly expressed genes are unreliable. A gene expressed at 2 counts in one condition and 8 in another gives log2FC = 2, but the estimate is noisy. This is why log2FC shrinkage (lfcShrink()) is strongly recommended before visualisation and ranking.
A biologically meaningful threshold is your call, not DESeq2’s. Statistically significant does not mean biologically important. Many researchers apply an additional log2FC threshold (e.g. |log2FC| > 1) alongside padj < 0.05 to focus on genes with substantial changes.
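
The arithmetic behind these points can be sketched in a few lines of Python. This uses a plain ratio of two mean counts, which is a simplification: DESeq2 estimates the fold change from a fitted model, not from a raw ratio. The function names are illustrative.

```python
import math

def log2fc(mean_numerator, mean_denominator):
    """Log2 ratio of two mean counts (numerator vs denominator)."""
    return math.log2(mean_numerator / mean_denominator)

def fold_change(log2_fc):
    """Convert a log2 fold change back to a linear ratio."""
    return 2.0 ** log2_fc

# 8 counts in treated vs 2 counts in control
treated_vs_control = log2fc(8, 2)
# Flipping the contrast flips the sign
control_vs_treated = log2fc(2, 8)

print(treated_vs_control, control_vs_treated)  # 2.0 -2.0
print(fold_change(1), fold_change(2), fold_change(-1))  # 2.0 4.0 0.5
```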

p-value vs adjusted p-value: which one to use?

Always use padj. Never filter on raw p-value alone.

When you test thousands of genes simultaneously, roughly 5% of them will appear significant by chance at p < 0.05 — even if nothing is truly differentially expressed. The adjusted p-value (padj) applies Benjamini-Hochberg correction to control the False Discovery Rate (FDR): the expected proportion of false positives among the genes you call significant.

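A padj < 0.05 threshold means you accept that, on average, about 5% of the genes in your significant list are false positives. This is the standard threshold, though some studies use 0.10 for exploratory analysis or 0.01 for conservative calls. Note that results() optimises its independent filtering for alpha = 0.1 by default, so pass alpha = 0.05 if that is the cutoff you intend to use.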
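
For readers who want to see what the Benjamini-Hochberg adjustment does to a set of p-values, here is a minimal Python sketch of the procedure. It is a generic implementation applied to five made-up p-values; in a real analysis DESeq2 performs this step for you, and only on the genes that pass its filters.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.20]  # hypothetical raw p-values
padj = benjamini_hochberg(pvals)
print([round(p, 5) for p in padj])  # [0.005, 0.02, 0.05125, 0.05125, 0.2]
```

In this toy example four genes have a raw p-value below 0.05, but only two survive a padj < 0.05 cutoff.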

Why are some padj values NA?

DESeq2 sets padj to NA for genes that did not receive an adjusted p-value. This happens when:

The gene has zero counts across all samples (log2FoldChange, pvalue, and padj are all NA)
The gene has an extreme outlier count in one sample, flagged by Cook’s distance (pvalue and padj are NA)
The gene falls below the automatic low-count threshold chosen by independent filtering to maximise power (only padj is NA)

NA padj does not mean “not significant”. It means the gene was left out of multiple-testing correction. These genes should be excluded from your significant gene list but should not be treated as having padj = 1.
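
One way to audit missing values is to sort them by cause. The Python sketch below assumes the results table has been exported as a list of dictionaries with NA stored as None, and it relies on the pattern described in the DESeq2 documentation: all-zero genes have a baseMean of 0, outlier-flagged genes lack both pvalue and padj, and independently filtered genes lack only padj. The gene rows and the function name are hypothetical.

```python
def na_reason(row):
    """Guess why padj is missing for one row of an exported results table."""
    if row["padj"] is not None:
        return "has padj"
    if row["baseMean"] == 0:
        return "all-zero counts"
    if row["pvalue"] is None:
        return "outlier flagged by Cook's distance"
    return "removed by independent filtering"

rows = [  # hypothetical exported rows, NA stored as None
    {"gene": "geneA", "baseMean": 250.0, "pvalue": 0.0001, "padj": 0.002},
    {"gene": "geneB", "baseMean": 0.0, "pvalue": None, "padj": None},
    {"gene": "geneC", "baseMean": 85.0, "pvalue": None, "padj": None},
    {"gene": "geneD", "baseMean": 1.7, "pvalue": 0.03, "padj": None},
]
for row in rows:
    print(row["gene"], "->", na_reason(row))
```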

Log2FC shrinkage: why it matters for visualisation

Raw log2FC estimates from DESeq2 can be inflated for lowly expressed genes, where variance is high. The lfcShrink() function applies shrinkage estimators (apeglm, ashr, or normal) to pull noisy fold change estimates towards zero, producing more reliable values for ranking and plotting.

Best practice:

Use lfcShrink(dds, coef=..., type="apeglm") for volcano plots, MA plots, and ranked gene lists
Shrunken log2FC values are better for biological interpretation but do not change the padj values — hypothesis testing is unchanged
Always label your figures as using shrunken or unshrunken values to be transparent with reviewers
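
To build intuition for why shrinkage affects noisy genes more than precise ones, here is a deliberately simple Python sketch. It uses a textbook normal prior centred on zero, which is not the apeglm or ashr estimator, and prior_sd is an assumed value chosen for illustration, whereas lfcShrink() estimates its prior from the data.

```python
def shrink_lfc(log2_fc, lfc_se, prior_sd=1.0):
    """Posterior mean of the log2FC under a zero-centred normal prior."""
    # The weight approaches 1 for precise estimates and 0 for noisy ones
    weight = prior_sd ** 2 / (prior_sd ** 2 + lfc_se ** 2)
    return weight * log2_fc

noisy = shrink_lfc(2.0, 1.5)    # low-count gene, large standard error
precise = shrink_lfc(2.0, 0.2)  # well-covered gene, small standard error
print(f"noisy: 2.0 -> {noisy:.2f}, precise: 2.0 -> {precise:.2f}")
# noisy: 2.0 -> 0.62, precise: 2.0 -> 1.92
```

Both genes start with the same raw log2FC of 2, but the estimate with the large standard error is pulled much closer to zero.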

Common mistakes when interpreting DESeq2 output

1. Filtering on raw p-value instead of padj

This inflates your significant gene list with false positives. Always use padj.

2. Ignoring baseMean

A gene with baseMean = 3 and padj = 0.001 should raise a flag. Low-count genes produce unstable estimates. Inspect their raw counts before including them in your biological narrative.

3. Treating all statistically significant genes as biologically important

Statistical significance is a function of effect size and sample size. With sufficient replicates, even tiny fold changes become significant. Always interpret log2FC alongside padj, and consider biological plausibility.
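
The effect of sample size can be illustrated with a short Python sketch. It assumes, as a simplification, that the standard error shrinks with the square root of the number of replicates, and it starts from a hypothetical standard error of 0.30 at three replicates. Raw p-values are shown for simplicity.

```python
import math

def two_sided_p(log2_fc, se):
    """Two-sided p-value of a Wald statistic against a standard normal."""
    return math.erfc(abs(log2_fc / se) / math.sqrt(2))

log2_fc = 0.2          # a small effect, about a 15% change
se_at_3_reps = 0.30    # hypothetical standard error with 3 replicates

pvalues = {}
for n in (3, 12, 48):
    se = se_at_3_reps * math.sqrt(3 / n)  # assumed 1/sqrt(n) scaling
    pvalues[n] = two_sided_p(log2_fc, se)
    print(f"n = {n:2d}  se = {se:.3f}  p = {pvalues[n]:.4f}")
```

The fold change never changes, yet it moves from clearly non-significant to significant as replicates are added.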

4. Not checking the reference level

DESeq2 uses the first factor level, which is alphabetical by default, as the reference (the denominator). If your conditions are “treated” and “control”, “control” happens to sort first and becomes the reference. With labels such as “WT” and “KO”, however, “KO” becomes the reference and fold changes are reported as WT vs KO. Always verify your contrast direction with resultsNames(dds), and set the reference explicitly with relevel() before running DESeq().
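
The default behaviour is easy to simulate. This Python sketch mimics alphabetical level ordering with sorted(), which matches R for simple labels like these but can differ for mixed-case or accented labels, where R follows the locale. The function names are illustrative.

```python
def default_reference(levels):
    """First level in alphabetical order, which serves as the default reference."""
    return sorted(set(levels))[0]

def default_comparison(levels):
    """Describe the default two-group comparison as 'numerator vs denominator'."""
    reference = default_reference(levels)
    other = [lvl for lvl in sorted(set(levels)) if lvl != reference][0]
    return f"{other} vs {reference}"

print(default_comparison(["treated", "control", "treated", "control"]))  # treated vs control
print(default_comparison(["WT", "KO", "WT", "KO"]))                      # WT vs KO
```

In the second case a positive log2FC means higher expression in the wild type, which is usually the opposite of what the analyst intended to report.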

5. Using normalised counts from DESeq2 for downstream plotting without understanding transformation

For PCA, heatmaps, and sample clustering, use variance-stabilising transformation (vst()) or regularised log transformation (rlog()), not raw or normalised counts. These transformations stabilise variance across the expression range and are appropriate for exploratory visualisation.

A practical filtering strategy for your significant gene list

A commonly used starting point for defining differentially expressed genes (DEGs):

padj < 0.05 — statistical significance threshold
|log2FoldChange| ≥ 1 — at least 2× change in expression (adjust based on biology)
baseMean ≥ 10 — sufficient expression to trust the estimate

These three filters together produce a conservative but defensible list of DEGs. The thresholds can be adjusted based on the organism, study design, and scientific question — but you should always pre-specify and justify your choices rather than optimising them post-hoc.
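
As a rough sketch, the three filters can be applied to an exported results table like this. The example assumes the table is a list of dictionaries with NA stored as None; the gene names, values, and the is_deg helper are hypothetical, and the thresholds are the starting points listed above.

```python
def is_deg(row, padj_max=0.05, min_abs_lfc=1.0, min_base_mean=10.0):
    """Apply the three filters; genes with NA padj are never called significant."""
    padj = row["padj"]
    if padj is None:
        return False
    return (
        padj < padj_max
        and abs(row["log2FoldChange"]) >= min_abs_lfc
        and row["baseMean"] >= min_base_mean
    )

results = [  # hypothetical rows
    {"gene": "geneA", "baseMean": 250.0, "log2FoldChange": 1.8, "padj": 0.001},
    {"gene": "geneB", "baseMean": 3.0, "log2FoldChange": 2.5, "padj": 0.001},    # too lowly expressed
    {"gene": "geneC", "baseMean": 900.0, "log2FoldChange": 0.3, "padj": 0.0001}, # effect too small
    {"gene": "geneD", "baseMean": 40.0, "log2FoldChange": -1.2, "padj": None},   # NA padj
    {"gene": "geneE", "baseMean": 120.0, "log2FoldChange": -1.4, "padj": 0.02},
]
degs = [row["gene"] for row in results if is_deg(row)]
print(degs)  # ['geneA', 'geneE']
```

Keeping the thresholds as named parameters makes it easier to record the exact values used in your methods section.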

What to do after getting your DEG list

A list of differentially expressed genes is rarely the final output. The next steps typically include:

Volcano plot — visualise the relationship between fold change and statistical significance across all tested genes
Heatmap — show expression patterns of top DEGs across samples using vst()-transformed counts
PCA — confirm sample clustering is consistent with the experimental design
Functional enrichment — Gene Ontology (GO), KEGG pathway analysis, or eggNOG-mapper to understand the biological processes represented in your DEG list
Validation — if critical genes drive your main conclusions, consider RT-qPCR validation of a subset

For a complete overview of the RNA-seq analysis process from raw reads to pathway interpretation, see our guide to the RNA-seq data analysis pipeline.

Special considerations for microbial RNA-seq

DESeq2 was developed primarily with eukaryotic datasets in mind, but it is widely used for microbial transcriptomics. A few adjustments are worth knowing:

Bacterial genomes are smaller: you are typically testing a few thousand genes rather than tens of thousands. Independent filtering thresholds may need adjustment.
Operons matter: co-regulated genes in operons often appear together in your DEG list. This is biologically meaningful, not a statistical artefact.
rRNA depletion quality affects results: if rRNA contamination is high in some samples, normalisation can be skewed. Always inspect MultiQC reports before running DESeq2.
Very small genomes: for organisms with <500 annotated genes, few genes are tested, so the multiple-testing penalty is lighter but the dispersion estimates, which borrow information across genes, are less stable.

If you are working on a microbial transcriptomics project and have questions about your DESeq2 output, see our troubleshooting guide for low RNA-seq mapping rates for related quality control considerations.

Frequently asked questions

What log2FoldChange threshold should I use?

There is no universal rule. |log2FC| ≥ 1 (2× change) is a common starting point, but some studies use 0.5 or 2 depending on the biology. The threshold should reflect what constitutes a biologically meaningful change in your system, not be chosen to maximise or minimise the number of significant genes.

How many DEGs should I expect?

This varies enormously depending on the comparison, organism, and experimental conditions. A drug treatment in bacteria might produce 20–200 DEGs; a major environmental perturbation could produce thousands. If you get zero DEGs or tens of thousands, that is a signal to investigate your data quality, replicate structure, and model design.

Can I use DESeq2 for a dataset with no replicates?

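DESeq2 needs replicates within conditions to estimate dispersion, and current releases do not support designs without them. Without replicates, statistical testing is not possible, because there is no way to separate biological variability from the effect of the condition. If you have no biological replicates, the DESeq2 documentation describes dispersion estimation functions such as estimateDispersionsGeneEst(), but any results from unreplicated data should be treated as exploratory only.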

Should I use DESeq2 or edgeR?

Both are well-validated and widely used. DESeq2 is often preferred for its automated dispersion shrinkage and intuitive workflow. edgeR offers more flexibility for complex designs. In most well-powered studies with standard designs, results are broadly concordant. The most important thing is to use one consistently and document your choice.

Need help interpreting your RNA-seq results?

Tailoredomics provides end-to-end RNA-seq bioinformatics analysis — from raw FASTQ files or count matrices to DESeq2 differential expression, pathway interpretation, and publication-ready figures. We work with any organism and any sequencing provider. No wet-lab work involved.



By Rubén Javier López, July 1, 2026
