Common DESeq2 Mistakes and How to Avoid Them

DESeq2 is one of the most widely used tools for differential gene expression analysis in RNA-seq experiments.

It is powerful, well documented, and suitable for many standard bulk RNA-seq designs. However, it is also easy to misuse.

Many problematic RNA-seq results are not caused by DESeq2 itself, but by mistakes made before, during, or after the DESeq2 analysis. These mistakes can lead to false positives, missed differentially expressed genes, misleading volcano plots, incorrect biological conclusions, or results that are difficult to reproduce.

Common problems include using the wrong input data, ignoring batch effects, designing the model incorrectly, filtering genes too aggressively, using too few biological replicates, or interpreting adjusted p-values and log2 fold changes incorrectly.

In this article, we explain the most common DESeq2 mistakes and how to avoid them.

For a broader overview of the full workflow, see our guide: RNA-Seq Data Analysis Pipeline: From FASTQ Files to Differential Gene Expression.


Why DESeq2 mistakes matter

Differential expression analysis is often one of the final steps in an RNA-seq project.

By the time you run DESeq2, many previous decisions have already influenced the result:

  • RNA extraction quality;
  • sequencing depth;
  • read quality;
  • library strandedness;
  • reference genome or transcriptome choice;
  • alignment or quantification method;
  • gene annotation;
  • sample metadata;
  • experimental design;
  • replicate structure;
  • batch effects;
  • filtering strategy.

DESeq2 cannot fully rescue a poorly designed experiment or a badly prepared input table.

This matters because differential expression results are often used to generate biological interpretation, pathway analysis, figures, candidate gene lists, and publication conclusions.

A small technical mistake can therefore become a biological overinterpretation.


Mistake 1: Using normalized counts, TPM, or FPKM as DESeq2 input

One of the most common DESeq2 mistakes is using the wrong type of input data.

DESeq2 is designed to work with count data, not normalized expression values such as TPM, FPKM, or RPKM.

The correct input is usually a gene-level count matrix, where:

  • rows are genes;
  • columns are samples;
  • values represent read counts or estimated counts;
  • sample names match the metadata table.

If you quantified reads with an aligner and a counting tool, such as STAR/HISAT2 plus featureCounts or HTSeq-count, you normally provide raw gene counts.

If you used transcript-level quantification tools such as Salmon, kallisto, or RSEM, you should usually import the results with a tool such as tximport, which can summarize transcript-level estimates to the gene level in a way suitable for downstream differential expression analysis.

What you should generally avoid is manually feeding TPM or FPKM values into DESeq2 as if they were raw counts.

Why is this a problem?

Because DESeq2 models count data and estimates size factors, dispersions, and statistical significance based on assumptions appropriate for counts. TPM and FPKM are already normalized expression measures and do not preserve the same statistical structure.

A typical warning sign of this mistake is a DESeq2 input matrix full of decimal values that were exported directly from another program without a clear record of what they represent.

How to avoid it

Use:

  • raw gene counts from featureCounts, HTSeq-count, or similar tools;
  • estimated gene-level counts imported with tximport;
  • a clear and reproducible count-generation workflow.

Avoid:

  • TPM as direct DESeq2 input;
  • FPKM/RPKM as direct DESeq2 input;
  • Excel-modified expression tables;
  • manually normalized counts unless you know exactly what you are doing.
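
A quick sanity check can catch the most obvious cases before the matrix ever reaches DESeq2. The sketch below is plain Python rather than R, and the function name and toy matrices are hypothetical. It only flags values that cannot be raw counts (negative or non-integer), so a clean result does not prove the matrix is correct: rounded TPM values would pass. Estimated counts imported through tximport are usually non-integer as well, which is expected on that route.

```python
def find_non_count_values(matrix):
    """Return (gene, sample, value) for every entry that is negative
    or not a whole number. `matrix` maps gene -> {sample: value}."""
    problems = []
    for gene, row in matrix.items():
        for sample, value in row.items():
            if value < 0 or float(value) != int(value):
                problems.append((gene, sample, value))
    return problems


# Hypothetical toy data: one matrix of raw counts, one of TPM-like values.
raw_counts = {
    "geneA": {"s1": 120, "s2": 98},
    "geneB": {"s1": 0, "s2": 3},
}
tpm_like = {
    "geneA": {"s1": 35.72, "s2": 29.4},
    "geneB": {"s1": 0.0, "s2": 1.15},
}

print(find_non_count_values(raw_counts))
print(find_non_count_values(tpm_like))
```

If the second call reports many entries, it is worth tracing where the table came from before going any further.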

Mistake 2: Poor sample metadata

DESeq2 does not only need a count matrix. It also needs accurate sample metadata.

The metadata table tells DESeq2 which samples belong to which conditions and which variables should be included in the statistical design.

A typical metadata table may include:

  • sample ID;
  • condition;
  • treatment;
  • time point;
  • biological replicate;
  • batch;
  • patient or subject ID;
  • sequencing lane;
  • library preparation batch;
  • sex, genotype, tissue, or other covariates.

Many DESeq2 errors come from metadata problems rather than count data problems.

Common metadata mistakes include:

  • sample names in the metadata do not match column names in the count matrix;
  • conditions are misspelled;
  • replicate labels are wrong;
  • batches are missing;
  • sample order is assumed instead of checked;
  • numeric variables are accidentally treated as categories, or vice versa;
  • the reference condition is not what the user thinks it is.

These mistakes can completely change the result.

For example, if your metadata says that sample A is a control but it is actually a treated sample, DESeq2 will not know that. It will simply analyze the incorrect design you gave it.

How to avoid it

Before running DESeq2, check:

  • Are all count matrix columns present in the metadata?
  • Are all metadata rows present in the count matrix?
  • Are sample names identical?
  • Are condition names consistent?
  • Are biological replicates correctly assigned?
  • Are batches recorded?
  • Is the reference level correct?
  • Does the design formula match the biological question?

This step is basic, but it is one of the most important parts of the analysis.
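
The first three questions can be answered mechanically. The following is a minimal sketch in plain Python, with hypothetical sample names and a hypothetical helper function. It compares the count matrix columns with the metadata sample IDs and also checks the order, because DESeq2 expects count matrix columns and metadata rows to be in the same order.

```python
def check_sample_alignment(count_columns, metadata_ids):
    """Compare count matrix column names with metadata sample IDs."""
    count_set = set(count_columns)
    metadata_set = set(metadata_ids)
    return {
        # Samples that have counts but no metadata row.
        "missing_in_metadata": [s for s in count_columns if s not in metadata_set],
        # Metadata rows that have no matching count column.
        "missing_in_counts": [s for s in metadata_ids if s not in count_set],
        # True only if both lists contain the same names in the same order.
        "same_order": list(count_columns) == list(metadata_ids),
    }


# Hypothetical example with a typo ("trt2") and a different order.
count_columns = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
metadata_ids = ["ctrl_1", "trt_1", "ctrl_2", "trt2"]

report = check_sample_alignment(count_columns, metadata_ids)
print(report)
```

Any non-empty list, or `same_order` being false, is a reason to stop and fix the tables before running the analysis.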


Mistake 3: Confusing technical replicates and biological replicates

DESeq2 needs biological replication to estimate variability between biological samples.

Technical replicates can be useful for sequencing or library-preparation quality assessment, but they do not replace biological replicates.

A biological replicate represents an independent biological unit, such as:

  • different patients;
  • different animals;
  • different cultures;
  • different biological samples;
  • independent experimental units.

A technical replicate represents repeated measurement of the same biological material.

The distinction matters because differential expression analysis is not only asking whether two count values are different. It is asking whether expression differs between biological conditions while accounting for biological variability.

If you only have technical replicates, you may underestimate variability and overstate significance.

How to avoid it

Use true biological replicates whenever possible.

For standard bulk RNA-seq differential expression analysis, three biological replicates per condition is often treated as a practical minimum, although more replicates are usually better, especially when biological variability is high.

If your experiment has no biological replication, DESeq2 may still produce numbers, but the statistical interpretation is weak. In that case, the analysis should be treated as exploratory.
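
One way to see how much biological replication you really have is to count independent biological units per condition instead of counting libraries. The sketch below is plain Python and assumes a hypothetical metadata layout in which every library records the `subject` it came from.

```python
from collections import defaultdict


def biological_replicates_per_condition(samples):
    """Count distinct biological units (not libraries) in each condition."""
    units = defaultdict(set)
    for sample in samples:
        units[sample["condition"]].add(sample["subject"])
    return {condition: len(subjects) for condition, subjects in units.items()}


# Hypothetical metadata: six libraries, but only three animals.
samples = [
    {"library": "L1", "condition": "control", "subject": "mouse1"},
    {"library": "L2", "condition": "control", "subject": "mouse1"},  # technical replicate
    {"library": "L3", "condition": "control", "subject": "mouse2"},
    {"library": "L4", "condition": "treated", "subject": "mouse3"},
    {"library": "L5", "condition": "treated", "subject": "mouse3"},  # technical replicate
    {"library": "L6", "condition": "treated", "subject": "mouse3"},  # technical replicate
]

replicates = biological_replicates_per_condition(samples)
print(replicates)
```

Three libraries per condition look like enough on paper, but the treated group here contains a single animal.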

For more on sequencing depth and experimental planning, see: How Many Reads Do You Need for RNA-Seq? Sequencing Depth Explained.


Mistake 4: Using the wrong design formula

The design formula is one of the most important parts of a DESeq2 analysis.

It tells DESeq2 which variables to model.

For a simple two-condition experiment, a design may look like:

~ condition

But many real experiments are more complex.

Examples include:

  • treated vs control samples across different batches;
  • paired patient samples;
  • time-course experiments;
  • multiple tissues;
  • multiple genotypes;
  • treatment plus time;
  • treatment plus donor;
  • interaction effects.

In these cases, a simple ~ condition design may be insufficient or misleading.

For example, if you have paired samples from the same patient before and after treatment, the patient effect should usually be included in the model. Otherwise, patient-to-patient variation may obscure or distort the treatment effect.

Similarly, if all control samples were sequenced in one batch and all treated samples in another batch, condition and batch are confounded. In that situation, it may be impossible to separate biological treatment effects from technical batch effects.

How to avoid it

Before writing the design formula, define the biological question clearly.

Ask:

  • What comparison do I want to test?
  • What variables could affect expression?
  • Are samples paired?
  • Are there time points?
  • Are there batches?
  • Are any variables confounded?
  • Do I need an interaction term?

Examples:

Simple two-group comparison:

~ condition

Batch-aware comparison:

~ batch + condition

Paired design:

~ patient + condition

Time and treatment design:

~ time + treatment

Interaction design:

~ genotype + treatment + genotype:treatment

The correct formula depends on the experiment. There is no universal DESeq2 design that fits every dataset.
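
It can help to see what a formula such as ~ batch + condition turns into. The sketch below is plain Python, not DESeq2 code, and only mimics the usual treatment coding: an intercept plus one 0/1 column for each non-reference level. It assumes that the reference level is the first level in alphabetical order, which is the default in R when factor levels are not set explicitly. The column names are made up for illustration.

```python
def design_matrix(samples, variables):
    """Build a treatment-coded design matrix.

    Returns (column_names, rows). The alphabetically first level of each
    variable is used as the reference and gets no column of its own.
    """
    columns = ["Intercept"]
    coding = []
    for var in variables:
        levels = sorted({s[var] for s in samples})
        reference, others = levels[0], levels[1:]
        for level in others:
            columns.append(f"{var}_{level}_vs_{reference}")
            coding.append((var, level))
    rows = [
        [1] + [1 if s[var] == level else 0 for var, level in coding]
        for s in samples
    ]
    return columns, rows


# Hypothetical samples for the design ~ batch + condition.
samples = [
    {"batch": "b1", "condition": "untreated"},
    {"batch": "b1", "condition": "treated"},
    {"batch": "b2", "condition": "untreated"},
    {"batch": "b2", "condition": "treated"},
]

columns, rows = design_matrix(samples, ["batch", "condition"])
print(columns)
for row in rows:
    print(row)
```

Note the last column: because "treated" sorts before "untreated", the treated group silently becomes the reference, and the sign of every log2 fold change is the opposite of what most users expect. Setting the reference level explicitly avoids this.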


Mistake 5: Ignoring confounding

Confounding happens when two variables are mixed together in a way that makes their effects impossible or difficult to separate.

For example:

  • all control samples were sequenced in batch 1;
  • all treated samples were sequenced in batch 2.

In that case, batch and condition are perfectly confounded.

If the treated samples look different from the controls, is that because of treatment or because of the sequencing batch?

The answer may be impossible to determine from the data alone.

This is not a DESeq2 problem. It is an experimental design problem.

DESeq2 can include batch in the design formula, but it cannot magically separate variables that are perfectly confounded.

How to avoid it

The best solution is prevention.

During experimental design:

  • randomize samples across extraction batches;
  • randomize samples across library preparation batches;
  • randomize samples across sequencing lanes;
  • balance conditions across batches;
  • avoid processing all controls on one day and all treated samples on another day.

During analysis:

  • inspect metadata carefully;
  • create contingency tables between condition and batch;
  • use PCA plots to detect unwanted clustering;
  • include known batch variables in the design when appropriate;
  • be explicit about limitations if confounding cannot be resolved.
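
The contingency table is simple to build. The sketch below is plain Python with hypothetical sample records and helper names. It counts samples per condition and batch combination, and reports full confounding when every batch contains only one condition, which is the situation in which a batch term cannot be separated from the condition effect.

```python
from collections import Counter, defaultdict


def contingency(samples, a="condition", b="batch"):
    """Count samples for every (a, b) combination."""
    return Counter((s[a], s[b]) for s in samples)


def is_fully_confounded(samples, condition="condition", batch="batch"):
    """True if every batch contains samples from a single condition only."""
    conditions_per_batch = defaultdict(set)
    for s in samples:
        conditions_per_batch[s[batch]].add(s[condition])
    return all(len(c) == 1 for c in conditions_per_batch.values())


# Hypothetical designs: one confounded, one balanced.
confounded = (
    [{"condition": "control", "batch": "b1"}] * 3
    + [{"condition": "treated", "batch": "b2"}] * 3
)
balanced = [
    {"condition": c, "batch": b}
    for c in ("control", "treated")
    for b in ("b1", "b2")
    for _ in range(2)
]

print(contingency(confounded), is_fully_confounded(confounded))
print(contingency(balanced), is_fully_confounded(balanced))
```

Empty cells in the table are the warning sign. A partially unbalanced design may still be analyzable, but a fully confounded one is not.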

Mistake 6: Not checking sample-level QC before differential expression

Running DESeq2 without sample-level QC is risky.

Before focusing on differentially expressed genes, you should check whether the samples behave as expected.

Useful exploratory checks include:

  • library size distribution;
  • number of detected genes per sample;
  • PCA plots;
  • sample-to-sample distance heatmaps;
  • clustering by condition;
  • clustering by batch;
  • outlier samples;
  • mapping or assignment rates;
  • strandedness consistency;
  • gene body coverage where relevant.

If one sample is very different from all others, it may strongly affect the results.

The reason may be biological, but it may also be technical:

  • degraded RNA;
  • poor library preparation;
  • wrong sample label;
  • contamination;
  • low sequencing depth;
  • poor mapping rate;
  • wrong strandedness setting;
  • sample swap.

A common mistake is to discover these problems only after producing a volcano plot.

How to avoid it

Before DESeq2 testing, inspect the dataset.

A reasonable pre-analysis QC workflow includes:

  1. Read quality assessment.
  2. Alignment or quantification summary.
  3. Count assignment rate.
  4. Library size checks.
  5. PCA or sample distance plots.
  6. Outlier inspection.
  7. Metadata verification.
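
Some of these checks need nothing more than the count matrix. The following plain-Python sketch computes library size and number of detected genes per sample, then flags samples whose library size is far below the median. The toy counts, the helper names, and the cutoff of half the median are illustrative assumptions, not a recommended standard.

```python
from statistics import median


def sample_summaries(counts):
    """counts: gene -> {sample: count}.
    Returns sample -> (library_size, detected_genes)."""
    sizes = {}
    detected = {}
    for row in counts.values():
        for sample, value in row.items():
            sizes[sample] = sizes.get(sample, 0) + value
            detected[sample] = detected.get(sample, 0) + (1 if value > 0 else 0)
    return {s: (sizes[s], detected[s]) for s in sizes}


def flag_small_libraries(summaries, fraction=0.5):
    """Samples whose library size is below `fraction` of the median size."""
    med = median(size for size, _ in summaries.values())
    return [s for s, (size, _) in summaries.items() if size < fraction * med]


# Hypothetical toy counts; s4 has a much smaller library.
counts = {
    "geneA": {"s1": 500, "s2": 450, "s3": 520, "s4": 40},
    "geneB": {"s1": 300, "s2": 350, "s3": 280, "s4": 0},
    "geneC": {"s1": 200, "s2": 180, "s3": 210, "s4": 10},
}

summaries = sample_summaries(counts)
print(summaries)
print(flag_small_libraries(summaries))
```

A flagged sample is not automatically a bad sample. It is a prompt to look at the upstream QC for that library before deciding what to do with it.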

If you see unexpectedly low mapping or assignment rates, this article may be useful: Low RNA-seq Mapping Rate: Causes and Fixes.


Mistake 7: Ignoring batch effects

Batch effects are one of the most common sources of misleading RNA-seq results.

A batch effect occurs when technical factors influence gene expression measurements.

Possible batch sources include:

  • RNA extraction date;
  • library preparation batch;
  • sequencing run;
  • sequencing lane;
  • operator;
  • reagent lot;
  • sample storage time;
  • center or laboratory;
  • instrument;
  • RNA integrity differences.

If batch effects are not modeled, DESeq2 may identify genes associated with technical variation rather than biological condition.

This is especially problematic when batch structure overlaps with the experimental groups.

How to avoid it

If the batch variable is known and not fully confounded with the condition of interest, include it in the design formula.

For example:

~ batch + condition

This allows DESeq2 to estimate the condition effect while accounting for batch-related variation.

However, be careful with visualization.

Variance-stabilized or regularized-log transformed values are useful for PCA and clustering, but they are not the values used for differential testing. Batch removal methods can be useful for visualization, but you should avoid using batch-corrected transformed values as the input for DESeq2 differential testing.

In short:

  • model known batch effects in the DESeq2 design;
  • use transformed data for visualization;
  • do not use batch correction applied for PCA or other visualization as a substitute for a correct model;
  • do not ignore confounded designs.

Mistake 8: Filtering genes incorrectly

Low-count genes contain little information for differential expression analysis.

Filtering them can improve speed, reduce noise in visualizations, and sometimes improve statistical power.

However, filtering can also be done badly.

Common filtering mistakes include:

  • removing genes based on differential expression results;
  • filtering separately by condition;
  • applying arbitrary thresholds without documenting them;
  • filtering too aggressively;
  • filtering after looking at p-values;
  • removing genes that are biologically relevant but lowly expressed.

A simple and common approach is to remove genes with very low counts across nearly all samples before running the full analysis.

For example, you may keep genes that have at least a minimal count in a minimal number of samples. The exact threshold depends on sample size, sequencing depth, and biological context.

How to avoid it

Use a transparent pre-filtering rule.

For example:

  • remove genes with almost no counts in all samples;
  • base the threshold on the smallest relevant group size;
  • document the rule;
  • avoid filtering based on the final test result.

Filtering should remove uninformative features, not sculpt the result.
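
A rule of this kind fits in a few lines. The sketch below is plain Python and keeps genes that reach a minimum count in a minimum number of samples. The thresholds used here (at least 10 reads in at least 3 samples) and the toy data are assumptions for illustration; in a real analysis the sample threshold would follow the smallest relevant group size. The rule looks only at counts, never at condition labels or test results.

```python
def prefilter(counts, min_count=10, min_samples=3):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.
    counts: gene -> {sample: count}."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for value in row.values() if value >= min_count) >= min_samples
    }


# Hypothetical toy counts for six samples.
counts = {
    "geneA": {"s1": 120, "s2": 98, "s3": 110, "s4": 300, "s5": 280, "s6": 310},
    "geneB": {"s1": 0, "s2": 1, "s3": 0, "s4": 2, "s5": 0, "s6": 1},
    "geneC": {"s1": 0, "s2": 0, "s3": 0, "s4": 15, "s5": 22, "s6": 18},
    "geneD": {"s1": 0, "s2": 0, "s3": 0, "s4": 0, "s5": 40, "s6": 0},
}

kept = prefilter(counts)
print(sorted(kept))
```

geneC is kept even though it is absent in half of the samples, because a gene that is switched on in one group is exactly the kind of signal the filter should not remove.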


Mistake 9: Misunderstanding adjusted p-values

RNA-seq differential expression analysis tests thousands of genes.

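If you test 20,000 genes at a raw p-value threshold of 0.05, roughly 1,000 would be expected to pass by chance alone, even if no gene were truly differentially expressed.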

This is why multiple-testing correction is necessary.

DESeq2 reports adjusted p-values, by default computed with the Benjamini-Hochberg false discovery rate procedure. In practice, researchers often use thresholds such as:

  • adjusted p-value < 0.05;
  • adjusted p-value < 0.1.
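
To make the adjustment less of a black box, here is a minimal plain-Python sketch of the Benjamini-Hochberg procedure applied to four hypothetical p-values. It is meant only to show the arithmetic. DESeq2 also applies independent filtering before adjusting, so the values in a real result table will not match a naive calculation over all genes.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and enforcing that adjusted values never increase as p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)

for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}   adjusted p = {q:.3f}")
```

In this toy example three genes have a raw p-value below 0.05, but only one remains below 0.05 after adjustment.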

A common mistake is to focus on raw p-values instead of adjusted p-values.

Another mistake is treating statistical significance as biological importance.

A gene with a very small adjusted p-value but tiny fold change may be statistically reliable but not necessarily biologically meaningful. Conversely, a gene with a large fold change but weak statistical support may be interesting but uncertain.

How to avoid it

Interpret results using a combination of:

  • adjusted p-value;
  • log2 fold change;
  • expression level;
  • biological context;
  • consistency across replicates;
  • pathway or functional relevance.

Do not rank genes by raw p-value alone.


Mistake 10: Misinterpreting log2 fold change

Log2 fold change describes the estimated expression difference between conditions.

For example:

  • log2 fold change = 1 means approximately 2-fold higher expression;
  • log2 fold change = -1 means approximately 2-fold lower expression;
  • log2 fold change = 2 means approximately 4-fold higher expression;
  • log2 fold change = -2 means approximately 4-fold lower expression.

However, log2 fold changes can be unstable for genes with low counts.

A gene with very low expression may show a large fold change simply because it has a few reads in one condition and almost none in the other.

This is one reason why DESeq2 includes log2 fold-change shrinkage methods, which can improve the stability and interpretability of effect-size estimates, especially for ranking and visualization.
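
The arithmetic behind both points can be sketched in a few lines of plain Python. The `naive_log2fc` helper below is a deliberately simplistic ratio of two counts, used only for illustration. It is not how DESeq2 estimates fold changes, which come from a fitted negative binomial model.

```python
import math


def fold_change(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2 ** log2fc


def naive_log2fc(treated, control):
    """Log2 ratio of two positive counts (illustration only)."""
    return math.log2(treated / control)


# Conversion between scales.
for value in (1, -1, 2, -2):
    print(f"log2FC {value:+d} -> {fold_change(value)}x")

# Same log2 fold change, very different amounts of evidence.
print(naive_log2fc(8, 2), naive_log2fc(4000, 1000))

# One extra read in the control group.
print(round(naive_log2fc(8, 3), 3), round(naive_log2fc(4000, 1001), 3))
```

A single additional read moves the low-count estimate from 2 to about 1.4, while the high-count estimate barely changes. That sensitivity is why large fold changes from low-count genes deserve caution.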

[Figure: Volcano plot showing differentially expressed genes, with log2 fold change on the x-axis and statistical significance on the y-axis.]

How to avoid it

When interpreting fold changes:

  • check the base mean or average expression level;
  • inspect normalized counts for important genes;
  • use shrinkage estimates when appropriate;
  • avoid overinterpreting huge fold changes from very low-count genes;
  • consider both effect size and statistical support.

A volcano plot can be useful, but it should not be the only basis for interpretation.


Mistake 11: Misinterpreting volcano plots

Volcano plots are popular because they summarize differential expression results visually.

They usually show:

  • log2 fold change on the x-axis;
  • statistical significance on the y-axis;
  • highlighted genes passing selected thresholds.

However, volcano plots are often overinterpreted.

Common mistakes include:

  • treating all highlighted genes as equally important;
  • ignoring low-count genes;
  • ignoring gene annotation quality;
  • using arbitrary thresholds without explanation;
  • labeling too many genes;
  • ignoring batch effects or poor experimental design;
  • assuming that a volcano plot proves biological mechanism.

A volcano plot is a summary, not a complete interpretation.

It tells you which genes meet certain statistical and effect-size criteria. It does not tell you whether the experiment was well designed, whether the annotation is correct, or whether the result makes biological sense.

How to avoid it

Use volcano plots together with:

  • MA plots;
  • PCA plots;
  • normalized count plots for key genes;
  • sample-level QC;
  • pathway enrichment;
  • functional annotation;
  • biological replicates;
  • domain knowledge.

The best differential expression interpretation combines statistics with biology.


Mistake 12: Treating DESeq2 as a complete RNA-seq pipeline

DESeq2 is not the entire RNA-seq workflow.

It is one part of the differential expression analysis.

A complete RNA-seq workflow may include:

  • experimental design;
  • RNA extraction and quality assessment;
  • sequencing;
  • raw read quality control;
  • adapter and quality trimming;
  • alignment or transcript quantification;
  • gene-level count generation;
  • sample metadata preparation;
  • exploratory analysis;
  • differential expression testing;
  • visualization;
  • pathway or functional enrichment;
  • biological interpretation;
  • reproducible reporting.

DESeq2 starts after you already have suitable count data and metadata.

If earlier steps are wrong, DESeq2 results may be unreliable even if the code runs without errors.

For a general introduction to transcriptome-wide analysis, see: What Is Transcriptomics? How RNA-Seq Reveals What Microbes Are Doing.


Mistake 13: Ignoring annotation quality

Differential expression analysis depends on gene annotation.

If the annotation is incomplete, outdated, inconsistent, or poorly matched to the organism or strain, the results may be difficult to interpret.

This is especially important in microbial transcriptomics.

Problems can include:

  • missing genes;
  • incorrect gene boundaries;
  • inconsistent gene IDs;
  • duplicated gene names;
  • poorly annotated hypothetical proteins;
  • strain differences between the reference and the studied organism;
  • use of an annotation file that does not match the genome FASTA;
  • mismatch between transcript IDs and gene IDs.

For non-model organisms, microbial isolates, MAGs, or custom genomes, annotation quality can strongly influence downstream interpretation.

You may obtain a valid DESeq2 result table, but many genes may be difficult to interpret biologically if the annotation is weak.

How to avoid it

Check that:

  • the genome FASTA and annotation file match;
  • gene IDs are consistent across tools;
  • annotation sources are documented;
  • functional annotation is added where possible;
  • hypothetical proteins are interpreted cautiously;
  • important genes are manually inspected when needed.

For microbial RNA-seq, differential expression and genome annotation are tightly connected.


Mistake 14: Comparing too many contrasts without a clear question

DESeq2 can test many contrasts, but that does not mean every possible comparison is useful.

In complex experiments, users often compare:

  • every condition against every other condition;
  • every time point against every other time point;
  • every treatment combination;
  • every subgroup.

This can create a large number of result tables and increase the risk of confused interpretation.

The problem is not only statistical. It is conceptual.

If the biological question is unclear, the analysis becomes a fishing expedition.
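
The number of possible comparisons grows quickly. As a small illustration in plain Python, with hypothetical time point labels, this is what "every time point against every other time point" means for a six-point time course:

```python
from itertools import combinations


def all_pairwise_contrasts(levels):
    """Every unordered pair of levels, i.e. every possible two-group contrast."""
    return list(combinations(levels, 2))


# Hypothetical time course with six time points.
time_points = ["0h", "1h", "2h", "4h", "8h", "24h"]
contrasts = all_pairwise_contrasts(time_points)

print(len(contrasts))
for a, b in contrasts[:3]:
    print(f"{b} vs {a}")
```

Six time points already give 15 result tables, before treatment or subgroup is considered. Few of them usually correspond to a question anyone planned to ask.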

How to avoid it

Before running all possible contrasts, define:

  • primary comparisons;
  • secondary comparisons;
  • exploratory comparisons;
  • expected direction of change;
  • biological hypotheses;
  • reporting priorities.

For example:

Primary contrast:

  • treated vs control at final time point.

Secondary contrasts:

  • treated vs control at earlier time points;
  • time response within treated samples.

Exploratory contrasts:

  • subgroup analyses;
  • interaction effects.

This makes the analysis easier to interpret and report.


Mistake 15: Overlooking reproducibility

A DESeq2 analysis should be reproducible.

Common reproducibility problems include:

  • manual editing of count files;
  • undocumented filtering;
  • unrecorded package versions;
  • unclear sample metadata;
  • missing code;
  • inconsistent gene IDs;
  • plots generated manually without scripts;
  • no record of reference genome or annotation version;
  • no session information.

These issues become serious when writing a paper, responding to reviewers, revisiting an old project, or comparing results across projects.

How to avoid it

A reproducible DESeq2 analysis should include:

  • raw input files or clear file paths;
  • count-generation method;
  • reference genome and annotation version;
  • metadata table;
  • R script or R Markdown file;
  • filtering criteria;
  • design formula;
  • contrasts tested;
  • package versions;
  • exported result tables;
  • figures generated from code;
  • short interpretation report.

This is especially important for collaborative projects, where the person interpreting the results may not be the same person who ran the pipeline.


Practical DESeq2 checklist

Before trusting your DESeq2 results, check the following:

  1. Are you using raw gene counts or appropriate estimated counts?
  2. Are you avoiding TPM/FPKM as direct DESeq2 input?
  3. Do count matrix column names match metadata sample names?
  4. Are biological replicates correctly defined?
  5. Is the design formula appropriate for the experiment?
  6. Are known batch effects included when appropriate?
  7. Are condition and batch confounded?
  8. Have you checked PCA or sample distance plots?
  9. Have you inspected outlier samples?
  10. Have you filtered low-count genes transparently?
  11. Are you using adjusted p-values, not only raw p-values?
  12. Are log2 fold changes interpreted with expression levels?
  13. Are volcano plots supported by additional QC and biological context?
  14. Is the annotation file correct and consistent?
  15. Is the analysis reproducible?

If several of these points are uncertain, the analysis may need revision before drawing biological conclusions.


DESeq2 is powerful, but interpretation still matters

DESeq2 is a robust and widely used tool for RNA-seq differential expression analysis, but it is not a substitute for good experimental design, careful metadata preparation, appropriate quality control, and biological interpretation.

Many RNA-seq mistakes happen before DESeq2 is even run.

Others happen after the result table is generated, when adjusted p-values, log2 fold changes, and volcano plots are interpreted too mechanically.

A good RNA-seq analysis should answer a biological question, not just produce a list of differentially expressed genes.

That requires a complete workflow: quality control, correct quantification, well-structured metadata, appropriate statistical design, transparent filtering, careful visualization, and interpretation in biological context.

If you are working with RNA-seq data and need support with count generation, DESeq2 analysis, differential expression interpretation, pathway analysis, or publication-ready figures, Tailoredomics offers transcriptomics data analysis services for microbial and biological research projects.


FAQ

What input should I use for DESeq2?

DESeq2 should generally be used with raw gene-level counts or appropriate estimated counts imported from transcript quantification tools using workflows such as tximport. TPM, FPKM, or RPKM values should not be used directly as DESeq2 input.

Can I use TPM values in DESeq2?

No, not as direct input for standard DESeq2 differential expression analysis. TPM values are normalized abundance estimates, while DESeq2 models count data. If you used transcript quantification tools such as Salmon or kallisto, use an appropriate import workflow to summarize transcript estimates to gene-level counts.

How many replicates do I need for DESeq2?

DESeq2 can technically run with small sample sizes, but meaningful differential expression analysis requires biological replication. Three biological replicates per condition is often treated as a practical minimum for simple bulk RNA-seq experiments, but more replicates are better when variability is high.

Should batch effects be removed before DESeq2?

Known batch variables should usually be included in the DESeq2 design formula when appropriate and not fully confounded with the condition of interest. Batch-corrected transformed data may be useful for visualization, but should not replace a correct statistical design for differential expression testing.

Why are some DESeq2 adjusted p-values NA?

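Adjusted p-values may be set to NA for three main reasons: the gene has zero counts in all samples, the gene contains an extreme count outlier detected by Cook's distance, or the gene was removed by independent filtering because of a low mean normalized count. In the last case, only the adjusted p-value is NA, while the raw p-value is still reported. This does not necessarily mean the analysis failed. It means those genes were not assigned an adjusted p-value under the applied filtering or outlier rules.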

Is a volcano plot enough to interpret RNA-seq results?

No. A volcano plot is useful, but it should be interpreted together with sample QC, PCA plots, MA plots, normalized counts for important genes, annotation quality, pathway analysis, and biological context.

Bioinformatic Workflows
Rubén Javier López