Metagenome-assembled genomes, or MAGs, are one of the most useful outputs of shotgun metagenomics.
Instead of only asking which organisms are present, MAGs allow researchers to reconstruct draft genomes directly from complex microbial communities. This can reveal metabolic potential, ecological roles, genome content, functional pathways, and possible interactions between organisms.
However, MAG recovery is not always successful.
A metagenomics project may produce many bins, but only a few of them may be useful. Some bins may have low completeness, high contamination, poor taxonomic consistency, or fragmented assemblies. Others may look acceptable numerically but still be difficult to interpret biologically.
In this article, we explain why MAGs become low quality, how to diagnose the problem, and what can be done to improve MAG recovery.
For a broader overview of the upstream workflow, see our guide: Metagenome Assembly Pipeline: From Raw Reads to MAGs.
What is a low-quality MAG?
A MAG is a draft genome reconstructed from metagenomic sequencing data.
Because the genome is not isolated from a pure culture, its quality must be estimated computationally. The most common quality metrics are:
- completeness;
- contamination;
- strain heterogeneity;
- number of contigs;
- N50;
- total genome size;
- GC content;
- taxonomic consistency;
- presence of marker genes;
- functional annotation completeness.
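Several of these metrics can be computed directly from the contigs of a bin. The sketch below is a minimal illustration, assuming the bin is already loaded as a list of sequence strings; the function name `assembly_stats` and the toy sequences are hypothetical and not taken from any specific tool.

```python
def assembly_stats(contigs):
    """Return basic statistics for a list of contig sequences (strings)."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum of lengths
    # (longest first) first reaches at least half of the total size.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # GC content as a fraction of all bases in the bin.
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)

    return {
        "num_contigs": len(lengths),
        "total_size": total,
        "n50": n50,
        "gc_fraction": gc / total if total else 0.0,
    }


# Toy bin with three short contigs (hypothetical sequences).
toy_bin = ["GGCC" * 5, "ATGC" * 3, "AATT" * 2]
stats = assembly_stats(toy_bin)
print(stats)
```

Completeness, contamination and taxonomic consistency cannot be derived this way. They require marker gene or reference-based tools.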
A low-quality MAG usually has one or more of these problems:
- low estimated completeness;
- high estimated contamination;
- many short contigs;
- inconsistent taxonomic assignment;
- abnormal genome size;
- missing essential marker genes;
- duplicated single-copy marker genes;
- poor functional annotation;
- unclear biological interpretation.
A MAG can be low quality because the binning step failed, but the problem often starts earlier, with sampling, sequencing depth, read quality, assembly fragmentation, strain variation, or an inappropriate co-assembly strategy.
MAG quality depends on the whole workflow
MAG quality is not determined only by the binning tool.
It depends on the complete metagenomics workflow:
- sample type and microbial complexity;
- DNA extraction quality;
- sequencing depth;
- read quality control;
- host read removal;
- metagenome assembly;
- contig coverage estimation;
- binning;
- bin refinement;
- quality assessment;
- taxonomic and functional annotation.
If the assembly is highly fragmented, binning becomes harder.
If coverage is too low, the assembler cannot join reads into long contigs, and the genome is recovered only in pieces, if at all.
If closely related strains are present, contigs may be incorrectly split or mixed.
If the sample contains many low-abundance organisms, many genomes may remain incomplete.
This is why low-quality MAGs should not be interpreted as a simple “binning tool failure”. They are often a symptom of upstream limitations.
For more on assembly fragmentation, see: Why Is My Metagenome Assembly So Fragmented?
Cause 1: Low sequencing depth
Low sequencing depth is one of the most common causes of incomplete MAGs.
To reconstruct a genome from a metagenome, the organism must be represented by enough reads. If an organism is rare, only a small fraction of the total reads will come from that genome.
This creates several problems:
- short contigs;
- missing genomic regions;
- low marker gene recovery;
- poor binning signal;
- incomplete metabolic pathways;
- low estimated completeness.
A dataset may have many reads overall but still provide insufficient coverage for most organisms in the community.
This is especially common in highly diverse samples such as soil, sediment, wastewater, marine samples, and environmental biofilms.
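A rough back-of-the-envelope calculation makes this concrete. The sketch below assumes that mean coverage is approximately the total sequenced bases multiplied by the fraction of DNA coming from the genome, divided by genome size. The run size, abundances and genome size are hypothetical values chosen only for illustration.

```python
def expected_coverage(total_bases, dna_fraction, genome_size):
    """Approximate mean coverage of one genome in a metagenome.

    total_bases  : total sequenced bases after quality control
    dna_fraction : fraction of the sequenced DNA from this genome (0 to 1)
    genome_size  : genome length in base pairs
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases * dna_fraction / genome_size


# Hypothetical run: 20 million read pairs of 2 x 150 bp.
total_bases = 20_000_000 * 2 * 150

for label, fraction in [("dominant (5%)", 0.05), ("rare (0.1%)", 0.001)]:
    cov = expected_coverage(total_bases, fraction, genome_size=4_000_000)
    print(f"{label}: ~{cov:.1f}x")
```

In this toy case the same 6 Gb dataset gives about 75x for the dominant organism but only about 1.5x for the rare one, which is why total read count alone says little about MAG recovery.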
How to fix it
Possible solutions include:
- deeper sequencing;
- focusing on dominant organisms;
- co-assembling biologically related samples;
- reducing host contamination before sequencing;
- using enrichment strategies when appropriate;
- setting realistic expectations for rare taxa.
More sequencing helps most when the target organisms are present but under-covered. It may not fully solve the problem in extremely complex communities where added reads are still spread across hundreds or thousands of organisms.
Cause 2: Fragmented metagenome assembly
MAGs are built from assembled contigs. If the assembly is fragmented, bins will also tend to be fragmented.
A fragmented assembly can result from:
- low coverage;
- short reads;
- high community complexity;
- strain variation;
- repetitive regions;
- poor read quality;
- inappropriate assembler settings;
- excessive host or contaminant DNA.
When contigs are too short, binning tools have less information to work with. Short contigs give noisier tetranucleotide frequency estimates, less stable coverage estimates, and fewer marker genes.
This makes it harder to group contigs into correct genome bins.
How to fix it
Useful steps include:
- checking read quality before and after trimming;
- mapping reads back to the assembly;
- comparing different assemblers;
- testing individual assembly vs co-assembly;
- excluding very short contigs before binning;
- evaluating assembly metrics before binning;
- using long-read or hybrid strategies when appropriate.
The best assembly for binning is not always the assembly with the highest N50. It is the assembly that produces biologically coherent, low-contamination bins.
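Excluding very short contigs before binning can be done with a few lines of code. The following is a minimal sketch that parses FASTA text and keeps contigs above a length cutoff. The helper names are hypothetical, and the cutoff is deliberately left as a parameter because the appropriate value depends on the dataset and the binning tool.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length):
    """Keep only contigs whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Toy FASTA content (hypothetical contigs; real cutoffs are far larger).
toy_fasta = """>contig_1
ATGCATGCATGC
ATGC
>contig_2
ATGC
>contig_3
GGGGCCCCGGGGCCCC
""".splitlines()

kept = filter_contigs(read_fasta(toy_fasta), min_length=10)
print([header for header, _ in kept])
```

For a real assembly, the same functions can be applied to an open file handle instead of the toy lines.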
Cause 3: Closely related strains
Strain variation is a major challenge in MAG reconstruction.
Many microbial communities contain multiple strains of the same species. These strains may share most of their genomes but differ in mobile regions, plasmids, prophages, accessory genes, and SNP-rich regions.
For binning tools, this creates ambiguity.
Closely related strains may have similar GC content and similar tetranucleotide composition. If their coverage profiles are also similar, the binning tool may merge them into one contaminated bin.
Alternatively, the assembler may split strain-variable regions into many short contigs, reducing completeness.
How to fix it
Possible approaches include:
- avoiding unnecessary co-assembly when it increases strain complexity;
- comparing sample-specific assemblies and grouped co-assemblies;
- using coverage profiles across multiple samples;
- applying bin refinement tools;
- manually inspecting suspicious bins;
- using strain-aware methods when strain resolution is central to the project.
If the biological question requires strain-level resolution, standard MAG reconstruction may not be enough.
Cause 4: High contamination
A contaminated MAG contains sequences from more than one organism.
Contamination can occur when contigs from different taxa are grouped into the same bin.
Common signs include:
- high CheckM contamination estimate;
- duplicated single-copy marker genes;
- inconsistent GC content;
- inconsistent coverage;
- mixed taxonomic assignments;
- abnormal genome size;
- conflicting functional signals.
High contamination is especially problematic because it can lead to false biological conclusions.
For example, a pathway may appear to be present in a genome only because contigs from another organism were incorrectly included in the same bin.
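The link between duplicated single-copy marker genes and contamination can be illustrated with a simplified calculation. This is only a sketch of the general idea: dedicated tools such as CheckM use more elaborate, lineage-aware marker sets, and the marker names and the function `marker_estimates` below are hypothetical.

```python
from collections import Counter


def marker_estimates(expected_markers, observed_hits):
    """Simplified completeness and contamination from single-copy markers.

    expected_markers : marker gene names expected once per genome
    observed_hits    : marker names found in the bin, one entry per hit
    Returns (completeness_percent, contamination_percent).
    """
    expected = set(expected_markers)
    counts = Counter(hit for hit in observed_hits if hit in expected)

    # Completeness: share of expected markers found at least once.
    present = sum(1 for marker in expected if counts[marker] >= 1)
    # Contamination: extra copies of markers that should occur only once.
    extra = sum(counts[marker] - 1 for marker in expected if counts[marker] > 1)

    n = len(expected)
    return 100 * present / n, 100 * extra / n


# Hypothetical bin: 8 of 10 markers found, two of them in two copies.
expected = [f"marker_{i}" for i in range(1, 11)]
hits = [f"marker_{i}" for i in range(1, 9)] + ["marker_2", "marker_5"]

completeness, contamination = marker_estimates(expected, hits)
print(f"completeness ~{completeness:.0f}%, contamination ~{contamination:.0f}%")
```

Duplicated markers can also come from closely related strains rather than unrelated taxa, so a high value is a reason to inspect the bin, not a diagnosis in itself.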
How to fix it
Possible fixes include:
- using multiple binning tools;
- refining bins manually or semi-automatically;
- inspecting GC and coverage plots;
- removing contigs with inconsistent taxonomy;
- splitting suspicious bins;
- comparing bin quality before and after refinement;
- avoiding overinterpretation of contaminated bins.
A slightly incomplete but clean MAG is often more useful than a more complete but highly contaminated one.
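Inspecting GC and coverage does not have to start with plots. A simple first pass is to flag contigs that deviate strongly from the median of their bin. The sketch below assumes per-contig GC fraction and mean coverage are already available; the thresholds and contig values are illustrative assumptions, not recommended defaults.

```python
from statistics import median


def flag_outliers(contigs, max_gc_diff=0.05, max_cov_fold=2.0):
    """Return names of contigs whose GC or coverage deviates from the bin median.

    contigs : list of dicts with keys "name", "gc" (0 to 1) and "coverage"
    """
    med_gc = median(c["gc"] for c in contigs)
    med_cov = median(c["coverage"] for c in contigs)

    flagged = []
    for c in contigs:
        gc_off = abs(c["gc"] - med_gc) > max_gc_diff
        # Fold difference relative to the bin median coverage.
        fold = c["coverage"] / med_cov if med_cov > 0 else float("inf")
        cov_off = fold > max_cov_fold or fold < 1 / max_cov_fold
        if gc_off or cov_off:
            flagged.append(c["name"])
    return flagged


# Hypothetical bin: c5 has unusual GC content, c6 has unusual coverage.
toy_bin = [
    {"name": "c1", "gc": 0.41, "coverage": 20},
    {"name": "c2", "gc": 0.42, "coverage": 22},
    {"name": "c3", "gc": 0.40, "coverage": 19},
    {"name": "c4", "gc": 0.43, "coverage": 21},
    {"name": "c5", "gc": 0.61, "coverage": 20},
    {"name": "c6", "gc": 0.42, "coverage": 85},
]

suspicious = flag_outliers(toy_bin)
print(suspicious)
```

Flagged contigs are candidates for manual review rather than automatic removal, since plasmids, rRNA operons and genomic islands can legitimately differ from the rest of the genome.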
Cause 5: Poor coverage variation across samples
Binning tools often use coverage patterns to separate genomes.
If you have multiple related samples, different organisms may vary in abundance across samples. This variation helps binning tools distinguish contigs from different genomes.
However, if all samples have very similar coverage patterns, or if only one sample is available, coverage-based separation becomes harder.
This is especially problematic when several organisms have similar sequence composition.
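The value of differential coverage can be shown with a small correlation example. The sketch below compares per-sample coverage profiles of contigs using the Pearson correlation. The coverage values are hypothetical, and real binning tools combine this kind of signal with sequence composition in more sophisticated ways.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A flat profile carries no differential signal.
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical mean coverage of three contigs across four samples.
contig_a = [10, 40, 5, 80]
contig_b = [12, 38, 6, 79]   # rises and falls together with contig_a
contig_c = [50, 8, 60, 9]    # opposite pattern, likely another genome

print(f"a vs b: {pearson(contig_a, contig_b):.2f}")
print(f"a vs c: {pearson(contig_a, contig_c):.2f}")
```

With a single sample, each contig has only one coverage value and no such profile can be compared. If abundances barely change across samples, the profiles are nearly flat and the correlation becomes uninformative.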
How to fix it
Useful strategies include:
- including multiple biologically related samples when possible;
- using differential coverage across conditions or time points;
- avoiding over-broad co-assemblies;
- selecting assembly groups carefully;
- using both sequence composition and coverage-aware binning tools.
Good metadata and experimental design can improve MAG recovery because they make coverage patterns more informative.
Cause 6: Host or non-target DNA
Host contamination reduces the fraction of reads available for microbial assembly and binning.
This is common in:
- gut microbiome samples;
- animal-associated microbiomes;
- plant-associated microbiomes;
- clinical samples;
- low-biomass samples.
If host DNA dominates the dataset, microbial genomes may have insufficient coverage even when the total sequencing output seems high.
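The effect of host DNA on microbial coverage is easy to estimate. The sketch below assumes that the host fraction simply reduces the pool of microbial bases and that coverage scales linearly with that pool. All numbers are hypothetical.

```python
def microbial_coverage(total_bases, host_fraction, dna_fraction, genome_size):
    """Approximate coverage of one microbial genome in a host-rich sample.

    total_bases   : total sequenced bases
    host_fraction : fraction of bases that come from the host (0 to 1)
    dna_fraction  : fraction of the microbial DNA from this genome (0 to 1)
    genome_size   : genome length in base pairs
    """
    microbial_bases = total_bases * (1.0 - host_fraction)
    return microbial_bases * dna_fraction / genome_size


# Hypothetical 6 Gb dataset, organism at 5% of microbial DNA, 4 Mb genome.
for host_fraction in (0.0, 0.5, 0.9):
    cov = microbial_coverage(6_000_000_000, host_fraction, 0.05, 4_000_000)
    print(f"host fraction {host_fraction:.0%}: ~{cov:.1f}x")
```

In this toy example, 90% host DNA reduces the coverage of the same organism tenfold, from about 75x to about 7.5x, without any change in total sequencing output.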
How to fix it
Depending on the project, consider:
- host DNA depletion before sequencing;
- computational host read removal;
- deeper sequencing;
- careful sample preparation;
- realistic expectations for low-biomass samples.
Host read removal is not only a privacy or filtering step. It directly affects how much useful microbial information remains for assembly and MAG recovery.
Cause 7: Inappropriate binning strategy
Different binning tools can produce different results.
Common binning tools include MetaBAT2, MaxBin2 and CONCOCT. Each uses somewhat different assumptions and may perform better or worse depending on the dataset.
For important projects, it is often useful to compare multiple binning outputs and refine the results.
For more on this, see: Metagenomic Binning Tools Compared: MetaBAT2 vs MaxBin2 vs CONCOCT
How to fix it
A practical strategy is:
- assemble the metagenome;
- map reads back to contigs;
- run more than one binning tool;
- compare bin quality;
- refine bins;
- remove contaminated contigs;
- evaluate final bins with quality tools;
- annotate only bins that are good enough for the research question.
The objective is not to maximize the number of bins. It is to recover bins that are biologically interpretable.
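When comparing binning outputs, it helps to count bins that pass a quality filter rather than all bins. The sketch below assumes completeness and contamination estimates have already been collected into a list of records. The binner labels, values and thresholds are hypothetical and should be adapted to the research question.

```python
def summarize_binners(bins, min_completeness=50.0, max_contamination=10.0):
    """Count total and passing bins per binner.

    bins : list of dicts with keys "binner", "completeness", "contamination"
    """
    summary = {}
    for b in bins:
        entry = summary.setdefault(b["binner"], {"total": 0, "passing": 0})
        entry["total"] += 1
        if (b["completeness"] >= min_completeness
                and b["contamination"] <= max_contamination):
            entry["passing"] += 1
    return summary


# Hypothetical quality estimates for bins from two binning runs.
toy_bins = [
    {"binner": "binner_A", "completeness": 95.0, "contamination": 2.0},
    {"binner": "binner_A", "completeness": 40.0, "contamination": 1.0},
    {"binner": "binner_A", "completeness": 70.0, "contamination": 25.0},
    {"binner": "binner_A", "completeness": 20.0, "contamination": 0.5},
    {"binner": "binner_B", "completeness": 88.0, "contamination": 3.0},
    {"binner": "binner_B", "completeness": 76.0, "contamination": 4.5},
]

summary = summarize_binners(toy_bins)
for binner, counts in summary.items():
    print(binner, counts)
```

Here the run with more bins yields fewer usable ones, which is the situation the raw bin count hides.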
How to evaluate MAG quality
A useful MAG evaluation should include:
- completeness;
- contamination;
- strain heterogeneity;
- genome size;
- GC content;
- number of contigs;
- N50;
- marker gene recovery;
- taxonomic classification;
- functional annotation;
- read coverage;
- consistency with the sample ecology.
Tools such as CheckM, CheckM2, GTDB-Tk and anvi’o can help assess different aspects of MAG quality.
However, quality metrics should not be interpreted mechanically.
A MAG with 92% completeness and 8% contamination may be less useful than a MAG with 82% completeness and 1% contamination, depending on the question.
For metabolic reconstruction, missing genes can affect pathway interpretation.
For phylogenomic placement, marker gene completeness matters.
For comparative genomics, contamination can create false gene presence.
The correct quality threshold depends on the downstream objective.
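One heuristic for weighing the two metrics against each other is a combined score of completeness minus a multiple of contamination. The sketch below uses a weight of 5, which is a common convention but still an assumption. The weight is exposed as a parameter so that it can be changed for a given objective.

```python
def quality_score(completeness, contamination, contamination_weight=5.0):
    """Combined score that penalizes contamination more than incompleteness."""
    return completeness - contamination_weight * contamination


# The two MAGs from the example above.
mag_1 = quality_score(92.0, 8.0)
mag_2 = quality_score(82.0, 1.0)
print(mag_1, mag_2)
```

Under this weighting the cleaner MAG scores higher (77 versus 52), but a single number does not replace checking whether the MAG is adequate for the specific analysis.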
When is a low-quality MAG still useful?
A low-quality MAG is not always useless.
It may still help with:
- exploratory taxonomic placement;
- partial metabolic inference;
- identifying dominant organisms;
- detecting specific genes;
- generating hypotheses;
- guiding future sequencing.
However, low-quality MAGs should be interpreted cautiously.
They are usually not ideal for:
- detailed metabolic reconstruction;
- comparative genomics;
- species description;
- pathway completeness analysis;
- genome-scale metabolic modeling;
- publication claims about complete gene repertoires.
If a MAG is incomplete or contaminated, the limitations should be clearly reported.
Practical checklist for improving MAG quality
If your MAGs are low quality, check the following:
- Was sequencing depth sufficient?
- Were reads properly quality-controlled?
- Was host DNA removed when relevant?
- Is the assembly highly fragmented?
- Were very short contigs excluded from binning?
- Were reads mapped back to estimate coverage?
- Was individual assembly compared with co-assembly?
- Were multiple binning tools tested?
- Were bins refined?
- Are contamination estimates driven by duplicated marker genes?
- Do GC content and coverage patterns look consistent?
- Are taxonomic assignments coherent?
- Is the MAG good enough for the biological question?
Working through this checklist is usually more productive than simply rerunning the same binning command with different parameters.
Final thoughts
Low-quality MAGs are common in shotgun metagenomics, especially when working with complex communities, low-abundance organisms, fragmented assemblies, or closely related strains.
The problem is rarely caused by one step alone.
MAG quality reflects the entire workflow: sampling, sequencing, preprocessing, assembly, binning, quality control and interpretation.
A good metagenomics analysis should therefore evaluate not only how many bins were recovered, but whether those bins are complete, clean, taxonomically coherent, functionally interpretable and appropriate for the biological question.
If you are working with shotgun metagenomic data and need help with assembly, binning, MAG quality assessment, functional annotation or biological interpretation, Tailoredomics offers metagenomics services for microbial research projects.
FAQ
What is a MAG in metagenomics?
A MAG, or metagenome-assembled genome, is a draft genome reconstructed from shotgun metagenomic sequencing data rather than from an isolated pure culture.
What makes a MAG low quality?
A MAG is usually considered low quality if it has low completeness, high contamination, fragmented contigs, inconsistent taxonomy, abnormal genome size or poor marker gene recovery.
Can low-quality MAGs still be useful?
Yes, but they should be interpreted cautiously. They may be useful for exploratory analysis, but less suitable for detailed metabolic reconstruction, comparative genomics or strong publication claims.
How can I improve MAG quality?
Improve read preprocessing, increase sequencing depth when appropriate, reduce host contamination, optimize assembly strategy, compare binning tools, refine bins and evaluate completeness and contamination carefully.
Is high completeness more important than low contamination?
Both matter. For many analyses, a clean but moderately complete MAG may be more reliable than a highly complete but contaminated MAG.
Rubén Javier López
Rubén holds a PhD in microbiology from the University of Bergen (Norway). He is proficient in bacterial metagenomics, genomics and transcriptomics. He has hands-on experience and data analysis expertise in Illumina, Nanopore and PacBio sequencing technologies and has collaborated with scientists and labs all over the world. He has also worked with biomedical research groups, analyzing microbiome and mycobiome data.
- Low-Quality MAGs: Common Causes and Fixes June 3, 2026