Anvi’o visualisations for refinement of Eukaryotic MAGs
In the coming months we will be expanding MGnify Genomes to include Eukaryotic genomes. Take an early look at our plans to enable users to explore Eukaryotic genomes using Anvi’o.
Eukaryotic metagenome-assembled genome (MAG) generation relies on assemblies of metagenomic data, typically performed by metaSPAdes (Nurk et al. (2017)) or MEGAHIT (Li et al. (2015)). These tools are not biased toward specific organisms. However, compared to prokaryotic genomes, eukaryotic genomes are larger, show more compositional variation and contain more repetitive regions, which makes them harder both to assemble (assemblies are often highly fragmented) and to bin into MAGs. For example, during binning we have found that a single eukaryotic genome can be split across two bins. To alleviate some of these challenges, EukCC2 (Saary et al. (2022)) includes a bin-merging function that joins bins from the same taxon, using measures of genome completeness and the presence of paired-end reads linking the two bins to validate the merge. Others have used co-assembly and co-binning of multiple samples to overcome this problem (Delmont et al. (2022)), but related samples are not always available, so MGnify has adopted single-sample approaches. Given these challenges, users need to be able to inspect the quality of our automatically generated eukaryotic MAGs.
As part of the BlueRemediomics project, we have been investigating how to leverage the Anvi’o (Eren et al. (2021)) interactive interface for visualising genomes and their derived sample composition. Here, we share our prototype of Anvi’o visualisations of MGnify-generated eukaryotic MAGs. The eukaryotic MAGs were generated from the same datasets used in the Marine v2.0 prokaryotic genome catalogue, for each study in the dataset and on a sample-by-sample basis. Subsequently, all the MAGs were de-replicated together to remove redundancy, resulting in 12 species-level clusters, 8 of which comprised more than one genome. The representative genomes of these 8 clusters were taxonomically classified into two families of green algae from the order Mamiellales. For each of these 8 species clusters, we mapped the reads from every sample that produced a genome in the cluster onto the species-representative genome and compiled the mappings into Anvi’o profiles.
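The mapping and profiling step described above could be reproduced roughly as follows. This is a minimal sketch, not the MGnify pipeline: the choice of bowtie2 and samtools for read mapping, the read file naming and the function names are all assumptions made for illustration, and anvi-gen-contigs-database is left at its default settings, whose gene calling may not suit eukaryotic genomes. Only the output names (${GENOME_NAME}.db and ${GENOME_NAME}_MERGED/PROFILE.db) follow the layout used in this post.

```shell
# Sketch only: mapper (bowtie2), file naming and function names are assumptions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Map one sample's paired reads to the species representative, then profile
# the sorted, indexed alignments against the contigs database.
# Arguments: genome name, sample name, forward reads, reverse reads.
profile_sample() (
    genome="$1"; sample="$2"; r1="$3"; r2="$4"
    run bowtie2 -x "$genome" -1 "$r1" -2 "$r2" -S "$sample.sam"
    run samtools sort -o "$sample.bam" "$sample.sam"
    run samtools index "$sample.bam"
    run anvi-profile -i "$sample.bam" -c "$genome.db" -o "$sample" -S "$sample"
)

# Build the contigs database once, profile every sample, merge the profiles.
# Arguments: genome name, then two or more sample names (no whitespace).
# Reads are assumed to be named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
build_profiles() (
    genome="$1"; shift
    run anvi-gen-contigs-database -f "$genome.fa" -o "$genome.db" -n "$genome"
    run bowtie2-build "$genome.fa" "$genome"
    profiles=""
    for sample in "$@"; do
        profile_sample "$genome" "$sample" "${sample}_1.fastq.gz" "${sample}_2.fastq.gz"
        profiles="$profiles $sample/PROFILE.db"
    done
    # $profiles is deliberately unquoted so that it splits into one path per sample.
    run anvi-merge $profiles -o "${genome}_MERGED" -c "$genome.db"
)
```

Setting DRY_RUN=1 before calling build_profiles prints the planned commands without running them, which is a cheap way to check file names before committing to a long mapping job.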
Figure 1. Hierarchical clustering of sample reads mapped to contig regions in genomes visualised with Anvi’o. The images are generated from Anvi’o profiles compiled from samples which contributed to 8 genomes. Access the interactive version for one example genome.
Visualising the hierarchical clustering of contigs in a eukaryotic genome by sample composition allows a user to identify contigs that are covered by reads from only a subset of samples, which may indicate contamination. More information on refinement can be found in the Anvi’o blogs.
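The same idea can be applied as a rough first pass outside the interface. The sketch below assumes a tab-separated coverage table (a header row, then one row per contig with the contig name followed by its mean coverage in each sample), for instance one exported from an Anvi’o profile. The function name, input layout and thresholds are hypothetical, and the result is a list of candidates for closer inspection rather than a contamination call.

```shell
# Sketch only: input layout, thresholds and function name are assumptions.
# Reads the coverage table on stdin and prints each contig that is detected
# in at least one sample but in no more than max_samples of them, together
# with the number of samples in which it was detected.
flag_sample_specific_contigs() (
    min_cov="${1:-1}"       # coverage at or above which a contig counts as detected
    max_samples="${2:-1}"   # report contigs detected in at most this many samples
    awk -F '\t' -v min_cov="$min_cov" -v max_samples="$max_samples" '
        NR == 1 { next }    # skip the header row
        {
            detected = 0
            for (i = 2; i <= NF; i++)
                if ($i + 0 >= min_cov + 0) detected++
            if (detected > 0 && detected <= max_samples)
                print $1 "\t" detected "/" (NF - 1)
        }
    '
)
```

For example, flag_sample_specific_contigs 5 2 < coverages.tsv (where coverages.tsv is a placeholder file name) would list contigs reaching a coverage of 5 in only one or two samples.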
The Anvi’o profiles for all 8 marine eukaryotic genomes can be found on the MGnify FTP server. To visualise them, install Docker and run the commands below in the directory containing the files for the genome of interest:
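# fetch the Anvi’o v8 image
docker pull meren/anvio:8
# start container, mounting the current directory and publishing port 8080
docker run --rm -it -v "$(pwd)":"$(pwd)" -w "$(pwd)" -p 8080:8080 meren/anvio:8
# in container: set GENOME_NAME to the name of the downloaded genome, then launch the interface
anvi-interactive -p ${GENOME_NAME}_MERGED/PROFILE.db -c ${GENOME_NAME}.db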
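Before launching the interface it can help to confirm that both databases are where the command above expects them. The helper below is a hypothetical convenience, not part of Anvi’o or MGnify; it only checks the two paths used in that command.

```shell
# Sketch only: check_anvio_inputs is a hypothetical helper name.
# Reports any missing database for the given genome name and returns
# a non-zero status if either file is absent.
check_anvio_inputs() (
    genome="$1"
    status=0
    for path in "${genome}_MERGED/PROFILE.db" "${genome}.db"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
)
```

Inside the container this could be chained with the launch, for example check_anvio_inputs "$GENOME_NAME" && anvi-interactive -p "${GENOME_NAME}_MERGED/PROFILE.db" -c "${GENOME_NAME}.db".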
This Anvi’o feature is still in development, and we welcome any feedback and suggestions on the prototypes via our MGnify helpdesk or @MGnifyDB. We aim to provide such Anvi’o eukaryotic MAG visualisations alongside eukaryotic genome catalogues.