Modules¶
This page lists all available nf-core modules in the Microbiome Informatics (EBI-Metagenomics) repository.
antismash¶
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.
binette¶
Binette binning refinement tool
blast¶
- blastn - Queries a BLAST DNA database
- blastp - BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database
- makeblastdb - Builds a BLAST database
bmtagger¶
- bmtagger - Bmtagger tools
- indexreference - Create indexed reference DB for bmtagger
bwamem2¶
bwamem2decontnobams¶
Decontamination module using bwamem2 and samtools that generates fastq files on the fly
catpack¶
- contigs - Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
- prepare - Creates a CAT_pack database based on input FASTAs
checkm2¶
- checkm2 - Rapid assessment of genome bin quality using machine learning
- db - Download DB for checkm2
cmsearchtbloutdeoverlap¶
Perl script to remove lower scoring overlaps from cmsearch --tblout files.
colabfold¶
- colabfoldbatch - Perform protein folding predictions with ColabFold (the current version queries the ColabFold server, with its limitations)
combinedgenecaller¶
- merge - MGnify merging script for the combined gene caller. The merged output contains all the gene predictions from Pyrodigal, along with genes predicted by FragGeneScanRS that do not overlap with any Pyrodigal gene. If a mask file is provided, genes that overlap regions listed in it are masked (removed).
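The merge rule described above can be illustrated with a minimal Python sketch. It is not the MGnify script: the `Interval` type, the `merge_predictions` function, the inclusive-coordinate overlap test, and the choice to apply the mask after merging are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """A gene or masked region (hypothetical type; 1-based inclusive coordinates assumed)."""
    contig: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Two intervals overlap when they share a contig and at least one position.
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def merge_predictions(
    pyrodigal: Iterable[Interval],
    fraggenescanrs: Iterable[Interval],
    mask: Iterable[Interval] = (),
) -> List[Interval]:
    primary = list(pyrodigal)
    # Keep every Pyrodigal gene, plus FragGeneScanRS genes that touch none of them.
    extra = [
        g for g in fraggenescanrs
        if not any(overlaps(g, p) for p in primary)
    ]
    merged = primary + extra
    # Assumption: masking is applied to the merged set, dropping any overlapping gene.
    masked_regions = list(mask)
    kept = [
        g for g in merged
        if not any(overlaps(g, m) for m in masked_regions)
    ]
    return sorted(kept)
```

In this sketch any shared position counts as an overlap; the real script may use a different overlap criterion or take strand into account.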
crisprcasfinder¶
crisprcasfinder
cutadapt¶
Trim adapters and primers from sequencing reads
dada2¶
Infer Amplicon Sequence Variants (ASVs) from amplicon reads using the DADA2 package
dbcan¶
- dbcan - CAZyme annotation of proteins
- dbcandb - Download and decompress the dbCAN reference database
deeptmhmm¶
A Deep Learning Model for Transmembrane Topology Prediction and Classification
diamond¶
downloadfromfire¶
Downloads files from EBI FIRE S3 storage using FTP paths
dram¶
- distill - Produces summary files with DRAM distill
easel¶
- eslsfetch - Extract fasta sequences by name from a cmsearchdeoverlap result
eggnogmapper¶
Fast genome-wide functional annotation through orthology assignment.
extractcoords¶
Process output from easel-sfetch to extract SSU and LSU sequences
fastp¶
Perform adapter/quality trimming on sequencing reads
fastqsuffixheadercheck¶
Sanity check for FASTQ suffixes and headers
fastqutils¶
fastq_utils is a set of Linux utilities to validate and manipulate fastq files.
fetchtool¶
- assembly - Microbiome Informatics ENA fetch tool
filterpaf¶
Module that uses awk to filter alignments in a PAF file based on query coverage and percentage identity (PID).
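As a rough illustration of this kind of filter, here is a Python sketch rather than the module's awk command. The column positions follow the standard PAF layout; the formulas for query coverage and PID, the `filter_paf` name, and the default thresholds are assumptions, and the module may define them differently.

```python
from typing import Iterable, List


def filter_paf(
    lines: Iterable[str],
    min_query_cov: float = 0.9,  # hypothetical default threshold
    min_pid: float = 0.9,        # hypothetical default threshold
) -> List[str]:
    """Return the PAF lines that pass both thresholds."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines (PAF has 12 mandatory columns)
        query_len = int(fields[1])
        query_start = int(fields[2])
        query_end = int(fields[3])
        matches = int(fields[9])     # number of matching residues
        block_len = int(fields[10])  # alignment block length
        if query_len == 0 or block_len == 0:
            continue
        # Assumed definitions: aligned fraction of the query, and matches per aligned column.
        query_cov = (query_end - query_start) / query_len
        pid = matches / block_len
        if query_cov >= min_query_cov and pid >= min_pid:
            kept.append(line)
    return kept
```

A typical call would pass the open PAF file handle as `lines` and write the returned lines to a new file.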
fraggenescan¶
FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.
fraggenescanrs¶
FragGeneScanRs: faster gene prediction for short reads
generategaf¶
Script that generates a GO Annotation File (GAF) from an InterProScan result TSV file.
genomeproperties¶
Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome.
hhsuite¶
- buildhhdb - Create an HH database to use for hhblits searches
hifiadapterfilt¶
Convert .bam to .fastq and remove reads with remnant PacBio adapter sequences
infernal¶
- cmscan - RNA secondary structure/sequence profiles for homology search and alignment
- cmsearch - RNA secondary structure/sequence profiles for homology search and alignment
interproscan¶
Produces protein annotations and predictions from a FASTA file
kegg-pathways-completeness¶
This tool computes the completeness of each KEGG pathway module for a given set of KEGG orthologues (KOs) based on their presence/absence.
krona¶
- ktimporttext - Creates a Krona chart from text files listing quantities and lineages.
librarystrategycheck¶
Uses base-conservation vectors to assess whether a run is AMPLICON or not
mapseq¶
Perform taxonomic classification of rRNA reads using reference databases
mapseq2biom¶
Process MAPseq output into biom and krona-txt formats
mgnifypipelinestoolkit¶
- kronatxtfromcatclassification - Generate a Krona text file from a CAT_pack classification output
- rheachebiannotation - Use diamond output file to create a table with Rhea and CHEBI reaction annotation for every protein
- summarisegoslims - Script that generates counts from an InterProScan output and a GO Annotation File (GAF) for both GO terms and GO-Slim terms.
minimap2¶
- align - A versatile pairwise aligner for genomic and spliced nucleotide sequences
owltools¶
OWLTools is a convenience Java API on top of the OWL API, used here for mapping GO terms to GO-slims
pimento¶
- generatebcv - Generate Base-Conservation Vectors (BCV) in a stepwise and windowed manner for a fastq file.
prodigal¶
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
proovframe¶
- fix - frame-shift correction for long read (meta)genomics - fix frameshifts in reads
- map - frame-shift correction for long read (meta)genomics - maps proteins to reads
pyrodigal¶
Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene predictor for prokaryotic genomes.
samtools¶
- bam2fq - The module uses the bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
sanntis¶
Runs SanntiS to identify biosynthetic gene clusters.
seqfu¶
- check - Evaluates the integrity of DNA FASTQ files
seqkit¶
- grep - Select sequences from a large file based on name/ID
seqtk¶
- seq - Common transformation operations on FASTA or FASTQ files.
taxonkit¶
- reformat - Reformat lineage in canonical ranks