Modules

This page lists all available nf-core modules in the Microbiome Informatics (EBI-Metagenomics) repository.

amrintegrator

Module that integrates the outputs of ARG annotation tools into a single GFF file

antismash

antismash - antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.
json2gff - Transforms the antiSMASH JSON output file into GFF format

bbmap

standardise - De-interleave paired-end reads and standardise the FASTQ format using BBMap's reformat.sh tool.
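De-interleaving an interleaved FASTQ file amounts to sending alternating records to the R1 and R2 outputs. The Python sketch below illustrates only that idea; it assumes strict four-line records and mates named with optional /1 and /2 suffixes, it is not how reformat.sh is implemented, and the function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(text: str) -> List[Record]:
    """Parse strict 4-line FASTQ records (no line-wrapped sequences)."""
    lines = text.rstrip("\n").split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        records.append((header, seq, qual))
    return records


def mate_id(header: str) -> str:
    """Read name without the leading '@' and an optional /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def deinterleave(text: str) -> Tuple[List[Record], List[Record]]:
    """Split alternating R1/R2 records into two lists, checking mate names."""
    records = parse_fastq(text)
    if len(records) % 2 != 0:
        raise ValueError("odd number of records: input is not interleaved pairs")
    r1, r2 = records[0::2], records[1::2]
    for a, b in zip(r1, r2):
        if mate_id(a[0]) != mate_id(b[0]):
            raise ValueError(f"mates out of order: {a[0]} / {b[0]}")
    return r1, r2


interleaved = "@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\nIIII\n"
r1, r2 = deinterleave(interleaved)
print(r1)  # [('@r1/1', 'ACGT', 'IIII')]
print(r2)  # [('@r1/2', 'TTGA', 'IIII')]
```

Checking mate names as well as record parity catches inputs that are not truly interleaved rather than silently splitting them.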

binette

Binette binning refinement tool

blast

blastn - Queries a BLAST DNA database
blastp - BLASTP (protein Basic Local Alignment Search Tool) compares an amino acid (protein) query sequence against a protein database
makeblastdb - Builds a BLAST database

bmtagger

bmtagger - Run BMTagger (Best Match Tagger) to flag reads that match a reference, such as a host genome
indexreference - Create an indexed reference database for BMTagger

bwamem2

index - Create a BWA-MEM2 index for a reference genome

bwamem2decontnobams

Decontamination module using BWA-MEM2 and SAMtools that generates FASTQ files on the fly, without writing intermediate BAM files

catpack

contigs - Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
prepare - Creates a CAT_pack database from input FASTA files

checkm2

checkm2 - Rapid assessment of genome bin quality using machine learning
db - Download the CheckM2 database

cmsearchtbloutdeoverlap

Perl script to remove lower-scoring overlaps from cmsearch --tblout files.
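De-overlapping can be sketched as a greedy pass: visit hits from highest to lowest score and drop any hit that overlaps one already kept. This Python sketch is illustrative only; it assumes hits conflict only when they share the same target sequence and strand, and the `Hit` class and function names are hypothetical rather than taken from the Perl script.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    target: str   # target sequence name
    start: int    # "seq from" column (1-based, inclusive)
    end: int      # "seq to" column; start > end on the minus strand
    strand: str   # "+" or "-"
    score: float  # bit score

    @property
    def lo(self) -> int:
        return min(self.start, self.end)

    @property
    def hi(self) -> int:
        return max(self.start, self.end)


def overlaps(a: Hit, b: Hit) -> bool:
    """Same target and strand, and the closed intervals intersect."""
    return (a.target == b.target and a.strand == b.strand
            and a.lo <= b.hi and b.lo <= a.hi)


def deoverlap(hits: List[Hit]) -> List[Hit]:
    """Keep hits by descending score, dropping any that overlap a kept hit."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


hits = [
    Hit("contig1", 100, 500, "+", 80.0),
    Hit("contig1", 450, 900, "+", 120.0),
    Hit("contig1", 1000, 1200, "+", 30.0),
]
kept = deoverlap(hits)
print([(h.start, h.end) for h in kept])  # [(450, 900), (1000, 1200)]
```

The pairwise check is quadratic, which keeps the sketch short; a per-target sorted structure would scale better on large tblout files.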

colabfold

colabfoldbatch - Perform protein folding predictions with ColabFold (the current version queries the ColabFold server and is subject to its limitations)

combinedgenecaller

merge - MGnify merging script for the combined gene caller. The merged output contains all gene predictions from Pyrodigal, plus the genes predicted by FragGeneScanRS that do not overlap any Pyrodigal gene. If a mask file is provided, genes that overlap regions in the mask file are masked (removed).
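The merge rule above can be sketched with plain interval checks. This is a hypothetical Python illustration, not the MGnify script: it represents genes as (contig, start, end) tuples, ignores strand when testing overlap, and assumes masking is applied to the merged set.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive


def overlaps_any(gene: Interval, regions: List[Interval]) -> bool:
    """True if the gene shares at least one base with any region on its contig."""
    contig, start, end = gene
    return any(c == contig and start <= r_end and r_start <= end
               for c, r_start, r_end in regions)


def merge_calls(pyrodigal: List[Interval], fgs: List[Interval],
                mask: Optional[List[Interval]] = None) -> List[Interval]:
    # 1. Every Pyrodigal prediction is kept.
    merged = list(pyrodigal)
    # 2. FragGeneScanRS predictions are added only if they overlap no Pyrodigal gene.
    merged += [g for g in fgs if not overlaps_any(g, pyrodigal)]
    # 3. Optional masking removes any gene that touches a masked region.
    if mask:
        merged = [g for g in merged if not overlaps_any(g, mask)]
    return sorted(merged)


pyro = [("c1", 1, 300), ("c1", 400, 900)]
fgs = [("c1", 250, 350), ("c1", 1000, 1300), ("c2", 5, 200)]
print(merge_calls(pyro, fgs, mask=[("c2", 100, 150)]))
# [('c1', 1, 300), ('c1', 400, 900), ('c1', 1000, 1300)]
```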

crisprcasfinder

Identify CRISPR arrays and associated Cas genes with CRISPRCasFinder

cutadapt

Trim adapters and primers from sequencing reads

dada2

Infer amplicon sequence variants (ASVs) from amplicon reads using the DADA2 R package

dbcan

dbcan - CAZyme annotation of proteins
dbcandb - Download and decompress the dbCAN reference database

deeptmhmm

A Deep Learning Model for Transmembrane Topology Prediction and Classification

diamond

blastp - Queries a DIAMOND database using blastp mode
makedb - Builds a DIAMOND database

downloadfromfire

Downloads files from EBI FIRE S3 storage using FTP paths

dram

distill - Produces summary files with DRAM distill

easel

eslsfetch - Extract FASTA sequences by name, based on a cmsearch deoverlap result

eggnogmapper

Fast genome-wide functional annotation through orthology assignment.

extractcoords

Process the output of esl-sfetch to extract SSU and LSU sequences

fastqsuffixheadercheck

Sanity check for FASTQ suffixes and headers

fastqutils

fastq_utils is a set of Linux utilities to validate and manipulate fastq files.

fetchtool

assembly - Fetch assemblies from ENA with the Microbiome Informatics fetch tool

filterpaf

Module that uses awk to filter alignments in a PAF file based on query coverage and percentage identity (PID).
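A Python equivalent of this filter, for illustration only. It assumes query coverage is (query end - query start) / query length and PID is residue matches / alignment block length, using the standard PAF columns 2 to 4, 10 and 11; the module's exact formulas and default thresholds may differ, so the 0.9 defaults here are hypothetical.

```python
from typing import Iterable, List


def passes(paf_line: str, min_qcov: float, min_pid: float) -> bool:
    """Apply coverage and identity thresholds to one PAF record."""
    f = paf_line.rstrip("\n").split("\t")
    qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
    n_match, aln_len = int(f[9]), int(f[10])
    qcov = (qend - qstart) / qlen  # fraction of the query covered
    pid = n_match / aln_len        # matching bases / alignment block length
    return qcov >= min_qcov and pid >= min_pid


def filter_paf(lines: Iterable[str], min_qcov: float = 0.9,
               min_pid: float = 0.9) -> List[str]:
    """Keep non-empty PAF lines that pass both thresholds."""
    return [ln for ln in lines if ln.strip() and passes(ln, min_qcov, min_pid)]


paf = [
    "read1\t1000\t0\t950\t+\tref\t5000\t100\t1050\t900\t950\t60",
    "read2\t1000\t0\t500\t+\tref\t5000\t100\t600\t480\t500\t60",
]
print(filter_paf(paf))  # keeps only read1 (qcov 0.95, PID ~0.947)
```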

fraggenescan

FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.

fraggenescanrs

FragGeneScanRs: faster gene prediction for short reads

generategaf

Script that generates a GO Annotation File (GAF) from an InterProScan TSV result file.

genomeproperties

Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome.

hhsuite

buildhhdb - Create an HH-suite database for HHblits searches

hifiadapterfilt

Convert .bam to .fastq and remove reads with remnant PacBio adapter sequences

infernal

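cmscan - Search query sequences against a database of covariance models (RNA secondary structure/sequence profiles) for homology search and alignment
cmsearch - Search covariance models (RNA secondary structure/sequence profiles) against a sequence database for homology search and alignment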

interproscan

Produces protein annotations and predictions from a FASTA file

kegg-pathways-completeness

This tool computes the completeness of each KEGG pathway module for a given set of KEGG Orthologues (KOs), based on their presence or absence.
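A heavily simplified sketch of presence/absence completeness, assuming a module has already been split into steps: a step is satisfied by any one of its alternatives, and an alternative (for example a complex) needs all of its KOs. The real tool parses KEGG module definition strings and weights their components, so its percentages can differ; the KO IDs below are placeholders, not a real module.

```python
from typing import FrozenSet, List, Set

# A module is a list of steps; a step is a list of alternatives; an
# alternative is a set of KOs that must all be present (e.g. a complex).
Step = List[FrozenSet[str]]


def step_present(step: Step, kos: Set[str]) -> bool:
    """A step counts as present if any alternative is fully contained in kos."""
    return any(alt <= kos for alt in step)


def completeness(module: List[Step], kos: Set[str]) -> float:
    """Percentage of steps covered by at least one fully present alternative."""
    done = sum(step_present(step, kos) for step in module)
    return 100.0 * done / len(module)


# Placeholder KO IDs, not a real KEGG module definition.
module = [
    [frozenset({"K00001"}), frozenset({"K00002"})],  # K00001 or K00002
    [frozenset({"K00003", "K00004"})],               # K00003 and K00004
    [frozenset({"K00005"})],
]
print(round(completeness(module, {"K00002", "K00003", "K00005"}), 1))  # 66.7
```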

krona

ktimporttext - Creates a Krona chart from text files listing quantities and lineages.
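ktImportText reads lines holding a magnitude followed by tab-separated lineage levels, from the highest level down. A hypothetical Python helper that builds such a file from semicolon-separated lineage strings might look like this:

```python
from collections import Counter
from typing import Iterable


def to_krona_text(lineages: Iterable[str], sep: str = ";") -> str:
    """Count identical lineages and emit one 'count<TAB>level1<TAB>level2...' row each."""
    counts = Counter(tuple(part for part in lin.split(sep) if part) for lin in lineages)
    rows = ["\t".join([str(n), *ranks]) for ranks, n in sorted(counts.items())]
    return "\n".join(rows) + "\n"


text = to_krona_text([
    "Bacteria;Bacillota",
    "Bacteria;Bacillota",
    "Bacteria;Pseudomonadota",
])
print(repr(text))  # '2\tBacteria\tBacillota\n1\tBacteria\tPseudomonadota\n'
```

The resulting file can then be passed to ktImportText to render the chart.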

mgnifypipelinestoolkit

kronatxtfromcatclassification - Create a Krona text input file from a CAT classification output
rheachebiannotation - Use a DIAMOND output file to create a table with Rhea and ChEBI reaction annotations for every protein
summarisegoslims - Script that generates counts from an InterProScan output and a GO Annotation File (GAF) for both GO terms and GO-Slim terms.

minimap2

align - A versatile pairwise aligner for genomic and spliced nucleotide sequences

owltools

OWLTools is a convenience Java API on top of the OWL API, used here to map GO terms to GO slims.

pathofact2

downloaddata - Download the PathoFact2 models database from Zenodo
extractfasta - Extract a FASTA file of the proteins predicted by PathoFact2, for annotation with RPS-BLAST against CDD
integrator - Integrate PathoFact2 results with CDD annotations into a single GFF file
toxins - PathoFact2: machine learning tool to predict toxins in protein sequences
virulence - PathoFact2: machine learning tool to predict virulence factors in protein sequences

pimento

generatebcv - Generate Base-Conservation Vectors (BCV) in a stepwise and windowed manner for a fastq file.

prodigal

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

rgi

downloaddb - Download and decompress the CARD database for the RGI tool

sanntis

Runs SanntiS to identify biosynthetic gene clusters.

seqfu

check - Evaluates the integrity of DNA FASTQ files
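To illustrate the kind of integrity checks involved, here is a minimal Python sketch that validates four-line FASTQ records. It is not SeqFu's implementation; it assumes unwrapped records, and the allowed alphabet (ACGTN) is an assumption.

```python
import re
from typing import List

DNA = re.compile(r"[ACGTNacgtn]*")


def check_fastq(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    lines = text.rstrip("\n").split("\n") if text.strip() else []
    errors: List[str] = []
    if len(lines) % 4 != 0:
        errors.append(f"truncated: {len(lines)} lines is not a multiple of 4")
    complete = len(lines) - len(lines) % 4  # only inspect whole records
    for i in range(0, complete, 4):
        header, seq, plus, qual = lines[i:i + 4]
        n = i // 4 + 1
        if not header.startswith("@"):
            errors.append(f"record {n}: header does not start with '@'")
        if not plus.startswith("+"):
            errors.append(f"record {n}: separator line does not start with '+'")
        if not DNA.fullmatch(seq):
            errors.append(f"record {n}: non-DNA characters in sequence")
        if len(seq) != len(qual):
            errors.append(f"record {n}: sequence/quality length mismatch "
                          f"({len(seq)} vs {len(qual)})")
    return errors


print(check_fastq("@r1\nACGT\n+\nIIII\n"))  # []
print(check_fastq("@r1\nACGU\n+\nIII\n"))   # two problems: alphabet and length
```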

seqkit

grep - Select sequences from a large file based on name/ID
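The selection can be pictured as a FASTA filter keyed on the sequence ID, taken here to be the first word after '>'. This hypothetical Python sketch covers only exact ID matching, not the pattern-based modes that seqkit grep also offers.

```python
from typing import Set


def grep_fasta(text: str, wanted: Set[str]) -> str:
    """Keep FASTA records whose ID (first word after '>') is in `wanted`."""
    out = []
    keep = False
    for line in text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            keep = bool(words) and words[0] in wanted
        if keep:
            out.append(line)  # header and all following sequence lines
    return "\n".join(out) + ("\n" if out else "")


fasta = ">a sample A\nACGT\n>b\nGGCC\n>c\nTT\nAA\n"
print(grep_fasta(fasta, {"a", "c"}), end="")
```

Using a set for `wanted` keeps each lookup constant-time, which matters when selecting many IDs from a large file.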

taxonkit

reformat - Reformat lineage in canonical ranks
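Conceptually, reformatting keeps only a fixed set of canonical ranks, in order, and leaves empty fields where a rank is missing. The Python sketch below assumes seven ranks from superkingdom to species and a semicolon separator; taxonkit's actual output format, rank names and handling of missing ranks depend on its options and version.

```python
from typing import Iterable, Tuple

# Assumed canonical ranks, in output order.
CANONICAL = ("superkingdom", "phylum", "class", "order", "family", "genus", "species")


def reformat_lineage(named_ranks: Iterable[Tuple[str, str]], sep: str = ";") -> str:
    """Keep only canonical ranks, in fixed order; missing ranks become empty fields."""
    by_rank = {rank: name for rank, name in named_ranks if rank in CANONICAL}
    return sep.join(by_rank.get(rank, "") for rank in CANONICAL)


ecoli = [
    ("no rank", "cellular organisms"),
    ("superkingdom", "Bacteria"),
    ("phylum", "Pseudomonadota"),
    ("class", "Gammaproteobacteria"),
    ("order", "Enterobacterales"),
    ("family", "Enterobacteriaceae"),
    ("genus", "Escherichia"),
    ("species", "Escherichia coli"),
]
print(reformat_lineage(ecoli))
# Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
```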