
Modules

This page lists all available nf-core modules in the Microbiome Informatics (EBI-Metagenomics) repository.

antismash

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.

binette

Binette binning refinement tool

blast

  • blastn - Queries a BLAST DNA database
  • blastp - BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database
  • makeblastdb - Builds a BLAST database

bmtagger

bwamem2

  • index - Create a BWA-MEM2 index for a reference genome
  • mem - Map reads to a reference genome

bwamem2decontnobams

Decontamination module using bwamem2 and samtools that generates fastq files on the fly

catpack

  • contigs - Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
  • prepare - Creates a CAT_pack database based on input FASTAs

checkm2

  • checkm2 - Rapid assessment of genome bin quality using machine learning
  • db - Download the CheckM2 database

cmsearchtbloutdeoverlap

Perl script to remove lower scoring overlaps from cmsearch --tblout files.
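
A minimal Python sketch of the de-overlapping idea, assuming the standard cmsearch --tblout column layout (target name in column 1, query name in column 3, sequence coordinates in columns 8 and 9, strand in column 10, bit score in column 15). Hits are visited from highest to lowest score, and a hit is dropped if it overlaps an already kept hit on the same target and strand. The function names are hypothetical, and the actual Perl script may apply different rules (for example, around strand handling or ties).

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str
    query: str
    start: int   # smaller of "seq from" / "seq to"
    end: int     # larger of "seq from" / "seq to"
    strand: str
    score: float
    line: str    # original tblout line, kept for output


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse cmsearch --tblout lines, skipping comments and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        seq_from, seq_to = int(f[7]), int(f[8])
        hits.append(Hit(f[0], f[2], min(seq_from, seq_to), max(seq_from, seq_to),
                        f[9], float(f[14]), line.rstrip("\n")))
    return hits


def deoverlap(hits: List[Hit]) -> List[Hit]:
    """Keep hits greedily by descending score, dropping any hit that overlaps a kept one."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        clashes = any(
            k.target == hit.target and k.strand == hit.strand
            and hit.start <= k.end and k.start <= hit.end
            for k in kept
        )
        if not clashes:
            kept.append(hit)
    return kept
```

The greedy, score-ordered pass guarantees that whenever two hits overlap, the higher-scoring one survives.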

colabfold

  • colabfoldbatch - Perform protein folding predictions with ColabFold (the current version queries the ColabFold server and is subject to its limitations)

combinedgenecaller

  • merge - MGnify merging script for the combined gene caller. The merged output contains all gene predictions from Pyrodigal, along with genes predicted by FragGeneScanRS that do not overlap any Pyrodigal gene. If a mask file is provided, genes that overlap regions in the mask file are removed.
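
The merge rule described above can be sketched in Python as follows. The `Gene` record, the 1-based inclusive coordinates and the function names are assumptions for illustration, not the MGnify script's actual data model; here any shared base counts as an overlap, regardless of strand.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Gene:
    seqid: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    caller: str  # e.g. "pyrodigal" or "fgsrs"


def _overlaps(seq_a: str, start_a: int, end_a: int,
              seq_b: str, start_b: int, end_b: int) -> bool:
    """True if two intervals on the same sequence share at least one base."""
    return seq_a == seq_b and start_a <= end_b and start_b <= end_a


def merge_predictions(pyrodigal: List[Gene], fgsrs: List[Gene],
                      mask: Iterable[Tuple[str, int, int]] = ()) -> List[Gene]:
    # 1. Keep every Pyrodigal gene.
    merged = list(pyrodigal)
    # 2. Add FragGeneScanRS genes that do not overlap any Pyrodigal gene.
    for g in fgsrs:
        if not any(_overlaps(g.seqid, g.start, g.end, p.seqid, p.start, p.end)
                   for p in pyrodigal):
            merged.append(g)
    # 3. Optionally drop genes that overlap a masked region (seqid, start, end).
    regions = list(mask)
    if regions:
        merged = [g for g in merged
                  if not any(_overlaps(g.seqid, g.start, g.end, s, a, b)
                             for s, a, b in regions)]
    return sorted(merged, key=lambda g: (g.seqid, g.start))
```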

crisprcasfinder

cutadapt

Trim adapters and primers from sequencing reads

dada2

Infer Amplicon Sequence Variants (ASVs) from amplicon reads using the DADA2 package

dbcan

  • dbcan - CAZyme annotation of proteins
  • dbcandb - Download and decompress the dbCAN reference database

deeptmhmm

A Deep Learning Model for Transmembrane Topology Prediction and Classification

diamond

  • blastp - Queries a DIAMOND database using blastp mode
  • makedb - Builds a DIAMOND database

downloadfromfire

Downloads files from EBI FIRE S3 storage using FTP paths

dram

  • distill - Produces summary files with DRAM distill

easel

  • eslsfetch - Extract fasta sequences by name from a cmsearchdeoverlap result

eggnogmapper

Fast genome-wide functional annotation through orthology assignment.

extractcoords

Process output from easel-sfetch to extract SSU and LSU sequences

fastp

Perform adapter/quality trimming on sequencing reads

fastqsuffixheadercheck

Sanity check for FASTQ suffixes and headers

fastqutils

fastq_utils is a set of Linux utilities to validate and manipulate fastq files.

fetchtool

  • assembly - Microbiome Informatics ENA fetch tool

filterpaf

Module that uses awk to filter alignments in a PAF file based on query coverage and percentage identity (PID).
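
The module itself uses awk; for consistency with the other examples on this page, the same filter is sketched here in Python. It assumes the 12 mandatory tab-separated PAF columns and computes query coverage as (query end - query start) / query length and PID as residue matches / alignment block length. The default thresholds below are hypothetical, not the module's actual parameters.

```python
from typing import Iterable, Iterator


def filter_paf(lines: Iterable[str], min_qcov: float = 0.9,
               min_pid: float = 0.9) -> Iterator[str]:
    """Yield PAF lines whose query coverage and identity meet both thresholds."""
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
        matches, block_len = int(f[9]), int(f[10])
        qcov = (qend - qstart) / qlen   # fraction of the query covered by the alignment
        pid = matches / block_len       # residue matches / alignment block length
        if qcov >= min_qcov and pid >= min_pid:
            yield line
```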

fraggenescan

FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.

fraggenescanrs

FragGeneScanRs: faster gene prediction for short reads

generategaf

Script that generates a GO Annotation File (GAF) out of an InterProScan result tsv file.

genomeproperties

Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome.
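
As a rough illustration of the idea, the sketch below marks a property YES when every required step has at least one matching signature, PARTIAL when only some do, and NO otherwise. The `Step` type, the placeholder signature IDs and this three-state rule are simplifying assumptions; the real Genome Properties scoring has further rules that are not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Step:
    name: str
    evidence: FrozenSet[str]  # signature accessions that can fulfil this step
    required: bool = True


def assign_property(steps: List[Step], found: Set[str]) -> str:
    """Return YES / PARTIAL / NO for one property, given the signatures found in a genome."""
    required = [s for s in steps if s.required]
    fulfilled = [s for s in required if s.evidence & found]
    if required and len(fulfilled) == len(required):
        return "YES"
    if fulfilled:
        return "PARTIAL"
    return "NO"
```

Optional steps never change the result in this simplified rule; they are kept in the model only to show that properties can list them.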

hhsuite

  • buildhhdb - Create an HH-suite database for hhblits searches

hifiadapterfilt

Convert .bam to .fastq and remove reads with remnant PacBio adapter sequences

infernal

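  • cmscan - Search query sequences against a database of RNA secondary structure/sequence profiles (covariance models) for homology search and alignment
  • cmsearch - Search RNA secondary structure/sequence profiles (covariance models) against a sequence database for homology search and alignment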

interproscan

Produces protein annotations and predictions from a FASTA file

kegg-pathways-completeness

This tool computes the completeness of each KEGG pathway module for a given set of KEGG orthologues (KOs), based on their presence or absence.
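
A simplified sketch of presence/absence-based completeness, assuming a module definition has already been parsed into steps, where each step lists alternative KO sets and an alternative counts only if all of its KOs are present. Completeness is then the percentage of satisfied steps. The real tool parses the full KEGG module definition syntax and weights steps in its own way, so its numbers can differ; the module used in the test is made up.

```python
from typing import List, Set


def module_completeness(steps: List[List[Set[str]]], kos_present: Set[str]) -> float:
    """Percentage of steps with at least one alternative whose KOs are all present."""
    if not steps:
        return 0.0
    satisfied = sum(1 for alternatives in steps
                    if any(alt <= kos_present for alt in alternatives))
    return 100.0 * satisfied / len(steps)
```

For example, a hypothetical three-step module `[[{"K00001"}], [{"K00002"}, {"K00003"}], [{"K00004", "K00005"}]]` is two-thirds complete when only K00001, K00003 and K00004 are present, because the last step needs both members of its complex.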

krona

  • ktimporttext - Creates a Krona chart from text files listing quantities and lineages.
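
ktImportText reads tab-separated text in which the first column is the quantity and the remaining columns are the lineage, from the top level down. A small Python sketch that builds such rows from per-read lineages (the function name is hypothetical):

```python
from collections import Counter
from typing import Iterable, Sequence


def to_krona_text(lineages: Iterable[Sequence[str]]) -> str:
    """Count identical lineages and emit ktImportText rows: count<TAB>rank1<TAB>rank2..."""
    counts = Counter(tuple(lineage) for lineage in lineages)
    rows = []
    # Largest groups first; ties broken alphabetically by lineage for stable output.
    for lineage, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        rows.append("\t".join([str(n), *lineage]))
    return "\n".join(rows) + "\n"
```

Once written to a file, the result can be rendered with, for example, `ktImportText counts.txt -o krona.html`.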

librarystrategycheck

Uses base-conservation vectors to assess whether a run is AMPLICON or not

mapseq

Perform taxonomic classification of rRNA reads using reference databases

mapseq2biom

Process MAPseq output into biom and krona-txt formats

mgnifypipelinestoolkit

  • kronatxtfromcatclassification - Use CAT classification output to create a Krona text file
  • rheachebiannotation - Use diamond output file to create a table with Rhea and CHEBI reaction annotation for every protein
  • summarisegoslims - Script that generates counts from an InterProScan output and a GO Annotation File (GAF) for both GO terms and GO-Slim terms.
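
The counting step can be pictured as follows. This sketch assumes that the InterProScan GO terms have already been grouped per protein and that a GO-to-GO-slim mapping (for instance, one derived with owltools) is available as a dictionary; each protein contributes at most one count per term. The input shapes, the counting rule and the example GO IDs in the test are illustrative assumptions, not the script's actual behaviour.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarise_go(protein_go: Dict[str, Iterable[str]],
                 go_to_slim: Dict[str, Iterable[str]]) -> Tuple[Counter, Counter]:
    """Count proteins per GO term and per GO-slim term."""
    go_counts: Counter = Counter()
    slim_counts: Counter = Counter()
    for terms in protein_go.values():
        unique = set(terms)          # one count per protein per term
        go_counts.update(unique)
        slims = set()
        for term in unique:
            slims.update(go_to_slim.get(term, ()))
        slim_counts.update(slims)    # one count per protein per slim term
    return go_counts, slim_counts
```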

minimap2

  • align - A versatile pairwise aligner for genomic and spliced nucleotide sequences

owltools

OWLTools is a convenience Java API on top of the OWL API, used here to map GO terms to GO slims

pimento

  • generatebcv - Generate Base-Conservation Vectors (BCV) in a stepwise and windowed manner for a fastq file.
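
One plausible reading of a base-conservation vector is, for each read position, the fraction of reads carrying the most common base at that position. Amplicon reads that start with the same primer should then show values near 1.0 over the primer positions, which is presumably what makes such vectors useful for telling amplicon runs apart. The sketch below computes that per position and then averages it over sliding windows; the definition, window and step sizes are assumptions and may not match PIMENTO's implementation.

```python
from collections import Counter
from typing import List, Sequence


def base_conservation(reads: Sequence[str], length: int) -> List[float]:
    """Fraction of reads carrying the most common base at each of the first `length` positions."""
    bcv = []
    for i in range(length):
        bases = Counter(r[i] for r in reads if len(r) > i)
        total = sum(bases.values())
        bcv.append(max(bases.values()) / total if total else 0.0)
    return bcv


def windowed(values: List[float], window: int, step: int) -> List[float]:
    """Mean of `values` over windows of size `window`, advancing by `step`."""
    return [sum(values[s:s + window]) / window
            for s in range(0, len(values) - window + 1, step)]
```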

prodigal

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

proovframe

  • fix - frame-shift correction for long read (meta)genomics - fix frameshifts in reads
  • map - frame-shift correction for long read (meta)genomics - maps proteins to reads

pyrodigal

Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction tool for prokaryotic genomes.

samtools

  • bam2fq - Uses the samtools bam2fq command to convert a SAM, BAM or CRAM file to FASTQ format

sanntis

Runs SanntiS to identify biosynthetic gene clusters.

seqfu

  • check - Evaluates the integrity of DNA FASTQ files
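
A minimal sketch of the kind of checks involved, assuming unwrapped four-line FASTQ records. It flags truncated files, malformed header and separator lines, non-ACGTN sequence characters and sequence/quality length mismatches. SeqFu's actual checks (including its handling of paired files) may differ, and the function name is hypothetical.

```python
from typing import Iterable, List


def check_fastq(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in a FASTQ stream made of 4-line records."""
    rows = [line.rstrip("\n") for line in lines]
    problems = []
    if len(rows) % 4:
        problems.append("truncated record: line count is not a multiple of 4")
    # Only inspect complete records.
    for i in range(0, len(rows) - len(rows) % 4, 4):
        header, seq, plus, qual = rows[i:i + 4]
        n = i // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {n}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n}: separator line does not start with '+'")
        if set(seq.upper()) - set("ACGTN"):
            problems.append(f"record {n}: unexpected characters in sequence")
        if len(seq) != len(qual):
            problems.append(f"record {n}: sequence and quality lengths differ")
    return problems
```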

seqkit

  • grep - Select sequences from a large file based on name/ID
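
The core of ID-based selection can be sketched in Python, assuming that a record's ID is the first whitespace-delimited word of its FASTA header (which is what seqkit grep matches by default). The function name is hypothetical. With seqkit itself, a list of IDs is typically supplied as a pattern file, e.g. `seqkit grep -f ids.txt input.fasta`.

```python
from typing import Iterable, List, Set


def grep_fasta(lines: Iterable[str], wanted_ids: Set[str]) -> List[str]:
    """Return the header and sequence lines of FASTA records whose ID is in `wanted_ids`."""
    keep = False
    out = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            record_id = fields[0] if fields else ""
            keep = record_id in wanted_ids
        if keep:
            out.append(line)  # multi-line sequences are kept with their header
    return out
```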

seqtk

  • seq - Common transformation operations on FASTA or FASTQ files.

taxonkit

  • reformat - Reformat lineage in canonical ranks