🧬 N. gonorrhoeae Comprehensive Analysis Pipeline

Comprehensive, Flexible Bioinformatics Pipeline for Genomic Surveillance and AMR Detection

Pipeline Statistics

16 Workflows
55+ Modules/Processes
11 Analysis Types
4 QC Stages

Pipeline Overview

This Nextflow pipeline provides comprehensive genomic analysis for Neisseria gonorrhoeae outbreak investigation and surveillance. It processes raw sequencing reads through quality control, assembly, variant calling, phylogenetic analysis, and antimicrobial resistance (AMR) profiling to generate actionable clinical insights.

Key Capabilities

Main Workflows

Downsample Reads: Downsample reads to a desired coverage to reduce run time
Reads QC: FastP-based quality filtering and trimming
Assembly: SPAdes assembly with statistics generation
Assembly QC: Coverage and assembly quality checks
MASH Pre-screen (Reads): Tiered contamination detection on raw reads using 4-tier screening: Neisseria genomes, plasmids, respiratory pathogens, and common contaminants
MASH Post-screen (Assembly): High-resolution species typing and plasmid detection on assembled contigs for confirmation and detailed characterization
Variant Calling: Snippy with caching, core alignment, and Gubbins
Phylogeny: RAxML-NG phylogenetic tree construction
Outbreak Detection: SNP distance-based cluster identification
Recombination: Functional annotation of recombinant regions
MLST: Multi-locus sequence typing and clustering
AMR Profiler: Chromosomal and HGT resistance detection
AMR Typing: NG-MAST/NG-STAR strain typing
Clinical: Treatment recommendations and priority classification
Downsampling: Optional read depth normalization
Final QC: Post-assembly comprehensive quality filtering
Reports: Comprehensive manifest generation
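
As an illustration of what SNP distance-based cluster identification can look like, the following is a minimal sketch, not the pipeline's actual implementation. It assumes a core alignment held as equal-length sequences and groups samples by single linkage: two samples share a cluster if a chain of pairwise distances at or below a threshold connects them. The function names and the threshold value are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where both sequences have an unambiguous base and differ."""
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y
    )


def cluster_by_snp_distance(alignment: dict[str, str], threshold: int) -> list[list[str]]:
    """Single-linkage clustering: link any pair within `threshold` SNPs (hypothetical cutoff)."""
    parent = {name: name for name in alignment}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in alignment:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first, members sorted for stable output.
    return sorted((sorted(m) for m in clusters.values()), key=lambda c: (-len(c), c))


# Toy alignment, made up for illustration only.
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGTACAA",
    "s4": "TTTTTTTT",
}
clusters = cluster_by_snp_distance(toy, threshold=1)
```

Single linkage means s1 and s3 end up together through s2 even though they differ by two SNPs. Whether that chaining behaviour is desirable depends on the outbreak definition in use.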

Pipeline Architecture

The pipeline is organized into 16 main workflows (including MASH pre-screen and post-screen) and 2 subworkflows that orchestrate over 55 specialized processes. Each workflow is modular and can be enabled or disabled via command-line parameters, so a run executes only the analyses it needs.

Quality Control Strategy

The pipeline implements a three-stage QC approach:

MASH Screening: Tiered Contamination Detection

The pipeline uses a two-stage MASH screening system for quality control and contamination detection:

Stage 1: MASH Pre-screen (Raw Reads)

Analyzes raw sequencing reads using a 4-tier screening approach: Neisseria genomes, plasmids, respiratory pathogens, and common contaminants.

Detection Thresholds:

Output: QC status (PASS/WARN/FAIL) with detailed contamination reports and species identification

Filtering: Optional automatic exclusion of failed samples from downstream analysis
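
The PASS/WARN/FAIL assignment and the optional exclusion of failed samples can be sketched as follows. This is a hedged illustration only: the pipeline's real detection thresholds are not given here, so the cutoffs `min_target`, `warn_contaminant`, and `fail_contaminant` are placeholder assumptions, as are the function names. The inputs are the best MASH identity against the target species and the best identity against any contaminant tier.

```python
def qc_status(
    target_identity: float,
    contaminant_identity: float,
    *,
    min_target: float = 0.95,        # hypothetical cutoff, not the pipeline's value
    warn_contaminant: float = 0.90,  # hypothetical cutoff, not the pipeline's value
    fail_contaminant: float = 0.95,  # hypothetical cutoff, not the pipeline's value
) -> str:
    """Map screening identities to a QC status under assumed thresholds."""
    if target_identity < min_target or contaminant_identity >= fail_contaminant:
        return "FAIL"
    if contaminant_identity >= warn_contaminant:
        return "WARN"
    return "PASS"


def filter_samples(statuses: dict[str, str], exclude_failed: bool = True) -> list[str]:
    """Return sample IDs to pass downstream, optionally dropping FAIL samples."""
    if not exclude_failed:
        return list(statuses)
    return [sample for sample, status in statuses.items() if status != "FAIL"]


# Made-up identities for three samples: (target, best contaminant).
screen = {"s1": (0.99, 0.50), "s2": (0.99, 0.92), "s3": (0.80, 0.10)}
statuses = {s: qc_status(t, c) for s, (t, c) in screen.items()}
kept = filter_samples(statuses)
```

Keeping WARN samples in the downstream set while flagging them in the report is one possible design; a stricter policy would drop them as well.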

Stage 2: MASH Post-screen (Assembly)

High-resolution characterization of assembled genomes:

Output: Consolidated HTML report with species typing, plasmid content, and quality metrics

Clinical Significance

Caching and Performance

The pipeline caches the results of computationally expensive operations, particularly Snippy variant calling. It supports five separate cache directories:

Cache filtering ensures that downsampled samples are re-analyzed, while existing results are reused for unchanged samples.
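
The cache filtering rule can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline's own logic: it assumes each cache directory holds one subdirectory per sample and that the presence of a `snps.vcf` file marks a completed Snippy run. The function name `split_cached` and the directory layout are hypothetical.

```python
from pathlib import Path


def split_cached(
    samples: list[str],
    downsampled: set[str],
    cache_dirs: list[Path],
    marker: str = "snps.vcf",  # assumed marker file for a finished run
) -> tuple[dict[str, Path], list[str]]:
    """Split samples into cached results to reuse and samples to re-run.

    Downsampled samples are always re-run, because their reads changed.
    """
    reuse: dict[str, Path] = {}
    rerun: list[str] = []
    for sample in samples:
        hit = None
        if sample not in downsampled:
            # Search the cache directories in order; first match wins.
            for cache_dir in cache_dirs:
                candidate = Path(cache_dir) / sample
                if (candidate / marker).is_file():
                    hit = candidate
                    break
        if hit is None:
            rerun.append(sample)
        else:
            reuse[sample] = hit
    return reuse, rerun
```

Called with the full sample list, the set of downsampled sample IDs, and the configured cache directories, it returns a mapping of reusable result paths and the list of samples that still need variant calling.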

Technology Stack

Built with: