bwameth.py Drop-In Mapping

bwa-mem3 --meth is designed to produce output equivalent to that of the bwameth.py pipeline for the standard paired-end case. This page explains what changes between the two approaches and what stays the same.

Command comparison

bwameth.py pipeline (multi-step)

# Step 1: build a doubled reference with bwameth.py
bwameth.py index ref.fa                # writes ref.fa.bwameth.c2t + bwa-mem2 FMI

# Step 2: align (bwameth.py converts reads, calls bwa-mem2, post-processes)
bwameth.py map --bwa-mem2 -t 16 ref.fa R1.fq.gz R2.fq.gz \
  | samtools sort -o out.bam
samtools index out.bam

bwa-mem3 --meth (single binary)

# Step 1: build the doubled reference with bwa-mem3
bwa-mem3 index --meth ref.fa           # same ref.fa.bwameth.c2t layout as bwameth.py

# Step 2: align (inline c2t conversion + post-processing, no Python)
bwa-mem3 mem --meth -t 16 ref.fa R1.fq.gz R2.fq.gz \
  | samtools sort -o out.bam
samtools index out.bam

The index files produced by bwa-mem3 index --meth and bwameth.py index are identical in layout: the same ref.fa.bwameth.c2t doubled-reference FASTA followed by the bwa-mem2 FM-index files (.bwt.2bit.64, .0123, .pac, .amb, .ann).
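As a quick sanity check after indexing, a small shell function can confirm that every expected file is present. This is only a sketch: the name `check_meth_index` is hypothetical, the suffix list is taken from the paragraph above, and it assumes the FM-index files use `ref.fa.bwameth.c2t` as their filename prefix.

```shell
# Hypothetical helper: report any missing index file for a reference.
# Assumes the FM-index files are named <ref>.bwameth.c2t.<suffix>.
check_meth_index() {
  ref="$1"
  prefix="${ref}.bwameth.c2t"
  missing=0
  # The doubled-reference FASTA itself, then each FM-index suffix.
  for f in "$prefix" "$prefix.bwt.2bit.64" "$prefix.0123" \
           "$prefix.pac" "$prefix.amb" "$prefix.ann"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_meth_index ref.fa` returns a non-zero status and lists the missing paths on stderr if the index is incomplete.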

What is gained

No Python or bwameth.py dependency. The entire pipeline — read conversion, alignment, and BAM post-processing — runs inside a single bwa-mem3 process. This simplifies deployment: one binary, no virtual environment, no version pinning of bwameth.py.

No intermediate files. bwameth.py writes a converted FASTQ (or pipes it) before handing off to the aligner. bwa-mem3 --meth performs the C→T / G→A conversion in memory on each read batch before passing it to the alignment kernel. No temporary FASTQ is written and no extra pipe stage is needed.
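To make the conversion concrete, the sketch below applies the same base substitution to a FASTQ stream with awk. It is an illustration only, not how bwa-mem3 implements it (which works in memory on read batches). The function name `c2t_fastq` is hypothetical, and the sketch assumes uppercase bases, 4-line FASTQ records, and the usual convention that read 1 is C→T converted and read 2 is G→A converted.

```shell
# Hypothetical helper: convert only the sequence line of each 4-line FASTQ record.
# Mode "CT" turns C into T (assumed for read 1); any other mode turns G into A
# (assumed for read 2). Header, separator, and quality lines pass through as-is.
c2t_fastq() {
  mode="$1"
  awk -v mode="$mode" '
    NR % 4 == 2 {
      if (mode == "CT") gsub(/C/, "T"); else gsub(/G/, "A")
    }
    { print }
  '
}
```

For example, `gzip -dc R1.fq.gz | c2t_fastq CT` prints the converted read 1 records. The original bases are what the YS:Z: tag later preserves.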

Inline BAM post-processing. Header rewriting, YD:Z: tagging, chimera QC, and QC-fail propagation all happen inside the same process and the same pass over the alignments. There is no separate post-processing step. Output is written as uncompressed BAM (wb0) — a near-zero-cost format that downstream samtools sort reads natively.

Same flag defaults. --meth applies -B 2 -L 10 -U 100 -T 40 -CM automatically, matching bwameth.py’s default scoring. All parameters can be overridden.

What stays the same

The output BAM is field-compatible with bwameth.py output for the standard methylation tag set, flags, and SEQ representation (the @PG provenance line intentionally differs — see below):

| Field | bwameth.py | bwa-mem3 --meth |
|---|---|---|
| @SQ headers | One per real chromosome | One per real chromosome |
| YS:Z: | Pre-c2t original sequence | Same |
| YC:Z: | Conversion direction (CT or GA) | Same |
| YD:Z: | Strand (f or r) | Same |
| @PG | ID:bwameth | ID:bwa-mem3-meth |
| Chimera QC threshold | Longest M < 44% of read | Same (44%) |
| Chimera QC flags | 0x200, clear 0x2, MAPQ ≤ 1 | Same |
| SEQ field | Pre-c2t bases (RC-flipped when is_rev) | Same |

The @PG ID: is intentionally different so provenance is unambiguous. All downstream tools that rely on YS:Z:, YC:Z:, YD:Z:, and the QC flags behave identically.
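The chimera QC rule listed above can be sketched as an awk filter over SAM text. This is a minimal illustration of the rule as described on this page, not the code either tool runs. The function name `chimera_qc` is hypothetical, and the sketch assumes that "read" length means the length of the SEQ field and that records with CIGAR `*` are left untouched.

```shell
# Hypothetical helper: apply the 44% chimera rule to SAM records on stdin.
chimera_qc() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                      # pass header lines through
    {
      cigar = $6; longest = 0
      # Walk the CIGAR one operation at a time, tracking the longest M run.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op == "M" && len > longest) longest = len
        cigar = substr(cigar, RLENGTH + 1)
      }
      if ($6 != "*" && longest < 0.44 * length($10)) {
        if (int($2 / 512) % 2 == 0) $2 += 512  # set 0x200 (QC fail)
        if (int($2 / 2) % 2 == 1) $2 -= 2      # clear 0x2 (proper pair)
        if ($5 > 1) $5 = 1                     # cap MAPQ at 1
      }
      print
    }'
}
```

A 10-base read aligned as `4M6S` has a longest M run of 4, which is below 4.4, so a flag of 99 becomes 609 and the MAPQ is capped at 1. The same read aligned as `10M` passes through unchanged.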

Info — End-to-end regression coverage

PR #13 includes a three-layer regression test that verifies 100% chrom+pos match, 100% CIGAR match, and byte-identical SEQ across 92,684 paired-end records compared to a bwameth.py reference run.
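A comparison of this kind can be approximated locally with standard Unix tools. The sketch below is not the test harness from the PR. The function names `sam_key_fields` and `compare_sam` are hypothetical, the inputs are assumed to be SAM text files, and only primary alignments are compared, keyed by read name and mate number.

```shell
# Hypothetical helper: reduce a SAM text file to "name/mate, chrom:pos, CIGAR, SEQ",
# keeping primary alignments only (flags 0x100 and 0x800 unset).
sam_key_fields() {
  awk -F'\t' '!/^@/ && int($2 / 256) % 2 == 0 && int($2 / 2048) % 2 == 0 {
    mate = (int($2 / 64) % 2 == 1) ? 1 : 2   # 0x40 marks the first read of a pair
    print $1 "/" mate "\t" $3 ":" $4 "\t" $6 "\t" $10
  }' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Hypothetical helper: join two SAM files on name/mate and count matching fields.
compare_sam() {
  a=$(mktemp) && b=$(mktemp) || return 1
  sam_key_fields "$1" > "$a"
  sam_key_fields "$2" > "$b"
  LC_ALL=C join -t "$(printf '\t')" "$a" "$b" | awk -F'\t' '
    { n++; pos += ($2 == $5); cig += ($3 == $6); seq += ($4 == $7) }
    END { printf "records=%d pos=%d cigar=%d seq=%d\n", n, pos, cig, seq }'
  rm -f "$a" "$b"
}
```

With BAM inputs, each file can first be converted with `samtools view -h in.bam > in.sam`. Full agreement shows up as all four counts being equal.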

When to prefer bwameth.py

If your workflow requires bwameth.py-specific features (e.g. bwameth.py markduplicates or non-standard bwameth.py post-processors), continue using bwameth.py. bwa-mem3 --meth targets the indexing + alignment + standard post-processing path only.


See also: Overview · Conversion details · SAM tags: YS, YC, YD · Chimera QC and header rewriting · Related Projects: bwameth.py