Quick start: methylation alignment
bwa-mem3 supports bisulfite-converted (WGBS/RRBS/EM-seq) read alignment through a single --meth
flag on both index and mem. No Python interpreter, no piped preprocessor, and no separate
postprocessing step are required.
Note — Drop-in replacement for bwameth.py
bwa-mem3 with --meth is a single-binary drop-in replacement for the bwameth.py pipeline. The output BAM is byte-compatible for the standard tags used by methylation callers (Bismark, MethylDackel, PileOMeth, etc.).
Index the reference for methylation
Build the c2t doubled reference once:
bwa-mem3 index --meth ref.fa
This writes the following additional files next to the standard index:
| File | Description |
|---|---|
| ref.fa.bwameth.c2t | C→T converted reference (forward strand) with G→A reverse complement interleaved |
| ref.fa.bwameth.c2t.* | FM-index files for the c2t reference |
The c2t index is separate from the standard index produced by bwa-mem3 index ref.fa. You need
both if you intend to run standard and methylation alignments against the same reference.
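Before starting a long alignment job, it can help to confirm that the c2t index is actually in place. The following is a minimal shell sketch that relies only on the file names listed in the table above. The helper name has_meth_index is hypothetical and is not part of bwa-mem3.

```shell
# Hypothetical helper (not part of bwa-mem3): succeed only if the c2t
# reference and at least one ref.fa.bwameth.c2t.* index file exist.
has_meth_index() {
    ref=$1
    # The converted reference itself must be present.
    [ -f "${ref}.bwameth.c2t" ] || return 1
    # At least one FM-index file must sit next to it. If the glob matches
    # nothing, the loop sees the literal pattern and the -e test fails.
    for f in "${ref}.bwameth.c2t."*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example use: build the index only when it is missing.
#   has_meth_index ref.fa || bwa-mem3 index --meth ref.fa
```

The check does not validate the index contents; it only guards against a missing or partially written index.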
Align bisulfite-converted reads
bwa-mem3 mem --meth -t 16 ref.fa R1.fq.gz R2.fq.gz \
| samtools sort -o out.bam
samtools index out.bam
Pass the original (unconverted) reference path, not the .bwameth.c2t file. bwa-mem3
auto-appends .bwameth.c2t to the reference path when --meth is active.
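Because the suffix is appended automatically, a wrapper script that accepts either path form may want to normalize its argument first. The sketch below strips a trailing .bwameth.c2t if one was supplied by mistake. The function name meth_ref_arg is hypothetical and only illustrates the path relationship described above.

```shell
# Hypothetical helper (not part of bwa-mem3): print the reference path to
# pass to `bwa-mem3 mem --meth`. A trailing .bwameth.c2t is removed so the
# suffix is not applied twice; any other path is printed unchanged.
meth_ref_arg() {
    printf '%s\n' "${1%.bwameth.c2t}"
}

meth_ref_arg ref.fa.bwameth.c2t   # prints ref.fa
meth_ref_arg ref.fa               # prints ref.fa
```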
What --meth does
--meth activates a pipeline of in-process transformations that would otherwise require
external tools:
- Inline c2t read conversion. R1 reads have every C converted to T before alignment; R2 reads have every G converted to A. The original unconverted sequence is preserved in the YS:Z: SAM tag. The conversion direction for each read is recorded in YC:Z: (value CT or GA), matching the bwameth.py convention.
- bwameth.py-equivalent scoring defaults. --meth sets -B 2 -L 10 -U 100 -T 40 -CM automatically. These match the defaults used by bwameth.py and are optimized for bisulfite-converted reads where C→T mismatches carry no penalty. Any of these values can be overridden on the command line.
- Inline BAM post-processing. After alignment, bwa-mem3 rewrites the SAM stream in-process:
    - @SQ headers with f/r prefixes (e.g. fchr1, rchr1) are collapsed back to one entry per real chromosome (chr1). Read-level RNAME fields are rewritten to match.
    - Each mapped record gains a YD:Z: tag (f for forward-strand, r for reverse-strand) indicating which converted strand the read aligned to.
    - Chimera QC: reads whose longest M/=/X run is less than 44% of the read length are flagged 0x200 (QC-fail), have flag 0x2 (proper pair) cleared, and have MAPQ capped at 1.
    - Pair-level QC-fail propagation: if one mate is QC-failed, the other mate is also flagged.
    - A @PG ID:bwa-mem3-meth program record is appended to the header.
- Uncompressed BAM output. The post-processed stream is written as uncompressed BAM (wb0) rather than SAM text. This eliminates text serialization overhead and allows downstream samtools sort to read BAM natively. The stream is still fully readable by any htslib-based tool.
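The inline read conversion in the first item above can be pictured with a few lines of shell. This is an illustration of the idea only, not the bwa-mem3 implementation: it assumes that only uppercase bases are converted, and the helper name convert_read and its tab-separated output layout are made up for this example.

```shell
# Illustration only: mimic the per-mate conversion with tr.
# Assumption: only uppercase C and G are converted; handling of lowercase
# bases or ambiguity codes is not modeled here.
convert_read() {
    mate=$1   # 1 for R1, 2 for R2
    seq=$2    # original (unconverted) read sequence
    if [ "$mate" = 1 ]; then
        conv=$(printf '%s' "$seq" | tr 'C' 'T'); dir=CT
    else
        conv=$(printf '%s' "$seq" | tr 'G' 'A'); dir=GA
    fi
    # Print the converted sequence followed by the two tags that record
    # the original sequence (YS) and the conversion direction (YC).
    printf '%s\tYS:Z:%s\tYC:Z:%s\n' "$conv" "$seq" "$dir"
}

convert_read 1 ACGTCCGA   # ATGTTTGA  YS:Z:ACGTCCGA  YC:Z:CT
convert_read 2 ACGTCCGA   # ACATCCAA  YS:Z:ACGTCCGA  YC:Z:GA
```

Keeping the original sequence in YS:Z: is what lets a downstream caller compare the unconverted bases against the reference after alignment.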
For full details on each tag, the chimera QC heuristic, and the --set-as-failed and
--do-not-penalize-chimeras flags, see the Methylation Reference.
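As a rough intuition for the 44% rule, the sketch below evaluates it for a single CIGAR string using awk. It rests on two assumptions that the Methylation Reference may define differently: read length is taken as the sum of query-consuming operations (M, I, S, =, X), and each M, = or X operation is treated as its own run rather than merged with its neighbours. The function name chimera_qc is hypothetical.

```shell
# Illustration only: apply the 44% rule to one CIGAR string.
# Assumptions: read length = sum of M/I/S/=/X lengths (hard clips are not
# counted); each M, = or X operation is considered a separate run.
chimera_qc() {
    printf '%s\n' "$1" | awk '{
        cigar = $0; longest = 0; readlen = 0
        # Consume one "<length><op>" token at a time from the front.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            if (op ~ /[M=X]/ && len > longest) longest = len
            if (op ~ /[MIS=X]/) readlen += len
            cigar = substr(cigar, RLENGTH + 1)
        }
        print ((longest < 0.44 * readlen) ? "QC-fail" : "pass")
    }'
}

chimera_qc 100M       # pass
chimera_qc 40M60S     # QC-fail (longest run 40 is below 44% of 100)
chimera_qc 30M2I68M   # pass (longest run 68)
```

A record that fails this test is the kind that would receive flag 0x200, lose flag 0x2, and have its MAPQ capped at 1 as described above.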
See also: Methylation Reference — Overview · Methylation Reference — SAM tags · Best Practices — Methylation defaults · CLI Reference — mem