Output: SAM/BAM, headers, tags

bwa-mem3 writes output in either SAM (default) or BAM (--bam) format. This page covers the header structure and every non-standard SAM tag emitted by bwa-mem3.

Output format

By default, bwa-mem3 mem writes SAM to stdout. Pass --bam to write BAM instead, or --bam=N to choose a specific compression level. When --bam is given without an argument, the level defaults to 0 (uncompressed), which is the best choice when piping into a downstream samtools sort.

# SAM (default)
bwa-mem3 mem -t 16 ref.fa R1.fq.gz R2.fq.gz > out.sam

# Uncompressed BAM — best for piping
bwa-mem3 mem --bam -t 16 ref.fa R1.fq.gz R2.fq.gz | samtools sort -@ 8 -o out.bam -

# Compressed BAM — useful when the output is the final file
bwa-mem3 mem --bam=6 -t 16 ref.fa R1.fq.gz R2.fq.gz > out.bam

SAM header

@HD

A default @HD VN:1.6 SO:unsorted line is emitted unless the user supplies one via -H. The sort order is unsorted because bwa-mem3 writes records in input read order; downstream sorting is always a separate step.

@SQ

One @SQ line is written per reference sequence, with the sequence name (SN:) and length (LN:) derived from the FM-index. If the index was built with a .dict or .hdr file that supplies @SQ records, those records are used instead of the auto-generated ones.

In methylation mode (--meth), the doubled reference contains sequences with an f or r prefix in their names. The inline BAM post-processor collapses these back to canonical chromosome names so that the output @SQ lines match a standard non-methylation alignment. See Chimera QC and header rewriting.
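The name collapse described above can be pictured with a small sketch. It is only an illustration, not the post-processor's actual code: the function name collapse_meth_names is hypothetical, and it assumes the prefix is exactly one character (f or r) on every doubled-reference sequence name.

```python
def collapse_meth_names(names):
    """Strip a leading 'f' or 'r' from each doubled-reference name and
    de-duplicate while keeping first-seen order."""
    seen = {}
    for name in names:
        # Assumption: the prefix is exactly one character, 'f' or 'r'.
        canonical = name[1:] if name[:1] in ("f", "r") else name
        seen.setdefault(canonical, None)
    return list(seen)


# Made-up doubled-reference names for illustration.
doubled = ["fchr1", "rchr1", "fchr2", "rchr2"]
print(collapse_meth_names(doubled))
```

Because the sketch strips any leading f or r, it is only safe on names that are known to come from the doubled reference.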

@PG

One @PG entry is written in standard mode:

ID          Description
bwa-mem3    The alignment step. VN: is the bwa-mem3 version string; CL: is the full command line.

In methylation mode (--meth), a second @PG entry is appended:

ID               Description
bwa-mem3-meth    The inline post-processor. VN: carries the version with a -meth suffix; CL: is the full command line.

The bwa-mem3-meth entry follows immediately after the bwa-mem3 entry and records the post-processing step as a distinct pipeline node, matching the convention of separate-tool pipelines.
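To check the header layout described in this section, a downstream script could group header lines by record type. The sketch below is a minimal parser written for illustration. The helper name parse_header, the example header text, and the x.y.z version placeholder are all made up, and real headers may carry fields not shown here.

```python
def parse_header(sam_text):
    """Group SAM header lines by record type, each as a dict of TAG -> value."""
    header = {}
    for line in sam_text.splitlines():
        if not line.startswith("@"):
            break  # header lines always precede alignment records
        record_type, *fields = line.split("\t")
        if record_type == "@CO":
            continue  # comment lines are free text, not TAG:value pairs
        tags = dict(f.split(":", 1) for f in fields)
        header.setdefault(record_type, []).append(tags)
    return header


# Illustrative header only; versions and command lines are placeholders.
example = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@PG\tID:bwa-mem3\tVN:x.y.z\tCL:bwa-mem3 mem --meth ref.fa r.fq",
    "@PG\tID:bwa-mem3-meth\tVN:x.y.z-meth\tCL:bwa-mem3 mem --meth ref.fa r.fq",
])
header = parse_header(example)
print(header["@HD"][0]["SO"])
print([pg["ID"] for pg in header["@PG"]])
```

Keeping the @PG entries in file order makes it easy to confirm that the bwa-mem3-meth entry directly follows the bwa-mem3 entry.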

Tags emitted by bwa-mem3

Standard tags

bwa-mem3 emits the same standard tags as bwa-mem2 (NM:i, MD:Z, AS:i, XS:i, SA:Z, RG:Z, XA:Z, MC:Z, etc.). These are documented in the SAM specification and are not described further here.

bwa-mem3 additionally emits MQ:i, the mate’s mapping quality, on paired-end records. It is set alongside MC:Z (the mate’s CIGAR) so that callers that filter on the mate’s MAPQ do not need to look up the mate record. Both the SAM and --bam output paths emit it. The feature was backported from lh3/bwa PR #330 in fg-labs PR #35.
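As a rough illustration of why MQ:i is convenient, the sketch below reads the optional fields of a single SAM text record and filters on the mate's MAPQ without touching the mate record. The helper optional_tags, the example record, and the threshold of 20 are assumptions made for this example, not part of bwa-mem3.

```python
def optional_tags(sam_line):
    """Return the optional fields (columns 12 onward) of a SAM record as a dict."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        # Integer tags are converted; every other type is kept as a string.
        tags[tag] = int(value) if typ == "i" else value
    return tags


# Made-up paired-end record for illustration.
record = "\t".join([
    "read1", "99", "chr1", "100", "60", "50M", "=", "300", "250",
    "A" * 50, "I" * 50, "NM:i:0", "MC:Z:50M", "MQ:i:23",
])
tags = optional_tags(record)
mate_is_confident = tags.get("MQ", 0) >= 20  # the threshold 20 is arbitrary
print(tags["MC"], tags["MQ"], mate_is_confident)
```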

When -u (a.k.a. the upstream “XB” toggle) is passed, each XA:Z entry widens from chr,pos,CIGAR,NM to chr,pos,CIGAR,NM,score,mapq; the tag name itself remains XA:Z for downstream compatibility. Tools that parse XA:Z must therefore handle both field widths.
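A parser that accepts both XA:Z widths might look like the following sketch. It assumes entries are separated by semicolons, as in upstream bwa, and it leaves pos as a raw string because upstream bwa writes a signed position there. The function name parse_xa and the example values are hypothetical.

```python
def parse_xa(value):
    """Parse an XA:Z value into a list of dicts, accepting 4- or 6-field entries."""
    hits = []
    for entry in value.split(";"):
        if not entry:
            continue  # tolerate a trailing ';'
        parts = entry.split(",")
        if len(parts) not in (4, 6):
            raise ValueError(f"unexpected XA entry width {len(parts)}: {entry!r}")
        hit = {"chr": parts[0], "pos": parts[1], "cigar": parts[2], "nm": int(parts[3])}
        if len(parts) == 6:
            # Wide form: alignment score and MAPQ follow NM.
            hit["score"] = int(parts[4])
            hit["mapq"] = int(parts[5])
        hits.append(hit)
    return hits


narrow = parse_xa("chr2,+100,50M,1;chr3,-200,50M,2;")
wide = parse_xa("chr2,+100,50M,1,45,0;")
print(len(narrow), "score" in narrow[0], wide[0]["score"], wide[0]["mapq"])
```

Rejecting any other width, rather than guessing, makes a format change show up as an error instead of silently shifted columns.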

HN:i — total alignment hit count

HN:i:<count>

The total number of primary alignments (both reported and suppressed) that the aligner found for this read, counted before the -h supplementary cap is applied. This is useful for distinguishing “uniquely mapped” from “multi-mapped” reads without relying solely on MAPQ.

HN:i is emitted on the primary alignment record only.
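One way a downstream filter might use HN:i is sketched below. It assumes that a count of exactly 1 means a uniquely mapped read. That is an interpretation of the description above, not a documented guarantee. The function name classify_by_hn is hypothetical, and the input is a dict of already-parsed tags.

```python
def classify_by_hn(tags):
    """Label a record 'unique', 'multi', or 'unknown' from its HN:i value."""
    hn = tags.get("HN")
    if hn is None or hn < 1:
        # No usable count: HN:i appears on primary records only.
        return "unknown"
    # Assumption: exactly one hit means the read is uniquely mapped.
    return "unique" if hn == 1 else "multi"


print(classify_by_hn({"HN": 1}), classify_by_hn({"HN": 7}), classify_by_hn({}))
```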

Methylation-only tags

The following Bismark-compatible tags are emitted only when --meth is active. See SAM tags: XR, XG, XM for the full per-tag reference, including the XM:Z character alphabet and the XG:Z strand-pick semantics.

Tag     Type      Description
XR:Z    string    Read conversion direction: CT (R1 / SE) or GA (R2)
XG:Z    string    Genome strand of the alignment: CT (OT) or GA (OB)
XM:Z    string    Per-base methylation call string (same length as SEQ)

The bwameth-style YS:Z / YC:Z tags exist only as an internal carrier on bseq1_t.comment for SEQ restoration and XR:Z derivation; they are suppressed at BAM emit and never appear in output. The bwameth YD:Z strand tag has been replaced by Bismark XG:Z and is not emitted.
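To show how the XM:Z string listed in the table above is typically consumed, the sketch below tallies calls per context. It assumes the Bismark alphabet (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context, upper case meaning methylated, and . for positions without a call). The per-tag reference remains the authority on what bwa-mem3 emits. The function name methylation_counts and the example string are made up.

```python
from collections import Counter

# Bismark's XM alphabet: upper case = methylated, lower case = unmethylated.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def methylation_counts(xm):
    """Count methylated / unmethylated calls per context in an XM:Z string."""
    counts = Counter()
    for ch in xm:
        context = CONTEXTS.get(ch.lower())
        if context is None:
            continue  # '.' marks positions that carry no cytosine call
        state = "methylated" if ch.isupper() else "unmethylated"
        counts[(context, state)] += 1
    return counts


counts = methylation_counts("..Z..z.h..X.Z")
print(counts[("CpG", "methylated")], counts[("CpG", "unmethylated")])
```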

MAPQ semantics

MAPQ semantics are inherited from bwa-mem2 and follow the same scoring model. In methylation mode, alignments identified as chimeras (longest M/=/X run covering less than 44% of the read length) have their MAPQ capped at 1 and the 0x200 (QC fail) flag set. See Chimera QC and header rewriting.
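The chimera rule can be restated as a short sketch. It rests on two assumptions that the text above does not settle: a "run" is taken to be a single M, =, or X CIGAR operation (adjacent operations are not merged), and the read length is supplied by the caller. The function names are hypothetical, and this is not the post-processor's actual code.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_match_run(cigar):
    """Length of the longest single M, =, or X operation in a CIGAR string."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X"), default=0)


def apply_chimera_qc(flag, mapq, cigar, read_len, min_fraction=0.44):
    """Return (flag, mapq) after applying the chimera rule described above."""
    if longest_match_run(cigar) < min_fraction * read_len:
        # Chimera: set the QC-fail bit and cap MAPQ at 1.
        return flag | 0x200, min(mapq, 1)
    return flag, mapq


print(apply_chimera_qc(0, 60, "40M60S", 100))
print(apply_chimera_qc(0, 60, "50M50S", 100))
```

With a 100-base read, a longest run of 40 falls under the 44-base cutoff and is flagged, while a run of 50 passes unchanged.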


See also: Aligning short reads (mem) · Methylation Reference: SAM tags · Methylation Reference: post-processing · CLI Reference: mem · Best Practices: output format