Chimera QC and Header Rewriting
After the alignment kernel produces mem_aln_t records, bwa-mem3 --meth
applies a set of post-processing steps before writing BAM output. These steps
are implemented in src/meth_bam.cpp and run in the same process, in the same
pass over the aligned records.
@SQ header consolidation
The doubled reference (ref.fa.bwameth.c2t) contains two contigs for each
chromosome:
fchr1,fchr2, … — C→T projections of each chromosome.rchr1,rchr2, … — G→A projections of each chromosome.
If the raw alignment header were written directly, every downstream tool would
see twice as many sequences as there are real chromosomes, with unfamiliar
f/r-prefixed names. meth_bam_writer_open instead builds a consolidated
header using the meth_chrom_map_t:
meth_chrom_map_build_from_bnsiterates overbns->annsand strips the leadingf/rfrom each contig name.- The first contig with a given stripped name registers that name in the output list; subsequent contigs with the same stripped name map to the same output index.
- The BAM
@SQlines are written from the consolidated list — oneSN:per real chromosome.
RNAME, RNEXT, and SA/XA tag contig references in every record are rewritten
through cmap->out_tid and cmap->output_names so they reference the
consolidated names. The mapping from internal (doubled-ref) contig index to
output contig index is cmap->out_tid[p.rid].
Note — TLEN computation uses consolidated TIDs
Template length (TLEN) is computed using the consolidated output TIDs, not the internal
p.ridvalues. Two mates that rescue ontofchr1andrchr1respectively both map to outputchr1, so TLEN is reported as a non-zero distance rather than zero (which would happen if the mismatched internal TIDs were used).
Chimera QC heuristic (opt-in)
bwameth.py applies a heuristic to flag reads that look like chimeric fragments: if the longest contiguous alignment run (sum of M/=/X CIGAR operations) covers less than 44 % of the read length, the read is considered a potential chimera. Bismark does not apply this kind of heuristic.
bwa-mem3 --meth makes this opt-in via --chimera-qc (default off, so
the runtime posture matches Bismark). When enabled, the check inside
meth_mem_aln_to_bam does:
if (100 * longest_M_run < 44 * l_seq):
flag |= 0x200 # set QC fail
flag &= ~0x2 # clear proper pair
mapq = min(mapq, 1)
The threshold constant is MIN_LONGEST_M_PCT = 44 (defined at the top
of src/meth_bam.cpp). The longest run is computed by
cigar_longest_m_mem from src/cigar_util.cpp, which counts M, =,
and X operations.
The chimera heuristic is only applied to mapped records (!(flag & 0x4) && direction != 0). Unmapped records are not touched.
See Flags for when to use --chimera-qc (PBAT / scBS-Seq;
bwameth.py-equivalence runs).
--set-as-failed strand filtering
Before the chimera check, meth_mem_aln_to_bam checks whether
opt->meth_set_as_failed is set and matches the record’s strand direction:
if (meth_set_as_failed != 0 && meth_set_as_failed == direction):
flag |= 0x200
This unconditionally marks all alignments to the specified strand (f or r)
as QC-failed before chimera logic runs. The chimera check then applies on top
of the already-set fail flag.
Pair-level QC-fail propagation
Once per read group (all records sharing the same query name), after individual records have been processed:
meth_bam_group_propagate_qcfail(group, n)
This function scans all records in the group. If any record has 0x200 set, it
propagates that flag to every other record in the group and clears 0x2
(proper pair) on all of them. This ensures that a chimeric or strand-filtered
primary alignment also marks its supplementary alignments and the mate as
QC-failed, preventing inconsistent flag states in the output BAM.
@PG ID:bwa-mem3-meth insertion
meth_bam_writer_open appends a @PG line to the header after the original
bwa-mem3 @PG entry:
@PG ID:bwa-mem3-meth PN:bwa-mem3-meth VN:<version>-meth CL:<command line>
The <command line> field is the full bwa-mem3 mem --meth ... invocation with
embedded tab characters replaced by spaces (htslib does not permit literal tabs
in @PG CL: fields). This records the exact parameters used for provenance
and reproducibility.
Tip — Verifying the header
After alignment, confirm consolidation and provenance with:
samtools view -H out.bam | grep -E '^@SQ|^@PG'You should see one @SQ line per chromosome (no f/r prefixes) and both
@PG ID:bwa-mem3and@PG ID:bwa-mem3-methentries.
See also: Overview · SAM tags: XR, XG, XM · Flags: –set-as-failed, –chimera-qc · Conversion details · User Guide → Output: SAM/BAM, headers, tags