SAM Tags: YS, YC, YD

bwa-mem3 --meth emits three methylation-specific auxiliary tags that carry the information downstream methylation callers need. Two of these (YS:Z: and YC:Z:) are set during FASTQ ingest and pass through the alignment kernel unchanged. The third (YD:Z:) is set during BAM post-processing based on the contig name of the alignment.

Tag reference

YS:Z: — original (pre-conversion) sequence

| Property | Value |
|---|---|
| Type | Z (NUL-terminated string) |
| Length | Equal to l_seq (full read length) |
| Set by | FASTQ ingest (src/fastmap.cpp meth_mode block) |
| Emitted on | All records (mapped and unmapped) |

YS:Z: holds the original base sequence of the read before the C→T or G→A conversion. The value is the ASCII string of bases as read from the FASTQ, in read order (not reverse-complemented).

This tag serves two purposes:

  1. SEQ restoration. meth_mem_aln_to_bam copies the YS:Z: payload back into the BAM SEQ field (reverse-complemented when is_rev is set) so that methylation callers see the real bases. Without this restoration the SEQ field would hold the converted read: T wherever the original had C (CT reads), or A wherever the original had G (GA reads).

  2. Downstream inspection. Tools that need to examine the unconverted sequence independently of the BAM SEQ field can read YS:Z: directly.
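The SEQ restoration step can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the code in meth_mem_aln_to_bam; the helper names `reverse_complement` and `restore_seq` are hypothetical, and the sketch assumes the payload contains only A, C, G, T and N.

```python
# Hypothetical helpers: only the behaviour (copy the YS:Z: payload into SEQ,
# reverse-complementing when the alignment is on the reverse strand) comes
# from the documentation. The names are invented for this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(bases: str) -> str:
    """Complement every base, then reverse the string."""
    return bases.translate(_COMPLEMENT)[::-1]


def restore_seq(ys_payload: str, is_rev: bool) -> str:
    """Return the value that would be written to the BAM SEQ field."""
    return reverse_complement(ys_payload) if is_rev else ys_payload


original = "ACCGT"                          # read as it appeared in the FASTQ
print(restore_seq(original, is_rev=False))  # ACCGT
print(restore_seq(original, is_rev=True))   # ACGGT
```

In both cases YS:Z: itself stays in read order; only SEQ follows the alignment strand.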

Note — Format inside the comment buffer

Internally, the ingest code stores the value as YS:Z:<bases>\tYC:Z:<dir> starting at offset 0 of bseq1_t.comment. meth_mem_aln_to_bam locates the payload at comment + 5 (past the YS:Z: prefix). The two tags are always co-emitted in this order.
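As a rough model of that layout, the following Python sketch builds and splits a comment string of the form described above. The function names are hypothetical, and the sketch assumes nothing follows the YC:Z: value in the buffer, which the text does not state.

```python
YS_PREFIX = "YS:Z:"   # 5 characters, which is why the payload starts at offset 5
YC_PREFIX = "YC:Z:"


def build_comment(bases: str, direction: str) -> str:
    """Model of the comment buffer: YS:Z:<bases>\\tYC:Z:<dir>."""
    return f"{YS_PREFIX}{bases}\t{YC_PREFIX}{direction}"


def split_comment(comment: str) -> tuple[str, str]:
    """Recover (<bases>, <dir>) from a buffer laid out as above."""
    if not comment.startswith(YS_PREFIX):
        raise ValueError("comment does not start with YS:Z:")
    ys_field, sep, yc_field = comment.partition("\t")
    if not sep or not yc_field.startswith(YC_PREFIX):
        raise ValueError("YC:Z: does not follow YS:Z:")
    return ys_field[len(YS_PREFIX):], yc_field[len(YC_PREFIX):]


comment = build_comment("ACCGT", "CT")
print(split_comment(comment))   # ('ACCGT', 'CT')
```

Because the prefix has a fixed length, a reader that already knows the read length can take the payload by offset alone, without scanning for the tab.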

YC:Z: — conversion direction

| Property | Value |
|---|---|
| Type | Z (NUL-terminated string) |
| Values | CT (R1, C→T) or GA (R2, G→A) |
| Set by | FASTQ ingest (src/fastmap.cpp meth_mode block) |
| Emitted on | All records |

YC:Z: records which conversion was applied to the read:

  • CT — C→T conversion applied; this is an R1 read (or a single-end read).
  • GA — G→A conversion applied; this is an R2 read.

bwameth.py uses YC:Z: for the same purpose and with the same values. Tools such as MethylDackel use YC:Z: to determine which cytosines to call as methylated. YC:Z:CT records are candidates for CpG methylation on the top strand; YC:Z:GA records are candidates on the bottom strand.
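The relationship between the converted read, YS:Z: and YC:Z: can be illustrated with a small Python sketch. It is a simplified model of the ingest step, not the src/fastmap.cpp code; the function names and the dictionary layout are assumptions made for the example.

```python
def convert_read(bases: str, is_read2: bool) -> tuple[str, str]:
    """Apply the in-silico conversion and report its direction.

    R1 (and single-end) reads get C→T; R2 reads get G→A.
    """
    if is_read2:
        return bases.replace("G", "A").replace("g", "a"), "GA"
    return bases.replace("C", "T").replace("c", "t"), "CT"


def ingest(bases: str, is_read2: bool = False) -> dict[str, str]:
    """Hypothetical record: converted sequence for alignment plus both tags."""
    converted, direction = convert_read(bases, is_read2)
    return {"seq": converted, "YS": bases, "YC": direction}


print(ingest("ACCGT"))                  # {'seq': 'ATTGT', 'YS': 'ACCGT', 'YC': 'CT'}
print(ingest("ACCGT", is_read2=True))   # {'seq': 'ACCAT', 'YS': 'ACCGT', 'YC': 'GA'}
```

The converted string is what the aligner sees; YS:Z: preserves the original bases so they can be restored later.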

YD:Z: — strand hypothesis

| Property | Value |
|---|---|
| Type | Z (NUL-terminated string) |
| Values | f (forward / top strand) or r (reverse / bottom strand) |
| Set by | meth_mem_aln_to_bam (src/meth_bam.cpp) |
| Emitted on | Mapped records only (not unmapped) |

YD:Z: records which strand of the doubled reference the read aligned to. The value is derived from the f/r prefix on the internal contig name via the meth_chrom_map_t.direction array. Unmapped reads do not receive YD:Z:.

  • f — the read aligned to an f-prefixed contig (the C→T projection of the top strand).
  • r — the read aligned to an r-prefixed contig (the G→A projection of the bottom strand).

This tag is used by --set-as-failed (see Flags) and is also consumed by downstream methylation callers to confirm which strand each alignment supports.
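A minimal Python sketch of the derivation follows. It assumes internal contig names are the original name with a single-character f or r prepended (for example fchr1 and rchr1); the exact naming scheme and the meth_chrom_map_t lookup are not reproduced here, and both function names are hypothetical.

```python
from typing import Optional


def strand_from_contig(internal_name: str) -> tuple[str, str]:
    """Split an internal contig name into (direction, original name).

    Assumes a one-character 'f' or 'r' prefix, e.g. 'fchr1' -> ('f', 'chr1').
    """
    prefix = internal_name[:1]
    if prefix not in ("f", "r"):
        raise ValueError(f"contig {internal_name!r} has no f/r prefix")
    return prefix, internal_name[1:]


def yd_tag(internal_name: str, is_unmapped: bool) -> Optional[str]:
    """Return the YD:Z: tag text, or None for unmapped reads."""
    if is_unmapped:
        return None
    direction, _ = strand_from_contig(internal_name)
    return f"YD:Z:{direction}"


print(yd_tag("fchr1", is_unmapped=False))   # YD:Z:f
print(yd_tag("rchr1", is_unmapped=False))   # YD:Z:r
print(yd_tag("*", is_unmapped=True))        # None
```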

Tag emission summary

| Tag | Records | Source |
|---|---|---|
| YS:Z: | All | FASTQ ingest (comment buffer) |
| YC:Z: | All | FASTQ ingest (comment buffer) |
| YD:Z: | Mapped only | meth_mem_aln_to_bam from chrom map |

Tip — Checking tags with samtools

To inspect these tags on a BAM file:

samtools view out.bam | cut -f12- | grep -oP 'Y[SCD]:Z:[^\t]+'

Or use samtools view -H to confirm the @PG ID:bwa-mem3-meth entry is present and the @SQ lines are consolidated (no f/r prefixes).
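For a stricter check than grep, the emission rules on this page can be tested per record. The Python sketch below parses SAM text lines (as produced by samtools view) and reports records that break those rules. The function names are hypothetical, the two sample records are made up for illustration, and the sketch treats "mapped only" as meaning every mapped record carries YD:Z:.

```python
def methylation_tags(sam_line: str) -> tuple[int, dict[str, str]]:
    """Return (FLAG, {tag: value}) for the YS/YC/YD tags of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    tags = {}
    for field in fields[11:]:               # optional fields start at column 12
        name, typ, value = field.split(":", 2)
        if name in ("YS", "YC", "YD") and typ == "Z":
            tags[name] = value
    return flag, tags


def check_record(sam_line: str) -> list[str]:
    """List the ways a record deviates from the tag rules described above."""
    flag, tags = methylation_tags(sam_line)
    unmapped = bool(flag & 0x4)              # SAM FLAG bit 0x4: segment unmapped
    problems = []
    for name in ("YS", "YC"):
        if name not in tags:
            problems.append(f"missing {name}:Z:")
    if "YC" in tags and tags["YC"] not in ("CT", "GA"):
        problems.append("YC:Z: is not CT or GA")
    if unmapped and "YD" in tags:
        problems.append("unmapped record carries YD:Z:")
    if not unmapped:
        if "YD" not in tags:
            problems.append("mapped record lacks YD:Z:")
        elif tags["YD"] not in ("f", "r"):
            problems.append("YD:Z: is not f or r")
    return problems


# Made-up records for illustration only.
MAPPED = "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACCGT\tIIIII\tNM:i:0\tYS:Z:ACCGT\tYC:Z:CT\tYD:Z:f"
UNMAPPED = "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACCGT\tIIIII\tYS:Z:ACCGT\tYC:Z:CT"

print(check_record(MAPPED))     # []
print(check_record(UNMAPPED))   # []
```

The same `check_record` function could be applied to each non-header line of `samtools view out.bam` output.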


See also: Overview · Conversion details · Chimera QC and header rewriting · Flags: --set-as-failed, --do-not-penalize-chimeras · User Guide → Output: SAM/BAM, headers, tags