
Quick start: align paired-end FASTQs

This page walks through the two-command workflow: index the reference once, then align reads.

Index the reference

bwa-mem3 index ref.fa

This produces five index files alongside ref.fa:

File                  Description
ref.fa.bwt.2bit.64    FM-index in 2-bit packed format
ref.fa.0123           Reference sequence, one byte per base (0-3 encoding)
ref.fa.amb            Ambiguous base positions
ref.fa.ann            Sequence name and length annotations
ref.fa.pac            2-bit packed reference sequence
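Before re-running an indexing job, it can help to check whether a complete index is already present. A minimal bash sketch, assuming only that the five suffixes in the table above are what `bwa-mem3 index` writes; `check_index` is a hypothetical helper, not part of bwa-mem3:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all five index files exist and are
# non-empty next to the reference. Suffixes mirror the table above.
check_index() {
  local ref="$1" missing=0 ext
  for ext in bwt.2bit.64 0123 amb ann pac; do
    if [ ! -s "${ref}.${ext}" ]; then
      echo "missing or empty: ${ref}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: build the index only when it is not already complete.
# check_index ref.fa || bwa-mem3 index ref.fa
```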

Indexing hg38 takes roughly 2-3 minutes. Peak disk use during creation is approximately 60 GB, including temporary and intermediate files; the finished FM-index occupies roughly 28 GB on disk. Each mem invocation loads the full index when it starts, so a workload that aligns many samples pays that cost once per invocation unless the index is first loaded into shared memory (see Quick start: shared-memory index).
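The peak disk figure above can be turned into a pre-flight check. A rough bash sketch; `has_free_space` is a hypothetical helper that reads the available-space column of POSIX `df -Pk` output and counts a GB as 1024^3 bytes, which is slightly stricter than decimal gigabytes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB GiB available.
has_free_space() {
  local dir="$1" need_gb="$2" avail_kb
  # -P forces one unwrapped line per filesystem; -k reports 1024-byte blocks.
  # Column 4 of the data line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Usage: check the directory that holds ref.fa before indexing hg38.
# has_free_space . 60 || { echo "not enough disk space for indexing" >&2; exit 1; }
```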

Align paired-end reads

bwa-mem3 mem -t 16 ref.fa r1.fq.gz r2.fq.gz > out.sam

-t 16 runs the aligner with 16 threads. bwa-mem3 scales well up to the number of physical CPU cores; threads beyond that count run on hyperthreads and give diminishing returns. See User Guide — Threading and resource use for recommendations at different core counts.
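To match -t to the physical core count without hard-coding 16, a sketch like the following may help. It assumes util-linux `lscpu` on Linux or `sysctl hw.physicalcpu` on macOS, and falls back to `nproc` (logical CPUs) when neither gives a usable answer; `physical_cores` is a hypothetical helper, and virtualized hosts can report core topology inaccurately:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of physical cores (not hyperthreads).
physical_cores() {
  local n=""
  if command -v lscpu >/dev/null 2>&1; then
    # Each unique (core, socket) pair in lscpu's parseable output is one
    # physical core; comment lines start with '#'.
    n=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
  elif command -v sysctl >/dev/null 2>&1; then
    n=$(sysctl -n hw.physicalcpu 2>/dev/null)
  fi
  # Fall back to logical CPUs (or 1) if the result is empty, zero, or not a number.
  case "$n" in
    ''|*[!0-9]*|0) n=$(nproc 2>/dev/null || echo 1) ;;
  esac
  echo "$n"
}

# Usage:
# bwa-mem3 mem -t "$(physical_cores)" ref.fa r1.fq.gz r2.fq.gz > out.sam
```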

The default output is uncompressed SAM on stdout. To write compressed BAM directly, use the --bam flag:

bwa-mem3 mem --bam -t 16 ref.fa r1.fq.gz r2.fq.gz \
  | samtools sort -@ 8 -o out.bam -
samtools index out.bam
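In the pipeline above the shell reports only the exit status of samtools sort, so a failed bwa-mem3 run may go unnoticed and samtools index may still run on an incomplete file. A hedged bash sketch that enables pipefail inside a subshell (so the caller's shell options are untouched) and indexes only after both stages succeed; `align_to_bam` is a hypothetical wrapper that uses exactly the flags shown above:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: align, sort, and index, failing if ANY stage fails.
align_to_bam() {
  local ref="$1" r1="$2" r2="$3" out="$4" threads="${5:-16}"
  (
    # pipefail makes the pipeline's status non-zero if bwa-mem3 fails,
    # not just if samtools sort fails. Scoped to this subshell only.
    set -o pipefail
    bwa-mem3 mem --bam -t "$threads" "$ref" "$r1" "$r2" \
      | samtools sort -@ 8 -o "$out" - \
      && samtools index "$out"
  )
}
```

In a batch script, `align_to_bam ref.fa r1.fq.gz r2.fq.gz out.bam 16 || exit 1` stops at the first failed sample instead of carrying a truncated BAM forward.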

Tip — Prefer BAM output in production

Piping BAM (--bam) to samtools sort avoids the text formatting and parsing overhead of SAM on both sides of the pipe. For large cohorts this yields a measurable wall-clock reduction. See Best Practices — Output format for the recommended pipeline and a discussion of when SAM is still useful.

Read group tagging

For downstream tools that require a @RG header (most variant callers), pass -R:

bwa-mem3 mem -t 16 \
  -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\tLB:lib1' \
  ref.fa r1.fq.gz r2.fq.gz > out.sam

The value follows BWA conventions: fields are separated by the two-character escape \t, which is converted to a real tab in the output header. Every aligned record receives an RG:Z: tag whose value matches the ID field of the read-group header.
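Building the -R string from variables avoids quoting mistakes when looping over samples. A sketch, assuming the BWA convention that -R takes the literal escape \t between fields; `make_rg` is a hypothetical helper, and PL defaults to ILLUMINA as in the example above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an -R value with literal backslash-t separators.
# Arguments: ID SM LB [PL]
make_rg() {
  local id="$1" sm="$2" lib="$3" pl="${4:-ILLUMINA}"
  # In a printf format, '\\t' emits a backslash followed by 't', not a tab;
  # the aligner expands that escape itself.
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s' "$id" "$sm" "$pl" "$lib"
}

# Usage:
# bwa-mem3 mem -t 16 -R "$(make_rg sample1 sample1 lib1)" \
#   ref.fa r1.fq.gz r2.fq.gz > out.sam
```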

Output tags

In addition to standard SAM tags and BWA's XS:i tag, bwa-mem3 emits the HN:i tag introduced by the fork:

Tag    Type    Description
NM:i   int     Edit distance to the reference
MD:Z   string  Mismatch and deletion string
AS:i   int     Alignment score
XS:i   int     Suboptimal alignment score
SA:Z   string  Supplementary alignment chain
HN:i   int     Total number of primary alignments (reported and suppressed) found for the read, before the -h supplementary cap is applied
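To spot-check these tags without a dedicated tool, a small awk filter is enough. A sketch, assuming plain SAM text on stdin (for example from `samtools view out.bam`); `sam_tag` is a hypothetical helper that prints the read name and the value of the requested tag, or NA when the record lacks it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each SAM record on stdin, print QNAME and the value
# of optional tag $1 (e.g. HN, NM, AS), tab-separated; "NA" if absent.
sam_tag() {
  awk -v tag="$1" 'BEGIN { FS = OFS = "\t"; p = tag ":" }
    /^@/ { next }                      # skip header lines
    {
      v = "NA"
      # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
      for (i = 12; i <= NF; i++)
        if (substr($i, 1, length(p)) == p) {
          v = substr($i, length(p) + 3)  # skip "TAG:" plus "TYPE:"
          break
        }
      print $1, v
    }'
}

# Usage:
# samtools view out.bam | sam_tag HN | head
```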

For the methylation-specific tags (YS:Z, YC:Z, YD:Z), see Methylation Reference — SAM tags.


See also: Quick start: methylation alignment · Quick start: shared-memory index · User Guide — Aligning short reads · CLI Reference — mem