mem

bwa-mem3 mem aligns short DNA reads against an indexed reference genome using the BWA-MEM algorithm. It accepts one or two FASTQ files (single-end or paired-end) and writes alignments to stdout in SAM or BAM format. It is the primary alignment subcommand; nearly all bwa-mem3 usage flows through it.

Synopsis

Usage: bwa-mem3 mem [options] <idxbase> <in1.fq> [in2.fq]
Options:
  Algorithm options:
    -o STR        Output SAM file name
    --bam[=N]     Emit BAM instead of SAM text. N=0 (default) = uncompressed;
                  1..9 = BGZF deflate levels. Writes to stdout; redirect with `>`.
    -t INT        number of threads [1]
    -k INT        minimum seed length [19]
    -w INT        band width for banded alignment [100]
    -d INT        off-diagonal X-dropoff [100]
    -r FLOAT      look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
    -y INT        seed occurrence for the 3rd round seeding [20]
    -c INT        skip seeds with more than INT occurrences [500]
    -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
    -W INT        discard a chain if seeded bases shorter than INT [0]
    -m INT        perform at most INT rounds of mate rescues for each read [50]
    -S            skip mate rescue
    -P            skip pairing; mate rescue performed unless -S also in use
Scoring options:
   -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
   -B INT        penalty for a mismatch [4]
   -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
   -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
   -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
   -U INT        penalty for an unpaired read pair [17]
Input/output options:
   -p            smart pairing (ignoring in2.fq)
   -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
   -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
   -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)
   -5            for split alignment, take the alignment with the smallest coordinate as primary
   -q            don't modify mapQ of supplementary alignments
   -K INT        process INT input bases in each batch regardless of nThreads (for reproducibility) []
   -v INT        verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
   -T INT        minimum score to output [30]
   -h INT[,INT]  if there are <INT hits with score >80.00% of the max score, output all in XA [5,200]
   -z FLOAT      the fraction of the max score to use with -h [0.80]
   -u            output XB instead of XA; XB is XA with the alignment score and mapping quality added
   -a            output all alignments for SE or unpaired PE
   -C            append FASTA/FASTQ comment to SAM output
   -V            output the reference FASTA header in the XR tag
   -Y            use soft clipping for supplementary alignments
   -M            mark shorter split hits as secondary
   -I FLOAT[,FLOAT[,INT[,INT]]]
                 specify the mean, standard deviation (10% of the mean if absent), max
                 (4 sigma from the mean if absent) and min of the insert size distribution.
                 FR orientation only. [inferred]
Bisulfite (--meth) options:
   --meth        enable inline bwameth-style C→T/G→A read conversion + meth-aware BAM
                 emission. Implies --bam. Requires the reference to have been built
                 with `bwa-mem3 index --meth` (emits ref.fa.bwameth.c2t).
   --set-as-failed f|r
                 flag alignments to the matching strand ('f' or 'r') as QC-fail (0x200)
   --chimera-qc
                 enable the bwameth.py-style longest-match <44% chimera heuristic
                 (sets 0x200, clears 0x2, caps MAPQ at 1). Off by default; not in Bismark.
Supplementary MAPQ rescoring (fg-labs extension):
   --supp-rep-hard-cap INT
                 force MAPQ=0 for supplementary alignments whose chain contains any seed
                 with >=INT genome occurrences (i.e. the supp region is repetitive on its
                 own). 0 disables (default). Typical values 5-20; lower = more aggressive.
                 Primary MAPQ is unaffected.
Help:
   --help        print this help message and exit
Note: Please read the man page for detailed description of the command line and options.

Common usage

Paired-end alignment, 16 threads, SAM to stdout:

bwa-mem3 mem -t 16 ref.fa R1.fq.gz R2.fq.gz > out.sam

Paired-end alignment, emit uncompressed BAM, pipe directly to samtools sort:

bwa-mem3 mem --bam -t 16 ref.fa R1.fq.gz R2.fq.gz \
  | samtools sort -@ 8 -o out.bam -
samtools index out.bam

Paired-end methylation alignment with a read group header:

bwa-mem3 mem --meth -t 16 \
  -R '@RG\tID:lib1\tSM:sample1\tPL:ILLUMINA' \
  ref.fa R1.fq.gz R2.fq.gz \
  | samtools sort -o out.bam -

Flag reference

Input / output

-o STR — output file

Write output to STR instead of stdout. Honored for both SAM and --bam output; the path is opened lazily so BAM mode can hand it to htslib instead of truncating it as a SAM-text file. Stdout redirection (>) remains an alternative.

--bam[=N] — emit BAM

Emit BAM instead of SAM. N controls BGZF compression: 0 (the default when --bam is given without =N) writes uncompressed BAM, which costs almost no CPU and is the recommended mode for piping to samtools sort. Values 1–9 select increasing BGZF deflate levels; use --bam=6 or --bam=9 only when writing directly to final storage without a downstream sort step.

Tip — Prefer --bam for production pipelines

Uncompressed BAM (--bam or --bam=0) eliminates the text-formatting cost on the aligner side and the text-parse cost on the samtools sort side. For any pipeline that immediately sorts or processes the output, this is faster than SAM at no quality cost.

-R STR — read group header

Injects a @RG header line and tags every alignment with RG:Z:<ID>. The value is a tab-separated @RG line with literal \t escapes, for example:

-R '@RG\tID:run1\tSM:HG001\tPL:ILLUMINA\tLB:lib1'

bwa-mem3 escapes any literal tab characters inside -R values before writing them to the @PG CL: field, preventing header corruption (fix for issue #45).
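
As a small illustration of the literal \t convention, the helper below assembles an -R value from its fields. The function name make_rg and the sample field values are hypothetical; only the -R format itself comes from the text above. It relies on printf '%s' so that the shell never expands \t into a real tab character.

```shell
# make_rg ID SAMPLE [PLATFORM] -> prints an @RG line with literal "\t" separators.
# Hypothetical helper for illustration; not part of bwa-mem3.
make_rg() {
  id=$1
  sample=$2
  platform=${3:-ILLUMINA}
  # '%s' prints its argument verbatim, so each backslash-t stays as two characters.
  printf '%s\n' "@RG\\tID:${id}\\tSM:${sample}\\tPL:${platform}"
}

rg=$(make_rg run1 HG001)
printf '%s\n' "$rg"
# Intended use (not executed here):
#   bwa-mem3 mem -R "$rg" ref.fa R1.fq.gz R2.fq.gz > out.sam
```

Building the string in one place keeps the quoting consistent when the same read group is reused across several commands.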

-H STR/FILE — extra header lines

If STR begins with @, it is injected verbatim as a header line. Otherwise STR is treated as a path and every line in the file is injected. Useful for adding @CO comments or custom @RG / @PG entries.

-p — smart pairing

Reads interleaved paired-end data from a single FASTQ file (in1.fq) rather than two separate files. The second positional argument (in2.fq) is ignored.

-5 — leftmost-coordinate primary

For split alignments, designates the alignment with the smallest genomic coordinate as primary, rather than the longest alignment. Useful for some downstream tools that expect the leftmost alignment to be primary.

-q — preserve supplementary MAPQ

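By default, bwa-mem3 lowers the MAPQ of a supplementary alignment when it would otherwise exceed the MAPQ of the primary alignment. -q suppresses that adjustment and reports the supplementary MAPQ as computed.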

-K INT — fixed batch size

Forces each thread batch to process exactly INT input bases regardless of the number of threads. Useful when you need bit-for-bit reproducible output across runs with different -t values: fix -K to the same value and the output is deterministic.
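
One way to check this on your own data is to align twice with different -t values but the same -K, then compare the records. The sketch below covers only the comparison step. same_alignments is a hypothetical helper, the -K value in the comments is an arbitrary example, and the helper assumes both runs were written as SAM text. Lines beginning with @PG are skipped because they record the command line, which differs between the two runs.

```shell
# same_alignments A.sam B.sam -> exit status 0 if the files match once @PG lines are ignored.
# Hypothetical helper for illustration; not part of bwa-mem3.
same_alignments() {
  a=$(grep -v '^@PG' "$1" | cksum)
  b=$(grep -v '^@PG' "$2" | cksum)
  [ "$a" = "$b" ]
}

# Intended use (file names are placeholders; the alignment commands are not run here):
#   bwa-mem3 mem -t 4  -K 100000000 ref.fa R1.fq.gz R2.fq.gz > t4.sam
#   bwa-mem3 mem -t 16 -K 100000000 ref.fa R1.fq.gz R2.fq.gz > t16.sam
#   same_alignments t4.sam t16.sam && echo "identical apart from @PG"
```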

-v INT — verbosity

Controls stderr diagnostic output: 1 = errors only, 2 = warnings, 3 = informational messages (default), 4+ = debugging.

-a — all alignments

Outputs every alignment found for single-end reads or unpaired paired-end reads, not only the best one. The additional alignments are flagged as secondary (0x100).

-C — append FASTA/FASTQ comment

Appends the comment field from the FASTA/FASTQ header to the SAM output as an additional column. Useful when the comment carries barcodes or UMIs.

-V — reference header in XR tag

Emits the reference FASTA header line for each alignment position as an XR SAM tag.

-Y — soft-clip supplementary alignments

Uses soft clipping instead of hard clipping for supplementary alignments. Some downstream tools require this.

-M — mark shorter split hits as secondary

Marks the shorter alignments of a split read as secondary (flag 0x100) rather than supplementary (flag 0x800). Needed for compatibility with tools that do not handle supplementary alignments, such as older versions of Picard’s duplicate marking.

-j — treat ALT contigs as primary

Treats ALT contigs as part of the primary assembly by ignoring the <idxbase>.alt file. Use when your workflow does not include ALT-aware postprocessing.

Scoring

All scoring flags accept integer values. Changing -A (match score) rescales the defaults of -T, -d, -B, -O, -E, -L and -U; any of those flags given explicitly keeps the value you passed.

| Flag | Default | Meaning |
| --- | --- | --- |
| -A INT | 1 | Score for a sequence match. Scales -T, -d, -B, -O, -E, -L, -U unless overridden. |
| -B INT | 4 | Mismatch penalty. |
| -O INT[,INT] | 6,6 | Gap open penalty for deletions and insertions respectively. |
| -E INT[,INT] | 1,1 | Gap extension penalty per base. A gap of length k costs -O + -E * k. |
| -L INT[,INT] | 5,5 | Clipping penalty for 5’ and 3’ ends. |
| -U INT | 17 | Penalty for an unpaired read pair (affects mate-rescue scoring). |
| -T INT | 30 | Minimum alignment score to output. Alignments below this threshold are not reported. |
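
The arithmetic behind these flags can be sketched in shell. gap_cost applies the -O + -E * k formula. scaled shows how a default would change under a different -A, assuming that scaling means multiplying the default by the match score; the text says the flags scale but does not spell out the rule, so treat that as an assumption. Both function names are made up for this example.

```shell
# gap_cost OPEN EXTEND K -> cost of a gap of length K, i.e. -O + -E * K
gap_cost() {
  echo $(( $1 + $2 * $3 ))
}

# scaled DEFAULT A -> default value after scaling by match score A
# (assumption: scaling is a plain multiplication of the default by -A)
scaled() {
  echo $(( $1 * $2 ))
}

gap_cost 6 1 3    # 3-base gap with the default -O 6 -E 1: prints 9
scaled 4 2        # default -B 4 under -A 2: prints 8
scaled 30 2       # default -T 30 under -A 2: prints 60
```

Working through numbers like these before changing -A helps to see which thresholds move with it, in particular the -T output cutoff.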

Note — --meth overrides scoring defaults

When --meth is active, bwa-mem3 applies bwameth.py-compatible defaults: -B 2 -L 10 -U 100 -T 40 -CM. Any of these can still be overridden by passing the flag explicitly after --meth.

Paired-end

-I FLOAT[,FLOAT[,INT[,INT]]] — insert size distribution

Specifies the mean, standard deviation (default: 10% of mean), maximum (default: 4 sigma above mean), and minimum of the insert size distribution for FR-orientation paired-end reads. By default bwa-mem3 infers these parameters from the first batch of reads. Provide them explicitly for speed or when the reference is short and inference may be inaccurate.
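
To make the defaults concrete, the sketch below derives the standard deviation and maximum from a chosen mean using the rules stated above (10% of the mean, and 4 sigma above the mean) and formats them as an -I value. insert_size_arg is a hypothetical helper, integer arithmetic is used for simplicity, and the minimum is left out so that bwa-mem3 fills it in.

```shell
# insert_size_arg MEAN -> "MEAN,SD,MAX" built from the documented defaults.
# Hypothetical helper for illustration; not part of bwa-mem3.
insert_size_arg() {
  mean=$1
  sd=$(( mean / 10 ))          # 10% of the mean (integer division)
  max=$(( mean + 4 * sd ))     # 4 sigma above the mean
  echo "${mean},${sd},${max}"
}

insert_size_arg 350   # prints 350,35,490
# Intended use (not executed here):
#   bwa-mem3 mem -I "$(insert_size_arg 350)" ref.fa R1.fq.gz R2.fq.gz > out.sam
```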

-m INT — mate rescue rounds

Maximum number of mate-rescue attempts per read. Reduce to speed up alignment on data where the default (50) wastes time on unrescuable pairs.

-S — skip mate rescue

Disables mate rescue entirely. Faster but may reduce sensitivity for discordant pairs.

-P — skip pairing

Skips the pairing step; mate rescue still runs unless -S is also given.

Filtering

-c INT — skip repetitive seeds

Seeds with more than INT occurrences in the reference are skipped. Lowering this (e.g. to 50) speeds up alignment of highly repetitive reads but may reduce sensitivity. Raising it increases sensitivity in repeat-heavy regions at a cost in runtime.

-D FLOAT — chain length fraction

Drops chains shorter than FLOAT times the length of the longest overlapping chain. With the default (0.50), a chain is discarded if it is less than half as long as the longest chain it overlaps.

-W INT — minimum seeded bases

Discards chains with fewer than INT seeded bases. Raising this filters out very short, low-confidence chains.

-h INT[,INT] — secondary alignment reporting

If there are fewer than INT hits with score exceeding FLOAT (see -z) times the maximum score, all of them are output in the XA auxiliary tag. The second integer is a hard cap on the number of XA entries. Defaults: 5, 200.

-z FLOAT — secondary score fraction

Fraction of the maximum alignment score used as the threshold for secondary hit reporting with -h. Default: 0.80.

-u — emit XB instead of XA

Outputs XB in place of XA. XB is an extension of XA that also carries the alignment score and mapping quality for each secondary hit.

Methylation (--meth)

--meth — enable bisulfite alignment mode

Activates inline C→T (R1) and G→A (R2) read conversion, bwameth-compatible scoring defaults, inline BAM post-processing, and forces --bam output. The reference must have been indexed with bwa-mem3 index --meth.

Pass the original FASTA prefix as <idxbase> — the .bwameth.c2t suffix is appended automatically. If <idxbase> already ends in .bwameth.c2t (interop with an external c2t converter), the auto-append is skipped.

See Methylation Reference for the full treatment.

--set-as-failed {f|r} — strand QC-fail flag

Forces the QC-fail bit (0x200) on all alignments to the forward (f) or reverse (r) bisulfite strand. Used when one strand is known to be unreliable for a given library preparation.

--chimera-qc — opt in to bwameth.py-style chimera heuristic

Off by default (matches Bismark, which has no equivalent heuristic). When set, mapped records whose longest M/=/X CIGAR run is less than 44% of the read length get 0x200 (QC-fail) set, 0x2 (properly paired) cleared, and MAPQ capped at 1. Useful for PBAT / scBS-Seq libraries where intra-fragment chimerism is common, or when reproducing bwameth.py output bit-for-bit.
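
The rule can be sketched as follows, under one reading of it: take the longest single M, = or X operation in the CIGAR string and compare it against 44% of the read length. longest_match and is_chimeric are hypothetical helpers written for illustration, not bwa-mem3 internals, and the flag and MAPQ changes themselves are not modeled.

```shell
# longest_match CIGAR -> length of the longest single M, = or X operation
longest_match() {
  printf '%s\n' "$1" | awk '{
    best = 0; s = $0
    # Walk the CIGAR one <length><op> token at a time.
    while (match(s, /[0-9]+[MIDNSHP=X]/)) {
      tok = substr(s, RSTART, RLENGTH)
      op  = substr(tok, RLENGTH, 1)
      n   = substr(tok, 1, RLENGTH - 1) + 0
      if ((op == "M" || op == "=" || op == "X") && n > best) best = n
      s = substr(s, RSTART + RLENGTH)
    }
    print best
  }'
}

# is_chimeric CIGAR READLEN -> exit status 0 if longest match < 44% of READLEN
is_chimeric() {
  lm=$(longest_match "$1")
  # Integer form of lm < 0.44 * READLEN
  [ $(( lm * 100 )) -lt $(( 44 * $2 )) ]
}

longest_match 40M5I30M75S    # prints 40
if is_chimeric 40M5I30M75S 150; then
  echo "40M5I30M75S would be flagged"   # 40 is below 0.44 * 150 = 66
fi
```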

Threading

-t INT — number of threads

Number of worker threads. Defaults to 1. Set to the number of physical cores available to this job. Scaling is workload- and hardware-dependent: on typical machines the curve flattens around 16–32 threads (FM-index bandwidth and I/O contention dominate); on high-memory / fast-I/O servers the aligner can keep scaling toward ~64 threads on hg38 before saturating.

See User Guide — Threading and resource use for guidance on thread counts at various machine sizes.
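
A rough way to pick a -t value in a script is to take the CPU count and cap it near the point where scaling flattens. The sketch below is only a starting point: pick_threads is a hypothetical helper, the default cap of 32 is taken from the upper end of the 16–32 range mentioned above, and getconf _NPROCESSORS_ONLN reports logical CPUs, which can be twice the physical core count on machines with SMT.

```shell
# pick_threads [CAP] -> number of online CPUs, limited to CAP (default 32).
# Hypothetical helper for illustration; not part of bwa-mem3.
pick_threads() {
  cap=${1:-32}
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  # Fall back to 1 if getconf printed nothing or something non-numeric.
  case "$n" in
    ''|*[!0-9]*) n=1 ;;
  esac
  if [ "$n" -gt "$cap" ]; then n=$cap; fi
  echo "$n"
}

threads=$(pick_threads)
echo "would run: bwa-mem3 mem -t $threads ref.fa R1.fq.gz R2.fq.gz"
```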

Supplementary MAPQ rescoring

--supp-rep-hard-cap INT — cap MAPQ for repetitive supplementary alignments

Forces MAPQ=0 for supplementary alignments whose chain contains any seed with at least INT occurrences in the genome. This targets supplementary alignments anchored in repetitive regions that upstream MAPQ scoring may overestimate. 0 disables the cap (default). Typical values are 5–20; lower values are more aggressive. Primary alignment MAPQ is unaffected.

Debug

-k INT — minimum seed length

Minimum exact-match seed length. Shorter seeds increase sensitivity but raise runtime. The default (19) is calibrated for 100–150 bp Illumina reads.

-w INT — band width

Band width for the banded Smith-Waterman extension. Wider bands can recover alignments with long indels at greater CPU cost.

-d INT — X-dropoff

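Off-diagonal X-dropoff (Z-dropoff). Extension stops when the difference between the best and the current extension score exceeds |i-j| * A + INT, where i and j are the current query and reference positions and A is the match score (-A). Larger values let an extension continue further after a score drop.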

-r FLOAT — re-seeding factor

Seeds longer than -k * FLOAT are re-seeded internally to find sub-seeds. Lowering this produces more seeds and higher sensitivity at greater cost.

-y INT — third-round seed occurrence threshold

Seed occurrence threshold for the third round of seeding. Rarely needs adjustment outside highly repetitive genomes.

Notes / Gotchas

Warning — --meth requires a --meth index

Running bwa-mem3 mem --meth against a standard (non-c2t) index produces incorrect alignments without an error. Confirm that the index was built with bwa-mem3 index --meth before aligning bisulfite data.
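
A cheap guard against this failure mode is to check for the converted reference before starting. The sketch below mirrors the prefix rule described in the Methylation section (append .bwameth.c2t unless it is already present) and assumes the converted FASTA sits next to the original, as the ref.fa.bwameth.c2t name in the help text suggests. c2t_path and have_meth_index are hypothetical names, and the presence of this one file does not prove that every index file was built.

```shell
# c2t_path IDXBASE -> path of the converted reference expected by --meth
c2t_path() {
  case "$1" in
    *.bwameth.c2t) echo "$1" ;;              # already a c2t prefix: leave as is
    *)             echo "$1.bwameth.c2t" ;;  # otherwise append the suffix
  esac
}

# have_meth_index IDXBASE -> exit status 0 if the converted reference exists
have_meth_index() {
  [ -e "$(c2t_path "$1")" ]
}

c2t_path ref.fa    # prints ref.fa.bwameth.c2t
if ! have_meth_index ref.fa; then
  echo "no converted reference found for ref.fa; build one with bwa-mem3 index --meth" >&2
fi
```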

Note — SIMD variant printed to stderr at startup

When mem starts it prints a banner (Executing in AVX512 mode!! etc.) to stderr. This is informational and does not affect stdout output.


See also: User Guide — Aligning short reads · User Guide — Output: SAM/BAM, headers, tags · CLI Reference — index · Methylation Reference — Overview · Best Practices — Output format