Changelog

Release notes from 0.2.1 onward are generated automatically by release-please from the commit history (see Release process) and mirror the GitHub Releases page.
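Entries in the generated sections share one visible shape: an optional scope, a summary, a pull-request number, and a short commit hash. The sketch below parses that shape. The pattern is inferred only from the entries on this page and is not taken from release-please itself; `ENTRY_RE` and `parse_entry` are hypothetical names.

```python
import re

# Shape assumed from the entries in this changelog:
#   [scope: ]summary (#PR) (short-sha)
ENTRY_RE = re.compile(
    r"^(?:(?P<scope>[A-Za-z0-9_-]+): )?"   # optional scope such as "mem" or "bsw"
    r"(?P<summary>.*) "                    # free-text summary (may contain parentheses)
    r"\(#(?P<pr>\d+)\) "                   # pull-request number
    r"\((?P<sha>[0-9a-f]{7})\)$"           # abbreviated commit hash
)


def parse_entry(line):
    """Split one changelog entry into its parts, or return None if it does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return {
        "scope": m.group("scope"),
        "summary": m.group("summary"),
        "pr": int(m.group("pr")),
        "sha": m.group("sha"),
    }


print(parse_entry("version: report whether mimalloc is the active allocator (#217) (2cd9fe9)"))
```

Section headings and free-form notices (for example the breaking-change paragraph under 0.4.0) do not match and yield None.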

0.6.0 (2026-07-16)

Features

version: report whether mimalloc is the active allocator (#217) (2cd9fe9)

Performance

bam: cut per-record work in the --bam writer (mem_aln_to_bam) (#212) (837f40e)
bsw: drop dead per-row H1/H2 setup stores in the SW batch wrappers (#211) (039b09f)
dedup: drop provably-dead exact-duplicate passes in mem_sort_dedup_patch (#205) (c544258)
fmi: arm64 lockstep for third-pass reseeding (bwtSeedStrategy) (#215) (c5f69ec)
kswv: remove write-only Hmax scratch buffer from batched SW kernels (#214) (b14ef49)
mem: add unique-mapper fast paths to mem_chain_flt and mem_mark_primary_se (#209) (1700aab)
mem: drop redundant per-call scratch allocations in mem_gen_alt (#216) (3418525)
mem: pool per-read scratch in get_sa_entries_prefetch and mem_reg2aln (#208) (5747a13)
mem: remove dead/instrumentation work from the extension hot path (#206) (439be97)
seed: replace sortSMEMs qsort with a counting sort by rid (#207) (4356b5c)
seed: vectorize backwardExt occ-counting on arm64 (NEON) + hoist invariants (#210) (64a55a2)
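For readers unfamiliar with the technique named in the sortSMEMs entry (#207), here is a minimal sketch of a stable counting sort keyed on read id. It is an illustration under assumptions, not the bwa-mem3 code: the `Smem` record, its fields, and `counting_sort_by_rid` are hypothetical, and the sketch assumes rids are dense integers in `[0, n_reads)`.

```python
from collections import namedtuple

# Hypothetical stand-in for an SMEM record; only `rid` matters for the sort.
Smem = namedtuple("Smem", ["rid", "start", "end"])


def counting_sort_by_rid(smems, n_reads):
    """Stable O(n + n_reads) sort of SMEMs by read id (rid in [0, n_reads))."""
    # counts[r + 1] first holds how many records have rid == r ...
    counts = [0] * (n_reads + 1)
    for s in smems:
        counts[s.rid + 1] += 1
    # ... then a prefix sum turns counts[r] into the first output slot for rid r.
    for r in range(n_reads):
        counts[r + 1] += counts[r]
    out = [None] * len(smems)
    for s in smems:  # input order is preserved within each rid
        out[counts[s.rid]] = s
        counts[s.rid] += 1
    return out


batch = [Smem(2, 0, 5), Smem(0, 1, 4), Smem(2, 3, 9), Smem(1, 0, 2), Smem(0, 7, 8)]
print([s.rid for s in counting_sort_by_rid(batch, n_reads=3)])  # [0, 0, 1, 2, 2]
```

Unlike a comparison sort such as qsort, this makes two linear passes and needs no comparator calls, which is why it suits a small, dense key like a per-batch read id.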

Refactoring

mem: static cleanup — remove dead code, silence int64 format warnings (#213) (8c57b7e)

Documentation

retitle LICENSE for BWA-MEM3 and add Fulcrum copyright (#203) (089a956)

0.5.0 (2026-07-04)

Features

add opt-in --seed-order seed reordering (default off, byte-identical) (#186) (04749a1)
add opt-in --smem-dedup (dedup identical SMEMs before chaining) (#187) (1384972)
mem: add --adaptive-band (chain-geometry adaptive banding) for long reads (#194) (4fe92a6)
mem: add --extend-mate-concordant; fix --fast --meth placement regression (#195) (c9ffef1)
mem: add --fast speed preset (#189) (a946af8)
mem: add --max-extend-chains and bundle it into --fast (#193) (e39b3d4)
mem: add --skip-contained-ext and enable it under --fast (#192) (2d2b2b4)

Bug Fixes

bandedSWA: 8-bit SW drops query-end gscore/gtle on zero-score-row exit (#198) (611e21b)
bandedSWA: getScores{8,16} must not scribble padding past numPairs (#199) (9aae808)

Performance

bandedSWA: gate the getScores overshoot guard to sub-slice callers (#201) (162e909)

Refactoring

kswv: drop duplicate F warm-up prefetch in kswv512_16 (#191) (6e4cf2b)

Documentation

changelog: backfill the 0.4.0 breaking-change notice (#183) (e1c381a)
changelog: render the live changelog, not the frozen NEWS.md (#184) (12a46d8)
contributing: document breaking-change commit footers (#181) (4ebd122)
release: describe the release-please flow, not manual tagging (#182) (fc0c4b6)

0.4.0 (2026-06-27)

⚠ BREAKING CHANGES

index: bwa-mem3 index no longer writes the unpacked .0123 reference file by default (#177). bwa-mem3 mem now reconstructs reference bases from the packed .pac on demand (“pac-fetch”) and ignores any .0123 present, so the file is redundant for bwa-mem3 itself. External tools that read .0123 directly — most notably sharing a single index with bwa-mem2 — will break, since the expected file is now absent. To restore the old on-disk layout, re-run indexing with the new opt-in flag index --emit-unpacked-ref. Alignment output is byte-for-byte identical; the change is purely to the index artifact set.
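To make "pac-fetch" concrete, the sketch below decodes a range of bases from a 2-bit packed buffer on demand. It assumes the classic bwa `.pac` layout (four bases per byte, with the first base of each byte in the top two bits); whether bwa-mem3 decodes exactly this way is an assumption. `pack_bases` and `pac_fetch` are hypothetical names used only for illustration.

```python
def pack_bases(codes):
    """Pack base codes (0..3) four per byte, first base in the top two bits."""
    pac = bytearray((len(codes) + 3) // 4)
    for pos, code in enumerate(codes):
        pac[pos >> 2] |= code << ((~pos & 3) << 1)
    return bytes(pac)


def pac_fetch(pac, start, end):
    """Decode base codes for positions [start, end) from the packed buffer."""
    out = bytearray(end - start)
    for i, pos in enumerate(range(start, end)):
        shift = (~pos & 3) << 1          # 6, 4, 2, 0 for pos % 4 == 0, 1, 2, 3
        out[i] = (pac[pos >> 2] >> shift) & 3
    return bytes(out)


pac = pack_bases([0, 1, 2, 3, 3, 2, 1, 0, 1])
print(list(pac_fetch(pac, 2, 6)))  # [2, 3, 3, 2]
```

Because any window can be decoded directly from the packed bytes, a separate unpacked copy of the reference is not needed for lookups, which is the trade the breaking change describes.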

Features

mem: --min-ext-len opt-in filter to skip extension of short seeds (#169) (13db252)
meth: native bisulfite (BS-seq) alignment via --meth (D3) (#174) (a0296b1)

Performance

mem: pac-fetch the reference from .pac instead of loading/building .0123 (#177) (9c4bbf2)
meth: batched (SIMD) asymmetric mate rescue (closes #173) (#175) (f146a18)

Documentation

book: document the libdeflate build prerequisite (incl. AL2023) (#172) (c2b6ec7)
book: recommend -y 0 (drop 3rd-round seeding) as an opt-in speed knob (#171) (ca9ac1f)
collapse the mdbook sidebar into nested, foldable sections (#180) (c6ac47b)
deep mdbook cleanup — dedup, consolidate, and tighten (#179) (54d6d11)
meth: disclose collapsed-mode placement drift vs bwameth.py (#178) (96f29e2)
settings-profiles: note repeat-aggregating downstream caveat for -m 10 (#168) (38fd1ec)

0.3.0 (2026-06-21)

Features

bsw: make the 8-bit h0-prefix seed unsigned [0,255] (#151) (9f51c5f)
bsw: recover the 8-bit banded Smith–Waterman path for reads ≥128 bp (#140) (155a916)
kswv: AVX2 16-bit mate-rescue kernel (kswv256_16) (#162) (9107b82)
meth: carry original-reference @SQ M5/UR and @CO/@PG into --meth headers (#139) (e94ad8b)
prof: off-by-default --profile stage-timing instrumentation (#152) (83cf7ab)
reader: content-detecting FASTQ reader fast path (libdeflate BGZF) (#128) (cdd71bf)

Bug Fixes

bsw: bound getScores8/16 prefetch reads to the padding contract (#150) (87ed5d4)
fmi: widen mem_lim to int64 and guard SA-entry allocations (#156) (2d18c1e)
kthread: drive kt_for with a persistent worker pool (#154) (26b24e7)
meth: emit -R read group as @RG header in --meth mode (#137) (ccd1fc5)
seeding: widen SMEM read positions from int16_t to int32_t (#142) (037c418)
test: make meth layer-2 FAIL diagnostics reachable under set -e (#133) (d2d6688)

Performance

bsw: AVX2 SIMD tuning for the Smith-Waterman kernels (#161) (458b216)
bsw: NEON SIMD tuning for the Smith-Waterman kernels (#160) (d971ff0)
bsw: prefetch next batch’s ref/query in the AVX2 8-bit wrapper (#163) (e6082a0)
bsw: short-circuit the inert per-row re-baseline scan (#147) (8e284e0)
bsw: vectorize the per-row epilogue side-channel loop (#149) (403aeb7)
fmi: size SA-entry staging buffers to the exact write count (#157) (aa0fe33)
read: vendored zlib-ng inflate + chunk cap + 3rd pipeline worker (#153) (5cf89e3)
sw: reassociate affine-gap recurrences on NEON (kswv + bandedSWA) (#166) (a02fcb4)

Refactoring

bsw: derive extension gaps from H (standard Gotoh), not M (#141) (f715fbd)
bsw: drop the dead qlen[] parameter from the 8-bit kernels (#143) (42321df)
bsw: remove dead SW code paths (SORT_PAIRS, non-CORE macros, SSE2 polyfill) (#148) (ad8937e)

Documentation

add memory budgeting and data-type tuning guide (#145) (7127d80)
add situational --supp-rep-hard-cap 20 note for SV-aware pipelines (#134) (e20bc0e)
perf: refresh reference-architecture table to v0.3.0 (a02fcb4) (#167) (2cd7bfa)
perf: what drives the speedup, full perf-PR catalog, and fix the stale RTD build (#155) (0b01fb7)
recommend -s 0 for --meth Pass-2 re-seeding in settings profiles (#132) (45e02e0)
recommend a recent compiler on aarch64, with measured NEON numbers (#165) (b591684)
settings profiles (bwa drop-in vs recommended) (#131) (4d845c9)

0.2.2 (2026-06-08)

Bug Fixes

lto: pass explicit -flto=N on GCC to bypass jobserver under sandboxes (#122) (c6240a7)
smem: free lockstep SMEM caches at thread exit (closes #116) (#117) (9454f10)

Performance

sort: stabilize alnreg tie-breaks + drop in pdqsort at dedup-patch sort sites (#123) (85f8542)

Documentation

bench: inject generated divergence catalog + per-release concordance table (#126) (fea1c94)
correct concordance claims and document supplementary-alignment divergence (#125) (8b2dc69)
document bwa-mem3<->bwa-mem2 non-bit-identity + auditable PR list (#124) (bffae5a)

0.2.1 (2026-05-17)

Bug Fixes

changelog: strip preamble so release-please owns the file (#112) (56e580c)
mapq: propagate SMEM SA-count to seed n_hits so --supp-rep-hard-cap works (#101) (cca9d4f)
smem: track enc_qdb byte capacity separately from wsize_mem (#100) (ab922b6)

Documentation

readme: add bioconda badges and install instructions (#106) (830276c)

Earlier releases and upstream history

The 0.2.0 and 0.1.0-pre fg-labs notes, plus the upstream bwa-mem2 release history, are preserved below as a frozen archive.

Release 0.2.0 (2026-05-13)

Operational / packaging

Single-binary SIMD dispatch on x86 (#83). The previous multi-binary build (make multi producing five bwa-mem3.<tier> ISA variants plus a runsimd.cpp launcher that execv’d the matching tier) is replaced by a single binary that contains compiled kernels for every supported tier (sse41 / sse42 / avx / avx2 / avx512bw) and selects one in process at startup via __builtin_cpu_supports. Install size drops from ~120 MB to ~25 MB; per-call overhead is one indirect branch (~0.3 ns after BTB warm-up). No .<tier> companion files are produced or needed. See docs/src/developer-guide/launcher.md.
BWAMEM3_FORCE_TIER=<tier> and BWAMEM3_DEBUG_SIMD=1 env vars (#83). BWAMEM3_FORCE_TIER is downgrade-only and replaces the prior “exec the bwa-mem3.sse41 binary” A/B-testing pattern; up-tier or unrecognized requests are rejected with a stderr warning.
BASELINE_ARCH=avx2 is the new default for non-kernel translation units on x86 (#84, supersedes the SSE4.1 floor that PR #83 originally shipped with). Override via make BASELINE_ARCH=<tier>. AVX-512BW hosts using BASELINE_ARCH=avx512bw see a small additional speedup on Zen 4 with -mprefer-vector-width=256 (#86) and roughly flat results on Sapphire Rapids — see docs/src/whats-different/avx512-baseline.md for the characterization.
Host-floor precheck (#95). bwa-mem3 mem, bwa-mem3 index, and bwa-mem3 shm refuse to run with exit code 2 and an [E::bwamem3] stderr message when the host CPU does not meet the build’s compile-time SIMD floor, instead of SIGILL-ing deep in alignment. bwa-mem3 version, --help, and -h are exempt and always succeed.
bwa-mem3 version now prints SIMD floor: (build’s required minimum) and SIMD runtime: (resolved tier) lines on stdout, plus a [W::bwa-mem3] warning on stderr (exit 0) if the host is below the floor. See docs/src/getting-started/host-requirements.md.
bwa-mem3 shm performs a statvfs("/dev/shm") capacity preflight (#86). When /dev/shm is too small for the index, the stage aborts with an [E::bwa_shm_stage] message naming /dev/shm, the required size, and a mount -o remount,size=... hint — replacing the prior [fread] Bad address failure mode. statvfs failures (no /dev/shm, restricted sandbox) are non-fatal and the stage proceeds.
bwa-mem3 shm /bwactl registry RMW is now serialized via a POSIX named semaphore (#82, closes #66). Concurrent shm stage / shm drop invocations across processes no longer race when updating the registry; the prior best-effort flock was per-open and did not cover the read-modify-write window.
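The downgrade-only rule for BWAMEM3_FORCE_TIER described in this section can be sketched as a small resolution function. This is a model of the stated behaviour, not the launcher code: `resolve_tier` is a hypothetical name, the tier order is taken from the list in the single-binary note, and falling back to the detected tier after a rejected request is an assumption.

```python
# Tier names as listed in the single-binary note, assumed ordered weakest to strongest.
TIERS = ["sse41", "sse42", "avx", "avx2", "avx512bw"]


def resolve_tier(host_tier, forced=None):
    """Return (tier_to_use, warning_or_None).

    `forced` models BWAMEM3_FORCE_TIER: it may only select a tier at or below
    what the host supports. Up-tier or unrecognized requests are rejected;
    keeping the detected tier in that case is an assumption of this sketch.
    """
    if forced is None:
        return host_tier, None
    if forced not in TIERS:
        return host_tier, f"unrecognized tier '{forced}'; using '{host_tier}'"
    if TIERS.index(forced) > TIERS.index(host_tier):
        return host_tier, f"tier '{forced}' exceeds host tier '{host_tier}'; using '{host_tier}'"
    return forced, None


print(resolve_tier("avx2", forced="sse41"))  # ('sse41', None)
```

Restricting the override to downgrades means a forced tier can never select instructions the host lacks, so the override cannot itself cause an illegal-instruction crash.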

Methylation

mem --meth emits Bismark-compatible auxiliary tags XR:Z (read conversion CT/GA), XG:Z (genome strand CT/GA), and XM:Z (per-base methylation call string) (#90). These replace the prior bwameth-style YS:Z / YC:Z / YD:Z on output (still used internally for SEQ restoration). The reference-annotation XR:Z from -V is suppressed under --meth to avoid colliding with the Bismark semantics. Downstream tools that previously read YS:Z / YC:Z / YD:Z must be pointed at the corresponding XR:Z / XG:Z and the per-base XM:Z. See docs/src/methylation/tags.md.
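As an illustration of what a per-base XM:Z string holds, the sketch below builds one for the simplest case: an ungapped alignment of a CT-converted read to the CT genome strand. It assumes Bismark's letter convention (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context, upper case meaning methylated, '.' for everything else). It is not the bwa-mem3 implementation; `context_letter` and `xm_string` are hypothetical helpers.

```python
def context_letter(ref, i):
    """Lower-case Bismark context letter for the cytosine at ref[i]."""
    nxt = ref[i + 1:i + 3]               # up to two downstream reference bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "z"                       # CpG
    if len(nxt) == 2 and nxt[0] in "ACT":
        return "x" if nxt[1] == "G" else ("h" if nxt[1] in "ACT" else "u")
    return "u"                           # context runs off the reference or hits N


def xm_string(read, ref):
    """XM-style calls for an ungapped alignment; `ref` may extend past the read
    by two bases so the last cytosines still have sequence context."""
    calls = []
    for i, base in enumerate(read):
        if ref[i] != "C" or base not in "CT":
            calls.append(".")            # not a cytosine position, or a mismatch
            continue
        letter = context_letter(ref, i)
        # A retained C reads as methylated (upper case); a C>T conversion as unmethylated.
        calls.append(letter.upper() if base == "C" else letter)
    return "".join(calls)


print(xm_string("ACGTAGCTT", "ACGCAGCTTGA"))  # .Z.x..H..
```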

Correctness

Fixed SIGSEGV in mem_matesw on shm-backed ref_string (#85). ksw_align2 mutates its reference slice in place; when the slice pointed into a read-only shm segment, this faulted. Now copies the slice before passing it in.
FMI_search sampled-SA prefetch: parenthesized SA_COMPX_MASK precedence so the masked offset is computed against the correct operand (#73). The unparenthesized form was silently producing wrong-but-harmless prefetch addresses; no alignment output was affected.
bntseq .alt parser bounds the line buffer to prevent a stack-overflow on malicious or malformed .alt files (#74).
display_stats clamps the per-thread bucket count to LIM_C so --profile with -t greater than the compiled-in limit no longer writes past the end of the stats array (#81).
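The mem_matesw fix (#85) above follows a copy-before-mutate pattern. The sketch below reproduces the idea with a read-only buffer standing in for the shm-backed reference. `mutate_in_place` is a hypothetical stand-in for a routine that rewrites its input, and the rewrite it performs is arbitrary; neither is taken from the bwa-mem3 sources.

```python
def mutate_in_place(buf):
    """Hypothetical routine that rewrites its input buffer in place."""
    for i in range(len(buf)):
        buf[i] = 3 - buf[i]


def align_with_private_copy(ref, beg, end):
    """Copy the reference slice into writable scratch before the mutating call."""
    scratch = bytearray(ref[beg:end])    # private, writable copy of the slice
    mutate_in_place(scratch)
    return scratch


# A memoryview over immutable bytes is read-only, like a read-only shared mapping.
ref = memoryview(bytes([0, 1, 2, 3, 0, 1]))

try:
    mutate_in_place(ref[1:4])            # writing through the read-only view fails
except TypeError as exc:
    print("rejected:", exc)

print(list(align_with_private_copy(ref, 1, 4)))  # [2, 1, 0]
```

In Python the failed write surfaces as an exception, while in native code the same mistake faults the process. The remedy is the same in both cases: give the mutating routine its own copy.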

Performance

x86 wall-time improvements on the bench (vs the 0.1.0-pre baseline): AVX2 (c6a) −17 to −22%, AVX-512 AMD Zen4 (c7a) −16 to −24%, AVX-512 Intel SPR (c7i) −28 to −30% across wgs / wes / panel-twist 5M-read samples. Concordance vs upstream bwa-mem2 v2.2.1 remains 100.0000% on all non-methylation cells. arm64 (c7g / c8g) is flat (within ±2%). The wins are attributable primarily to (a) capping AVX-512BW auto-vectorization at 256-bit on the avx512bw target (#86) and (b) inlining FMI_search::backwardExt to recover a gcc 12+ wall-clock regression (#88). See docs/src/performance/overview.md for the reference numbers across architectures.
Smaller contributions in the release window: per-strip L1 prefetches across all kswv u8/u16 kernels (#70); SMEM_LOCKSTEP_N bumped from 8 to 16 (#75); closed-form ungapped HIT path when total_mis == 0 (#77); ksort switched to an on-stack buffer for small n to drop a per-call malloc (#78); libsais_build skips a wasted zero-init pass on its unpack and SA buffers, trimming index-build time (#80).

Release 0.1.0-pre (2026-04-28)

Project renamed from bwa-mem2 to bwa-mem3. The new project tracks Fulcrum Genomics’ performance and feature work on top of the upstream bwa-mem2 codebase.
Default branch renamed from fg-main to main.
Binary renamed from bwa-mem2 to bwa-mem3. Arch-suffixed variants (bwa-mem3.sse41, .sse42, .avx, .avx2, .avx512bw, .arm64, .pgo, .profile, .lto) renamed to match.
@PG SAM header tags now read ID:bwa-mem3 PN:bwa-mem3 (and bwa-mem3-meth for --meth mode).
Test binaries renamed: bwa_mem2_tests_unit → bwa_mem3_tests_unit, bwa_mem2_tests_integration → bwa_mem3_tests_integration.
.bwt.2bit.64 index file format unchanged — bwa-mem3 reads indexes built by bwa-mem2 index without re-indexing.

Release 2.2.1 (17 March 2021)

Hotfix for v2.2: Fixed the bug mentioned in #135.

Release 2.2 (8 March 2021)

Changes since the last release (2.1):

Passed the validation test on ~88 billion reads (Credits: Keiran Raine, CASM division, Sanger Institute)
Fixed the bugs reported in #109 that caused mismatches between bwa-mem and bwa-mem2
Fixed the issue (#112) causing a crash due to a corrupted thread id
Now uses all the SSE flags to build optimized SSE41 and SSE42 binaries

Release 2.1 (16 October 2020)

Release 2.1 of BWA-MEM2.

Changes since the last release (2.0):

Smaller index: the index is 8x smaller on disk and 4x smaller in memory, thanks to keeping only one type of FM-index (2bit.64 instead of both 2bit.64 and 8bit.32) and compressing the suffix array 8x. For example, for the human genome, the on-disk index shrinks from ~80GB to ~10GB and the memory footprint from ~40GB to ~10GB. Index I/O time drops substantially as a result, with hardly any impact on read-mapping performance.
Added support for 2 more execution modes: sse4.2 and avx.
Fixed multiple bugs including those reported in Issues #71, #80 and #85.
Merged multiple pull requests.

Release 2.0 (9 July 2020)

This is the first production release of BWA-MEM2.

Changes since the last release:

Made the source code more secure with more than 300 changes all across it.
Added support for memory re-allocations in case the pre-allocated fixed memory is insufficient.
Added support for the MC tag in the SAM file and for the -5 and -q command-line flags.
The output is now identical to the output of bwa-mem-0.7.17.
Merged index building code with FMI_Search class.
Added support for different ways to input read files; input handling now matches bwa-mem.
Fixed a bug in the AVX512 SAM processing code that led to incorrect output.

Release 2.0pre2 (4 February 2020)

Miscellaneous changes:

Changed the license from GPL to MIT.
IMPORTANT: the index structure has changed since commit 6743183. Please rebuild the index if you are using a later commit or the new release.
Added charts in README.md comparing the performance of bwa-mem2 with bwa-mem.

Major code changes:

Fixed handling of variable-length reads.
Fixed a bug involving reads of length greater than 250bp.
Added support for allocating more memory in small chunks when the large pre-allocated fixed memory is insufficient. This is needed very rarely (so it has no impact on performance) but prevents assertion failures, and therefore crashes, in that scenario.
Fixed a memory leak caused by not releasing the memory allocated for seeds after smem.
Fixed a segfault caused by misaligned small memory allocations in the optimized banded Smith-Waterman.
Enabled working with genomes larger than 7-8 billion nucleotides (e.g. Wheat genome).
Fixed a segfault occurring (with the gcc compiler) while reading the index.
