GGUF + pure-C++ runtime in CrispASR — FireRedASR2-AED
#2
by cstr - opened
We've added FireRedASR2-AED to CrispASR as the firered-asr backend. C++ binary, GGUF — no Python.
src/firered_asr.cpp — Conformer 16L (d=1280, 20 heads, rel-PE, macaron FFN) + AED Transformer decoder (16L, GELU FFN, cross-attn). Two FireRed-specific things bit hard:
- `ConformerFeedForward` has a hidden internal residual. Reading just the block-level code `out = 0.5*x + 0.5*ffn(x)` misses that `ffn` itself does `output + residual` internally, so the actual maths is `x + 0.5*net(x)`, not `0.5*x + 0.5*net(x)`. Our FFN1 went from 0.3 max error to 0.0003 after the fix.
- Relative positional encoding: `_rel_shift` maps `shifted[h, tq, tk] = original[h, tq, T-1-tq+tk]`, so the sign of `tq-tk` is flipped vs the natural reading. Verified with a T=5 example.
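To make the first point concrete, here is a minimal sketch of the three readings of the macaron FFN block. It is not the code in src/firered_asr.cpp: `Vec`, `Net` and all function names are hypothetical, and `net` stands in for the inner feed-forward stack without its residual (assumed to return a vector of the same length as its input).

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
using Net = std::function<Vec(const Vec&)>;  // hypothetical: inner FFN stack, no residual

// Stand-in for the module: it adds the residual itself, i.e. returns net(x) + x.
Vec ffn_with_internal_residual(const Net& net, const Vec& x) {
    Vec y = net(x);
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += x[i];
    return y;
}

// Block-level code exactly as written: out = 0.5*x + 0.5*ffn(x).
Vec macaron_block_as_written(const Net& net, const Vec& x) {
    const Vec f = ffn_with_internal_residual(net, x);
    Vec out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = 0.5f * x[i] + 0.5f * f[i];
    return out;
}

// What that reduces to once the hidden residual is expanded: x + 0.5*net(x).
Vec macaron_block_effective(const Net& net, const Vec& x) {
    const Vec n = net(x);
    Vec out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + 0.5f * n[i];
    return out;
}

// The misreading that ignores the hidden residual: 0.5*x + 0.5*net(x).
Vec macaron_block_misread(const Net& net, const Vec& x) {
    const Vec n = net(x);
    Vec out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = 0.5f * x[i] + 0.5f * n[i];
    return out;
}
```

The first two functions agree for any `net`; the third differs from them by exactly `0.5*x`, which is the kind of systematic offset a port picks up if it implements the block-level line literally on top of a residual-free `net`.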
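The `_rel_shift` index mapping can be sketched the same way. This is an illustration of the formula quoted above for a single head, not the CrispASR implementation: the function names are hypothetical, and the relative-position axis is assumed to have width `2T-1`, which is what the index range `T-1-tq+tk` implies for `tq, tk` in `[0, T)`.

```cpp
#include <cstddef>
#include <vector>

// Column of the (assumed) 2T-1 wide relative-position axis that is read
// for query index tq and key index tk. Requires tq < T and tk < T.
inline std::size_t rel_shift_src_col(std::size_t T, std::size_t tq, std::size_t tk) {
    return T - 1 - tq + tk;  // grows with tk - tq, not with tq - tk
}

// Apply the mapping to one head.
// original: row-major [T][2T-1]; result: row-major [T][T].
std::vector<float> rel_shift_one_head(const std::vector<float>& original, std::size_t T) {
    const std::size_t W = 2 * T - 1;
    std::vector<float> shifted(T * T);
    for (std::size_t tq = 0; tq < T; ++tq) {
        for (std::size_t tk = 0; tk < T; ++tk) {
            shifted[tq * T + tk] = original[tq * W + rel_shift_src_col(T, tq, tk)];
        }
    }
    return shifted;
}
```

With T=5, row `tq=0` reads columns 4 to 8 and row `tq=4` reads columns 0 to 4, while the diagonal `tq == tk` always reads column `T-1 = 4`. Filling `original` with its own column index and inspecting `shifted` is a quick way to check the sign convention in a port.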
We also quantised the FireRed decoder with ggml-native Q4_K (see the LEARNINGS entry "FireRed decoder ggml native Q4_K — 6.3x speedup"). We also ship the companion FireRedVAD GGUF (cstr/firered-vad-GGUF); `--vad -vm firered` is recommended for Mandarin.
Pre-quantised GGUFs (Apache-2.0): cstr/firered-asr2-aed-GGUF
./build/bin/crispasr --backend firered-asr -m firered-asr2-aed-q4_k.gguf \
-f audio.wav --vad -vm firered
Beam search wired (default beam_size=4); word timestamps via forced alignment; LID via FireRedLID GGUF (120 languages, also CrispASR-built).