GGUF + pure-C++ runtime in CrispASR — FireRedASR2-AED

#2, opened by cstr

We've added FireRedASR2-AED to CrispASR as the firered-asr backend: a C++ binary running GGUF weights, with no Python required.

src/firered_asr.cpp implements the 16-layer Conformer encoder (d=1280, 20 heads, relative positional encoding, macaron FFN) and the 16-layer AED Transformer decoder (GELU FFN, cross-attention). Two FireRed-specific details bit us hard:

  1. ConformerFeedForward hides an internal residual. The block-level code reads out = 0.5*x + 0.5*ffn(x), but ffn itself returns net(x) + x, so the effective computation is x + 0.5*net(x), not 0.5*x + 0.5*net(x). Fixing this brought our FFN1 max error down from 0.3 to 0.0003.
  2. Relative positional encoding: _rel_shift maps shifted[h, tq, tk] = original[h, tq, T-1-tq+tk], so the column offset moves with tk-tq; the sign of tq-tk is flipped relative to the natural reading. We verified this by hand with T=5.

We also quantised the FireRed decoder with ggml-native Q4_K (see the "FireRed decoder ggml native Q4_K" entry in LEARNINGS, which records a 6.3x speedup). The companion FireRedVAD GGUF ships too (cstr/firered-vad-GGUF); for Mandarin we recommend enabling it with --vad -vm firered.

Pre-quantised GGUFs (Apache-2.0): cstr/firered-asr2-aed-GGUF

./build/bin/crispasr --backend firered-asr -m firered-asr2-aed-q4_k.gguf \
    -f audio.wav --vad -vm firered

Beam search is wired up (default beam_size=4), word timestamps come from forced alignment, and language ID is available through the FireRedLID GGUF (120 languages, also built with CrispASR).
