DeCRED-base

This is a 39M-parameter encoder-decoder E-Branchformer model trained on 6,000 hours of open-source, normalised English data.

Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.

Disclaimer: The model currently hallucinates on segments that contain only silence, because it was not trained on such data. A fix will be added soon.
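Until that fix is released, one possible workaround is to skip segments that contain only silence before transcribing them. The sketch below is not part of this model or of transformers. It assumes mono float audio samples in [-1, 1] and uses a simple RMS energy gate; the function names and the -50 dBFS threshold are hypothetical choices that would need tuning on real recordings.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Root-mean-square level of float samples in [-1, 1], in dBFS."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10(mean square) == 20 * log10(RMS)
    return 10.0 * math.log10(mean_square)


def is_silent(samples: Sequence[float], threshold_dbfs: float = -50.0) -> bool:
    """Hypothetical gate: treat a segment as silence if its RMS level is below the threshold."""
    return rms_dbfs(samples) < threshold_dbfs


# Hypothetical usage, assuming `audio` is a 16 kHz mono NumPy array and `pipe`
# is the pipeline created in this card:
# if not is_silent(audio):
#     result = pipe({"raw": audio, "sampling_rate": 16000})
```

An energy gate will not catch silent recordings whose background noise sits above the threshold; a dedicated voice activity detector would be more robust.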

The model can be used with the transformers pipeline() function to transcribe audio files of arbitrary length.

from transformers import pipeline

model_id = "BUT-FIT/ED-small"
pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
# In newer versions of transformers (>4.31.0), the pipeline infers the wrong inference
# type for this model, so it is set explicitly below. The related warning can be ignored.
pipe.type = "seq2seq"

# Run beam search decoding with the joint CTC-attention scorer
result_beam = pipe("audio.wav")

# Run greedy decoding without the joint CTC-attention scorer
pipe.model.generation_config.ctc_weight = 0.0
pipe.model.generation_config.num_beams = 1

result_greedy = pipe("audio.wav")
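Joint CTC-attention decoding combines the attention decoder's score with a CTC score for each hypothesis. The sketch below assumes the standard hybrid CTC/attention interpolation, with ctc_weight acting as the interpolation weight; the card does not document the exact formula this model uses, and the hypotheses and probabilities are made-up toy values.

```python
import math
from typing import Dict, List, Tuple


def joint_score(log_p_att: float, log_p_ctc: float, ctc_weight: float) -> float:
    """Interpolate attention and CTC log-probabilities of one hypothesis.

    Assumes the standard hybrid CTC/attention formulation:
        score = ctc_weight * log p_ctc + (1 - ctc_weight) * log p_att
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att


def rank_hypotheses(hyps: Dict[str, Tuple[float, float]], ctc_weight: float) -> List[str]:
    """Sort hypotheses (text -> (log_p_att, log_p_ctc)) by joint score, best first."""
    return sorted(hyps, key=lambda h: joint_score(*hyps[h], ctc_weight), reverse=True)


# Toy values: the attention decoder slightly prefers "sad", CTC strongly prefers "sat".
hyps = {
    "the cat sat": (math.log(0.40), math.log(0.10)),
    "the cat sad": (math.log(0.45), math.log(0.01)),
}
print(rank_hypotheses(hyps, ctc_weight=0.0))  # attention score only
print(rank_hypotheses(hyps, ctc_weight=0.3))  # illustrative weight, not the model's default
```

Setting ctc_weight = 0.0 reduces the score to the attention term alone, which is what the greedy example above does. In real beam search the CTC term is a prefix score updated at every decoding step rather than one number per finished hypothesis; this sketch only shows the interpolation.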


Model size: 38.5M parameters (Safetensors, tensor type F32)
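As a rough check, the memory taken by the weights follows from the parameter count times the bytes per value (4 bytes for F32). This is only a back-of-the-envelope estimate for the weights; activations, beam search state, and framework overhead come on top.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory taken by the model weights alone."""
    return num_params * bytes_per_param


NUM_PARAMS = 38_500_000  # "38.5M params" from the model card (rounded)
F32_BYTES = 4            # bytes per float32 value

size = weight_bytes(NUM_PARAMS, F32_BYTES)
print(f"F32 weights: {size / 1e6:.0f} MB ({size / 2**20:.1f} MiB)")
# prints "F32 weights: 154 MB (146.9 MiB)"
```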


This model is included in the DeCRED collection, which showcases DeCRED (Decoder-Centric Regularisation in Encoder-Decoder) for ASR (11 items; updated Mar 2).

Evaluation results (test WER in %, self-reported)

LibriSpeech (clean): 3.4
LibriSpeech (other): 7.7
TED-LIUM v3: 5.5
VoxPopuli: 8.3
Mozilla Common Voice 13.0: 16.1
FLEURS: 9.9
Switchboard: 12.5
Wall Street Journal: 2.4
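WER counts the word substitutions S, deletions D, and insertions I needed to turn the reference transcript into the hypothesis, divided by the number of reference words N: WER = 100 * (S + D + I) / N. A minimal sketch of that computation follows. The card does not state which text normalisation or scoring tool produced the numbers above, so results from this code are not directly comparable to them.

```python
from typing import Iterable, List, Tuple


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein distance)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # prev[j] = distance between the reference words seen so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Utterance-level WER in percent."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * word_errors(reference, hypothesis) / n


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: errors and reference words are pooled over all utterances."""
    pairs = list(pairs)
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 1 substitution + 1 deletion
```

Pooling errors over the whole test set, as corpus_wer does, weights long utterances more heavily than averaging per-utterance WER would; test-set WER is usually reported at the corpus level.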