---
tags:
- speech-recognition
- ASR
- k2
- sherpa
- PyTorch
license: cc-by-4.0
library_name: icefall
datasets:
- librispeech
inference: false
---

Before you begin, create and activate your own virtualenv.

# Install CUDA and cuDNN
First, check which CUDA version your driver supports:
```
nvidia-smi | head -n 4
```
Install a CUDA toolkit whose version is no higher than the `CUDA Version` shown in that output.
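
The version rule above can be checked mechanically. The following is a minimal sketch, assuming the `nvidia-smi` header contains a `CUDA Version: X.Y` field; the function names and the sample header line are illustrative and not part of any NVIDIA tooling. Paste the real output of `nvidia-smi | head -n 4` in place of `SAMPLE_HEADER`.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """Return the (major, minor) CUDA version found in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_is_supported(toolkit_version, nvidia_smi_text):
    """True if the toolkit version (e.g. "12.1") does not exceed the driver's CUDA version."""
    driver = driver_cuda_version(nvidia_smi_text)
    if driver is None:
        raise ValueError("no 'CUDA Version' field found in the nvidia-smi output")
    major, minor = (int(part) for part in toolkit_version.split(".")[:2])
    return (major, minor) <= driver


# Illustrative header line only (assumption); use your own nvidia-smi output.
SAMPLE_HEADER = "| NVIDIA-SMI 530.30.02    Driver Version: 530.30.02    CUDA Version: 12.1     |"

print(toolkit_is_supported("12.1", SAMPLE_HEADER))  # toolkit equal to the driver's version
print(toolkit_is_supported("12.4", SAMPLE_HEADER))  # toolkit newer than the driver supports
```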

Install CUDA (this guide uses CUDA 12.1). Download the installer:
```
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
```
```
chmod +x cuda_12.1.0_530.30.02_linux.run
```
Run the installer, changing `--installpath` to a directory of your own:
```
./cuda_12.1.0_530.30.02_linux.run \
  --silent \
  --toolkit \
  --installpath=/speech/hasan/software/cuda-12.1.0 \
  --no-opengl-libs \
  --no-drm \
  --no-man-page
```
## Install cuDNN for CUDA 12.1
```
wget https://huggingface.co/csukuangfj/cudnn/resolve/main/cudnn-linux-x86_64-8.9.5.29_cuda12-archive.tar.xz
```
Extract the archive into the CUDA install directory chosen above:
```
tar xvf cudnn-linux-x86_64-8.9.5.29_cuda12-archive.tar.xz --strip-components=1 -C /speech/hasan/software/cuda-12.1.0
```
Create a file named `activate-cuda-12.1.sh` with the following contents, then run `source activate-cuda-12.1.sh`:
```
export CUDA_HOME=/speech/hasan/software/cuda-12.1.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDA_HOME/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export CUDAToolkit_ROOT_DIR=$CUDA_HOME
export CUDAToolkit_ROOT=$CUDA_HOME
export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
export CUDA_TOOLKIT_ROOT=$CUDA_HOME
export CUDA_BIN_PATH=$CUDA_HOME
export CUDA_PATH=$CUDA_HOME
export CUDA_INC_PATH=$CUDA_HOME/targets/x86_64-linux
export CFLAGS=-I$CUDA_HOME/targets/x86_64-linux/include:$CFLAGS
export CUDAToolkit_TARGET_DIR=$CUDA_HOME/targets/x86_64-linux
```
Check your installation by running:
```
which nvcc
```
Desired output:
```
/speech/hasan/software/cuda-12.1.0/bin/nvcc
```
```
nvcc --version
```
Desired output:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
```
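
The `which nvcc` check can also be scripted. This is a minimal sketch, assuming `CUDA_HOME` has been exported by the activation script; the helper name `nvcc_matches_cuda_home` is hypothetical. It reports whether the `nvcc` found on `PATH` is the one under `$CUDA_HOME/bin`, which is the condition the desired output above expresses. Run it in the same shell in which you sourced `activate-cuda-12.1.sh`.

```python
import os
import shutil


def nvcc_matches_cuda_home(environ=os.environ, which=shutil.which):
    """Return True if the nvcc found on PATH is the one in $CUDA_HOME/bin.

    `environ` supplies CUDA_HOME; `which` locates nvcc (both injectable for testing).
    """
    cuda_home = environ.get("CUDA_HOME")
    nvcc = which("nvcc")
    if not cuda_home or not nvcc:
        return False
    expected = os.path.join(os.path.realpath(cuda_home), "bin", "nvcc")
    return os.path.realpath(nvcc) == expected


if __name__ == "__main__":
    print("nvcc matches CUDA_HOME:", nvcc_matches_cuda_home())
```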

[Reference](https://k2-fsa.github.io/k2/installation/cuda-cudnn.html)

# Install Torch and TorchAudio
torch==2.2.1 and torchaudio==2.2.1 are compatible ([reference](https://pytorch.org/get-started/previous-versions/#linux-and-windows-1)), so those are the versions installed here:
```
pip install torch==2.2.1+cu121 torchaudio==2.2.1+cu121 -f https://download.pytorch.org/whl/torch_stable.html
```
Verify the installation:
```
python3 -c "import torch; print(torch.__version__)"
python3 -c "import torchaudio; print(torchaudio.__version__)"
```
Desired output:
```
2.2.1+cu121
2.2.1+cu121
```
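
The two version strings printed above should agree in both the release number and the CUDA build tag. A minimal sketch of that comparison follows; the helper names are hypothetical, and in practice you would pass `torch.__version__` and `torchaudio.__version__` instead of the literal strings.

```python
def split_version(version):
    """Split "2.2.1+cu121" into ("2.2.1", "cu121"); the tag is "" if absent."""
    release, _, local_tag = version.partition("+")
    return release, local_tag


def versions_match(torch_version, torchaudio_version):
    """True if both packages share the same release number and build tag."""
    return split_version(torch_version) == split_version(torchaudio_version)


print(split_version("2.2.1+cu121"))
print(versions_match("2.2.1+cu121", "2.2.1+cu121"))
print(versions_match("2.3.0+cu121", "2.2.1+cu121"))
```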
## Install k2
```
pip install k2==1.24.4.dev20240425+cuda12.1.torch2.2.1 -f https://k2-fsa.github.io/k2/cuda.html
```
Verify the installation:
```
python3 -m k2.version
```
## Install lhotse
```
pip install git+https://github.com/lhotse-speech/lhotse
```
Verify the installation:
```
python3 -c "import lhotse; print(lhotse.__version__)"
```
Desired output (the exact version string depends on the commit installed from GitHub):
```
1.24.0.dev+git.4d57d53.clean
```
## Install icefall
```
git clone https://github.com/k2-fsa/icefall
cd icefall/
pip install -r ./requirements.txt
```
Add the directory where you cloned icefall to `PYTHONPATH` (adjust the path to your own clone), then change to the yesno recipe:
```
export PYTHONPATH=/speech/hasan/icefall_install/icefall:$PYTHONPATH
cd egs/yesno/ASR/
```
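
Before running the recipe, it can help to confirm that Python actually finds the packages installed so far. The sketch below uses only the standard library; `is_importable` is a hypothetical helper, and the check locates modules on `sys.path` without importing them. A "NOT found" for `icefall` usually means `PYTHONPATH` does not point at your clone.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the top-level module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


for name in ("icefall", "lhotse", "k2"):
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```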

Test your installation by running the yesno recipe. Setting `CUDA_VISIBLE_DEVICES` to an empty string hides the GPUs, so this small test trains on the CPU:
```
./prepare.sh
```
```
export CUDA_VISIBLE_DEVICES=""
./tdnn/train.py
```
```
./tdnn/decode.py
```
## Congrats!
[Reference](https://icefall.readthedocs.io/en/latest/installation/index.html)
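## Install kaldifeat
The `torch` suffix of a wheel must match the installed torch version. The command below selects the CPU build for torch 2.3.0, whereas the steps above install torch 2.2.1, so pick the matching wheel from the index page if your versions differ:
```
pip install kaldifeat==1.25.4.dev20240425+cpu.torch2.3.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html
```
## Install sherpa
```
pip install k2_sherpa==1.3.dev20240227+cpu.torch2.2.1 -f https://k2-fsa.github.io/sherpa/cpu.html
```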
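
The wheel names used in this guide encode the torch version they were built for (for example `+cpu.torch2.2.1`). As a rough sketch, the helpers below (hypothetical names, plain string parsing, no package-index access) extract that suffix and compare it with the torch version installed earlier.

```python
import re


def torch_in_wheel_tag(package_version):
    """Extract the torch version from a tag such as "+cpu.torch2.2.1", or None."""
    match = re.search(r"\+.*torch(\d+(?:\.\d+)*)", package_version)
    return match.group(1) if match else None


def built_for(package_version, torch_version):
    """True if the wheel targets the given torch release (the build tag is ignored)."""
    installed = torch_version.split("+")[0]
    return torch_in_wheel_tag(package_version) == installed


TORCH = "2.2.1+cu121"  # the version pinned in the torch install step of this guide
for wheel in (
    "k2==1.24.4.dev20240425+cuda12.1.torch2.2.1",
    "kaldifeat==1.25.4.dev20240425+cpu.torch2.3.0",
    "k2_sherpa==1.3.dev20240227+cpu.torch2.2.1",
):
    print(wheel, "->", "ok" if built_for(wheel, TORCH) else "mismatch")
```

With the version strings that appear in this guide, only the kaldifeat wheel is reported as a mismatch.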
## Training
```
python3 egs/<dataset_name>/ASR/zipformer/train.py \
  --world-size <number_of_gpus> \
  --num-epochs <number_of_epochs> \
  --start-epoch <starting_epoch> \
  --exp-dir <experiment_directory> \
  --max-duration <max_duration_per_batch> \
  --num-workers <number_of_data_workers> \
  --on-the-fly-feats <True_or_False> \
  --manifest-dir <manifest_directory> \
  --num-buckets <number_of_buckets> \
  --bpe-model <path_to_bpe_model> \
  --train-cuts <path_to_training_cuts> \
  --valid-cuts <path_to_validation_cuts> \
  --causal <1_or_0> \
  --master-port <port_number>
```
Parameter reference:

- `--world-size`: Number of GPUs (processes) to use for distributed training.
- `--num-epochs`: Total number of epochs to train.
- `--start-epoch`: Epoch to start training from (useful when resuming).
- `--exp-dir`: Directory where experiment logs and model checkpoints are saved.
- `--max-duration`: Maximum total duration of audio in one batch, in seconds.
- `--num-workers`: Number of worker processes for data loading.
- `--on-the-fly-feats`: Whether to compute features on the fly during training (True or False).
- `--manifest-dir`: Directory containing the manifest files (JSON) for the training and validation data.
- `--num-buckets`: Number of buckets used to group utterances of similar duration.
- `--bpe-model`: Path to the byte-pair encoding (BPE) model used for text tokenization.
- `--train-cuts`: Path to the JSONL file containing the training cuts.
- `--valid-cuts`: Path to the JSONL file containing the validation cuts.
- `--causal`: Set to 1 to train a causal model, as needed for streaming decoding; set to 0 for a non-causal model.
- `--master-port`: Port number used for distributed training communication.
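
When the same training run is launched repeatedly, it can be convenient to assemble the command from a dictionary of options. The sketch below only builds and prints the command line; the flags are the ones listed above, while the function name, the dataset name, and every value are hypothetical placeholders to replace with your own.

```python
import shlex


def build_train_command(dataset_name, options):
    """Return the train.py invocation as an argument list."""
    command = ["python3", f"egs/{dataset_name}/ASR/zipformer/train.py"]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    return command


# Hypothetical values for illustration only; tune them for your data and GPUs.
options = {
    "world-size": 2,
    "num-epochs": 30,
    "start-epoch": 1,
    "exp-dir": "zipformer/exp",
    "max-duration": 600,
    "causal": 1,
    "master-port": 12354,
}
print(shlex.join(build_train_command("my_dataset", options)))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting problems with paths that contain spaces.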

# Sample decode file
## Streaming ASR Decoding with Zipformer
The `streaming_decode.py` script performs streaming decoding of Zipformer ASR models in the icefall framework. It supports greedy search decoding and lets you configure chunked streaming.
```
./streaming_decode.py --epoch <EPOCH_NUMBER> \
  --avg <AVERAGE_NUMBER> \
  --exp-dir <EXPERIMENT_DIR> \
  --decoding-method <DECODING_METHOD> \
  --manifest-dir <MANIFEST_DIR> \
  --cut-set-name <CUT_SET_NAME> \
  --bpe-model <BPE_MODEL_PATH> \
  --causal <CAUSAL_FLAG> \
  --chunk-size <CHUNK_SIZE> \
  --left-context-frames <LEFT_CONTEXT_FRAMES> \
  --on-the-fly-feats <ON_THE_FLY_FEATS_FLAG> \
  --use-averaged-model <AVERAGED_MODEL_FLAG> \
  --num-workers <NUM_WORKERS> \
  --max-duration <MAX_DURATION> \
  --num-decode-streams <NUM_DECODE_STREAMS> \
  --context-size <CONTEXT_SIZE>
```
Parameters:

- `--epoch`: The training epoch whose checkpoint is used for decoding. A higher epoch number means the model has undergone more training.
- `--avg`: Number of checkpoints to average. For example, `--avg 4` averages the 4 checkpoints ending at `--epoch`.
- `--exp-dir`: Directory where the experiment's checkpoints and logs are stored.
- `--decoding-method`: Decoding strategy to use. Common methods include greedy_search, beam_search, etc.
- `--manifest-dir`: Directory containing the manifest files for the datasets to decode.
- `--cut-set-name`: The cut set to decode, typically a data subset such as test_1 or test_2.
- `--bpe-model`: Path to the BPE model used for tokenization during decoding.
- `--causal`: Whether the model is causal. Set to 1 for causal and 0 for non-causal.
- `--chunk-size`: The size of each chunk processed during streaming, in frames.
- `--left-context-frames`: Number of left-context frames made available to each chunk during chunked decoding.
- `--on-the-fly-feats`: If True, features are extracted on the fly instead of being precomputed.
- `--use-averaged-model`: If True, decode with model parameters averaged over the checkpoint range selected by `--epoch` and `--avg`.
- `--num-workers`: Number of worker processes for data loading during decoding.
- `--max-duration`: Maximum total duration of audio in one batch, in seconds.
- `--num-decode-streams`: Number of streams decoded in parallel.
- `--context-size`: Context size of the transducer decoder, that is, how many previous tokens it conditions on (1 means bigram, 2 means trigram). It is unrelated to the audio context used for chunked streaming.
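
To make the interaction of `--epoch` and `--avg` concrete, here is a minimal sketch of which checkpoint files a given pair selects. It assumes checkpoints are saved in the experiment directory as `epoch-N.pt` and that plain checkpoint averaging is used; the function is illustrative and not copied from icefall, and with `--use-averaged-model` the script may combine checkpoints differently.

```python
def checkpoints_to_average(epoch, avg):
    """List the epoch checkpoints covered by --epoch and --avg (assumed epoch-N.pt naming)."""
    if avg < 1:
        raise ValueError("--avg must be at least 1")
    first = max(1, epoch - avg + 1)  # never go below the first epoch
    return [f"epoch-{i}.pt" for i in range(first, epoch + 1)]


print(checkpoints_to_average(epoch=30, avg=4))
# ['epoch-27.pt', 'epoch-28.pt', 'epoch-29.pt', 'epoch-30.pt']
```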

## Sherpa Online WebSocket Server
`sherpa-online-websocket-server` starts a WebSocket server for real-time ASR decoding with the sherpa framework. It can decode on the GPU or the CPU and supports several decoding methods.
```
sherpa-online-websocket-server --use-gpu=<USE_GPU_FLAG> \
  --tokens=<TOKENS_FILE_PATH> \
  --port=<PORT_NUMBER> \
  --doc-root=<DOCUMENT_ROOT> \
  --nn-model=<MODEL_PATH> \
  --decoding-method=<DECODING_METHOD>
```
Parameters:

- `--use-gpu`: Set to True to decode on the GPU, or False to decode on the CPU.
- `--tokens`: Path to the file containing the token list (e.g., BPE tokens) required for decoding.
- `--port`: Port number for the WebSocket server. Make sure the port is open and not blocked by a firewall.
- `--doc-root`: Directory containing the static web resources that the server serves when it is accessed from a browser.
- `--nn-model`: Path to the neural network model used for decoding, usually a jit_script file exported from a trained speech recognition model.
- `--decoding-method`: The decoding strategy to use. Common methods include greedy_search, beam_search, etc. Choose based on your model and application needs.

Example:
```
sherpa-online-websocket-server --use-gpu=True \
  --tokens=/path/to/tokens.txt \
  --port=8003 \
  --doc-root=/path/to/web/document/root \
  --nn-model=/path/to/jit_script_model.pt \
  --decoding-method=greedy_search
```
Notes:

- GPU support: If you decode on the GPU, make sure CUDA is set up correctly on the system.
- Token file: The token file must match the language and tokenization scheme used to train the neural network model.
- Neural network model: The model must be compatible with the chosen decoding method. The online server decodes chunk by chunk, so it needs a streaming model.
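
Before starting the server, you may want to confirm that the chosen port is not already taken by another process. The sketch below uses only the Python standard library; `port_is_free` is a hypothetical helper, and it checks local availability only, so it says nothing about firewall rules between the server and its clients.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 8003 is the port used in the example command above.
    print("port 8003 free:", port_is_free(8003))
```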