---
license: cc-by-nc-4.0
pipeline_tag: audio-to-audio
library_name: f5-tts
extra_gated_prompt: "You agree to not use the model to generate, share, or promote content that is illegal, harmful, deceptive, or intended to impersonate real individuals without their informed consent."
extra_gated_fields:
  Affiliation: text
  Country: country
  I agree to use this model for non-commercial use ONLY: checkbox
---
# EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion

[Code](https://github.com/EZ-VC/EZ-VC)
[Paper](https://aclanthology.org/2025.findings-emnlp.1077/)
[Demo](https://ez-vc.github.io/EZ-VC-Demo/)
[SPRING Lab](https://asr.iitm.ac.in/)

### Our paper has been published in the Findings of EMNLP 2025!
## Installation

### Create a separate environment if needed

```bash
# Create a Python 3.10 conda env (you could also use virtualenv)
conda create -n ez-vc python=3.10
conda activate ez-vc
```

### Local installation

```bash
git clone https://github.com/EZ-VC/EZ-VC
cd EZ-VC
git submodule update --init --recursive
pip install -e .
# Install espnet for XEUS (use exactly this fork and branch)
pip install 'espnet @ git+https://github.com/wanchichen/espnet.git@ssl'
```
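As a quick sanity check after the steps above, a minimal sketch like the following can confirm that the installed packages are discoverable in the active environment. The import names `f5_tts` (inferred from the `src/f5_tts` path in this repository) and `espnet` are assumptions, and `check_packages` is a hypothetical helper, not part of EZ-VC.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool} telling whether each top-level module can be located.

    find_spec only looks the module up; it does not import it, so the check is cheap.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names; adjust them if the installed packages differ.
    for name, found in check_packages(["f5_tts", "espnet"]).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If either name is reported as missing, re-running the corresponding `pip install` step inside the `ez-vc` environment is the first thing to try.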
## Inference

A Jupyter notebook for inference is provided at `src/f5_tts/infer/infer.ipynb`.

1. Open the [inference notebook](https://github.com/EZ-VC/EZ-VC/blob/main/src/f5_tts/infer/infer.ipynb).
2. Run all cells.
3. The converted audio is available in the last cell.
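For running the notebook without opening a browser, one option is the standard `jupyter nbconvert --execute` command. The sketch below only wraps that command. It assumes `jupyter` and `nbconvert` are installed in the environment, that it is run from the repository root, and that the notebook needs no interactive input. The helper names and the output file name are hypothetical, not part of EZ-VC.

```python
import subprocess


def build_nbconvert_command(notebook, output="infer_executed.ipynb"):
    """Build the argument list that executes a notebook and saves the executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",   # keep the result as a notebook
        "--execute", notebook,
        "--output", output,   # hypothetical name for the executed copy
    ]


def run_notebook(notebook="src/f5_tts/infer/infer.ipynb"):
    """Run every cell of the notebook; raises CalledProcessError if execution fails."""
    subprocess.run(build_nbconvert_command(notebook), check=True)
```

Calling `run_notebook()` is the headless equivalent of "Run all". The converted audio would then be in the last cell of the executed copy.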
## Acknowledgements

- [F5-TTS](https://arxiv.org/abs/2410.06885) for open-sourcing their code, which made EZ-VC possible.
## Citation

If our work and codebase are useful to you, please cite:

```
@inproceedings{joglekar-etal-2025-ez,
    title = "{EZ}-{VC}: Easy Zero-shot Any-to-Any Voice Conversion",
    author = "Joglekar, Advait and
      Singh, Divyanshu and
      Bhatia, Rooshil Rohit and
      Umesh, Srinivasan",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.1077/",
    doi = "10.18653/v1/2025.findings-emnlp.1077",
    pages = "19768--19774",
    ISBN = "979-8-89176-335-7",
    abstract = "Voice Conversion research in recent times has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual settings. They are also often unable to generalize for speakers of unseen languages and accents. In this paper, we adopt a simple yet effective approach that combines discrete speech representations from self-supervised models with a non-autoregressive Diffusion-Transformer based conditional flow matching speech decoder. We show that this architecture allows us to train a voice-conversion model in a purely textless, self-supervised fashion. Our technique works without requiring multiple encoders to disentangle speech features. Our model also manages to excel in zero-shot cross-lingual settings even for unseen languages. We provide our code, model checkpoint and demo samples here: https://github.com/ez-vc/ez-vc"
}
```
## License

Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC 4.0. Sorry for any inconvenience this may cause.