
Training / Fine Tuning Script?

#1
by snehmehta - opened

It's quite astonishing how little data and training went into this. I would like to push it further to see if it can be taken to production.
Can you help with training or fine-tuning scripts?

Same here, I am also looking for fine-tuning scripts, but I don't think a fine-tuning script is available as open source. I will share the script with you soon.

Use the speech codec to convert your audio data into speech codes. Map those codes to the token format and then you can do simple SFT training just like for text.
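A minimal sketch of the mapping step described above, assuming the codec has already produced a list of integer codes for each clip. The token format (`<|s_N|>`), the `<|speech_start|>` / `<|speech_end|>` markers, and the helper names are hypothetical placeholders, not this model's confirmed format; check the model's tokenizer for the real one.

```python
from typing import Dict, List

# Hypothetical token format; the model's real speech-token format may differ.
SPEECH_TOKEN = "<|s_{}|>"
SPEECH_START = "<|speech_start|>"
SPEECH_END = "<|speech_end|>"


def codes_to_tokens(codes: List[int]) -> str:
    """Turn integer codec codes into a string of speech tokens."""
    return "".join(SPEECH_TOKEN.format(code) for code in codes)


def build_sft_example(text: str, codes: List[int]) -> Dict[str, str]:
    """Pair a transcript (prompt) with its speech tokens (completion)."""
    prompt = f"{text}{SPEECH_START}"
    completion = f"{codes_to_tokens(codes)}{SPEECH_END}"
    return {"prompt": prompt, "completion": completion}


# The codes here are made up; in practice they come from the speech codec.
example = build_sft_example("Hello there.", [12, 7, 305])
print(example["prompt"] + example["completion"])
```

Records in this prompt/completion shape can then be fed to an ordinary text SFT pipeline, which is what makes the "just like for text" part work.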

Thank you very much for your response.
I have a question about how to add emotion tags such as sad, angry, and happy. Did you add these as special tokens, or did you apply some form of conditioning? I would love to hear your answer. One possibility is that you fine-tuned on labeled data, that is, collected audio grouped by emotion and added the emotion tags to the vocabulary.

@rumourscape please respond. I am thinking about more ideas, and your input would be a big help. I am also thinking of adding non-verbal SSML-style tags (laugh, sigh, chuckle, giggle, clears throat) with an intensity level.
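One way the tag idea could look, as a rough sketch only: it is not confirmed that the model was trained this way. The tag spellings, the 1 to 3 intensity scale, the toy vocabulary, and all helper names are assumptions for illustration. With a real Transformers tokenizer, the equivalent step would typically be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
from typing import Dict, List, Optional

EMOTIONS = ["sad", "angry", "happy"]
NONVERBAL = ["laugh", "sigh", "chuckle", "giggle", "clears_throat"]
INTENSITIES = [1, 2, 3]  # assumed scale, not from the model


def make_tag(name: str, intensity: Optional[int] = None) -> str:
    """Format a tag, e.g. <sad> or <laugh_2> (hypothetical spelling)."""
    return f"<{name}>" if intensity is None else f"<{name}_{intensity}>"


def extend_vocab(vocab: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Return a copy of vocab with unseen tokens appended at the next free IDs."""
    extended = dict(vocab)
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


new_tokens = [make_tag(e) for e in EMOTIONS] + [
    make_tag(n, i) for n in NONVERBAL for i in INTENSITIES
]
base_vocab = {"hello": 0, "world": 1}  # toy stand-in for the real vocabulary
vocab = extend_vocab(base_vocab, new_tokens)


def tag_transcript(text: str, emotion: str, vocab: Dict[str, int]) -> str:
    """Prefix a transcript with an emotion tag that exists in the vocabulary."""
    tag = make_tag(emotion)
    if tag not in vocab:
        raise ValueError(f"unknown tag: {tag}")
    return f"{tag} {text}"


print(tag_transcript("I missed the train.", "sad", vocab))
```

The newly added tokens start with untrained embeddings, so they only become meaningful after fine-tuning on audio labeled with the matching emotion or non-verbal event.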
