PlotweaverModel committed on
Commit
0ca353f
·
verified ·
1 Parent(s): a1e429b

Upload 3 files

Files changed (3)
  1. README.md +67 -7
  2. app.py +505 -0
  3. requirements.txt +3 -0
README.md CHANGED
@@ -1,13 +1,73 @@
  ---
- title: Live Commentary Streaming App
- emoji: 🌖
- colorFrom: yellow
- colorTo: green
+ title: Live Football Commentary Translator
+ emoji:
+ colorFrom: green
+ colorTo: blue
  sdk: gradio
- sdk_version: 6.14.0
- python_version: '3.13'
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
+ license: apache-2.0
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Live Football Commentary Translator
+
+ Speak (or upload) commentary in one language, hear it spoken in another.
+
+ ## What this is
+
+ A proof-of-concept HuggingFace Space that takes short audio clips of football
+ commentary and returns the same commentary in a different language, spoken with
+ appropriate energy.
+
+ - **Sources:** English, Scottish English, German, Spanish, Arabic
+ - **Targets:** English, Scottish English, German, Spanish, Arabic, Swahili,
+   Amharic, Afrikaans
+
+ ## How it works
+
+ Two pipelines, routed by target language:
+
+ | Target language | Pipeline | Cost |
+ |---|---|---|
+ | English, Scottish-EN, German, Spanish, Arabic | Single Qwen-Omni call: audio in → translated speech out | 1 API call |
+ | Swahili, Amharic, Afrikaans | Qwen-Omni (audio → translated text), then YourVoic (text → speech) | 2 API calls |
+
+ Qwen-Omni is `qwen3.5-omni-plus` on DashScope International. YourVoic is the
+ fallback for languages Qwen-Omni doesn't cover natively. This split exists
+ because Qwen-Omni does not produce intelligible speech in Swahili, Amharic,
+ or Afrikaans on its own.
+
+ ## Deploy
+
+ 1. Create a new HuggingFace Space, SDK = Gradio
+ 2. Upload `app.py`, `requirements.txt`, and this `README.md`
+ 3. Add secrets in **Settings → Variables and secrets**:
+    - `DASHSCOPE_API_KEY` (required) — get one from DashScope International
+    - `YOURVOIC_API_KEY` (required for Swahili/Amharic/Afrikaans only)
+ 4. (Recommended) Set hardware to **ZeroGPU** if you have access. CPU also works
+    but will be slower on the audio-decode steps.
+
+ ## Expected latency
+
+ On free ZeroGPU, expect 3-8 seconds from end-of-speech to start-of-output. The
+ demo is designed to feel "live-ish" but not simultaneous-interpretation grade.
+ Speak in short bursts — one play, one tackle, one moment — rather than long
+ monologues.
+
+ ## Known limitations
+
+ - "Scottish English" is treated as accented English in the system prompt rather
+   than a separate language. Qwen-Omni's Scottish accent is decent but not
+   authentic.
+ - YourVoic voice support per language is sparsely documented. The code falls
+   back to a universal voice ("Peter") if the primary choice fails.
+ - Arabic voice cloning is intentionally not exposed — the underlying
+   `qwen3-tts-vc` model doesn't support Arabic.
+ - Free-tier ZeroGPU has cold-start delays. First call after idle is slower.
+
+ ## Files
+
+ - `app.py` — Gradio UI and pipeline
+ - `requirements.txt` — Python dependencies
+ - `README.md` — this file (also the Space metadata header)
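
The routing table in the README's "How it works" section amounts to a lookup keyed on the target language. The sketch below is illustrative only: `plan_pipeline`, the two set names, and the step labels are hypothetical and are not identifiers from this commit.

```python
# Hypothetical illustration of the README routing table; all names are assumptions.
QWEN_NATIVE = {"English", "Scottish English", "German", "Spanish", "Arabic"}
YOURVOIC_FALLBACK = {"Swahili", "Amharic", "Afrikaans"}


def plan_pipeline(target: str) -> list[str]:
    """Return the ordered API calls needed to reach a target language."""
    if target in QWEN_NATIVE:
        # One call: Qwen-Omni takes audio in and returns translated speech.
        return ["qwen-omni: audio -> translated speech"]
    if target in YOURVOIC_FALLBACK:
        # Two calls: Qwen-Omni translates to text, YourVoic speaks it.
        return ["qwen-omni: audio -> translated text",
                "yourvoic: text -> speech"]
    raise ValueError(f"unsupported target language: {target}")


if __name__ == "__main__":
    for lang in ("German", "Swahili"):
        steps = plan_pipeline(lang)
        print(f"{lang}: {len(steps)} API call(s)")
```

Raising on an unknown language, rather than defaulting to one pipeline, keeps a misspelled target from silently costing an extra API call.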
app.py ADDED
@@ -0,0 +1,505 @@
+ """
+ Live Football Commentary Translator
+ ====================================
+ Audio in (live commentator) -> Translate -> Audio out (target language).
+
+ Architecture:
+ - Qwen-Omni (qwen3.5-omni-plus) handles audio-in -> translated-speech-out
+   in ONE call for languages it covers (English, German, Spanish, Arabic,
+   Scottish-accented English).
+ - For African target languages (Swahili, Amharic, Afrikaans), Qwen-Omni
+   does audio -> translated text, then YourVoic does text -> speech.
+
+ Deploy as a Hugging Face Space (SDK: Gradio). Add these secrets:
+ - DASHSCOPE_API_KEY (required, for Qwen-Omni)
+ - YOURVOIC_API_KEY (required for Swahili/Amharic/Afrikaans targets)
+ """
+
+ import os
+ import base64
+ import json
+ import struct
+ import subprocess
+ import tempfile
+ import time
+ import uuid
+
+ import gradio as gr
+ import requests as http_requests
+ from openai import OpenAI
+
+ # ==========================================
+ # CONFIGURATION
+ # ==========================================
+ OMNI_MODEL = "qwen3.5-omni-plus"
+ DASHSCOPE_BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
+
+ YOURVOIC_TTS_URL = "https://yourvoic.com/api/v1/tts/generate"
+
+ # ==========================================
+ # LANGUAGES
+ # ==========================================
+ # Sources: what the commentator speaks
+ SOURCE_LANGUAGES = {
+     "English": {"code": "en", "omni_hint": "English"},
+     "Scottish English": {"code": "en-scot", "omni_hint": "Scottish-accented English"},
+     "German": {"code": "de", "omni_hint": "German"},
+     "Spanish": {"code": "es", "omni_hint": "Spanish"},
+     "Arabic": {"code": "ar", "omni_hint": "Arabic"},
+ }
+
+ # Targets: routed by engine
+ # "qwen" -> Qwen-Omni does audio-in -> translated-speech-out in one call
+ # "yourvoic" -> Qwen-Omni does audio-in -> translated-text, YourVoic speaks it
+ TARGET_LANGUAGES = {
+     "English": {"engine": "qwen", "omni_hint": "English"},
+     "Scottish English": {"engine": "qwen", "omni_hint": "Scottish-accented English"},
+     "German": {"engine": "qwen", "omni_hint": "German"},
+     "Spanish": {"engine": "qwen", "omni_hint": "Spanish"},
+     "Arabic": {"engine": "qwen", "omni_hint": "Arabic"},
+     "Swahili": {"engine": "yourvoic", "omni_hint": "Swahili", "yourvoic_lang": "sw-KE"},
+     "Amharic": {"engine": "yourvoic", "omni_hint": "Amharic", "yourvoic_lang": "am-ET"},
+     "Afrikaans": {"engine": "yourvoic", "omni_hint": "Afrikaans", "yourvoic_lang": "af-ZA"},
+ }
+
+ # Voice options for Qwen-Omni targets (sticking to ones that read well for commentary)
+ QWEN_VOICES = [
+     "Ethan -- Warm, energetic (good default)",
+     "Ryan -- Dramatic, rhythmic (good for live action)",
+     "Cherry -- Sunny, friendly",
+     "Jennifer -- Cinematic narrator",
+     "Vincent -- Rich, theatrical",
+     "Bellona -- Strong, commanding",
+ ]
+
+ # YourVoic voices per target language (best effort; YourVoic docs are sparse,
+ # so 'Peter' is kept as a universal fallback as in the reference codebase).
+ YOURVOIC_VOICE_MAP = {
+     "Swahili": ["Peter"],
+     "Amharic": ["Peter"],
+     "Afrikaans": ["Peter"],
+ }
+
+ YOURVOIC_MODEL = "aura-prime"  # balanced quality/speed
+
+ # ==========================================
+ # HELPERS
+ # ==========================================
+ def voice_name(label: str) -> str:
+     return label.split("--")[0].strip()
+
+
+ def base64_to_wav(b64_data: str, output_path: str) -> None:
+     """Qwen-Omni returns base64 PCM. Wrap it in a WAV container."""
+     audio_bytes = base64.b64decode(b64_data)
+     sr, nc, bps = 24000, 1, 16
+     br = sr * nc * bps // 8
+     ba = nc * bps // 8
+     ds = len(audio_bytes)
+     with open(output_path, "wb") as f:
+         f.write(b"RIFF")
+         f.write(struct.pack("<I", 36 + ds))
+         f.write(b"WAVE")
+         f.write(b"fmt ")
+         f.write(struct.pack("<I", 16))
+         f.write(struct.pack("<H", 1))
+         f.write(struct.pack("<H", nc))
+         f.write(struct.pack("<I", sr))
+         f.write(struct.pack("<I", br))
+         f.write(struct.pack("<H", ba))
+         f.write(struct.pack("<H", bps))
+         f.write(b"data")
+         f.write(struct.pack("<I", ds))
+         f.write(audio_bytes)
+
+
+ def normalize_audio_input(input_path: str, out_dir: str) -> str:
+     """Convert mic/upload input to 16kHz mono WAV (what Omni expects).
+     Returns path to normalized file."""
+     out_path = os.path.join(out_dir, f"in_{uuid.uuid4().hex[:8]}.wav")
+     subprocess.run(
+         ["ffmpeg", "-y", "-i", input_path,
+          "-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", out_path],
+         capture_output=True, check=True,
+     )
+     return out_path
+
+
+ def audio_file_to_data_uri(path: str) -> str:
+     b64 = base64.b64encode(open(path, "rb").read()).decode()
+     return f"data:audio/wav;base64,{b64}"
+
+
+ # ==========================================
+ # CORE: Qwen-Omni audio -> translated speech (one call)
+ # ==========================================
+ def omni_audio_to_speech(client: OpenAI,
+                          audio_path: str,
+                          source_hint: str,
+                          target_hint: str,
+                          voice: str,
+                          out_dir: str) -> tuple:
+     """Qwen-Omni: take source-language audio, output translated-language speech.
+     Returns (wav_path, transcript_text, error_or_None)."""
+
+     audio_uri = audio_file_to_data_uri(audio_path)
+
+     sys_prompt = (
+         f"You are a live football commentary translator. "
+         f"The user will speak in {source_hint}. "
+         f"Listen carefully and respond by speaking the equivalent commentary in {target_hint}. "
+         f"Match the energy and excitement of live football commentary. "
+         f"Keep the same meaning. Do NOT add commentary of your own. "
+         f"Respond ONLY with the spoken {target_hint} translation."
+     )
+
+     try:
+         completion = client.chat.completions.create(
+             model=OMNI_MODEL,
+             messages=[
+                 {"role": "system", "content": sys_prompt},
+                 {"role": "user", "content": [
+                     {"type": "input_audio",
+                      "input_audio": {"data": audio_uri, "format": "wav"}},
+                     {"type": "text",
+                      "text": f"Translate this commentary into {target_hint} and speak it."},
+                 ]},
+             ],
+             modalities=["text", "audio"],
+             audio={"voice": voice, "format": "wav"},
+             stream=True,
+             stream_options={"include_usage": True},
+         )
+
+         audio_parts, text_parts = [], []
+         for event in completion:
+             if not event.choices:
+                 continue
+             delta = event.choices[0].delta
+             if hasattr(delta, "content") and delta.content:
+                 text_parts.append(delta.content)
+             if hasattr(delta, "audio") and delta.audio:
+                 if isinstance(delta.audio, dict) and "data" in delta.audio:
+                     audio_parts.append(delta.audio["data"])
+                 elif hasattr(delta.audio, "data") and delta.audio.data:
+                     audio_parts.append(delta.audio.data)
+
+         transcript = "".join(text_parts).strip()
+         if not audio_parts:
+             return None, transcript, "No audio received from Qwen-Omni"
+
+         out_wav = os.path.join(out_dir, f"out_{uuid.uuid4().hex[:8]}.wav")
+         base64_to_wav("".join(audio_parts), out_wav)
+         return out_wav, transcript, None
+
+     except Exception as e:
+         return None, "", f"Qwen-Omni error: {e}"
+
+
+ # ==========================================
+ # CORE: Qwen-Omni audio -> translated text (for YourVoic pipeline)
+ # ==========================================
+ def omni_audio_to_text(client: OpenAI,
+                        audio_path: str,
+                        source_hint: str,
+                        target_hint: str) -> tuple:
+     """Audio in source language -> text in target language."""
+     audio_uri = audio_file_to_data_uri(audio_path)
+
+     sys_prompt = (
+         f"You are a translator. The user will speak in {source_hint}. "
+         f"Translate what they say into {target_hint}. "
+         f"Output ONLY the {target_hint} translation as plain text. No commentary, no quotes."
+     )
+
+     try:
+         completion = client.chat.completions.create(
+             model=OMNI_MODEL,
+             messages=[
+                 {"role": "system", "content": sys_prompt},
+                 {"role": "user", "content": [
+                     {"type": "input_audio",
+                      "input_audio": {"data": audio_uri, "format": "wav"}},
+                     {"type": "text",
+                      "text": f"Translate into {target_hint}."},
+                 ]},
+             ],
+             modalities=["text"],
+         )
+         text = completion.choices[0].message.content.strip()
+         return text, None
+     except Exception as e:
+         return "", f"Qwen-Omni translation error: {e}"
+
+
+ # ==========================================
+ # CORE: YourVoic text -> speech
+ # ==========================================
+ def yourvoic_speak(text: str,
+                    target_language: str,
+                    target_config: dict,
+                    api_key: str,
+                    out_dir: str) -> tuple:
+     """Call YourVoic to synthesize speech for African target languages.
+     Returns (wav_path, error_or_None)."""
+     yourvoic_lang = target_config["yourvoic_lang"]
+     voices_to_try = list(YOURVOIC_VOICE_MAP.get(target_language, ["Peter"]))
+     if "Peter" not in voices_to_try:
+         voices_to_try.append("Peter")  # universal fallback
+
+     last_error = None
+     for voice in voices_to_try:
+         payload = {
+             "text": text,
+             "voice": voice,
+             "language": yourvoic_lang,
+             "model": YOURVOIC_MODEL,
+             "speed": 1.0,
+         }
+         try:
+             resp = http_requests.post(
+                 YOURVOIC_TTS_URL,
+                 json=payload,
+                 headers={"X-API-Key": api_key, "Content-Type": "application/json"},
+                 timeout=60,
+             )
+             if resp.status_code != 200:
+                 last_error = f"YourVoic {resp.status_code}: {resp.text[:200]}"
+                 # Try next voice only if it's a voice-name issue
+                 if "voice" in resp.text.lower() or resp.status_code == 400:
+                     continue
+                 return None, last_error
+
+             # Save audio (MP3 or WAV), then normalize to WAV
+             ext = "mp3" if "mp3" in resp.headers.get("Content-Type", "").lower() else "wav"
+             raw_path = os.path.join(out_dir, f"yv_{uuid.uuid4().hex[:8]}.{ext}")
+
+             ctype = resp.headers.get("Content-Type", "")
+             if "application/json" in ctype:
+                 data = resp.json()
+                 audio_url = data.get("audio_url") or data.get("url")
+                 if not audio_url:
+                     return None, "No audio URL in YourVoic response"
+                 audio_resp = http_requests.get(audio_url, timeout=60)
+                 with open(raw_path, "wb") as f:
+                     f.write(audio_resp.content)
+             else:
+                 with open(raw_path, "wb") as f:
+                     f.write(resp.content)
+
+             wav_path = os.path.join(out_dir, f"yv_{uuid.uuid4().hex[:8]}.wav")
+             subprocess.run(
+                 ["ffmpeg", "-y", "-i", raw_path,
+                  "-ar", "24000", "-ac", "1", "-acodec", "pcm_s16le", wav_path],
+                 capture_output=True, check=True,
+             )
+             return wav_path, None
+
+         except Exception as e:
+             last_error = f"YourVoic exception: {e}"
+             continue
+
+     return None, last_error or "YourVoic failed for all candidate voices"
+
+
+ # ==========================================
+ # PIPELINE
+ # ==========================================
+ def translate_chunk(audio_input,
+                     source_language: str,
+                     target_language: str,
+                     qwen_voice_label: str):
+     """Main pipeline. Takes an audio file (mic or upload), returns translated audio.
+     Yields (audio_path, status_markdown, transcript_text) so the UI updates as work happens."""
+
+     t0 = time.time()
+
+     if audio_input is None:
+         yield None, "**Status:** no audio provided.", ""
+         return
+
+     ds_key = os.environ.get("DASHSCOPE_API_KEY", "")
+     if not ds_key:
+         yield None, "**Error:** `DASHSCOPE_API_KEY` not set in Space secrets.", ""
+         return
+
+     src_config = SOURCE_LANGUAGES[source_language]
+     tgt_config = TARGET_LANGUAGES[target_language]
+
+     client = OpenAI(api_key=ds_key, base_url=DASHSCOPE_BASE_URL)
+     work_dir = tempfile.mkdtemp(prefix="commentary_")
+
+     try:
+         yield None, "**Status:** normalizing input audio...", ""
+         norm_path = normalize_audio_input(audio_input, work_dir)
+
+         engine = tgt_config["engine"]
+
+         if engine == "qwen":
+             # Single Omni call: audio in -> translated speech out
+             yield None, f"**Status:** translating {source_language} -> {target_language} (Qwen-Omni, one call)...", ""
+             voice = voice_name(qwen_voice_label)
+             out_wav, transcript, err = omni_audio_to_speech(
+                 client, norm_path,
+                 src_config["omni_hint"], tgt_config["omni_hint"],
+                 voice, work_dir,
+             )
+             if err:
+                 yield None, f"**Error:** {err}", transcript
+                 return
+
+             elapsed = time.time() - t0
+             yield out_wav, (
+                 f"**Done in {elapsed:.1f}s** — {source_language} → {target_language} "
+                 f"via Qwen-Omni (voice: {voice})"
+             ), transcript
+
+         elif engine == "yourvoic":
+             # Two-step: Omni audio->text, then YourVoic text->speech
+             yv_key = os.environ.get("YOURVOIC_API_KEY", "")
+             if not yv_key:
+                 yield None, "**Error:** `YOURVOIC_API_KEY` not set (required for Swahili/Amharic/Afrikaans).", ""
+                 return
+
+             yield None, f"**Status:** {source_language} audio → {target_language} text via Qwen-Omni...", ""
+             translated_text, err = omni_audio_to_text(
+                 client, norm_path,
+                 src_config["omni_hint"], tgt_config["omni_hint"],
+             )
+             if err or not translated_text:
+                 yield None, f"**Error:** {err or 'empty translation'}", translated_text
+                 return
+
+             yield None, f"**Status:** speaking {target_language} via YourVoic...", translated_text
+             out_wav, yv_err = yourvoic_speak(
+                 translated_text, target_language, tgt_config, yv_key, work_dir,
+             )
+             if yv_err:
+                 yield None, f"**Error:** {yv_err}", translated_text
+                 return
+
+             elapsed = time.time() - t0
+             yield out_wav, (
+                 f"**Done in {elapsed:.1f}s** — {source_language} → {target_language} "
+                 f"via Qwen-Omni (translate) + YourVoic (speak)"
+             ), translated_text
+
+         else:
+             yield None, f"**Error:** unknown engine '{engine}' for {target_language}", ""
+
+     except subprocess.CalledProcessError as e:
+         yield None, f"**Error:** ffmpeg failed: {e.stderr.decode()[:300] if e.stderr else e}", ""
+     except Exception as e:
+         yield None, f"**Error:** {e}", ""
+
+
+ # ==========================================
+ # GRADIO UI
+ # ==========================================
+ DESCRIPTION = """
+ # Live Football Commentary Translator
+
+ Speak (or upload) commentary in one language — hear it in another.
+
+ **Sources:** English, Scottish English, German, Spanish, Arabic
+ **Targets:** all of the above + Swahili, Amharic, Afrikaans
+
+ Latency on free ZeroGPU: ~3-8 seconds per utterance. Speak in short bursts (one play at a time).
+ """
+
+
+ def on_target_change(target_lang_choice):
+     """Show Qwen voice picker only for Qwen-target languages."""
+     cfg = TARGET_LANGUAGES.get(target_lang_choice, {})
+     if cfg.get("engine") == "qwen":
+         return gr.update(visible=True)
+     return gr.update(visible=False)
+
+
+ with gr.Blocks(title="Live Football Commentary Translator") as demo:
+     gr.Markdown(DESCRIPTION)
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             gr.Markdown("### 1. Languages")
+             source_lang = gr.Dropdown(
+                 choices=list(SOURCE_LANGUAGES.keys()),
+                 value="English",
+                 label="Source (what the commentator speaks)",
+             )
+             target_lang = gr.Dropdown(
+                 choices=list(TARGET_LANGUAGES.keys()),
+                 value="Swahili",
+                 label="Target (what you want to hear)",
+             )
+             qwen_voice = gr.Dropdown(
+                 choices=QWEN_VOICES,
+                 value=QWEN_VOICES[0],
+                 label="Voice (Qwen targets only)",
+                 visible=False,
+             )
+
+             gr.Markdown("### 2. Input")
+             with gr.Tabs():
+                 with gr.Tab("Live microphone"):
+                     mic_input = gr.Audio(
+                         sources=["microphone"],
+                         type="filepath",
+                         label="Speak your commentary (short bursts, 5-15s each)",
+                     )
+                     mic_btn = gr.Button("Translate microphone clip", variant="primary")
+                 with gr.Tab("Upload file"):
+                     file_input = gr.Audio(
+                         sources=["upload"],
+                         type="filepath",
+                         label="Upload an audio clip (.wav, .mp3, .m4a, etc.)",
+                     )
+                     file_btn = gr.Button("Translate uploaded clip", variant="primary")
+
+         with gr.Column(scale=1):
+             gr.Markdown("### 3. Translated commentary")
+             status = gr.Markdown(value="*Waiting for input...*")
+             audio_output = gr.Audio(
+                 label="Translated audio",
+                 type="filepath",
+                 autoplay=True,
+             )
+             transcript = gr.Textbox(
+                 label="Translated text (when available)",
+                 lines=4,
+                 interactive=False,
+             )
+
+     # Show/hide Qwen voice picker based on target language
+     target_lang.change(
+         fn=on_target_change,
+         inputs=target_lang,
+         outputs=qwen_voice,
+     )
+     demo.load(
+         fn=on_target_change,
+         inputs=target_lang,
+         outputs=qwen_voice,
+     )
+
+     mic_btn.click(
+         fn=translate_chunk,
+         inputs=[mic_input, source_lang, target_lang, qwen_voice],
+         outputs=[audio_output, status, transcript],
+     )
+     file_btn.click(
+         fn=translate_chunk,
+         inputs=[file_input, source_lang, target_lang, qwen_voice],
+         outputs=[audio_output, status, transcript],
+     )
+
+     gr.Markdown(
+         "---\n"
+         "**Architecture note:** Qwen-Omni (`qwen3.5-omni-plus`) handles audio-in → "
+         "translated-speech-out in a single call for languages it covers. "
+         "For Swahili / Amharic / Afrikaans, Omni translates to text and YourVoic speaks it."
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch()
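
The `base64_to_wav` helper in `app.py` writes the 44-byte RIFF header by hand. The same container can also be produced with Python's standard-library `wave` module. The sketch below assumes the 24 kHz, mono, 16-bit PCM layout that `app.py` hard-codes. `pcm_b64_to_wav_bytes` is a hypothetical name and is not part of this commit.

```python
import base64
import io
import wave


def pcm_b64_to_wav_bytes(b64_data: str,
                         sample_rate: int = 24000,
                         channels: int = 1,
                         sample_width: int = 2) -> bytes:
    """Wrap base64-encoded raw PCM in a WAV container and return the bytes.

    Defaults mirror the values hard-coded in app.py (assumed, not verified
    against the API): 24 kHz, mono, 2 bytes (16 bits) per sample.
    """
    pcm = base64.b64decode(b64_data)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)            # header sizes are patched on close
    return buf.getvalue()


if __name__ == "__main__":
    # 2400 frames of 16-bit silence, i.e. 0.1 s at 24 kHz.
    silence = base64.b64encode(b"\x00\x00" * 2400).decode()
    data = pcm_b64_to_wav_bytes(silence)
    with wave.open(io.BytesIO(data), "rb") as check:
        print(check.getframerate(), check.getnchannels(), check.getnframes())
```

Returning bytes instead of writing to a path makes the helper easy to test. A caller that needs a file can write the result to disk in one step.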
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio>=4.44.0
+ openai>=1.40.0
+ requests>=2.31.0
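
`app.py` shells out to `ffmpeg`, which is a system binary and so cannot be listed in `requirements.txt`. This commit does not say whether the Space image already provides it. A minimal startup check might look like the sketch below, where `missing_binaries` is a hypothetical helper and is not part of this commit.

```python
import shutil


def missing_binaries(names, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries(["ffmpeg"])
    if missing:
        print("missing system binaries:", ", ".join(missing))
    else:
        print("all required binaries found")
```

Running the check once at import time would surface a missing `ffmpeg` as a clear message at startup, before the first translation request reaches `subprocess.run`.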