Spaces: PlotweaverModel / Live_Commentary_Streaming_App

PlotweaverModel committed 13 days ago

Commit

0ca353f

1 Parent(s): a1e429b

Upload 3 files

Files changed (3)

README.md +67 -7
app.py +505 -0
requirements.txt +3 -0

README.md CHANGED

@@ -1,13 +1,73 @@
 ---
-title: Live Commentary Streaming App
-emoji: 🌖
-colorFrom: yellow
-colorTo: green
 sdk: gradio
-sdk_version: 6.14.0
-python_version: '3.13'
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Live Football Commentary Translator
+emoji: ⚽
+colorFrom: green
+colorTo: blue
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: apache-2.0
 ---
+# Live Football Commentary Translator
+Speak (or upload) commentary in one language, hear it spoken in another.
+## What this is
+A proof-of-concept Hugging Face Space that takes short audio clips of football
+commentary and returns the same commentary in a different language, spoken with
+appropriate energy.
+- **Sources:** English, Scottish English, German, Spanish, Arabic
+- **Targets:** English, Scottish English, German, Spanish, Arabic, Swahili,
+  Amharic, Afrikaans
+## How it works
+Two pipelines, routed by target language:
+| Target language | Pipeline | Cost |
+|---|---|---|
+| English, Scottish-EN, German, Spanish, Arabic | Single Qwen-Omni call: audio in → translated speech out | 1 API call |
+| Swahili, Amharic, Afrikaans | Qwen-Omni (audio → translated text), then YourVoic (text → speech) | 2 API calls |
+Qwen-Omni is `qwen3.5-omni-plus` on DashScope International. YourVoic is the
+fallback for languages Qwen-Omni doesn't cover natively. This split exists
+because Qwen-Omni does not produce intelligible speech in Swahili, Amharic,
+or Afrikaans on its own.
+## Deploy
+1. Create a new Hugging Face Space, SDK = Gradio
+2. Upload `app.py`, `requirements.txt`, and this `README.md`
+3. Add secrets in **Settings → Variables and secrets**:
+   - `DASHSCOPE_API_KEY` (required) — get one from DashScope International
+   - `YOURVOIC_API_KEY` (required for Swahili/Amharic/Afrikaans only)
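+4. (Optional) Set hardware to **ZeroGPU** if you have access. Note that `app.py`
+   runs no local model: translation and speech come from remote APIs, and the
+   only local step is `ffmpeg` audio conversion, which runs on the CPU either way.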
+## Expected latency
+On free ZeroGPU, expect 3-8 seconds from end-of-speech to start-of-output. The
+demo is designed to feel "live-ish" but not simultaneous-interpretation grade.
+Speak in short bursts — one play, one tackle, one moment — rather than long
+monologues.
+## Known limitations
+- "Scottish English" is treated as accented English in the system prompt rather
+  than a separate language. Qwen-Omni's Scottish accent is decent but not
+  authentic.
+- YourVoic voice support per language is sparsely documented. In this commit
+  every YourVoic target maps to the single voice "Peter", which is also appended
+  as the universal fallback if other voices are added to the map later.
+- Arabic voice cloning is intentionally not exposed — the underlying
+  `qwen3-tts-vc` model doesn't support Arabic.
+- Free-tier ZeroGPU has cold-start delays. First call after idle is slower.
+## Files
+- `app.py` — Gradio UI and pipeline
+- `requirements.txt` — Python dependencies
+- `README.md` — this file (also the Space metadata header)
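
Reviewer sketch (not part of the commit): the routing in the README's "How it works" table reduces to a lookup on the target language. The mapping below is a trimmed copy of `TARGET_LANGUAGES` from `app.py`; `ENGINE_BY_TARGET` and `plan_calls` are hypothetical names introduced only for illustration.

```python
# Trimmed copy of the engine routing in app.py.
# ENGINE_BY_TARGET and plan_calls are hypothetical names, not part of the commit.
ENGINE_BY_TARGET = {
    "English": "qwen",
    "Scottish English": "qwen",
    "German": "qwen",
    "Spanish": "qwen",
    "Arabic": "qwen",
    "Swahili": "yourvoic",
    "Amharic": "yourvoic",
    "Afrikaans": "yourvoic",
}


def plan_calls(target_language: str) -> list[str]:
    """Return the ordered API calls needed for a target language."""
    engine = ENGINE_BY_TARGET[target_language]  # KeyError for unsupported targets
    if engine == "qwen":
        return ["qwen-omni: audio -> translated speech"]
    return ["qwen-omni: audio -> translated text", "yourvoic: text -> speech"]


if __name__ == "__main__":
    for lang in ("German", "Swahili"):
        print(lang, plan_calls(lang))
```

Keeping the engine choice in data rather than in branches means a new target language only needs a new table row, which appears to be the intent of the `engine` field in `app.py`.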

app.py ADDED

	@@ -0,0 +1,505 @@

+"""
+Live Football Commentary Translator
+====================================
+Audio in (live commentator) -> Translate -> Audio out (target language).
+Architecture:
+  - Qwen-Omni (qwen3.5-omni-plus) handles audio-in -> translated-speech-out
+    in ONE call for languages it covers (English, German, Spanish, Arabic,
+    Scottish-accented English).
+  - For African target languages (Swahili, Amharic, Afrikaans), Qwen-Omni
+    does audio -> translated text, then YourVoic does text -> speech.
+Deploy as a Hugging Face Space (SDK: Gradio). Add these secrets:
+  - DASHSCOPE_API_KEY  (required, for Qwen-Omni)
+  - YOURVOIC_API_KEY   (required for Swahili/Amharic/Afrikaans targets)
+"""
+import os
+import base64
+import json
+import struct
+import subprocess
+import tempfile
+import time
+import uuid
+import gradio as gr
+import requests as http_requests
+from openai import OpenAI
+# ==========================================
+# CONFIGURATION
+# ==========================================
+OMNI_MODEL = "qwen3.5-omni-plus"
+DASHSCOPE_BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
+YOURVOIC_TTS_URL = "https://yourvoic.com/api/v1/tts/generate"
+# ==========================================
+# LANGUAGES
+# ==========================================
+# Sources: what the commentator speaks
+SOURCE_LANGUAGES = {
+    "English":              {"code": "en", "omni_hint": "English"},
+    "Scottish English":     {"code": "en-scot", "omni_hint": "Scottish-accented English"},
+    "German":               {"code": "de", "omni_hint": "German"},
+    "Spanish":              {"code": "es", "omni_hint": "Spanish"},
+    "Arabic":               {"code": "ar", "omni_hint": "Arabic"},
+}
+# Targets: routed by engine
+# "qwen"     -> Qwen-Omni does audio-in -> translated-speech-out in one call
+# "yourvoic" -> Qwen-Omni does audio-in -> translated-text, YourVoic speaks it
+TARGET_LANGUAGES = {
+    "English":           {"engine": "qwen",     "omni_hint": "English"},
+    "Scottish English":  {"engine": "qwen",     "omni_hint": "Scottish-accented English"},
+    "German":            {"engine": "qwen",     "omni_hint": "German"},
+    "Spanish":           {"engine": "qwen",     "omni_hint": "Spanish"},
+    "Arabic":            {"engine": "qwen",     "omni_hint": "Arabic"},
+    "Swahili":           {"engine": "yourvoic", "omni_hint": "Swahili",   "yourvoic_lang": "sw-KE"},
+    "Amharic":           {"engine": "yourvoic", "omni_hint": "Amharic",   "yourvoic_lang": "am-ET"},
+    "Afrikaans":         {"engine": "yourvoic", "omni_hint": "Afrikaans", "yourvoic_lang": "af-ZA"},
+}
+# Voice options for Qwen-Omni targets (sticking to ones that read well for commentary)
+QWEN_VOICES = [
+    "Ethan -- Warm, energetic (good default)",
+    "Ryan -- Dramatic, rhythmic (good for live action)",
+    "Cherry -- Sunny, friendly",
+    "Jennifer -- Cinematic narrator",
+    "Vincent -- Rich, theatrical",
+    "Bellona -- Strong, commanding",
+]
+# YourVoic voices per target language (best effort; YourVoic docs are sparse,
+# so 'Peter' is kept as a universal fallback as in the reference codebase).
+YOURVOIC_VOICE_MAP = {
+    "Swahili":   ["Peter"],
+    "Amharic":   ["Peter"],
+    "Afrikaans": ["Peter"],
+}
+YOURVOIC_MODEL = "aura-prime"  # balanced quality/speed
+# ==========================================
+# HELPERS
+# ==========================================
+def voice_name(label: str) -> str:
+    return label.split("--")[0].strip()
+def base64_to_wav(b64_data: str, output_path: str) -> None:
+    """Qwen-Omni returns base64 PCM. Wrap it in a WAV container."""
+    audio_bytes = base64.b64decode(b64_data)
+    sr, nc, bps = 24000, 1, 16
+    br = sr * nc * bps // 8
+    ba = nc * bps // 8
+    ds = len(audio_bytes)
+    with open(output_path, "wb") as f:
+        f.write(b"RIFF")
+        f.write(struct.pack("<I", 36 + ds))
+        f.write(b"WAVE")
+        f.write(b"fmt ")
+        f.write(struct.pack("<I", 16))
+        f.write(struct.pack("<H", 1))
+        f.write(struct.pack("<H", nc))
+        f.write(struct.pack("<I", sr))
+        f.write(struct.pack("<I", br))
+        f.write(struct.pack("<H", ba))
+        f.write(struct.pack("<H", bps))
+        f.write(b"data")
+        f.write(struct.pack("<I", ds))
+        f.write(audio_bytes)
+def normalize_audio_input(input_path: str, out_dir: str) -> str:
+    """Convert mic/upload input to 16kHz mono WAV (what Omni expects).
+    Returns path to normalized file."""
+    out_path = os.path.join(out_dir, f"in_{uuid.uuid4().hex[:8]}.wav")
+    subprocess.run(
+        ["ffmpeg", "-y", "-i", input_path,
+         "-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", out_path],
+        capture_output=True, check=True,
+    )
+    return out_path
+def audio_file_to_data_uri(path: str) -> str:
+    # Read inside a context manager so the file handle is closed promptly.
+    with open(path, "rb") as f:
+        b64 = base64.b64encode(f.read()).decode()
+    return f"data:audio/wav;base64,{b64}"
+# ==========================================
+# CORE: Qwen-Omni audio -> translated speech (one call)
+# ==========================================
+def omni_audio_to_speech(client: OpenAI,
+                         audio_path: str,
+                         source_hint: str,
+                         target_hint: str,
+                         voice: str,
+                         out_dir: str) -> tuple:
+    """Qwen-Omni: take source-language audio, output translated-language speech.
+    Returns (wav_path, transcript_text, error_or_None)."""
+    audio_uri = audio_file_to_data_uri(audio_path)
+    sys_prompt = (
+        f"You are a live football commentary translator. "
+        f"The user will speak in {source_hint}. "
+        f"Listen carefully and respond by speaking the equivalent commentary in {target_hint}. "
+        f"Match the energy and excitement of live football commentary. "
+        f"Keep the same meaning. Do NOT add commentary of your own. "
+        f"Respond ONLY with the spoken {target_hint} translation."
+    )
+    try:
+        completion = client.chat.completions.create(
+            model=OMNI_MODEL,
+            messages=[
+                {"role": "system", "content": sys_prompt},
+                {"role": "user", "content": [
+                    {"type": "input_audio",
+                     "input_audio": {"data": audio_uri, "format": "wav"}},
+                    {"type": "text",
+                     "text": f"Translate this commentary into {target_hint} and speak it."},
+                ]},
+            ],
+            modalities=["text", "audio"],
+            audio={"voice": voice, "format": "wav"},
+            stream=True,
+            stream_options={"include_usage": True},
+        )
+        audio_parts, text_parts = [], []
+        for event in completion:
+            if not event.choices:
+                continue
+            delta = event.choices[0].delta
+            if hasattr(delta, "content") and delta.content:
+                text_parts.append(delta.content)
+            if hasattr(delta, "audio") and delta.audio:
+                if isinstance(delta.audio, dict) and "data" in delta.audio:
+                    audio_parts.append(delta.audio["data"])
+                elif hasattr(delta.audio, "data") and delta.audio.data:
+                    audio_parts.append(delta.audio.data)
+        transcript = "".join(text_parts).strip()
+        if not audio_parts:
+            return None, transcript, "No audio received from Qwen-Omni"
+        out_wav = os.path.join(out_dir, f"out_{uuid.uuid4().hex[:8]}.wav")
+        base64_to_wav("".join(audio_parts), out_wav)
+        return out_wav, transcript, None
+    except Exception as e:
+        return None, "", f"Qwen-Omni error: {e}"
+# ==========================================
+# CORE: Qwen-Omni audio -> translated text (for YourVoic pipeline)
+# ==========================================
+def omni_audio_to_text(client: OpenAI,
+                       audio_path: str,
+                       source_hint: str,
+                       target_hint: str) -> tuple:
+    """Audio in source language -> text in target language."""
+    audio_uri = audio_file_to_data_uri(audio_path)
+    sys_prompt = (
+        f"You are a translator. The user will speak in {source_hint}. "
+        f"Translate what they say into {target_hint}. "
+        f"Output ONLY the {target_hint} translation as plain text. No commentary, no quotes."
+    )
+    try:
+        completion = client.chat.completions.create(
+            model=OMNI_MODEL,
+            messages=[
+                {"role": "system", "content": sys_prompt},
+                {"role": "user", "content": [
+                    {"type": "input_audio",
+                     "input_audio": {"data": audio_uri, "format": "wav"}},
+                    {"type": "text",
+                     "text": f"Translate into {target_hint}."},
+                ]},
+            ],
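+            modalities=["text"],
+            # Stream the reply: DashScope has documented streaming as required
+            # for Qwen-Omni models, and streaming is harmless if it is not.
+            stream=True,
+        )
+        text = "".join(
+            chunk.choices[0].delta.content or ""
+            for chunk in completion
+            if chunk.choices
+        ).strip()
+        return text, None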
+    except Exception as e:
+        return "", f"Qwen-Omni translation error: {e}"
+# ==========================================
+# CORE: YourVoic text -> speech
+# ==========================================
+def yourvoic_speak(text: str,
+                   target_language: str,
+                   target_config: dict,
+                   api_key: str,
+                   out_dir: str) -> tuple:
+    """Call YourVoic to synthesize speech for African target languages.
+    Returns (wav_path, error_or_None)."""
+    yourvoic_lang = target_config["yourvoic_lang"]
+    voices_to_try = list(YOURVOIC_VOICE_MAP.get(target_language, ["Peter"]))
+    if "Peter" not in voices_to_try:
+        voices_to_try.append("Peter")  # universal fallback
+    last_error = None
+    for voice in voices_to_try:
+        payload = {
+            "text": text,
+            "voice": voice,
+            "language": yourvoic_lang,
+            "model": YOURVOIC_MODEL,
+            "speed": 1.0,
+        }
+        try:
+            resp = http_requests.post(
+                YOURVOIC_TTS_URL,
+                json=payload,
+                headers={"X-API-Key": api_key, "Content-Type": "application/json"},
+                timeout=60,
+            )
+            if resp.status_code != 200:
+                last_error = f"YourVoic {resp.status_code}: {resp.text[:200]}"
+                # Try next voice only if it's a voice-name issue
+                if "voice" in resp.text.lower() or resp.status_code == 400:
+                    continue
+                return None, last_error
+            # Save audio (MP3 or WAV), then normalize to WAV
+            ctype = resp.headers.get("Content-Type", "").lower()
+            # MP3 is usually served as audio/mpeg; ffmpeg probes the content anyway.
+            ext = "mp3" if ("mp3" in ctype or "mpeg" in ctype) else "wav"
+            raw_path = os.path.join(out_dir, f"yv_{uuid.uuid4().hex[:8]}.{ext}")
+            if "application/json" in ctype:
+                data = resp.json()
+                audio_url = data.get("audio_url") or data.get("url")
+                if not audio_url:
+                    return None, "No audio URL in YourVoic response"
+                audio_resp = http_requests.get(audio_url, timeout=60)
+                audio_resp.raise_for_status()  # avoid writing an error page to disk
+                with open(raw_path, "wb") as f:
+                    f.write(audio_resp.content)
+            else:
+                with open(raw_path, "wb") as f:
+                    f.write(resp.content)
+            wav_path = os.path.join(out_dir, f"yv_{uuid.uuid4().hex[:8]}.wav")
+            subprocess.run(
+                ["ffmpeg", "-y", "-i", raw_path,
+                 "-ar", "24000", "-ac", "1", "-acodec", "pcm_s16le", wav_path],
+                capture_output=True, check=True,
+            )
+            return wav_path, None
+        except Exception as e:
+            last_error = f"YourVoic exception: {e}"
+            continue
+    return None, last_error or "YourVoic failed for all candidate voices"
+# ==========================================
+# PIPELINE
+# ==========================================
+def translate_chunk(audio_input,
+                    source_language: str,
+                    target_language: str,
+                    qwen_voice_label: str):
+    """Main pipeline. Takes an audio file (mic or upload), returns translated audio.
+    Yields (audio_path, status_markdown, transcript_text) so the UI updates as work happens."""
+    t0 = time.time()
+    if audio_input is None:
+        yield None, "**Status:** no audio provided.", ""
+        return
+    ds_key = os.environ.get("DASHSCOPE_API_KEY", "")
+    if not ds_key:
+        yield None, "**Error:** `DASHSCOPE_API_KEY` not set in Space secrets.", ""
+        return
+    src_config = SOURCE_LANGUAGES[source_language]
+    tgt_config = TARGET_LANGUAGES[target_language]
+    client = OpenAI(api_key=ds_key, base_url=DASHSCOPE_BASE_URL)
+    work_dir = tempfile.mkdtemp(prefix="commentary_")
+    try:
+        yield None, "**Status:** normalizing input audio...", ""
+        norm_path = normalize_audio_input(audio_input, work_dir)
+        engine = tgt_config["engine"]
+        if engine == "qwen":
+            # Single Omni call: audio in -> translated speech out
+            yield None, f"**Status:** translating {source_language} -> {target_language} (Qwen-Omni, one call)...", ""
+            voice = voice_name(qwen_voice_label)
+            out_wav, transcript, err = omni_audio_to_speech(
+                client, norm_path,
+                src_config["omni_hint"], tgt_config["omni_hint"],
+                voice, work_dir,
+            )
+            if err:
+                yield None, f"**Error:** {err}", transcript
+                return
+            elapsed = time.time() - t0
+            yield out_wav, (
+                f"**Done in {elapsed:.1f}s** — {source_language} → {target_language} "
+                f"via Qwen-Omni (voice: {voice})"
+            ), transcript
+        elif engine == "yourvoic":
+            # Two-step: Omni audio->text, then YourVoic text->speech
+            yv_key = os.environ.get("YOURVOIC_API_KEY", "")
+            if not yv_key:
+                yield None, "**Error:** `YOURVOIC_API_KEY` not set (required for Swahili/Amharic/Afrikaans).", ""
+                return
+            yield None, f"**Status:** {source_language} audio → {target_language} text via Qwen-Omni...", ""
+            translated_text, err = omni_audio_to_text(
+                client, norm_path,
+                src_config["omni_hint"], tgt_config["omni_hint"],
+            )
+            if err or not translated_text:
+                yield None, f"**Error:** {err or 'empty translation'}", translated_text
+                return
+            yield None, f"**Status:** speaking {target_language} via YourVoic...", translated_text
+            out_wav, yv_err = yourvoic_speak(
+                translated_text, target_language, tgt_config, yv_key, work_dir,
+            )
+            if yv_err:
+                yield None, f"**Error:** {yv_err}", translated_text
+                return
+            elapsed = time.time() - t0
+            yield out_wav, (
+                f"**Done in {elapsed:.1f}s** — {source_language} → {target_language} "
+                f"via Qwen-Omni (translate) + YourVoic (speak)"
+            ), translated_text
+        else:
+            yield None, f"**Error:** unknown engine '{engine}' for {target_language}", ""
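+    except subprocess.CalledProcessError as e:
+        # ffmpeg prints its banner first, so the tail of stderr holds the actual error.
+        detail = e.stderr.decode(errors="replace")[-300:] if e.stderr else str(e)
+        yield None, f"**Error:** ffmpeg failed: {detail}", ""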
+    except Exception as e:
+        yield None, f"**Error:** {e}", ""
+# ==========================================
+# GRADIO UI
+# ==========================================
+DESCRIPTION = """
+# Live Football Commentary Translator
+Speak (or upload) commentary in one language — hear it in another.
+**Sources:** English, Scottish English, German, Spanish, Arabic
+**Targets:** all of the above + Swahili, Amharic, Afrikaans
+Latency on free ZeroGPU: ~3-8 seconds per utterance. Speak in short bursts (one play at a time).
+"""
+def on_target_change(target_lang_choice):
+    """Show Qwen voice picker only for Qwen-target languages."""
+    cfg = TARGET_LANGUAGES.get(target_lang_choice, {})
+    if cfg.get("engine") == "qwen":
+        return gr.update(visible=True)
+    return gr.update(visible=False)
+with gr.Blocks(title="Live Football Commentary Translator") as demo:
+    gr.Markdown(DESCRIPTION)
+    with gr.Row():
+        with gr.Column(scale=1):
+            gr.Markdown("### 1. Languages")
+            source_lang = gr.Dropdown(
+                choices=list(SOURCE_LANGUAGES.keys()),
+                value="English",
+                label="Source (what the commentator speaks)",
+            )
+            target_lang = gr.Dropdown(
+                choices=list(TARGET_LANGUAGES.keys()),
+                value="Swahili",
+                label="Target (what you want to hear)",
+            )
+            qwen_voice = gr.Dropdown(
+                choices=QWEN_VOICES,
+                value=QWEN_VOICES[0],
+                label="Voice (Qwen targets only)",
+                visible=False,
+            )
+            gr.Markdown("### 2. Input")
+            with gr.Tabs():
+                with gr.Tab("Live microphone"):
+                    mic_input = gr.Audio(
+                        sources=["microphone"],
+                        type="filepath",
+                        label="Speak your commentary (short bursts, 5-15s each)",
+                    )
+                    mic_btn = gr.Button("Translate microphone clip", variant="primary")
+                with gr.Tab("Upload file"):
+                    file_input = gr.Audio(
+                        sources=["upload"],
+                        type="filepath",
+                        label="Upload an audio clip (.wav, .mp3, .m4a, etc.)",
+                    )
+                    file_btn = gr.Button("Translate uploaded clip", variant="primary")
+        with gr.Column(scale=1):
+            gr.Markdown("### 3. Translated commentary")
+            status = gr.Markdown(value="*Waiting for input...*")
+            audio_output = gr.Audio(
+                label="Translated audio",
+                type="filepath",
+                autoplay=True,
+            )
+            transcript = gr.Textbox(
+                label="Translated text (when available)",
+                lines=4,
+                interactive=False,
+            )
+    # Show/hide Qwen voice picker based on target language
+    target_lang.change(
+        fn=on_target_change,
+        inputs=target_lang,
+        outputs=qwen_voice,
+    )
+    demo.load(
+        fn=on_target_change,
+        inputs=target_lang,
+        outputs=qwen_voice,
+    )
+    mic_btn.click(
+        fn=translate_chunk,
+        inputs=[mic_input, source_lang, target_lang, qwen_voice],
+        outputs=[audio_output, status, transcript],
+    )
+    file_btn.click(
+        fn=translate_chunk,
+        inputs=[file_input, source_lang, target_lang, qwen_voice],
+        outputs=[audio_output, status, transcript],
+    )
+    gr.Markdown(
+        "---\n"
+        "**Architecture note:** Qwen-Omni (`qwen3.5-omni-plus`) handles audio-in → "
+        "translated-speech-out in a single call for languages it covers. "
+        "For Swahili / Amharic / Afrikaans, Omni translates to text and YourVoic speaks it."
+    )
+if __name__ == "__main__":
+    demo.launch()
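
Reviewer sketch (not part of the commit): `base64_to_wav` packs the 44-byte RIFF header by hand with `struct`. A possible alternative is Python's standard `wave` module, which writes the same header fields. The sketch assumes the 24 kHz, 16-bit, mono PCM format that `app.py` hard-codes; `pcm_b64_to_wav_bytes` is a hypothetical name.

```python
import base64
import io
import wave


def pcm_b64_to_wav_bytes(b64_data: str, sample_rate: int = 24000) -> bytes:
    """Wrap base64-encoded 16-bit mono PCM in a WAV container."""
    pcm = base64.b64decode(b64_data)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono, as assumed in app.py
        wav.setsampwidth(2)           # 16-bit samples = 2 bytes
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)          # header sizes are filled in on close
    return buf.getvalue()


if __name__ == "__main__":
    silence = base64.b64encode(b"\x00\x00" * 2400).decode()  # 0.1 s of silence
    wav_bytes = pcm_b64_to_wav_bytes(silence)
    print(len(wav_bytes), wav_bytes[:4])
```

The result can be written to disk with a plain binary `open(..., "wb")`, so it would slot into the same place as the manual header code if the author preferred it.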

requirements.txt ADDED

	@@ -0,0 +1,3 @@

+gradio>=4.44.0
+openai>=1.40.0
+requests>=2.31.0
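
Reviewer sketch (not part of the commit): `requirements.txt` covers only Python packages, but `app.py` also shells out to the `ffmpeg` binary in `normalize_audio_input` and `yourvoic_speak`. The commit does not say whether the Space image provides it, so a startup check along these lines could surface a missing binary early. `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("ffmpeg",)) -> list[str]:
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All required system tools found.")
```

Called once before `demo.launch()`, this would turn a per-request ffmpeg failure into a single clear message at startup.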