| # Step-by-Step Setup and Usage Guide |
|
|
| Author: algorembrant |
|
|
| --- |
|
|
| ## Prerequisites |
|
|
| | Requirement | Minimum Version | Notes | |
| |----------------------|-----------------|--------------------------------------------| |
| | Python | 3.8 | 3.10+ recommended | |
| | pip | 21.0 | | |
| | Anthropic API Key | -- | Required for clean and summarize commands | |
|
|
| You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands. |
| Obtain one at: https://console.anthropic.com |
|
|
| --- |
|
|
| ## Step 1 β Get the Code |
|
|
| **Option A: Git clone** |
| ```bash |
| git clone https://github.com/algorembrant/youtube-transcript-toolkit.git |
| cd youtube-transcript-toolkit |
| ``` |
|
|
| **Option B: Download ZIP** |
| Download and unzip, then open a terminal inside the project folder. |
|
|
| --- |
|
|
| ## Step 2 β Create a Virtual Environment |
|
|
| **macOS / Linux** |
| ```bash |
| python3 -m venv .venv |
| source .venv/bin/activate |
| ``` |
|
|
| **Windows (Command Prompt)** |
| ```cmd |
| python -m venv .venv |
| .venv\Scripts\activate.bat |
| ``` |
|
|
| **Windows (PowerShell)** |
| ```powershell |
| python -m venv .venv |
| .venv\Scripts\Activate.ps1 |
| ``` |
|
|
| You should see `(.venv)` at the start of your terminal prompt. |
|
|
| --- |
|
|
| ## Step 3 β Install Dependencies |
|
|
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| Verify: |
| ```bash |
| pip show anthropic |
| pip show youtube-transcript-api |
| ``` |
|
|
| --- |
|
|
| ## Step 4 β Set Your Anthropic API Key |
|
|
| **macOS / Linux (current session)** |
| ```bash |
| export ANTHROPIC_API_KEY="sk-ant-your-key-here" |
| ``` |
|
|
| **macOS / Linux (permanent β add to shell profile)** |
| ```bash |
| echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc |
| source ~/.zshrc |
| ``` |
|
|
| **Windows (Command Prompt)** |
| ```cmd |
| set ANTHROPIC_API_KEY=sk-ant-your-key-here |
| ``` |
|
|
| **Windows (PowerShell)** |
| ```powershell |
| $env:ANTHROPIC_API_KEY = "sk-ant-your-key-here" |
| ``` |
|
|
| **Windows (permanent via System Settings)** |
| 1. Search "Environment Variables" in Start Menu |
| 2. Click "Edit the system environment variables" |
| 3. Add a new variable: `ANTHROPIC_API_KEY` = your key |
|
|
| The `fetch` and `list` commands do NOT require an API key. |
| Only `clean`, `summarize`, and `pipeline` need it. |
|
|
| --- |
|
|
| ## Step 5 β Run Your First Commands |
|
|
| ### Fetch a raw transcript (no API key needed) |
|
|
| ```bash |
| python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ" |
| ``` |
|
|
| ### See what languages are available |
|
|
| ```bash |
| python main.py list dQw4w9WgXcQ |
| ``` |
|
|
| ### Clean the transcript into paragraphs |
|
|
| ```bash |
| python main.py clean dQw4w9WgXcQ |
| ``` |
|
|
| ### Summarize the transcript |
|
|
| ```bash |
| python main.py summarize dQw4w9WgXcQ -m brief |
| python main.py summarize dQw4w9WgXcQ -m detailed |
| python main.py summarize dQw4w9WgXcQ -m bullets |
| python main.py summarize dQw4w9WgXcQ -m outline |
| ``` |
|
|
| ### Run the full pipeline (fetch + clean + summarize) |
|
|
| ```bash |
| python main.py pipeline dQw4w9WgXcQ -m bullets |
| ``` |
|
|
| --- |
|
|
| ## Step 6 β Save Output to Files |
|
|
| ### Single video β specify a file path |
|
|
| ```bash |
| python main.py clean dQw4w9WgXcQ -o cleaned.txt |
| python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt |
| ``` |
|
|
| ### Pipeline β specify a directory (creates 3 files per video) |
|
|
| ```bash |
| python main.py pipeline dQw4w9WgXcQ -o ./output/ |
| ``` |
|
|
| Files created: |
| ``` |
| ./output/ |
| dQw4w9WgXcQ_transcript.txt |
| dQw4w9WgXcQ_cleaned.txt |
| dQw4w9WgXcQ_summary.txt |
| ``` |
|
|
| ### Batch β multiple videos at once |
|
|
| ```bash |
| python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/ |
| ``` |
|
|
| --- |
|
|
| ## Step 7 β Advanced Options |
|
|
| ### Use the higher-quality model |
|
|
| ```bash |
| python main.py clean dQw4w9WgXcQ --quality |
| python main.py summarize dQw4w9WgXcQ -m detailed --quality |
| ``` |
|
|
| Default model: `claude-haiku-4-5` (fast, cost-efficient) |
| Quality model: `claude-sonnet-4-6` (better for complex or long transcripts) |
|
|
| ### Disable streaming (show output only after completion) |
|
|
| ```bash |
| python main.py clean dQw4w9WgXcQ --no-stream |
| ``` |
|
|
| ### Request a non-English transcript |
|
|
| ```bash |
| python main.py clean dQw4w9WgXcQ -l ja # Japanese only |
| python main.py clean dQw4w9WgXcQ -l es en # Spanish, fall back to English |
| ``` |
|
|
| ### Fetch raw transcript as SRT or JSON |
|
|
| ```bash |
| python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt |
| python main.py fetch dQw4w9WgXcQ -f json -o transcript.json |
| python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt |
| ``` |
|
|
| ### Fetch with timestamps |
|
|
| ```bash |
| python main.py fetch dQw4w9WgXcQ -t |
| python main.py pipeline dQw4w9WgXcQ -t -o ./output/ |
| ``` |
|
|
| ### Pipeline β skip individual steps |
|
|
| ```bash |
| # Fetch and summarize without cleaning |
| python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets |
| |
| # Fetch and clean without summarizing |
| python main.py pipeline dQw4w9WgXcQ --skip-summary |
| ``` |
|
|
| --- |
|
|
| ## Troubleshooting |
|
|
| | Symptom | Likely Cause | Fix | |
| |---------|-------------|-----| |
| | `TranscriptsDisabled` error | Video owner disabled captions | Use a different video | |
| | `VideoUnavailable` error | Private, deleted, or region-locked | Check URL; try VPN if region-locked | |
| | `NoTranscriptFound` | Requested language missing | Run `list` to see available languages | |
| | `AuthenticationError` | API key missing or wrong | Check `ANTHROPIC_API_KEY` env variable | |
| | `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` | |
| | Chunking messages in stderr | Transcript very long | Normal β multi-pass processing is automatic | |
| | Output cuts off mid-sentence | max_tokens limit hit | This is rare; open an issue if it occurs | |
| |
| --- |
| |
| ## Project File Reference |
| |
| ``` |
| main.py CLI entry point β all five commands |
| fetcher.py YouTube direct caption API (no scraping) |
| cleaner.py AI paragraph reformatter |
| summarizer.py AI summarizer (4 modes) |
| pipeline.py Orchestrates the full fetch -> clean -> summarize chain |
| ai_client.py Anthropic API wrapper with chunking and streaming |
| config.py Constants: model names, chunk size, summary modes |
| requirements.txt Two dependencies |
| README.md Full project documentation |
| GUIDE.md This file |
| LICENSE MIT License |
| ``` |
| |