jdopensource/JoyAI-LLM-Flash

Mingke977 committed 3 days ago

Commit f4526ac · verified · 1 Parent(s): 0fe2627

Update docs/deploy_guidance.md

Files changed (1)

docs/deploy_guidance.md +2 -2

@@ -16,12 +16,12 @@ docker pull jdopensource/joyai-llm-vllm:v0.15.1-joyai_llm_flash
 2. launch JoyAI-LLM Flash model with dense MTP.
 ```bash
 # TP1 for memory efficiency
-vllm serve ${MODEL_PATH} --tp 1 --trust-remote-code \
+vllm serve ${MODEL_PATH} -tp 1 --trust-remote-code \
    --tool-call-parser qwen3_coder --enable-auto-tool-choice \
    --speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
 # TP8 for extreme speed and long context
-vllm serve ${MODEL_PATH} --tp 8 --trust-remote-code \
+vllm serve ${MODEL_PATH} -tp 8 --trust-remote-code \
   --tool-call-parser qwen3_coder --enable-auto-tool-choice \
   --speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
 ```
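
The change swaps `--tp` for the short alias `-tp`. As a minimal sketch of the same launch, the command below spells the option out as `--tensor-parallel-size`, which is the long form of `-tp` in vLLM. The `run` helper, the `DRY_RUN` and `TP_SIZE` variables, and the placeholder model path are assumptions introduced for illustration; they are not part of the deploy guide.

```bash
# Sketch only: DRY_RUN, TP_SIZE, run() and the default path are hypothetical.
MODEL_PATH="${MODEL_PATH:-/path/to/JoyAI-LLM-Flash}"  # placeholder path
TP_SIZE="${TP_SIZE:-1}"    # 1 for memory efficiency, 8 for speed and long context
DRY_RUN="${DRY_RUN:-1}"    # 1 prints the command, anything else executes it

# Print the command in dry-run mode, otherwise run it as given.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same flags as the guide, with -tp written as --tensor-parallel-size.
run vllm serve "$MODEL_PATH" --tensor-parallel-size "$TP_SIZE" --trust-remote-code \
  --tool-call-parser qwen3_coder --enable-auto-tool-choice \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'
```

With the defaults this only prints the command. Setting `DRY_RUN=0 TP_SIZE=8` would start the server with the TP8 configuration, assuming `vllm` is on the `PATH` inside the image pulled earlier in the guide.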