Mingke977 committed on
Commit f4526ac · verified · 1 Parent(s): 0fe2627

Update docs/deploy_guidance.md

Files changed (1)
  1. docs/deploy_guidance.md +2 -2
docs/deploy_guidance.md CHANGED
@@ -16,12 +16,12 @@ docker pull jdopensource/joyai-llm-vllm:v0.15.1-joyai_llm_flash
  2. launch JoyAI-LLM Flash model with dense MTP.
  ```bash
  # TP1 for memory efficiency
- vllm serve ${MODEL_PATH} --tp 1 --trust-remote-code \
+ vllm serve ${MODEL_PATH} -tp 1 --trust-remote-code \
  --tool-call-parser qwen3_coder --enable-auto-tool-choice \
  --speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
 
  # TP8 for extreme speed and long context
- vllm serve ${MODEL_PATH} --tp 8 --trust-remote-code \
+ vllm serve ${MODEL_PATH} -tp 8 --trust-remote-code \
  --tool-call-parser qwen3_coder --enable-auto-tool-choice \
  --speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
  ```
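
The change swaps `--tp` for `-tp`. As a minimal sketch, assuming `-tp` is the short alias of vLLM's `--tensor-parallel-size` option, the same TP1 launch can be written with the long-form flag. The `TP_SIZE` variable and the placeholder default for `MODEL_PATH` are illustrative and not part of this commit. The script only assembles and prints the argument list; it does not start a server.

```shell
# Illustrative only: TP_SIZE and the MODEL_PATH placeholder are assumptions, not from the commit.
TP_SIZE="${TP_SIZE:-1}"
MODEL_PATH="${MODEL_PATH:-/path/to/model}"

# Same options as the TP1 command in the diff, with -tp spelled out in long form.
# The arguments are stored in the positional parameters so quoting is preserved.
set -- vllm serve "${MODEL_PATH}" \
  --tensor-parallel-size "${TP_SIZE}" \
  --trust-remote-code \
  --tool-call-parser qwen3_coder --enable-auto-tool-choice \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'

# Print one argument per line for inspection; run "$@" to launch once it looks right.
printf '%s\n' "$@"
```

Setting `TP_SIZE=8` before running the script gives the TP8 variant from the diff.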