pcuenq/Hunyuan-7B-Instruct-tokenizer
This is a Transformers fast tokenizer for mlx-community/Hunyuan-7B-Instruct-3bit.
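A minimal usage sketch, assuming the `transformers` library is installed and the repository can be downloaded from the Hub. The helper names `load_tokenizer` and `build_prompt` are hypothetical and introduced only for illustration; `build_prompt` wraps `apply_chat_template`, which uses the chat template stored with the tokenizer.

```python
def load_tokenizer(repo_id="pcuenq/Hunyuan-7B-Instruct-tokenizer"):
    """Load the fast tokenizer from the Hub (needs network access on first use)."""
    # Imported lazily so build_prompt stays usable with any compatible tokenizer object.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def build_prompt(tokenizer, user_text):
    """Render a single-turn conversation with the tokenizer's chat template."""
    messages = [{"role": "user", "content": user_text}]
    # tokenize=False returns the formatted prompt string instead of token ids.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```

Calling `build_prompt(load_tokenizer(), "Who are you?")` should return the formatted prompt as a string. Passing `tokenize=True` instead would return token ids.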
Conversion
We used this code to convert the tokenizer from the original tiktoken format:
from huggingface_hub import snapshot_download
from tokenizers import normalizers
from transformers import PreTrainedTokenizerFast
from transformers.convert_slow_tokenizer import TikTokenConverter

# Download the original tiktoken vocabulary and tokenizer code into the current directory.
snapshot_download(
    "mlx-community/Hunyuan-7B-Instruct-3bit",
    local_dir=".",
    allow_patterns=["hy.tiktoken", "tokenization_hy.py", "tokenizer_config.json", "special_tokens_map.json"],
)

# Import after the download so that tokenization_hy.py exists locally on a first run.
# This provides HYTokenizer, PAT_STR and SPECIAL_TOKENS.
from tokenization_hy import *

original = HYTokenizer.from_pretrained(".")

# Build a `tokenizers` backend from the tiktoken vocabulary and split pattern.
converter = TikTokenConverter(
    vocab_file="hy.tiktoken",
    pattern=PAT_STR,
    additional_special_tokens=[t[1] for t in SPECIAL_TOKENS],
)
converted = converter.converted()
converted.normalizer = normalizers.NFC()

# Wrap the backend in a Transformers fast tokenizer.
t_fast = PreTrainedTokenizerFast(
    tokenizer_object=converted,
    model_input_names=original.model_input_names,
    model_max_length=256*1024,
    clean_up_tokenization_spaces=False,
)
t_fast.chat_template = original.chat_template
t_fast.push_to_hub("Hunyuan-7B-Instruct-tokenizer")
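The conversion code sets an NFC normalizer on the converted tokenizer. As an illustration of what Unicode NFC normalization does, the sketch below uses Python's standard `unicodedata` module (an assumption made for the example; it is not the `tokenizers` normalizer itself) to show a decomposed character sequence being composed into a single code point.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by a combining acute accent (2 code points)
composed = "\u00e9"     # precomposed 'é' (1 code point)

# NFC composes base characters with their combining marks where possible.
normalized = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(normalized), normalized == composed)
# prints: 2 1 True
```

With a normalizer of this kind in place, visually identical inputs that differ only in composition are mapped to the same string before tokenization.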
Verification
from datasets import load_dataset
from tqdm import tqdm
from tokenization_hy import HYTokenizer
from transformers import AutoTokenizer

original = HYTokenizer.from_pretrained("mlx-community/Hunyuan-7B-Instruct-3bit")
t_fast = AutoTokenizer.from_pretrained("pcuenq/Hunyuan-7B-Instruct-tokenizer")

# Testing on XNLI
xnli = load_dataset("xnli", "all_languages", split="validation")

def verify(lang, text):
    encoded_original = original.encode(text)
    encoded_fast = t_fast.encode(text)
    assert encoded_fast == encoded_original, f"Fast encode error: {lang} - {text}"
    decoded = original.decode(encoded_original)
    decoded_fast = t_fast.decode(encoded_fast, skip_special_tokens=True)
    assert decoded_fast == decoded, f"Fast decode error: {lang} - {text}"

for p in tqdm(xnli["premise"]):
    for lang, text in p.items():
        verify(lang, text)

# Testing on codeparrot subset
ds = load_dataset("codeparrot/github-code", streaming=True, trust_remote_code=True, split="train")
iterator = iter(ds)
for _ in tqdm(range(1000)):
    item = next(iterator)
    code = item["code"]
    lang = item["language"]
    verify(lang, code)
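The `verify` function above stops at the first failing assertion. When debugging a conversion it can be more convenient to gather several failing samples first. The sketch below is one possible variant; `collect_mismatches` and its `limit` parameter are hypothetical names, and the helper accepts any two encode callables rather than the specific tokenizers used above.

```python
def collect_mismatches(encode_reference, encode_candidate, samples, limit=10):
    """Return up to `limit` (label, text) pairs whose encodings differ.

    `samples` is an iterable of (label, text) pairs, for example (language, sentence).
    """
    mismatches = []
    for label, text in samples:
        if encode_reference(text) != encode_candidate(text):
            mismatches.append((label, text))
            # Stop early once enough failing examples have been gathered.
            if len(mismatches) >= limit:
                break
    return mismatches
```

With the objects defined in the verification script, a call could look like `collect_mismatches(original.encode, t_fast.encode, [("en", "Hello world")])`; an empty list means every sample was encoded identically.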