
Libraries

How to use pcuenq/Hunyuan-7B-Instruct-tokenizer with Transformers. This repository contains a tokenizer only, with no model weights, so load it with AutoTokenizer:

```python
# Load the tokenizer directly
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pcuenq/Hunyuan-7B-Instruct-tokenizer")

messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the prompt with the chat template stored alongside the tokenizer, then tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer.encode(prompt)
```

pcuenq/Hunyuan-7B-Instruct-tokenizer

This is a transformers fast tokenizer (PreTrainedTokenizerFast) for mlx-community/Hunyuan-7B-Instruct-3bit.

Conversion

We used this code to convert the tokenizer from the original tiktoken format:

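```python
# Requires huggingface_hub, tokenizers, transformers and tiktoken
# (TikTokenConverter uses tiktoken to read the .tiktoken vocabulary file).
from huggingface_hub import snapshot_download
from tokenizers import normalizers
from transformers import PreTrainedTokenizerFast
from transformers.convert_slow_tokenizer import TikTokenConverter

# Download the original tiktoken vocabulary, the custom tokenizer code
# and its configuration files into the current directory.
snapshot_download(
    "mlx-community/Hunyuan-7B-Instruct-3bit",
    local_dir=".",
    allow_patterns=["hy.tiktoken", "tokenization_hy.py", "tokenizer_config.json", "special_tokens_map.json"]
)

# tokenization_hy.py only exists after the download above, so import it afterwards.
from tokenization_hy import HYTokenizer, PAT_STR, SPECIAL_TOKENS

original = HYTokenizer.from_pretrained(".")

# Build a `tokenizers` BPE tokenizer from the tiktoken ranks, reusing the original
# pre-tokenization regex and special tokens (each SPECIAL_TOKENS entry holds the token string at index 1).
converter = TikTokenConverter(
    vocab_file="hy.tiktoken",
    pattern=PAT_STR,
    additional_special_tokens=[t[1] for t in SPECIAL_TOKENS],
)
converted = converter.converted()
# Apply Unicode NFC normalization before tokenization.
converted.normalizer = normalizers.NFC()

# Wrap it as a transformers fast tokenizer, keeping the original input names and chat template.
t_fast = PreTrainedTokenizerFast(
    tokenizer_object=converted,
    model_input_names=original.model_input_names,
    model_max_length=256*1024,  # 262,144 tokens
    clean_up_tokenization_spaces=False,
)
t_fast.chat_template = original.chat_template
t_fast.push_to_hub("Hunyuan-7B-Instruct-tokenizer")
```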
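TikTokenConverter does the heavy lifting in the conversion: it reads the ranks in hy.tiktoken and rebuilds them as a byte-level BPE vocabulary and merge list for the tokenizers library. To illustrate the underlying idea, here is a minimal, standalone sketch with a made-up toy vocabulary; load_ranks and bpe_encode are hypothetical helper names, not transformers or tiktoken APIs. It assumes the standard tiktoken file layout of one `<base64-encoded bytes> <rank>` pair per line, and encodes a chunk of bytes by repeatedly merging the adjacent pair whose concatenation has the lowest rank. It skips the PAT_STR regex pre-tokenization and the special-token handling that the real tokenizer performs.

```python
import base64


def load_ranks(lines):
    """Parse tiktoken-style '<base64 bytes> <rank>' lines into a bytes -> rank dict."""
    ranks = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


def bpe_encode(piece, ranks):
    """Encode one pre-tokenized chunk of bytes into ranks (token IDs) by lowest-rank merging."""
    parts = [piece[i:i + 1] for i in range(len(piece))]  # start from single bytes
    while len(parts) > 1:
        # Find the adjacent pair whose concatenation has the lowest rank.
        best_rank, best_idx = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_idx = rank, i
        if best_idx is None:
            break  # no adjacent pair forms a known token
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    # Assumes every single byte is in the vocabulary, as in real byte-level BPE vocabularies.
    return [ranks[p] for p in parts]


# Hypothetical toy vocabulary, written out in the tiktoken line format.
toy_tokens = [b"h", b"e", b"l", b"o", b"ll", b"he", b"hell", b"hello"]
toy_lines = [f"{base64.b64encode(tok).decode()} {rank}" for rank, tok in enumerate(toy_tokens)]
ranks = load_ranks(toy_lines)

print(bpe_encode(b"hello", ranks))  # [7]
print(bpe_encode(b"helo", ranks))   # [5, 2, 3]
```

In this toy vocabulary, b"hello" collapses to the single token with rank 7, while b"helo" stops at [5, 2, 3] because no ranked token spans "hel" or "lo".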

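Verification

We checked that the fast tokenizer produces the same token IDs as the original tokenizer, and decodes them to the same text, on every premise of the XNLI validation split (all languages) and on the first 1000 samples of codeparrot/github-code: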

```python
# Compare the converted fast tokenizer against the original one.
# tokenization_hy.py must be importable (e.g. the file downloaded during conversion).
from datasets import load_dataset
from tqdm import tqdm
from tokenization_hy import HYTokenizer
from transformers import AutoTokenizer

original = HYTokenizer.from_pretrained("mlx-community/Hunyuan-7B-Instruct-3bit")
t_fast = AutoTokenizer.from_pretrained("pcuenq/Hunyuan-7B-Instruct-tokenizer")

# Testing on XNLI: each premise maps a language code to its text

xnli = load_dataset("xnli", "all_languages", split="validation")

def verify(lang, text):
    # Token IDs must match exactly.
    encoded_original = original.encode(text)
    encoded_fast = t_fast.encode(text)
    assert encoded_fast == encoded_original, f"Fast encode error: {lang} - {text}"
    # Decoding the IDs must give the same string.
    decoded = original.decode(encoded_original)
    decoded_fast = t_fast.decode(encoded_fast, skip_special_tokens=True)
    assert decoded_fast == decoded, f"Fast decode error: {lang} - {text}"

for p in tqdm(xnli["premise"]):
    for lang, text in p.items():
        verify(lang, text)


# Testing on a codeparrot subset (first 1000 streamed samples)

ds = load_dataset("codeparrot/github-code", streaming=True, trust_remote_code=True, split="train")

iterator = iter(ds)
for _ in tqdm(range(1000)):
    item = next(iterator)
    code = item["code"]
    lang = item["language"]
    verify(lang, code)
```
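The verification asserts, so it stops at the first mismatch. When debugging a conversion, it can help to collect every disagreement instead. The following is a minimal, hypothetical sketch: find_mismatches and the two toy encoders are illustrative stand-ins rather than transformers APIs, and any pair of callables mapping text to token IDs (such as original.encode and t_fast.encode) could be plugged in. The toy encoders also illustrate why the conversion adds an NFC normalizer: assuming the original tokenizer normalizes its input to NFC, as that normalizer suggests, a converted tokenizer without it would produce different IDs for decomposed text such as "e" followed by a combining acute accent, even though it renders the same as the precomposed "é".

```python
import unicodedata
from itertools import islice


def find_mismatches(samples, encode_a, encode_b, limit=None):
    """Return (label, text, ids_a, ids_b) for every sample where the two encoders disagree.

    `samples` is any iterable of (label, text) pairs; `limit=None` checks all of them.
    """
    mismatches = []
    for label, text in islice(samples, limit):
        ids_a, ids_b = encode_a(text), encode_b(text)
        if ids_a != ids_b:
            mismatches.append((label, text, ids_a, ids_b))
    return mismatches


# Hypothetical toy encoders standing in for the two tokenizers:
# both map each character to its code point, but only one applies NFC first.
def encode_with_nfc(text):
    return [ord(c) for c in unicodedata.normalize("NFC", text)]


def encode_without_nfc(text):
    return [ord(c) for c in text]


samples = [
    ("en", "hello"),
    ("fr", "caf\u00e9"),    # precomposed "é"
    ("fr", "cafe\u0301"),   # "e" + combining acute accent
]
for label, text, ids_a, ids_b in find_mismatches(samples, encode_with_nfc, encode_without_nfc):
    print(label, repr(text), ids_a, ids_b)
```

With the real tokenizers, the codeparrot samples could be produced as ((item["language"], item["code"]) for item in islice(ds, 1000)), which also avoids a StopIteration if the stream yields fewer than 1000 items.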
