Instructions to use onnx-internal-testing/tiny-random-SolarOpenForCausalLM with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use onnx-internal-testing/tiny-random-SolarOpenForCausalLM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="onnx-internal-testing/tiny-random-SolarOpenForCausalLM")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("onnx-internal-testing/tiny-random-SolarOpenForCausalLM")
model = AutoModelForCausalLM.from_pretrained("onnx-internal-testing/tiny-random-SolarOpenForCausalLM")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use onnx-internal-testing/tiny-random-SolarOpenForCausalLM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "onnx-internal-testing/tiny-random-SolarOpenForCausalLM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "onnx-internal-testing/tiny-random-SolarOpenForCausalLM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/onnx-internal-testing/tiny-random-SolarOpenForCausalLM

SGLang

How to use onnx-internal-testing/tiny-random-SolarOpenForCausalLM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "onnx-internal-testing/tiny-random-SolarOpenForCausalLM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "onnx-internal-testing/tiny-random-SolarOpenForCausalLM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "onnx-internal-testing/tiny-random-SolarOpenForCausalLM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "onnx-internal-testing/tiny-random-SolarOpenForCausalLM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use onnx-internal-testing/tiny-random-SolarOpenForCausalLM with Docker Model Runner:
```
docker model run hf.co/onnx-internal-testing/tiny-random-SolarOpenForCausalLM
```

Upload optimized ONNX model

by Xenova HF Staff - opened Mar 17

base: refs/heads/main ← from: refs/pr/1

+31 -6

Files changed (8)

.gitattributes +2 -0
config.json +12 -2
onnx/model.onnx +3 -0
onnx/model.onnx_data +3 -0
onnx/model_fp16.onnx +3 -0
onnx/model_fp16.onnx_data +3 -0
tokenizer.json +2 -2
tokenizer_config.json +3 -2

.gitattributes CHANGED

@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+onnx/model.onnx_data filter=lfs diff=lfs merge=lfs -text
+onnx/model_fp16.onnx_data filter=lfs diff=lfs merge=lfs -text

config.json CHANGED

@@ -40,5 +40,15 @@
   "topk_group": 1,
   "transformers_version": "5.3.0.dev0",
   "use_cache": true,
-  "vocab_size": 196608
-}
+  "vocab_size": 196608,
+  "transformers.js_config": {
+    "use_external_data_format": {
+      "model.onnx": 1,
+      "model_fp16.onnx": 1
+    },
+    "kv_cache_dtype": {
+      "q4f16": "float16",
+      "fp16": "float16"
+    }
+  }
+}
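
As a rough illustration of what the new `transformers.js_config` block implies for a loader, the sketch below lists the external-data files that would need to be fetched alongside each ONNX graph. It assumes the value under `use_external_data_format` is a chunk count and that chunks are named `<file>_data`, `<file>_data_1`, and so on. That naming rule is an assumption (it happens to match the two `.onnx_data` files added in this PR), and the helper name `external_data_files` is hypothetical.

```python
import json

# Fragment of config.json as added in this PR.
CONFIG_FRAGMENT = """
{
  "transformers.js_config": {
    "use_external_data_format": {"model.onnx": 1, "model_fp16.onnx": 1},
    "kv_cache_dtype": {"q4f16": "float16", "fp16": "float16"}
  }
}
"""


def external_data_files(config: dict, subfolder: str = "onnx") -> dict:
    """Map each ONNX graph to the external-data files it is assumed to need."""
    js_config = config.get("transformers.js_config", {})
    chunk_counts = js_config.get("use_external_data_format", {})
    files = {}
    for graph_name, count in chunk_counts.items():
        # Assumption: chunk 0 is "<graph>_data"; later chunks get a "_<i>" suffix.
        files[graph_name] = [
            f"{subfolder}/{graph_name}_data" + ("" if i == 0 else f"_{i}")
            for i in range(int(count))
        ]
    return files


config = json.loads(CONFIG_FRAGMENT)
resolved = external_data_files(config)
for graph_name, data_files in resolved.items():
    print(graph_name, "->", data_files)
```

With a count of 1 for both graphs, each one resolves to a single sidecar file, which lines up with the `onnx/model.onnx_data` and `onnx/model_fp16.onnx_data` entries in this change.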

onnx/model.onnx ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:967dda2921a89875247566085ad30416d8528b13efa72b7bd0b6d7006952690f
+size 28131

onnx/model.onnx_data ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f46f084490f614f8ce52414af2d0f9aa618aa51d7ff5eedd79356826a3d05605
+size 109275136

onnx/model_fp16.onnx ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4f136d90bb4fa3d89e1403feae29f73400ff6828882c7303f2107699365a0538
+size 28458

onnx/model_fp16.onnx_data ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8396a6d9c4f5b387166df642a8985665b882a19b4993f7ae5d9299217674a9fa
+size 54636544
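
The four ONNX entries above are Git LFS pointer files rather than the binaries themselves: each records a spec version, a SHA-256 object ID, and a byte size. A minimal sketch of how a downloaded file could be checked against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any tool used in this PR.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Check a downloaded file against the pointer's size and digest."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer for onnx/model_fp16.onnx_data, copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8396a6d9c4f5b387166df642a8985665b882a19b4993f7ae5d9299217674a9fa
size 54636544
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

After downloading the file, `matches_pointer(Path("onnx/model_fp16.onnx_data"), pointer)` would report whether the local copy has the expected size and hash.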

tokenizer.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b7a8c81670e4651cc290fbe0ab397e0357abfa933964b9b6f56f508557c04c19
-size 16473485
+oid sha256:d59e17300101efe9dc2b75ca5af4f596d0f1ac73c89b4135f10b1c560218127e
+size 15653483

tokenizer_config.json CHANGED

@@ -13,5 +13,6 @@
   "padding_side": "left",
   "split_special_tokens": false,
   "tokenizer_class": "TokenizersBackend",
-  "unk_token": "<unk>"
-}
+  "unk_token": "<unk>",
+  "chat_template": "{#- ======== Template Parameters ========  #}\n{%- set add_generation_prompt = add_generation_prompt if add_generation_prompt is defined else true %}\n{%- set default_system_prompt = default_system_prompt if default_system_prompt is defined else true %}\n{%- set reasoning_effort = reasoning_effort if reasoning_effort is defined else \"high\" %}\n{%- set think_render_option = think_render_option if think_render_option is defined else \"lastthink\" %}\n\n{#- ======== System Block State ========  #}\n{%- set sys_ns = namespace(is_first_block=true) -%}\n\n{#- ======== Find last user message index ========  #}\n{%- set last_user_idx = namespace(value=-1) -%}\n{%- for message in messages -%}\n    {%- if message.role == 'user' -%}\n        {%- set last_user_idx.value = loop.index0 -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{#- ======== System messages renderers ========  #}\n{%- macro render_system_message(user_system_messages) %}\n    {%- if default_system_prompt %}\n        {%- if not sys_ns.is_first_block %}{{- \"\\n\\n\" }}{%- endif %}\n        {%- set sys_ns.is_first_block = false %}\n        {{- \"## Provider System Prompt\\n\\nYou are Solar Open 100B, a large language model trained by Upstage AI, a Korean startup. Your knowledge cutoff is 2025-07. 
The current date is \" + strftime_now(\"%Y-%m-%d\") + \".\" }}\n    {%- endif -%}\n    {%- if user_system_messages %}\n        {%- if not sys_ns.is_first_block %}{{- \"\\n\\n\" }}{%- endif %}\n        {%- set sys_ns.is_first_block = false %}\n        {{- \"## System Prompt\" }}\n        {%- for system_message in user_system_messages %}\n            {{- \"\\n\\n\" }}\n            {{- system_message }}\n        {%- endfor %}\n    {%- endif -%}\n{%- endmacro %}\n\n{%- macro render_tool_instruction(tools) %}\n    {%- if not sys_ns.is_first_block %}{{- \"\\n\\n\" }}{%- endif %}\n    {%- set sys_ns.is_first_block = false %}\n    {{- \"## Tools\\n\\n### Tool Call Instruction\" }}\n    {{- \"\\nYou may invoke one or more tools to assist with the user's query. Available tools are provided in JSON Schema format: <|tools:begin|><|tool:begin|><tools-json-object><|tool:end|>...<|tools:end|>\\n\" }}\n    {{- \"\\n### Available Tools\\n\" }}\n    {{- \"<|tools:begin|>\" }}\n    {%- for tool in tools %}\n        {{- \"<|tool:begin|>\" }}\n        {{- tool.function | tojson }}\n        {{- \"<|tool:end|>\" }}\n    {%- endfor %}\n    {{- \"<|tools:end|>\\n\" }}\n    {{- \"\\n### Tool Call Format\\n\" }}\n    {{- \"For each tool call, return a JSON object with the following structure, enclosed within <|tool_call:begin|> and <|tool_call:end|> tags: \\n<|tool_call:begin|><tool-call-id><|tool_call:name|><tool-name><|tool_call:args|><args-json-object><|tool_call:end|>\\n\" }}\n    {{- \"- The <tool-call-id> must be a randomly generated string consisting of 10 lowercase letters (a-z) and/or digits (0-9) (e.g., a1b2c3d4e5)\\n\" }}\n    {{- \"\\n### Tool Response Format\\n\" }}\n    {{- \"Each tool is responded by `tool` with the following structure:\\n<|tool_response:id|><tool-call-id><|tool_response:name|><tool-name><|tool_response:result|><results><|tool_response:end|>\\n\" }}\n    {{- \"- Ensure the <tool-call-id> matches the corresponding tool call\" -}}\n{%- endmacro %}\n\n{%- macro 
render_json_response_format_instruction(response_format) %}\n    {%- if not sys_ns.is_first_block %}{{- \"\\n\\n\" }}{%- endif %}\n    {%- set sys_ns.is_first_block = false %}\n    {{- \"## Output Format Constraint\" }}\n    {{- \"\\n\\nYour final response should follow the JSON schema: \\n[Start of schema]\" }}\n    {{- response_format }}\n    {{- \"\\n[End of schema]\\nPlease ensure your answers adhere to this format and do not contain any unnecessary text.\" }}\n{%- endmacro %}\n\n{%- macro get_tool_name(messages, tool_call_id) %}\n    {%- for msg in messages -%}\n        {%- if msg.role == 'assistant' and msg.tool_calls -%}\n            {%- for tool_call in msg.tool_calls -%}\n                {%- if tool_call.id == tool_call_id -%}\n                    {{- tool_call.function.name }}\n                {%- endif -%}\n            {%- endfor -%}\n        {%- endif -%}\n    {%- endfor -%}\n{%- endmacro %}\n\n{%- macro render_tool_arguments(tool_arguments) %}\n    {%- if tool_arguments is mapping -%}\n        {{- tool_arguments | tojson }}\n    {%- else -%}\n        {{- tool_arguments }}\n    {%- endif -%}\n{%- endmacro %}\n\n{#- ======== Render system message ========  #}\n{%- set ns = namespace(system_messages=[]) -%}\n{%- for message in messages -%}\n    {%- if message.role == 'system' -%}\n        {%- set ns.system_messages = ns.system_messages + [message.content] -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{%- if ns.system_messages or default_system_prompt or tools or response_format -%}\n    {{- \"<|begin|>system<|content|>\" }}\n        {{- render_system_message(ns.system_messages) }}\n        {%- if tools -%}\n            {{- render_tool_instruction(tools) }}\n        {%- endif %}\n        {%- if response_format -%}\n            {{- render_json_response_format_instruction(response_format) }}\n        {%- endif %}\n    {{- \"<|end|>\" }}\n{%- endif -%}\n\n{#- ======== Render main messages ========  #}\n{%- for message in messages -%}\n    {%- if message.role == 
'user' -%}\n         {{- \"<|begin|>user<|content|>\" + message.content + \"<|end|>\" }}\n    {%- elif message.role == 'tool' -%}\n        {%- set prev_is_tool = loop.index0 > 0 and messages[loop.index0 - 1].role == 'tool' -%}\n        {%- set next_is_tool = loop.index0 < (messages | length - 1) and messages[loop.index0 + 1].role == 'tool' -%}\n        {%- if not prev_is_tool -%}\n            {{- \"<|begin|>tool<|tool_response|>\" }}\n        {%- endif -%}\n        {{- \"<|tool_response:begin|>\" + message.tool_call_id + \"<|tool_response:name|>\" }}\n        {{- get_tool_name(messages, message.tool_call_id) }}\n        {{- \"<|tool_response:result|>\" }}\n        {{- message.content }}\n        {{- \"<|tool_response:end|>\" }}\n        {%- if not next_is_tool -%}\n            {{- \"<|end|>\" }}\n        {%- endif -%}\n    {%- elif message.role == 'assistant' -%}\n        {#- ======== Assistant Thinking ========  #}\n        {%- if think_render_option == \"all\" -%}\n            {%- if message.reasoning -%}\n                {{- \"<|begin|>assistant<|think|>\" + message.reasoning + \"<|end|>\" }}\n            {%- endif -%}\n        {%- elif think_render_option == \"lastthink\" -%}\n            {%- if message.reasoning and loop.index0 > last_user_idx.value -%}\n                {{- \"<|begin|>assistant<|think|>\" + message.reasoning + \"<|end|>\" }}\n            {%- endif -%}\n        {%- endif -%}\n\n        {#- ======== Assistant Messages ========  #}\n        {%- if message.tool_calls -%}\n            {{- \"<|begin|>assistant<|tool_calls|>\" }}\n            {%- for tool_call in message.tool_calls -%}\n                {{- \"<|tool_call:begin|>\" + tool_call.id +\"<|tool_call:name|>\" + tool_call.function.name + \"<|tool_call:args|>\" }}\n                {{- render_tool_arguments(tool_call.function.arguments) }}\n                {{- \"<|tool_call:end|>\" }}\n            {%- endfor -%}\n            {{- \"<|calls|>\" }}\n        {%- else -%}\n            {{- 
\"<|begin|>assistant<|content|>\" + message.content + \"<|end|>\" }}\n        {%- endif -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{%- if add_generation_prompt -%}\n    {%- if reasoning_effort in [\"low\", \"minimal\"] -%}\n        {{- \"<|begin|>assistant<|think|><|end|>\" }}\n    {%- endif -%}\n    {{- \"<|begin|>assistant\" }}\n{%- endif -%}\n"
+}
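
To make the shape of the prompt produced by the added `chat_template` easier to see, here is a hand-written approximation of its simplest path. It is a sketch only, not the template itself: it assumes `default_system_prompt` is false, there are no system messages, tools, or `response_format`, and `think_render_option` is left at `"lastthink"`. Tool calls and tool responses are deliberately not handled, and the function name `render_simple_prompt` is hypothetical.

```python
def render_simple_prompt(messages, reasoning_effort="high", add_generation_prompt=True):
    """Approximate the template for plain user/assistant turns only."""
    # Index of the last user message; reasoning is only rendered after it ("lastthink").
    last_user_idx = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    parts = []
    for i, message in enumerate(messages):
        if message["role"] == "user":
            parts.append("<|begin|>user<|content|>" + message["content"] + "<|end|>")
        elif message["role"] == "assistant":
            reasoning = message.get("reasoning")
            if reasoning and i > last_user_idx:
                parts.append("<|begin|>assistant<|think|>" + reasoning + "<|end|>")
            parts.append(
                "<|begin|>assistant<|content|>" + message["content"] + "<|end|>"
            )
    if add_generation_prompt:
        if reasoning_effort in ("low", "minimal"):
            # Low-effort modes pre-fill an empty think block before the answer.
            parts.append("<|begin|>assistant<|think|><|end|>")
        parts.append("<|begin|>assistant")
    return "".join(parts)


prompt = render_simple_prompt([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

In practice the real template should be rendered through `tokenizer.apply_chat_template`, which also supplies the `strftime_now` helper used by the default system prompt. The sketch is only meant for reading the token layout.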