Instructions for using nobrand/KULLM-R with libraries and local apps.

Libraries

How to use nobrand/KULLM-R with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nobrand/KULLM-R")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nobrand/KULLM-R")
model = AutoModelForCausalLM.from_pretrained("nobrand/KULLM-R")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
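
The README note for this model recommends the Qwen3 sampling settings (`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`) and warns against greedy decoding. The sketch below shows one way to pass those values to `generate` explicitly. The names `SAMPLING_KWARGS` and `generate_reply` and the `max_new_tokens=1024` budget are illustrative assumptions, not part of the model card.

```python
# Sampling values quoted in the KULLM-R README note (attributed there to Qwen3).
SAMPLING_KWARGS = {
    "do_sample": True,  # enable sampling, i.e. avoid greedy decoding
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
}


def generate_reply(model, tokenizer, messages, max_new_tokens=1024):
    """Generate one sampled reply (hypothetical helper; the token budget is arbitrary)."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **SAMPLING_KWARGS
    )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the `model`, `tokenizer`, and `messages` objects from the snippet above, `print(generate_reply(model, tokenizer, messages))` would print a sampled answer. If the repository's `generation_config.json` already carries these defaults, as the note says, the explicit arguments only make the choice visible.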

Local Apps

vLLM

How to use nobrand/KULLM-R with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nobrand/KULLM-R"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nobrand/KULLM-R",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
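
The same request can be sent from Python. This is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the server returns the usual OpenAI-style chat completion body (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="nobrand/KULLM-R"):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

Once a server is running, `print(chat("What is the capital of France?"))` would print the reply. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.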

Use Docker

docker model run hf.co/nobrand/KULLM-R

SGLang

How to use nobrand/KULLM-R with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nobrand/KULLM-R" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nobrand/KULLM-R",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nobrand/KULLM-R" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nobrand/KULLM-R",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use nobrand/KULLM-R with Docker Model Runner:
```
docker model run hf.co/nobrand/KULLM-R
```

nobrand committed on Aug 6, 2025

Commit 1d23fa2 (verified) · 1 Parent(s): 136f303

Update README.md

Files changed (1)

README.md +3 -4


@@ -12,7 +12,7 @@ language:
 # KULLM-R
-Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both the correct reasoning paths and answers for such queries, offering enhanced reasoning efficiency and language transferability to Korean compared to general-purpose reasoning models. It leverages reinforcement learning strategies for effective path exploration and Korean-specific generation.
+Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both the correct reasoning paths and answers for such queries, offering enhanced reasoning efficiency and language transferability to Korean compared to general-purpose reasoning models. Reinforcement learning strategy is employed for efficient reasoning path exploration and Korean-specific generation.
 ## Model Details
@@ -34,7 +34,7 @@ KULLM-R is distinguished from standard reasoning LLMs based on Qwen3-8B by its f
 - **Reasoning Efficiency Aware Reinforcement Learning**: Introduces RL techniques considering both reasoning path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
 - **Reasoning Path Pruning**: Specialized for high-difficulty reasoning problems by pruning ineffective paths and emphasizing transparency and readability in generated answers.
 - **Support High Readability in Korean System**: Enhanced both logical reasoning and natural Korean expression ability in answer.
-- **Adaptive Length Penalty**: Adaptive penalties optimize the reasoning process according to the question’s complexity and reasoning path length, ensuring efficient solutions for complex problems.
+- **Adaptive Length Penalty**: Adaptive penalties optimize the reasoning process according to the question’s complexity and difficulty, ensuring efficient solutions for various math problems.
 ## Data & Training Process
@@ -105,7 +105,7 @@ print("content:", content)
 ```
 > [!NOTE]
-> As mentioned in Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
+> As mentioned in Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
 ## Evaluation
@@ -128,7 +128,6 @@ print("content:", content)
 ## Citation
 ```
 @misc{KULLM-R2025,
   title   = {KULLM-R: Korea University Large Language Model for Reasoning},