Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ language:

 # KULLM-R

-Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both the correct reasoning paths and answers for such queries, offering enhanced reasoning efficiency and language transferability to Korean compared to general-purpose reasoning models.
+Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to provide both the correct reasoning paths and answers for such queries, offering enhanced reasoning efficiency and language transferability to Korean compared to general-purpose reasoning models. Reinforcement learning strategy is employed for efficient reasoning path exploration and Korean-specific generation.


 ## Model Details
@@ -34,7 +34,7 @@ KULLM-R is distinguished from standard reasoning LLMs based on Qwen3-8B by its f
 - **Reasoning Efficiency Aware Reinforcement Learning**: Introduces RL techniques considering both reasoning path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
 - **Reasoning Path Pruning**: Specialized for high-difficulty reasoning problems by pruning ineffective paths and emphasizing transparency and readability in generated answers.
 - **Support High Readability in Korean System**: Enhanced both logical reasoning and natural Korean expression ability in answer.
-- **Adaptive Length Penalty**: Adaptive penalties optimize the reasoning process according to the question’s complexity and
+- **Adaptive Length Penalty**: Adaptive penalties optimize the reasoning process according to the question’s complexity and difficulty, ensuring efficient solutions for various math problems.


 ## Data & Training Process
@@ -105,7 +105,7 @@ print("content:", content)
 ```

 > [!NOTE]
-> As mentioned in Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
+> As mentioned in Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.


 ## Evaluation
@@ -128,7 +128,6 @@ print("content:", content)


 ## Citation
-
 ```
 @misc{KULLM-R2025,
 title = {KULLM-R: Korea University Large Language Model for Reasoning},
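The `[!NOTE]` in the README text above recommends sampling with `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0`, and warns against greedy decoding. As a minimal sketch, those values could be passed explicitly as keyword arguments to `generate` in Hugging Face Transformers. The helper name `build_generation_kwargs` and the `max_new_tokens` default are hypothetical, chosen only for illustration; the README does not prescribe them.

```python
# Sampling values quoted from the README note. The keys follow the keyword
# arguments accepted by `generate` in Hugging Face Transformers.
RECOMMENDED_SAMPLING = {
    "do_sample": True,   # sampling on, i.e. not greedy decoding
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
}


def build_generation_kwargs(max_new_tokens=1024, **overrides):
    """Merge caller overrides into the recommended sampling settings.

    Hypothetical helper: both the name and the `max_new_tokens` default
    are assumptions made for this sketch, not part of the README.
    """
    kwargs = {**RECOMMENDED_SAMPLING, "max_new_tokens": max_new_tokens, **overrides}
    # Reject configurations that collapse to greedy decoding.
    if not kwargs["do_sample"] or kwargs["temperature"] <= 0:
        raise ValueError("greedy decoding is discouraged for this model")
    return kwargs


gen_kwargs = build_generation_kwargs()
print(gen_kwargs)
```

With a loaded model and tokenized chat inputs, the result would be used as `model.generate(**inputs, **gen_kwargs)`. Since the note says these values are already the defaults in `generation_config.json`, passing them explicitly mainly guards against a caller accidentally switching sampling off. The `min_p` argument needs a Transformers release that supports it.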