---
license: apache-2.0
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- trl
- text-generation-inference
- re-think
- reasoning
---
# **SmolLM2-Rethink-135M**

> **SmolLM2-Rethink-135M** is an experimental lightweight model fine-tuned on the **Celestia3-DeepSeek-R1-0528** reasoning dataset. Built on **SmolLM2-135M-Instruct**, it is optimized for reasoning, structured outputs, and efficient small-scale deployment. Despite its compact size (135M parameters), it demonstrates strong logical deduction, conversational coherence, and fast lightweight inference.

---

## **Key Highlights**

1. **Compact & Efficient**
   Lightweight architecture (135M parameters) suitable for fast inference, mobile applications, and edge deployment.

2. **Reasoning-Centric Training**
   Fine-tuned on high-quality reasoning and instruction data such as **Celestia3-DeepSeek-R1-0528**, with a focus on multi-step logical thinking.

3. **Low-Resource Optimization**
   Designed to run effectively on CPUs or single-GPU setups with a minimal memory footprint.

4. **Structured Outputs**
   Supports generation of clean, structured content, including lists, steps, tables, and JSON-like responses (see the sketch after this list).

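As a quick illustration of the last two points, the sketch below loads the checkpoint on CPU and asks for a JSON-only reply. This is a minimal sketch, not an official recipe: the checkpoint name is taken from the Quickstart below, the prompt is illustrative, and a 135M model may occasionally drift off-format.

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-Rethink-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # small enough to run on CPU

# Illustrative prompt asking for machine-parseable output.
messages = [{
    "role": "user",
    "content": "List three primary colors as a JSON array of strings. Reply with JSON only.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens (the reply).
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

try:
    print(json.loads(reply))  # e.g. ["red", "yellow", "blue"]
except json.JSONDecodeError:
    print("Non-JSON reply:", reply)  # small models can drift off-format
```
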
---

## **Quickstart with 🤗 Transformers**

```python
%%capture
!pip install transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-Rethink-135M"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]

# Build the chat-formatted prompt; add_generation_prompt appends the
# assistant-turn marker so the model starts a fresh reply.
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.2,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

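As an alternative to the manual loop above, the same checkpoint can be driven through the high-level `pipeline` API. A minimal sketch, assuming a recent `transformers` release with chat support in `pipeline` and `accelerate` installed for `device_map="auto"`:

```python
from transformers import pipeline

# Chat-style text generation via the pipeline API; device_map="auto"
# falls back to CPU automatically when no GPU is available.
generator = pipeline(
    "text-generation",
    model="prithivMLmods/SmolLM2-Rethink-135M",
    device_map="auto",
)

messages = [{"role": "user", "content": "What is gravity?"}]
result = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.9)

# The pipeline returns the whole conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```
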
---

## **Intended Use**
* **Instruction Following & QA**
  Good for answering simple questions, following short instructions, and general user interaction.

* **Educational Tools**
  Suitable for lightweight tutoring bots or classroom assistants on low-compute setups.

* **Reasoning Tasks**
  Handles logic puzzles, multi-step reasoning, and chain-of-thought style queries (see the sketch below).

* **Prototype Agents & Microservices**
  Can be deployed in memory-efficient environments or as a modular AI component.

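For the reasoning use case, an explicit step-by-step instruction is usually enough to elicit a visible reasoning chain. A minimal sketch reusing the Quickstart checkpoint; the arithmetic question is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-Rethink-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # runs on CPU

# Asking for step-by-step work nudges the model toward multi-step reasoning.
messages = [{
    "role": "user",
    "content": "A train leaves at 9:15 and the journey takes 2 hours 50 minutes. "
               "When does it arrive? Think step by step.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.9)

# Decode only the newly generated tokens (the reply).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
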
---

## **Limitations**
1. **Limited Knowledge Capacity**
   Due to its small parameter count, it lacks the depth and breadth of large-scale models.

2. **Short Context Handling**
   Performs best with short to moderate-length prompts; it does not offer extended-context support.

3. **Creative Generation Limitations**
   Output may lack diversity or depth in open-ended storytelling or imaginative tasks.

4. **Token Budget**
   Optimized for shorter, structured completions rather than long-form output.

5. **Basic Multilingual Support**
   Offers some support for multilingual input, but is less accurate than larger multilingual models.