stentor_python_30m / README.md

Stanislav

Update README.md

13960d6 verified 3 months ago

	---
	license: apache-2.0
	language:
	- en
	base_model:
	- StentorLabs/Stentor-30M
	pipeline_tag: text-generation
	tags:
	- python
	- code-generation
	- tiny-model
	- code
	library_name: transformers
	---
	Model Description

	Stentor Python 30M is a compact language model fine-tuned for Python code generation and autocompletion. It is based on the Stentor-30M architecture, has about 30 million parameters, and is designed to run efficiently on resource-constrained devices such as mobile phones and embedded systems.

	Model Details

	- Developed by: Experimental fine-tuning project
	- Model type: Causal language model (LlamaForCausalLM)
	- Language: Python code, English instructions
	- Parameters: 30,419,712
	- Context length: 512 tokens
	- Model size: 60 MB (FP16), 30 MB (INT8)
	- License: Apache 2.0
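
The model sizes listed above follow from the parameter count. A minimal sketch of that arithmetic, assuming 2 bytes per parameter for FP16, 1 byte for INT8, and 1 MB = 10^6 bytes (the helper name `weights_size_mb` is illustrative, and file-format overhead is ignored):

```python
# Parameter count as stated in the model details.
PARAM_COUNT = 30_419_712

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2, "INT8": 1}


def weights_size_mb(param_count: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return param_count * bytes_per_param / 1_000_000


for precision, width in BYTES_PER_PARAM.items():
    print(f"{precision}: {weights_size_mb(PARAM_COUNT, width):.1f} MB")
# FP16: 60.8 MB
# INT8: 30.4 MB
```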

	Training Data

	The model was fine-tuned on a curated dataset of 872 Python examples, including:

	- Basic algorithms (factorial, prime numbers, list operations)
	- Class implementations (Stack, BankAccount, Rectangle, Circle)
	- Recursive functions (quicksort, Fibonacci)
	- String manipulation (palindrome, anagram, vowel counting)
	- MBPP (Mostly Basic Python Problems) dataset tasks

	All examples follow a consistent format: a "### Task:" instruction followed by a "### Solution:" code block.
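
Prompts at inference time should probably mirror this format. A small sketch of a prompt builder; `build_prompt` is a hypothetical helper, and the exact whitespace between the two markers is an assumption taken from the usage example in this card rather than from the training data itself:

```python
def build_prompt(task: str) -> str:
    """Wrap a natural-language task in the "### Task:" / "### Solution:" format."""
    # Whitespace layout (blank line between markers, trailing newline) is an
    # assumption copied from the card's usage example.
    return f"### Task: {task.strip()}\n\n### Solution:\n"


print(build_prompt("Write a function that checks if a number is even"))
```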

	Training Process

	The fine-tuning process involved multiple stages:

	1. Base model: Stentor-30M
	2. Initial fine-tuning on 50k examples
	3. Multiple correction rounds with progressively lower learning rates
	4. Final detoxification training with a learning rate of 3e-7 to remove undesirable patterns

	Evaluation Results

	The model was evaluated on several test categories:

	| Category | Pass Rate | Notes |
	|----------|-----------|-------|
	| Basic functions | 80% | Factorial, prime check, etc. |
	| Classes from training set | 100% | Stack, BankAccount, Rectangle |
	| New complex classes | 33% | Graph, Queue, inheritance |
	| Function signatures (MBPP) | 100% | Correctly generates def statements |
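
The card does not describe the evaluation harness, so the following is only an illustration of how a "function signatures" style check could be automated. It assumes a sample passes when the generated text parses as Python and defines a function with the expected name; `defines_function` and `pass_rate` are hypothetical helpers, not the author's code.

```python
import ast


def defines_function(code: str, name: str) -> bool:
    """Return True if `code` parses as Python and defines a function called `name`."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in ast.walk(tree)
    )


def pass_rate(results: list) -> float:
    """Percentage of checks that passed (0.0 for an empty list)."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Two made-up generations: one valid, one with a missing colon.
samples = [
    ("def is_even(n):\n    return n % 2 == 0\n", "is_even"),
    ("def is_even(n)\n    return n % 2 == 0\n", "is_even"),
]
results = [defines_function(code, name) for code, name in samples]
print(f"{pass_rate(results):.0f}%")  # 50%
```

A check like this only confirms syntax and naming. Measuring functional correctness would also require running the generated code against test cases in a sandbox.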

	Capabilities

	- Generates Python functions from natural language descriptions
	- Implements basic algorithms (factorial, prime check, palindrome)
	- Creates class definitions with methods (Stack, BankAccount, Rectangle)
	- Handles recursive functions (quicksort, Fibonacci)
	- Produces syntactically correct function signatures

	Limitations

	- May produce repeated or redundant code after the main solution
	- Struggles with complex data structures (graphs, trees, queues)
	- Does not reliably handle class inheritance patterns
	- Can generate incorrect list indexing operations
	- May continue generating text beyond the intended solution
	- Limited to a 512-token context window
	- Not suitable for production use without output post-processing
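
Since the model may repeat itself or keep generating past the intended solution, some output post-processing is needed. One possible approach, sketched below, is to cut the decoded text at the next "### Task:" or "### Solution:" marker. The helper `extract_solution` is hypothetical, and it assumes that runaway generations restart with one of those markers, which the card does not guarantee.

```python
TASK_MARKER = "### Task:"
SOLUTION_MARKER = "### Solution:"


def extract_solution(generated: str) -> str:
    """Keep only the first solution from decoded model output."""
    # Drop the echoed prompt by keeping the text after the first solution marker.
    if SOLUTION_MARKER in generated:
        body = generated.split(SOLUTION_MARKER, 1)[1]
    else:
        body = generated
    # Cut where the model starts a new task or a second solution.
    for marker in (TASK_MARKER, SOLUTION_MARKER):
        body = body.split(marker, 1)[0]
    return body.strip() + "\n"


# Made-up example of output that runs past the intended solution.
raw = (
    "### Task: Write a function that checks if a number is even\n\n"
    "### Solution:\n"
    "def is_even(n):\n    return n % 2 == 0\n\n"
    "### Task: Write a function\n\n### Solution:\ndef is_even("
)
print(extract_solution(raw))
```

Marker-based truncation does not catch repeated code that appears without a marker. Parsing the result with `ast.parse` and keeping only the first top-level definition is a stricter alternative.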

	Recommended Use Cases

	- Code autocompletion in lightweight IDEs
	- Educational tool for Python beginners
	- Rapid prototyping of simple functions
	- Embedded systems with limited computational resources
	- Offline code assistance on mobile devices

	Not Recommended For

	- Complex algorithm implementation
	- Production code generation without human review
	- Tasks requiring deep contextual understanding
	- Generating large codebases
	- Security-critical applications

	Usage Example

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_path = "CastIronMind/stentor_python_30m"
	tokenizer = AutoTokenizer.from_pretrained(model_path)
	model = AutoModelForCausalLM.from_pretrained(model_path)

	# Prompt in the "### Task:" / "### Solution:" format used for fine-tuning
	prompt = "### Task: Write a function that checks if a number is even\n\n### Solution:\n"
	inputs = tokenizer(prompt, return_tensors="pt")
	# temperature only takes effect when sampling is enabled
	outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.2)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```
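
Because the context window is 512 tokens, the prompt length plus `max_new_tokens` should stay within that limit. A minimal sketch of a budget check; `clamp_new_tokens` is a hypothetical helper, and the prompt token count is assumed to come from the tokenizer (for example `inputs["input_ids"].shape[-1]` in the example above):

```python
CONTEXT_LENGTH = 512  # context length stated in the model details


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Largest generation length that keeps prompt + output inside the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window; shorten it")
    return min(requested, context_length - prompt_tokens)


print(clamp_new_tokens(20, 100))   # 100: short prompt, full request fits
print(clamp_new_tokens(480, 100))  # 32: only 32 tokens remain in the window
```

The returned value can be passed as `max_new_tokens` to `model.generate`.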

	Ethical Considerations

	This model is intended for educational and development assistance purposes. Users should verify all generated code before deployment, particularly for security-sensitive applications. The model may occasionally produce incorrect or inefficient code and should not be relied upon as the sole source of truth for programming tasks.

	Contact

	For questions or feedback about this model, please open an issue on the Hugging Face repository.