🍎 functiongemma-270m-it-4bit-mlx

google/functiongemma-270m-it converted to MLX format



📖 About This Model

This model is google/functiongemma-270m-it converted to MLX format, which runs with native acceleration on Apple Silicon (M1/M2/M3/M4) Macs.

| Property | Value |
|---|---|
| Base Model | google/functiongemma-270m-it |
| Format | MLX |
| Quantization | Q4_K_M |
| License | apache-2.0 |
| Created With | QuantLLM |

🚀 Quick Start

Generate Text with mlx-lm

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("QuantLLM/functiongemma-270m-it-4bit-mlx")

# Simple generation
prompt = "Explain quantum computing in simple terms"
messages = [{"role": "user", "content": prompt}]
prompt_formatted = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True
)

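# Generate a response. verbose=True already prints the text and timing stats,
# so the returned string is kept for later use instead of being printed again
text = generate(model, tokenizer, prompt=prompt_formatted, verbose=True)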
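
Because the base model is a function-calling variant of Gemma, a prompt will often carry a tool definition as well as the user message. The sketch below is one possible way to do that, not a confirmed recipe for this checkpoint: `GET_WEATHER_TOOL`, `build_messages`, and `run_tool_prompt` are hypothetical names, the tool schema follows the JSON-schema style that Hugging Face chat templates accept through the `tools` argument, and it is an assumption that this model's chat template handles `tools`. The base model card may also expect a specific system or developer prompt, so check it before relying on this.

```python
# Hypothetical tool schema in the JSON-schema style accepted by the
# `tools` argument of Hugging Face chat templates.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. London"},
            },
            "required": ["city"],
        },
    },
}


def build_messages(user_text):
    """Wrap a user request in the chat-message list used in the example above."""
    return [{"role": "user", "content": user_text}]


def run_tool_prompt(user_text, model_id="QuantLLM/functiongemma-270m-it-4bit-mlx"):
    """Generate a reply with the tool schema included in the prompt."""
    # Imported lazily so the helpers above work without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text),
        tools=[GET_WEATHER_TOOL],  # assumption: the chat template supports tools
        add_generation_prompt=True,
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=128)
```

Calling `run_tool_prompt("What is the weather in London?")` on an Apple Silicon Mac returns the raw model output as a string; parsing any function call out of that string depends on the model's output format and is left out here.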

Streaming Generation

from mlx_lm import load, stream_generate

model, tokenizer = load("QuantLLM/functiongemma-270m-it-4bit-mlx")

prompt = "Write a haiku about coding"
messages = [{"role": "user", "content": prompt}]
prompt_formatted = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True
)

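# Stream text chunks as they are generated; in recent mlx-lm versions each
# item is a response object whose .text attribute holds the newly decoded text
for response in stream_generate(model, tokenizer, prompt=prompt_formatted, max_tokens=200):
    print(response.text, end="", flush=True)
print()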
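
When the streamed text is needed afterwards (for logging or post-processing), the chunks can be collected while they are printed. The helper below is a small sketch: `collect_stream` is a hypothetical name, and it assumes each streamed item either exposes a `.text` attribute (as recent mlx-lm versions do) or can be converted with `str()`. A stand-in list replaces `stream_generate(...)` so the helper can be tried without loading a model.

```python
from types import SimpleNamespace


def collect_stream(responses, echo=True):
    """Print streamed chunks as they arrive and return the full text."""
    parts = []
    for response in responses:
        # Prefer the .text attribute; fall back to str() for plain strings
        # (assumption for versions that yield text directly).
        chunk = getattr(response, "text", None)
        if chunk is None:
            chunk = str(response)
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()
    return "".join(parts)


# Stand-in for stream_generate(...), used only to demonstrate the helper.
fake_stream = [SimpleNamespace(text="Hello"), SimpleNamespace(text=", world")]
full_text = collect_stream(fake_stream, echo=False)
```

With a real model, pass the iterator directly: `collect_stream(stream_generate(model, tokenizer, prompt=prompt_formatted, max_tokens=200))`.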

Command Line Interface

# Install mlx-lm
pip install mlx-lm

# Generate text
python -m mlx_lm.generate --model QuantLLM/functiongemma-270m-it-4bit-mlx --prompt "Hello!"

# Interactive chat
python -m mlx_lm.chat --model QuantLLM/functiongemma-270m-it-4bit-mlx

System Requirements

| Requirement | Minimum |
|---|---|
| Chip | Apple Silicon (M1/M2/M3/M4) |
| macOS | 13.0 (Ventura) or later |
| Python | 3.10+ |
| RAM | 8GB+ (16GB recommended) |

# Install dependencies
pip install mlx-lm
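
Before installing, the chip, macOS, and Python requirements listed above can be checked from Python itself. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, RAM is not checked, and a Python build running under Rosetta reports `x86_64` even on Apple Silicon hardware.

```python
import platform
import sys


def check_requirements(system=None, machine=None, python_version=None, mac_version=None):
    """Return a dict mapping each requirement to True or False."""
    # Each argument defaults to the value detected on the current machine.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    python_version = sys.version_info[:2] if python_version is None else python_version
    mac_version = platform.mac_ver()[0] if mac_version is None else mac_version

    # Compare only the major macOS version; an empty string (non-macOS) fails.
    major = mac_version.split(".")[0]
    mac_ok = major.isdigit() and int(major) >= 13

    return {
        "Apple Silicon chip": system == "Darwin" and machine == "arm64",
        "macOS 13+": system == "Darwin" and mac_ok,
        "Python 3.10+": tuple(python_version) >= (3, 10),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'not met'}")
```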

📊 Model Details

| Property | Value |
|---|---|
| Original Model | google/functiongemma-270m-it |
| Format | MLX |
| Quantization | Q4_K_M |
| License | apache-2.0 |
| Export Date | 2025-12-21 |
| Exported By | QuantLLM v2.0 |
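
For a rough sense of how much memory the weights need, the arithmetic can be sketched as follows. The numbers are assumptions rather than measurements of this checkpoint: the parameter count (270M) is taken from the model name, the quantization is treated as plain 4-bit group-wise quantization with a group size of 64 and a 16-bit scale plus a 16-bit bias per group, and every weight is treated as quantized, which real exports may not do. `quantized_weight_megabytes` is a hypothetical helper.

```python
def quantized_weight_megabytes(num_params, bits=4, group_size=64, overhead_bits=32):
    """Rough weight size in MB for group-wise quantization.

    overhead_bits is the per-group metadata (assumed: 16-bit scale + 16-bit bias).
    """
    bits_per_weight = bits + overhead_bits / group_size
    return num_params * bits_per_weight / 8 / 1e6


# 270e6 parameters at 4.5 effective bits per weight
estimate_mb = quantized_weight_megabytes(270e6)
print(f"~{estimate_mb:.0f} MB")  # prints "~152 MB"
```

Under these assumptions the weights take on the order of 150 MB, so the RAM figures in the requirements table leave ample room for the runtime, activations, and other applications.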

🚀 Created with QuantLLM


Convert any model to GGUF, ONNX, or MLX in one line!

from quantllm import turbo

# Load any HuggingFace model
model = turbo("google/functiongemma-270m-it")

# Export to any format
model.export("mlx", quantization="Q4_K_M")

# Push to HuggingFace
model.push("your-repo", format="mlx")

📚 Documentation · 🐛 Report Issue · 💡 Request Feature
