Local Models
How to use cortexso/gemma with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="cortexso/gemma",
    filename="gemma-7b-it-q2_k.gguf",
)
llm.create_chat_completion(
messages = [
{
"role": "user",
"content": "What is the capital of France?"
}
]
)

How to use cortexso/gemma with llama.cpp:
# macOS / Linux (Homebrew):
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf cortexso/gemma:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf cortexso/gemma:Q4_K_M

# Windows (winget):
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf cortexso/gemma:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf cortexso/gemma:Q4_K_M

# Pre-built binary:
# Download from https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf cortexso/gemma:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf cortexso/gemma:Q4_K_M

# Build from source:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf cortexso/gemma:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf cortexso/gemma:Q4_K_M
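Once llama-server is running, any OpenAI-compatible client can talk to it over HTTP. Below is a minimal Python sketch that builds the chat-completion payload and posts it to the server's /v1/chat/completions endpoint; the port 8080 (llama-server's default) and the use of the stdlib urllib are assumptions for illustration, not part of this repo.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST the payload to a running llama-server and return the reply text."""
    payload = build_chat_request("cortexso/gemma", prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same payload shape works against any of the OpenAI-compatible servers shown on this page, since they all expose the same chat-completions route.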
How to use cortexso/gemma with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cortexso/gemma"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "cortexso/gemma",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
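The JSON the vLLM server returns follows the OpenAI chat-completion schema, with the reply text at choices[0].message.content. A short sketch of extracting it; the sample response below is illustrative, not captured from a real run:

```python
import json

# Illustrative response shaped like the OpenAI chat-completion schema.
sample = json.loads("""
{
  "model": "cortexso/gemma",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "The capital of France is Paris."}}
  ]
}
""")


def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


print(extract_reply(sample))  # The capital of France is Paris.
```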
How to use cortexso/gemma with Ollama:
ollama run hf.co/cortexso/gemma:Q4_K_M
How to use cortexso/gemma with Unsloth Studio:
# macOS / Linux:
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for cortexso/gemma to start chatting

# Windows (PowerShell):
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for cortexso/gemma to start chatting

# In the browser (no setup required):
# Open https://huggingface.co/spaces/unsloth/studio
# Search for cortexso/gemma to start chatting
How to use cortexso/gemma with Docker Model Runner:
docker model run hf.co/cortexso/gemma:Q4_K_M
How to use cortexso/gemma with Lemonade:
# Download Lemonade from https://lemonade-server.ai/
lemonade pull cortexso/gemma:Q4_K_M
lemonade run user.gemma-Q4_K_M
lemonade list
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. This repository packages the instruction-tuned 7B variant in GGUF format at several quantization levels (2-bit through 8-bit) for local inference; the 7B model supports a context length of 8K tokens.
| No | Variant | Cortex CLI command |
|---|---|---|
| 1 | Gemma-7b | cortex run gemma:7b |
cortexso/gemma

cortex run gemma

Available GGUF quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
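When loading with llama-cpp-python, the quantization level is selected through the filename argument. A small helper for constructing that name, extrapolating the pattern from the one filename shown on this page (gemma-7b-it-q2_k.gguf); the names for the other levels are an assumption and should be checked against the repo's file list:

```python
def gguf_filename(bits: int) -> str:
    """Guess the GGUF filename for a quantization level, extrapolating
    from the known 2-bit filename gemma-7b-it-q2_k.gguf.
    NOTE: names for levels other than 2-bit are assumed, not verified."""
    supported = {2, 3, 4, 5, 6, 8}  # levels listed on this page
    if bits not in supported:
        raise ValueError(f"unsupported quantization: {bits}-bit")
    return f"gemma-7b-it-q{bits}_k.gguf"


print(gguf_filename(2))  # gemma-7b-it-q2_k.gguf
```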