PingVortexLM
How to use pvlabs/PingVortexLM1-20M-Base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="pvlabs/PingVortexLM1-20M-Base")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("pvlabs/PingVortexLM1-20M-Base")
model = AutoModelForCausalLM.from_pretrained("pvlabs/PingVortexLM1-20M-Base")
How to use pvlabs/PingVortexLM1-20M-Base with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pvlabs/PingVortexLM1-20M-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "pvlabs/PingVortexLM1-20M-Base",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the model with Docker:
docker model run hf.co/pvlabs/PingVortexLM1-20M-Base
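The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server is running locally on port 8000 as shown; the helper names `build_completion_request` and `complete` are hypothetical and chosen for illustration, and the response is assumed to follow the OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(prompt,
                             base_url="http://localhost:8000",
                             model="pvlabs/PingVortexLM1-20M-Base",
                             max_tokens=512,
                             temperature=0.5):
    """Build a POST request mirroring the curl call (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-compatible completions response layout.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated continuation as a string. Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000 instead.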
How to use pvlabs/PingVortexLM1-20M-Base with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "pvlabs/PingVortexLM1-20M-Base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "pvlabs/PingVortexLM1-20M-Base",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "pvlabs/PingVortexLM1-20M-Base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "pvlabs/PingVortexLM1-20M-Base",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use pvlabs/PingVortexLM1-20M-Base with Docker Model Runner:
docker model run hf.co/pvlabs/PingVortexLM1-20M-Base
A small experimental language model based on the LLaMA architecture, trained on a custom high-quality English dataset of around 200M tokens. This model is only an experiment: it is not designed for coherent text generation or logical reasoning, and it may produce repetitive or nonsensical output.
Built by PingVortex Labs.
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast
model = LlamaForCausalLM.from_pretrained("pvlabs/PingVortexLM1-20M-Base")
tokenizer = PreTrainedTokenizerFast.from_pretrained("pvlabs/PingVortexLM1-20M-Base")
# don't expect a coherent response
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, repetition_penalty=1.3)
print(tokenizer.decode(outputs[0]))
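The `repetition_penalty=1.3` argument in the snippet above is there to discourage the repetitive output mentioned in the description. The sketch below shows, in plain Python, the rule that repetition penalties of this kind commonly apply to next-token scores: tokens that already appear in the sequence have positive scores divided by the penalty and negative scores multiplied by it. The helper `apply_repetition_penalty` and the example scores are hypothetical, meant only to show the arithmetic; this is not the library's own code.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """Return a copy of `logits` with already-generated tokens made less likely.

    `logits` is a list of raw next-token scores indexed by token id
    (illustrative values, not real model output).
    """
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further
        # below zero. Either way the token becomes less probable.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Tokens 0 and 1 were already generated; tokens 2 and 3 are untouched.
example_logits = [2.6, -1.3, 0.5, 1.0]
print(apply_repetition_penalty(example_logits, generated_ids=[0, 1, 0]))
```

A penalty of 1.0 leaves the scores unchanged, and larger values suppress repeats more strongly at the cost of sometimes blocking legitimate repetition.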