KeyError: "filename 'storages' not found"
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_path, revision, self.use_fast_tokenizer and from_pretrained_kwargs
# are supplied by the surrounding loader code
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=self.use_fast_tokenizer, revision=revision
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    **from_pretrained_kwargs,
)
log:
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 14%|█████████████████ | 1/7 [00:09<00:56, 9.38s/it]
Loading checkpoint shards: 29%|█████████████████████████████████ | 2/7 [00:18<00:45, 9.16s/it]
Loading checkpoint shards: 29%|█████████████████████████████████ | 2/7 [00:18<00:45, 9.19s/it]
2023-10-06 17:09:12,662 | ERROR | stderr |
2023-10-06 17:09:12,662 | ERROR | stderr | Traceback (most recent call last):
2023-10-06 17:09:12,662 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/transformers/modeling_utils.py", line 484, in load_state_dict
2023-10-06 17:09:12,663 | ERROR | stderr | return torch.load(checkpoint_file, map_location=map_location)
2023-10-06 17:09:12,663 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/torch/serialization.py", line 815, in load
2023-10-06 17:09:12,663 | ERROR | stderr | return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
2023-10-06 17:09:12,664 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/torch/serialization.py", line 1018, in _legacy_load
2023-10-06 17:09:12,664 | ERROR | stderr | return legacy_load(f)
2023-10-06 17:09:12,664 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/torch/serialization.py", line 904, in legacy_load
2023-10-06 17:09:12,665 | ERROR | stderr | tar.extract('storages', path=tmpdir)
2023-10-06 17:09:12,665 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/tarfile.py", line 2091, in extract
2023-10-06 17:09:12,665 | ERROR | stderr | tarinfo = self.getmember(member)
2023-10-06 17:09:12,665 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/tarfile.py", line 1813, in getmember
2023-10-06 17:09:12,666 | ERROR | stderr | raise KeyError("filename %r not found" % name)
2023-10-06 17:09:12,666 | ERROR | stderr | KeyError: "filename 'storages' not found"
2023-10-06 17:09:12,666 | ERROR | stderr |
2023-10-06 17:09:12,666 | ERROR | stderr | The above exception was the direct cause of the following exception:
2023-10-06 17:09:12,666 | ERROR | stderr |
2023-10-06 17:09:12,666 | ERROR | stderr | Traceback (most recent call last):
2023-10-06 17:09:12,667 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/transformers/modeling_utils.py", line 495, in load_state_dict
2023-10-06 17:09:12,667 | ERROR | stderr | raise ValueError(
2023-10-06 17:09:12,667 | ERROR | stderr | ValueError: Unable to locate the file models/CodeLlama-34b-Instruct-hf/pytorch_model-00003-of-00007.bin which is necessary to load this pretrained model. Make sure you have saved the model properly.
2023-10-06 17:09:12,667 | ERROR | stderr |
2023-10-06 17:09:12,667 | ERROR | stderr | During handling of the above exception, another exception occurred:
2023-10-06 17:09:12,667 | ERROR | stderr |
2023-10-06 17:09:12,668 | ERROR | stderr | Traceback (most recent call last):
2023-10-06 17:09:12,668 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-10-06 17:09:12,668 | ERROR | stderr | return _run_code(code, main_globals, None,
2023-10-06 17:09:12,668 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/runpy.py", line 86, in _run_code
2023-10-06 17:09:12,668 | ERROR | stderr | exec(code, run_globals)
2023-10-06 17:09:12,668 | ERROR | stderr | File "/home/fastchat/serve/model_worker.py", line 467, in <module>
2023-10-06 17:09:12,669 | ERROR | stderr | worker = ModelWorker(
2023-10-06 17:09:12,669 | ERROR | stderr | File "/home/fastchat/serve/model_worker.py", line 210, in __init__
2023-10-06 17:09:12,669 | ERROR | stderr | self.model, self.tokenizer = load_model(
2023-10-06 17:09:12,669 | ERROR | stderr | File "/home/fastchat/model/model_adapter.py", line 264, in load_model
2023-10-06 17:09:12,669 | ERROR | stderr | model, tokenizer = adapter.load_model(model_path, kwargs)
2023-10-06 17:09:12,669 | ERROR | stderr | File "/home/fastchat/model/model_adapter.py", line 1280, in load_model
2023-10-06 17:09:12,670 | ERROR | stderr | model = AutoModelForCausalLM.from_pretrained(
2023-10-06 17:09:12,670 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
2023-10-06 17:09:12,670 | ERROR | stderr | return model_class.from_pretrained(
2023-10-06 17:09:12,670 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3307, in from_pretrained
2023-10-06 17:09:12,671 | ERROR | stderr | ) = cls._load_pretrained_model(
2023-10-06 17:09:12,671 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3681, in _load_pretrained_model
2023-10-06 17:09:12,672 | ERROR | stderr | state_dict = load_state_dict(shard_file)
2023-10-06 17:09:12,672 | ERROR | stderr | File "/opt/miniconda3/envs/lib/python3.10/site-packages/transformers/modeling_utils.py", line 500, in load_state_dict
2023-10-06 17:09:12,673 | ERROR | stderr | raise OSError(
2023-10-06 17:09:12,673 | ERROR | stderr | OSError: Unable to load weights from pytorch checkpoint file for 'models/CodeLlama-34b-Instruct-hf/pytorch_model-00003-of-00007.bin' at 'models/CodeLlama-34b-Instruct-hf/pytorch_model-00003-of-00007.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
envs:
transformers 4.34
accelerate 0.23.0
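
A possible first check, offered as a sketch rather than a confirmed diagnosis: the traceback shows torch.load falling back to its legacy tar loader, which it does when the file is not recognized as a zip archive, so shard 3 may be truncated or corrupted on disk. The helper below uses only the Python standard library; `inspect_shard` is a hypothetical name introduced here, not part of transformers or torch. It reports the file size, whether the file is a zip archive, whether it begins with NUL bytes, and its SHA-256, which can be compared against the checksum shown for that file on the model's file listing, assuming one is shown.

```python
import hashlib
import zipfile
from pathlib import Path


def inspect_shard(path, chunk_size=1 << 20):
    """Return basic integrity facts about one checkpoint shard file.

    Hypothetical helper for illustration; it only reads the file.
    """
    path = Path(path)
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        # Keep the first 512 bytes (one tar block) for the NUL check.
        head = f.read(512)
        sha256.update(head)
        # Hash the rest in chunks so a multi-GB shard is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    return {
        "size_bytes": path.stat().st_size,
        # torch.save has written zip archives by default since PyTorch 1.6.
        "is_zip": zipfile.is_zipfile(path),
        # A leading block of NUL bytes is not a valid archive header.
        "starts_with_zeros": len(head) > 0 and head.count(0) == len(head),
        "sha256": sha256.hexdigest(),
    }


if __name__ == "__main__":
    # Path copied from the traceback above; adjust to the local layout.
    shard = Path("models/CodeLlama-34b-Instruct-hf/pytorch_model-00003-of-00007.bin")
    if shard.exists():
        for key, value in inspect_shard(shard).items():
            print(f"{key}: {value}")
    else:
        print(f"{shard} not found")
```

If `is_zip` comes back False or the checksum differs from the published one, re-downloading that single shard would be the next thing to try.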