Instructions for using bigscience/bloom with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use bigscience/bloom with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bigscience/bloom")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom")
```
(A short example of actually calling the pipeline appears after the app list below.)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigscience/bloom with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "bigscience/bloom"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```bash
docker model run hf.co/bigscience/bloom
```
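Once the server is running, you can also call it from Python instead of curl. A minimal sketch using the `openai` client package (the package, the `"EMPTY"` API key, and the sampling parameters are assumptions for illustration, not part of the page above):

```python
# Minimal sketch: assumes `pip install openai`. Targets the local vLLM
# server started above, which exposes an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="bigscience/bloom",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```

The same client works against the SGLang server below by changing `base_url` to port 30000.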
- SGLang
How to use bigscience/bloom with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "bigscience/bloom" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "bigscience/bloom" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use bigscience/bloom with Docker Model Runner:
```bash
docker model run hf.co/bigscience/bloom
```
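As referenced in the Transformers section above, here is a minimal sketch of generating text with the pipeline; the prompt and the generation parameters (`max_new_tokens`, `do_sample`) are illustrative assumptions, not part of the page above:

```python
from transformers import pipeline

# Build the text-generation pipeline as shown above. Note that the full
# 176B-parameter bigscience/bloom needs substantial hardware; smaller
# variants such as bigscience/bloom-560m expose the same API.
pipe = pipeline("text-generation", model="bigscience/bloom")

# Illustrative call: prompt and sampling parameters are assumptions.
outputs = pipe("Once upon a time,", max_new_tokens=50, do_sample=True)
print(outputs[0]["generated_text"])
```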
base_model_prefix = "transformer"
Hello, why doesn't the module naming in the Bloom and Bloomz models adhere to the one defined by the BloomPreTrainedModel class, base_model_prefix = "transformer"?
The issue is that in TGI, whose Bloom modeling code has been adapted to these checkpoints, models trained with Transformers do not work, because TGI looks for module names without the "transformer" prefix.
FYI, here is the related issue description https://github.com/huggingface/text-generation-inference/issues/541#issuecomment-1740913948
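To make the naming mismatch concrete, a minimal sketch; as an assumption for illustration, bigscience/bloom-560m is used as a lightweight stand-in for the full model (same architecture and module naming):

```python
from transformers import AutoModelForCausalLM

# Small stand-in for bigscience/bloom.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Weights written back out by save_pretrained() carry the
# base_model_prefix, e.g. "transformer.word_embeddings.weight",
# whereas (per the discussion above) the original foundation
# checkpoint stores un-prefixed names like "word_embeddings.weight".
print(next(iter(model.state_dict())))
```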
Well, those foundation models work.
If loading the model and saving it back in Transformers changes it, that's an issue IMO.
We can make something for TGI, but this feels like legacy support. Would you agree?
Hi Narsil,
I agree that the issue seems to be related more to the naming of modules in the foundation models than to a TGI problem. What I find strange is that the prefix planned in the code is "transformer.[PyTorch module name]", but in the foundation model this prefix is absent.
If I refer to the BERT model, for example, the module names carry the prefix "bert.[etc]", as stipulated in the code: base_model_prefix = "bert".
Indeed, allowing flexibility in TGI to let the user define the prefix would be a more robust solution.
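For reference, the prefix each architecture expects can be read directly off its pretrained-model class; a quick sketch:

```python
from transformers import BertPreTrainedModel, BloomPreTrainedModel

# Each architecture declares the prefix its base model is mounted under.
print(BloomPreTrainedModel.base_model_prefix)  # "transformer"
print(BertPreTrainedModel.base_model_prefix)   # "bert"
```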