Instructions for using ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16", dtype="auto")
```
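Note that this repository is distributed in CTranslate2 format, so the generic Transformers snippets above assume standard Transformers weights are also available. A fuller direct-loading sketch along those lines (hypothetical for this repo; see the CTranslate2 instructions further down for the intended usage):

```python
# Hypothetical direct-loading sketch; only applies if the repo also ships
# standard Transformers weights alongside the CTranslate2 files.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto")

# Build a chat-formatted prompt and generate a short reply.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who are you?"}],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```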
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16 with vLLM:
Install from pip and serve the model:
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
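Besides curl, the server can be called from Python through any OpenAI-compatible client. A minimal sketch, assuming the `openai` package and the server started above (the API key is a placeholder, since a default vLLM deployment does not check it):

```python
# Minimal OpenAI-compatible client call against the local vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```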
- SGLang
How to use ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16 with SGLang:
Install from pip and serve the model:
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
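The same OpenAI-compatible client pattern shown for vLLM works against the SGLang server; only the port changes. A sketch, assuming the server started above and the `openai` package:

```python
# Same client pattern as the vLLM example, pointed at the SGLang port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```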
Use Docker images

```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16 with Docker Model Runner:
```sh
docker model run hf.co/ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16
```
```yaml
license: cc-by-sa-4.0
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- code
```
A capable language model for text-to-SQL generation for Postgres, Redshift, and Snowflake that is on par with the most capable generalist frontier models.
|  | |
## Model Description

- Developed by: Defog, Inc
- Model type: Text to SQL
- License: CC-BY-SA-4.0
- Finetuned from model: [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
## defog/llama-3-sqlcoder-8b for CTranslate2

**This model is a quantized version of [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) with int8_float16 quantization, for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).**
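For reference, a conversion like this one can be reproduced with CTranslate2's Transformers converter. This is a sketch under the assumption of a standard conversion; the output directory name is illustrative, and it is not necessarily the exact command used to produce this repository:

```python
# Sketch: convert the base model to CTranslate2 format with int8_float16
# weight quantization (requires `pip install ctranslate2 transformers torch`
# and access to the defog/llama-3-sqlcoder-8b weights).
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter("defog/llama-3-sqlcoder-8b")
converter.convert("llama-3-sqlcoder-8b-ct2", quantization="int8_float16")
```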
## How to use

First install CTranslate2:

```sh
pip install ctranslate2
```

This repository is intended for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).
### Use with CTranslate2
This example code is adapted from the [CTranslate2 Transformers guide](https://opennmt.net/CTranslate2/guides/transformers.html#mpt) and the [AutoTokenizer documentation](https://huggingface.co/docs/transformers/main_classes/tokenizer).
More detailed information about the `generate_batch` method can be found at [CTranslate2 Generator.generate_batch](https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch).
```python
import ctranslate2
import transformers
from huggingface_hub import snapshot_download

# Download the CTranslate2 model files from the Hub.
model_id = "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16"
model_path = snapshot_download(model_id)

# Pass device="cuda" to run on GPU; the default is CPU.
model = ctranslate2.Generator(model_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

prompt = """
CREATE TABLE stadium (
stadium_id number,
location text,
name text,
capacity number,
highest number,
lowest number,
average number
)
CREATE TABLE singer (
singer_id number,
name text,
country text,
song_name text,
song_release_year text,
age number,
is_male others
)
CREATE TABLE concert (
concert_id number,
concert_name text,
theme text,
stadium_id text,
year text
)
CREATE TABLE singer_in_concert (
concert_id number,
singer_id text
)
-- Using valid SQLite, answer the following question for the tables provided above.
-- What is the maximum, the average, and the minimum capacity of stadiums? (Generate 1 SQL query. No explanation needed.)
answer:
"""

messages = [
    {"role": "system", "content": "You are a SQL expert. Given an input question and schema, answer with the correct SQL query."},
    {"role": "user", "content": prompt},
]

# Render the chat template to plain text; the template already inserts the
# special tokens (<|begin_of_text|>, role headers, <|eot_id|>).
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop decoding on either the EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# add_special_tokens=False avoids inserting a second <|begin_of_text|>,
# since the chat template above already added one.
input_tokens = tokenizer.convert_ids_to_tokens(
    tokenizer.encode(prompt_text, add_special_tokens=False)
)

results = model.generate_batch(
    [input_tokens],
    include_prompt_in_result=False,
    max_length=256,
    sampling_temperature=0.6,
    sampling_topp=0.9,
    end_token=terminators,
)
output = tokenizer.decode(results[0].sequences_ids[0])
print(output)
```
## Ideal prompt and inference parameters

For best results, set the temperature to 0 and do not sample; see the greedy variant below.
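As a concrete sketch of that advice, the `generate_batch` call from the example above can simply drop the sampling parameters; CTranslate2's default `sampling_topk=1` then always picks the most likely token, which is equivalent to temperature-0 greedy decoding:

```python
# Greedy decoding: omitting sampling_temperature/sampling_topp leaves
# CTranslate2 at its default sampling_topk=1 (always take the argmax token).
results = model.generate_batch(
    [input_tokens],
    include_prompt_in_result=False,
    max_length=256,
    end_token=terminators,
)
```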
## Evaluation

This model was evaluated on SQL-Eval, a PostgreSQL-based evaluation framework developed by Defog for testing and aligning model capabilities.
You can read more about the methodology behind SQL-Eval [here](https://defog.ai/blog/open-sourcing-sqleval/).