Any plans for Q8?
For better performance, I think it would be good to offer the model at more precisions. Let me know your thoughts about this.
I believe increasing the model's precision could indeed lead to better performance, especially in tasks requiring high accuracy. However, it's important to consider how this might affect other aspects, such as processing speed and resource usage. Striking a balance between precision and efficiency will be key. I'm interested to hear what strategies or techniques others have used to achieve this.
Hey, it's true that higher precision gives better quality, but it also means higher RAM usage, since the larger weights all have to fit in memory. There are many kinds of users out there, some with lower-end hardware and some with higher-end. In my case, I have 32 GB of RAM on a Mac M2 Pro, so it's quite easy for me to run Q8, but some users have a lower-end configuration and might only be able to run Q4. That's why people generally publish multiple quantizations of a model, so users with different configurations can choose the right one for them.
BTW, I came across this model while searching for something else, and it seemed like quite an interesting idea. I have yet to explore it, but thanks for the good work.
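The RAM trade-off described above can be made concrete with some rough arithmetic. The sketch below estimates the memory needed for the weights alone at F16, Q8_0, and Q4_0, using the bits-per-weight figures that follow from the GGUF block layouts (Q8_0 and Q4_0 each store one 16-bit scale per 32 weights). The parameter count is a hypothetical placeholder, since the thread does not state the size of this model, and the function name is made up for illustration. Real usage will be somewhat higher because of the KV cache and runtime overhead.

```python
# Approximate bits stored per weight for each format.
# Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights = 8.5 bits.
# Q4_0: 32 4-bit weights + one fp16 scale = 18 bytes per 32 weights = 4.5 bits.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def weight_memory_gib(n_params: int, quant: str) -> float:
    """Estimate the memory (in GiB) taken by the weights alone."""
    total_bytes = n_params * BITS_PER_WEIGHT[quant] / 8
    return total_bytes / 2**30


# Hypothetical parameter count, chosen only for illustration.
HYPOTHETICAL_PARAMS = 7_000_000_000

for quant in BITS_PER_WEIGHT:
    gib = weight_memory_gib(HYPOTHETICAL_PARAMS, quant)
    print(f"{quant}: ~{gib:.1f} GiB for weights")
```

Comparing the three figures against the RAM actually available on a machine gives a quick first guess at which quantization is worth downloading.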
Thanks for sharing your thoughts! I completely agree that different quantization options are important to cater to a variety of hardware configurations. Offering both Q8 and Q4 versions makes a lot of sense.
This model was actually developed with a specific problem statement in mind, which guided its design and optimization. We're planning to push the Q8 version soon for users with higher-end hardware, while still keeping other configurations in mind. Thanks for your feedback!