Instructions to use techcodebhavesh/AutoDashAnalyticsV1GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with Transformers:

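# Load model directly (GGUF repo: gguf_file selects the weights file; this
# assumes Transformers' GGUF loader supports the model's architecture)
from transformers import AutoModel
model = AutoModel.from_pretrained("techcodebhavesh/AutoDashAnalyticsV1GGUF", gguf_file="AutoDashv1.F16.gguf", dtype="auto")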

llama-cpp-python

How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="techcodebhavesh/AutoDashAnalyticsV1GGUF",
	filename="AutoDashv1.F16.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16
# Run inference directly in the terminal:
./llama-cli -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf techcodebhavesh/AutoDashAnalyticsV1GGUF:F16

Use Docker

docker model run hf.co/techcodebhavesh/AutoDashAnalyticsV1GGUF:F16

Ollama
How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with Ollama:
```
ollama run hf.co/techcodebhavesh/AutoDashAnalyticsV1GGUF:F16
```

Unsloth Studio

How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for techcodebhavesh/AutoDashAnalyticsV1GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for techcodebhavesh/AutoDashAnalyticsV1GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for techcodebhavesh/AutoDashAnalyticsV1GGUF to start chatting

Docker Model Runner
How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with Docker Model Runner:
```
docker model run hf.co/techcodebhavesh/AutoDashAnalyticsV1GGUF:F16
```

Lemonade

How to use techcodebhavesh/AutoDashAnalyticsV1GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull techcodebhavesh/AutoDashAnalyticsV1GGUF:F16

Run and chat with the model

lemonade run user.AutoDashAnalyticsV1GGUF-F16

List all available models

lemonade list

Any plan for Q8

by bhupesh-sf - opened Aug 10, 2024

Discussion

bhupesh-sf

Aug 10, 2024

For better performance, I think it would be good to offer the model in more precisions. Let me know your thoughts on this.

techcodebhavesh

Owner Aug 10, 2024

I believe increasing the model's precision could indeed lead to better performance, especially in tasks requiring high accuracy. However, it's important to consider how this might affect other aspects, such as processing speed and resource usage. Striking a balance between precision and efficiency will be key. I'm interested to hear what strategies or techniques others have used to achieve this.

bhupesh-sf

Aug 12, 2024

Hey, it's true that it leads to higher performance, but also to higher RAM usage, as it needs more memory to think through. But there are many kinds of users out there, some with lower-end configurations and some with higher-end ones. In my case, I have 32 GB of RAM on a Mac M2 Pro, so for me it's quite easy to run Q8, but there might be some users who have a lower-end config and hence might need only Q4. So generally people provide multiple quantizations of a model so that users with different configurations can choose the right one for them.

BTW, I came across this model while searching for something else, but it felt like quite an interesting idea. I have yet to explore it, but thanks for the good work.
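
To make the RAM trade-off mentioned above concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the standard GGUF block layouts, where Q8_0 and Q4_0 store 32 weights per block plus one 16-bit scale (about 8.5 and 4.5 bits per weight). The parameter count is a hypothetical placeholder, not a figure from this model's card, and the estimate covers the weights only, ignoring the KV cache and runtime buffers.

```python
# Approximate bits per weight for a few common GGUF formats.
# Q8_0 and Q4_0 pack 32 weights per block with one 16-bit scale,
# which adds 0.5 bits per weight on top of the 8 or 4 payload bits.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def estimate_weight_gib(n_params: float, quant: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 2**30


# Hypothetical parameter count, chosen only for illustration;
# replace it with the real value for the model you plan to run.
N_PARAMS = 7e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_weight_gib(N_PARAMS, quant):.1f} GiB")
```

Under these assumptions, Q8_0 needs roughly half the memory of the F16 file, and Q4_0 roughly a quarter, which is why publishing several quantizations lets users pick one that fits their hardware.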

techcodebhavesh

Owner Aug 12, 2024

Thanks for sharing your thoughts! I completely agree that different quantization options are important to cater to a variety of hardware configurations. Offering both Q8 and Q4 versions makes a lot of sense.
This model was actually developed with a specific problem statement in mind, which guided its design and optimization. We're planning to push the Q8 version soon to accommodate higher-performance needs, while still considering users with different configurations. Thanks for your feedback!!!
