[FEEDBACK] Inference Providers
Any inference provider you love, and that you'd like to be able to access directly from the Hub?
Love that I can call DeepSeek R1 directly from the Hub
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)
Is it possible to set a monthly payment budget or rate limits for all the external providers? I don't see such options in the billing tab. In case a key or session token is stolen, it could be quite dangerous for my thin wallet :(
@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future
Thanks for your quick reply, good to know!
Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...
Could be good to add featherless.ai
TitanML !!
Hi HuggingFace Team,
We would like to register Mokzu as an inference provider.
Provider Details:
- Organization: Mokzu (https://huggingface.co/mokzu)
- Website: https://mokzu.com
We have submitted PRs to both huggingface_hub and huggingface.js repositories:
- Python PR: https://github.com/huggingface/huggingface_hub/pull/3686
- JavaScript PR: https://github.com/huggingface/huggingface.js/pull/1920
Could you please enable the Model Mapping API for our organization and provide guidance on registering our provider?
Thank you!
Hi HuggingFace team!
We'd like to register Latitude.sh as an inference provider.
About Latitude:
- Recently acquired by Megaport (ASX: MP1), a global Network-as-a-Service leader
- Operating 10,000+ physical servers and 1,000+ GPUs globally
- Combined platform spanning 1,000+ data centers in 26 countries
- Tier-3 data centers with 99.99% SLA
Technical Implementation:
- OpenAI-compatible API at https://api.lsh.ai
- Billing endpoint implemented (POST /api/billing/costs returning nano-USD)
- Inference-Id header in all responses (regular + streaming)
- Full support: tool calling, structured output (JSON mode), vision/multimodal, streaming
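On the billing bullet above: the endpoint path and nano-USD unit come from this post, but the request/response JSON shapes are not public, so the field names in this sketch are assumptions rather than the official contract:

```python
# Hedged sketch of consuming a nano-USD cost report like the one a
# POST /api/billing/costs endpoint might return. Field names
# ("requests", "requestId", "costNanoUsd") are illustrative assumptions.
import json

def parse_billing_response(body: str) -> dict:
    """Map each request ID to its cost in plain USD."""
    report = json.loads(body)
    return {
        item["requestId"]: item["costNanoUsd"] / 1e9  # nano-USD -> USD
        for item in report["requests"]
    }

# Example response body a provider might return:
example = json.dumps({
    "requests": [
        {"requestId": "abc-123", "costNanoUsd": 4_800_000},
        {"requestId": "def-456", "costNanoUsd": 14_400_000},
    ]
})
print(parse_billing_response(example))
```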
Submitted PRs:
- JS Client: https://github.com/huggingface/huggingface.js/pull/1927
- Python Client: https://github.com/huggingface/huggingface_hub/pull/3715
- Documentation: https://github.com/huggingface/hub-docs/pull/2180
HuggingFace Organization: https://huggingface.co/latitude-sh
Logo Assets:
- Light: https://logos-swart.vercel.app/latitude-light.png
- Dark: https://logos-swart.vercel.app/latitude-dark.png
Models Ready for Mapping:
| HF Model | Provider Model | Task |
|---|---|---|
| Qwen/Qwen2.5-7B-Instruct | qwen-2.5-7b | conversational |
| meta-llama/Llama-3.1-8B-Instruct | llama-3.1-8b | conversational |
| Qwen/Qwen2.5-VL-7B-Instruct | qwen-2.5-vl-7b | conversational |
| google/gemma-2-27b-it | gemma-2-27b | conversational |
| Qwen/Qwen3-32B | qwen3-32b | conversational |
| Qwen/Qwen2.5-Coder-32B-Instruct | qwen2.5-coder-32b | conversational |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | deepseek-r1-distill-14b | conversational |
Please let us know if you need any additional information or if there are next steps we should follow.
Thank you!
Hey, I've been using Inference Providers from the CLI and kept running into the discoverability problem others mention here, so I ended up building a small Rust CLI/TUI for it: https://github.com/jadnohra/hf-providers (installable via Homebrew tap). I know it's not a full solution, but I'm happy to have eyes on it in case anyone finds it useful.
Hello, HF team / HF community
We're The Compute Factory, a multicloud AI inference platform from France. We'd like to register as an Inference Provider.
Looking at the current landscape of HF Inference Providers, we see that text-to-image, and especially text-to-video, are significantly underserved; only a handful of providers cover these tasks today. This is a gap we specifically intend to fill. Our GPU (and other accelerator) infrastructure is well suited to the heavy compute these models demand, and we plan to aggressively expand the catalog of HF-hosted models for image and video generation alongside LLMs.
Our plan is to start with openai/gpt-oss-20B for chat completion and Lightricks/LTX-2-19B for text-to-video generation as our first integrations; this lets us establish the end-to-end HF provider pipeline (API, billing, model mapping, PRs). From there, we'll rapidly deploy dozens of additional models across chat completion, text-to-image, and text-to-video.
We're technically ready and can start implementing the HF Inference Provider specs immediately.
HF org: https://huggingface.co/thecomputefactory
PR: https://github.com/huggingface/huggingface.js/pull/1988
We would love any guidance on next steps. Happy to jump on a call.
Best,
The Compute Factory team
chouki@thecomputefactory.com
Hello,
We are a small datacenter company in Germany and have some questions about becoming an inference partner.
Our questions:
- Can we offer an OCR model as an inference partner?
- How are scaling limits regulated, and can they be adjusted? We cannot serve an unlimited number of users.
- Is it acceptable to start with only one model as a partner initially?
Thanks in advance and best regards
Hello HuggingFace Team,
We would like to add TextCLF as an inference provider.
Provider Details:
Organization: TextCLF (https://huggingface.co/textclf-ai)
Website: https://textclf.com
We have submitted PRs to both huggingface_hub and huggingface.js repositories:
Python PR: https://github.com/huggingface/huggingface_hub/pull/3895
JavaScript PR: https://github.com/huggingface/huggingface.js/pull/2022
Could you please enable the Model Mapping API for our organization and provide guidance on registering our provider?
Email: contact@textclf.com
Thanks,
TextCLF
Hi HuggingFace team,
We're the team behind Phosor, a GPU inference platform specializing in AI video generation. We'd like to express our interest in becoming a HuggingFace Inference Provider.
About Phosor
Phosor provides high-performance inference for the Wan model family in production, with full support for both Wan 2.1 and Wan 2.2, covering:
- Text-to-Video (T2V) and Image-to-Video (I2V) at 14B scale
- LoRA support – users can upload and apply custom LoRA weights at inference time
- Multi-resolution output – 480p, 720p, and 1080p with configurable frame counts
- On-demand model loading – our service can download and run models on demand
We run dedicated GPU clusters with Ray-based job scheduling, and our RESTful API (/api/v1/inference/submit, /api/v1/inference/status/{request_id}, /api/v1/inference/result/{request_id}) follows established async patterns. Our infrastructure is designed for high availability with automatic failover, ensuring stable and reliable service for production workloads.
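The submit/status/result flow above is a standard async polling pattern. A minimal client-side sketch follows; the endpoint paths come from this post, but the status values ("queued", "completed", "failed") and transport are illustrative assumptions, stubbed here instead of making real HTTP calls:

```python
# Minimal sketch of a client for the submit/status/result polling
# pattern. get_status / get_result stand in for HTTP GETs against
# /api/v1/inference/status/{request_id} and
# /api/v1/inference/result/{request_id}; their return values are assumed.
import time

def wait_for_result(request_id, get_status, get_result,
                    poll_interval=1.0, timeout=600.0):
    """Poll until the job completes, then fetch its result."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(request_id)
        if status == "completed":
            return get_result(request_id)
        if status == "failed":
            raise RuntimeError(f"inference {request_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"inference {request_id} timed out")

# Usage with stubbed transport functions standing in for real HTTP calls:
statuses = iter(["queued", "running", "completed"])
result = wait_for_result(
    "req-1",
    get_status=lambda _id: next(statuses),
    get_result=lambda _id: {"video_url": "https://example.com/out.mp4"},
    poll_interval=0.0,
)
print(result["video_url"])
```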
Pricing
We use a pay-per-use model, priced per frame:
| Resolution | Per Frame | Per Second (24fps) |
|---|---|---|
| 480p | $0.0002 | $0.0048 |
| 720p | $0.0006 | $0.0144 |
| 1080p | $0.0012 | $0.0288 |
LoRA Training: $0.003 per training step (~$1.20 for a typical ~400-step training run)
Lowest price: $0.0002/frame (480p) – significantly lower than comparable providers in the market.
Content Policy
- NSFW content generation: No
- Celebrity / public figure generation: No
Why We Think This Is a Good Fit
The Wan model family has a strong presence on the HuggingFace Hub, but there is currently no dedicated video generation inference provider. We've already invested significant engineering effort into optimizing these models for production – including ComfyUI-based pipelines, LoRA hot-loading, and efficient queue management. Our pricing is among the lowest in the industry – starting from just $0.0002 per frame at 480p, making AI video generation accessible to a much wider audience. Additionally, our platform is built for production-grade stability, with dedicated GPU clusters, automatic failover, and consistent uptime, ensuring reliable inference at scale. Integrating as a provider would give HuggingFace users one-click access to affordable, stable video generation directly from Wan model pages.
Integration Timeline
We're ready to start immediately – ideally within 1–2 weeks. We'd appreciate guidance on:
- Technical requirements for API adaptation (endpoint specs, auth flow, billing integration)
- Documentation or reference implementations for the provider onboarding process
- Timeline expectations for review and approval
Happy to align our API with whatever specification HuggingFace requires.
Contact
- Website: https://phosor.ai
- Email: hello@aoraki-labs.io
Best,
Aoraki Labs Team
Hi team,
I'm Ben, one of the founding engineers at relaxAI. We're a sovereign AI inference provider hosted on Civo infrastructure, the UK's leading sovereign cloud platform. We'd like to express our interest in becoming a HuggingFace Inference Provider.
About relaxAI
relaxAI provides high-performance, OpenAI-compatible inference for leading open source LLMs. We're currently serving models that are popular across your existing providers, including:
- gpt-oss-120b
- Kimi-K2.5
- Llama 4 Maverick
Our infrastructure runs on Blackwell GPUs through our NVIDIA partnership, and our throughput, latency, and token pricing are very competitive.
What Sets Us Apart
What makes us different from your current provider lineup is that we're fully UK sovereign – 100% UK data residency, processing, and legal jurisdiction. Sovereign AI is a growing priority for UK and European enterprises, particularly in regulated industries, and as far as we can tell there isn't currently a UK-domiciled provider represented in Inference Providers. We think that's a meaningful gap we could help fill.
Integration Readiness
We already have a Hub organisation at huggingface.co/relaxai and we're ready to get started immediately – JS client PR, model mappings, billing endpoint, the lot. Our API is fully OpenAI-compatible, so we'd expect the integration to be straightforward. Happy to provide API access for testing whenever suits.
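Since the API is OpenAI-compatible, client requests would take the standard chat-completions shape. A small sketch, using a model name from the list above; only the generic OpenAI request format is shown, with no relaxAI-specific endpoint or auth details assumed:

```python
# Sketch of the standard OpenAI-style chat.completions request body
# that an OpenAI-compatible provider would accept.
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 500) -> bytes:
    """Serialize a chat.completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")

body = build_chat_request("gpt-oss-120b", "What is the capital of France?")
print(json.loads(body)["model"])
```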
We'd appreciate guidance on:
- Timeline expectations for review and approval
- Any provider-specific requirements beyond the standard onboarding docs
Contact
- Website: https://relax.ai
- API Docs: https://relax.ai/docs
- Email: ben@relax.ai
Best,
Ben
Founding Engineer, relaxAI
