Instructions to use LanguageBind/LanguageBind_Audio_FT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LanguageBind/LanguageBind_Audio_FT with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="LanguageBind/LanguageBind_Audio_FT")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoModelForZeroShotImageClassification

model = AutoModelForZeroShotImageClassification.from_pretrained("LanguageBind/LanguageBind_Audio_FT", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
Commit 8a8faf2 (parent: 7978be1)
Update config.json
Files changed: config.json (+2 -2)
config.json (changed)

```diff
@@ -91,12 +91,12 @@
   "audio_sample_rate": 16000,
   "audio_mean": -4.2677393,
   "audio_std": 4.5689974,
-  "lora_r":
+  "lora_r": 0,
   "lora_alpha": 16,
   "lora_dropout": 0.0,
   "add_time_attn": false,
   "num_frames": 1,
-  "num_mel_bins":
+  "num_mel_bins": 112,
   "target_length": 1036,
   "add_cross_attention": false,
   "architectures": null,
```
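The two values set in this commit, together with the neighbouring audio fields, fix the shape of the model's audio input. The sketch below shows how those fields could fit together, assuming the model consumes a filterbank of `num_mel_bins` channels that is zero-padded or cropped to `target_length` frames and then standardized with `audio_mean` and `audio_std`. That preprocessing scheme is an assumption, not something this page confirms. The helper names `pad_or_crop`, `normalize` and `prepare_features` are hypothetical. Reading `lora_r: 0` as "no LoRA adapters are added" is likewise an assumption.

```python
import json

# Subset of config.json after commit 8a8faf2 (values copied from the config.json diff).
CONFIG = json.loads("""
{
  "audio_sample_rate": 16000,
  "audio_mean": -4.2677393,
  "audio_std": 4.5689974,
  "lora_r": 0,
  "num_frames": 1,
  "num_mel_bins": 112,
  "target_length": 1036
}
""")


def pad_or_crop(frames, target_length, n_mels):
    """Hypothetical helper: zero-pad or truncate mel frames along the time axis."""
    if len(frames) >= target_length:
        return frames[:target_length]
    padding = [[0.0] * n_mels for _ in range(target_length - len(frames))]
    return frames + padding


def normalize(frames, mean, std):
    """Hypothetical helper: standardize every value with dataset-level mean/std (assumed scheme)."""
    return [[(v - mean) / std for v in frame] for frame in frames]


def prepare_features(frames, cfg):
    """Hypothetical helper: check the mel-bin count, fix the length, then normalize."""
    n_mels = cfg["num_mel_bins"]
    if any(len(f) != n_mels for f in frames):
        raise ValueError(f"expected {n_mels} mel bins per frame")
    fixed = pad_or_crop(frames, cfg["target_length"], n_mels)
    return normalize(fixed, cfg["audio_mean"], cfg["audio_std"])


# Usage: 500 frames of made-up features, each with 112 mel bins.
feats = prepare_features([[0.0] * 112 for _ in range(500)], CONFIG)
print(len(feats), len(feats[0]))  # 1036 112

# Assumption: a LoRA rank of 0 means no low-rank adapters are attached.
use_lora = CONFIG["lora_r"] > 0
print(use_lora)  # False
```

Padding before normalization means padded frames end up at `-audio_mean / audio_std` rather than zero; whether the real processor pads before or after normalizing is not stated here.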