Instructions to use TaylorAI/gte-tiny with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use TaylorAI/gte-tiny with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("TaylorAI/gte-tiny")
sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
- Transformers
How to use TaylorAI/gte-tiny with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("TaylorAI/gte-tiny")
model = AutoModel.from_pretrained("TaylorAI/gte-tiny")
Any guidance on how to use this with sentence-transformers without downloading a bunch of extra stuff?
This is a really cool model!
I'm using this with SentenceTransformers and it downloaded 314M of files. The big ones were:
model.safetensors 43M
pytorch_model.bin 43M
onnx/model.onnx 86M
onnx/model_optimized.onnx 86M
onnx/model_quantized.onnx 22M
Is there a way to use this with SentenceTransformers that only downloads the model file that I need?
Hi @simonw,
You can use huggingface_hub to snapshot only the files you need, then pass the resulting folder to the SentenceTransformer constructor:
from sentence_transformers import SentenceTransformer
from huggingface_hub import snapshot_download
sentences = ["This is an example sentence", "Each sentence is converted"]
model_path = snapshot_download(
    repo_id="TaylorAI/gte-tiny",
    allow_patterns=["*.json", "pytorch_model.bin"],
)
model = SentenceTransformer(model_path)
embeddings = model.encode(sentences)
print(embeddings)
I'd also recommend using model.safetensors instead of pytorch_model.bin:
from sentence_transformers import SentenceTransformer
from huggingface_hub import snapshot_download
sentences = ["This is an example sentence", "Each sentence is converted"]
model_path = snapshot_download(
    repo_id="TaylorAI/gte-tiny",
    allow_patterns=["*.json", "model.safetensors"],
)
model = SentenceTransformer(model_path)
embeddings = model.encode(sentences)
print(embeddings)
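To see roughly what those allow_patterns select, here is a small offline sketch. It assumes the patterns are matched as fnmatch-style globs against the file paths in the repo (my understanding of huggingface_hub, not something confirmed in this thread). The file names and sizes are the ones reported in the question above; select_files is a hypothetical helper, not part of any library.

```python
from fnmatch import fnmatch

# Large files and their sizes in MB, as reported earlier in this thread.
REPORTED_FILES = {
    "model.safetensors": 43,
    "pytorch_model.bin": 43,
    "onnx/model.onnx": 86,
    "onnx/model_optimized.onnx": 86,
    "onnx/model_quantized.onnx": 22,
}


def select_files(filenames, allow_patterns):
    """Hypothetical helper: keep names matching at least one glob pattern."""
    return [
        name for name in filenames
        if any(fnmatch(name, pattern) for pattern in allow_patterns)
    ]


allow_patterns = ["*.json", "model.safetensors"]
selected = select_files(REPORTED_FILES, allow_patterns)
selected_mb = sum(REPORTED_FILES[name] for name in selected)
total_mb = sum(REPORTED_FILES.values())

print(selected)
print(f"{selected_mb} MB selected out of {total_mb} MB of large files")
```

With fnmatch, `*` also matches `/`, so a pattern like "*.json" would pick up JSON files in subfolders as well as at the top level.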
I can also release another minimal version without the ONNX weights for SentenceTransformers.
txtai has built-in logic for mean/cls pooling using Transformers, which only downloads the files it needs.
For example:
import txtai
embeddings = txtai.Embeddings(path="TaylorAI/gte-tiny")
embeddings.batchtransform(["text1", "text2"])
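For anyone unfamiliar with the two pooling modes mentioned above, here is a toy sketch of what mean and CLS pooling compute in general. This is not txtai's actual code; mean_pool, cls_pool, and the numbers are made up for illustration, and plain Python lists stand in for tensors.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cls_pool(token_vectors):
    """Use the first token's vector (the [CLS] position in BERT-style models)."""
    return list(token_vectors[0])


# Toy values: three tokens with 2-dimensional vectors; the last one is padding.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

mean_vector = mean_pool(token_vectors, attention_mask)
cls_vector = cls_pool(token_vectors)
print(mean_vector)
print(cls_vector)
```

The padding token is deliberately given large values so it is easy to see that the mask keeps it out of the mean.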
