How does this model count the token size?
#10
by WeiZhenKun - opened
How does this model count the token size?
Is there a fixed proportional relationship between the token count and the number of characters?
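This model uses the BERT (WordPiece) tokenizer. As an approximate rule of thumb, English text has roughly 0.75 words per token; the tokenizer splits text into words and subword pieces, so there is no fixed ratio between tokens and characters. For a precise count, please load the tokenizer and run it on your data of interest.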
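To make "load the tokenizer and run it on your data" concrete, here is a minimal sketch, assuming the `transformers` package is installed and the Hub is reachable. The helper names (`load_e5_tokenizer`, `count_tokens`, `token_stats`) are hypothetical and chosen only for illustration; the ratios you get will depend on your own text.

```python
def load_e5_tokenizer(model_id="intfloat/e5-base-v2"):
    """Load the model's tokenizer from the Hugging Face Hub.

    Requires `transformers` and network access (or a local cache).
    The import is lazy so the counting helpers below work with any
    tokenizer-like callable.
    """
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained(model_id)


def count_tokens(tokenizer, text, add_special_tokens=True):
    """Return how many token ids the tokenizer produces for `text`.

    With add_special_tokens=True the count includes the special tokens
    the tokenizer adds around the input (e.g. [CLS] and [SEP] for BERT).
    """
    encoded = tokenizer(text, add_special_tokens=add_special_tokens)
    return len(encoded["input_ids"])


def token_stats(tokenizer, texts):
    """Measure words-per-token and characters-per-token on a sample.

    Special tokens are excluded so the ratios reflect the text itself.
    Words are approximated by whitespace splitting.
    """
    texts = list(texts)
    n_tokens = sum(count_tokens(tokenizer, t, add_special_tokens=False) for t in texts)
    if n_tokens == 0:
        raise ValueError("sample produced no tokens")
    n_words = sum(len(t.split()) for t in texts)
    n_chars = sum(len(t) for t in texts)
    return {
        "tokens": n_tokens,
        "words_per_token": n_words / n_tokens,
        "chars_per_token": n_chars / n_tokens,
    }
```

For example, `token_stats(load_e5_tokenizer(), my_texts)` on a representative sample of your own data (`my_texts` being any list of strings) gives the ratios for that data, which is more reliable than the rule of thumb.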
intfloat changed discussion status to closed