Category: sentence-transformers

This page has two scripts. When should one use the first method shown below versus the second? Since nli-distilroberta-base-v2 was trained specifically to produce sentence embeddings, won't it always be better than the first method? training_stsbenchmark.py:

    from sentence_transformers import SentenceTransformer, LoggingHandler, losses, models, util
    # You can specify any huggingface/transformers pre-trained model here, for example, bert-base-uncased, roberta-base, xlm-roberta-base ..

I used agglomerative clustering, fast clustering, and k-means from UKPLab/sentence_transformers to cluster my data by sentence similarity, but I'm not sure how to visualize the results with matplotlib. The following three functions show how I use the library to perform sentence clustering. Thanks!

    def fast_clus(sentences, batch_size=None, min_community_size=None, threshold=None):
        embed = model.encode(sentences, batch_size=batch_size, show_progress_bar=True, convert_to_tensor=True)
        clusters = util.community_detection(embed, min_community_size=min_community_size, threshold=threshold, ..
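
For plotting, the output of `util.community_detection` (a list of clusters, each a list of sentence indices) usually has to be turned into one label per sentence. A minimal sketch, assuming that output shape; `clusters_to_labels` is a hypothetical helper name, not part of sentence-transformers, and the example clusters are made up:

```python
def clusters_to_labels(clusters, num_sentences, noise_label=-1):
    """Convert a list of index lists into one cluster label per sentence.

    Sentences that belong to no cluster keep `noise_label`.
    """
    labels = [noise_label] * num_sentences
    for cluster_id, members in enumerate(clusters):
        for idx in members:
            labels[idx] = cluster_id
    return labels


# Made-up example: 6 sentences, two communities, sentence 5 unassigned.
example_clusters = [[0, 2, 4], [1, 3]]
labels = clusters_to_labels(example_clusters, num_sentences=6)
print(labels)  # [0, 1, 0, 1, 0, -1]
```

The resulting list can be passed as the `c=` argument of `matplotlib.pyplot.scatter` once the embeddings have been reduced to two dimensions, for example with PCA.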

I am using Hugging Face transformers models for quite a few tasks. They work well, but the only problem is the response time: it usually takes around 6-7 seconds to generate a result, and sometimes as long as 15-20 seconds. I tried Google Colab with a GPU, and performance on the GPU is very fast, within just ..

I am trying to create embeddings to use for a word-matching technique, but I get the following error:

    Traceback (most recent call last)
    /var/folders/k1/jt1nfyks4cx689d50f5mtg0w0000gp/T/ipykernel_1349/3490519318.py in <module>
         53 #Compute embedding for both lists
         54
    ---> 55 embeddings1 = model.encode(fifteen_percent_list, convert_to_tensor=True)
         56
         57
    /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py in encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings) ..

Is it possible to decode a sentence representation produced by SentenceTransformer back into a sentence? See this example from the documentation:

    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

    # Our sentences we like to encode
    sentences = ['This framework generates embeddings for each input sentence',
        'Sentences are passed as a list of string.',
        'The quick brown fox ..

I am running a sentence transformer model and trying to truncate my tokens, but it doesn't appear to be working. My code is:

    from transformers import AutoModel, AutoTokenizer

    model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"
    model = AutoModel.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    text_tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
    text_embedding = model(**text_tokens)["pooler_output"]

I keep getting the following warning: Token indices sequence ..
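
Conceptually, truncating a single sequence for a BERT-style model keeps the first `max_length - 2` content tokens and wraps them in the two special tokens. A minimal sketch in plain Python, not the tokenizer's actual implementation; the function name and the token ids are hypothetical placeholders:

```python
def truncate_with_special_tokens(token_ids, max_length, cls_id, sep_id):
    """Keep at most `max_length` ids in total, including [CLS] and [SEP]."""
    if max_length < 2:
        raise ValueError("max_length must leave room for both special tokens")
    kept = token_ids[: max_length - 2]
    return [cls_id] + kept + [sep_id]


# Placeholder ids: 101/102 stand in for [CLS]/[SEP]; 1000..1009 are 10 content tokens.
content_ids = list(range(1000, 1010))
truncated = truncate_with_special_tokens(content_ids, max_length=6, cls_id=101, sep_id=102)
print(truncated)  # [101, 1000, 1001, 1002, 1003, 102]
```

With a Hugging Face tokenizer, passing an explicit `max_length` together with `truncation=True` is the usual way to request this behaviour.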

I am clustering with HDBSCAN, using UMAP as a preprocessing step. My code:

    model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
    embeddings = model.encode(data.Text, show_progress_bar=True)
    umap_embeddings_fit = umap.UMAP(n_neighbors=15, n_components=5, metric='euclidean').fit(embeddings)

Using the code below, I predict the clusters:

    embeddings1 = model.encode(test_data.Text[0:200], show_progress_bar=True)
    umap_embeddings1 = umap_embeddings_fit.transform(embeddings1)
    test_labels, strengths = hdbscan.approximate_predict(modwl, umap_embeddings1)

Error:

    AttributeError: 'NNDescent' object has no attribute '_visited' ..

I am following this article to compute text similarity. The code I have is this:

    from sentence_transformers import SentenceTransformer
    from tqdm import tqdm
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np
    import pandas as pd

    documents = [
        "Vodafone Wins ₹ 20,000 Crore Tax Arbitration Case Against Government",
        "Voda Idea shares jump nearly 15% ..
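
For reference, the cosine similarity that `sklearn.metrics.pairwise.cosine_similarity` computes between rows can be written out by hand. A minimal sketch in plain Python, using made-up 3-dimensional vectors in place of real sentence embeddings; `most_similar` is a hypothetical helper, not part of any library:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, doc_vecs):
    """Return (index, score) of the document vector closest to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy vectors standing in for the output of model.encode(documents).
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
best_index, best_score = most_similar([2.0, 0.1, 0.0], doc_vecs)
print(best_index, round(best_score, 3))
```

Because cosine similarity ignores vector length, the query above matches the first document even though its magnitude is twice as large.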

I am running train.py from https://github.com/kingoflolz/mesh-transformer-jax, which uses JAX, Ray, and TPUs to train the model. However, it stops partway through and produces the following error:

    (pid=9454, ip=10.164.0.9) jax runtime initialization starting
    2021-07-09 12:11:59,794 ERROR worker.py:78 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::NetworkRunner.run() (pid=9454, ip=10.164.0.9)
      File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
      File "python/ray/_raylet.pyx", ..
