I have a list named questions: >>> questions=['Where do you live?', 'What is my favourite color?', 'What is your age', 'Do you like coding?'] and a list named answers: >>> answers=['I live in India', 'my favourite color is orange', 'my age is 16', 'I love coding']. What I need to do is make a function ..
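One possible approach (a sketch only; the matching rule here is an assumption, since the question is truncated): pick the stored question with the highest word overlap with the user's input and return the answer at the same index.

```python
def get_answer(user_question, questions, answers):
    """Return the answer whose paired question shares the most words with user_question."""
    def overlap(a, b):
        # crude similarity: count of shared lowercase tokens
        return len(set(a.lower().split()) & set(b.lower().split()))

    best = max(range(len(questions)), key=lambda i: overlap(user_question, questions[i]))
    return answers[best]

questions = ['Where do you live?', 'What is my favourite color?',
             'What is your age', 'Do you like coding?']
answers = ['I live in India', 'my favourite color is orange',
           'my age is 16', 'I love coding']

print(get_answer('what is your age?', questions, answers))  # my age is 16
```

Word overlap is the simplest possible matcher; anything smarter (TF-IDF, embeddings) slots into the same index-pairing structure.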
I have already visited almost every post related to this, but most of them calculate the probability on the basis of similar words. Is there any way of getting the probability that two statements are the same in meaning but contain different words? E.g. "Mark zuckerberg owns the facebook company" and "Facebook company’s ceo is ..
So I'm relatively new to Python, and I'm trying to clean and summarize a '.txt' file, and keep getting this warning after trying to create a similarity matrix: /usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in double_scalars after removing the cwd from sys.path. This is the similarity matrix code: similarity_matrix = np.zeros((len(cleaned_sentences), len(cleaned_sentences))) for i in range(0, len(cleaned_sentences)): for ..
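That RuntimeWarning usually means a division produced nan, e.g. a cosine similarity dividing by a zero norm when a cleaned sentence ends up with no tokens. A guarded similarity function (a sketch, assuming cosine similarity is what the nested loop computes) avoids it:

```python
import numpy as np

def safe_cosine(u, v):
    """Cosine similarity that returns 0.0 instead of nan when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        # an empty/zero sentence vector has no direction; treat it as dissimilar
        return 0.0
    return float(np.dot(u, v) / (nu * nv))

print(safe_cosine(np.array([1.0, 0.0]), np.array([0.0, 0.0])))  # 0.0, no warning
```

It is also worth checking the cleaning step itself: sentences that become empty after stopword removal are the usual source of those zero vectors.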
I’m having problems with "I’m", since I would like the bot to respond to "I’m Sara" with a sentence different from the one replying to "I’m happy". Does anyone know how to fix it please? Specifically: [r"(.*) my name is (.*)", ["Hello %2, How are you doing today ?" ]] for the name sentence and ..
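With nltk.chat.util.Chat, patterns are tried in list order, so the fix is to put the more specific "my name is" pattern before any generic "i'm (.*)" pattern. A minimal sketch (the response strings are placeholders):

```python
from nltk.chat.util import Chat, reflections

# order matters: the specific name pattern must come before the generic "i'm" one
pairs = [
    [r"(.*)my name is (.*)", ["Hello %2, how are you doing today?"]],
    [r"(.*)i'm (.*)", ["Why do you feel %2?"]],
]

chat = Chat(pairs, reflections)
print(chat.respond("my name is Sara"))  # greeted by name
print(chat.respond("i'm happy"))        # falls through to the generic pattern
```

If both statements genuinely start with "I'm", the name case can still be separated with a more specific pattern such as r"i'm ([A-Z]\w*)$" placed first, though that relies on capitalization.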
I’m trying to remove stopwords from each row of my dataframe and put the result into a new dataframe column S. I’ve tried the code below but it doesn’t seem to work… from nltk.corpus import stopwords stopwords = stopwords.words('english') df['S'] = df.apply(lambda row: (word for word in row['remarks_tokenized'] if word.lower() not in stopwords), axis=1) Source: Python..
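Two likely issues: the parentheses build a generator object (which gets stored in the column instead of the words), and reassigning the name stopwords shadows the imported module. A sketch with a small hard-coded stopword set standing in for nltk.corpus.stopwords, so it runs without downloads:

```python
import pandas as pd

# stand-in for stopwords.words('english'); use a different name than the module
stop_words = {"is", "the", "a", "in"}

df = pd.DataFrame({"remarks_tokenized": [["The", "code", "is", "working"],
                                         ["Python", "is", "fun"]]})

# use a list comprehension (brackets), not a generator expression (parentheses)
df["S"] = df["remarks_tokenized"].apply(
    lambda toks: [w for w in toks if w.lower() not in stop_words])

print(df["S"].tolist())  # [['code', 'working'], ['Python', 'fun']]
```

Since only one column is read, Series.apply on that column is simpler than df.apply(..., axis=1).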
I have a list of sentences in a csv file. Now I need to lemmatize these sentences and extract those containing certain keywords. import wordnet, nltk from nltk.stem import WordNetLemmatizer from nltk.corpus import wordnet from nltk import word_tokenize import pandas as pd import csv # define lemmatizer lemmatizer = WordNetLemmatizer() result =  # define ..
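A sketch of the lemmatize-then-filter step. A tiny hand-rolled lemma map stands in for WordNetLemmatizer here so the example runs without downloading NLTK corpora; with NLTK you would call lemmatizer.lemmatize(token) instead:

```python
# hypothetical stand-in for WordNetLemmatizer (assumption, not real NLTK data)
LEMMAS = {"cats": "cat", "running": "run", "ran": "run", "dogs": "dog"}

def lemmatize(token):
    return LEMMAS.get(token.lower(), token.lower())

def sentences_with_keywords(sentences, keywords):
    """Keep sentences whose lemmatized tokens intersect the lemmatized keywords."""
    keyword_lemmas = {lemmatize(k) for k in keywords}
    return [s for s in sentences
            if keyword_lemmas & {lemmatize(t) for t in s.split()}]

sentences = ["The cats ran away", "I like tea", "A dog was running"]
print(sentences_with_keywords(sentences, ["cat", "run"]))
```

Lemmatizing both the sentence tokens and the keywords is the key step: it lets "cats" match the keyword "cat" and "running" match "run".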
I have the following chatbot which reads a text file and uses NLTK, then outputs text from this text file accordingly. But I want to use an Excel file with multiple columns and hundreds of rows. Each row should have a question, its answer, and its answer source. How can I integrate Excel into this ..
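One way (the column names question/answer/source are assumptions): load the sheet into a dict keyed by question, using pandas.read_excel for .xlsx files. The sketch below uses an in-memory CSV as a stand-in for the spreadsheet so it is self-contained:

```python
import csv
import io

# CSV stands in for the Excel sheet here; with pandas you would instead do
#   df = pd.read_excel("faq.xlsx")  and iterate over the same three columns
sheet = io.StringIO(
    "question,answer,source\n"
    "What is Python?,A programming language,docs.python.org\n"
    "What is NLTK?,A Python NLP toolkit,nltk.org\n"
)

# lowercase the question so lookups are case-insensitive
faq = {row["question"].lower(): (row["answer"], row["source"])
       for row in csv.DictReader(sheet)}

answer, source = faq["what is nltk?"]
print(answer, "-", source)
```

Once the rows are in a dict (or DataFrame), the existing NLTK matching logic only needs to select the closest stored question and return its answer plus source.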
I am trying to remove stop words using a custom .txt file where all my stop words are contained. My code looks like this: #Removing stop words and cleaning all the columns def get_stop_words(stop_file_path): """load stop words """ with open(stop_file_path, 'r', encoding="utf-8") as f: stopwords = f.readlines() stop_set = set(m.strip() for m in stopwords) return ..
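A completed version of that loader plus a filtering helper, demonstrated against a temporary stopword file (the remove_stop_words helper is an assumption about the intended cleaning step):

```python
import os
import tempfile

def get_stop_words(stop_file_path):
    """Load stop words, one per line, into a set."""
    with open(stop_file_path, "r", encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def remove_stop_words(text, stop_set):
    """Drop any token whose lowercase form is in stop_set."""
    return " ".join(w for w in text.split() if w.lower() not in stop_set)

# demo: write a throwaway stopword file, then use it
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("the\nis\na\n")
    path = f.name

stops = get_stop_words(path)
cleaned = remove_stop_words("The sky is a deep blue", stops)
os.remove(path)
print(cleaned)  # sky deep blue
```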
Is there a library or link where I can download a pre-trained model? My straightforward requirement is a function that takes an n-gram and gives the probability of that n-gram occurring in the English language. Source: Python..
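Pre-trained options exist (KenLM binaries trained on large corpora are one common choice), but the underlying computation can be sketched as a maximum-likelihood estimate over whatever corpus is available; the toy corpus below is made up:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

def ngram_probability(ngram, tokens):
    """MLE conditional probability: count(ngram) / count(its length-(n-1) context)."""
    n = len(ngram)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
    context = tuple(ngram[:-1])
    if contexts[context] == 0:
        return 0.0
    return ngrams[tuple(ngram)] / contexts[context]

# "the cat" appears 2 times, "the" appears 3 times -> P(cat | the) = 2/3
print(ngram_probability(("the", "cat"), corpus))
```

A real model would add smoothing (e.g. Kneser-Ney) so unseen n-grams do not get probability zero; that is exactly what the pre-trained packages provide.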
I’m trying to write a Python program that decides whether a given post is about the topic of volunteering. My dataset is small (only the posts, which are examined one by one), so approaches like LDA do not yield results. My end goal is a simple True/False: a post is about the topic or ..
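Without enough data to train a model, a keyword baseline is one starting point (the keyword list below is an assumption, not a trained model; expanding it with synonyms or embedding-based matching is the natural next step):

```python
# hypothetical seed terms for the "volunteering" topic
VOLUNTEER_TERMS = {"volunteer", "volunteers", "volunteering", "charity", "donate"}

def is_about_volunteering(post, threshold=1):
    """True if the post contains at least `threshold` volunteering-related words."""
    hits = sum(1 for w in post.lower().split()
               if w.strip(".,!?") in VOLUNTEER_TERMS)
    return hits >= threshold

print(is_about_volunteering("We need volunteers for the food drive!"))  # True
print(is_about_volunteering("The weather is nice today."))              # False
```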