I am using transformers.BertForMaskedLM to further pre-train the BERT model on my custom dataset. I first serialize all the text to a .txt file, separating words with whitespace. Then I use transformers.TextDataset to load the serialized data, with a BERT tokenizer passed as the tokenizer argument. Then I use BertForMaskedLM.from_pretrained() to ..
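For intuition, a minimal sketch of the random-masking step that DataCollatorForLanguageModeling applies during masked-LM pre-training, in pure Python (the token IDs, mask ID, and vocabulary size below are made-up placeholders, not BERT's real values):

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=0):
    """Return (masked_inputs, labels): ~15% of positions become prediction
    targets; of those, 80% are replaced by [MASK], 10% by a random token,
    10% kept unchanged (the standard BERT recipe)."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok  # original token becomes the target
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)
            # else: keep the original token
    return inputs, labels

inputs, labels = mask_tokens([5, 17, 42, 99, 7, 23], mask_id=103, vocab_size=30522)
```

The real collator does this on batched tensors; this sketch just shows the logic per sequence.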
I am working on an intent classification problem and need your help. I fine-tuned one of the BERT models for text classification, and trained and evaluated it on a small dataset for detecting five intents. I used the code from Intent Recognition with BERT using Keras and TensorFlow 2. It is working fine! I have saved the ..
I am new to NLP and to coding in Python. I am trying to train a BERT model to predict the correct next utterance. I am given a disentangled conversation and am trying to select the next utterance from a candidate pool of 100, which might not contain the correct next utterance. I am trying to ..
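One way to frame this: score every candidate against the conversation context with a relevance model, and abstain when even the best score falls below a threshold, since the pool may not contain the true answer. A sketch of the selection logic; the scorer here is a toy word-overlap stand-in for a fine-tuned BERT cross-encoder:

```python
def pick_next_utterance(context, candidates, score_fn, threshold=0.5):
    """Score each candidate; return (index, score), or (None, best_score)
    when even the best candidate looks wrong."""
    scores = [score_fn(context, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]

# Toy scorer (word overlap) just to exercise the function; a real system
# would use the model's relevance probability here.
def overlap(ctx, cand):
    a, b = set(ctx.lower().split()), set(cand.lower().split())
    return len(a & b) / max(len(b), 1)
```

The threshold would be tuned on a validation split where some pools genuinely lack the correct utterance.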
Trying to convert the labels to integers after encoding, but they are objects, so I first turn them into strings: train_df["labels"] = train_df["labels"].astype(str).astype(int). I am getting this error: invalid literal for int() with base 10: '[0, 1, 0, 0]'. An example row from the dataset: text [word1,word2,word3,word4], labels [1,0,1,0]. Source: Python..
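The error message shows each cell holds a stringified list ('[0, 1, 0, 0]'), which cannot be cast to a single int. A sketch of the usual fix: parse each cell into a real list of ints with ast.literal_eval (the sample data below mirrors the example row in the question):

```python
import ast
import pandas as pd

train_df = pd.DataFrame({
    "text": ["[word1,word2,word3,word4]"],
    "labels": ["[1,0,1,0]"],  # stored as strings, not lists
})

# Parse the string representation into an actual Python list of ints.
train_df["labels"] = train_df["labels"].apply(ast.literal_eval)
```

After this, each labels cell is a list like [1, 0, 1, 0], which multi-label training code can consume directly (or convert to a tensor).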
[First I would like to thank the community of SO, with whose support I was able to finish an end-to-end data science project (including deployment).] Over to my question. I wanted to create an app that predicts success or failure from textual input data. In order to construct the model, instead ..
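For a problem like success/failure prediction from text, a minimal baseline is worth having before anything heavier. A sketch assuming scikit-learn (TF-IDF features plus logistic regression); the example sentences and labels are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: 1 = success, 0 = failure
texts = [
    "launch went great",
    "project delivered on time",
    "budget overrun and long delays",
    "the rollout failed badly",
]
labels = [1, 1, 0, 0]

# TF-IDF turns text into sparse features; logistic regression classifies them.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["everything was delivered on schedule"])
```

A pipeline like this also serializes cleanly (e.g. with joblib) for deployment behind an app endpoint.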
I am a beginner in Python and NLP. I have a file that contains a module. When I give question and answer_text to the module, it returns an answer to the question using NLP, but the processing time is very long, about 15-20 minutes. Is this a hardware problem or a function optimization issue? answer_extractor.py import torch ..
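One common cause of runtimes like this is feeding a very long answer_text into a model with a fixed input limit (BERT caps out at 512 tokens). The usual remedy is to split the text into overlapping windows and run the QA model once per window. A sketch of the windowing alone, in pure Python with no model (window and stride values are illustrative):

```python
def sliding_windows(tokens, window=384, stride=256):
    """Split a token list into overlapping chunks of at most `window` tokens.
    The overlap (window - stride) keeps answers that straddle a chunk
    boundary intact in at least one window."""
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return windows
```

Each window is then scored independently and the highest-confidence answer span wins; combined with running the model under torch.no_grad(), this usually cuts the runtime dramatically.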
I am new to NLP and I am working on an email classification project. I am programming in Python and use BERT for this task; however, I have an issue with the email texts. Quite a few of these emails contain disclaimer texts which are not relevant to the email. In many cases the ..
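Since legal disclaimers tend to be near-identical boilerplate appended at the end of messages, one workable preprocessing heuristic is to strip trailing paragraphs that match a small set of disclaimer cues before classification. A sketch; the cue phrases below are illustrative and would need tuning to the actual corpus:

```python
import re

# Phrases that typically appear only in legal boilerplate (assumed cues).
DISCLAIMER_CUES = re.compile(
    r"(this e-?mail .*confidential|intended recipient|legally privileged)",
    re.IGNORECASE,
)

def strip_disclaimer(email_text):
    """Drop trailing paragraphs that look like legal boilerplate,
    keeping the substantive body of the email."""
    paragraphs = email_text.split("\n\n")
    while paragraphs and DISCLAIMER_CUES.search(paragraphs[-1]):
        paragraphs.pop()
    return "\n\n".join(paragraphs)
```

An alternative when the cue list is hard to maintain: cluster the final paragraphs across the corpus and drop any paragraph that recurs verbatim in many emails.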
I am working on a project where I have to extract specific information from PDF files, like Document ID, Amount, Processing Fees, Description, Dates, Organization, Authority name, Department, and many other such things. Here the Description can be a few lines, but the other fields are only a few characters. Challenge: no two PDFs have the same format ..
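With no shared layout across documents, a hybrid approach often works: regex rules for the short, patterned fields (IDs, amounts, dates) plus an NER model for free text like Description. A sketch of the rule side over already-extracted text; the field patterns below are assumptions about how the documents label their values, not a universal format:

```python
import re

# Each pattern captures the value that follows a field label (assumed labels).
FIELD_PATTERNS = {
    "document_id": re.compile(r"Document\s*ID[:\s]+([A-Z0-9-]+)", re.IGNORECASE),
    "amount": re.compile(r"Amount[:\s]+\$?([\d,]+(?:\.\d{2})?)", re.IGNORECASE),
    "date": re.compile(r"Date[:\s]+(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
}

def extract_fields(text):
    """Return whichever labeled fields can be found in the raw text."""
    out = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            out[name] = match.group(1)
    return out
```

For layout-varying PDFs, the per-format brittleness lives entirely in this pattern table, so new formats can be supported by adding patterns rather than changing code.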
At the moment my model produces 3 output tensors. I want two of them to be more cooperative: I want the combination of self.dropout1(hs) and self.dropout2(cls_hs) to pass through the self.entity_out Linear layer. The issue is that the two tensors have different shapes. Current code: class NLUModel(nn.Module): def __init__(self, num_entity, num_intent, num_scenarios): super(NLUModel, ..
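A common way to combine two tensors with different shapes is to project each through its own nn.Linear into a shared size, broadcast the pooled one over the sequence, and then add (or concatenate) before the entity head. A sketch in PyTorch; the dimensions below are placeholders, not the question's real hidden sizes:

```python
import torch
import torch.nn as nn

class Fuse(nn.Module):
    """Combine hs (B, T, H1) and cls_hs (B, H2) before an entity head."""
    def __init__(self, h1, h2, common, num_entity):
        super().__init__()
        self.proj_hs = nn.Linear(h1, common)
        self.proj_cls = nn.Linear(h2, common)
        self.entity_out = nn.Linear(common, num_entity)

    def forward(self, hs, cls_hs):
        a = self.proj_hs(hs)                    # (B, T, common)
        b = self.proj_cls(cls_hs).unsqueeze(1)  # (B, 1, common); broadcasts over T
        return self.entity_out(a + b)           # (B, T, num_entity)
```

Concatenating along the last dimension (torch.cat, with entity_out sized 2 * common) is the other standard choice when the two signals should stay distinguishable.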
I am using Hugging Face transformers models for quite a few tasks. They work well, but the one problem is response time. It takes around 6-7 seconds to generate a result, and sometimes even 15-20 seconds. I tried Google Colab with a GPU; performance on the GPU is very fast, within just ..