Category : nlp

While training word vectors, I get the following runtime warnings partway through an epoch:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:12: RuntimeWarning: divide by zero encountered in log
  if sys.path[0] == '':
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:12: RuntimeWarning: invalid value encountered in multiply
  if sys.path[0] == '':

On checking, I found that all values in the embedding matrix are becoming NaN. How do I resolve ..
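The two warnings in this excerpt are the pattern NumPy emits when `log` receives an exact 0 (giving `-inf`) and that result is then multiplied by 0 (giving NaN). The excerpt does not show the loss code, so the following is only a minimal sketch under the assumption that the log is applied to predicted probabilities that underflow to 0. The names `safe_log`, `cross_entropy`, and `eps` are hypothetical and not taken from the question.

```python
import numpy as np

def safe_log(p, eps=1e-12):
    """Clip values into [eps, 1] before the log so log(0) never occurs."""
    return np.log(np.clip(p, eps, 1.0))

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over rows, computed with the clipped log."""
    return -np.mean(np.sum(y_true * safe_log(y_pred, eps), axis=1))

# Assumed scenario: a softmax output that has underflowed to exactly 0
# for the non-target classes.
y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.0, 1.0, 0.0]])

# Naive version: log(0) = -inf, then 0 * -inf = nan.
# The warnings are silenced here only to keep the demonstration quiet.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Clipped version stays finite.
stable = cross_entropy(y_true, y_pred)
```

Once a NaN reaches the loss, the gradient update can spread it through the whole embedding matrix, which would match the symptom described. If clipping alone does not help, a lower learning rate is another thing worth trying.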


Is it possible to change the language (default detection) for tika? I am trying to use a PDF file in Tamil (language code 'ta'), but tika is detecting it as 'th' (Thai). Though most characters are recognized well, it is not detecting a few characters. See the example below, where some 'o' appears in between the text. ஓவச ..


I am a beginner in text processing techniques and I am trying to execute the code below.

from keras.layers import Dense, Input, GlobalMaxPooling1D
from keras.layers import Conv1D, MaxPooling1D, Embedding
from keras.models import Model
from keras.layers import Input, Dense, Embedding, Conv2D, MaxPooling2D, Dropout, concatenate
from keras.layers.core import Reshape, Flatten
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
..


pattern = [{'LOWER': 'hello', 'OP': '?'}, {'LOWER': 'world'}, {'IS_PUNCT': True}]
matcher = Matcher(nlp.vocab)
matcher.add('Helloworld', None, pattern)
nlp = spacy.load('en_core_web_sm')
document = nlp('hello World!')
matcher = matcher(document)
matcher

[(7909505024684541438, 0, 3), (7909505024684541438, 1, 3)]

for matcher_id, start, end in matcher:
    string_id = nlp.vocab.strings[matcher_id]
    span = document[start:end]
    print(matcher_id, string_id, start, end, span.text)

KeyError                                  Traceback (most recent call last)
<ipython-input-151-efe008726218> in <module>
      1 for matcher_id,start,end in matcher:
----> 2     string_id = nlp.vocab.strings[matcher_id]
      3 ..
