I have a column of data in which each row is a list of tokens, like this:
0 [albert, betty, dave, wonder, jobe]
0 [working, way, chilling, classics]
I'm trying to use the bag-of-words model to vectorize these words so I can feed them into an ML model. However, when I pass my tokenized DataFrame column into CountVectorizer or TF-IDF, I get this error: "'list' object has no attribute 'lower'". The problem is that I'm not sure why I'm getting this error.
Here is the code so far:
dft['text'] = dft['text'].apply(word_tokenize)
X = dft['text']
Y = dft['label']
dft['text']=[" ".join(text) for text in dft['text'].values]
text = dft["text"].map(' '.join)
count_v = CountVectorizer(ngram_range=(1,2))
X_train = count_v.fit_transform(X)
X_test = count_v.transform(review_test['text'].values)
tf_idf_version = TfidfTransformer()
X_Train_tf_version = tf_idf_version.fit_transform(X_train)
X_Test_tf_version = tf_idf_version.transform(X_test)
When I run only the text = dft["text"].map(' '.join) line, it doesn't work. However, when I comment that out and run the code with dft['text'] = [" ".join(text) for text in dft['text'].values], I no longer get the "'list' object has no attribute 'lower'" error. I'm just not sure if it's right. Any help would be greatly appreciated.
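For anyone hitting the same error, here is a minimal sketch of what seems to be going on. CountVectorizer expects each document to be a single string and lowercases it by default, so a row that is still a list of tokens fails on .lower(). Joining each token list into one string, exactly once, and passing that joined result to the vectorizer should avoid it. The token_rows list below is a hypothetical stand-in for the tokenized column, not the real data. Also note that in the code above X appears to be assigned before the join, so it may still refer to the token lists.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical stand-in for the tokenized column: one list of tokens per row.
token_rows = [
    ["albert", "betty", "dave", "wonder", "jobe"],
    ["working", "way", "chilling", "classics"],
]

# Passing token_rows directly to the vectorizer fails, because the default
# preprocessing step calls .lower() on each document and a list has no .lower().

# Join each token list back into a single string, once.
docs = [" ".join(tokens) for tokens in token_rows]

# Vectorize the joined strings (unigrams and bigrams, as in the question).
count_v = CountVectorizer(ngram_range=(1, 2))
X_train = count_v.fit_transform(docs)

# Re-weight the raw counts with TF-IDF.
tf_idf = TfidfTransformer()
X_train_tfidf = tf_idf.fit_transform(X_train)

# Joining a second time iterates over the characters of the string,
# which is why applying both join lines in sequence gives spaced-out letters.
spaced = " ".join(docs[0])
```

Under these assumptions, either join line is enough on its own, as long as the variable passed to fit_transform is the joined one; running both in sequence splits the text into single characters.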
Source: Python Questions