I am currently in love with working with Hadoop clusters, but I am new to the Hadoop world. I have worked with Hadoop's Java configuration, but now I need to implement things in Python as well. For the past few days I have been trying to implement the DGIM algorithm with MapReduce, but I have not been able to find a solution. ..
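The question asks about running DGIM over MapReduce, which this does not attempt. As a starting point, here is a minimal single-machine sketch of the DGIM bucket logic: buckets store the timestamp of their most recent 1 and a power-of-two size, there are at most two buckets of each size, and the two oldest buckets of a size are merged when a third appears. The class name `DGIM` and its methods are hypothetical. One possible (untested) way to distribute it is to run one instance per independent stream inside a mapper or reducer; that part is an assumption.

```python
class DGIM:
    """Approximate count of 1s among the last `window` bits of a 0/1 stream."""

    def __init__(self, window: int):
        self.window = window
        self.time = 0
        # Buckets, newest first, as (timestamp_of_most_recent_1, size).
        self.buckets: list[tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket(s) once their most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Whenever three buckets share a size, merge the two oldest of them
        # into one bucket of double size that keeps the newer timestamp.
        i = 0
        while i + 2 < len(self.buckets):
            size = self.buckets[i][1]
            if self.buckets[i + 1][1] == size and self.buckets[i + 2][1] == size:
                merged = (self.buckets[i + 1][0], size * 2)
                self.buckets[i + 1 : i + 3] = [merged]
                i += 1  # the merged bucket may now form a triple one size up
            else:
                break

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        # All buckets except the oldest, plus about half of the oldest
        # (a size-1 oldest bucket is counted exactly).
        oldest = self.buckets[-1][1]
        return sum(size for _, size in self.buckets[:-1]) + (oldest + 1) // 2


stream = [1, 0, 1, 1, 0, 1, 1, 1]
dgim = DGIM(window=5)
for bit in stream:
    dgim.add(bit)
print("estimate:", dgim.estimate(), "exact:", sum(stream[-5:]))
```

The estimate is within 50% of the true count, while memory stays at O(log² N) bits for a window of N.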
I am fairly new to ML and NLP. I am doing a student project: extracting certain information from an OCR dump text file (CSV) for EDA in Python. The file is as below: I have ~2000 such observations, and the number of lines is not consistent (some have more, some fewer). Also, the quality of the ..
I'm trying to fit the GaussianNB() classifier to a dataset, but I'm running into this error: --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /tmp/ipykernel_8812/3612414008.py in <module> 11 print("y_test : " + str(y_test.shape)) 12 ---> 13 y_pred = gnb.fit(X_train, y_train).predict(X_test) 14 15 print("Number of mislabeled points out of a total %d points : %d" % (X_test.shape, ..
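The error message itself is cut off, so the exact cause can't be confirmed from the question. A sketch, assuming the problem is the shape of `X_train`/`y_train` (a frequent source of `ValueError` in `fit`): `GaussianNB` expects a 2-D feature array of shape (n_samples, n_features) and a 1-D label array with the same number of rows. The helper `check_shapes` is hypothetical; the Iris data stands in for the real dataset. Note also that `%d` needs an integer, so the print should use `X_test.shape[0]` rather than the tuple `X_test.shape`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def check_shapes(X, y):
    """Fail early with a readable message if X or y has the wrong dimensionality."""
    X, y = np.asarray(X), np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_features), got {X.shape}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1-D (n_samples,), got {y.shape}; try y.ravel()")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} rows but y has {y.shape[0]}")


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
check_shapes(X_train, y_train)
check_shapes(X_test, y_test)

gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
# X_test.shape[0] (an int), not X_test.shape (a tuple), for the %d placeholder.
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))
```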
I am trying to import data from a CSV file into my Python code. I tried Berea = pd.read_csv('Berea.csv') and everything seemed okay, but when I wanted to analyze the data for one column (deltaP), I found that the imported data is not the same as the file. I have 5 distinct values for this parameter in ..
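The rest of the question is truncated, so this doesn't diagnose the actual mismatch. One way to see where the difference comes from is to read the column twice: once as raw text (`dtype=str`) and once as pandas parses it by default, then compare distinct values. The helper name `compare_raw_and_parsed` and the in-memory sample are hypothetical stand-ins for `Berea.csv`. If the values only differ in how they are printed, raising the `display.precision` option may also reveal it.

```python
import io

import pandas as pd


def compare_raw_and_parsed(source, column: str) -> pd.DataFrame:
    """Return `column` as raw file text next to the value pandas parsed for it."""
    raw = pd.read_csv(source, dtype=str)[column]
    if hasattr(source, "seek"):  # rewind in-memory buffers before the second read
        source.seek(0)
    parsed = pd.read_csv(source)[column]
    return pd.DataFrame({"raw": raw, "parsed": parsed})


# Hypothetical sample: "0.1" and "0.10" are different text but the same float.
sample = io.StringIO("deltaP\n0.1\n0.10\n0.2\n")
table = compare_raw_and_parsed(sample, "deltaP")
print(table["raw"].nunique(), table["parsed"].nunique())
print(table.drop_duplicates())

# For the real file (path from the question):
# print(compare_raw_and_parsed("Berea.csv", "deltaP").drop_duplicates())
```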
I am new to data mining, and I seem to get errors with every single classifier; I am unable to see what I am doing wrong. I am using the Enron data (Enron 1–5) and trying to create a spam filter. Let's take Naïve Bayes as ..
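Since the actual errors are cut off, here is only a minimal end-to-end sketch of a Naive Bayes spam filter with scikit-learn, assuming the emails have already been read into lists of strings with "ham"/"spam" labels. Putting the vectorizer and classifier in one pipeline means the vocabulary is learned from the training texts only and reused at prediction time, which avoids feature-count mismatches between training and testing. The toy emails and the function name `build_spam_filter` are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_spam_filter():
    """Bag-of-words token counts feeding a multinomial Naive Bayes classifier."""
    return make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())


# Hypothetical toy emails standing in for the Enron ham/spam texts.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = build_spam_filter().fit(train_texts, train_labels)
predictions = model.predict(["claim your free money", "team meeting tomorrow"])
print(predictions.tolist())  # ['spam', 'ham']
```

Other scikit-learn classifiers can be swapped in for `MultinomialNB()` inside the same pipeline.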
Full error: ValueError: Input 0 of layer sequential is incompatible with the layer: expected axis -1 of input shape to have value 20 but received input with shape (None, 1). The issue: I have been struggling a lot with setting up a neural network, as it constantly complains about the shape of the input it receives. The x_train and ..
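The error says the first layer was built for 20 features per sample (last axis 20) but received one feature per sample. The truncated question doesn't show how `x_train` is built, so as an assumption: a common way to get there is a feature array that is 1-D (for example an object array whose elements are 20-long lists) or a single column. This NumPy sketch stacks such samples into a proper (n_samples, 20) float array before it reaches the model; the helper name `to_feature_matrix` is hypothetical.

```python
import numpy as np

N_FEATURES = 20  # matches "expected axis -1 ... to have value 20" in the error


def to_feature_matrix(samples, n_features: int = N_FEATURES) -> np.ndarray:
    """Stack per-sample feature vectors into float32 of shape (n_samples, n_features)."""
    matrix = np.stack([np.asarray(s, dtype=np.float32).ravel() for s in samples])
    if matrix.shape[1] != n_features:
        raise ValueError(f"each sample needs {n_features} features, got {matrix.shape[1]}")
    return matrix


# Hypothetical x_train: a 1-D object array whose elements are 20-long lists.
x_train = np.empty(3, dtype=object)
for i in range(3):
    x_train[i] = list(range(i, i + 20))

print(x_train.shape)                     # (3,)
print(to_feature_matrix(x_train).shape)  # (3, 20)
```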
For my current project, I have two folders, each with a "not spam" and a "spam" subfolder: Training Folder (not spam, spam) and Testing Folder (not spam, spam). I then read everything from the training folder into one array (with labels spam and not spam) and do the same for the testing folder. I want to use one specifically for testing and one for training. ..
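A minimal sketch of reading such a folder layout into parallel lists of texts and labels, assuming one email per file and the subfolder names "not spam" and "spam" from the question. The function name is hypothetical, and the folder paths in the usage comment must be replaced with the real ones. scikit-learn's `sklearn.datasets.load_files` follows the same convention (subfolder names become categories) if a library helper is preferred.

```python
from pathlib import Path

LABELS = ("not spam", "spam")  # subfolder names from the question


def load_labeled_folder(root, labels=LABELS):
    """Read every file under root/<label>/ and return parallel (texts, labels) lists."""
    texts, y = [], []
    for label in labels:
        for path in sorted((Path(root) / label).iterdir()):
            if path.is_file():
                # errors="ignore" drops undecodable bytes, common in raw email dumps.
                texts.append(path.read_text(encoding="utf-8", errors="ignore"))
                y.append(label)
    return texts, y


# Hypothetical paths; fit only on the training lists, evaluate on the testing lists.
# train_texts, train_labels = load_labeled_folder("Training Folder")
# test_texts, test_labels = load_labeled_folder("Testing Folder")
```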
I'm working on implementing the FP-growth algorithm, and currently I can build an FP tree from a set of transactions. The next step is mining the prefix paths and building trees from them. Here's my Node class:

    class Node:
        def __init__(self, name, count, parent):
            self.name = name
            self.count = count
            self.parent = ..
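The question's `Node` class is truncated after `parent`, so the `children` and `link` fields below are assumptions: `link` chains together all nodes with the same item (the header-table list), and `parent` is what lets you read a prefix path by walking up to the root. This is a sketch of the two steps being asked about: collecting the prefix paths (conditional pattern base) for an item, and feeding them back into the same tree builder as weighted transactions to get the conditional FP tree. Function names are hypothetical.

```python
from collections import Counter
from typing import Optional


class Node:
    def __init__(self, name: Optional[str], count: int, parent: Optional["Node"]):
        self.name = name      # item label (None for the root)
        self.count = count    # support of the path ending here
        self.parent = parent  # toward the root; used to read prefix paths
        self.children: dict = {}  # item -> child Node (assumed field)
        self.link: Optional["Node"] = None  # next node with the same item (assumed field)


def build_tree(weighted_transactions, min_support):
    """Build an FP tree from a list of (items, count) pairs; return (root, header)."""
    counts = Counter()
    for items, n in weighted_transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root, header = Node(None, 0, None), {}
    for items, n in weighted_transactions:
        # Keep frequent items, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, 0, node)
                node.children[item] = child
                child.link = header.get(item)  # prepend to the item's node chain
                header[item] = child
            child.count += n
            node = child
    return root, header


def prefix_paths(item, header):
    """Conditional pattern base: (path from root, count) for each node of `item`."""
    paths, node = [], header.get(item)
    while node is not None:
        path, parent = [], node.parent
        while parent is not None and parent.name is not None:
            path.append(parent.name)
            parent = parent.parent
        if path:
            paths.append((path[::-1], node.count))
        node = node.link
    return paths


transactions = [(["a", "b"], 1), (["b", "c"], 1), (["a", "b", "c"], 1)]
root, header = build_tree(transactions, min_support=1)
base = prefix_paths("c", header)
cond_root, cond_header = build_tree(base, min_support=2)  # conditional FP tree for "c"
print(base)
```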
I drew the following diagram for my data, but the Arabic letters are written in reverse. How can I correct them?

    import pandas as pd
    pd.value_counts(my_input).plot.bar(figsize=(30, 3))
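Matplotlib draws label text left to right and does not itself join Arabic letter forms or apply bidirectional reordering. A commonly used workaround relies on the third-party packages `arabic-reshaper` and `python-bidi`: reshape each label, then convert it to display order before plotting. This sketch assumes both are installed (it falls back to unchanged labels otherwise, so nothing will change visually without them) and that a font with Arabic glyphs is available. It also uses `pd.Series(...).value_counts()`, since recent pandas versions deprecate the top-level `pd.value_counts`. `rtl_label` and `plot_counts` are hypothetical names.

```python
import pandas as pd

try:
    import arabic_reshaper
    from bidi.algorithm import get_display
except ImportError:  # packages missing: labels are passed through unchanged
    arabic_reshaper = None
    get_display = None


def rtl_label(text: str) -> str:
    """Join Arabic letter forms and reorder them so a left-to-right renderer shows them correctly."""
    if arabic_reshaper is None or get_display is None:
        return text
    return get_display(arabic_reshaper.reshape(text))


def plot_counts(values, figsize=(30, 3)):
    """Bar chart of value counts with display-ready (reshaped, reordered) labels."""
    counts = pd.Series(values).value_counts()
    counts.index = [rtl_label(str(label)) for label in counts.index]
    return counts.plot.bar(figsize=figsize)


# ax = plot_counts(my_input)  # my_input as in the question
```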
I am looking for a way to extract information about DRGs (diagnosis-related groups) from a specification PDF. Sadly, the institute providing this information does not offer it in a machine-readable format, so I need to parse it or, even worse, extract the information manually. The document has about 1500 pages containing text, images and tables ..
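If the DRG data sits in tables that a PDF library can detect, one option is pdfplumber, whose page objects provide `extract_tables()` returning each table as a list of rows of cell strings (or `None`). The sketch below only assumes a pdfplumber-style object with a `.pages` list, so the traversal can be checked without the real PDF; the file names in the usage comment are hypothetical. Text passages and images are not handled, and tables spanning several pages will likely need stitching afterwards.

```python
import csv
from typing import Any, Iterator, List, Optional, Tuple

Table = List[List[Optional[str]]]


def iter_tables(pdf: Any) -> Iterator[Tuple[int, Table]]:
    """Yield (page_number, table) for every table detected on every page."""
    for page_number, page in enumerate(pdf.pages, start=1):
        for table in page.extract_tables():
            yield page_number, table


def write_tables_csv(pdf: Any, out_path: str) -> int:
    """Write all table rows to one CSV, prefixed by page number; return the row count."""
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for page_number, table in iter_tables(pdf):
            for row in table:
                writer.writerow([page_number, *row])
                rows += 1
    return rows


# Usage with pdfplumber (pip install pdfplumber); file names are hypothetical:
# import pdfplumber
# with pdfplumber.open("drg_specification.pdf") as pdf:
#     print(write_tables_csv(pdf, "drg_tables.csv"))
```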