Category : data-analysis

I am using Spyder and the following source code:

    import pandas as pd
    filename = "file.csv"  # 5.35 GB in size
    df = pd.read_csv(filename, nrows=5)
    pd.set_option('display.max_columns', None)
    df

Output:

    runfile('C:/Users/pc/Desktop/Data Mining/file.py', wdir='C:/Users/pc/Desktop/Data Mining/')
    Reloaded modules: jupyter_client.session, zmq.eventloop, zmq.eventloop.ioloop, tornado.platform, tornado.platform.asyncio, tornado.gen, zmq.eventloop.zmqstream, jupyter_client.jsonutil, jupyter_client.adapter, spyder, spyder.pil_patch, PIL, PIL._version, PIL.Image, PIL.ImageMode, PIL.TiffTags, PIL._binary, PIL._util, PIL._imaging, cffi, ..


I have a set of emails, each with an extracted array of keywords and a metalabel. I want to use HDBSCAN in Python for topic clustering, but I cannot find any example that shows the correct data format to pass to hdbscan.

    class Mail(object):
        id = 1
        keywords = [("word1", 0.45), ("word2", 0.36), …]
        metalabel = "metalabel"

    hdbscan.HDBSCAN(min_cluster_size=5, metric='euclidean', cluster_selection_method='eom').fit(???)

..
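One possible way to approach the format question, offered only as a sketch: `HDBSCAN.fit` takes a two-dimensional array-like with one row per sample and one column per feature, so each mail's keyword list could be turned into a fixed-length row of weights over a shared vocabulary. The helper names `keywords_to_matrix` and `cluster` and the sample data below are hypothetical, and whether keyword weights with a Euclidean metric give useful topics for this data is an assumption, not something the question confirms.

```python
from typing import List, Tuple

Keywords = List[Tuple[str, float]]


def keywords_to_matrix(mails: List[Keywords]) -> Tuple[List[str], List[List[float]]]:
    """Build a dense (n_mails x n_keywords) weight matrix (hypothetical helper)."""
    # Shared, sorted vocabulary so every mail maps to the same columns.
    vocab = sorted({word for kws in mails for word, _ in kws})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for kws in mails:
        row = [0.0] * len(vocab)  # keywords absent from a mail get weight 0
        for word, weight in kws:
            row[index[word]] = weight
        matrix.append(row)
    return vocab, matrix


def cluster(matrix: List[List[float]]):
    """Fit HDBSCAN on the matrix (hypothetical helper; needs hdbscan and numpy)."""
    import hdbscan
    import numpy as np

    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=5,
        metric="euclidean",
        cluster_selection_method="eom",
    )
    # labels_ holds one cluster id per row; -1 marks noise points.
    return clusterer.fit(np.asarray(matrix)).labels_


# Made-up sample data in the shape of Mail.keywords from the question.
mails = [
    [("word1", 0.45), ("word2", 0.36)],
    [("word2", 0.50), ("word3", 0.20)],
]
vocab, X = keywords_to_matrix(mails)
```

With a realistic number of mails (well above `min_cluster_size`), `cluster(X)` would return one label per mail. The two-row sample above is only there to show the matrix shape and is too small to cluster.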
