Category : scikit-learn

I have 3 classifiers that run over 288 samples. All of them are sklearn.neural_network.MLPClassifier instances. Here is the code I am using:

    list_of_clfs = [MLPClassifier(…), MLPClassifier(…), MLPClassifier(…)]
    probas_list = []
    for clf in list_of_clfs:
        probas_list.append(clf.predict_proba(X_test))

Each predict_proba(X_test) call returns a 2D array with shape (n_samples, n_classes). Then, I am creating a 3D array that will ..
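
The excerpt is cut off before it says what the 3D array is used for, so the following is only a minimal sketch of one common pattern: stacking the per-classifier probability arrays and averaging them (soft voting). The averaging step, the choice of 4 classes, and the `fake_predict_proba` helper (a stand-in for the real `clf.predict_proba(X_test)` calls) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 288, 4  # 288 comes from the question; 4 classes is an assumption


def fake_predict_proba(rng, n_samples, n_classes):
    """Hypothetical stand-in for clf.predict_proba(X_test): rows sum to 1."""
    raw = rng.random((n_samples, n_classes))
    return raw / raw.sum(axis=1, keepdims=True)


# One (n_samples, n_classes) array per classifier, as in the question's loop.
probas_list = [fake_predict_proba(rng, n_samples, n_classes) for _ in range(3)]

# Stack along a new leading axis: shape (n_classifiers, n_samples, n_classes).
probas_3d = np.stack(probas_list, axis=0)

# Soft voting (an assumption): average over classifiers, then pick the best column.
mean_probas = probas_3d.mean(axis=0)        # shape (n_samples, n_classes)
ensemble_pred = mean_probas.argmax(axis=1)  # column indices, not class labels

print(probas_3d.shape, mean_probas.shape, ensemble_pred.shape)
```

With real classifiers, the column indices in `ensemble_pred` would still need to be mapped back to labels through `clf.classes_`, and this only makes sense if all three classifiers were trained on the same set of classes.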


I have a preprocessed real-estate DataFrame with the following data types:

    >> df.dtypes
    OHE_Cat__x0_Single Family Residential    Int64
    OHE_Cat__x0_Townhouse                    Int64
    OHE_Cat__x1_1                            Int64
    OHE_Cat__x1_2                            Int64
    ZIP OR POSTAL CODE                       Int64
    PRICE                                    Int64
    BEDS                                     Int64
    BATHS                                    Float64
    SQUARE FEET                              Int64
    LOT SIZE                                 Int64
    YEAR BUILT                               Int64
    LATITUDE                                 Float64
    LONGITUDE                                Float64
    Age                                      Int64
    TIME VAR UNIX                            Int64
    ..


I started playing with the Boston data set exposed by the API. I looked at column B, which is highly biased. I am not sure whether anybody has complained about it until now. Could you please stop exposing the data set or, if you want to keep it, remove that column? Source: Python..


I have a dataset containing coordinates and categorical data, such as the one below: [image: dataset] I have searched many papers and journals for an explanation of which distance measure I should apply to my dataset with the DBSCAN algorithm. I’ve been stuck on this problem for days. Please help me solve it ..
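
One option that sometimes comes up for mixed coordinate and categorical data is to build a custom distance matrix and pass it to DBSCAN with `metric="precomputed"`. The sketch below is not a recommendation from the original post: it combines a haversine distance on the coordinates with a simple mismatch penalty on one categorical column. The sample points, the `geo_scale_km` and `category_weight` values, and the `eps` setting are all hypothetical and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.0

# Hypothetical data: [latitude, longitude] in degrees plus one categorical column.
coords_deg = np.array([
    [40.0000, -75.0000],
    [40.0005, -75.0000],
    [40.0010, -75.0000],
    [40.0000, -75.0000],
    [40.0005, -75.0000],
    [40.0010, -75.0000],
])
categories = np.array(["a", "a", "a", "b", "b", "b"])

# haversine_distances expects radians and returns angular distances,
# so multiply by the Earth's radius to get kilometres.
geo_km = haversine_distances(np.radians(coords_deg)) * EARTH_RADIUS_KM

# 0 where two rows share a category, 1 where they differ.
mismatch = (categories[:, None] != categories[None, :]).astype(float)

# Assumed weighting: 1 km of separation counts as 1 unit,
# a category mismatch counts as 2 units.
geo_scale_km = 1.0
category_weight = 2.0
distance = geo_km / geo_scale_km + category_weight * mismatch

labels = DBSCAN(eps=1.0, min_samples=2, metric="precomputed").fit_predict(distance)
print(labels)
```

Here the mismatch penalty is larger than `eps`, so points in different categories never become neighbours even when their coordinates coincide. A smaller `category_weight` would let nearby points of different categories merge into one cluster.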


I’m trying to tune hyperparameters with GridSearchCV, using n_neighbours as the parameter and comparing the candidates with a custom precision-recall area-under-curve score, but the code doesn’t seem to do anything:

    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from sklearn.metrics import precision_recall_curve
    from sklearn.metrics import auc
    from sklearn.metrics import make_scorer
    from ..
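
Since the excerpt stops at the imports, the author's actual code is unknown. The following is a minimal sketch of how a precision-recall AUC score can be plugged into GridSearchCV, under several assumptions: the estimator is a KNeighborsClassifier (whose parameter is spelled `n_neighbors` in scikit-learn), the problem is binary with labels 0 and 1, and the data is synthetic. The `pr_auc_scorer` name is hypothetical. It uses a plain callable with the `(estimator, X, y)` signature rather than `make_scorer`, which avoids depending on `make_scorer` keyword arguments that have changed between scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier


def pr_auc_scorer(estimator, X, y):
    """Area under the precision-recall curve for the positive class (label 1)."""
    proba = estimator.predict_proba(X)[:, 1]
    precision, recall, _ = precision_recall_curve(y, proba)
    return auc(recall, precision)


# Synthetic, imbalanced binary data standing in for the real dataset.
X, y = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=0
)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9]},  # note the spelling: n_neighbors
    scoring=pr_auc_scorer,
    cv=cv,
)

# Nothing is evaluated until fit() is called.
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Constructing the GridSearchCV object does no work by itself; results only appear in `best_params_`, `best_score_`, and `cv_results_` after `fit` has run.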


I am participating in a Kaggle multiclass classification competition. The submissions will be scored on the ‘logloss’ metric. I am using the Keras and scikit-learn libraries with a deep learning network model and have taken the approach below. I have corrected the class imbalance in the training data by oversampling the minority classes. I have split ..
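
For reference, multiclass log loss is usually defined as the mean negative log of the probability assigned to the true class. The sketch below computes it directly with NumPy. The competition's exact scoring rules (for example its clipping constant) are not given in the excerpt, so `eps=1e-15`, the `multiclass_log_loss` name, and the toy arrays are assumptions.

```python
import numpy as np


def multiclass_log_loss(y_true, probas, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: integer class indices, shape (n_samples,)
    probas: predicted probabilities, shape (n_samples, n_classes)
    """
    probas = np.clip(probas, eps, 1 - eps)               # avoid log(0)
    probas = probas / probas.sum(axis=1, keepdims=True)  # renormalise rows
    n = len(y_true)
    return -np.log(probas[np.arange(n), y_true]).mean()


# Toy example: three samples, three classes.
y_true = np.array([0, 2, 1])
probas = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])
loss = multiclass_log_loss(y_true, probas)

# A model that predicts uniform probabilities scores log(n_classes).
uniform_loss = multiclass_log_loss(y_true, np.full((3, 3), 1 / 3))
print(loss, uniform_loss)
```

Because the metric depends on the predicted probabilities rather than on hard labels, a model can have good accuracy and still score poorly if its probabilities are overconfident.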


    # Python version
    import sys
    print('Python: {}'.format(sys.version))
    # scipy
    import scipy
    print('scipy: {}'.format(scipy.__version__))
    # numpy
    import numpy
    print('numpy: {}'.format(numpy.__version__))
    # matplotlib
    import matplotlib
    print('matplotlib: {}'.format(matplotlib.__version__))
    # pandas
    import pandas
    print('pandas: {}'.format(pandas.__version__))
    # scikit-learn
    import sklearn
    print('sklearn: {}'.format(sklearn.__version__))

    # Load libraries
    import pandas as pd
    ..
