Category : feature-selection

I have a project based on a Kaggle competition dataset, available here as the application_train csv: https://drive.google.com/drive/folders/1zYotRg3l_m66JQRrGYi1VkuW0A0tfC4K?usp=sharing The goal is to fit a logistic regression. However, I am having trouble with the data selection. Since I have 122 variables, how should I proceed to choose the most relevant ones? data = pd.read_csv("C:/Users/migue/Downloads/application_train.csv") Data_head Thank ..

I am trying to produce a feature importance plot with column names that are intuitive to interpret. Currently, I have a dataset with a mix of numerical variables and 150+ categorical variables. None of these categorical variables are ordinal. I tried using get_dummies, but I am worried about the dummy variable trap, speed due ..

Hi, I am trying to get the names of the features selected by the SelectKBest algorithm. I am running the algorithm inside a pipeline. When I try to access the feature names, I get an AttributeError. This is my code. import pandas as pd import numpy as np from sklearn.preprocessing import OneHotEncoder from sklearn.preprocessing import MinMaxScaler from sklearn.compose ..

x_tr = SelectKBest(chi2, k=25).fit_transform(x_tr,y_tr) x_ts = SelectKBest(chi2, k=25).fit_transform(x_ts, y_ts) This is the code I have. I’m worried that it will select different features for the training and testing data. Should I change the code or will it give the same features? Source: Python..
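For what it's worth, calling fit_transform separately on the training and test sets fits two independent selectors, so the chosen columns can differ. A minimal sketch of the usual alternative, fitting once on the training data and reusing the fitted selector on the test data, is below. The random arrays are stand-ins (an assumption, not the asker's data); only the variable names x_tr, y_tr, x_ts and k=25 come from the question.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic non-negative data (chi2 requires non-negative features).
rng = np.random.default_rng(0)
x_tr = rng.integers(0, 10, size=(200, 40)).astype(float)
y_tr = rng.integers(0, 2, size=200)
x_ts = rng.integers(0, 10, size=(50, 40)).astype(float)

# Fit the selector on the training data only.
selector = SelectKBest(chi2, k=25)
x_tr_sel = selector.fit_transform(x_tr, y_tr)

# Reuse the fitted selector so the test set keeps the same 25 columns.
x_ts_sel = selector.transform(x_ts)

# Column indices that were kept, shared by both sets.
selected_idx = selector.get_support(indices=True)
print(x_tr_sel.shape, x_ts_sel.shape)
```

This also avoids using the test labels during selection, which the second fit_transform call in the question does.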

I’m trying to use cross-validation to select the maximum polynomial degree to add to my features in a logistic regression. But I get the following error: ValueError: could not broadcast input array from shape (928,2) into shape (928,0), which I understand means elements in my input array must be equal to elements of ..

I am performing feature selection using two methods, MDI (RandomForest importances) and feature permutation, in order to compare which features each method considers relevant. My dataset is fully categorical (features with values 0 or 1). The importances obtained from both methods for a specific feature (the most important one, for instance) ..
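As a point of reference, here is a minimal sketch of how the two importance measures can be computed side by side on binary features. The data is synthetic and the target rule is invented for illustration; nothing here reproduces the asker's dataset or results. Note that the two measures are on different scales: feature_importances_ is normalized to sum to 1, while permutation_importance reports the drop in score when a column is shuffled.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic 0/1 features; the target depends only on features 0, 1 and 2.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 6))
y = X[:, 0] | (X[:, 1] & X[:, 2])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)

# MDI: impurity-based importances computed from the training data.
mdi = forest.feature_importances_

# Permutation importance: mean drop in accuracy on held-out data
# when each column is shuffled.
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)

for i in range(X.shape[1]):
    print(f"feature {i}: MDI={mdi[i]:.3f}  permutation={perm.importances_mean[i]:.3f}")
```

Because of the scale difference, comparing the ranking of features across the two methods is usually more meaningful than comparing the raw numbers.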