Feature Selection For Linear Regression

  jupyter-notebook, numpy, pandas, python, regression

I am doing feature selection for a linear regression model.
I have the training data and the test data in separate dataframes. The training dataframe ends at the end of January and the test dataframe at the end of February.
The goal is to select a promising set of features, train the model on the January dataframe, and test the linear regression predictions on the February dataframe.

I am doing the feature selection and model prediction like this:

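from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn import metrics

feature_selector = SequentialFeatureSelector(RandomForestClassifier(n_jobs=-1),
           k_features=15,
           forward=True,
           verbose=2,
           scoring='roc_auc',
           cv=4)
# Assumed: the fit call was omitted from the post; y_train stands for the training target.
features = feature_selector.fit(X_train, y_train)
filtered_features = X_train.columns[list(features.k_feature_idx_)]
model = LinearRegression().fit(january[filtered_features], january['january_column'])
y_preds = model.predict(february[['february_column']])  # this is the line that raises the error
print('RMSE:', metrics.mean_squared_error(february['february_column'], y_preds, squared=False))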

At the moment I get the following error when I try to calculate y_preds:

matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 15 is different from 1)

I am not sure whether this is the correct way to go about it.

Help much appreciated!
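
For reference, here is a minimal NumPy sketch of the shape rule that the error message seems to point at: a linear model fit on 15 columns has 15 coefficients, so the matrix passed at prediction time needs the same 15 columns. The data is synthetic, and the names X_january, y_january and X_february are hypothetical stand-ins, not the real dataframes. The least-squares fit is only a rough equivalent of what LinearRegression does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the January (train) and February (test) data.
n_features = 15
X_january = rng.normal(size=(100, n_features))
y_january = rng.normal(size=100)
X_february = rng.normal(size=(80, n_features))

# Ordinary least squares with an intercept column (assumption: this
# mirrors a plain linear regression fit on the 15 selected features).
A_january = np.column_stack([np.ones(len(X_january)), X_january])
coef, *_ = np.linalg.lstsq(A_january, y_january, rcond=None)
intercept, weights = coef[0], coef[1:]

# Same 15 columns at prediction time: (80, 15) @ (15,) -> (80,)
y_preds = X_february @ weights + intercept

# A single column at prediction time: (80, 1) @ (15,) cannot be multiplied.
try:
    X_february[:, :1] @ weights
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    mismatch_message = str(err)
```

If the same rule applies to the code above, the call that fails would presumably need february[filtered_features] as its input, with february['february_column'] used only as the target when computing the RMSE.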

Source: Python Questions
