Error making a prediction with a single data instance – xgboost python

  machine-learning, python, xgboost
    import pandas as pd
    from xgboost import XGBRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder

    # split df into train and test
    X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:21], df.iloc[:,-1], test_size=0.2, random_state=42)

    # check shape
    X_train.shape
    # (1485, 21)

    X_test.shape
    # (372, 21)

    # Encode categorical variables
    cat_vars = ['cat1','cat2'......]
    cat_transform = ColumnTransformer([('cat', OneHotEncoder(handle_unknown='ignore'), cat_vars)], remainder='passthrough')

    encoder = cat_transform.fit(X_train)
    X_train = encoder.transform(X_train)
    X_test = encoder.transform(X_test)

    # check shape
    X_train.shape
    # (1485, 401)

    X_test.shape
    # (372, 401)

Why does one-hot encoding change the shape from 21 columns to 401? My understanding was that it replaces each categorical variable with a single numeric variable representing the category.
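For reference, a minimal sketch of how `OneHotEncoder` inside a `ColumnTransformer` changes the column count. The toy frame and the names `toy`, `toy_transform`, and `toy_encoded` are made up for illustration and are not the data from this question. One-hot encoding emits one output column per distinct category, so a column with k categories becomes k columns instead of one.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns and one numeric column.
toy = pd.DataFrame({
    "loc": ["L1", "L2", "L3", "L1"],
    "Status": ["Yes", "No", "Yes", "Yes"],
    "Size": [200, 150, 300, 120],
})

toy_transform = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["loc", "Status"])],
    remainder="passthrough",
)
toy_encoded = toy_transform.fit_transform(toy)

# 3 categories in "loc" + 2 in "Status" + 1 passthrough column = 6 columns.
n_input_cols = toy.shape[1]
n_output_cols = toy_encoded.shape[1]
print(n_input_cols, "->", n_output_cols)  # 3 -> 6
```

By the same arithmetic, 21 input columns can plausibly grow to 401 when the categorical columns contain many distinct values between them.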

    # Fit model
   
    model = XGBRegressor()
    model.fit(X_train, y_train)

When I attempt to make a prediction using a single test data point, I get an error. Mock test data point below:

    df = pd.DataFrame({
        'Year': 2000,
        'Size': 200,
        'Score': 60,
        'loc': 'L1',
        'Status': 'Yes',
        ....
        ....
    }, index=[0])


    encoder = cat_transform.fit(df)
    df_test = encoder.transform(df)

    len(df_test[0])
    # 21

    y = rsearch.predict(df_test[0])

    XGBoostError: [17:01:50] /Users/travis/build/dmlc/xgboost/src/predictor/cpu_predictor.cc:341: Check failed: m->NumColumns() == model.learner_model_param->num_feature (1 vs. 401) : Number of columns in data must equal to trained model.
    Stack trace:
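A possible explanation, sketched on made-up data (the frame `train` and the column values below are assumptions, not the real dataset). Two things appear to differ from training: the encoder is refitted on the single row, so it only learns one category per column, and `df_test[0]` is a 1-D array, which is likely read as many rows of one feature (consistent with the "1 vs. 401" in the message). The sketch compares the shapes produced by reusing the already fitted encoder with those produced by refitting.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training frame standing in for X_train.
train = pd.DataFrame({
    "Year": [2000, 2005, 2010, 2015],
    "Size": [200, 150, 300, 120],
    "loc": ["L1", "L2", "L3", "L1"],
    "Status": ["Yes", "No", "Yes", "No"],
})
cat_vars = ["loc", "Status"]

cat_transform = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), cat_vars)],
    remainder="passthrough",
)
encoder = cat_transform.fit(train)  # fit once, on the training data only
train_encoded = encoder.transform(train)

single = pd.DataFrame(
    {"Year": 2000, "Size": 200, "loc": "L1", "Status": "Yes"}, index=[0]
)

# Reuse the fitted encoder: transform only, no second fit.
single_encoded = encoder.transform(single)

# Refitting on the single row learns only one category per column,
# so the output has fewer columns than the training matrix.
refit_encoded = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), cat_vars)],
    remainder="passthrough",
).fit_transform(single)

print(train_encoded.shape)      # (4, 7)
print(single_encoded.shape)     # (1, 7): one row, same width as training
print(refit_encoded.shape)      # (1, 4): width no longer matches
print(single_encoded[0].shape)  # (7,): indexing [0] drops the row dimension
```

Under these assumptions, the array to hand to `predict` would be the 2-D `single_encoded` (one row, same number of columns as the training matrix), not the refitted output and not its `[0]` slice.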

Source: Python Questions
