I would like to know how to add a step to a sklearn pipeline that multiplies the values of two columns and then drops the original columns.

Currently I'm doing something like this:

- After loading the DataFrame, I multiply the target columns and delete the originals.
- Prepare X, Y, and the training and test sets.
- Configure a pipeline with StandardScaler and some ML method (for example, Linear Regression).
- Fit and predict.

```
import pandas as pd, numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
# df is a pandas dataframe with columns A, B, C, Y
df['BC']=df['B']*df['C']
df.drop(columns=['B','C'], inplace=True)
X = df.loc[:,['A','BC']]
Y = df['Y']
x_train, x_test, y_train, y_test = train_test_split(X,Y,train_size=0.8)
pipe = Pipeline([
('minmax',StandardScaler()),
('linear',LinearRegression())
])
pipe.fit(x_train,y_train)
y_pred = pipe.predict(x_test)
```

With this approach, when I want to predict on new data, I have to compute the product myself and pass it in. For example, for A=1, B=3, C=4:

```
print(pipe.predict(np.array([[1,12]])))
```

What I would like instead is to pass the raw values:

```
print(pipe.predict(np.array([[1,3,4]])))
```

That is, I want to modify the pipeline to something like:

```
pipe = Pipeline([
('product', CustomFunction(columns_to_multiply, result_name_column)),
('minmax',StandardScaler()),
('linear',LinearRegression())
])
```
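To make the idea concrete, here is a minimal sketch of what such a step might look like, assuming the input is a pandas DataFrame. The class name `ColumnProduct` is hypothetical (it stands in for `CustomFunction` above), and the toy data is made up purely for illustration. It builds on scikit-learn's `BaseEstimator` and `TransformerMixin`; this may not be the most idiomatic way to do it.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnProduct(TransformerMixin, BaseEstimator):
    """Hypothetical step: replace the given columns with their row-wise product."""

    def __init__(self, columns_to_multiply, result_name_column):
        self.columns_to_multiply = columns_to_multiply
        self.result_name_column = result_name_column

    def fit(self, X, y=None):
        # Nothing is learned from the data; the step is stateless.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        X = X.copy()
        X[self.result_name_column] = X[self.columns_to_multiply].prod(axis=1)
        return X.drop(columns=self.columns_to_multiply)


# Toy data (made up): Y depends linearly on A and on the product B*C.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 3)), columns=["A", "B", "C"])
df["Y"] = 2 * df["A"] + 3 * df["B"] * df["C"]

pipe = Pipeline([
    ("product", ColumnProduct(["B", "C"], "BC")),
    ("minmax", StandardScaler()),
    ("linear", LinearRegression()),
])
pipe.fit(df[["A", "B", "C"]], df["Y"])

# New data is passed with the raw columns; the pipeline computes B*C itself.
new_data = pd.DataFrame({"A": [1], "B": [3], "C": [4]})
prediction = pipe.predict(new_data)
```

Note that this sketch expects a DataFrame with named columns at prediction time, not a bare NumPy array as in `np.array([[1,3,4]])`.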

Is it possible with scikit-learn or custom functions? How?
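For the plain NumPy case, one option that appears to fit is scikit-learn's `FunctionTransformer`, which wraps an ordinary function as a pipeline step. The sketch below assumes the input columns are always ordered A, B, C; the helper `multiply_last_two` and the toy data are hypothetical and only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def multiply_last_two(X):
    """Hypothetical helper: map columns [A, B, C] to [A, B*C]."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X[:, 0], X[:, 1] * X[:, 2]])


# Toy data (made up): the target depends linearly on A and on B*C.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = 2 * X[:, 0] + 3 * X[:, 1] * X[:, 2]

pipe_ft = Pipeline([
    ("product", FunctionTransformer(multiply_last_two)),
    ("minmax", StandardScaler()),
    ("linear", LinearRegression()),
])
pipe_ft.fit(X, y)

# Raw A, B, C values go in; the product is computed inside the pipeline.
pred = pipe_ft.predict(np.array([[1, 3, 4]]))
```

The trade-off is that the column positions are hard-coded in the function, so this is less flexible than a step parameterised by column names.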

Source: Python Questions