Category: pipeline

I am getting the error below when I call num_pipeline.fit_transform(data_num): AttributeError: 'numpy.ndarray' object has no attribute 'fit'. import pandas as pd import numpy as np data = pd.read_csv("CustomTransformerData.csv") from sklearn.base import BaseEstimator, TransformerMixin # column index x1_ix, x2_ix, x3_ix, x4_ix, x5_ix = 0, 1, 2, 3, 4 class Assignment4Transformer(BaseEstimator, TransformerMixin): # the constructor def __init__(self, add_x6=True, y=None): self.add_x6 ..

Read more
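That AttributeError is raised when something in the Pipeline's steps list is a plain NumPy array rather than an estimator instance, so fit cannot be called on it. A minimal sketch of how a custom transformer like this is typically wired into a numeric pipeline (the derived x6 feature, the imputer/scaler steps, and the step names are assumptions, not the asker's actual code):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Column indices, as in the question.
x1_ix, x2_ix, x3_ix, x4_ix, x5_ix = 0, 1, 2, 3, 4

class Assignment4Transformer(BaseEstimator, TransformerMixin):
    def __init__(self, add_x6=True):
        self.add_x6 = add_x6

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X = np.asarray(X)
        if self.add_x6:
            # Hypothetical derived feature: square of x1.
            x6 = (X[:, x1_ix] ** 2).reshape(-1, 1)
            return np.c_[X, x6]
        return X

# Every step must be a ("name", estimator_instance) pair; placing an
# ndarray (e.g. the output of an earlier fit_transform call) in this
# list is what raises "'numpy.ndarray' object has no attribute 'fit'".
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("attribs_adder", Assignment4Transformer()),
    ("scaler", StandardScaler()),
])

# data_num = data.select_dtypes(include="number")
# prepared = num_pipeline.fit_transform(data_num)
```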

Inspired by pandas' .pipe() functionality, I want to implement similar logic for a given initial value and a set of provided functions. Usage should look similar to this: result = function_pipeline( init_value, func1(args1, kwargs1), func2(args2, kwargs2), … ). The result should correspond to func2(func1(init_value, args1, kwargs1), args2, kwargs2). A further requirement is that the functions should still be ..

Read more
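A minimal sketch of one way to build such a helper, assuming each step is packaged as a (func, args, kwargs) tuple, since writing func1(args1, kwargs1) literally would call func1 immediately instead of deferring it:

```python
from functools import reduce

def function_pipeline(init_value, *steps):
    """Thread init_value through a sequence of (func, args, kwargs) steps."""
    def apply_step(value, step):
        func, args, kwargs = step
        return func(value, *args, **kwargs)
    return reduce(apply_step, steps, init_value)

# Equivalent to divmod(pow(2, 10), 7):
result = function_pipeline(
    2,
    (pow, (10,), {}),     # pow(2, 10) -> 1024
    (divmod, (7,), {}),   # divmod(1024, 7) -> (146, 2)
)
print(result)  # (146, 2)
```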

I am trying to evaluate scores for a series of models and have put together a pipeline for processing the data and fitting the models. My code is: numeric_transformer = Pipeline(steps=[ ('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]) categorical_transformer = Pipeline(steps=[ ('encoder', LabelEncoder()), ('onehot', OneHotEncoder(handle_unknown='ignore'))]) preprocessor = ColumnTransformer(transformers=[ ('num', numeric_transformer, selector(dtype_include="number")), ('cat', categorical_transformer, selector(dtype_exclude="number")) ]) scoring = {'acc': ..

Read more
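For reference, a runnable sketch of the preprocessing part (the most_frequent imputer for categoricals is an assumption). Note that LabelEncoder is designed for the target column and only accepts 1-D input, so it cannot be used as a step on feature columns; OneHotEncoder on its own is the usual choice here:

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer, make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# OneHotEncoder handles the category-to-column mapping on its own;
# LabelEncoder would reject the 2-D feature array passed between steps.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, selector(dtype_include="number")),
    ("cat", categorical_transformer, selector(dtype_exclude="number")),
])
```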

I wrote multiple steps to impute a dataset, and I want to create a pipeline for these steps and also serialize/pickle the pipeline so that it can be loaded when analyzing a new sample. The steps I did for imputation are: imputer = MissForest() imputed_data = imputer.fit_transform(data) imputed_data = pd.DataFrame(imputed_data, columns=data.columns) # Drop 'id' imputed_data_initial = imputed_data.drop('id', axis ..

Read more
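A rough sketch of how those imputation steps can be bundled into a pipeline and serialized with joblib, assuming MissForest comes from missingpy and exposes the usual fit/transform interface (the file names and the extra steps are placeholders):

```python
import joblib
from sklearn.pipeline import Pipeline
from missingpy import MissForest  # assumed source of MissForest

# Dropping "id" is easiest to do before the pipeline, so the pipeline
# itself only contains fitted, picklable estimators.
impute_pipeline = Pipeline(steps=[
    ("imputer", MissForest()),
    # further steps (scaling, encoding, ...) would go here
])

# X = data.drop("id", axis=1)
# impute_pipeline.fit(X)
# joblib.dump(impute_pipeline, "impute_pipeline.joblib")

# Later, when analyzing a new sample:
# pipeline = joblib.load("impute_pipeline.joblib")
# new_imputed = pipeline.transform(new_sample)
```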

Hi, I am trying to learn the concept of pipelines. I have read a CSV file from https://www.kaggle.com/zhangjuefei/birds-bones-and-living-habits and want to apply a pipeline for pre-processing and classification. I have been referring to sklearn's official documentation for Pipeline. This is the code I used in Google Colab. import pandas as pd data1 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/bird.csv') from sklearn.compose import ColumnTransformer ..

Read more
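A minimal end-to-end sketch for a dataset like this, assuming the target column is named "type" and the remaining columns are numeric bone measurements (adjust the names and the file path to the actual data):

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer, make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data1 = pd.read_csv("bird.csv")

# Assumed layout: numeric bone measurements plus a categorical target
# column named "type".
X = data1.drop(columns=["type"])
y = data1["type"]

preprocess = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]), selector(dtype_include="number")),
])

clf = Pipeline(steps=[
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```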

I have a pipeline for numerical and categorical data transformation, and later I'm doing a FeatureUnion of the two pipelines to form one complete end-to-end pipeline for data transformation. In one of the transformation steps I'm creating 1 or 2 additional data attributes, mostly extracting the year difference and the months from datetime.date attributes. Since ..

Read more
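A sketch of a custom transformer for that kind of date-derived feature; the column name and reference date are illustrative assumptions, and the transformer can sit inside one of the pipelines feeding the FeatureUnion:

```python
import numpy as np
import pandas as pd
from datetime import date
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class DateFeatureAdder(BaseEstimator, TransformerMixin):
    """Derive year-difference and month features from a date column.

    The column name and reference date are illustrative assumptions.
    """
    def __init__(self, date_col="start_date", reference=date(2021, 1, 1)):
        self.date_col = date_col
        self.reference = reference

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        d = pd.to_datetime(X[self.date_col])
        years = (pd.Timestamp(self.reference) - d).dt.days / 365.25
        months = d.dt.month
        return np.c_[years.to_numpy(), months.to_numpy()]

# full_pipeline = FeatureUnion(transformer_list=[
#     ("num_pipeline", num_pipeline),   # includes DateFeatureAdder()
#     ("cat_pipeline", cat_pipeline),
# ])
```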

I am using ColumnTransformer and SelectKBest in my pipeline, e.g.: numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(missing_values=np.nan, fill_value=0)), ('scaler', StandardScaler())]) preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_cols), ('cat', categorical_transformer, cat_cols)]) sel = SelectKBest(k=60) models_new = { 'xgb': Pipeline(steps=[('preprocessor', preprocessor), ('sel', sel), ('clf', XGBClassifier(objective='multi:softprob', n_jobs=-1))]),  # tree_method='gpu_hist' 'rf': Pipeline(steps=[('preprocessor', preprocessor), ('sel', sel), ('clf', RandomForestClassifier(criterion='entropy', random_state=42, n_jobs=-1))]) } I then run a grid search on the above using hyperclassifiersearch ..

Read more
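A sketch of the grid-search step using plain GridSearchCV in place of the hyperclassifiersearch helper; because SelectKBest and the classifiers are named pipeline steps, their parameters are addressed as sel__k and clf__..., and the grids below are purely illustrative:

```python
from sklearn.model_selection import GridSearchCV

# Parameter names follow the pipeline step names ("sel", "clf").
param_grids = {
    "xgb": {"sel__k": [30, 60], "clf__max_depth": [3, 6]},
    "rf":  {"sel__k": [30, 60], "clf__n_estimators": [100, 300]},
}

# best = {}
# for name, model in models_new.items():
#     search = GridSearchCV(model, param_grids[name], cv=5, n_jobs=-1)
#     search.fit(X_train, y_train)
#     best[name] = search.best_estimator_
```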

I'm using an imblearn resampling transformer inside a preprocessing pipeline; after that, for feature selection, I'm trying to select the best PCA components using sklearn's SelectFromModel, and it produces ValueError: Found input variables with inconsistent numbers of samples: [16510, 10127] when fitting the pipeline. To debug this I tried combinations of using or not ..

Read more
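A sketch of how a resampler is usually combined with PCA and SelectFromModel, with RandomUnderSampler and the surrounding steps as illustrative assumptions. Samplers are only supported as steps of imblearn's own Pipeline, which resamples during fit and skips the sampler at predict time; resampling X outside the pipeline while y keeps its original length is a common cause of this "inconsistent numbers of samples" error:

```python
from imblearn.pipeline import Pipeline  # imblearn's Pipeline, not sklearn's
from imblearn.under_sampling import RandomUnderSampler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("sampler", RandomUnderSampler(random_state=0)),
    ("pca", PCA(n_components=0.95)),
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear"))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# pipe.fit(X_train, y_train)   # resampling happens only here, not at predict time
```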