Category: scikit-learn

I’m using scikit-multilearn for multi-label classification problems. Its compatibility with the scikit-learn framework makes it very easy to use. However, I have noticed that it has not been maintained since 2019, which causes problems with libraries like scipy and numpy. For example, I cannot use certain algorithms adapted to multi-label classification, like twin multi-Label Support Vector ..

Read more

I’m working on a dataset that has a feature called categories. Each observation in that feature holds a semicolon-delimited list, e.g.:

Rows    categories
Row 1   "categorya;categoryb;categoryc"
Row 2   "categorya;categoryb"
Row 3   "categoryc"
Row 4   "categoryb;categoryc"

If I try pd.get_dummies(df, columns=['categories']) I get back columns with the entirety of the data as the ..

Read more
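
One common way to expand a delimited column like this is pandas' `Series.str.get_dummies`, which splits each string on a separator before building the indicator columns. The sketch below rebuilds the four example rows from the excerpt; the DataFrame construction and the `dummies` name are assumptions made for illustration, not the asker's actual data or the accepted answer.

```python
import pandas as pd

# Rebuild the example rows from the question (illustrative data only).
df = pd.DataFrame(
    {
        "categories": [
            "categorya;categoryb;categoryc",
            "categorya;categoryb",
            "categoryc",
            "categoryb;categoryc",
        ]
    },
    index=["Row 1", "Row 2", "Row 3", "Row 4"],
)

# Split each cell on ';' and create one 0/1 column per distinct token.
dummies = df["categories"].str.get_dummies(sep=";")
print(dummies)
```

If the indicator columns are needed alongside other features, `df.join(dummies)` attaches them by index.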

I’m using GridSearchCV to tune the hyperparameters of my model:

grid_search = GridSearchCV(estimator=xg_clf, scoring='f1', param_grid=param_grid, n_jobs=-1, cv=kfold)

However, my supervisor wants me to score with the Matthews correlation coefficient, which unfortunately is not one of the available options:

>>> sorted(sklearn.metrics.SCORERS.keys())
['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'jaccard', ..

Read more
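
A metric that is missing from the named scorers can usually be wrapped with `sklearn.metrics.make_scorer` and passed to `scoring` directly. The sketch below assumes a synthetic binary dataset and swaps in `LogisticRegression` for the asker's `xg_clf` so that it runs without xgboost; the parameter grid is likewise hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic data standing in for the asker's dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Wrap the metric function so GridSearchCV can call it as a scorer.
mcc_scorer = make_scorer(matthews_corrcoef)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0]}  # hypothetical grid

grid_search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # stand-in for xg_clf
    scoring=mcc_scorer,
    param_grid=param_grid,
    cv=kfold,
)
grid_search.fit(X, y)
print(grid_search.best_params_, grid_search.best_score_)
```

Because a higher Matthews coefficient is better, the default `greater_is_better=True` of `make_scorer` is the right setting here.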

I have trained a clustering model using a scikit-learn Pipeline and UMAP. My model architecture is:

transformer = Pipeline([("imputer", SimpleImputer(strategy="constant", fill_value=0)), ("scaler", StandardScaler()), ("proyection", UMAP(n_components=3, random_state=1990))]).fit(X)
model_iso = Pipeline([("scaler", transformer), ("cluster", GaussianMixture(random_state=1990, n_components=3))]).fit(X)

Then I persist my model with:

pickle.dump(model_iso, gzip.open(model_path, "wb"))

Everything works fine locally, but I have deployed ..

Read more
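
The persistence step described above can be sketched as a save-and-reload round trip. This is only an illustration under stated assumptions: `PCA` is swapped in for UMAP so the example needs nothing beyond scikit-learn, the data is random, and the temporary `model_path` is hypothetical. It does not reproduce or diagnose the deployment problem, which the excerpt cuts off.

```python
import gzip
import os
import pickle
import tempfile

import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random data standing in for X (assumption).
rng = np.random.default_rng(1990)
X = rng.normal(size=(150, 6))

# Same shape of pipeline as in the question, with PCA replacing UMAP.
model = Pipeline([
    ("imputer", SimpleImputer(strategy="constant", fill_value=0)),
    ("scaler", StandardScaler()),
    ("projection", PCA(n_components=3, random_state=1990)),
    ("cluster", GaussianMixture(n_components=3, random_state=1990)),
]).fit(X)

# Hypothetical path; context managers make sure the gzip file is closed.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl.gz")
with gzip.open(model_path, "wb") as fh:
    pickle.dump(model, fh)

with gzip.open(model_path, "rb") as fh:
    restored = pickle.load(fh)

# The reloaded pipeline should assign the same cluster labels.
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```

Pickled estimators are generally only safe to load in an environment with the same library versions as the one that wrote them, so pinning those versions on the deployment side is a reasonable first thing to check.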
