Pandas dataframe: Synthetic data generation

  dataframe, pandas, python, python-3.x, scikit-learn

I have a DataFrame df for a 3-class classification problem. Most of its columns are categorical, and the classes are imbalanced. I am trying to generate a synthetic dataset that replicates the characteristics and features of the original DataFrame.

Q1. Can make_classification from scikit-learn (sklearn.datasets) be used to generate synthetic data that balances the imbalanced df?

Q2. Is make_classification meant only for random data generation, rather than for reproducing data similar to an existing df?
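
To make the two questions concrete, here is a minimal sketch, not a definitive answer. It assumes a tiny made-up stand-in for df (the column names color, size and target are hypothetical). The first part shows that make_classification is driven only by parameters such as n_samples and weights and returns continuous features, so it never looks at an existing DataFrame. The second part shows plain random oversampling with pandas as one simple way to balance classes while keeping the original categorical values; it duplicates existing rows rather than synthesizing new ones.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Part 1: make_classification only accepts shape/distribution parameters.
# There is no argument for passing in an existing DataFrame.
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    weights=[0.7, 0.2, 0.1],  # approximate class proportions
    random_state=0,
)
random_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])
random_df["target"] = y  # features are continuous floats, not categories

# Part 2: hypothetical stand-in for the imbalanced, mostly categorical df.
df = pd.DataFrame({
    "color": ["red"] * 6 + ["blue"] * 3 + ["green"] * 1,
    "size": ["S", "M", "L", "S", "M", "L", "S", "M", "L", "S"],
    "target": [0] * 6 + [1] * 3 + [2] * 1,
})

# Random oversampling: resample every class (with replacement) up to the
# size of the largest class. Rows are duplicated, not newly synthesized.
max_count = df["target"].value_counts().max()
balanced_df = pd.concat(
    [
        group.sample(n=max_count, replace=True, random_state=0)
        for _, group in df.groupby("target")
    ],
    ignore_index=True,
)

print(random_df.dtypes)
print(balanced_df["target"].value_counts())
```

If genuinely new synthetic rows are needed for mixed categorical data, a dedicated oversampler such as SMOTENC from the separate imbalanced-learn package may be worth a look; that is a suggestion to verify against its documentation, not something tested here.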

Source: Python Questions
