Balance classes in multilabel classification problem

I’m working on a multi-label image classification problem in Keras, and I am using data augmentation to increase the number of images. The labels are imbalanced and encoded as multi-hot vectors ([0 1 0 0 1], [0 0 0 1 1], etc.).

So I thought the classes could perhaps be balanced with class_weight or sample_weight.
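
To make the weighting idea concrete, here is a minimal NumPy sketch of one possible scheme: each label gets an inverse-frequency weight, and each sample gets the mean weight of the labels it carries. The helper name `multilabel_sample_weights`, the averaging rule, and the toy label matrix are assumptions made for illustration; this is not what `compute_sample_weight` does internally.

```python
import numpy as np

def multilabel_sample_weights(labels):
    """Hypothetical helper: one weight per sample from a multi-hot label matrix."""
    labels = np.asarray(labels, dtype=float)
    n_samples, n_labels = labels.shape
    # Positives per label; clip to 1 so an unused label does not divide by zero.
    pos_counts = np.clip(labels.sum(axis=0), 1, None)
    # Inverse-frequency weight per label: rarer labels get larger weights.
    label_weights = n_samples / (n_labels * pos_counts)
    # A sample's weight is the mean weight of its active labels (1.0 if it has none).
    n_active = labels.sum(axis=1)
    weighted = labels @ label_weights
    sample_weights = np.where(n_active > 0, weighted / np.maximum(n_active, 1), 1.0)
    return label_weights, sample_weights

# Toy stand-in for train_labels: the last label is present in every sample.
toy_labels = np.array([
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
])
label_w, sample_w = multilabel_sample_weights(toy_labels)
```

The result is a 1-D array with one entry per training sample, which is the shape a per-sample weight argument generally expects. The sample that carries only the common label ends up with the smallest weight.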

This is the first time I have tried to balance classes in a multi-label problem. I have tried a few things, but when I fit the model I get an error.

  • I tried to compute the weights of the training samples and use that in the sample_weight parameter of the flow method, but it doesn’t work:
from sklearn.utils.class_weight import compute_sample_weight
# Call the imported function directly; the name class_weight is never imported
sample_weights = compute_sample_weight(class_weight = 'balanced', y = train_labels)

trainAug = ImageDataGenerator(rescale = 1.0/255, vertical_flip = True, zoom_range= 0.2, shear_range= 0.2, width_shift_range= 0.2, height_shift_range= 0.2, rotation_range= 20)
trainGen = trainAug.flow(x = train_image_data, y = train_labels, batch_size = 32, sample_weight= sample_weights)

model.fit(trainGen, epochs = 150, verbose = 1, validation_data = valGen, steps_per_epoch = (train_image_data_len / batch_dimension), validation_steps =(valid_image_data_len / batch_dimension))

W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
  • I also tried to use the sample_weight parameter directly in the fit method of Keras, but that doesn’t work either:
from sklearn.utils.class_weight import compute_sample_weight
# Call the imported function directly; the name class_weight is never imported
sample_weights = compute_sample_weight(class_weight = 'balanced', y = train_labels)
sample_weights_dict = dict(enumerate(sample_weights))

model.fit(trainGen, epochs = 150, verbose = 1, validation_data = valGen, steps_per_epoch = (train_image_data_len / batch_dimension), sample_weight = sample_weights_dict, validation_steps =(valid_image_data_len / batch_dimension))

ValueError: `sample_weight` argument is not supported when using `keras.utils.Sequence` as input
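
For context on that error: Keras `fit` accepts generator input that yields `(inputs, targets, sample_weights)` tuples, so the weights can travel with each batch instead of going through the `sample_weight` argument. The sketch below shows only the batching logic in plain NumPy. The `weighted_batches` helper and the toy arrays are hypothetical stand-ins, and it does no augmentation, so treat it as an illustration of the batch shape, not a replacement for `ImageDataGenerator`.

```python
import numpy as np

def weighted_batches(images, labels, weights, batch_size, seed=0):
    """Hypothetical generator: yields (x, y, w) batches forever, reshuffling each pass."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # The same index array selects images, labels and weights, so they stay aligned.
            yield images[idx], labels[idx], weights[idx]

# Toy stand-ins for train_image_data, train_labels and sample_weights.
images = np.zeros((10, 8, 8, 3), dtype=np.float32)
labels = np.random.default_rng(1).integers(0, 2, size=(10, 5))
weights = np.linspace(0.1, 1.0, 10)

batch_size = 4
batches = weighted_batches(images, labels, weights, batch_size)
x, y, w = next(batches)

# Integer number of steps per pass over the data (ceil keeps the last partial batch).
steps_per_epoch = int(np.ceil(len(images) / batch_size))
```

Each yielded batch has one weight per sample. Because the generator never stops, an integer `steps_per_epoch` is what tells training where one epoch ends.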

Do you have any idea how I could solve this? Thanks.

Source: Python Questions
