Dataset size is smaller than memory, so what's wrong with my code?

I train for 300 epochs; each .npz file is 2.73 MB, the batch size of my DataLoader is 64, and I use 8 GPUs in total, so one mini-batch should be about 64 × 8 × 2.73 MB ≈ 1.4 GB, while my machine has 128 GB of RAM. Even if the arrays become larger after decompression, they should not come close to 128 GB, yet the system usage figure linked below shows that all 128 GB is occupied. How should I change my code? Part of the code is shown after the quick estimate below.
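For reference, this is the back-of-the-envelope estimate I am making. It only uses the on-disk size of each .npz file, so it ignores decompression and any per-process or per-worker copies of the data:

# Rough memory estimate for one mini-batch, based on the on-disk .npz size only.
file_size_mb = 2.73   # size of one .npz file on disk, in MB
batch_size = 64       # per-GPU batch size
num_gpus = 8

batch_mb = file_size_mb * batch_size * num_gpus
print('one mini-batch ~ {:.2f} MB ({:.2f} GB)'.format(batch_mb, batch_mb / 1024))
# prints: one mini-batch ~ 1397.76 MB (1.37 GB), far below 128 GB of RAM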

import numpy as np
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.distributed import DistributedSampler


class VimeoDataset(Dataset):
    def __init__(self, dataset_name, batch_size=64):
        self.batch_size = batch_size
        self.path = '/data/dachunkai/train_sample/dataset/'
        self.dataset_name = dataset_name
        self.load_data()
        self.h = 256
        self.w = 448
        xx = np.arange(0, self.w).reshape(1, -1).repeat(self.h, 0)  # xx shape is (256, 448)
        yy = np.arange(0, self.h).reshape(-1, 1).repeat(self.w, 1)  # yy shape is (256, 448)
        self.grid = np.stack((xx, yy), 2).copy()

    def __len__(self):
        return len(self.meta_data)

    def load_data(self):
        # Load every .npz file into memory up front and keep the arrays in lists.
        self.train_data = []
        self.flow_data = []
        self.val_data = []
        for i in range(10000):
            f = np.load('/data/train_sample/dataset/{}.npz'.format(i))
            if i < 8000:
                self.train_data.append(f['i0i1gt'])
                self.flow_data.append(f['ft0ft1'])
            else:
                self.val_data.append(f['i0i1gt'])
        if self.dataset_name == 'train':
            self.meta_data = self.train_data
        else:
            self.meta_data = self.val_data
        self.nr_sample = len(self.meta_data)


dataset = VimeoDataset('train')
sampler = DistributedSampler(dataset)
train_data = DataLoader(dataset, batch_size=args.batch_size, num_workers=8,
                        pin_memory=True, drop_last=True, sampler=sampler)
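To see how much one sample actually occupies after decompression, I can inspect the arrays directly. This is a minimal check, assuming the same path and array keys as in load_data above:

import numpy as np

# Decompressed in-memory size of a single sample.
f = np.load('/data/train_sample/dataset/0.npz')
for key in ('i0i1gt', 'ft0ft1'):
    arr = f[key]
    print(key, arr.shape, arr.dtype, '{:.2f} MB in memory'.format(arr.nbytes / 1024 ** 2))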

[system usage figure]

Source: Python Questions
