I am working with multiple CSV files, each containing multiple 1D data series. I have about 9000 such files and the total combined data is about 40 GB. I have written a dataloader like this:

    class data_gen(torch.utils.data.Dataset):
        def __init__(self, files):
            self.files = files
            my_data = np.genfromtxt('/data/' + files, delimiter=',')
            self.dim = my_data.shape
            self.data =

        def __getitem__(self, i):
            file1 ..
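Reading files inside __init__ will not scale to 40 GB; the usual fix is to defer the np.genfromtxt call to __getitem__ so only the requested file is ever in memory. A minimal sketch, assuming each CSV is one sample and the /data/ prefix from the question:

    import numpy as np
    import torch

    class LazyCSVDataset(torch.utils.data.Dataset):
        def __init__(self, files):
            self.files = files  # keep only the file names

        def __len__(self):
            return len(self.files)

        def __getitem__(self, i):
            # Read one CSV on demand, so the full 40 GB never
            # sits in memory at once.
            data = np.genfromtxt('/data/' + self.files[i], delimiter=',')
            return torch.from_numpy(data).float()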
I have multiple CSV files which contain 1D data, and I want to use each row as a sample. Each file contains a different number of rows. So I have written a dataloader like this:

    class data_gen(torch.utils.data.Dataset):
        def __init__(self, files):
            self.files = files
            print("FILES: ", type(self.files))

        def __getitem__(self, i):
            print("GETite,")
            file1 = self.files[i]
            print("FILE1: ", file1)
            my_data = np.genfromtxt('/data/' + file1, ..
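Because the files hold different numbers of rows, a global sample index has to be mapped to a (file, row) pair. One way, sketched under the assumption that the /data/ layout from the question holds, is to precompute cumulative row counts once:

    import bisect
    import numpy as np
    import torch

    class RowDataset(torch.utils.data.Dataset):
        def __init__(self, files):
            self.files = files
            # Count rows per file once, up front.
            counts = [sum(1 for _ in open('/data/' + f)) for f in files]
            self.cum = list(np.cumsum(counts))

        def __len__(self):
            return int(self.cum[-1])

        def __getitem__(self, i):
            f = bisect.bisect_right(self.cum, i)          # which file
            row = i - (self.cum[f - 1] if f > 0 else 0)   # row inside it
            data = np.genfromtxt('/data/' + self.files[f], delimiter=',',
                                 skip_header=row, max_rows=1)
            return torch.from_numpy(data).float()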
When trying to load some images for data science using:

    DataLoader.from_folder(image_path)

I get the following error:

    File "<stdin>", line 1, in <module>
    File "/Users/myName/Desktop/mp_env/lib/python3.9/site-packages/tensorflow_examples/lite/model_maker/core/data_util/image_dataloader.py", line 73, in from_folder
        raise ValueError('Image size is zero')
    ValueError: Image size is zero

In the folder under image_path there are 3 pictures (jpg). Why is it saying the size of the image ..
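Here "size" appears to refer to the number of images the loader found, not pixel dimensions: Model Maker's from_folder expects one subdirectory per class label and ignores images placed directly under image_path, so it finds zero. A sketch of the assumed layout, using the tflite_model_maker package that wraps the module in the traceback:

    # Assumed layout:
    # image_path/
    #   cats/  cat1.jpg  cat2.jpg
    #   dogs/  dog1.jpg
    from tflite_model_maker.image_classifier import DataLoader

    data = DataLoader.from_folder(image_path)  # labels = subfolder names
    print(data.size)  # should now report 3 instead of raising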
I have a dataset that gets loaded with the dimensions [batch_size, seq_len, n_features] (e.g. torch.Size([16, 600, 130])). I want to be able to shuffle this data along the sequence-length axis (axis=1) without altering the batch ordering or the feature-vector ordering in PyTorch. Further explanation: for exemplification, let's say my batch size ..
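A minimal sketch using torch.randperm; the first variant applies one permutation to every sequence in the batch, the second draws an independent permutation per batch element:

    import torch

    x = torch.randn(16, 600, 130)  # [batch_size, seq_len, n_features]

    # Same shuffle for every batch element:
    perm = torch.randperm(x.size(1))
    x_same = x[:, perm, :]

    # Independent shuffle per batch element:
    perm_b = torch.rand(x.size(0), x.size(1)).argsort(dim=1)  # [16, 600]
    x_indep = torch.gather(x, 1,
                           perm_b.unsqueeze(-1).expand(-1, -1, x.size(2)))

Both variants leave dim 0 (batch) and dim 2 (features) untouched.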
How do I put x_train and y_train into a model for training? x_train is a tensor of size (3000, 13) and y_train is of size (3000, 1); that is, for each element of x_train (1, 13), the respective y label is one digit from y_train. If I do:

    train_data = torch.hstack((train_feat, train_labels))
    print(train_data.shape) ..
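Rather than hstack-ing features and labels into one tensor, TensorDataset keeps them paired and a DataLoader batches them. A minimal sketch, assuming the shapes from the question (the tensors and model here are stand-ins):

    import torch
    from torch.utils.data import TensorDataset, DataLoader

    x_train = torch.randn(3000, 13)                    # placeholder features
    y_train = torch.randint(0, 10, (3000, 1)).float()  # placeholder labels

    dataset = TensorDataset(x_train, y_train)  # pairs x_train[i] with y_train[i]
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    model = torch.nn.Linear(13, 1)  # stand-in model
    for xb, yb in loader:
        pred = model(xb)  # xb: [32, 13], yb: [32, 1]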
I am trying to loop over a set of graphs, as shown in this snippet from the main script:

    import pickle

    for timeOfDay in range(1440):  # total number of minutes in a day = 1440
        with open("G_" + timeOfDay + '.pickle', 'rb') as handle:
            G = pickle.load(handle)
        # do something with G

I could load all the ..
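Note that "G_" + timeOfDay concatenates a str with an int and raises TypeError: can only concatenate str (not "int") to str. A minimal fix with an f-string, keeping the one-graph-at-a-time loading:

    import pickle

    for timeOfDay in range(1440):  # minutes in a day
        with open(f"G_{timeOfDay}.pickle", "rb") as handle:
            G = pickle.load(handle)  # only one graph in memory at a time
        # do something with G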
I'm quite new to programming and have no clue where my error comes from. I have the following code to set up my dataset for training my classifier:

    class cows_train(Dataset):
        def __init__(self, folder_path):
            self.image_list = glob.glob(folder_path + '/content/cows/train')
            self.data_len = len(self.image_list)

        def __getitem__(self, index):
            single_image_path = self.image_list[index]
            im_as_im = Image.open(single_image_path)
            im_as_np = np.asarray(im_as_im) / 255
            im_as_np = np.expand_dims(im_as_np, 0) ..
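One likely culprit: glob.glob(folder_path + '/content/cows/train') contains no wildcard, so it matches the directory itself (at most one entry) rather than the images inside it. A sketch of the probable intent; the *.jpg extension is an assumption:

    import glob
    import numpy as np
    from PIL import Image
    from torch.utils.data import Dataset

    class cows_train(Dataset):
        def __init__(self, folder_path):
            # The '*' makes glob return the image files, not the folder.
            self.image_list = glob.glob(folder_path + '/content/cows/train/*.jpg')
            self.data_len = len(self.image_list)

        def __len__(self):
            return self.data_len

        def __getitem__(self, index):
            im = Image.open(self.image_list[index])
            im_as_np = np.asarray(im) / 255      # scale to [0, 1]
            return np.expand_dims(im_as_np, 0)   # add a leading axis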
I have a dataframe with some variables (35 columns) and millions of timesteps (rows). I would like to cut the data into windows with the TimeseriesGenerator (https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/TimeseriesGenerator). I have done it before for some neural nets. I would like to stick to the generator because I need to skip some samples, but how many isn't clear at ..
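A minimal sketch of TimeseriesGenerator on such a dataframe; the target column, window length, and stride are placeholders:

    import numpy as np
    import pandas as pd
    from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

    df = pd.DataFrame(np.random.rand(10000, 35))  # stand-in for the real data
    data = df.values
    targets = data[:, 0]                          # assumed target column

    gen = TimeseriesGenerator(
        data, targets,
        length=60,       # 60-timestep windows
        stride=5,        # skip samples: start a new window every 5 steps
        batch_size=128)

    x, y = gen[0]        # x: (128, 60, 35), y: (128,)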
The following is part of the code. epoch = 300 and each npz file is 2.73 MB. The batch size of my dataloader is 64 and there are 8 GPUs in total, so a mini-batch should be 64 × 8 × 2.73 MB ≈ 1.4 GB, while my machine's actual memory is 128 GB. Even if the data becomes larger after decompression, it should not reach 128 GB. The ..
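If memory still climbs toward 128 GB, the usual suspects (hedged, since the dataset code is not shown) are arrays cached in __init__ or accumulated across iterations: every DataLoader worker, times every GPU process, then holds its own copy. A sketch that keeps only paths and decompresses one npz on demand:

    import numpy as np
    import torch

    class NPZDataset(torch.utils.data.Dataset):
        def __init__(self, paths):
            self.paths = paths  # only paths live in memory, no arrays

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, i):
            # Decompress a single file on demand; nothing is cached,
            # so memory stays flat across epochs and workers.
            with np.load(self.paths[i]) as f:
                x = f['data']  # the 'data' key is an assumption
            return torch.from_numpy(x)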
The code below is part of my dataset code, with the __getitem__(self, index) part omitted. But when I train the model with this dataset, CPU memory cannot hold it because the dataset is too large. So I am wondering how to modify my dataset code.

    class VimeoDataset(Dataset):
        def __init__(self, dataset_name, batch_size=32): ..
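The usual remedy is to keep only sample paths in __init__ and read each sample inside __getitem__; batching is then the DataLoader's job, so the batch_size argument can be dropped. A sketch under the assumption of a Vimeo-90k-style layout with a list file and per-sample folders of PNG frames (all names are placeholders):

    import os
    import cv2
    from torch.utils.data import Dataset

    class VimeoDataset(Dataset):
        def __init__(self, dataset_name, root='vimeo_90k'):
            self.root = root
            list_file = os.path.join(root, dataset_name + '_list.txt')
            with open(list_file) as f:
                # Only the sample paths are held in memory.
                self.samples = [line.strip() for line in f if line.strip()]

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, index):
            folder = os.path.join(self.root, 'sequences', self.samples[index])
            # Frames are decoded on demand, one sample at a time.
            return [cv2.imread(os.path.join(folder, f'im{i}.png'))
                    for i in (1, 2, 3)]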