Get Data from MNIST-Data-Files

  database, mnist, python

First off: no, I’m not using tensorflow and won’t be.

I’ve found several downloads, but the one I’m particularly interested in is the official one.

I am trying to build a neural network from scratch and plan to use this dataset as a test. I need each sample as a tuple containing a list of the pixel grayscale values and the correct label: ([pixel0, pixel1, ..., pixel783], correct_value). Each image is 28x28, so there are 784 pixels per image.
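A minimal sketch of that target structure, assuming the pixel data has already been read into one flat bytes object (header stripped) and the labels into a list of ints. The name build_samples and its parameters are made up for illustration:

```python
def build_samples(pixel_bytes, labels, pixels_per_image=28 * 28):
    """Pair each image's pixel list with its label.

    pixel_bytes: flat bytes object holding all images back to back,
                 one byte (0-255) per pixel, header already removed.
    labels:      sequence of ints, one per image.
    Returns a list of ([pixel0, ..., pixelN], label) tuples.
    """
    if len(pixel_bytes) != len(labels) * pixels_per_image:
        raise ValueError("pixel data and labels do not line up")

    samples = []
    for n, label in enumerate(labels):
        start = n * pixels_per_image
        # Slicing bytes and calling list() yields plain ints in Python 3.
        pixels = list(pixel_bytes[start:start + pixels_per_image])
        samples.append((pixels, label))
    return samples


# Tiny made-up example: two "images" of 4 pixels each.
demo = build_samples(bytes([0, 1, 2, 3, 4, 5, 6, 7]), [7, 2], pixels_per_image=4)
print(demo)  # [([0, 1, 2, 3], 7), ([4, 5, 6, 7], 2)]
```

Keeping the pairing separate from the file reading makes it easy to check on small hand-made inputs before running it on the full dataset.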

According to the website, you ignore the first 8 bytes of the label file and then read the labels as 1-byte (8-bit) unsigned integers. However, I haven’t managed to implement this in Python. Or rather: I tried, but I ended up with nonsensical values, e.g. a single handwritten digit simply can’t correspond to values of 100 or more.
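The rule described above (skip 8 header bytes, then one byte per label) could be sketched roughly as follows. This assumes that a file name ending in .gz means the file is still gzip-compressed and has to be decompressed before the 8-byte rule applies; read_labels is a hypothetical helper name, and the magic number 2049 is the value the IDX format uses for label files:

```python
import gzip
import struct


def read_labels(path):
    """Return the labels of an IDX1 label file as a list of ints."""
    # Assumption: a ".gz" suffix means the file is still compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        # Header: two big-endian 32-bit ints (magic number, item count).
        magic, count = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError("not an IDX1 label file (magic=%d)" % magic)
        # Each remaining byte is one label; list(bytes) gives ints.
        labels = list(f.read(count))
    if len(labels) != count:
        raise ValueError("file is shorter than its header claims")
    return labels
```

With the official download in the working directory, read_labels("train-labels-idx1-ubyte.gz") should then return one int in the range 0-9 per training image.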

This code is apparently flawed in some way:

with open("train-labels-idx1-ubyte.gz", "rb") as f:
    data = f.read()
    for i in range(8, 12):
        print(data[i])

gives the output:

0 3 116 114
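One plausible reading of these numbers: the file is opened with plain open(), so the bytes printed belong to the gzip container, not to the label data. In a gzip header, bytes 8 and 9 are the extra-flags and operating-system fields (3 means Unix), and a stored file name can follow directly after; 116 and 114 are the ASCII codes of "t" and "r", which matches the start of "train-labels-idx1-ubyte". The sketch below uses a made-up three-label payload to show the difference between raw and decompressed bytes; looks_gzipped is a hypothetical helper:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(raw):
    """Return True if the bytes start with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


# Made-up IDX1 payload: magic 2049, count 3, then labels 5, 0, 4.
payload = b"\x00\x00\x08\x01" + b"\x00\x00\x00\x03" + bytes([5, 0, 4])
compressed = gzip.compress(payload)

print(looks_gzipped(compressed))                 # True
print(looks_gzipped(payload))                    # False
print(list(gzip.decompress(compressed)[8:11]))   # [5, 0, 4]
print(bytes([116, 114]).decode("ascii"))         # tr
```

If the check returns True for the downloaded file, decompressing it first (for example with gzip.open) is needed before skipping the 8 header bytes.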

Any help is appreciated.

Source: Python Questions
