#### Trying to code the nearest neighbours algorithm – euclidean distance function only calculates the distances for one row of the test set – why?

I am trying to code the Nearest Neighbours algorithm from scratch and have come across a problem: my algorithm only gives the index/classification of the nearest neighbour for one row/point of the test set. I went through every part of my code and realised that the problem lies in how my Euclidean distances are calculated. I only get the result for one row.

This is the code I have written for the Euclidean distance:

``````
def euclidean_dist(r1, r2):
    dist = 0
    for j in range(0, len(r2)-1):
        dist = dist + (r2[j] - r1[j])**2
    return dist**0.5
``````
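As a side check, here is a minimal sketch comparing the loop above with a NumPy version, assuming each row holds only feature values (no label in the last column). Under that assumption, `range(0, len(r2)-1)` stops one element early, so the last feature never contributes to the distance. The names `euclidean_dist_np` and `euclidean_dist_loop` and the toy vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

def euclidean_dist_np(r1, r2):
    """Euclidean distance over every element of two equal-length rows."""
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    return float(np.sqrt(np.sum((r2 - r1) ** 2)))

def euclidean_dist_loop(r1, r2):
    # Same loop as in the question, repeated here to show the difference
    dist = 0
    for j in range(0, len(r2) - 1):
        dist = dist + (r2[j] - r1[j]) ** 2
    return dist ** 0.5

a = [0.0, 0.0, 0.0]
b = [3.0, 4.0, 12.0]

print(euclidean_dist_np(a, b))    # 13.0, uses all three features
print(euclidean_dist_loop(a, b))  # 5.0, ignores the last feature
```

If the last column really is a class label, the `-1` is intentional and the loop version is the one to keep.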

Within my Nearest Neighbours algorithm, this is how the Euclidean distance function is used:

``````
for i in range(len(x_test)):
    dist1 = []
    dist2 = []
    for j in range(len(x_train)):
        distances = euclidean_dist(x_test[i], x_train[j,:])
        dist1.append(distances)
        dist2.append(distances)
    dist1 = np.array(dist1)
    sorting(dist1)  # separate sorting function to sort the distances from lowest to highest,
    # the aim was to get one array, dist1, with the euclidean distances for each row sorted
    # and one array with the unsorted euclidean distances, dist2, (to be able to search for index later in the code)
``````

I noticed the problem when trying out this part of the function on the iris dataset. I split the data set into test and training sets (X_test, X_train and y_test).

When I ran this on the data set, I got the following array for dist2:

``````
[0.3741657386773946,
1.643167672515499,
3.389690251335658,
2.085665361461421,
1.284523257866513,
3.9572717874818752,
0.9539392014169458,
3.5805027579936315,
0.7211102550927979,
...
0.8062257748298555,
0.4242640687119287,
0.5196152422706631]
``````

Its length is 112, which is the same length as X_train, but these are only the Euclidean distances for a single row or point of the X_test set. The dist1 array is the same except that it is sorted.

Why am I not getting the Euclidean distances for every row/point of the test set? I thought I iterated through correctly with the for loops, but clearly something is not quite right. Any advice or help would be appreciated.
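One likely cause, sketched under the assumption that `dist1 = []` and `dist2 = []` sit inside the outer `for i` loop: both lists are emptied at the start of every iteration, so once the loop finishes they hold the distances for only one test row (the last one processed). A minimal sketch that keeps one row of distances per test point instead is shown here. The function name `all_distances` and the toy arrays are hypothetical, and `np.sort` / `np.argsort` stand in for the separate `sorting` function, whose code is not shown.

```python
import numpy as np

def euclidean_dist(r1, r2):
    # Distance over every feature (assumes rows contain no label column)
    dist = 0.0
    for j in range(len(r2)):
        dist += (r2[j] - r1[j]) ** 2
    return dist ** 0.5

def all_distances(x_test, x_train):
    """Return an array of shape (len(x_test), len(x_train))."""
    rows = []  # created once, outside both loops
    for i in range(len(x_test)):
        row = []  # distances for test point i only
        for j in range(len(x_train)):
            row.append(euclidean_dist(x_test[i], x_train[j, :]))
        rows.append(row)  # store this row before moving to the next point
    return np.array(rows)

# Toy data standing in for the iris split
x_train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
x_test = np.array([[0.0, 0.0], [6.0, 8.0]])

dist2 = all_distances(x_test, x_train)      # unsorted, one row per test point
dist1 = np.sort(dist2, axis=1)              # each row sorted from lowest to highest
nearest = np.argsort(dist2, axis=1)[:, 0]   # index of the nearest training row

print(dist2.shape)  # (2, 3)
print(nearest)      # [0 2]
```

Keeping the unsorted array and using `np.argsort` on it gives the training-row indices directly, so there is no need to search the sorted values for their original positions afterwards.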

Source: Python Questions