How do I map a numpy array and an indices array to a pandas dataframe?

  dataframe, numpy, pandas, python, scikit-learn

I have been following this tutorial on how to find nearest neighbors of a point with scikit.

However, when it comes to displaying the data, the tutorial merely mentions that "the indices can be mapped to useful values and the two arrays merged with the rest of the data"

But there’s no actual explanation on how to do this. I’m not very well-versed in Pandas and I don’t know how to perform this merge, so I just end up with 2 multidimensional arrays and I don’t know how to map them to the original data to study the example and experiment with it.

This is the code

import numpy as np
from sklearn.neighbors import BallTree, KDTree
import pandas as pd

# Column names for the example DataFrame.
column_names = ["STATION NAME", "LAT", "LON"]

# A list of locations that will be used to construct the binary
# tree.
locations_a = [['BEAUFORT', 32.4, -80.633],
       ['CONWAY HORRY COUNTY AIRPORT', 33.828, -79.122],
       ['HUSTON/EXECUTIVE', 29.8, -95.9],
       ['ELIZABETHTON MUNI', 36.371, -82.173],
       ['JACK BARSTOW AIRPORT', 43.663, -84.261],
       ['MARLBORO CO JETPORT H E AVENT', 34.622, -79.734],
       ['SUMMERVILLE AIRPORT', 33.063, -80.279]]

# A list of locations that will be used to construct the queries.
# for neighbors.
locations_b = [['BOOMVANG HELIPORT / OIL PLATFORM', 27.35, -94.633],
       ['LEE COUNTY AIRPORT', 36.654, -83.218],
       ['ELLINGTON', 35.507, -86.804],
       ['LAWRENCEVILLE BRUNSWICK MUNI', 36.773, -77.794],
       ['PUTNAM CO', 39.63, -86.814]]

# Converting the lists to DataFrames. We will build the tree with
# the first and execute the query on the second.

locations_a = pd.DataFrame(locations_a, columns = column_names)
locations_b = pd.DataFrame(locations_b, columns = column_names)

# Creates new columns converting coordinate degrees to radians.
for column in locations_a[["LAT", "LON"]]:
    rad = np.deg2rad(locations_a[column].values)
    locations_a[f'{column}_rad'] = rad
for column in locations_b[["LAT", "LON"]]:
    rad = np.deg2rad(locations_b[column].values)
    locations_b[f'{column}_rad'] = rad

# Takes the first group's latitude and longitude values to construct
# the ball tree.
ball = BallTree(locations_a[["LAT_rad", "LON_rad"]].values, metric='haversine')

# The amount of neighbors to return.
k = 1

# Executes a query with the second group. This will also return two
# arrays.
distances, indices = ball.query(locations_b[["LAT_rad", "LON_rad"]].values, k = k)
#converting to kilometers
distances = distances * 6.371

So how do I take distances and indices and map them to my dataframe to visually see the nearest neighbor of each point?

Source: Python Questions

LEAVE A COMMENT