find all sets of matches in two numpy arrays

  numpy, python

I have two NumPy arrays as follows:

import numpy as np

rows = np.array([ 3,  5,  6,  8,  8,  9,  9,  9, 10, 10, 10, 11, 11, 12, 13, 14])
cols = np.array([11,  7, 11,  4,  7,  2,  4,  7,  2,  4,  7,  4,  7,  7, 11, 11])

I want to find all sets of matches, e.g.:

3 6 13 14 from rows match 11 in cols

5 8 9 10 11 12 from rows match 2 4 7 in cols

Is there a direct NumPy way to do this? There are no blank values, and rows and cols will always be the same size.

What I have tried (it uses loops and is not the most efficient):

# indices that sort cols, so that equal values end up next to each other
idx_sort = np.argsort(cols)

# cols reordered so that all occurrences of each unique value are together
sorted_records_array = cols[idx_sort]

# unique values and the position of each value's first occurrence in the sorted array
vals, idx_start = np.unique(sorted_records_array, return_index=True)

# split the sorting indices into one array of indices per unique value in cols
res = np.split(idx_sort, idx_start[1:])

# loop over every pair of groups; if their rows intersect, merge the second group into the first
for cntr, itm in enumerate(res):
    for cntr2, itm2 in enumerate(res):
        if cntr != cntr2:
            intersectItems = np.intersect1d(rows[itm], rows[itm2])
            if intersectItems.size > 0:
                # print('intersectItems', intersectItems)
                res[cntr] = np.unique(np.concatenate((res[cntr], res[cntr2]), axis=0))

I would still need to find and remove duplicates, since my output here is [ 3 6 13 14],[11 11 11 11] …
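One way to read the grouping I am after: treat each (row, col) pair as an edge, and put two pairs in the same set whenever they share a row value or a col value, directly or through a chain of other pairs. Below is a minimal sketch of that idea using only NumPy. It is not a built-in NumPy routine; the function name `group_matches` and the label-propagation loop are my own assumptions for illustration, and it assumes non-empty 1-D integer arrays of equal length.

```python
import numpy as np

rows = np.array([3, 5, 6, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12, 13, 14])
cols = np.array([11, 7, 11, 4, 7, 2, 4, 7, 2, 4, 7, 4, 7, 7, 11, 11])


def group_matches(rows, cols):
    """Hypothetical helper: group (row, col) pairs that are linked
    through shared row or col values (assumes non-empty inputs)."""
    n = rows.size
    # Map each value to a compact group id 0..k-1.
    _, r_inv = np.unique(rows, return_inverse=True)
    _, c_inv = np.unique(cols, return_inverse=True)

    # Start with one label per pair, then repeatedly push the smallest
    # label through every row group and every col group until stable.
    labels = np.arange(n)
    while True:
        r_min = np.full(r_inv.max() + 1, n)
        np.minimum.at(r_min, r_inv, labels)   # smallest label per row value
        new = r_min[r_inv]
        c_min = np.full(c_inv.max() + 1, n)
        np.minimum.at(c_min, c_inv, new)      # smallest label per col value
        new = c_min[c_inv]
        if np.array_equal(new, labels):
            break
        labels = new

    # One (unique rows, unique cols) tuple per final label.
    return [
        (np.unique(rows[labels == lab]), np.unique(cols[labels == lab]))
        for lab in np.unique(labels)
    ]


groups = group_matches(rows, cols)
for r, c in groups:
    print(r, "from rows match", c, "in cols")
```

The outer `while` loop is still a Python loop, but each pass is vectorized and the number of passes depends on how long the chains of shared values are, not on the number of pairs. For the arrays above this yields two sets, matching the example at the top of the question.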

Source: Python Questions
