Python: Memory-efficient, quick lookup in Python for 100 million pairs of data?

  dataframe, dictionary, numpy, pandas, python

This is my first time asking a question on here, so apologies if I am doing something wrong.

I am looking to create some sort of dataframe/dict/list that lets me check whether the ID in one column has already been seen together with a specific value in another column.

For example, given one pandas dataframe like this (90 million rows):

ID  Another_ID
1   10
1   20
2   50
3   10
3   20
4   30

And another like this (10 million rows):

ID  Another_ID
1   30
2   30
2   50
2   20
4   30
5   70

I want to end up with a third column on the second dataframe that flags whether each (ID, Another_ID) pair already appears in the first one, like this:

ID  Another_ID seen_before
1   30         0
2   30         0
2   50         1
2   20         0
4   30         1
5   70         0
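
To make the expected result concrete, here is a minimal sketch that produces the flag on the toy data above. It assumes a plain pandas left merge with `indicator=True` is acceptable; the frame names `history` and `new` are made up for illustration, and this is not claimed to be the most memory-efficient option at 100 million rows.

```python
import pandas as pd

# Toy versions of the two frames from the question (names are hypothetical).
history = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 4],
                        "Another_ID": [10, 20, 50, 10, 20, 30]})
new = pd.DataFrame({"ID": [1, 2, 2, 2, 4, 5],
                    "Another_ID": [30, 30, 50, 20, 30, 70]})

# Drop duplicate pairs first so the left merge cannot multiply rows of `new`.
seen_pairs = history.drop_duplicates()

# indicator=True adds a "_merge" column: "both" means the pair exists in history.
merged = new.merge(seen_pairs, on=["ID", "Another_ID"], how="left", indicator=True)

# to_numpy() sidesteps index alignment in case `new` has a non-default index.
new["seen_before"] = (merged["_merge"] == "both").astype("int8").to_numpy()
print(new)
```

A left merge keeps the rows of `new` in their original order, which is why the flag can be assigned back positionally.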

I am looking for a memory-efficient but quick way to do this. Any ideas? Thanks!
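
One possible direction, sketched under a loud assumption: if both ID columns hold non-negative integers below 2**32, each pair can be packed into a single 64-bit key, so the 90 million known pairs take roughly 8 bytes each (about 720 MB) as a flat NumPy array. The helper `pack_pairs` and the frame names are hypothetical, and `np.isin` is just one way to do the membership test.

```python
import numpy as np
import pandas as pd

def pack_pairs(df):
    """Pack (ID, Another_ID) into one uint64 key per row.

    Assumption: both columns are non-negative integers smaller than 2**32.
    """
    ids = df["ID"].to_numpy(dtype=np.uint64)
    other = df["Another_ID"].to_numpy(dtype=np.uint64)
    # High 32 bits hold ID, low 32 bits hold Another_ID.
    return (ids << np.uint64(32)) | other

# Toy versions of the two frames from the question (names are hypothetical).
history = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 4],
                        "Another_ID": [10, 20, 50, 10, 20, 30]})
new = pd.DataFrame({"ID": [1, 2, 2, 2, 4, 5],
                    "Another_ID": [30, 30, 50, 20, 30, 70]})

# Sorted, de-duplicated keys of every pair seen so far.
seen_keys = np.unique(pack_pairs(history))

# Vectorised membership test of the new pairs against the known keys.
new["seen_before"] = np.isin(pack_pairs(new), seen_keys).astype(np.int8)
print(new)
```

If the IDs do not fit in 32 bits, or are not integers, the packing step does not apply as written and a different key scheme would be needed.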

Source: Python Questions
