Python – Pandas – String Replacement with regex | (OR)

  dataframe, pandas, python, replace, string

I’ve got 2 Dataframes :

df_bisID = pd.DataFrame({'A': ['ID1#ID2', 'ID3#ID4'],
                         'B': ['ID5#ID6', 'ID7#ID8'],
                         'C': ['ID9#ID10', 'ID11#ID12'],
                         'D': ['ID13#ID14', 'ID15#ID16']})

    A       B       C           D
0   ID1#ID2 ID5#ID6 ID9#ID10    ID13#ID14
1   ID3#ID4 ID7#ID8 ID11#ID12   ID15#ID16


df_eliminateID = pd.DataFrame({'A': ['ID1', 'ID3', 'ID9', 'ID11'],
                               'B': ['ID5', 'ID7', 'ID13', 'ID15']})

    A    B
0   ID1  ID5
1   ID3  ID7
2   ID9  ID13
3   ID11 ID15

The IDs are complicated, with alphanumerics, and unique (‘xxxx-xxxxxxxxxx-xxx’+ facultative suffix ‘-xxx-xxx’). I propose an example with simplified IDs.

In df_bisID I need to get rid of one of both ID and #, in each column, where it’s respectively contained in columns A or B in df_eliminateID.

So I want to replace ‘ID1#ID2’ by ‘ID2’ and ‘ID9#ID10’ by ‘ID10’, etc.

Note that it also could be ‘ID2#ID1’ instead of ‘ID1#ID2’ and it’s complicating the replacement conditions, that’s also the cause of this question string replacement.

A logical solution to my problem is :

for i in df_eliminateID['A']:
    df_bisID.replace(i+'#|#'+i, '')
for i in df_eliminateID['B']:
    df_bisID.replace(i+'#|#'+i, '')

But at the end, my df_bisID is empty, that’s why I ask your help with the use of | regex in my string replacement, I want to replace IDX# or #IDX by ” to conserve only one of the IDs.
The OR is | in regex, maybe I didn’t used it the right way (is there an exclusif or in regex ?)

In advance, thanks for your help.

Source: Python Questions

LEAVE A COMMENT