How to replace strings in one dataframe with values from another dataframe given a reference?

  pandas, python

I have a dataframe, df, that looks like this:

    0           1                   2   3
0   Reference   Wang et al., 2003   NaN Fieldhouse & Fingas, 2003; Fieldhouse & Fingas...

The row holds shortened "reference" strings.
I need to replace each of them with the full reference from another dataframe, say df_ref:

    0           1
0   Citation    Reference
1   Fieldhouse & Fingas, 2003   Fingas M. and Fieldhouse B. "Studies of t...
2   Fieldhouse & Fingas, 2004   Fingas M. and Fieldhouse B. "Formation of...
3   Fieldhouse & Fingas, 2012   Fingas M. and Fieldhouse B. "Studies on w...
4   Fieldhouse & Khalifa., 2013 Fieldhouse B. and Khelifa A. "Validation ...
5   Hollebone & Yang, 2015  Hollebone B. and Yang Z. "Fingerprinting ...

I’ve tried creating a list from the first dataframe:

df_list = list(filter(lambda x: str(x) !='nan', df.iloc[0,:]))

Then created a dictionary with the reference "name" and full reference from df_ref:

dict_lookup = dict(zip(df_ref.iloc[1:,0], df_ref.iloc[1:,1]))

Then I tried to replace the values in df with those from the dictionary if the reference is in the list:

df.iloc[8,5:]=[dict_lookup[item] for item in df_list]

But I get this error:

KeyError: 'Fieldhouse & Fingas, 2003; Fieldhouse & Fingas, 2004; Wang et al., 2005; Wang et al., 2003; Yang et al., 2006; Yang et al., 2009'
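The error can be reproduced without pandas. This is a minimal sketch, assuming a two-entry dictionary with placeholder values in place of the real full references: the whole cell string is looked up as a single key, and no such key exists, while each semicolon-separated piece does.

```python
# Hypothetical stand-in for dict_lookup; the values are placeholders,
# not the real full references.
dict_lookup = {
    "Fieldhouse & Fingas, 2003": "full reference A (placeholder)",
    "Fieldhouse & Fingas, 2004": "full reference B (placeholder)",
}

# One cell holding two citations separated by a semicolon.
cell = "Fieldhouse & Fingas, 2003; Fieldhouse & Fingas, 2004"

# The cell as a whole is not a key, so dict_lookup[cell] raises KeyError.
whole_cell_found = cell in dict_lookup

# Each piece, once split and stripped of whitespace, is a key.
parts_found = [part.strip() in dict_lookup for part in cell.split(";")]
```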

I think the problem is that a single cell in df can hold multiple references, while in the reference list each one has its own row.
In the example above the references are separated by semicolons, so that one cell in df has 6 references, each to be replaced with its long version.

So would I need to read the "reference" row in df as semicolon-separated strings to compare against the dict_lookup I created?
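One way this could work is sketched below: split each cell on semicolons, look up every piece, and join the results back together. The small `df` and `df_ref` are made-up stand-ins with placeholder full references, `expand_citations` is a hypothetical helper name, and leaving unknown citations unchanged is an assumption rather than a requirement from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the real dataframes (placeholder full references).
df_ref = pd.DataFrame({
    0: ["Citation", "Fieldhouse & Fingas, 2003", "Fieldhouse & Fingas, 2004"],
    1: ["Reference", "Full reference A (placeholder)", "Full reference B (placeholder)"],
})
df = pd.DataFrame([["Reference", "Wang et al., 2003", None,
                    "Fieldhouse & Fingas, 2003; Fieldhouse & Fingas, 2004"]])

# Same lookup as in the question: short citation -> full reference.
dict_lookup = dict(zip(df_ref.iloc[1:, 0], df_ref.iloc[1:, 1]))


def expand_citations(cell, lookup):
    """Replace each semicolon-separated citation in a cell with its full reference."""
    if not isinstance(cell, str):
        return cell  # leave NaN/None cells untouched
    parts = [part.strip() for part in cell.split(";")]
    # Assumption: a citation missing from the lookup is kept in its short form.
    return "; ".join(lookup.get(part, part) for part in parts)


# Rewrite the reference row, skipping the label in column 0.
df.iloc[0, 1:] = [expand_citations(cell, dict_lookup) for cell in df.iloc[0, 1:]]
```

Using `lookup.get(part, part)` avoids the KeyError for citations such as "Wang et al., 2003" that have no entry in the lookup. Full references may themselves contain semicolons, so a different separator (or a list per cell) might be a better fit for the joined result.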

Source: Python Questions
