I have a dataframe, df, that looks like this:
           0                  1    2                                                  3
0  Reference  Wang et al., 2003  NaN  Fieldhouse & Fingas, 2003; Fieldhouse & Fingas...
Each cell in this row holds a shortened "reference" string.
I need to replace it with the full reference from another dataframe. Let’s say df_ref:
   0                            1
0  Citation                     Reference
1  Fieldhouse & Fingas, 2003    Fingas M. and Fieldhouse B. "Studies of t...
2  Fieldhouse & Fingas, 2004    Fingas M. and Fieldhouse B. "Formation of...
3  Fieldhouse & Fingas, 2012    Fingas M. and Fieldhouse B. "Studies on w...
4  Fieldhouse & Khalifa., 2013  Fieldhouse B. and Khelifa A. "Validation ...
5  Hollebone & Yang, 2015       Hollebone B. and Yang Z. "Fingerprinting ...
I’ve tried creating a list from the first dataframe:
df_list = list(filter(lambda x: str(x) !='nan', df.iloc[0,:]))
Then created a dictionary with the reference "name" and full reference from df_ref:
dict_lookup = dict(zip(df_ref.iloc[1:,0], df_ref.iloc[1:,1]))
Then I tried to replace the values in df with those from the dictionary if the reference is in the list:
df.iloc[8,5:]=[dict_lookup[item] for item in df_list]
But I get an error:
KeyError: 'Fieldhouse & Fingas, 2003; Fieldhouse & Fingas, 2004; Wang et al., 2005; Wang et al., 2003; Yang et al., 2006; Yang et al., 2009'
I think the problem is that a single cell in df can hold multiple references, whereas in the reference list each one is on its own row.
In the example above the references are separated by semicolons, so that one cell in df has 6 references, each of which needs to be replaced with its long version.
So would I need to read the "reference" row in df as semicolon-separated strings and compare each piece to the dict_lookup I created?
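To make the idea concrete, here is a minimal sketch of that split-and-look-up step in plain Python. The helper name expand_refs and its sep parameter are hypothetical, the small dict_lookup only repeats two of the truncated entries shown above, and leaving unmatched citations in their short form is an assumption about the desired behaviour, not something the data requires.

```python
# Stand-in for the dict built from df_ref (values are the truncated
# strings shown in the question, not complete references).
dict_lookup = {
    "Fieldhouse & Fingas, 2003": 'Fingas M. and Fieldhouse B. "Studies of t...',
    "Fieldhouse & Fingas, 2004": 'Fingas M. and Fieldhouse B. "Formation of...',
}


def expand_refs(cell, lookup, sep=";"):
    """Replace each short citation in a semicolon-separated cell.

    Non-string cells (such as NaN) are returned unchanged. Citations that
    are missing from the lookup are kept in their short form (an
    assumption) instead of raising KeyError.
    """
    if not isinstance(cell, str):
        return cell
    parts = [part.strip() for part in cell.split(sep)]
    return "; ".join(lookup.get(part, part) for part in parts)


cell = "Fieldhouse & Fingas, 2003; Fieldhouse & Fingas, 2004; Wang et al., 2005"
expanded = expand_refs(cell, dict_lookup)
```

If this does what is needed, it could presumably be applied to the whole row with something like df.iloc[0, :].map(lambda c: expand_refs(c, dict_lookup)), since Series.map calls the function on every cell, including the NaN ones.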
Source: Python Questions