Pandas duplicated() returns every duplicate once, except one row that appears twice

  dataframe, duplicates, pandas, python

I am trying to find all the Nobel Prize winners who won more than once between 1901 and 2016. I tried the pandas duplicated() method, but it returns every duplicate once except for one entry, which appears twice. I am finding duplicates based on the full_name column of the DataFrame. I have tried different combinations of parameters but got the same result. I know I can remove that one row manually, but what is going wrong here? My code is given below.

Try-1

lucky_winners = df[df.duplicated(['full_name'])]

Try-2

lucky_winners = df[df.duplicated(['full_name'], keep='first')]

Try-3

lucky_winners = df[df.duplicated(['full_name'], keep='last')]

Same output:

lucky_winners.full_name

62                           Marie Curie, née Sklodowska
215    Comité international de la Croix Rouge (Intern...
340                                   Linus Carl Pauling
348    Comité international de la Croix Rouge (Intern...
424                                         John Bardeen
505                                     Frederick Sanger
523    Office of the United Nations High Commissioner...

The entity that appears twice is Comité international de la Croix Rouge (International Committee of the Red Cross). I even compared the two values for equality and got True. I checked it using

lucky_winners.iloc[1].full_name == lucky_winners.iloc[3].full_name

I can't work out where the actual problem is.
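A likely explanation, sketched on made-up data: with keep='first' (the default, so Try-1 and Try-2 are the same call), duplicated() flags every occurrence after the first. A name that occurs three times is therefore flagged twice. The Red Cross has received the Peace Prize three times, so assuming it appears three times in full_name, two of its rows would pass the filter. The small DataFrame below is a hypothetical stand-in for the real dataset, with shortened names; only duplicated(), drop_duplicates() and value_counts() are real pandas calls.

```python
import pandas as pd

# Hypothetical stand-in for the Nobel data (shortened names, illustrative rows).
df = pd.DataFrame({
    "year": [1903, 1911, 1917, 1921, 1944, 1956, 1963, 1972],
    "full_name": [
        "Marie Curie", "Marie Curie", "Red Cross", "Albert Einstein",
        "Red Cross", "John Bardeen", "Red Cross", "John Bardeen",
    ],
})

# keep='first' is the default: every occurrence AFTER the first is flagged,
# so a name that occurs three times yields two flagged rows.
repeats = df[df.duplicated(["full_name"])]

# One row per multiple winner: drop the repeated names from the filtered rows.
lucky_winners = repeats.drop_duplicates("full_name")

# keep=False flags every row that belongs to a repeated name.
all_rows = df[df.duplicated(["full_name"], keep=False)]

# Alternative: count wins per name and keep the names with more than one.
win_counts = df["full_name"].value_counts()
multiple = win_counts[win_counts > 1]

print(repeats["full_name"].tolist())
print(lucky_winners["full_name"].tolist())
print(multiple.to_dict())
```

If the goal is one row per multiple winner, drop_duplicates on the filtered rows or the value_counts approach avoids removing the extra row by hand. Note that keep='last' flags the earlier occurrences instead, so it would normally return the same names with different row indices.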

Source: Python Questions
