Summing up multiple values in a single row

  dataframe, pandas, pandas-groupby, python

Given a dataframe such as the one below, is it possible to add up each country's value even when a row contains multiple countries? For example, the first row contains both Japan and USA, so its Employees value of 1 should count toward both: Japan=1 and USA=1.

import pandas as pd
import numpy as np

countries=["Europe","USA","Japan"]
data= {'Employees':[1,2,3,4],
    'Country':['Japan;USA','USA;Europe',"Japan","Europe;Japan"]}
df=pd.DataFrame(data)
print(df)

patt = '(' + '|'.join(countries) + ')'
grp = df.Country.str.extractall(pat=patt).values
new_df = df.groupby(grp).agg({'Employees': sum})
print(new_df)

I have tried this, but it raises a "Grouper and axis must be same length" error. Is this the correct way to do it?

ValueError                                Traceback (most recent call last)
<ipython-input-81-53e8e9f0f301> in <module>()
     10 patt = '(' + '|'.join(countries) + ')'
     11 grp = df.Country.str.extractall(pat=patt).values
---> 12 new_df = df.groupby(grp).agg({'Employees': sum})
     13 print(new_df)

    4 frames
    /usr/local/lib/python3.7/dist-packages/pandas/core/groupby/grouper.py in _convert_grouper(axis, grouper)
        842     elif isinstance(grouper, (list, Series, Index, np.ndarray)):
        843         if len(grouper) != len(axis):
    --> 844             raise ValueError("Grouper and axis must be same length")
        845         return grouper
        846     else:
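
The length mismatch behind this error can be checked directly. The sketch below is only an illustration using the same sample data; the names `matches`, `n_rows` and `n_matches` are introduced here and are not part of the original code. `str.extractall` returns one row per regex match rather than one row per input row, so its result is longer than the dataframe whenever a row lists more than one country.

```python
import pandas as pd

countries = ["Europe", "USA", "Japan"]
df = pd.DataFrame({
    "Employees": [1, 2, 3, 4],
    "Country": ["Japan;USA", "USA;Europe", "Japan", "Europe;Japan"],
})

patt = "(" + "|".join(countries) + ")"

# extractall yields one output row per match, indexed by (original row, match number)
matches = df["Country"].str.extractall(patt)

n_rows = len(df)          # rows in the original dataframe
n_matches = len(matches)  # total country matches across all rows
print(n_rows, n_matches)
```

Because the two lengths differ, passing `matches.values` to `df.groupby` cannot line up one group label with each row.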

Thus, I would like the end result to be:
Japan: 8
Europe: 6
USA: 3
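
One possible way to reach these totals, sketched here under the assumption that the countries are always separated by ";" and that the installed pandas version provides `DataFrame.explode` (0.25 or later), is to split each row into one row per country before grouping. The name `totals` is introduced only for this illustration, and this is not necessarily the only or the best approach.

```python
import pandas as pd

df = pd.DataFrame({
    "Employees": [1, 2, 3, 4],
    "Country": ["Japan;USA", "USA;Europe", "Japan", "Europe;Japan"],
})

totals = (
    # turn "Japan;USA" into ["Japan", "USA"]
    df.assign(Country=df["Country"].str.split(";"))
      # one row per (Employees, single country) pair
      .explode("Country")
      # sum the Employees values for each country
      .groupby("Country")["Employees"]
      .sum()
)
print(totals)
```

After the explode step every row carries exactly one country, so the group labels and the rows have the same length and the groupby works as usual.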

Thanks

Source: Python Questions
