Pandas-merging rows based on conditions

  analytics, customer, pandas, python, xcode

I’m trying to analyze the difference of items bought(if there is any) between returning and non-returning customers(userId). The default data frame looks like this. Dataframe firstOrder should contain only products from customers that have more than one order(order code) and only their first order. So basically first order from returning customers.
Dataframe noReturn should contain products ordered by customers that never returned.
Problem is that each product bought has its unique row. So I need to drop all items when I need to work with duplicates and then I need to contact, merge? them back after I drop duplicated or non duplicated values.
I don’t know what is wrong. I don’t have any errors it just keeps duplicated orders from customers and that is the wrong thing in both dataframes
[enter image description here][1]

firstOrder = products.copy(deep=True)
firstOrder = firstOrder.drop(firstOrder[firstOrder.itemName.str.contains(r'(Průhledná mandalová samolepka|Průhledná samolepka|Mandalový zápisníček|Mandalové přáníčko|Plastová mandalová šablona pro tečkování (13x13 cm))')].index)
firstOrder.drop(firstOrder.loc[firstOrder['itemName']=="Plastová mandalová šablona pro tečkování (13x13 cm)"].index, inplace=True)
firstOrder.drop(firstOrder.loc[firstOrder['itemStatusName']=="Stornována"].index, inplace=True)
#firstOrder = firstOrder.drop_duplicates(subset=["itemName", "orderCode"])

firstOrder1 = firstOrder.copy(deep=True)
firstOrder1 = firstOrder1 = firstOrder1.drop_duplicates(subset=["itemName", "orderCode"])
firstOrder1 = firstOrder1[["itemName", "orderCode"]]

firstOrder = firstOrder.drop_duplicates(subset=["userId", "orderCode"])
firstOrder = firstOrder[firstOrder.duplicated(subset=['userId'], keep=False)]
dropnonUnique = firstOrder['userId'].value_counts()
firstOrder[firstOrder['userId'].isin(dropnonUnique[dropnonUnique>1].index)]
firstOrder.drop_duplicates(subset=['userId'], keep = "first")
firstOrder.dropna(subset=["userId"])
firstOrder = pd.merge(firstOrder, firstOrder1, on = ["orderCode"], how="left") 



noReturn = products.copy(deep=True)
noReturn = noReturn.drop(noReturn[noReturn.itemName.str.contains(r'(Průhledná mandalová samolepka|Průhledná samolepka|Mandalový zápisníček|Mandalové přáníčko|Plastová mandalová šablona pro tečkování (13x13 cm))')].index)
noReturn.drop(noReturn.loc[noReturn['itemName']=="Plastová mandalová šablona pro tečkování (13x13 cm)"].index, inplace=True)
noReturn.drop(noReturn.loc[noReturn['itemStatusName']=="Stornována"].index, inplace=True)


noReturn1 = noReturn.copy(deep=True)
noReturn1 = noReturn1 = noReturn1.drop_duplicates(subset=["itemName", "orderCode"])
noReturn1 = noReturn1[["itemName", "orderCode"]]

noReturn = noReturn.drop_duplicates(subset=["userId", "orderCode"])
noReturn.drop_duplicates(subset=['userId'], keep= False)

noReturn = pd.merge(noReturn, noReturn1, on = ["orderCode"], how="left") ```


 [1]: https://i.stack.imgur.com/2QSMZ.png

Source: Python Questions

LEAVE A COMMENT