Category: group-by

I am using Spark Koalas to explore and analyze a large dataset of products. One thing I would like to do is sort product numbers and add a column containing each product's ranking, based on its number. Here is an example dataset:

import databricks.koalas as ks
ks.set_option('compute.default_index_type', 'distributed')
ks.set_option('compute.ops_on_diff_frames', True)
data_df = ks.DataFrame({ ..

Read more
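The ranking asked about above can be computed without a manual sort, via `Series.rank()`, which Koalas mirrors from pandas. A minimal sketch in plain pandas (the `product_number` column name is an assumption, since the example dataset is truncated):

```python
import pandas as pd

# Hypothetical stand-in for the truncated product table;
# the column name "product_number" is assumed, not from the post.
df = pd.DataFrame({"product_number": [300, 100, 200, 100]})

# rank(method="dense") gives consecutive ranks, with ties sharing a rank.
df["ranking"] = df["product_number"].rank(method="dense").astype(int)

# Sorting afterwards shows rankings in order: 100->1, 100->1, 200->2, 300->3.
df = df.sort_values("product_number").reset_index(drop=True)
```

In Koalas the same `rank` call works on a `ks.Series`, provided `compute.ops_on_diff_frames` is enabled as in the question's setup.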

I have a df called "df_filled_weeks" with columns "retailer_id", "store_id", "chain", "week_id" (and others). I am trying to group my data first by retailer_id, then by store_id, and then sort by descending week_id within each group. I tried:

df_filled_weeks.groupby(['retailer_id', 'spins_store_id']).apply(lambda x: x.sort_values(["week_id"], ascending=False))

but this does not seem to work. Is there another way ..

Read more

I have a dataframe:

data = {
    'first_column': ['first_value', 'second_value', …],
    'second_column': ['yes', 'no', …],
    'third_column': ['first_value', 'second_value', …],
    'fourth_column': ['yes', 'no', …],
}

I'm trying to groupby 'first_column' where the values in 'second_column' and 'fourth_column' == 'yes', and I get an error: "TypeError: unsupported operand type(s) for &: 'list' and 'list'". I receive no errors ..

Read more
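That TypeError typically means `&` was applied to plain Python lists rather than boolean Series. Comparing a column with `==` yields a boolean Series, and those can be combined with `&` as long as each comparison is parenthesized (`&` binds tighter than `==`). A sketch with made-up values:

```python
import pandas as pd

# Made-up data matching the question's column layout.
df = pd.DataFrame({
    "first_column":  ["a", "a", "b"],
    "second_column": ["yes", "no", "yes"],
    "fourth_column": ["yes", "yes", "yes"],
})

# Each comparison produces a boolean Series; the parentheses are
# required because & has higher precedence than ==.
mask = (df["second_column"] == "yes") & (df["fourth_column"] == "yes")

# Filter first, then group the surviving rows.
grouped = df[mask].groupby("first_column").size()
```

Filtering with the mask before `groupby` keeps only the rows where both columns are "yes".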

I have a dataframe:

date        type
2021-08-12  fail
2021-08-12  fail
2021-08-12  win
2021-08-12  great_win
2021-08-13  fail
2021-08-13  win
2021-08-13  win
2021-08-13  win

I want to calculate the percentage of each 'type' within each date group and then average the values across all dates. So the desired result must be:

date        type  type_perc
2021-08-12  fail  0.5
2021-08-12  win   0.25
2021-08-12  ..

Read more
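The per-date shares in this last question map naturally onto `groupby(...).value_counts(normalize=True)`, with a second groupby to average each type's share across dates. A sketch using the data shown above:

```python
import pandas as pd

# The exact frame from the question.
df = pd.DataFrame({
    "date": ["2021-08-12"] * 4 + ["2021-08-13"] * 4,
    "type": ["fail", "fail", "win", "great_win",
             "fail", "win", "win", "win"],
})

# Share of each type within its date group, e.g. fail is 2/4 = 0.5
# on 2021-08-12 and 1/4 = 0.25 on 2021-08-13.
type_perc = (
    df.groupby("date")["type"]
      .value_counts(normalize=True)
      .rename("type_perc")
      .reset_index()
)

# Average each type's share across all dates it appears in.
avg_perc = type_perc.groupby("type")["type_perc"].mean()
```

Note that a type absent on some date (like great_win on 2021-08-13) contributes no 0-share row for that date; if missing types should count as 0, the table would need to be reindexed over all (date, type) pairs first.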