Count occurrences of same values with condition

  dataframe, pandas, python, python-3.x, valueerror

This is a follow-up to a previous question.

I’m trying to analyze how often price areas converge, i.e., to count the number of hours in which different price areas have the same price. This is my data:

df1:

                            Time    DK1    DK2  ...    SE3    SE4     FI
0      2017-01-01T00:00:00+01:00  20.96  20.96  ...  24.03  24.03  24.03
1      2017-01-01T01:00:00+01:00  20.90  20.90  ...  24.03  24.03  24.03
2      2017-01-01T02:00:00+01:00  18.13  18.13  ...  24.02  24.02  24.02
3      2017-01-01T03:00:00+01:00  16.03  16.03  ...  23.19  23.19  23.19
4      2017-01-01T04:00:00+01:00  16.43  16.43  ...  24.10  24.10  24.10
...                          ...    ...    ...  ...    ...    ...    ...
35059  2020-12-31T19:00:00+01:00  59.47  59.47  ...  25.72  58.04  35.32
35060  2020-12-31T20:00:00+01:00  56.70  56.70  ...  24.84  54.45  24.84
35061  2020-12-31T21:00:00+01:00  52.44  52.44  ...  24.77  51.18  28.00
35062  2020-12-31T22:00:00+01:00  51.86  51.86  ...  24.61  45.84  26.55
35063  2020-12-31T23:00:00+01:00  52.26  52.26  ...  24.07  24.07  24.07

[35064 rows x 13 columns]

df2:

                            Time    DK1    DK2  ...    SE3    SE4     FI
0      2017-01-01T00:00:00+01:00    NaN    NaN  ...    NaN    NaN    NaN
1      2017-01-01T01:00:00+01:00  29.40  29.40  ...  29.40  29.40  31.10
2      2017-01-01T02:00:00+01:00  24.73  24.73  ...  24.73  24.73  30.47
3      2017-01-01T03:00:00+01:00  24.73  24.73  ...  24.73  24.73  30.00
4      2017-01-01T04:00:00+01:00  27.89  27.89  ...  27.89  27.89  30.00
...                          ...    ...    ...  ...    ...    ...    ...
35059  2020-12-31T19:00:00+01:00    NaN    NaN  ...    NaN    NaN    NaN
35060  2020-12-31T20:00:00+01:00    NaN    NaN  ...    NaN    NaN    NaN
35061  2020-12-31T21:00:00+01:00    NaN    NaN  ...    NaN    NaN    NaN
35062  2020-12-31T22:00:00+01:00    NaN    NaN  ...    NaN    NaN    NaN
35063  2020-12-31T23:00:00+01:00    NaN    NaN  ...    NaN    NaN    NaN

[35064 rows x 13 columns]
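In both printouts `Time` appears to be an ISO-8601 string column, and it is one of the 13 columns, so a row-wise comparison such as `eq(..., axis=0)` also visits it. Below is a minimal preparation sketch, assuming the frames look like the printouts above. It uses hypothetical toy stand-ins with only five of the 13 areas and the values visible in the first rows. `Time` is parsed with `utc=True` because Nordic local time presumably switches between +01:00 and +02:00, and strings with mixed offsets do not parse into a single datetime64 dtype otherwise. `Time` is then moved into the index so that only price columns remain.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df1/df2: three hours and five of the 13 area columns,
# copied from the visible part of the printouts.
times = ["2017-01-01T00:00:00+01:00",
         "2017-01-01T01:00:00+01:00",
         "2017-01-01T02:00:00+01:00"]
df1 = pd.DataFrame({
    "Time": times,
    "DK1": [20.96, 20.90, 18.13],
    "DK2": [20.96, 20.90, 18.13],
    "SE3": [24.03, 24.03, 24.02],
    "SE4": [24.03, 24.03, 24.02],
    "FI":  [24.03, 24.03, 24.02],
})
df2 = pd.DataFrame({
    "Time": times,
    "DK1": [np.nan, 29.40, 24.73],
    "DK2": [np.nan, 29.40, 24.73],
    "SE3": [np.nan, 29.40, 24.73],
    "SE4": [np.nan, 29.40, 24.73],
    "FI":  [np.nan, 31.10, 30.47],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Parse Time as UTC (copes with mixed +01:00/+02:00 offsets) and use it as the index."""
    out = df.copy()
    out["Time"] = pd.to_datetime(out["Time"], utc=True)
    return out.set_index("Time")


p1, p2 = prepare(df1), prepare(df2)

# Both frames should describe the same hours with the same area columns.
assert p1.index.equals(p2.index)
assert list(p1.columns) == list(p2.columns)
```

With `Time` in the index, `p1.eq(p1["DK1"], axis=0)` compares prices only, and `p1` and `p2` align hour by hour.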

I would like to count how many hours each combination of areas has converging (identical) prices in df1. Then, for each combination, I would like to count the hours in which the converging df1 price differs from any of the df2 prices in that same hour. For example, if DK1, DK2 and SE3 have the same price in df1 AND that df1 price differs from any of the df2 values in the same hour, it counts as an occurrence.
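Both counts can be sketched on toy data. The sketch follows the DK1-anchored approach of the attempt below: in each hour, the areas whose df1 price equals the anchor's df1 price form that hour's combination. Several assumptions apply. `Time` is already in the index, so only price columns are compared. A NaN in df2 is read as "no df2 price for that area and hour" and is not counted as a difference. "Differs from any df2 price" is read as "at least one converging area has a df2 price different from the common df1 price". The helper name `count_convergence` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: Time already in the index, five of the 13 areas (values from the printouts).
idx = pd.to_datetime(["2017-01-01T00:00:00+01:00",
                      "2017-01-01T01:00:00+01:00",
                      "2017-01-01T02:00:00+01:00"], utc=True)
df1 = pd.DataFrame({"DK1": [20.96, 20.90, 18.13], "DK2": [20.96, 20.90, 18.13],
                    "SE3": [24.03, 24.03, 24.02], "SE4": [24.03, 24.03, 24.02],
                    "FI":  [24.03, 24.03, 24.02]}, index=idx)
df2 = pd.DataFrame({"DK1": [np.nan, 29.40, 24.73], "DK2": [np.nan, 29.40, 24.73],
                    "SE3": [np.nan, 29.40, 24.73], "SE4": [np.nan, 29.40, 24.73],
                    "FI":  [np.nan, 31.10, 30.47]}, index=idx)


def count_convergence(df1: pd.DataFrame, df2: pd.DataFrame, anchor: str):
    """Hypothetical helper: (hours per combo, hours per combo where df2 differs)."""
    # True where an area's df1 price equals the anchor's df1 price in that hour.
    converged = df1.eq(df1[anchor], axis=0)

    # Label each hour with its combination, e.g. "DK1-DK2".
    combo = converged.apply(lambda row: "-".join(row[row].index), axis=1)

    # Count 1: number of hours per combination.
    hours = combo.value_counts()

    # Count 2: a converging area has a non-missing df2 price that differs
    # from the common df1 price in the same hour.
    differs = df2.ne(df1[anchor], axis=0) & df2.notna()
    condition = (converged & differs).any(axis=1)
    hours_differing = combo[condition].value_counts()
    return hours, hours_differing


hours, hours_differing = count_convergence(df1, df2, "DK1")
print(hours)
print(hours_differing)
```

To compare against every area's df2 price in that hour rather than only the converging ones, `condition` would become `differs.any(axis=1)`. Looping over the anchors (`for a in df1.columns`) gives the same two tables for each starting area.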

This is what I’ve tried so far:

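```python
import pandas as pd

# ubalancepris_nordic is presumably the frame shown above as df1.

# Map each country's joined area names to the country code, e.g. "DK1-DK2" -> "DK".
s = ubalancepris_nordic.columns.to_series()
s.index = s.index.str.replace(r'\d+', '', regex=True)

d = s.groupby(level=0).agg('-'.join).to_dict()
d = {v: k for k, v in d.items()}

# For every hour, join the names of the areas whose price equals DK1's,
# then count how often each combination occurs.
convergence_dk1 = (ubalancepris_nordic.eq(ubalancepris_nordic['DK1'], axis=0)
        .dot(ubalancepris_nordic.columns + '-')
        .str[:-1]
        .to_frame('Combo')
        .value_counts()
        .unstack(0, fill_value=0)
        .add_prefix('Occurrence ')
        .rename_axis(columns=None)
        .reset_index()
        )
```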
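The two building blocks of the attempt can be checked in isolation on a hypothetical five-column subset of the areas. The digit-stripping step builds a mapping from a country's full combination (such as `"DK1-DK2"`) back to its country code. The snippet never uses `d`, so it was presumably meant for renaming combos later. `eq(...).dot(columns + '-')` concatenates the names of the matching columns per row, because a boolean times a string repeats the string once or not at all.

```python
import pandas as pd

# Hypothetical subset of the area columns.
cols = pd.Index(["DK1", "DK2", "SE3", "SE4", "FI"])

# Strip the digits from the names and join the areas of each country.
s = cols.to_series()
s.index = s.index.str.replace(r"\d+", "", regex=True)
d = s.groupby(level=0).agg("-".join).to_dict()
d = {v: k for k, v in d.items()}  # "DK1-DK2" -> "DK", "SE3-SE4" -> "SE", "FI" -> "FI"

# Boolean frame dot column labels: True * "DK1-" == "DK1-", False * "DK1-" == "".
prices = pd.DataFrame([[20.96, 20.96, 24.03, 24.03, 24.03]], columns=cols)
combo = prices.eq(prices["DK1"], axis=0).dot(prices.columns + "-").str[:-1]
print(d)
print(combo)
```

Up to `.str[:-1]`, the chain yields one combination label per hour. The failure only appears in the later reshaping steps.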

However, I get the following error:

ValueError: zero-size array to reduction operation maximum which has no identity
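A likely cause, hedged because details vary between pandas versions: `DataFrame.value_counts()` on the single-column `Combo` frame returns a Series whose index has only the `Combo` level. `.unstack(0)` then tries to move that only level into the columns, which leaves no level for the rows. pandas raises a ValueError in that situation, and in at least some versions the message is exactly the zero-size reduction above. The minimal reproduction below uses hypothetical combo labels. It catches that error and then counts the combinations directly on the Series, which needs no `unstack`.

```python
import pandas as pd

# Hypothetical combo labels, as produced by the eq(...).dot(...).str[:-1] step.
combo = pd.Series(["DK1-DK2", "DK1-DK2", "DK1-DK2-SE4"], name="Combo")

# Reproduce: single-column value_counts, then unstack the only index level.
raised = False
try:
    combo.to_frame("Combo").value_counts().unstack(0, fill_value=0)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Count on the Series instead: one row per combination, no unstack needed.
counts = (combo.value_counts()
          .rename_axis("Combo")
          .reset_index(name="Occurrence"))
print(counts)
```

If a wide table was intended (for example one `Occurrence ...` column per year), the frame passed to `value_counts` needs that second column, so that one index level is still left for the rows after `unstack`.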

Hopefully, someone can help me.

Source: Python-3x Questions
