Using K-means to predict users based on weather

  data-science, k-means, python

I have 2 years of user data (2019 and 2020) for a growing company, and have identified a clear correlation between user numbers and the weather. I have clustered some of the data: for each week I took the average of the daily maximum temperature and grouped it with the total number of users for that week.
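For reference, a weekly table of this shape could be built roughly as follows. This is only a sketch: the daily frame `daily`, its date index, and the synthetic values in it are assumptions for illustration, with only the column names `tmax` and `Entries` taken from the plotting code in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: one row per day with that day's maximum
# temperature and user count. Values are synthetic, for illustration only.
days = pd.date_range("2020-01-06", periods=28, freq="D")  # starts on a Monday
rng = np.random.default_rng(0)
daily = pd.DataFrame(
    {
        "tmax": rng.uniform(5, 25, size=len(days)),
        "Entries": rng.integers(50, 200, size=len(days)),
    },
    index=days,
)

# Weekly mean of the daily maximum temperature, weekly total of users.
# "W" bins the days into weeks ending on Sunday.
weekly = daily.resample("W").agg({"tmax": "mean", "Entries": "sum"})
print(weekly)
```

Each row of `weekly` then pairs one week's average `tmax` with that week's total `Entries`.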

df4 (below) is for 2020; I have the same for 2019 (df3):


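import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Cluster the 2020 weekly data (df4) into 3 groups
kmeans = KMeans(n_clusters=3).fit(df4)
centroids = kmeans.cluster_centers_

# Weekly points coloured by cluster label, centroids in red
# (the centroid indexing assumes df4's columns are ordered Entries, tmax)
plt.scatter(df4['tmax'], df4['Entries'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(centroids[:, 1], centroids[:, 0], c='red', s=50)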


I also have the weather for the first 3 months of 2021. How can I use this weather data, together with the information from the past 2 years, to estimate the number of users for a given week, so that I can compare the estimate with the actual figure? I don’t expect to get extremely close; I just want a rough estimate that follows the trend of the previous years. The idea is that I can then use a week’s weather forecast to ‘guess’ the number of users.
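To make the idea concrete, here is a minimal sketch of one way a clustering could be turned into an estimate. Note that it differs from the clustering code above: it clusters on temperature alone (since `Entries` is not known in advance for a new week) and then uses each cluster's mean `Entries` as the guess. The frames `hist` and `new_weeks` and all of their values are made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical weekly history (synthetic values, for illustration only).
hist = pd.DataFrame({
    "tmax":    [4.0, 5.0, 6.0, 14.0, 15.0, 16.0, 24.0, 25.0, 26.0],
    "Entries": [100, 110, 120, 300, 310, 320, 600, 610, 620],
})

# Cluster on temperature only, because that is all a forecast provides.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(hist[["tmax"]])

# Typical weekly user count within each temperature cluster.
cluster_mean = hist.groupby(km.labels_)["Entries"].mean()

# Hypothetical forecast temperatures for upcoming weeks.
new_weeks = pd.DataFrame({"tmax": [5.5, 15.5, 23.0]})

# Assign each new week to its nearest cluster and look up that cluster's mean.
new_labels = km.predict(new_weeks[["tmax"]])
new_weeks["estimate"] = cluster_mean.loc[new_labels].to_numpy()
print(new_weeks)
```

An estimate built this way is a step function of temperature: every week that falls in the same cluster gets the same number, so it can only follow the trend as finely as the number of clusters allows.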

I also have the problem that I only have 2 years of data, as it’s a new company, and the growth rate is a massive factor (user numbers roughly doubled from a given month in 2020 to the same month in 2021).
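As a rough illustration of what accounting for growth might look like, the sketch below scales a weather-only estimate by a year-over-year ratio for the same month. The monthly totals, the helper `growth_factor`, and the baseline figure are all hypothetical, and a simple same-month ratio is just one possible choice.

```python
import pandas as pd

# Hypothetical monthly user totals (synthetic numbers, for illustration only).
monthly = pd.Series(
    [1000, 1200, 2000, 2400],
    index=pd.PeriodIndex(["2020-01", "2020-02", "2021-01", "2021-02"], freq="M"),
)

def growth_factor(monthly_totals, period):
    """Ratio of a month's total to the total for the same month a year earlier."""
    return monthly_totals.loc[period] / monthly_totals.loc[period - 12]

factor = growth_factor(monthly, pd.Period("2021-01", freq="M"))

# Hypothetical weather-only estimate derived from the previous years' data.
baseline_estimate = 310.0
scaled_estimate = baseline_estimate * factor
print(factor, scaled_estimate)
```

With these made-up totals the factor is 2.0, which mirrors the "roughly doubled" growth described above.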

How do I begin using K-means to form a coherent prediction? Will I have to account for the growth rate too?
