I’ve set up a dataframe with many independent-variable columns, where the last column is the dependent variable. I’m trying to fit a linear equation for the dependent variable in terms of the independent variables. However, scipy’s curve_fit doesn’t seem to do anything and just returns the initial guess. The relevant code:
    def linear(independent_variables, *params):
        paramstack = np.hstack(params)
        expected_value = 0
        for index in range(len(independent_variables)):
            try:
                add_value = independent_variables[index] * paramstack[index]
                if type(add_value) == float: #This was just to validate the add_value makes sense
                    expected_profit += add_value
            except:
                continue
        return expected_value

    df = pd.DataFrame(all_data)
    independent_variables = df.drop('dependent_variable', axis=1) # Gets all the independent variables
    initial_guess = np.random.randn(independent_variables.shape)
    c, cov = curve_fit(linear, independent_variables, df['dependent_variable'], p0=initial_guess)
    print(initial_guess)
    print(c)
The printed values of initial_guess and c are identical.
Am I setting this up wrong, or can curve_fit not be used on datasets that are too large? Or is it just that the correlations between my independent variables are too low? I analyzed the data with pandas.corr, and most of the columns are in the ~0.4 region for correlation strength.
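One possible explanation, offered as a sketch and not a confirmed diagnosis: in the posted function, expected_profit is never defined, so the += raises an error that the bare except swallows, and the type(add_value) == float check is False for NumPy and pandas values anyway. The function would therefore always return 0 regardless of the parameters, and a model whose output does not depend on its parameters gives curve_fit nothing to optimize. The example below uses synthetic data in place of the dataframe; the names X, true_coeffs, constant_model and linear_model are made up for illustration. It assumes the independent variables are passed with shape (n_features, n_samples) and that the model returns one prediction per sample.

```python
import warnings

import numpy as np
from scipy.optimize import OptimizeWarning, curve_fit

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataframe: 200 rows, 3 independent columns.
X = rng.normal(size=(200, 3))
true_coeffs = np.array([1.5, -2.0, 0.5])  # hypothetical "true" coefficients
y = X @ true_coeffs + rng.normal(scale=0.01, size=200)


def constant_model(xdata, *params):
    # Mimics a model whose output never depends on params.
    return np.zeros(xdata.shape[1])


def linear_model(xdata, *params):
    # xdata has shape (n_features, n_samples); returns one value per sample.
    return np.asarray(params) @ xdata


p0 = rng.normal(size=X.shape[1])  # one initial guess per independent column

with warnings.catch_warnings():
    # The constant model has a zero Jacobian, so the covariance
    # cannot be estimated and curve_fit warns about it.
    warnings.simplefilter("ignore", OptimizeWarning)
    stuck, _ = curve_fit(constant_model, X.T, y, p0=p0)

fitted, _ = curve_fit(linear_model, X.T, y, p0=p0)

print("initial guess: ", p0)
print("constant model:", stuck)   # unchanged from the initial guess
print("linear model:  ", fitted)  # close to true_coeffs
```

With a real dataframe, the equivalent call would pass something like independent_variables.to_numpy().T as the x data and size the initial guess by the number of columns; this is an assumption about the intended setup, not something stated in the post.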
Source: Python Questions