I’ve set up a DataFrame with many independent-variable columns, where the last column is the dependent variable. I’m trying to fit a linear equation for the dependent variable in terms of the independent variables. However, scipy’s curve_fit doesn’t seem to do anything: it just returns the initial guess. The relevant code:
```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def linear(independent_variables, *params):
    paramstack = np.hstack(params)
    expected_value = 0
    for index in range(len(independent_variables)):
        try:
            add_value = independent_variables[index] * paramstack[index]
            if type(add_value) == float:  # This was just to validate the add_value makes sense
                expected_profit += add_value  # note: expected_profit is never defined above
        except:
            # any exception raised in the try block is silently skipped
            continue
    return expected_value

df = pd.DataFrame(all_data)  # all_data is not shown here
independent_variables = df.drop('dependent_variable', axis=1)  # Gets all the independent variables
initial_guess = np.random.randn(independent_variables.shape[1])
c, cov = curve_fit(linear, independent_variables, df['dependent_variable'], p0=initial_guess)
print(initial_guess)
print(c)
```
The printed values of initial_guess and c are identical.
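One way to reproduce this symptom: with no bounds, curve_fit defaults to the Levenberg-Marquardt method and, when no `jac` is given, estimates the Jacobian by finite differences. If the model’s output does not change when the parameters change, that Jacobian is all zeros and the optimizer stops at `p0`. A minimal sketch on synthetic data, where `constant_model` is a hypothetical stand-in and not the code above:

```python
import warnings

import numpy as np
from scipy.optimize import OptimizeWarning, curve_fit


def constant_model(x, *params):
    # Hypothetical model whose output never depends on params,
    # e.g. a loop in which every term is skipped so 0 is returned.
    return np.zeros(x.shape[1])


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))  # synthetic xdata: 3 predictors, 50 samples
y = rng.normal(size=50)       # synthetic ydata
p0 = rng.normal(size=3)       # initial guess, one value per parameter

with warnings.catch_warnings():
    # A zero Jacobian also leaves the covariance undetermined; SciPy may
    # report that as an OptimizeWarning, which is silenced here.
    warnings.simplefilter("ignore", OptimizeWarning)
    popt, pcov = curve_fit(constant_model, X, y, p0=p0)

print(popt)
print(p0)
print(np.array_equal(popt, p0))
```

Comparing the two printed arrays mirrors the check in the question: the returned parameters are the initial guess, unchanged.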
Am I setting this up wrong, or does curve_fit not work on datasets this large? Or is the correlation between my independent variables and the dependent variable simply too low? I checked the correlations with:

```python
df.corr()['dependent_variable']
```

and most of the columns have a correlation of around 0.4.
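For comparison, a minimal sketch of a model function whose output does depend on its parameters, assuming numeric feature columns plus a `'dependent_variable'` column and no intercept term (the posted `linear` has none). The parameters enter through a vectorized dot product instead of a loop, and xdata is passed in the (k, M) layout the SciPy docs describe for k predictors. The synthetic data, seed, and noise level are made up for illustration; `np.linalg.lstsq` is swapped in only as an independent check, since for a purely linear model it solves the same least-squares problem in closed form.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def linear(x, *params):
    # x has shape (k, M): one row per predictor, one column per sample.
    # Returns sum_j params[j] * x[j] for every sample (no intercept term).
    return np.asarray(params) @ x


# Synthetic stand-in for all_data: 5 predictors, 500 rows, heavy noise.
rng = np.random.default_rng(1)
n_rows, n_features = 500, 5
features = rng.normal(size=(n_rows, n_features))
true_coefs = np.array([1.5, -2.0, 0.5, 3.0, -1.0])
target = features @ true_coefs + rng.normal(scale=5.0, size=n_rows)

df = pd.DataFrame(features, columns=[f"x{i}" for i in range(n_features)])
df["dependent_variable"] = target

independent_variables = df.drop("dependent_variable", axis=1)
X = independent_variables.to_numpy(dtype=float)  # shape (M, k)
y = df["dependent_variable"].to_numpy(dtype=float)

p0 = rng.normal(size=n_features)
popt, pcov = curve_fit(linear, X.T, y, p0=p0)  # pass xdata as (k, M)

# Closed-form ordinary least squares on the same data, for comparison.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(df.corr()["dependent_variable"].round(2))
print(popt)
print(beta)
```

On synthetic data like this, `popt` and `beta` agree closely and both move well away from `p0`, even though the printed correlations are modest; the sketch cannot say whether the same holds for the real `all_data`.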
Source: Python Questions