SciPy Curve Fitting a Pandas Dataframe

  dataframe, pandas, python, scipy-optimize

I’ve set up a DataFrame with many independent-variable columns, where the last column is the dependent variable. I’m trying to fit a linear equation for the dependent variable in terms of the independent variables. However, SciPy's curve_fit doesn't seem to do anything: it just returns my initial guess for the parameters. The relevant code:

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def linear(independent_variables, *params):
    paramstack = np.hstack(params)

    expected_value = 0
    for index in range(len(independent_variables)):
        try:
            add_value = independent_variables[index] * paramstack[index]
            if type(add_value) == float: #This was just to validate the add_value makes sense
                expected_profit += add_value
        except:
            continue
    return expected_value 
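One way to narrow this down is to call the model function directly before handing it to curve_fit. The sketch below copies the question's `linear` unchanged and calls it on a small hypothetical DataFrame (`toy`, with made-up columns `x1` and `x2`). Because `len()` of a DataFrame counts rows rather than columns, `toy[0]` looks up a column labelled 0 and raises a KeyError, which the bare `except` silently swallows. Even if that lookup succeeded, the `type(add_value) == float` check would reject a Series, and the misspelled `expected_profit` would raise an error that is also swallowed. So the function returns 0 whatever the parameters are.

```python
import numpy as np
import pandas as pd

def linear(independent_variables, *params):
    # Copied from the question, unchanged, to show what it actually returns.
    paramstack = np.hstack(params)
    expected_value = 0
    for index in range(len(independent_variables)):  # len(DataFrame) == number of rows
        try:
            add_value = independent_variables[index] * paramstack[index]
            if type(add_value) == float:
                expected_profit += add_value
        except:
            continue  # every error is silently ignored here
    return expected_value

# Hypothetical toy data standing in for `all_data`.
toy = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [0.5, 0.1, 0.9]})
print(linear(toy, 1.0, 2.0))
print(linear(toy, -5.0, 10.0))
```

Both calls print 0, which means the model output never depends on the parameters.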

df = pd.DataFrame(all_data)

independent_variables = df.drop('dependent_variable', axis=1) # Gets all the independent variables
initial_guess = np.random.randn(independent_variables.shape[1])

c, cov = curve_fit(linear, independent_variables, df['dependent_variable'], p0=initial_guess)
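A minimal sketch of a model function that curve_fit can actually work with, assuming the goal is a linear equation without an intercept (as in the question). The DataFrame is converted to a NumPy array of shape (k, M), one row per independent variable, so the model does not rely on how curve_fit treats DataFrame input. The data, the column names `a`, `b`, `c` and the coefficients `true_coef` are hypothetical and synthetic, chosen only so the result can be checked.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def linear(X, *params):
    # X has shape (k, M): one row per independent variable, one column per sample.
    # Returns the length-M vector sum_i params[i] * X[i] (linear model, no intercept).
    return np.dot(np.asarray(params), X)

rng = np.random.default_rng(0)
n_samples = 200
df = pd.DataFrame({
    "a": rng.normal(size=n_samples),
    "b": rng.normal(size=n_samples),
    "c": rng.normal(size=n_samples),
})
true_coef = np.array([1.5, -2.0, 0.5])  # hypothetical coefficients for the synthetic target
df["dependent_variable"] = (
    df[["a", "b", "c"]].to_numpy() @ true_coef + rng.normal(scale=0.1, size=n_samples)
)

independent_variables = df.drop("dependent_variable", axis=1)
X = independent_variables.to_numpy(dtype=float).T  # shape (k, M)
y = df["dependent_variable"].to_numpy(dtype=float)
initial_guess = rng.normal(size=X.shape[0])

c, cov = curve_fit(linear, X, y, p0=initial_guess)
print(dict(zip(independent_variables.columns, c)))
```

Here the fitted `c` moves away from `initial_guess` and lands close to `true_coef`, because the model output now changes when the parameters change.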

print(initial_guess)
print(c)

The printed values of initial_guess and c are identical.
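Getting back exactly the initial guess is what one would expect whenever the model's output does not depend on its parameters: the finite-difference Jacobian is all zeros, so the default Levenberg-Marquardt solver has no direction to move in. The sketch below reproduces this with a deliberately parameter-independent toy model (`constant_model`, hypothetical) on synthetic data; any OptimizeWarning about the covariance estimate is silenced only to keep the output short.

```python
import warnings

import numpy as np
from scipy.optimize import OptimizeWarning, curve_fit

def constant_model(x, *params):
    # Ignores params entirely, like a model function that always returns 0.
    return np.zeros_like(x)

x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0  # synthetic data with a real linear trend
p0 = np.array([0.3, -1.2])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", OptimizeWarning)
    popt, pcov = curve_fit(constant_model, x, y, p0=p0)

print(p0)
print(popt)
```

The two printed arrays match, mirroring the symptom described above.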
Am I setting this up wrong, or can curve_fit not handle datasets this large? Or is the correlation between my independent variables and the dependent variable just too low? I checked the correlations with:

df.corr()['dependent_variable']

and most of the columns have a correlation of around 0.4.
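Neither dataset size nor a correlation of around 0.4 should stop a linear fit from moving away from the initial guess. Since the model is linear in its parameters, ordinary least squares via `np.linalg.lstsq` (a swapped-in alternative to curve_fit, not what the question uses) solves it directly and needs no initial guess at all. The data below is synthetic and hypothetical: 100,000 rows, four features named `f1` to `f4`, and noise chosen so each feature is only moderately correlated with the target. The appended intercept column is an optional extra that the question's model does not include.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100_000  # a large number of rows is not a problem for a linear solve
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["f1", "f2", "f3", "f4"])
# Hypothetical target: modest signal plus substantial noise.
df["dependent_variable"] = (
    df.to_numpy() @ np.array([0.5, -0.3, 0.4, 0.2]) + rng.normal(scale=1.0, size=n)
)

X = df.drop("dependent_variable", axis=1).to_numpy(dtype=float)  # shape (M, k)
y = df["dependent_variable"].to_numpy(dtype=float)
X1 = np.column_stack([X, np.ones(len(X))])  # optional intercept column

coef, residuals, rank, singular_values = np.linalg.lstsq(X1, y, rcond=None)
print(df.corr()["dependent_variable"])
print(coef)
```

`coef[:-1]` holds the slopes in column order and `coef[-1]` the intercept; the correlations printed first are all well below 1, yet the slopes are still recovered.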

Source: Python Questions
