Why can’t I transfer dataframe values to SQL Server after manipulating dataframe data using iloc

  dataframe, pandas, pyodbc, python, sql-server

I have a csv files that I’m turning into a pandas dataframes (df), then manipulating the values in a couple of columns, and finally exporting the dataframe to a table in SQL Server.

The code is:

import pyodbc
import sys

df = pd.read_csv(f'../data/polreg/{sys.argv[3]}', usecols=[0, 1, 2, 3, 4, 5, 8, 10, 11, 17, 18, 20])

# trim leading and trailing white space from all string type columns
df_obj = df.select_dtypes(['object'])
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())

# drop all non-alphanumeric values at end policy numbers using regex
df['a'].replace(to_replace = '[W^]+$', value='', regex=True, inplace=True)


df.loc[df['b'] == 'rn', ['b']] = None
df.loc[df['b'] == 'cn', ['b']] = df['c']
df.loc[df['b'] != None, ['c']] = None

#define connection string to database and create connection cursor "cur"
conn = pyodbc.connect('Driver={SQL Server};'
                      f'Server={sys.argv[1]};'
                      f'Database={sys.argv[2]};'
                      'Trusted_Connection=yes;')
cur = conn.cursor()

cur.execute('EXEC src.CreateJob;') #Execute stored procedure src.CreateJob

#get most recent jobId value from src.ExtractJob, create new column in csv_df called jobId, fill all values of jobId column with jobId value.
cur.execute('SELECT TOP(1) jobId FROM src.ExtractJob ORDER BY jobId DESC;')
extractJob_info = cur.fetchall()
jobId = extractJob_info[0][0]
df["jobId"] = jobId


# execute src.Update stored procedure for each row in df
Update_query = """EXEC src.Update @a=?, @b=?, @c=?, @d=?, @e=?, @f=?, @g=?, @h=?, @i=?, @j=?, @k=?, @l=?, @jobId=?"""
for index, row in df.iterrows():
    values = (row['a'], row['b'], row['c'], row['d'], row['e'], row['f'],
              row['g'], row['h'], row['i'], row['j'], row['k'], row['l'], jobId)
    cur.execute(Update_query, values)

cur.execute(f'EXEC src.FinishJob @jobID = {jobId}') # execute src.FinishJob sotred procedure

cur.commit()

When I wrote the code originally it worked fine, but then realized I needed to manipulate columns "b" and "c" so I added the 3 lines of code that start with df.loc[…] = value (they are right before I create the connection cursor). When I check the values of df after those 3 lines of code they do exactly what I want them to do so I thought everything was set to go, but then when I run my code I get the error message.

"Traceback (most recent call last):
File "C:" line 48, in
cur.execute(Update_query, values)
pyodbc.ProgrammingError: (‘42000’, ‘[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 13 ("@b"): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)’)"

I can not for the life of me figure out what changed in the dataframe that is making this error occur. If I take out the 3 lines of code stated earlier everything works fine but the values are obviously not right in columns ‘b’ and ‘c’ because I still need to manipulate them. Does using df.loc[…] = value somehow change the indexing of the dataframe? I’m really lost here, any help will be greatly appreciated.

Source: Python Questions

LEAVE A COMMENT