CSV formatting LF to CRLF with Pandas

  csv, excel, pandas, python

I have a CSV file that when I open in notepad, displays as follows:

A,B
C,
D,E,F,G,H

I see that it shows up as Unix (LF) and UTF-8 at the bottom right of the status bar. When I open the file in excel, save it (but without making any changes), and close it, it will convert it to Windows (CRLF) as expected and displays as follows in notepad:

A,B,,,
C,,,,
D,E,F,G,H

The header row is the third row (D,E,F,G,H) and my understanding is that prior to saving, Excel reads the entire CSV file, figures out that the longest row has 4 commas and uses that format throughout the entire file. The problem I’m running into is reading the original LF CSV file into Pandas .read_csv. I think I’ve narrowed down the solution to 2 possible options (but please correct me if i’m wrong):

Option 1: In my main python script, I start with a def function that just iterates through every csv file in a folder where I open, save, and close in order to format it into CRLF prior to working with the csv files in Pandas.

Option 2: Format the csv file upon reading it into Pandas. I feel like this is the better option, especially knowing the number of columns I have and using .read_csv(header = 3) but when I open the output file and run excel formulas, calculation times are insane, even for relatively small files. I have a feeling it’s a datatype issue but I’m still new to all of this. Any clarification or resources are greatly appreciated!

Source: Python Questions

LEAVE A COMMENT