write / append to a very large csv with pandas' to_csv

I am reading one very large CSV in chunks using pandas read_csv with a chunksize set, because the CSV is too large to fit into memory. I am performing transformations on each chunk. I then want to append each transformed chunk to the end of another existing (and very large) CSV.
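
Roughly, the pattern looks like this (a minimal sketch; the file names, chunk size, and the transform() stand-in are placeholders for my actual code):

    import pandas as pd

    def transform(df):
        # placeholder for the per-chunk transformation
        return df

    # stream the input in chunks so the whole file never sits in memory
    for chunk in pd.read_csv("input_large.csv", chunksize=100_000):
        transform(chunk).to_csv(
            "output_large.csv",
            mode="a",      # append to the existing CSV
            header=False,  # the target CSV already has its header row
            index=False,
        )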

I have been running into out-of-memory errors, though. Does pandas to_csv(mode='a', header=False) read the existing CSV into memory in order to append the new chunk? In other words, is the to_csv() call causing my memory errors?

1 answer

  • answered 2018-02-13 01:41 David Zarebski

    I have run into this same issue several times. What you might try is to export your data chunks to several CSV files (without headers) and then concatenate them with a non-pandas function (e.g. writing the lines read from your different CSV files into a single output file).
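
    For example, something along these lines (a sketch only, not tested against your data; the file names and chunk size are placeholders, and the final concatenation uses plain file I/O via shutil.copyfileobj instead of pandas):

        import glob
        import shutil
        import pandas as pd

        # 1) export each (transformed) chunk to its own part file, without headers
        for i, chunk in enumerate(pd.read_csv("input_large.csv", chunksize=100_000)):
            # apply your transformation to `chunk` here before writing
            chunk.to_csv(f"part_{i:05d}.csv", header=False, index=False)

        # 2) concatenate the parts onto the existing CSV with plain file I/O,
        #    streaming bytes so no DataFrame is ever rebuilt in memory
        with open("output_large.csv", "ab") as out:
            for part in sorted(glob.glob("part_*.csv")):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)

    This keeps memory usage flat because each chunk is written and then forgotten, and the concatenation step never parses the CSVs at all.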