Category: feather

I previously worked with .csv files, which were straightforward to upload to GCS. For CSV I do the following, which works:
    blob = bucket.blob(path)
    blob.upload_from_string(dataframe.to_csv(), 'text/csv')
I am trying to do the same, i.e. write the dataframe to the bucket as a .feather file:
    blob = bucket.blob(path)
    blob.upload_from_string(dataframe.reset_index().to_feather(), 'text/feather')
However, this fails saying to_feather() requires a ..


I want to make an easy-to-use class that stores a data frame on disk in chunks and, at the end, automatically merges them all into a single file. I wrote this:
    import pandas
    import atexit
    from pathlib import Path
    import datetime
    import shutil
    class DataFrameDumper:
        def __init__(self, file_path_in_the_end, df):
            self._columns_of_the_df = set(df.columns)
            self._file_path_in_the_end = ..


I am trying to write a Feather file from a data frame extracted from a climate model. I'm having trouble converting the time column when writing the Feather file. Script:
    import nctoolkit as nc
    import feather
    ds = nc.open_data("GFDL_ESM4_hist.nc")
    ds.crop(lat=[-90,-30])
    ds.select(years=range(2010,2015))
    df1 = ds.to_dataframe()
    ds = nc.open_data("GFDL_ESM4_ssp245.nc")
    ds.crop(lat=[-90,-30])
    ds.select(years=range(2015,2022))
    df2 = ds.to_dataframe()
    df = df1.append(df2) ..


In a dashboard built with Plotly Dash, I need to perform an expensive download from the DB only when one component (a DataPicker with the period to consider) is updated, and then reuse the resulting DataFrame in other components (e.g. Dropdowns filtering the DataFrame) without repeating the expensive download. The docs suggest using dash_core_components.Store as Output of ..


I would like to first write a stream into an Arrow file and then later read it back into a pandas dataframe, with as little memory overhead as possible. Writing data in batches works perfectly fine:
    import pyarrow as pa
    import pandas as pd
    import random
    data = [pa.array([random.randint(0, 1000)]), pa.array(['B']), pa.array(['C'])]
    columns = ..


Using apache-arrow js (https://github.com/apache/arrow/tree/master/js), I can read an Arrow file (or even a Feather file) in just a few lines:
    const arrow = fs.readFileSync("test.feather");
    const table = apArrow.Table.from([arrow]);
However, I found that trailing zeros are being removed. In the Python dataframe (original data):
    4.185771942138672,2019-12-02,2019-12-01,0.0
After reading with the arrow-js library:
    4.185771942138672,2019-12-02,2019-12-01,**0**
Is there any way to avoid the ..
