Data Manipulation

ensae_projects.datainc.data_bikes.add_missing_time (df, column, values, delay = 10)

After aggregation, the series is often sparse. This function adds rows for the missing times.
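The idea of filling a sparse time series can be sketched with plain pandas. This is not the implementation of add_missing_time: the helper name fill_missing_times is hypothetical, it leaves out the values parameter, and it assumes the timestamps are unique and already aligned on a regular grid of delay minutes.

```python
import pandas as pd


def fill_missing_times(df, column, delay=10):
    """Hypothetical stand-in: add one row for every `delay`-minute slot
    that is absent from `column`. Other columns are NaN in the new rows."""
    start, end = df[column].min(), df[column].max()
    # Complete, regular grid of timestamps between the first and last one.
    full = pd.date_range(start, end, freq=f"{delay}min")
    out = df.set_index(column).reindex(full)
    out.index.name = column
    return out.reset_index()


df = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:10", "2020-01-01 00:40",
    ]),
    "bikes": [3, 5, 2],
})
filled = fill_missing_times(df, "time")
print(filled)
```

In this example the slots 00:20 and 00:30 are inserted with missing values, so the result has five rows instead of three.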

ensae_projects.datainc.change_encoding_improve (infile, outfile, enc1, enc2 = 'utf-8', process = None, fLOG = noLOG)

Changes the encoding of a text file and removes quotes. By default, process is process_line(); the processing function has access to the distribution of the number of columns seen in the previous lines.
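A minimal sketch of the re-encoding step, using only the standard library. The helper reencode is hypothetical and simplified: its process callback receives only the current line, and the default callback (stripping double quotes) is an assumption about what "removes quotes" means, not the library's process_line().

```python
import os
import tempfile


def reencode(infile, outfile, enc1, enc2="utf-8", process=None):
    """Hypothetical stand-in: read `infile` as `enc1`, apply `process`
    to every line, and write the result to `outfile` as `enc2`."""
    if process is None:
        # Assumed default behaviour: drop double quotes.
        process = lambda line: line.replace('"', "")
    # newline="" keeps the original line endings untouched.
    with open(infile, "r", encoding=enc1, newline="") as fin, \
         open(outfile, "w", encoding=enc2, newline="") as fout:
        for line in fin:
            fout.write(process(line))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "w", encoding="latin-1", newline="") as f:
        f.write('"café"\t"1"\n')
    reencode(src, dst, "latin-1")
    with open(dst, "rb") as f:
        converted = f.read()
```

Processing line by line keeps memory usage flat, which matters for large exports.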

ensae_projects.datainc.data_bikes.df_crossjoin (df1, df2, **kwargs)

Makes a cross join (cartesian product) between two dataframes by using a constant temporary key. Also sets a MultiIndex which is the cartesian product of the indices of the input dataframes. Source: Cross join / cartesian product between pandas DataFrames.
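The constant-key technique described above can be sketched as follows. The name crossjoin_sketch and the temporary column name _tmpkey are chosen for illustration, and the sketch works on copies of the inputs; the library's own code may differ in details.

```python
import pandas as pd


def crossjoin_sketch(df1, df2, **kwargs):
    """Illustrative cross join: merge on a constant temporary key,
    then index the result by the product of both input indices."""
    left = df1.assign(_tmpkey=1)
    right = df2.assign(_tmpkey=1)
    # Every left row matches every right row because the key is constant.
    res = pd.merge(left, right, on="_tmpkey", **kwargs).drop(columns="_tmpkey")
    res.index = pd.MultiIndex.from_product((df1.index, df2.index))
    return res


a = pd.DataFrame({"station": ["A", "B"]})
b = pd.DataFrame({"hour": [0, 1, 2]})
pairs = crossjoin_sketch(a, b)
print(pairs)
```

Since pandas 1.2, pd.merge(a, b, how="cross") produces the same rows without a temporary key, although it does not build the MultiIndex.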

ensae_projects.hackathon.enumerate_json_items (filename, encoding = None, fLOG = noLOG)

Enumerates items from a JSON file or string.
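As an illustration of the "file or string" behaviour, here is a small sketch built on the standard json module. The helper iter_json_items is hypothetical; it loads the whole document in memory, whereas the library function may parse incrementally, and the way it tells a JSON string from a file name (a leading bracket or brace) is an assumption.

```python
import json


def iter_json_items(source, encoding=None):
    """Hypothetical stand-in: yield the elements of a top-level JSON
    array, given either a JSON string or a path to a JSON file."""
    text = source.lstrip()
    if text.startswith("[") or text.startswith("{"):
        # Looks like JSON content rather than a file name.
        data = json.loads(text)
    else:
        with open(source, "r", encoding=encoding) as f:
            data = json.load(f)
    if isinstance(data, list):
        yield from data
    else:
        # A single top-level object is yielded as one item.
        yield data


items = list(iter_json_items('[{"id": 1}, {"id": 2}]'))
print(items)
```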

ensae_projects.datainc.enumerate_text_lines (filename, sep = ' ', encoding = 'utf-8', quotes_as_str = False, header = True, clean_column_name = None, convert_float = False, option = None, skip = 0, take = -1, fLOG = noLOG)

Enumerates all lines from a text file and does some cleaning (see the list of parameters).
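The sketch below shows one plausible reading of the header, skip and take parameters, using the standard csv module. The helper iter_rows is hypothetical: it reads from an open stream rather than a file name, it covers only a few of the parameters listed above, its tab default for sep is an assumption, and it assumes skip counts data rows to ignore and take limits the number of rows returned (-1 meaning no limit).

```python
import csv
import io


def iter_rows(stream, sep="\t", header=True, skip=0, take=-1):
    """Hypothetical stand-in: yield one dict per line when a header is
    present, otherwise one list of strings per line."""
    reader = csv.reader(stream, delimiter=sep)
    names = next(reader, None) if header else None
    for i, row in enumerate(reader):
        if i < skip:
            continue  # ignore the first `skip` data rows
        if take >= 0 and i >= skip + take:
            break  # stop once `take` rows have been produced
        yield dict(zip(names, row)) if header else row


sample = io.StringIO("name\tcount\na\t1\nb\t2\nc\t3\n")
rows = list(iter_rows(sample, skip=1, take=1))
print(rows)
```

Values stay strings here; an option such as convert_float would add a conversion step for each field.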

ensae_projects.datainc.data_geo_streets.shapely_records (filename, **kwargs)

Uses pyshp to return shapes and records from shapefiles.