module datainc.data_cresus
Short summary
module ensae_projects.datainc.data_cresus
Script to process the data from Cresus for the 2016 hackathon.
Functions
function | truncated documentation |
---|---|
prepare_cresus_data | Prepares the data for the challenge. |
process_cresus_sql | Processes the database sent by Cresus and produces a list of flat files. |
process_cresus_whole_process | Processes the database from Cresus up to splitting the data into two sets of files. |
split_train_test_cresus_data | Splits the tables into two sets of tables (based on users). |
split_XY_bind_dataset_cresus_data | Splits XY for the blind set. |
Documentation
Script to process the data from Cresus for the 2016 hackathon.
- ensae_projects.datainc.data_cresus.cresus_dummy_file()
- Returns:
local filename
- ensae_projects.datainc.data_cresus.prepare_cresus_data(dbfile, outfold=None, fLOG=<function fLOG>)
Prepares the data for the challenge.
- Parameters:
dbfile – database file
outfold – output folder
fLOG – logging function
- Returns:
dictionary of table files
- ensae_projects.datainc.data_cresus.process_cresus_sql(infile, out_clean_sql=None, outdb=None, fLOG=<function fLOG>)
Processes the database sent by Cresus and produces a list of flat files.
- Parameters:
infile – dump of an SQL database
out_clean_sql – name of the file that contains the cleaned SQL
outdb – SQLite database file (removed first if it already exists)
fLOG – logging function
- Returns:
dataframe with a list
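The dump-to-flat-files idea can be sketched with the standard library alone. This is a minimal sketch, assuming the dump has already been cleaned into SQL that `sqlite3` accepts (the cleaning step that produces `out_clean_sql` is not reproduced). The helper `dump_to_flat_files`, the tab-separated `.txt` layout, and the toy tables `users` and `budget` are hypothetical, not this module's API or the Cresus schema.

```python
import csv
import os
import sqlite3
import tempfile


def dump_to_flat_files(sql_text, outfold):
    """Hypothetical helper: load a SQL dump into SQLite, export one TSV per table."""
    con = sqlite3.connect(":memory:")
    try:
        # executescript runs every statement contained in the dump.
        con.executescript(sql_text)
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        files = {}
        for name in names:
            cur = con.execute(f'SELECT * FROM "{name}"')
            # cursor.description holds one tuple per column; item 0 is its name.
            header = [d[0] for d in cur.description]
            path = os.path.join(outfold, name + ".txt")
            with open(path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f, delimiter="\t")
                writer.writerow(header)
                writer.writerows(cur)
            files[name] = path
        return files
    finally:
        con.close()


# Toy dump, for illustration only.
dump = """
CREATE TABLE users (id_user INTEGER, age INTEGER);
INSERT INTO users VALUES (1, 30), (2, 45);
CREATE TABLE budget (id_user INTEGER, amount REAL);
INSERT INTO budget VALUES (1, 100.0), (2, 250.5);
"""
files = dump_to_flat_files(dump, tempfile.mkdtemp())
print(sorted(files))  # ['budget', 'users']
```

Using an in-memory database keeps the sketch side-effect free; the documented function instead writes to `outdb` on disk.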
- ensae_projects.datainc.data_cresus.process_cresus_whole_process(infile, outfold, ratio=0.2, fLOG=<function fLOG>)
Processes the database from Cresus end to end, up to splitting the data into two sets of files.
- ensae_projects.datainc.data_cresus.split_XY_bind_dataset_cresus_data(filename, fLOG=<function fLOG>)
Splits XY for the blind set.
- Parameters:
filename – table to split
fLOG – logging function
- Returns:
dictionary of created files
It assumes the target columns are orientation and nature.
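A stdlib-only sketch of the X/Y split for a blind set, assuming each row is a dict and the targets are exactly the columns `orientation` and `nature`. The helper `split_xy`, the optional `key` column (kept on both sides so predictions can be joined back to the hidden targets), and the toy rows are hypothetical; the documented function reads and writes files instead.

```python
TARGETS = ("orientation", "nature")


def split_xy(rows, targets=TARGETS, key=None):
    """Hypothetical helper: split rows into features (X) and targets (Y).

    If ``key`` is given, that column is kept on both sides so X and Y
    can be joined back together after scoring.
    """
    if rows:
        missing = [t for t in targets if t not in rows[0]]
        if missing:
            raise KeyError(f"missing target columns: {missing}")
    X, Y = [], []
    for row in rows:
        # Features: every column except the targets.
        X.append({k: v for k, v in row.items() if k not in targets})
        # Targets: the optional key first, then the target columns.
        y = {key: row[key]} if key is not None else {}
        y.update({t: row[t] for t in targets})
        Y.append(y)
    return X, Y


rows = [
    {"id": 1, "age": 30, "orientation": "A", "nature": "x"},
    {"id": 2, "age": 45, "orientation": "B", "nature": "y"},
]
X, Y = split_xy(rows, key="id")
print(X)  # [{'id': 1, 'age': 30}, {'id': 2, 'age': 45}]
```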
- ensae_projects.datainc.data_cresus.split_train_test_cresus_data(tables, outfold, ratio=0.2, fLOG=<function fLOG>)
Splits the tables into two sets of tables (based on users).
- Parameters:
tables – dictionary of tables, as returned by prepare_cresus_data
outfold – if not None, all tables are written to this folder
fLOG – logging function
- Returns:
pair of dictionaries of table files
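"Based on users" suggests a group split: all rows belonging to a given user land on the same side, so no user appears in both sets. A minimal sketch under stated assumptions: each table is a list of dicts sharing a user-id column (called `id_user` here, a hypothetical name), and `ratio` is read as the fraction of users sent to the test set. The helper `split_by_user` and the `seed` parameter are hypothetical, not this module's API.

```python
import random


def split_by_user(tables, ratio=0.2, user_col="id_user", seed=0):
    """Hypothetical helper: split every table by user so no user is in both sets."""
    # Sorting first makes the shuffle reproducible regardless of set order.
    users = sorted({row[user_col] for rows in tables.values() for row in rows})
    random.Random(seed).shuffle(users)
    n_test = int(round(len(users) * ratio))
    test_users = set(users[:n_test])
    train, test = {}, {}
    for name, rows in tables.items():
        train[name] = [r for r in rows if r[user_col] not in test_users]
        test[name] = [r for r in rows if r[user_col] in test_users]
    return train, test


# Toy tables, for illustration only: 10 users, 25 budget rows.
tables = {
    "users": [{"id_user": i, "age": 20 + i} for i in range(10)],
    "budget": [{"id_user": i % 10, "amount": float(i)} for i in range(25)],
}
train, test = split_by_user(tables, ratio=0.2)
print(len(test["users"]))  # 2
```

Shuffling user ids rather than rows is what prevents leakage: a model evaluated on the test set never sees another row from the same user during training.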