module `datainc.data_cresus`#

Short summary#

module ensae_projects.datainc.data_cresus

Script to process the data from Cresus for the 2016 hackathon.

Functions#

function	truncated documentation
`cresus_dummy_file`
`prepare_cresus_data`	Prepares the data for the challenge.
`process_cresus_sql`	Processes the database sent by Cresus and produces a list of flat files.
`process_cresus_whole_process`	Processes the database from Cresus until it splits the data into two sets of files.
`split_train_test_cresus_data`	Splits the tables into two sets of tables (based on users).
`split_XY_bind_dataset_cresus_data`	Splits XY for the blind set.

Documentation#

Script to process the data from Cresus for the 2016 hackathon.


ensae_projects.datainc.data_cresus.cresus_dummy_file()#

Returns: local filename


ensae_projects.datainc.data_cresus.prepare_cresus_data(dbfile, outfold=None, fLOG=<function fLOG>)#

Prepares the data for the challenge.

Parameters:

dbfile – database file
outfold – output folder
fLOG – logging function

Returns:

dictionary of table files


ensae_projects.datainc.data_cresus.process_cresus_sql(infile, out_clean_sql=None, outdb=None, fLOG=<function fLOG>)#

Processes the database sent by Cresus and produces a list of flat files.

Parameters:

infile – dump of a SQL database
out_clean_sql – name of the file that receives the cleaned SQL
outdb – sqlite3 file (removed if it already exists)
fLOG – logging function

Returns:

dataframe with a list
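The idea of turning a SQL dump into flat files can be illustrated with the standard library alone. The sketch below is not the implementation of `process_cresus_sql`: it assumes the dump is already valid SQLite syntax (the cleaning step behind `out_clean_sql` is skipped), and the helper `dump_to_flat_files`, the sample tables and the tab-separated `.txt` output are all hypothetical choices made for the example.

```python
import csv
import os
import sqlite3
import tempfile


def dump_to_flat_files(sql_dump, out_folder):
    """Load a SQL dump into an in-memory SQLite database and write one
    tab-separated file per table. Returns {table_name: file_path}.

    Hypothetical helper, not part of ensae_projects.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Assumption: the dump only contains statements SQLite understands.
        conn.executescript(sql_dump)
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        files = {}
        for name in names:
            cursor = conn.execute('SELECT * FROM "{}"'.format(name))
            header = [col[0] for col in cursor.description]
            path = os.path.join(out_folder, name + ".txt")
            with open(path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f, delimiter="\t", lineterminator="\n")
                writer.writerow(header)
                writer.writerows(cursor)
            files[name] = path
        return files
    finally:
        conn.close()


# Hypothetical two-table dump, invented for illustration only.
SAMPLE_DUMP = """
CREATE TABLE dossier (id INTEGER, orientation TEXT);
INSERT INTO dossier VALUES (1, 'A');
INSERT INTO dossier VALUES (2, 'B');
CREATE TABLE credit (id INTEGER, dossier_id INTEGER, amount REAL);
INSERT INTO credit VALUES (10, 1, 1500.0);
"""

with tempfile.TemporaryDirectory() as folder:
    flat = dump_to_flat_files(SAMPLE_DUMP, folder)
    table_names = sorted(flat)
    with open(flat["dossier"], encoding="utf-8") as f:
        dossier_lines = f.read().splitlines()
```

A real dump from another database engine would usually need cleaning before SQLite accepts it, which is presumably the role of the `out_clean_sql` parameter documented above.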


ensae_projects.datainc.data_cresus.process_cresus_whole_process(infile, outfold, ratio=0.2, fLOG=<function fLOG>)#

Processes the database from Cresus until it splits the data into two sets of files.


ensae_projects.datainc.data_cresus.split_XY_bind_dataset_cresus_data(filename, fLOG=<function fLOG>)#

Splits XY for the blind set.

Parameters:

filename – table to split
fLOG – logging function

Returns:

dictionary of created files

It assumes the targets are the columns `orientation` and `nature`.
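An X/Y split on the two target columns might look like the sketch below. It is only an illustration: the helper `split_xy`, the tab separator and the sample rows are assumptions, and whether the real function keeps an identifier column in the Y file is not stated in this page.

```python
import csv
import io

# Target columns named in the documentation above.
TARGETS = ("orientation", "nature")


def split_xy(text, targets=TARGETS, sep="\t"):
    """Split a delimited table into (features, targets) as two texts.

    Hypothetical helper, not part of ensae_projects.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    y_idx = [header.index(t) for t in targets]
    x_idx = [i for i in range(len(header)) if i not in y_idx]

    def render(indices):
        # Write the selected columns back to delimited text.
        out = io.StringIO()
        writer = csv.writer(out, delimiter=sep, lineterminator="\n")
        writer.writerow([header[i] for i in indices])
        for row in body:
            writer.writerow([row[i] for i in indices])
        return out.getvalue()

    return render(x_idx), render(y_idx)


# Invented sample table; the column names other than the targets are made up.
sample = "id\tage\torientation\tnature\n1\t34\tA\tN1\n2\t51\tB\tN2\n"
x_text, y_text = split_xy(sample)
```

Here `x_text` holds the features that would be given to participants and `y_text` holds the targets that would be kept aside for scoring the blind set.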


ensae_projects.datainc.data_cresus.split_train_test_cresus_data(tables, outfold, ratio=0.2, fLOG=<function fLOG>)#

Splits the tables into two sets of tables (based on users).

Parameters:

tables – dictionary of tables, see prepare_cresus_data
outfold – if not None, output all tables in this folder
fLOG – logging function

Returns:

couple of dictionaries of table files
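Splitting on users rather than on rows keeps every record of a given user on the same side, so no user leaks from the training set into the test set. The sketch below illustrates that idea only: `split_users`, `split_table`, the `user` key and the sample tables are hypothetical, and it assumes that `ratio` is the fraction of users sent to the test set, which this page does not confirm.

```python
import random


def split_users(user_ids, ratio=0.2, seed=0):
    """Return (train_users, test_users) as two disjoint sets.

    Assumption: ratio is the share of users assigned to the test set.
    """
    users = sorted(set(user_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(users)
    n_test = int(round(len(users) * ratio))
    return set(users[n_test:]), set(users[:n_test])


def split_table(rows, user_key, train_users, test_users):
    """Route each row to the side its user belongs to."""
    train = [r for r in rows if r[user_key] in train_users]
    test = [r for r in rows if r[user_key] in test_users]
    return train, test


# Invented tables: one row per user, and several credit rows per user.
dossiers = [{"user": u} for u in range(10)]
credits = [{"user": i % 10, "amount": 100 * i} for i in range(25)]

train_users, test_users = split_users(r["user"] for r in dossiers)
train_credits, test_credits = split_table(
    credits, "user", train_users, test_users)
```

Applying the same pair of user sets to every table is what keeps the resulting train and test tables consistent with each other.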

