Functions

Summary

function

class parent

truncated documentation

_setup_hook

If this function is added to the module, the help automation and unit tests call it first, before anything goes on …

check

Checks that the library is working and raises an exception if it is not. If you want to disable the logs:

dataframe_hash_columns

Hashes a set of columns in a dataframe. Keeps the same type. Skips missing values.

dataframe_shuffle

Shuffles a dataframe.

dataframe_unfold

One column may contain concatenated values. This function splits these values and multiplies the rows for each split …

dummy_streaming_dataframe

Returns a dummy streaming dataframe mostly for unit test purposes.

enumerate_json_items

Enumerates items from a JSON file or string.

flatten_dictionary

Flattens a dictionary with nested structure to a dictionary with no hierarchy.

hash_float

Hashes a float into a float.

hash_int

Hashes an integer into an integer.

hash_str

Hashes a string.

numpy_types

Returns the list of available numpy types.

pandas_fillna

Replaces nan values with something that is not nan. Mostly used by pandas_groupby_nan().

pandas_groupby_nan

Does a groupby that keeps missing values (nan).

read_zip

Reads a dataframe from a zip file. It can be saved by to_zip().

sklearn_train_test_split

Randomly splits a dataframe into smaller pieces. The function returns streams of file names. The function relies …

sklearn_train_test_split_streaming

Randomly splits a dataframe into smaller pieces. The function returns streams of file names. The function relies …

to_zip

Saves a dataframe into a zip file. It can be read by read_zip().

train_test_apart_stratify

This split is for a specific case where data is linked in one way. Let’s assume we have two ids as we have for online …

train_test_connex_split

This split is for a specific case where data is linked in many ways. Let’s assume we have three ids as we have for …

train_test_split_weights

Splits a database into train/test sets, given that every row can have a different weight.
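The row multiplication described for dataframe_unfold can be illustrated with a minimal sketch. It is not the library's implementation: the helper name unfold_rows_sketch, its parameters, and the choice to represent rows as plain dictionaries (to stay dependency-free) are all assumptions made for illustration.

```python
def unfold_rows_sketch(rows, column, sep=","):
    """Hypothetical helper: split the concatenated values stored in `column`
    and emit one copy of the row per split value."""
    out = []
    for row in rows:
        value = row[column]
        # Only strings are split; any other value (e.g. None) is kept as is.
        parts = value.split(sep) if isinstance(value, str) else [value]
        for part in parts:
            new_row = dict(row)  # copy so the input rows stay untouched
            new_row[column] = part
            out.append(new_row)
    return out


rows = [{"a": 1, "b": "e,f"}, {"a": 2, "b": "g"}, {"a": 3, "b": None}]
unfolded = unfold_rows_sketch(rows, "b")
```

The first row yields two rows, one per split value, while the rows holding a single or missing value pass through unchanged.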
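To make the idea behind flatten_dictionary concrete, here is a small sketch under stated assumptions. The name flatten_dictionary_sketch, the sep parameter, and the rule of joining nested keys with a separator are illustrative choices and may differ from what the library actually does.

```python
def flatten_dictionary_sketch(dico, sep="_"):
    """Hypothetical helper: turn a nested dictionary into a flat one,
    joining the keys along each path with `sep`."""
    flat = {}

    def _walk(node, prefix):
        for key, value in node.items():
            name = f"{prefix}{sep}{key}" if prefix else str(key)
            if isinstance(value, dict):
                _walk(value, name)  # descend into the nested level
            else:
                flat[name] = value  # leaf value: store under the joined key

    _walk(dico, "")
    return flat


nested = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
flat = flatten_dictionary_sketch(nested)
# flat == {"a": 1, "b_c": 2, "b_d_e": 3}
```

Note that in this sketch an empty nested dictionary contributes no key to the result.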
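The hashing helpers (hash_str, hash_int) preserve the type of their input. One possible way to obtain that behaviour is sketched below with the standard hashlib module. The function names, the size parameter, and the use of SHA-256 are assumptions for illustration, not a description of the library's algorithm.

```python
import hashlib


def hash_str_sketch(text, size=10):
    """Hypothetical helper: hash a string into a shorter hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:size]


def hash_int_sketch(value, size=10):
    """Hypothetical helper: hash an integer into an integer
    with at most `size` decimal digits."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % (10 ** size)


hashed_name = hash_str_sketch("alice")
hashed_id = hash_int_sketch(12345)
```

Both helpers are deterministic, so equal inputs always map to equal outputs, which is what allows a hashed column to still be used as a join or grouping key.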
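A weighted split such as the one summarized for train_test_split_weights can be approximated as follows. This is only a sketch: the greedy strategy, the name split_by_weight_sketch, and the parameters test_ratio and seed are assumptions, and the library may balance the weights differently.

```python
import random


def split_by_weight_sketch(rows, weights, test_ratio=0.25, seed=0):
    """Hypothetical helper: shuffle the rows, then greedily move rows into
    the test set while its total weight stays within `test_ratio` of the
    overall weight. Every other row goes to the train set."""
    if len(rows) != len(weights):
        raise ValueError("rows and weights must have the same length")
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    target = test_ratio * sum(weights)
    train, test, accumulated = [], [], 0.0
    for i in order:
        if accumulated + weights[i] <= target:
            test.append(rows[i])
            accumulated += weights[i]
        else:
            train.append(rows[i])
    return train, test


rows = ["a", "b", "c", "d", "e", "f"]
weights = [1, 1, 2, 2, 3, 3]
train, test = split_by_weight_sketch(rows, weights, test_ratio=0.5)
```

The test set never exceeds the requested share of the total weight, but it may fall short of it when the remaining rows are too heavy to fit.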