Hackathon Helpers


Functions about images

ensae_projects.hackathon.image_helper.enumerate_batch_features (folder, batch_or_image = False)

Enumerates all batches saved in a folder.

ensae_projects.hackathon.image_helper.enumerate_image_class (folder, abspath = True, ext = {‘.png’, ‘.jpg’})

Lists all images in a folder, assuming each subfolder indicates the class its images belong to.
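The folder convention described above can be illustrated with the standard library alone. This is a minimal sketch, not the library's code: the name sketch_enumerate_image_class and the (path, class) pair it yields are assumptions, since the return type of enumerate_image_class is not stated here.

```python
import os


def sketch_enumerate_image_class(folder, abspath=True, ext=(".png", ".jpg")):
    """Yield (image_path, class_name) pairs (hypothetical stand-in)."""
    for root, dirs, files in os.walk(folder):
        dirs.sort()  # visit subfolders in a stable order
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in ext:
                continue  # skip files that are not images
            full = os.path.join(root, name)
            # Assumption: the class is the name of the folder holding the image.
            label = os.path.basename(root)
            path = os.path.abspath(full) if abspath else os.path.relpath(full, folder)
            yield path, label
```

With a layout such as cat/a.png and dog/b.jpg, the sketch yields one pair per image, labelled cat and dog respectively.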

ensae_projects.hackathon.image_helper.folder_split_train_test (src_folder, dest_train, dest_test, seed = None, ext = {‘.png’, ‘.jpg’}, test_size = 0.25)

Splits images from a folder into train and test. The function saves images into two separate folders.
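A train/test split of this kind can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the name sketch_split_train_test, the per-file random draw, and the returned dictionary are all hypothetical.

```python
import os
import random
import shutil


def sketch_split_train_test(src_folder, dest_train, dest_test, seed=None,
                            ext=(".png", ".jpg"), test_size=0.25):
    """Copy each image to the train or the test folder (hypothetical stand-in)."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    copied = {"train": [], "test": []}
    for root, dirs, files in os.walk(src_folder):
        dirs.sort()  # stable traversal order, needed for reproducibility
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in ext:
                continue
            rel = os.path.relpath(os.path.join(root, name), src_folder)
            # Assumption: each image goes to test with probability test_size.
            part = "test" if rng.random() < test_size else "train"
            dest = os.path.join(dest_test if part == "test" else dest_train, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy(os.path.join(root, name), dest)
            copied[part].append(rel)
    return copied
```

The sketch keeps the subfolder of every image, so the class layout survives in both destination folders. The destination folders should not be located inside src_folder.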

ensae_projects.hackathon.image_helper.histogram_image_size (folder, ext = {‘.png’, ‘.jpg’})

Computes the distribution of image sizes.

ensae_projects.hackathon.image_helper.img2gray (img, mode = ‘L’)

Converts an image (PIL) to grayscale.

ensae_projects.hackathon.image_helper.image_zoom (img, new_size, kwargs)

Resizes an image (from PIL).

ensae_projects.hackathon.image_helper.last_element (iter)

Returns the last element of a sequence, assuming its elements are produced by an iterator or a generator.
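Because an iterator cannot be indexed, the last element has to be found by consuming it. A minimal sketch, assuming a hypothetical name sketch_last_element; raising ValueError on an empty input is an assumption, the library's behaviour in that case is not stated here.

```python
def sketch_last_element(iterable):
    """Return the last item produced by an iterable (hypothetical stand-in)."""
    found = False
    last = None
    for item in iterable:
        last = item  # keep only the most recent item, never the whole sequence
        found = True
    if not found:
        # Assumption: an empty input is treated as an error.
        raise ValueError("the iterable produced no element")
    return last
```

The loop holds a single element in memory at any time, which matters when the generator streams many images.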

ensae_projects.hackathon.image_helper.load_batch_features (batch_file)

Loads a batch file saved by stream_image2features.

ensae_projects.hackathon.image_helper.plot_gallery_random_images (folder, n = 12, seed = None, ext = {‘.png’, ‘.jpg’}, kwargs)

Plots a gallery of images using matplotlib. Extracts a random sample from a folder which contains many images. Relies on function enumerate_image_class. Calls plot_gallery_images to build the gallery.

ensae_projects.hackathon.image_helper.read_image (filename_or_bytes)

Reads an image.

ensae_projects.hackathon.image_helper.stream_apply_image_transform (src_folder, dest_folder, transform, ext = {‘.png’, ‘.jpg’}, fLOG = None)

Applies a transform to every image in a folder and saves the result in another folder, keeping the same subfolders.

ensae_projects.hackathon.image_helper.stream_copy_images (src_folder, dest_folder, valid, ext = {‘.png’, ‘.jpg’}, fLOG = None)

Copies all images from src_folder to dest_folder if valid(name) is True.

ensae_projects.hackathon.image_helper.stream_download_images (urls, dest_folder, fLOG = None, use_request = None, skipif_done = True, dummys = None, skip = 0)

Downloads images based on their URLs.

ensae_projects.hackathon.image_helper.stream_image2features (src_folder, dest_folder, transform, batch_size = 1000, prefix = ‘batch’, ext = {‘.png’, ‘.jpg’}, fLOG = None)

Considers all images in a folder, transforms them into features (function transform) and saves them with pickle into numpy arrays by batch.
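The batching idea can be sketched with pickle alone. This is an illustration, not the library's code: the names sketch_save_feature_batches and sketch_load_batch and the file naming scheme are assumptions, and plain Python lists stand in for the numpy arrays mentioned above so that the sketch needs only the standard library.

```python
import os
import pickle


def _dump_batch(batch, dest_folder, prefix, index):
    """Pickle one batch into a numbered file (file naming is an assumption)."""
    path = os.path.join(dest_folder, "%s%05d.pkl" % (prefix, index))
    with open(path, "wb") as f:
        pickle.dump(batch, f)
    return path


def sketch_save_feature_batches(items, transform, dest_folder,
                                batch_size=1000, prefix="batch"):
    """Transform items one by one and save them by batch (hypothetical stand-in)."""
    os.makedirs(dest_folder, exist_ok=True)
    written, batch = [], []
    for item in items:
        batch.append(transform(item))
        if len(batch) == batch_size:
            written.append(_dump_batch(batch, dest_folder, prefix, len(written)))
            batch = []
    if batch:  # last, possibly incomplete, batch
        written.append(_dump_batch(batch, dest_folder, prefix, len(written)))
    return written


def sketch_load_batch(path):
    """Load one batch written by sketch_save_feature_batches."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only one batch is held in memory at a time, so the folder can contain more images than would fit in memory at once.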

ensae_projects.hackathon.image_helper.stream_random_sample (folder, n = 1000, seed = None, abspath = True, ext = {‘.png’, ‘.jpg’})

Extracts a random sample from a folder which contains many images. Relies on function enumerate_image_class.
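One common way to draw a fixed-size sample from a stream of unknown length is reservoir sampling. Whether stream_random_sample uses this technique is not stated here; the sketch below, with the hypothetical name sketch_reservoir_sample, only illustrates the idea on any iterable.

```python
import random


def sketch_reservoir_sample(items, n=1000, seed=None):
    """Draw n items uniformly from an iterable in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < n:
            sample.append(item)  # fill the reservoir with the first n items
        else:
            # Item i replaces a random slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                sample[j] = item
    return sample
```

If the stream holds fewer than n items, the sketch returns all of them in their original order.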

Some of these functions are used in notebook Image et doublons. Many examples can be found in unit test test_image.py.

Functions or classes to analyse

ensae_projects.hackathon.image_knn.ImageNearestNeighbors (self, transform = ‘gray’, image_size = (10, 10), kwargs)

Builds a model on top of NearestNeighbors to find close images.

Functions about performance

ensae_projects.hackathon.perf2018.MLStoragePerf2018 (self, storage, examples, cache_file = ‘cache_file.csv’)

Computes the performance for a hackathon.

ensae_projects.hackathon.perf2018.MLStoragePerf2018Image (self, storage, examples, cache_file = ‘cache_file.csv’)

Overloads compute_perf for images. Example of use…


ensae_projects.hackathon.extract_images_from_json_2017 (filename, encoding = None, fLOG = noLOG)

Extracts fields, such as images, from a JSON file.

ensae_projects.hackathon.resize_image (filename_or_bytes, maxdim = 512, dest = None, format = None)

Resizes an image by repeatedly halving its dimensions until one of them becomes smaller than maxdim.
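The arithmetic behind this rule can be sketched without loading any image. The name sketch_halved_size is hypothetical, and the stopping condition follows the description literally (stop as soon as the smaller dimension drops below maxdim); the library may round or stop differently.

```python
def sketch_halved_size(width, height, maxdim=512):
    """Halve both dimensions until one of them is smaller than maxdim."""
    if maxdim < 1:
        raise ValueError("maxdim must be at least 1")
    while min(width, height) >= maxdim:
        # Assumption: integer division, both dimensions halved together.
        width, height = width // 2, height // 2
    return width, height
```

For a 4000 x 3000 image and maxdim = 512, the sizes go through 2000 x 1500 and 1000 x 750 before the sketch stops at 500 x 375.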


ensae_projects.ml.competitions.AUC (answers, scores)

Computes the AUC.
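As a reminder of what the metric measures, the AUC can be written as the probability that a positive example receives a higher score than a negative one. The sketch below assumes binary answers (1 for positive, anything else for negative) and uses a hypothetical name, sketch_auc; it is not the library's implementation, and it runs in quadratic time.

```python
def sketch_auc(answers, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(answers, scores) if a == 1]
    neg = [s for a, s in zip(answers, scores) if a != 1]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count for half
    return wins / (len(pos) * len(neg))
```

With answers [1, 0, 1, 0] and scores [0.9, 0.1, 0.4, 0.6], three of the four (positive, negative) pairs are ranked correctly, so the sketch returns 0.75.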

ensae_projects.ml.competitions.AUC_multi (answers, scores, ignored = None)

Computes the AUC.

ensae_projects.ml.competitions.AUC_multi_multi (nb, answers, scores, ignored = None)

Computes the AUC.


ensae_projects.datainc.change_encoding (infile, outfile, enc1, enc2 = ‘utf-8’, process = None, fLOG = noLOG)

Changes the encoding of a text file and removes quotes. By default process is process_line().
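Re-encoding a text file line by line can be sketched with the built-in open. This is an illustration only: the name sketch_change_encoding is hypothetical, process is assumed to take one line and return one line, and the default shown here (dropping double quotes) merely stands in for process_line(), whose exact behaviour is not described in this page.

```python
def sketch_change_encoding(infile, outfile, enc1, enc2="utf-8", process=None):
    """Rewrite infile (encoding enc1) into outfile (encoding enc2)."""
    if process is None:
        # Assumption: the default processing removes double quotes.
        def process(line):
            return line.replace('"', "")
    with open(infile, "r", encoding=enc1) as src, \
            open(outfile, "w", encoding=enc2, newline="") as dst:
        for line in src:  # one line at a time, the file is never fully loaded
            dst.write(process(line))
```

Reading and writing in a single pass keeps memory use constant, which suits large text dumps.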

ensae_projects.datainc.change_encoding_improve (infile, outfile, enc1, enc2 = ‘utf-8’, process = None, fLOG = noLOG)

Changes the encoding of a text file and removes quotes. By default process is process_line(), but the function has access to the distribution of the number of columns in the previous lines.

ensae_projects.datainc.clean_column_name_sql_dump (i, line, hist, sep = ‘;’)

Removes quotes in a line which looks like…

ensae_projects.datainc.convert_dates (sd, option = None, exc = False)

Converts a string into a date.

ensae_projects.datainc.enumerate_text_lines (filename, sep = ‘ ‘, encoding = ‘utf-8’, quotes_as_str = False, header = True, clean_column_name = None, convert_float = False, option = None, skip = 0, take = -1, fLOG = noLOG)

Enumerates all lines from a text file and does some cleaning (see the list of parameters).