module hackathon.web_search_helper#

Short summary#

module ensae_projects.hackathon.web_search_helper

Helpers for the 2018 hackathon related to searching the internet.

source on GitHub

Functions#

function – truncated documentation
  • extract_bing_result – Extract the first results from a search page assuming it comes from Bing Image.
  • query_bing_image – Returns the search page from Bing Image for a specific query.

Documentation#

ensae_projects.hackathon.web_search_helper.extract_bing_result(search_page, filter_fct=<function <lambda>>)

Extract the first results from a search page assuming it comes from Bing Image.

Parameters:
  • search_page – content of Bing Image search page (or filename)

  • filter_fct – predicate applied to each URL, filter(u) --> True or False; URLs for which it returns False are removed

Returns:

a list of URLs
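
The role of filter_fct can be illustrated with a minimal sketch. The predicate keep_jpeg, the helper apply_filter and the sample URLs are hypothetical names introduced here for illustration; apply_filter only stands in for the filtering step described above and is not the library's implementation.

```python
from urllib.parse import urlparse


def keep_jpeg(url):
    """Hypothetical predicate: keep only URLs whose path ends in .jpg or .jpeg."""
    path = urlparse(url).path.lower()
    return path.endswith((".jpg", ".jpeg"))


def apply_filter(urls, filter_fct):
    """Stand-in for the filtering step: drop URLs for which filter_fct is False."""
    return [u for u in urls if filter_fct(u)]


# Made-up URLs, used only to exercise the predicate.
sample_urls = [
    "https://example.com/cat.jpg",
    "https://example.com/page.html",
    "https://example.com/dog.JPEG?size=large",
]
filtered = apply_filter(sample_urls, keep_jpeg)
print(filtered)

# With the package installed, the same predicate would presumably be passed as:
# from ensae_projects.hackathon.web_search_helper import extract_bing_result
# urls = extract_bing_result(search_page, filter_fct=keep_jpeg)
```

Parsing the URL before testing the extension keeps a query string such as ?size=large from hiding the file type.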

ensae_projects.hackathon.web_search_helper.query_bing_image(query, folder_cache='cache_search_page', filter_fct=<function <lambda>>, add_options=False, use_selenium=False, navigator=None, fLOG=None)

Returns the search page from Bing Image for a specific query.

Parameters:
  • query – search query

  • folder_cache – folder used to store the result page, or to retrieve it if the query was already searched for

  • filter_fct – predicate applied to each URL, filter(u) --> True or False; URLs for which it returns False are removed

  • add_options – add options to the search url

  • use_selenium – if True, relies on webhtml

  • navigator – see webhtml

  • fLOG – logging function

Returns:

list of URLs
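
The caching behaviour described for folder_cache can be sketched as follows. This is an illustration under stated assumptions, not the library's code: cached_page and fake_fetch are hypothetical names, and naming the cached file after a SHA-256 hash of the query is a choice made for this sketch only.

```python
import hashlib
import os
import tempfile


def cached_page(query, fetch, folder_cache="cache_search_page"):
    """Return the page for `query`, reading it from folder_cache when present.

    `fetch` is a hypothetical callable that takes the query and returns page text.
    The file naming scheme (hash of the query) is an assumption of this sketch.
    """
    os.makedirs(folder_cache, exist_ok=True)
    name = hashlib.sha256(query.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(folder_cache, name)
    if os.path.exists(path):
        # The query was already searched for: reuse the stored page.
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    page = fetch(query)
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return page


calls = []


def fake_fetch(query):
    """Fake downloader that records each call instead of using the network."""
    calls.append(query)
    return "<html>results for " + query + "</html>"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_page("blue car", fake_fetch, folder_cache=tmp)
    second = cached_page("blue car", fake_fetch, folder_cache=tmp)

print(first == second, len(calls))
```

The second call is served from the folder, so the fake downloader runs only once for the repeated query.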