module td_1a.discours_politique
Short summary
module ensae_teaching_cs.td_1a.discours_politique
Retrieves political speeches from the Internet.
Functions
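function | truncated documentation |
---|---|
enumerate_speeches_from_elysees | Enumerates speeches from the Élysée website. |
force_unicode | Deals with Unicode text. |
get_elysee_speech_from_elysees | Retrieves the text from the Élysée website. |
html_unescape | Removes HTML or XML character references and entities from a text string. keep |
remove_accent | Replaces French accented letters with unaccented letters. |
xmlParsingLongestDiv | Extracts the longest div section. |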
Documentation
Retrieves political speeches from the Internet.
- ensae_teaching_cs.td_1a.discours_politique.enumerate_speeches_from_elysees(url='agenda', skip=0)
Enumerates speeches from the Élysée website.
- Parameters
url – subaddress; the source URL is
'https://www.elysee.fr/' + url
skip – number of items to skip at the start of the list
- Returns
an enumeration of dictionaries
Retrieving speeches of the President of the Republic
for i, disc in enumerate(enumerate_speeches_from_elysees()):
    print(disc)
Other links can be used, such as
https://www.elysee.fr/recherche?query=discours
. The website changed in 2018 and no longer supports XML or JSON streams.
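As a rough illustration of the two parameters, the sketch below composes the source URL from a subaddress and skips the leading items of an iterable. The helpers `source_url` and `skip_first` are hypothetical names introduced here, not part of the module, and the real function may apply `skip` differently.

```python
from itertools import islice

BASE_URL = "https://www.elysee.fr/"  # prefix quoted in the parameter description


def source_url(url="agenda"):
    """Hypothetical helper: build the page address from a subaddress."""
    return BASE_URL + url


def skip_first(items, skip=0):
    """Hypothetical helper: drop the first `skip` items of any iterable."""
    return islice(items, skip, None)


print(source_url())                               # https://www.elysee.fr/agenda
print(list(skip_first(["a", "b", "c"], skip=1)))  # ['b', 'c']
```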
- ensae_teaching_cs.td_1a.discours_politique.force_unicode(text)
Deals with Unicode text.
- Parameters
text – text
- Returns
text
- ensae_teaching_cs.td_1a.discours_politique.get_elysee_speech_from_elysees(title, url='https://www.elysee.fr/')
Retrieves the text from the Élysée website.
- Parameters
title – title of the document
url – website
- Returns
HTML page
The function tries something like:
url + title.replace(" ","-")
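The address rule quoted above can be sketched in isolation. `build_speech_url` is a hypothetical helper written for this illustration; the documentation only says the function "tries something like" this expression, so the real lookup may involve more steps.

```python
def build_speech_url(title, url="https://www.elysee.fr/"):
    """Hypothetical helper applying the rule: url + title.replace(" ", "-")."""
    return url + title.replace(" ", "-")


print(build_speech_url("discours du president"))
# https://www.elysee.fr/discours-du-president
```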
- ensae_teaching_cs.td_1a.discours_politique.html_unescape(text)
Removes HTML or XML character references and entities from a text string. Keeps &, > and < in the source code. From Fredrik Lundh.
- Parameters
text – text
- Returns
cleaned text
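A minimal sketch of this kind of unescaping, using only the standard library. It assumes that "keeps &, > and <" means the named entities `amp`, `gt` and `lt` stay escaped; that reading is an assumption, and `unescape_sketch` is a hypothetical name, not the module's code.

```python
import re
from html.entities import name2codepoint

# Assumption: these named entities are left escaped in the output.
KEEP = {"amp", "gt", "lt"}


def unescape_sketch(text):
    """Resolve character references and named entities, except those in KEEP."""

    def fix(match):
        ref = match.group(0)
        body = ref[1:-1]  # strip the leading "&" and trailing ";"
        if body.startswith("#"):
            # Numeric reference, decimal (&#233;) or hexadecimal (&#xE9;).
            try:
                if body[1:2] in ("x", "X"):
                    return chr(int(body[2:], 16))
                return chr(int(body[1:]))
            except (ValueError, OverflowError):
                return ref  # malformed reference: leave untouched
        if body in KEEP or body not in name2codepoint:
            return ref  # kept or unknown entity: leave untouched
        return chr(name2codepoint[body])

    return re.sub(r"&#?\w+;", fix, text)


print(unescape_sketch("caf&eacute; &amp; th&#233; &lt;b&gt;"))
# café &amp; thé &lt;b&gt;
```

When every entity should be resolved, including `&amp;`, the standard library function `html.unescape` does that directly.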
- ensae_teaching_cs.td_1a.discours_politique.remove_accent(text)
Replaces French accented letters with unaccented letters.
- Parameters
text – text
- Returns
cleaned text
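One common way to strip accents is Unicode decomposition, sketched below with the standard library. This is an illustration of the idea only: the module may rely on a different technique (for example an explicit replacement table), and `strip_accents` is a hypothetical name.

```python
import unicodedata


def strip_accents(text):
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_accents("Élysée, déjà, garçon"))
# Elysee, deja, garcon
```

Ligatures such as œ have no Unicode decomposition, so this sketch leaves them unchanged.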
- ensae_teaching_cs.td_1a.discours_politique.xmlParsingLongestDiv(text)
Extracts the longest div section.
- Parameters
text – text of HTML page
- Returns
text
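The idea of picking the longest div can be sketched with the standard library HTML parser. The sketch assumes that text is attributed to the innermost enclosing div only and that "longest" means the most characters of text; both are assumptions, and `LongestDivParser` and `longest_div_text` are hypothetical names rather than the module's implementation.

```python
from html.parser import HTMLParser


class LongestDivParser(HTMLParser):
    """Collect the text of each <div> and remember the longest one."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one list of text chunks per currently open <div>
        self.longest = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append([])

    def handle_data(self, data):
        # Assumption: text belongs to the innermost open <div> only.
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            text = "".join(self.stack.pop()).strip()
            if len(text) > len(self.longest):
                self.longest = text


def longest_div_text(html):
    parser = LongestDivParser()
    parser.feed(html)
    parser.close()
    return parser.longest


page = ("<html><body><div>menu</div>"
        "<div><p>Mes chers compatriotes, bonsoir.</p></div>"
        "<div>ok</div></body></html>")
print(longest_div_text(page))
# Mes chers compatriotes, bonsoir.
```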