module filehelper.pig_helper
Short summary
module pyenbc.filehelper.pig_helper
Hadoop uses a Java implementation of Python: Jython. This module provides helpers around that.
Functions
function | truncated documentation |
---|---|
download_pig_standalone | Downloads the standalone :epkg:`jython`. If it does not exist, we should version |
get_hadoop_jars | Returns the list of jars to include into the command line in order to run :epkg:`HADOOP`. |
get_hadoop_path | This function assumes a folder pig |
get_pig_jars | Returns the list of jars to include into the command line in order to run :epkg:`PIG`. |
get_pig_path | This function assumes a folder pig |
run_pig | Runs a :epkg:`pig` script and returns the standard output and error. |
Documentation
Hadoop uses a Java implementation of Python: Jython. This module provides helpers around that.
New in version 1.1.
- pyenbc.filehelper.pig_helper.download_pig_standalone(pig_version='0.17.0', hadoop_version='3.3.0', fLOG=<function noLOG>)
Downloads the standalone :epkg:`jython`. If it does not exist, version HADOOP_VERSION should be used by default in order to fit the cluster’s version.
- Parameters:
pig_version – Pig version
hadoop_version – Hadoop version
fLOG – logging function
- Returns:
location
This function might need to be run twice if the first try fails; the failure may be due to very long paths when unzipping the downloaded file.
:epkg:`Hadoop` is downloaded from one of the websites referenced at Apache Software Foundation. Check the source to see which one was chosen.
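Because the function might need to be run twice, a caller could wrap it in a small retry helper. The sketch below is not part of pyenbc: `call_with_retry` is a hypothetical name, and it assumes that a failed download is signalled by an exception, which the documentation above does not state.

```python
def call_with_retry(func, attempts=2, **kwargs):
    """Call func(**kwargs), retrying on any exception.

    Hypothetical helper, not part of pyenbc. It assumes that a failure
    raises an exception and that calling the function again is safe.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func(**kwargs)
        except Exception as exc:  # broad on purpose: the failure type is unknown
            last_error = exc
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With pyenbc installed, it could be used as `call_with_retry(download_pig_standalone, pig_version='0.17.0', hadoop_version='3.3.0')`.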
- pyenbc.filehelper.pig_helper.get_hadoop_jars()
Returns the list of jars to include into the command line in order to run :epkg:`HADOOP`.
- Returns:
list of jars
- pyenbc.filehelper.pig_helper.get_hadoop_path()
This function assumes a folder hadoopjar is present in this directory and returns that folder.
- Returns:
absolute path
- pyenbc.filehelper.pig_helper.get_pig_jars()
Returns the list of jars to include into the command line in order to run :epkg:`PIG`.
- Returns:
list of jars
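The jar lists returned by `get_hadoop_jars` and `get_pig_jars` are meant for a command line. One common way to hand several jars to a Java process is a single classpath string joined with the platform separator. The sketch below illustrates that idea only: `build_classpath` is a hypothetical helper, and how pyenbc itself assembles the command line is not documented here.

```python
import os


def build_classpath(*jar_lists):
    """Merge several lists of jar paths into one classpath string.

    Hypothetical helper, not part of pyenbc. Duplicates are dropped
    and the order of first occurrence is kept.
    """
    seen = []
    for jars in jar_lists:
        for jar in jars:
            if jar not in seen:
                seen.append(jar)
    # os.pathsep is ':' on POSIX systems and ';' on Windows.
    return os.pathsep.join(seen)
```

With pyenbc installed, the two lists could be combined as `build_classpath(get_pig_jars(), get_hadoop_jars())`.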
- pyenbc.filehelper.pig_helper.get_pig_path()
This function assumes a folder pigjar is present in this directory and returns that folder.
- Returns:
absolute path
- pyenbc.filehelper.pig_helper.run_pig(pigfile, argv=None, pig_path=None, hadoop_path=None, jython_path=None, timeout=None, logpath='logs', pig_version='0.17.0', hadoop_version='3.3.0', jar_no_hadoop=True, fLOG=<function noLOG>)
Runs a :epkg:`pig` script and returns the standard output and error.
- Parameters:
pigfile – pig file
argv – arguments to send to the command line
pig_path – path to pig 0.XX.0
hadoop_path – path to hadoop
timeout – timeout
logpath – path to the logs
pig_version – PIG version (if pig_path is not defined)
hadoop_version – Hadoop version (if hadoop_path is not defined)
jar_no_hadoop – use :epkg:`pig` without :epkg:`hadoop`
fLOG – logging function
- Returns:
out, err
If pig_path is None, the function looks into this directory.
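A typical call passes the script path and unpacks the two returned values. The sketch below wraps that pattern. `run_and_report` is a hypothetical helper that receives the runner as an argument, so it can be tried without a Pig installation. It assumes that `out` and `err` are strings, and it only flags a non-empty `err` rather than treating it as a failure, since Pig may write ordinary log messages to standard error.

```python
def run_and_report(runner, pigfile, **options):
    """Run a Pig script through `runner` and summarize the result.

    `runner` is expected to behave like run_pig: it takes the script
    path plus keyword options and returns (out, err) as strings.
    Hypothetical helper, not part of pyenbc.
    """
    out, err = runner(pigfile, **options)
    return {
        "pigfile": pigfile,
        "stdout_lines": out.splitlines(),
        "has_stderr": bool(err.strip()),
        "stderr": err,
    }
```

With pyenbc installed, it could be called as `run_and_report(run_pig, "script.pig", logpath="logs")`, where `script.pig` is a placeholder file name.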