module remote.ssh_remote_connection

Inheritance diagram of pyenbc.remote.ssh_remote_connection

Short summary

module pyenbc.remote.ssh_remote_connection

A class that helps connect to a remote machine and send command lines.

source on GitHub

Classes

  • ASSHClient – A simple class to access a remote machine through SSH. It requires modules paramiko, …

Static Methods

  • _get_out_format – Returns a function which converts an ANSI string into a different format.
  • build_command_line_parameters – builds a string for pig based on the parameters in params
  • parse_lsout – parses the output of a command ls

Methods

  • __init__ – constructor
  • __str__ – usual
  • close – closes the connection
  • close_session – closes a session
  • connect – connect
  • dfs_exists – tells if a file exists on the cluster
  • dfs_ls – returns the content of a folder on the cluster as a DataFrame
  • dfs_mkdir – creates a directory on the cluster
  • dfs_rm – removes a file on the cluster
  • download – downloads a file from the remote machine (not on the cluster)
  • download_cluster – downloads a file directly from the cluster to the local machine
  • execute_command – executes a command line, it raises an error if there is an error
  • exists – tells if a file exists on the bridge
  • hive_submit – submits a Hive script or query, it first uploads the script to the default folder and submits it
  • ls – returns the content of a folder on the bridge as a DataFrame
  • open_session – Opens a session with method invoke_shell. …
  • pig_submit – Submits a PIG script, it first uploads the script to the default folder and submits it.
  • send_recv_session – Sends something through a session, the function is supposed to return when the execution of the given command is done, …
  • upload – uploads a file to the remote machine (not on the cluster)
  • upload_cluster – the function directly uploads the file to the cluster, it first goes to the bridge, uploads it to the cluster and …

Documentation

class pyenbc.remote.ssh_remote_connection.ASSHClient(server, username, password)

Bases: object

A simple class to access a remote machine through SSH. It requires the modules paramiko, pycrypto and ecdsa.

This class is used in magic command remote_open. On Windows, the installation of pycrypto can be tricky. See Pycrypto on Windows. Those modules are part of the Anaconda distribution.

__init__(server, username, password)

constructor

Parameters:
  • server – server

  • username – username

  • password – password

__str__()

usual

_allowed_form = {None: None, 'plain': None, 'html': None}
static _get_out_format(format)

Returns a function which converts an ANSI string into a different format.

Parameters:

format – string

Returns:

function
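To illustrate what such a converter can look like, here is a minimal sketch for the plain format only. It is not the library's implementation: the module relies on ansiconv and ansi2html, whereas this sketch swaps in a regular expression that removes CSI escape sequences. The name get_out_format_sketch is hypothetical.

```python
import re

# CSI escape sequence: ESC [ + parameter bytes + intermediate bytes + final byte
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def get_out_format_sketch(format=None):
    """Hypothetical stand-in: returns a function converting an ANSI string."""
    if format is None:
        # No conversion requested: the text is returned unchanged.
        return lambda text: text
    if format == "plain":
        # Strip colour and cursor escape sequences.
        return lambda text: _ANSI_CSI.sub("", text)
    # "html" is deliberately left out of this sketch.
    raise ValueError("format not handled by this sketch: %r" % format)


to_plain = get_out_format_sketch("plain")
print(to_plain("\x1b[01;34mdata\x1b[0m  notes.txt"))
```

The returned function can then be applied to every chunk of output received from a session.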

static build_command_line_parameters(params, command_name='-param')

builds a string for pig based on the parameters in params

Parameters:
  • params – dictionary

  • command_name – -param or -hiveconf

Returns:

string

New in version 1.1.
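As an illustration, a dictionary of parameters could be turned into a command-line fragment as follows. This is a sketch under assumptions: the function name build_params_sketch is hypothetical, and the exact quoting and spacing used by the library may differ. It only relies on the fact that Pig accepts -param name=value and Hive accepts -hiveconf name=value.

```python
def build_params_sketch(params, command_name="-param"):
    """Hypothetical helper: one '<command_name> key=value' group per entry."""
    if not params:
        return ""
    # Values are inserted as-is; values containing spaces would need quoting.
    return " ".join(
        "{0} {1}={2}".format(command_name, key, value)
        for key, value in params.items()
    )


print(build_params_sketch({"UTT": "2014", "CITY": "paris"}))
print(build_params_sketch({"year": 2014}, command_name="-hiveconf"))
```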

close()

close the connection

close_session()

close a session

connect()

dfs_exists(path)

tells if a file exists on the cluster

Parameters:

path – path

Returns:

boolean

New in version 1.1.

dfs_ls(path)

return the content of a folder on the cluster as a DataFrame

Parameters:

path – path on the cluster

Returns:

DataFrame

New in version 1.1.

dfs_mkdir(path)

creates a directory on the cluster

Parameters:

path – path

New in version 1.1.

dfs_rm(path, recursive=False)

removes a file on the cluster

Parameters:
  • path – path

  • recursive – boolean

New in version 1.1.

download(remotepath, localpath)

download a file from the remote machine (not on the cluster)

Parameters:
  • localpath – local file

  • remotepath – remote file (it can be a list, localpath is a folder in that case)

Changed in version 1.1: remotepath can be a list of paths

download_cluster(remotepath, localpath, merge=False)

download a file directly from the cluster to the local machine

Parameters:
  • localpath – local file

  • remotepath – remote file (it can be a list, localpath is a folder in that case)

  • merge – True to use getmerge instead of get

New in version 1.1.

execute_command(command, no_exception=False, fill_stdin=None)

Executes a command line. It raises an exception if an error occurs, unless no_exception is True.

Parameters:
  • command – command

  • no_exception – if True, do not raise any exception

  • fill_stdin – data to send on the stdin input

Returns:

stdout, stderr

Example of commands:

ssh.execute_command("ls")
ssh.execute_command("hdfs dfs -ls")
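The contract described above (return stdout and stderr, raise unless no_exception is set, optionally feed stdin) can be tried without a remote machine. The sketch below is a local analogue built on subprocess, not the SSH implementation. The name run_local_sketch is hypothetical, and treating a non-zero exit code as the error condition is an assumption; the library may use a different criterion.

```python
import subprocess
import sys


def run_local_sketch(command, no_exception=False, fill_stdin=None):
    """Local analogue of the documented contract (hypothetical helper)."""
    proc = subprocess.run(
        command, input=fill_stdin, capture_output=True, text=True
    )
    # Assumption: a non-zero exit code is what counts as an error here.
    if proc.returncode != 0 and not no_exception:
        raise RuntimeError(
            "command failed (%d): %s" % (proc.returncode, proc.stderr)
        )
    return proc.stdout, proc.stderr


# The child process reads one line from stdin and prints it in upper case.
out, err = run_local_sketch(
    [sys.executable, "-c", "print(input().upper())"], fill_stdin="hello\n"
)
print(out.strip())
```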

exists(path)

tells if a file exists on the bridge

Parameters:

path – path

Returns:

boolean

New in version 1.1.

hive_submit(hive_file_or_query, params=None, redirection='redirection.hive', no_exception=True, fLOG=<function noLOG>)

Submits a Hive script or query. It first uploads the script to the default folder, then submits it.

Parameters:
  • hive_file_or_query – Hive script (local file) or query

  • params – parameters to send to the job

  • redirection – string empty or not

  • no_exception – sent to execute_command

  • fLOG – logging function

Returns:

out, err from execute_command

If redirection is not empty, the job is submitted in the background and the function returns without waiting for it to finish. The standard output and the standard error are redirected to redirection.hive.out and redirection.hive.err.

The function executes the command line:

hive -f <filename>

Or:

hive -e <query>

With redirection:

hive -execute -f <filename> 2> redirection.hive.err 1> redirection.hive.out &

If there is no redirection, the function waits and returns the output.

Submit a HIVE query

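from pyenbc.remote.ssh_remote_connection import ASSHClient

# the constructor requires the server, the login and the password
client = ASSHClient("<server>", "<login>", "<password>")
client.connect()

# __USERNAME__ is replaced by the login used to open the connection
hive_sql = '''
    DROP TABLE IF EXISTS bikes20;
    CREATE TABLE bikes20 (sjson STRING);
    LOAD DATA INPATH "/user/__USERNAME__/unittest2/paris*.txt" INTO TABLE bikes20;
    SELECT * FROM bikes20 LIMIT 10;
    '''.replace("__USERNAME__", client.username)

# redirection=None: the function waits for the query and returns its output
out, err = client.hive_submit(hive_sql, redirection=None)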

New in version 1.1.

ls(path)

return the content of a folder on the bridge as a DataFrame

Parameters:

path – path on the bridge

Returns:

DataFrame

New in version 1.1.

open_session(no_exception=False, timeout=1.0, add_eol=True, prompts=('~$', '>>>'), out_format=None)

Opens a session with method invoke_shell.

Parameters:
  • no_exception – if True, do not raise any exception in case of error

  • timeout – timeout in s

  • add_eol – if True, the function adds an EOL to the sent command if it does not have one

  • prompts – the function terminates if the output ends with one of those strings

  • out_format – None, plain, html

How to open a remote shell?

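from pyenbc.remote.ssh_remote_connection import ASSHClient

ssh = ASSHClient("<server>", "<login>", "<password>")
ssh.connect()
# a session has to be opened before send_recv_session is used
ssh.open_session(no_exception=True)
out = ssh.send_recv_session("ls")
print(ssh.send_recv_session("python"))
print(ssh.send_recv_session("print('3')"))
print(ssh.send_recv_session("import sys\nsys.executable"))
print(ssh.send_recv_session("sys.exit()"))
# None: nothing is sent, the function only listens to the output
print(ssh.send_recv_session(None))
ssh.close_session()
ssh.close()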

The notebook Communication with a remote Linux machine through SSH illustrates the output of these instructions.

static parse_lsout(out, local_schema=True)

parses the output of a command ls

Parameters:
  • out – output

  • local_schema – schema for the bridge or the cluster (False)

Returns:

DataFrame

New in version 1.1.
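The parsing step can be sketched as follows. This is not the library's code: the function name parse_ls_sketch and the column names are hypothetical, and the sketch returns a list of dictionaries instead of a DataFrame so that it runs without pandas. It assumes the usual layouts of ls -l on the bridge (nine columns) and of hdfs dfs -ls on the cluster (eight columns).

```python
# Hypothetical column names for the two layouts.
LOCAL_COLUMNS = ["attributes", "links", "owner", "group", "size",
                 "month", "day", "time", "name"]
CLUSTER_COLUMNS = ["attributes", "replication", "owner", "group", "size",
                   "date", "time", "name"]


def parse_ls_sketch(out, local_schema=True):
    """Turns the text printed by ls into a list of dictionaries."""
    columns = LOCAL_COLUMNS if local_schema else CLUSTER_COLUMNS
    rows = []
    for line in out.splitlines():
        # The last column (the name) keeps any spaces it contains.
        fields = line.split(None, len(columns) - 1)
        if len(fields) != len(columns):
            continue  # skips "total 8", "Found 2 items" and blank lines
        row = dict(zip(columns, fields))
        row["isdir"] = row["attributes"].startswith("d")
        row["size"] = int(row["size"])
        rows.append(row)
    return rows


sample = """total 8
drwxr-xr-x 2 alice users 4096 Jan  5 10:12 data
-rw-r--r-- 1 alice users  120 Jan  5 10:13 notes.txt
"""
for row in parse_ls_sketch(sample):
    print(row["name"], row["size"], row["isdir"])
```

A list of dictionaries like this one can be passed to pandas.DataFrame to obtain a table.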

pig_submit(pig_file, dependencies=None, params=None, redirection='redirection.pig', local=False, stop_on_failure=False, check=False, no_exception=True, fLOG=<function noLOG>)

Submits a PIG script. It first uploads the script to the default folder, then submits it.

Parameters:
  • pig_file – pig script (local)

  • dependencies – others files to upload (still in the default folder)

  • params – parameters to send to the job

  • redirection – string empty or not

  • local – local run or not (option -x local) (in that case, redirection will be empty)

  • stop_on_failure – if True, add option -stop_on_failure on the command line

  • check – if True, add option -check (in that case, redirection will be empty)

  • no_exception – sent to execute_command

  • fLOG – logging function

Returns:

out, err from execute_command

If redirection is not empty, the job is submitted in the background and the function returns without waiting for it to finish. The standard output and the standard error are redirected to redirection.pig.out and redirection.pig.err.

The first file contains the results of the commands DESCRIBE, DUMP and EXPLAIN. The standard error receives logs and exceptions.

The function executes the command line:

pig -execute -f <filename>

With redirection:

pig -execute -f <filename> 2> redirection.pig.err 1> redirection.pig.out &

New in version 1.1.
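The way the options combine into one command line can be sketched as below. The function pig_command_sketch is hypothetical and the order of the options is an assumption; only the two command lines shown above and the documented effect of local and check on redirection are taken from this page.

```python
def pig_command_sketch(filename, params="", redirection="redirection.pig",
                       local=False, stop_on_failure=False, check=False):
    """Hypothetical helper assembling the command line described above."""
    if local or check:
        # Documented behaviour: redirection is emptied in both cases.
        redirection = None
    parts = ["pig"]
    if local:
        parts.append("-x local")
    if stop_on_failure:
        parts.append("-stop_on_failure")
    if check:
        parts.append("-check")
    if params:
        parts.append(params)  # e.g. "-param UTT=2014"
    parts.extend(["-execute", "-f", filename])
    command = " ".join(parts)
    if redirection:
        # stderr and stdout go to files, "&" sends the job to the background
        command += " 2> {0}.err 1> {0}.out &".format(redirection)
    return command


print(pig_command_sketch("job.pig"))
print(pig_command_sketch("job.pig", redirection=None))
```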

send_recv_session(fillin)

Sends something through a session. The function is supposed to return when the execution of the given command is done, but this is difficult to detect without knowing exactly what was sent.

A timeout therefore tells the function to return even if nothing indicates that the command has finished. If fillin is None, the function only listens to the output.

Parameters:

fillin – sent to stdin

Returns:

stdout

The output contains escape codes. They can be converted to plain text or HTML with the modules ansiconv and ansi2html. The format can be specified when opening the session (parameter out_format of open_session).
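The stopping rule described above (return when the output ends with a known prompt, otherwise give up after a timeout) can be sketched independently of SSH. The function read_until_prompt and its read_chunk argument are hypothetical; with paramiko, read_chunk could wrap Channel.recv_ready and Channel.recv, but that wiring is not shown and the library's own loop may differ.

```python
import time


def read_until_prompt(read_chunk, prompts=("~$", ">>>"), timeout=1.0, poll=0.01):
    """Accumulates output until it ends with a prompt or the timeout expires.

    read_chunk() returns the newly available text, or "" if nothing is pending.
    """
    received = ""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        chunk = read_chunk()
        if chunk:
            received += chunk
            # A prompt is usually followed by a space, hence rstrip().
            if received.rstrip().endswith(tuple(prompts)):
                break
        else:
            time.sleep(poll)  # nothing to read yet, wait a little
    return received


# Fake channel delivering the output of "ls" in two pieces.
chunks = iter(["file_a  file_b\n", "user@host:~$ "])
output = read_until_prompt(lambda: next(chunks, ""))
print(repr(output))
```

When no prompt ever shows up, the loop ends at the timeout and returns whatever was received so far, which is why a short timeout may cut the output of a slow command.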

upload(localpath, remotepath)

upload a file to the remote machine (not on the cluster)

Parameters:
  • localpath – local file (or a list of files)

  • remotepath – remote file

Changed in version 1.1: it can upload multiple files if localpath is a list

upload_cluster(localpath, remotepath)

Uploads a file directly to the cluster: the file is first copied to the bridge, then uploaded to the cluster, and finally deleted from the bridge.

Parameters:
  • localpath – local filename (or list of files)

  • remotepath – path to the cluster

Returns:

filename

New in version 1.1.
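The three steps can be sketched with the methods documented on this page. This is an illustration under assumptions, not the library's code: upload_cluster_sketch and RecordingClient are hypothetical, the use of hdfs dfs -put and rm is a guess at the commands involved, and returning the name of the file on the bridge is only one reading of the documented return value. The stub client records calls so the sketch runs without a server.

```python
import posixpath
import shlex


def upload_cluster_sketch(client, localpath, remotepath):
    """Hypothetical composition: local -> bridge -> cluster -> cleanup."""
    name = posixpath.basename(localpath.replace("\\", "/"))
    client.upload(localpath, name)  # 1. local machine -> bridge
    client.execute_command(         # 2. bridge -> cluster
        "hdfs dfs -put {0} {1}".format(shlex.quote(name), shlex.quote(remotepath))
    )
    client.execute_command("rm {0}".format(shlex.quote(name)))  # 3. cleanup
    return name


class RecordingClient:
    """Stub standing in for ASSHClient: it only records the calls."""

    def __init__(self):
        self.calls = []

    def upload(self, localpath, remotepath):
        self.calls.append(("upload", localpath, remotepath))

    def execute_command(self, command):
        self.calls.append(("execute_command", command))
        return "", ""


stub = RecordingClient()
upload_cluster_sketch(stub, "data/notes.txt", "/user/alice/notes.txt")
for call in stub.calls:
    print(call)
```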
