module filehelper.synchelper
Short summary
module pyquickhelper.filehelper.synchelper
Series of functions related to folders: explore, synchronize, remove (recursively).
Functions
function | truncated documentation |
---|---|
download_urls_iterfile | Same as explore_folder but iterates on files included in a folder and its subfolders. |
explore_folder | Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is … |
explore_folder_iterfile | Same as explore_folder but iterates on files included in a folder and its subfolders. |
explore_folder_iterfile_repo | Returns all files present in folder and added to a SVN or GIT repository. |
has_been_updated | It assumes dest is a copy of source, it wants to know if the copy is up to date or not. |
remove_folder | Removes everything in folder top. |
synchronize_folder | Synchronizes two folders (or copy if the second is empty), it only copies more recent files. It can walk through … |
 | Does the same as os.walk plus does not go through a sub-folder if this one is big. Folders such build … |
Documentation
- pyquickhelper.filehelper.synchelper.download_urls_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True)
Same as explore_folder but iterates on files included in a folder and its subfolders.
- Parameters:
folder – folder
pattern – if None, get all files; otherwise, it is a regular expression the filename must match (the folder is included if fullname is True)
neg_pattern – negative pattern to exclude files
fullname – if True, include the subfolder while checking the regex
recursive – look into subfolders
- Returns:
iterator on files
- pyquickhelper.filehelper.synchelper.explore_folder(folder, pattern=None, neg_pattern=None, fullname=False, return_only=None, recursive=True, sub_pattern=None, sub_replace=None, fLOG=None)
Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is specified.
- Parameters:
folder – (str) folder
pattern – (str) if None, get all files; otherwise, it is a regular expression the filename must match (the folder is included if fullname is True)
neg_pattern – (str) negative pattern
fullname – (bool) if True, include the subfolder while checking the regex (pattern)
return_only – (str) to return folders and files (=None), only the files (='f') or only the folders (='d')
recursive – (bool) look into subfolders
sub_pattern – (str) replacement pattern, the output is then modified according to this regular expression
sub_replace – (str) if sub_pattern is specified, this second pattern specifies how to replace
fLOG – (fct) logging function
- Returns:
(list, list), a list of folders, a list of files (the folder is not included in the path name)
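The interplay of pattern, neg_pattern and fullname can be illustrated with a small standard-library sketch. It is not the implementation of explore_folder: the helper name list_files is hypothetical, and the use of re.search on the path relative to the folder is an assumption made for the illustration.

```python
import os
import re
import tempfile


def list_files(folder, pattern=None, neg_pattern=None, fullname=False):
    """Hypothetical stand-in mimicking the filtering described above."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(neg_pattern) if neg_pattern else None
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: with fullname=True the regex sees the subfolder
            # as well, otherwise only the bare file name.
            tested = os.path.relpath(path, folder) if fullname else name
            if keep is not None and not keep.search(tested):
                continue
            if drop is not None and drop.search(tested):
                continue
            found.append(path)
    return sorted(found)


# Demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "checkpoints"))
    for rel in ["a.ipynb", "b.py", os.path.join("checkpoints", "c.ipynb")]:
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("x")
    notebooks = list_files(tmp, pattern=".*[.]ipynb",
                           neg_pattern="checkpoints", fullname=True)
    names = [os.path.relpath(p, tmp) for p in notebooks]
```

With fullname=True the negative pattern can exclude a whole subfolder, which is why only a.ipynb remains in names.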
Explore the content of a directory
The command calls function explore_folder and lists all files or all folders in a directory. Example:
python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -n checkpoints -fu 1
It works better with chrome. An example to change file names:
python -m pyquickhelper ls -f myfolder -p .*[.]py -r f -n pycache -fu 1 -s test_(.*) -su unit_\1
Or another to automatically create git commands to rename files:
python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -s "(.*)[.]ipynb" -su "git mv \1.ipynb \1~.ipynb"
<<<
python -m pyquickhelper ls --help
>>>
usage: ls [-h] [-f FOLDER] [-p PATTERN] [-n NEG_PATTERN] [-fu FULLNAME]
          [-r RETURN_ONLY] [-re RECURSIVE] [-s SUB_PATTERN] [-su SUB_REPLACE]

Returns the list of files included in a folder and its subfolders. Returned
names can be modified if *sub_pattern* is specified.

optional arguments:
  -h, --help            show this help message and exit
  -f FOLDER, --folder FOLDER
                        (str) folder (default: None)
  -p PATTERN, --pattern PATTERN
                        (str) if None, get all files, otherwise, it is a
                        regular expression, the filename must verify (with
                        the folder if fullname is True) (default: )
  -n NEG_PATTERN, --neg_pattern NEG_PATTERN
                        (str) negative pattern (default: )
  -fu FULLNAME, --fullname FULLNAME
                        (bool) if True, include the subfolder while checking
                        the regex (pattern) (default: False)
  -r RETURN_ONLY, --return_only RETURN_ONLY
                        (str) to return folders and files (*=None*), only the
                        files (*='f'*) or only the folders (*='d') (default: )
  -re RECURSIVE, --recursive RECURSIVE
                        (bool) look into subfolders (default: True)
  -s SUB_PATTERN, --sub_pattern SUB_PATTERN
                        (str) replacements pattern, the output is then
                        modified accordingly to this regular expression
                        (default: )
  -su SUB_REPLACE, --sub_replace SUB_REPLACE
                        (str) if sub_pattern is specified, this second
                        pattern specifies how to replace (default: )
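The effect of the -s and -su options in the examples above can be reproduced with a plain regular-expression substitution. This is only a sketch: it assumes the substitution behaves like re.sub, and the helper name rename and the sample file names are hypothetical.

```python
import re


def rename(name, sub_pattern, sub_replace):
    # Assumption: the replacement behaves like a regular re.sub call.
    return re.sub(sub_pattern, sub_replace, name)


# Changing a file name, as in the second command-line example.
renamed = rename("test_sync.py", "test_(.*)", r"unit_\1")

# Producing a git command, as in the third command-line example.
command = rename("intro.ipynb", "(.*)[.]ipynb",
                 r"git mv \1.ipynb \1~.ipynb")
```

Here renamed is unit_sync.py and command is the string git mv intro.ipynb intro~.ipynb, which could then be pasted into a shell.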
- pyquickhelper.filehelper.synchelper.explore_folder_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True, verbose=False)
Same as explore_folder but iterates on files included in a folder and its subfolders.
- Parameters:
folder – folder
pattern – if None, get all files; otherwise, it is a regular expression the filename must match (the folder is included if fullname is True)
neg_pattern – negative pattern to exclude files
fullname – if True, include the subfolder while checking the regex
recursive – look into subfolders
verbose – use tqdm to display a progress bar
- Returns:
iterator on files
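Returning an iterator rather than a list means files can be processed while the folder is still being walked. The generator below is a minimal standard-library sketch of that idea, not the library's code; the name iter_files is hypothetical and the regular expressions are applied to the bare file name with re.search as an assumption.

```python
import os
import re
import tempfile


def iter_files(folder, pattern=None, neg_pattern=None, recursive=True):
    """Hypothetical generator yielding matching files one at a time."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(neg_pattern) if neg_pattern else None
    for root, dirs, files in os.walk(folder):
        if not recursive:
            dirs[:] = []  # emptying dirs in place stops os.walk from descending
        for name in sorted(files):
            if keep is not None and not keep.search(name):
                continue
            if drop is not None and drop.search(name):
                continue
            yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ["a.py", "b.txt", os.path.join("sub", "c.py")]:
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("x")
    top_only = [os.path.relpath(p, tmp)
                for p in iter_files(tmp, "[.]py$", recursive=False)]
    everything = sorted(os.path.relpath(p, tmp)
                        for p in iter_files(tmp, "[.]py$"))
```

With recursive=False only a.py is produced, whereas the default also yields the file stored in the subfolder.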
- pyquickhelper.filehelper.synchelper.explore_folder_iterfile_repo(folder, log=<function fLOG>)
Returns all files present in folder and added to a SVN or GIT repository.
- Parameters:
folder – folder
log – log function
- Returns:
iterator
- pyquickhelper.filehelper.synchelper.has_been_updated(source, dest)
Assumes dest is a copy of source and checks whether the copy is up to date.
- Parameters:
source – filename
dest – copy
- Returns:
(True, reason) or (False, None)
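The shape of the returned pair can be illustrated as follows. The criteria used here (missing copy, different size, older modification time) are assumptions chosen for the sketch, and the helper name copy_is_outdated and the reason strings are hypothetical; the actual function may apply other checks.

```python
import os
import tempfile


def copy_is_outdated(source, dest):
    """Returns (True, reason) when dest should be refreshed, else (False, None)."""
    if not os.path.exists(dest):
        return True, "dest does not exist"
    st_src, st_dst = os.stat(source), os.stat(dest)
    if st_src.st_size != st_dst.st_size:
        return True, "size"
    if st_src.st_mtime > st_dst.st_mtime:
        return True, "date"
    return False, None


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    dst = os.path.join(tmp, "dst.txt")
    with open(src, "w") as f:
        f.write("version 2")
    missing = copy_is_outdated(src, dst)

    with open(dst, "w") as f:
        f.write("v1")
    stale = copy_is_outdated(src, dst)

    with open(dst, "w") as f:
        f.write("version 2")
    # Give both files the same timestamps so only the content size matters.
    os.utime(src, (1000, 1000))
    os.utime(dst, (1000, 1000))
    fresh = copy_is_outdated(src, dst)
```

Returning the reason alongside the boolean lets a caller log why a file was copied again.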
- pyquickhelper.filehelper.synchelper.remove_folder(top, remove_also_top=True, raise_exception=True)
Removes everything in folder top.
- Parameters:
top – path to remove
remove_also_top – remove also root
raise_exception – raise an exception if a file cannot be removed
- Returns:
list of removed files and folders, as a list of tuples (name, "file" or "dir")
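A recursive removal that reports what it deleted can be sketched with a bottom-up os.walk. This is an illustration under simplifying assumptions (no symbolic links, no read-only files, no raise_exception handling), and the name remove_tree is hypothetical.

```python
import os
import tempfile


def remove_tree(top, remove_also_top=True):
    """Hypothetical sketch: removes the content of top and lists what was removed."""
    removed = []
    # Bottom-up walk so that every directory is empty before it is removed.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            os.remove(path)
            removed.append((path, "file"))
        for name in dirs:
            path = os.path.join(root, name)
            os.rmdir(path)
            removed.append((path, "dir"))
    if remove_also_top:
        os.rmdir(top)
        removed.append((top, "dir"))
    return removed


tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "sub"))
with open(os.path.join(tmp, "sub", "a.txt"), "w") as f:
    f.write("x")
removed = remove_tree(tmp)
kinds = [kind for _name, kind in removed]
still_there = os.path.exists(tmp)
```

The returned list follows the deletion order: the file first, then its subfolder, then the root.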
- pyquickhelper.filehelper.synchelper.synchronize_folder(p1: str, p2: str, hash_size=1048576, repo1=False, repo2=False, size_different=True, no_deletion=False, filter: [<class 'str'>, typing.Callable[[str], str], None] = None, filter_copy: [<class 'str'>, typing.Callable[[str], str], None] = None, avoid_copy=False, operations=None, file_date: str = None, log1=False, copy_1to2=False, create_dest=False, fLOG=<function fLOG>)
Synchronizes two folders (or copies the first one if the second is empty); it only copies more recent files. It can walk through a git repository or SVN.
- Parameters:
p1 – (str) first path
p2 – (str) second path
hash_size – (int) used to check whether or not two files are different
repo1 – (bool) assuming the first folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)
repo2 – (bool) assuming the second folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)
size_different – (bool) if True, a file will be copied only if sizes are different, otherwise, it will be copied if the first file is more recent
no_deletion – (bool) if a file is found in the second folder and not in the first one, it will be removed unless no_deletion is True
filter – (str) None to accept every file, a string if it is a regular expression (applied with re.search), a function for something more complex: function(fullname) --> True (every file is considered in lower case)
filter_copy – (str) None to accept every file, a string if it is a regular expression, a function for something more complex: function(fullname) --> True
avoid_copy – (bool) if True, just return the list of files which should be copied but does not do the copy
operations – if not None, this function is called the following way: operations(op, n1, n2); it should return True if the file was updated
file_date – (str) filename which contains information about when the last sync was done
log1 – see FileTreeNode
copy_1to2 – (bool) only copy files from p1 to p2
create_dest – (bool) create the destination directory if it does not exist
fLOG – logging function
- Returns:
list of operations done by the function, as a list of 3-tuples (action, source_file, dest_file)
If file_date is specified, the second folder is not explored. Only the modified files will be taken into account (except for the first sync).
Synchronize two folders
The following function synchronizes a folder with another one on a USB drive or a network drive. To minimize the number of accesses to the other location, it stores the status of the previous synchronization in a file (status_copy.txt in the example below). Next time, the function goes through the directory and sub-directories to synchronize and only propagates the modifications which happened since the last synchronization. The function filter_copy defines which files to synchronize.
from pyquickhelper.filehelper.synchelper import synchronize_folder

def filter_copy(file):
    # Skip every file whose name contains the marker.
    return "_don_t_synchronize_" not in file

synchronize_folder("c:/mydata", "g:/mybackup", hash_size=0,
                   filter_copy=filter_copy,
                   file_date="c:/status_copy.txt")
The function is able to go through 90,000 files and 90 GB in 12 minutes (for an update).
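The core idea, copying only what is missing or more recent while a filter decides which files take part, can be sketched with the standard library. This is not the implementation of synchronize_folder: it ignores deletions, hashes, repositories and the status file. The names as_predicate and one_way_sync and the action label "copy" are hypothetical, and the modification-time comparison is an assumption.

```python
import os
import re
import shutil
import tempfile


def as_predicate(filter_):
    """None accepts everything, a string is a regex, a callable is used as is."""
    if filter_ is None:
        return lambda name: True
    if isinstance(filter_, str):
        regex = re.compile(filter_)
        return lambda name: regex.search(name) is not None
    return filter_


def one_way_sync(p1, p2, filter_copy=None, avoid_copy=False):
    """Copies files from p1 to p2 and returns (action, source, dest) tuples."""
    accept = as_predicate(filter_copy)
    operations = []
    for root, _dirs, files in os.walk(p1):
        for name in sorted(files):
            src = os.path.join(root, name)
            rel = os.path.relpath(src, p1)
            if not accept(rel):
                continue
            dst = os.path.join(p2, rel)
            # Skip files whose copy exists and is at least as recent.
            if (os.path.exists(dst)
                    and os.path.getmtime(dst) >= os.path.getmtime(src)):
                continue
            operations.append(("copy", src, dst))
            if not avoid_copy:  # avoid_copy=True only reports the plan
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
    return operations


def keep(file):
    return "_don_t_synchronize_" not in file


with tempfile.TemporaryDirectory() as tmp:
    p1, p2 = os.path.join(tmp, "data"), os.path.join(tmp, "backup")
    os.makedirs(p1)
    for name in ["keep.txt", "_don_t_synchronize_.txt"]:
        with open(os.path.join(p1, name), "w") as f:
            f.write("x")
    planned = one_way_sync(p1, p2, filter_copy=keep, avoid_copy=True)
    existed_after_plan = os.path.exists(p2)
    done = one_way_sync(p1, p2, filter_copy=keep)
    copied = sorted(os.listdir(p2))
```

Running first with avoid_copy=True gives the list of planned operations without touching the destination, which is a convenient way to check a filter before copying anything.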