Files Controller

Data controller. Practically unchanged from the original code in the biomed-client library: methods were renamed from data to files, and the code was slightly adapted to use a single file service. The original, separate tools.py file from the same directory has been merged into this file.

class genomcore.controllers.files.FilesController(*args, **kwargs)

Bases: BaseController

Data controller to interact with the data section of the API.

_idproject

If set, the idproject filter is applied to any call that accepts filters.

Type:

int

_idactor

If set, the idactor filter is applied to any call that accepts filters.

Type:

int

static _format_extensions(allowed_extensions: List[str]) List[str]

Transform all extensions to lowercase and add a dot prefix if they don’t start with a dot.

If an extension does not start with a dot, the dot is prepended and the result is appended to the returned list.

If an extension already starts with a dot, a regex checks that it contains no other dot: if it does, an error is raised; otherwise, the extension is appended to the returned list as-is.
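A minimal standalone sketch of the normalization described above. The regex and the use of ValueError are assumptions; the real method may raise a different error type.

```python
import re
from typing import List

# A leading dot followed by at least one non-dot character, e.g. ".bam".
_SINGLE_DOT = re.compile(r"\.[^.]+")


def format_extensions(allowed_extensions: List[str]) -> List[str]:
    """Lowercase each extension and make sure it starts with a dot."""
    formatted = []
    for ext in allowed_extensions:
        ext = ext.lower()
        if not ext.startswith("."):
            # No leading dot: prepend one.
            formatted.append("." + ext)
        elif _SINGLE_DOT.fullmatch(ext):
            # Leading dot and no other dots: keep as-is.
            formatted.append(ext)
        else:
            # Assumption: ValueError stands in for the library's actual error.
            raise ValueError(f"invalid extension {ext!r}: more than one dot")
    return formatted
```

For example, ["FASTQ", ".bam"] becomes [".fastq", ".bam"], while ".tar.gz" is rejected. Following the description literally, only values that already start with a dot are checked for extra dots.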

is_alive() str

_is_valid_extension(local_path: str, allowed_extensions: List[str], allowed_secondary_extensions: List[str])

Return True if the file extension is valid, False otherwise.

The allowed extensions are paired by position: the 1st primary extension with the 1st secondary extension, the 2nd primary extension with the 2nd secondary extension, and so on. For each pair, it checks whether local_path ends with that combination.

The results of these checks are stored in a list called valid_files, and the method returns True if any element of that list is True, i.e. if the file matches at least one of the pairs.
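A minimal sketch of the positional pairing and the valid_files check. It assumes the extensions were already normalized by _format_extensions() (lowercase, dot-prefixed), that the path is compared case-insensitively, and that "null"/".null" behaves as described in the Warning under _check_extensions(). The handling of an empty secondary list is an assumption.

```python
from typing import List

NULL_VALUES = {"null", ".null"}


def is_valid_extension(
    local_path: str,
    allowed_extensions: List[str],
    allowed_secondary_extensions: List[str],
) -> bool:
    """Return True if local_path ends with at least one allowed extension pair."""
    path = local_path.lower()  # assumption: case-insensitive comparison
    if not allowed_secondary_extensions:
        # Assumption: without secondary extensions only the primary one is checked.
        return any(path.endswith(ext) for ext in allowed_extensions)
    valid_files = []
    # Pair extensions by position: 1st with 1st, 2nd with 2nd, and so on.
    for primary, secondary in zip(allowed_extensions, allowed_secondary_extensions):
        # "null" means the file must end with the primary extension alone.
        suffix = primary if secondary in NULL_VALUES else primary + secondary
        valid_files.append(path.endswith(suffix))
    return any(valid_files)
```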

_check_extensions(local_datas: List[LocalData], allowed_extensions: List[str], allowed_secondary_extensions: List[str])

Take a list of LocalData objects and check, for each one, whether the extension of its local_path is valid.

Raise InvalidFileExtension at the first check that fails.

When this function is called, allowed_extensions has already been checked to be a non-empty list.

Warning

If any value in allowed_secondary_extensions is “null” or “.null”, the file must end with the corresponding allowed_extensions element alone, with no secondary extension.
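A minimal sketch of the "raise at the first failure" behavior. To keep it focused, the validity test is passed in as a callable instead of the two extension lists, which differs from the real signature. LocalData and InvalidFileExtension are hypothetical stand-ins for the library's classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LocalData:
    """Hypothetical minimal stand-in; the real LocalData has more fields."""
    local_path: str


class InvalidFileExtension(Exception):
    """Stand-in for the library's InvalidFileExtension exception."""


def check_extensions(
    local_datas: List[LocalData],
    is_valid: Callable[[str], bool],
) -> None:
    """Raise InvalidFileExtension for the first LocalData whose path fails is_valid."""
    for data in local_datas:
        if not is_valid(data.local_path):
            # Stop immediately: remaining items are not checked.
            raise InvalidFileExtension(f"invalid extension: {data.local_path}")
```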

unfold(iddata: str) List[UnfoldedData]

Return the unfolded data entries for a given iddata.

If the data ID is a Biomed directory, the result is the list of Data for that directory and its contents (files and subdirectories).

If it is a mounted Biomed directory from Public, the returned list contains only the mounted folder itself.

If it is a Biomed file, the result is a list with a single Data entry for that file.

After the request, a relative_path key is added to each data dictionary (building a datas_with_relative_path list of dictionaries) so they can be converted to UnfoldedData Biomed objects. The relative_path is computed by removing the initial parent directory from each path.
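A minimal sketch of how the relative_path key could be derived. It assumes each dictionary stores its absolute remote path under a hypothetical "path" key and that the "initial parent directory" is the parent of the unfolded item; the real response fields may differ.

```python
import posixpath
from typing import Any, Dict, List


def add_relative_paths(datas: List[Dict[str, Any]], parent_dir: str) -> List[Dict[str, Any]]:
    """Return copies of datas with a relative_path key computed from parent_dir."""
    datas_with_relative_path = []
    for data in datas:
        # Assumption: the absolute remote path is stored under the "path" key.
        relative_path = posixpath.relpath(data["path"], parent_dir)
        # Copy instead of mutating the original response dictionary.
        datas_with_relative_path.append({**data, "relative_path": relative_path})
    return datas_with_relative_path
```

Unfolding /A/B/C, with parent /A/B, gives relative paths such as C and C/file.txt.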

Parameters:

iddata (str) – ID of the data to unfold

Returns:

List of the corresponding data

Return type:

List[UnfoldedData]

create_folder(remote_dir)

Create a directory in Biomed, including any missing intermediate directories.

Parameters:

remote_dir (str) – path of the directory to create

Returns:

LocalData with new directory created

Return type:

LocalData
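To illustrate what "intermediate directories" covers, this hypothetical helper lists every directory level that must exist for a given remote_dir. The leading "/" follows the root-path convention noted for get_metadata(); whether create_folder() normalizes paths this way internally is an assumption.

```python
from typing import List


def intermediate_dirs(remote_dir: str) -> List[str]:
    """List every directory from the top level down to remote_dir itself."""
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    parts = [p for p in remote_dir.strip("/").split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]
```

For "A/B/C", the levels are /A, /A/B and /A/B/C.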

download(iddata: str, local_dir: str, allowed_extensions: list = None, allowed_secondary_extensions: list = None) List[LocalData]

Download a single data entry.

Given an iddata, download the data corresponding to it: if it is a file, only the file; if it is a directory, the result includes both the directory and the files inside it.

If allowed_extensions is given, it checks that every downloaded file matches one of the extensions in the list.

If allowed_secondary_extensions is not an empty list, its length must match that of allowed_extensions, because each allowed_extensions element is paired with the secondary extension at the same index, e.g.:

>>> allowed_extensions = ["fastq", "bam", "bam"]
>>> allowed_secondary_extensions = ["gz", "bai", "null"]

This will pass all checks if it ONLY downloads *.fastq.gz, *.bam.bai and *.bam files (*.fastq files are not allowed).
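A minimal sketch that expands the two paired lists into the full suffixes a downloaded file may end with, including the length check. The helper name and the use of ValueError are assumptions, not the library's API.

```python
from typing import List, Optional

NULL_VALUES = {"null", ".null"}


def _dot(ext: str) -> str:
    """Prefix ext with a dot if it does not already start with one."""
    return ext if ext.startswith(".") else "." + ext


def accepted_suffixes(
    allowed_extensions: List[str],
    allowed_secondary_extensions: Optional[List[str]] = None,
) -> List[str]:
    """Expand the paired extension lists into the suffixes a file may end with."""
    secondary = allowed_secondary_extensions or []
    if not secondary:
        return [_dot(ext) for ext in allowed_extensions]
    if len(secondary) != len(allowed_extensions):
        # The lists are paired by index, so their lengths must match.
        raise ValueError("allowed_secondary_extensions must match allowed_extensions in length")
    suffixes = []
    for primary, extra in zip(allowed_extensions, secondary):
        # "null" means the primary extension alone is accepted.
        suffixes.append(_dot(primary) if extra in NULL_VALUES else _dot(primary) + _dot(extra))
    return suffixes
```

With the example lists above, this yields [".fastq.gz", ".bam.bai", ".bam"].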

Parameters:
  • iddata (str) – ID of the data to download

  • local_dir (str) – Directory to which the data will be downloaded

  • allowed_extensions (list) – List of extensions allowed for the downloaded files

  • allowed_secondary_extensions (list) – List of secondary extensions allowed for the downloaded files

Returns:

List of the LocalData of the downloaded files/folders

Return type:

List[LocalData]

upload_multiple(file_paths: List[str], dest_dir: str = None, action: str = 'default') List[LocalData]

Upload multiple files to the Biomed project found in the auth token.

It first checks that the files to be uploaded exist locally, and then performs the request to upload them through the specified File Manager.
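A minimal sketch of the local existence check that runs before the upload request. Raising FileNotFoundError and reporting every missing path at once are assumptions; the real method may raise a different error or stop at the first missing file.

```python
import os
from typing import List


def check_files_exist(file_paths: List[str]) -> None:
    """Raise FileNotFoundError listing every path that is not an existing local file."""
    missing = [path for path in file_paths if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"files not found: {', '.join(missing)}")
```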

Parameters:
  • file_paths (List[str]) – List of file paths to upload

  • dest_dir (str) – Directory path to which files will be uploaded

  • action (str) – Choose between ‘overwrite’, ‘default’, and ‘non-action’. Default value is ‘default’.

Returns:

list of uploaded files

Return type:

List[LocalData]

download_multiple(iddatas: List[str], local_dir: str, allowed_extensions: List[str] = None, allowed_secondary_extensions: List[str] = None) List[LocalData]

Download multiple files or directories.

Given a list of data IDs, download them to the provided local_dir. If any of the IDs is a Biomed folder, the result will include both the folder and the files inside it.

For information about the allowed_extensions and allowed_secondary_extensions parameters, see the documentation of the download() method.

Parameters:
  • iddatas (List[str]) – List of IDs of the files or directories to download

  • local_dir (str) – Directory to which the data will be downloaded

  • allowed_extensions (List[str]) – List of extensions allowed for the downloaded files

  • allowed_secondary_extensions (List[str]) – List of secondary extensions allowed for the downloaded files

Returns:

List of LocalDatas of the downloaded files/folders.

Return type:

List[LocalData]

_check_if_exists(path: str)

Check if local path exists.

_check_filetypes(data: LocalData)

Check if local file type and remote file type match.

upload(data: LocalData, action: str = 'default', v2=False) LocalData

Upload a local file.

Given a LocalData, check that the local file exists and that the local and remote file types match, then upload the local file to Biomed.

Parameters:
  • data (LocalData) – Data to update; its local_path attribute must contain the path to the actual file.

  • action (str) – Choose between ‘overwrite’, ‘default’, and ‘non-action’. Default value is ‘default’.

Returns:

LocalData with updated information

Return type:

LocalData

upload_dir(local_dir: str, remote_dir: str, do_not_upload: str = None) List[LocalData]

Upload all the files of a directory to Biomed.

First, check that local_dir is actually a directory. Then, use the static method _get_local_and_remote_paths() to build the local_paths and remote_paths needed to create the LocalData objects that are passed to self.upload().

Examples

If local_dir is ‘/tmp/tmp2/dir_to_upload’ and remote_dir is ‘A/B/C’, this method uploads the folder ‘/A/B/C/dir_to_upload’ with its contents to Biomed.

Parameters:
  • local_dir – Local directory to upload.

  • remote_dir – Remote Biomed path where the local directory will be uploaded.

  • do_not_upload – empty files starting with the string defined here will not be uploaded.

Returns:

A list of files uploaded to Biomed (as LocalData objects).

static _get_local_and_remote_paths(local_dir: str, remote_dir: str, do_not_upload: str) Tuple[List[str], List[str]]

Return a tuple with the paths of all files in local_dir and their corresponding new paths under remote_dir.

Given a local directory (‘./A/B/C’) and a remote directory (‘D/E’) where you want to upload it, build the remote paths by stripping the parent directories of local_dir (‘./A/B’), so that a file ‘./A/B/C/file.txt’ gets the remote path ‘D/E/C/file.txt’.

Note

Files starting with the string defined in do_not_upload will be removed from both lists.
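A minimal sketch of the path mapping based on os.walk. Matching do_not_upload against each file's basename and the sorted traversal order are assumptions; the real method may filter or order files differently.

```python
import os
import posixpath
from typing import List, Optional, Tuple


def get_local_and_remote_paths(
    local_dir: str,
    remote_dir: str,
    do_not_upload: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Pair every file under local_dir with its destination path under remote_dir."""
    local_dir = os.path.normpath(local_dir)
    # Everything above local_dir (e.g. 'A/B' for './A/B/C') is stripped.
    parent = os.path.dirname(local_dir) or os.curdir
    local_paths: List[str] = []
    remote_paths: List[str] = []
    for root, dirs, files in os.walk(local_dir):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            # Assumption: do_not_upload is matched against the basename.
            if do_not_upload and name.startswith(do_not_upload):
                continue
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, parent)
            # Remote paths always use '/', whatever the local separator is.
            remote_paths.append(posixpath.join(remote_dir, *relative.split(os.sep)))
            local_paths.append(local_path)
    return local_paths, remote_paths
```

With local_dir ‘./A/B/C’ and remote_dir ‘D/E’, the file ‘./A/B/C/file.txt’ maps to ‘D/E/C/file.txt’.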

is_dir(iddata=None, path=None)

Check whether iddata or path refers to a directory.

Use only one of the two arguments: iddata or path.
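A minimal sketch of the "only one of iddata or path" rule. What the real method does when both or neither are given is not documented here; raising ValueError in both cases is an assumption.

```python
from typing import Optional


def resolve_is_dir_target(iddata: Optional[str] = None, path: Optional[str] = None) -> str:
    """Return which argument was given, enforcing that exactly one is used."""
    # True == True (neither given) or False == False (both given) is rejected.
    if (iddata is None) == (path is None):
        raise ValueError("pass exactly one of iddata or path")
    return "iddata" if iddata is not None else "path"
```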

Parameters:
  • iddata (str) – Id of the data to get

  • path (str) – Path of the data to get

Returns:

metadata from biomed as a Data object

Return type:

Data

get(iddata: str, unfolded: bool = False) Data | UnfoldedData

Get metadata of a data ID.

Parameters:
  • iddata (str) – Id of the data to get the metadata from

  • unfolded (bool) – If True, return an UnfoldedData instead of a Data; used when this method is called from the unfold() method

Returns:

Metadata from Biomed as a Data object (or an UnfoldedData object if unfolded is True)

Return type:

Data | UnfoldedData

all() List[Data]

Get all data.

Return all data entries from a project.

Returns:

List[Data]

filter(filters: Dict[str, Any]) List[Data]

Get data filtered with the input filters.

The input must contain the key search. To preserve backwards compatibility, its value is split into a parent directory (with a leading ‘/’ added if missing) and the basename of the last folder or file.
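A minimal sketch of the split applied to the search value. Stripping a trailing ‘/’ before splitting is an assumption, and how the two parts are passed on to the FileService request is not documented here.

```python
import posixpath
from typing import Tuple


def split_search(search: str) -> Tuple[str, str]:
    """Split a search path into (parent directory, basename), forcing a leading '/'."""
    # Assumption: a trailing '/' on a folder path is ignored.
    parent, basename = posixpath.split(search.rstrip("/"))
    if not parent.startswith("/"):
        parent = "/" + parent
    return parent, basename
```

For example, "test/dir/file.tar" is split into "/test/dir" and "file.tar".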

Parameters:

filters (Dict[str,Any]) – dictionary with the filters to apply to FileService

Returns:

List[Data]

filter_kube(filters) List[Data]

Get filtered data

Return the data that matches the filter criteria (additionally filtered by project or actor if those filters are set at the controller level).

Parameters:

filters (Dict[str,Any]) – dictionary with the filters to apply to biomed backend

Examples

>>> filters = {
...     "parent": "aParentFolder",
...     "search": "test_regex_download_extensions/compression-no/biomed-1.0.0.tar",
...     "order_direction": "DESC",
...     "order_by": "updated",
... }

Returns:

List[Data]

get_metadata(filters: Dict[str, Any]) Data

Get the metadata of a file or folder given filter criteria.

Parameters:

filters (Dict[str,Any]) – dictionary with the filters to apply to biomed backend

Examples

>>> filters = {
...     "file": "/example_path/example_file.txt",  # NOTE: the path must start with the root `/`
... }

Returns:

Biomed file or folder

Return type:

Data

get_public_url(iddata: str, ttl_in_seconds: int = 7200) str

Get a public URL for a data ID (either a file or a folder).

Parameters:
  • iddata (str) – ID of the data to get the public URL for.

  • ttl_in_seconds (int) – lifespan/expiration of generated URL in seconds (by default: 7200).

delete(iddata: str) Data

Delete a file or folder given its data ID.

Parameters:

iddata (str) – ID of the data to delete.

move(src: str, dst: str) Data

Move a file or directory to another directory.

Parameters:
  • src (str) – ID of the file or directory to move.

  • dst (str) – ID of the destination directory.

unlock_file(iddata: str, days_unlock: int = 10) str

Unlock a single file for a given number of days.

Parameters:
  • iddata (str) – ID of the file to unlock.

  • days_unlock (int) – Number of days the file will remain unlocked. Default is 10 days.
