boilercv_pipeline.sets#

Datasets.

Module Contents#

Functions#

process_datasets

Get unprocessed dataset names and write them to disk.

get_unprocessed_destinations

Get destination paths for unprocessed datasets.

inspect_dataset

Inspect a video dataset.

get_dataset

Load a video dataset.

get_selector

Get selector, preferring label-based selection even given an integer slice.

inspect_video

Inspect video data array.

load_video

Load video data array.

save_video

Save video data array.

get_stage

Get the paths associated with a particular video name and pipeline stage.

get_contours_df

Load contours from a dataset.

get_contours_df2

Load contours from a dataset.

slice_frames

Return a slice suitable for getting frames from datasets.

Data#

ROOTED_PATHS

Paths rooted to their directories.

ALL_FRAMES

Slice that gets all frames.

STAGE_DEFAULT

Default stage to work on.

API#

boilercv_pipeline.sets.ROOTED_PATHS#

‘Paths(…)’

Paths rooted to their directories.

boilercv_pipeline.sets.ALL_FRAMES#

‘slice(…)’

Slice that gets all frames.

boilercv_pipeline.sets.STAGE_DEFAULT#

‘sources’

Default stage to work on.

boilercv_pipeline.sets.process_datasets(
destination_dir: pathlib.Path,
reprocess: bool = False,
sources: pathlib.Path = ROOTED_PATHS.sources,
) → collections.abc.Iterator[dict[str, Any]]#

Get unprocessed dataset names and write them to disk.

Use as a context manager. Given a destination directory, yield a mapping with unprocessed dataset names as its keys. Upon exiting the context, datasets assigned to the values of the mapping will be written to disk in the destination directory.

If no values are assigned to the yielded mapping, no datasets will be written. This is useful for processes which take input datasets but handle their own output, perhaps to a different file format.

Parameters#

  • destination_dir: The directory to write datasets to.

  • reprocess: Whether to reprocess all datasets.
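The context-manager pattern described above can be illustrated with a minimal, self-contained sketch. It is not the implementation of `process_datasets`: the name `process_items` is hypothetical, the list of names is passed in directly rather than discovered from a sources directory, and plain text files stand in for datasets.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Any


@contextmanager
def process_items(names: list[str], destination_dir: Path) -> Iterator[dict[str, Any]]:
    """Hypothetical stand-in: yield a mapping keyed by name, write assigned values on exit."""
    pending: dict[str, Any] = dict.fromkeys(names)  # every value starts as None
    yield pending
    # Runs once the `with` body finishes; only assigned (non-None) values are written.
    for name, value in pending.items():
        if value is not None:
            (destination_dir / f"{name}.txt").write_text(str(value))


def demo() -> list[str]:
    """Assign a value for one name only and report which files were written."""
    with TemporaryDirectory() as tmp:
        destination = Path(tmp)
        with process_items(["a", "b"], destination) as items:
            items["a"] = "processed a"  # "b" stays unassigned, so no file is written for it
        return sorted(path.name for path in destination.iterdir())


written = demo()
```

In this sketch, leaving every value unassigned writes nothing, which mirrors the case where a process handles its own output.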

boilercv_pipeline.sets.get_unprocessed_destinations(
destination_dir: pathlib.Path,
ext: str = 'nc',
reprocess: bool = False,
sources: pathlib.Path = ROOTED_PATHS.sources,
) → dict[str, pathlib.Path]#

Get destination paths for unprocessed datasets.

Given a destination directory, return a mapping of unprocessed dataset names to destinations with a given file extension. A dataset is considered unprocessed if a file sharing its name is not found in the destination directory.

Parameters#

  • names: Names of the datasets to process.

  • destination_dir: Desired destination directory.

  • ext: Desired file extension.

  • reprocess: Reprocess all datasets.

  • sources: Directory of sources to be processed.

Returns#

  • A mapping of unprocessed dataset names to destinations with the given file extension.
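The "unprocessed" rule above can be sketched with `pathlib` alone. This is an illustration under assumptions, not the library's code: the function name `unprocessed_destinations` is hypothetical, and dataset names are assumed to be the file stems found in the sources directory.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def unprocessed_destinations(
    sources: Path, destination_dir: Path, ext: str = "nc", reprocess: bool = False
) -> dict[str, Path]:
    """Hypothetical sketch: map names lacking a same-named destination file to a path."""
    names = sorted(source.stem for source in sources.iterdir())
    # With `reprocess`, nothing counts as done, so every name is returned.
    done = set() if reprocess else {dest.stem for dest in destination_dir.iterdir()}
    return {
        name: destination_dir / f"{name}.{ext}"
        for name in names
        if name not in done
    }


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway layout with one dataset already processed."""
    with TemporaryDirectory() as tmp:
        sources, destination = Path(tmp) / "sources", Path(tmp) / "out"
        sources.mkdir()
        destination.mkdir()
        for name in ("first", "second"):
            (sources / f"{name}.nc").touch()
        (destination / "first.nc").touch()  # "first" counts as already processed
        default = unprocessed_destinations(sources, destination)
        forced = unprocessed_destinations(sources, destination, reprocess=True)
        return list(default), list(forced)


default_names, forced_names = demo()
```

Here only `second` is reported by default, while `reprocess=True` reports both names.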

boilercv_pipeline.sets.inspect_dataset(
name: str,
stage: boilercv.correlations.types.Stage = STAGE_DEFAULT,
sources: pathlib.Path = ROOTED_PATHS.sources,
) → boilercv.types.DS#

Inspect a video dataset.

boilercv_pipeline.sets.get_dataset(
name: str,
num_frames: int = 0,
frame: slice = ALL_FRAMES,
stage: boilercv.correlations.types.Stage = STAGE_DEFAULT,
sources: pathlib.Path = ROOTED_PATHS.sources,
rois: pathlib.Path = ROOTED_PATHS.rois,
) → boilercv.types.DS#

Load a video dataset.

boilercv_pipeline.sets.get_selector(
video: boilercv.types.DA,
dim: str,
sel: slice | range | Any | None,
) → slice | range | Any#

Get selector, preferring label-based selection even given an integer slice.

boilercv_pipeline.sets.inspect_video(
path: pathlib.Path,
) → collections.abc.Iterator[boilercv.types.DA]#

Inspect video data array.

boilercv_pipeline.sets.load_video(
path: pathlib.Path,
slices: collections.abc.Mapping[str, slice | range | Any] | None = None,
) → collections.abc.Iterator[boilercv.types.DA]#

Load video data array.

boilercv_pipeline.sets.save_video(
da: boilercv.types.DA,
path: pathlib.Path,
)#

Save video data array.

boilercv_pipeline.sets.get_stage(
name: str,
sources: pathlib.Path,
) → tuple[pathlib.Path, pathlib.Path]#

Get the paths associated with a particular video name and pipeline stage.

boilercv_pipeline.sets.get_contours_df(
name: str,
contours: pathlib.Path = ROOTED_PATHS.contours,
) → boilercv.types.DF#

Load contours from a dataset.

boilercv_pipeline.sets.get_contours_df2(
path: pathlib.Path,
) → boilercv.types.DF#

Load contours from a dataset.

boilercv_pipeline.sets.slice_frames(
num_frames: int = 0,
frame: slice = ALL_FRAMES,
) → slice#

Return a slice suitable for getting frames from datasets.
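The reference does not say how `num_frames` and `frame` interact, so the following is only one plausible reading, sketched under stated assumptions: a nonzero `num_frames` selects the first frames positionally, otherwise `frame` is passed through, and giving both is treated as an error. The names `EVERYTHING` and `slice_frames_sketch` are hypothetical, and the actual function may behave differently (for example, by using label-based bounds).

```python
EVERYTHING = slice(None)  # hypothetical stand-in for a slice that selects all frames


def slice_frames_sketch(num_frames: int = 0, frame: slice = EVERYTHING) -> slice:
    """Hypothetical: select the first `num_frames` frames, or pass `frame` through."""
    if num_frames and frame != EVERYTHING:
        # Assumed policy: the two arguments are alternatives, not combinable.
        raise ValueError("Specify either `num_frames` or `frame`, not both.")
    if num_frames:
        return slice(None, num_frames)  # positional and end-exclusive in this sketch
    return frame


frames = list(range(10))  # stand-in for a frame axis
first_three = frames[slice_frames_sketch(num_frames=3)]
middle = frames[slice_frames_sketch(frame=slice(2, 5))]
```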