hawk_eye.data_generation.create_clf_data

This script creates classification data for the pre-detection classification step; however, it relies on detection data already existing. We also need to ensure that the output dataset does not repeat image slices between the train and validation sets, since this would distort our training metrics. Because the detector data can contain empty tiles, we do not copy those here. Instead, we generate our own empty slices so we have precise control over which ones exist. We first generate all the background crops and copy the target crops into one folder, then shuffle them all and split the data into 80% training and 20% validation.
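The shuffle-and-split step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `split_train_val` and the `seed` argument are hypothetical, and only `val_fraction` mirrors a documented parameter.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(
    items: Sequence[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle items and split them into disjoint train and validation lists.

    Hypothetical helper for illustration; not part of hawk_eye.
    """
    # Drop duplicates first so the same slice cannot land in both sets.
    unique = sorted(set(items))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(unique)
    num_val = int(len(unique) * val_fraction)
    return unique[num_val:], unique[:num_val]


crops = [f"crop_{i}.png" for i in range(10)]
train, val = split_train_val(crops)
print(len(train), len(val))  # 8 2
```

Deduplicating before shuffling is one simple way to guarantee that no slice appears in both sets; the real script may enforce this differently.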

hawk_eye.data_generation.create_clf_data.create_clf_images(num_gen: int, save_dir: pathlib.Path = PosixPath('/home/runner/work/hawk-eye/hawk-eye/hawk_eye/data_generation/data'), val_fraction: float = 0.2) → None

Generate data for the classifier model.

hawk_eye.data_generation.create_clf_data.single_clf_image(image: PIL.Image.Image, number: int, num_gen: int, save_dir: pathlib.Path, num_tiles: int) → int

Slice out crops from the original background image and save them to disk. NOTE: adjacent tiles do not overlap because we want to avoid any leakage between images. If tiles overlapped, two adjacent tiles sharing the same pixels could end up split across the train and eval sets.
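To illustrate the non-overlapping tiling, here is a minimal sketch that computes crop boxes by stepping across the image in full tile-sized strides. The function name `tile_boxes` and its parameters are assumptions made for this example and are not taken from the module; how the real function chooses its tiles is not documented here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Return (left, upper, right, lower) boxes for non-overlapping tiles.

    Hypothetical helper for illustration; not part of hawk_eye.
    """
    # Stepping by the full tile size means no two boxes share a pixel.
    # Partial tiles at the right and bottom edges are skipped.
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


boxes = tile_boxes(width=1000, height=600, tile=256)
print(len(boxes))  # 6
```

Each box uses the (left, upper, right, lower) order that `PIL.Image.Image.crop` accepts, so a crop could be taken with `image.crop(box)`.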