If you have limited storage space, or want to iterate quickly through training experiments, we suggest that you:
1. Download the data to your local system.
2. Run a utility script to generate a subset of the dataset.
3. Upload this smaller dataset to your Google Drive to run experiments on.
TAO provides utility scripts to generate such subsets for the COCO dataset (around 25 GB, ~120k images) and the KITTI dataset (around 12 GB, ~14k images).
To obtain a subset of KITTI:
The subset generation script is available here. Run it on your local system (not in Colab, where storage may be limited).
Download and unzip the KITTI training and testing zip files from here; you will end up with two folders, training and testing.
To obtain a subset for training:
python generate_kitti_subset.py --source-data-dir=path_to_training_folder --out-data-dir=path_to_save_subset_data/training/ --training True --num-images=num_of_images_in_subset
Example
python generate_kitti_subset.py --source-data-dir=/home/user/data/training --out-data-dir=/home/user/subset_data/training/ --training True --num-images=100
To obtain a subset for testing:
python generate_kitti_subset.py --source-data-dir=path_to_testing_folder --out-data-dir=path_to_save_subset_data/testing/ --num-images=num_of_images_in_subset
Example
python generate_kitti_subset.py --source-data-dir=/home/user/data/testing --out-data-dir=/home/user/subset_data/testing/ --num-images=100
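Before uploading a generated subset to Google Drive, it can be worth sanity-checking that it contains the number of images you requested. A small helper like the one below (not part of TAO; the function name is ours) counts the files under a subset's images folder:

```python
from pathlib import Path

def count_images(subset_dir):
    """Count regular files under <subset_dir>/images."""
    return sum(1 for p in Path(subset_dir, "images").iterdir() if p.is_file())
```

For the examples above, `count_images` on the output directory should report the value you passed to --num-images.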
Dataset folder structure for KITTI:

path_to_training_folder
|___images
|___labels

path_to_testing_folder
|___images
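Conceptually, the KITTI subset script only needs to sample N frames and copy each image (plus its matching label file, for training data) into the output folders. A minimal sketch of that logic, assuming the folder layout above (this is an illustration, not the actual TAO script):

```python
import random
import shutil
from pathlib import Path

def make_kitti_subset(source_dir, out_dir, num_images, training=True, seed=0):
    """Copy a random sample of KITTI frames; for training data, also copy labels."""
    source_dir, out_dir = Path(source_dir), Path(out_dir)
    images = sorted((source_dir / "images").iterdir())
    picked = random.Random(seed).sample(images, min(num_images, len(images)))

    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    if training:
        (out_dir / "labels").mkdir(parents=True, exist_ok=True)

    for img in picked:
        shutil.copy(img, out_dir / "images" / img.name)
        if training:
            # KITTI labels share the image's base name with a .txt extension.
            label = source_dir / "labels" / (img.stem + ".txt")
            shutil.copy(label, out_dir / "labels" / label.name)
```

A fixed seed keeps the subset reproducible across runs; the testing split has no labels folder, mirroring the --training flag in the commands above.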
To obtain a subset of COCO:
The subset generation script is available here. Run it on your local system (not in Colab, where storage may be limited).
Download and unzip the 2017 train and val images, and the 2017 train/val annotations, from here.
To obtain a subset for training:
python generate_coco_subset.py --source-image-dir=path_to_train2017_folder --source-annotation-file=path_to_instances_train2017.json_file --out-data-dir=path_to_save_subset_data --num-images=num_of_images_in_subset
Example
python generate_coco_subset.py --source-image-dir=/home/user/data/train2017 --source-annotation-file=/home/user/data/annotations/instances_train2017.json --out-data-dir=/home/user/subset_data/ --num-images=100
To obtain a subset for validation:
python generate_coco_subset.py --source-image-dir=path_to_val2017_folder --source-annotation-file=path_to_instances_val2017.json_file --out-data-dir=path_to_save_subset_data --num-images=num_of_images_in_subset
Example
python generate_coco_subset.py --source-image-dir=/home/user/data/val2017 --source-annotation-file=/home/user/data/annotations/instances_val2017.json --out-data-dir=/home/user/subset_data/ --num-images=100
Dataset folder structure for COCO:

folder_into_which_downloaded_coco_files_are_unzipped
|___train2017
|___val2017
|___annotations
    |___instances_train2017.json
    |___instances_val2017.json
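Unlike KITTI, COCO keeps all annotations for a split in a single JSON file, so a subset script must also filter that file: sample N entries from the "images" list, keep only the "annotations" whose image_id falls in the sample, and write out a new JSON alongside the copied images. A minimal sketch of that logic (an illustration with our own function name, not the actual TAO script):

```python
import json
import random
import shutil
from pathlib import Path

def make_coco_subset(image_dir, annotation_file, out_dir, num_images, seed=0):
    """Sample images from a COCO split and keep only their annotations."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    coco = json.loads(Path(annotation_file).read_text())

    picked = random.Random(seed).sample(
        coco["images"], min(num_images, len(coco["images"]))
    )
    picked_ids = {img["id"] for img in picked}

    subset = {
        "images": picked,
        # Keep only annotations that reference a sampled image.
        "annotations": [a for a in coco["annotations"] if a["image_id"] in picked_ids],
        "categories": coco["categories"],
    }

    # Mirror the COCO layout: <out_dir>/train2017 (or val2017) and <out_dir>/annotations.
    out_images = out_dir / image_dir.name
    out_images.mkdir(parents=True, exist_ok=True)
    for img in picked:
        shutil.copy(image_dir / img["file_name"], out_images / img["file_name"])

    out_ann = out_dir / "annotations"
    out_ann.mkdir(parents=True, exist_ok=True)
    (out_ann / Path(annotation_file).name).write_text(json.dumps(subset))
```

Keeping the full "categories" list unchanged means category IDs in the subset stay consistent with the original dataset, which matters if you later compare results against models trained on full COCO.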