If you have limited storage space, or want to iterate quickly through training experiments, we suggest that you:
1. Download the data to your local system.
2. Run a utility script to generate a subset of the dataset.
3. Upload this smaller dataset to your Google Drive to run experiments on.
TAO provides utility scripts to generate such subsets for the COCO dataset (around 25 GB, ~120k images) and the KITTI dataset (around 12 GB, ~14k images).
To obtain a subset of KITTI:
The subset generation script is available here. Run it on your local system (not in Colab, where storage may be limited).
Download and unzip the KITTI training and testing zip files from here; you will end up with two folders, training and testing.
To obtain a subset for training:
python generate_kitti_subset.py --source-data-dir=path_to_training_folder --out-data-dir=path_to_save_subset_data/training/ --training True --num-images=num_of_images_in_subset
Example
python generate_kitti_subset.py --source-data-dir=/home/user/data/training --out-data-dir=/home/user/subset_data/training/ --training True --num-images=100
To obtain a subset for testing:
python generate_kitti_subset.py --source-data-dir=path_to_testing_folder --out-data-dir=path_to_save_subset_data/testing/ --num-images=num_of_images_in_subset
Example
python generate_kitti_subset.py --source-data-dir=/home/user/data/testing --out-data-dir=/home/user/subset_data/testing/ --num-images=100
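Before uploading a generated subset to Google Drive, it can be worth sanity-checking that it contains the number of images you requested. A small helper like the one below (not part of TAO; the function name is ours) counts the files under a subset's images folder:

```python
from pathlib import Path

def count_images(subset_dir):
    """Count regular files under <subset_dir>/images."""
    return sum(1 for p in Path(subset_dir, "images").iterdir() if p.is_file())
```

For the examples above, `count_images` on the output directory should report the value you passed to --num-images.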
Dataset folder structure for KITTI:

path_to_training_folder
|___images
|___labels

path_to_testing_folder
|___images
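Conceptually, the KITTI subset script only needs to sample N frames and copy each image (plus its matching label file, for training data) into the output folders. A minimal sketch of that logic, assuming the folder layout above (this is an illustration, not the actual TAO script):

```python
import random
import shutil
from pathlib import Path

def make_kitti_subset(source_dir, out_dir, num_images, training=True, seed=0):
    """Copy a random sample of KITTI frames; for training data, also copy labels."""
    source_dir, out_dir = Path(source_dir), Path(out_dir)
    images = sorted((source_dir / "images").iterdir())
    picked = random.Random(seed).sample(images, min(num_images, len(images)))

    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    if training:
        (out_dir / "labels").mkdir(parents=True, exist_ok=True)

    for img in picked:
        shutil.copy(img, out_dir / "images" / img.name)
        if training:
            # KITTI labels share the image's base name with a .txt extension.
            label = source_dir / "labels" / (img.stem + ".txt")
            shutil.copy(label, out_dir / "labels" / label.name)
```

A fixed seed keeps the subset reproducible across runs; the testing split has no labels folder, mirroring the --training flag in the commands above.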
To obtain a subset of COCO:
The subset generation script is available here. Run it on your local system (not in Colab, where storage may be limited).
Download and unzip the 2017 train and val images, and the 2017 train/val annotations, from here.
To obtain a subset for training:
python generate_coco_subset.py --source-image-dir=path_to_train2017_folder --source-annotation-file=path_to_instances_train2017.json_file --out-data-dir=path_to_save_subset_data --num-images=num_of_images_in_subset
Example
python generate_coco_subset.py --source-image-dir=/home/user/data/train2017 --source-annotation-file=/home/user/data/annotations/instances_train2017.json --out-data-dir=/home/user/subset_data/ --num-images=100
To obtain a subset for validation:
python generate_coco_subset.py --source-image-dir=path_to_val2017_folder --source-annotation-file=path_to_instances_val2017.json_file --out-data-dir=path_to_save_subset_data --num-images=num_of_images_in_subset
Example
python generate_coco_subset.py --source-image-dir=/home/user/data/val2017 --source-annotation-file=/home/user/data/annotations/instances_val2017.json --out-data-dir=/home/user/subset_data/ --num-images=100
Dataset folder structure for COCO:

folder_into_which_downloaded_coco_files_are_unzipped
|___train2017
|___val2017
|___annotations
    |___instances_train2017.json
    |___instances_val2017.json
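Unlike KITTI, COCO keeps all annotations for a split in a single JSON file, so a subset script must also filter that file: sample N entries from the "images" list, keep only the "annotations" whose image_id falls in the sample, and write out a new JSON alongside the copied images. A minimal sketch of that logic (an illustration with our own function name, not the actual TAO script):

```python
import json
import random
import shutil
from pathlib import Path

def make_coco_subset(image_dir, annotation_file, out_dir, num_images, seed=0):
    """Sample images from a COCO split and keep only their annotations."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    coco = json.loads(Path(annotation_file).read_text())

    picked = random.Random(seed).sample(
        coco["images"], min(num_images, len(coco["images"]))
    )
    picked_ids = {img["id"] for img in picked}

    subset = {
        "images": picked,
        # Keep only annotations that reference a sampled image.
        "annotations": [a for a in coco["annotations"] if a["image_id"] in picked_ids],
        "categories": coco["categories"],
    }

    # Mirror the COCO layout: <out_dir>/train2017 (or val2017) and <out_dir>/annotations.
    out_images = out_dir / image_dir.name
    out_images.mkdir(parents=True, exist_ok=True)
    for img in picked:
        shutil.copy(image_dir / img["file_name"], out_images / img["file_name"])

    out_ann = out_dir / "annotations"
    out_ann.mkdir(parents=True, exist_ok=True)
    (out_ann / Path(annotation_file).name).write_text(json.dumps(subset))
```

Keeping the full "categories" list unchanged means category IDs in the subset stay consistent with the original dataset, which matters if you later compare results against models trained on full COCO.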