Counting Vehicles - Setup TensorFlow Training - Part 3

In this part we will set up TensorFlow, get the training data into the right format, start training, and monitor the model's performance in TensorBoard.

Setup TensorFlow

Depending on your setup, installing TensorFlow may be tricky, but luckily there is plenty of community support on this topic. To start, check out the tensorflow/models git repository and follow the TensorFlow Object Detection installation guide; referencing TensorFlow's own installation guide may help as well. It is a good idea to verify your installation by running a sample locally, and trying out their Jupyter notebook is a good introduction. Lastly, many of the steps we will take mirror the process for Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud. When everything is complete you should have a models/research directory, which we will work out of.
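As a rough sketch, the installation steps from the guides linked above boil down to the commands below; treat this as an outline and defer to the official installation guide if anything differs in your environment.

```shell
# Clone the models repository that contains the Object Detection API
git clone https://github.com/tensorflow/models.git
cd models/research

# Compile the protobuf message definitions the API uses
protoc object_detection/protos/*.proto --python_out=.

# Make the object_detection and slim packages importable
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

# Verify the installation by running the API's builder test
python object_detection/builders/model_builder_test.py
```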

Prepare Training Data

In order to use our training data with the Object Detection API we need to convert it to their TFRecord format. The first requirement is a proper label map; label_map.pbtxt is the one we will use. The next step is to actually create the TFRecord files, but before we do that it is necessary to split the training data into two subsets: training data and evaluation data. Here is a rudimentary script to do that; be sure to update the paths accordingly.

import os
import shutil
import numpy as np

# Move a random 10% of the files from basedir into outdir
basedir = '/mnt/storage/development/count-cars/train-labels'
outdir = '/mnt/storage/development/count-cars/eval-labels'
files = os.listdir(basedir)
os.makedirs(outdir, exist_ok=True)
random_indexes = np.random.permutation(len(files))
for i in range(int(len(files) * 0.10)):
    pos = random_indexes[i]
    to_move = os.path.join(basedir, files[pos])
    shutil.move(to_move, outdir)
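For reference, a label map is a small text-proto file mapping integer class ids to label strings. The actual label_map.pbtxt with all eleven classes lives in the repository; the fragment below only illustrates the format, and the class names shown are assumptions.

```
item {
  id: 1
  name: 'car'
}
item {
  id: 2
  name: 'suv'
}
```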

As mentioned before, an advantage of LabelImg is its ability to output labels compliant with the Pascal VOC format. The Object Detection API repository has a Python script to convert Pascal VOC records into the required TFRecord format. I have updated this script to work with our LabelImg output instead of the Pascal VOC dataset files; it is located in the deeplens-count-vehicles repository. Copy that Python file to the directory models/research/object_detection/dataset_tools/ and run it like so for the training data

python object_detection/dataset_tools/ \
        --label_dir=/mnt/storage/development/count-cars/training-labels \
        --label_map_path=/mnt/storage/development/count-cars/label_map.pbtxt \

and for the evaluation data

python object_detection/dataset_tools/ \
        --label_dir=/mnt/storage/development/count-cars/eval-labels \
        --label_map_path=/mnt/storage/development/count-cars/label_map.pbtxt \

and copy over the label mapping to the data folder

cp /mnt/storage/development/count-cars/label_map.pbtxt /mnt/storage/development/count-cars/data/label_map

Once the conversion is complete you should have the three files within a data directory that are required for training within TensorFlow:

  1. label_map - mapping of label ids to label strings
  2. eval - TFRecord file of the evaluation data
  3. train - TFRecord file of the training data
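Under the hood, the conversion script reads each LabelImg XML annotation before packing it into a TFRecord. The sketch below shows only the parsing half using the standard library; the real script additionally serializes the result into tf.train.Example records, and the sample XML is illustrative.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_string):
    """Parse a LabelImg Pascal VOC XML annotation into a plain dict."""
    root = ET.fromstring(xml_string)
    size = root.find("size")
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bndbox.findtext("xmin")),
            "ymin": int(bndbox.findtext("ymin")),
            "xmax": int(bndbox.findtext("xmax")),
            "ymax": int(bndbox.findtext("ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": boxes,
    }

# A hypothetical LabelImg annotation for one frame
sample = """
<annotation>
  <filename>frame_0001.jpg</filename>
  <size><width>429</width><height>240</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>10</xmin><ymin>50</ymin><xmax>120</xmax><ymax>110</ymax></bndbox>
  </object>
</annotation>
"""
parsed = parse_voc_annotation(sample)
print(parsed["objects"][0]["label"])  # car
```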

Start Training Locally

Next we will set up the requirements for training locally. The Object Detection API uses a configuration file, called a model pipeline, that describes the composition of the deep learning neural network. The model config we will use is based on the ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config within the Object Detection API repository; more details are available for configuring the object detection training pipeline. The main differences between the two configurations are the image_resizer (we are using 429x240 images), the removal of the data augmentation option that horizontally flips the input images, and an updated num_classes of 11. Before training, be sure to download the ssd_mobilenet_v1_fpn_coco model from the model zoo; this model is the base model for our transfer learning. Also, make sure the paths within the config file correspond to the locations of the files on your computer. To start training, copy the script to the models/research directory, update the file paths, and run it.
This script will start the training locally. If you would like to train the model within Google Cloud instead, follow the documentation for running on Google Cloud ML Engine.
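To make the config changes concrete, the fragment below shows roughly where the edits land in the pipeline file. Only the fields mentioned above are shown; everything else stays as it is in the base config, and the exact nesting should be checked against your copy of the file.

```
model {
  ssd {
    num_classes: 11
    image_resizer {
      fixed_shape_resizer {
        height: 240
        width: 429
      }
    }
    # ... remainder unchanged from the base config,
    # with the horizontal-flip data augmentation option removed
  }
}
```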


If you get the error

Traceback (most recent call last):
  File "object_detection/", line 25, in <module>
    from object_detection import model_hparams
ImportError: No module named object_detection

make sure to export the Python path properly from within the models/research directory: export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Model Results

You can see the results of my training runs below. For the most part the model doesn't do too badly, with around 82% average precision for IoU 0.50 to 0.95 and an average recall of 83.4% over the same IoU range. I believe the -1.000 visible for area=small is due to no bounding boxes being considered small. Also, since the lens on the DeepLens has a wide angle, the vehicles on the outer edges of the image have smaller widths, which I think may be causing issues at high IoUs. I don't have much expertise in statistics, but after reading more about the evaluation metrics I came to the conclusion that when we detect something it is almost always a vehicle, we are very good at determining the correct vehicle class, and the bounding box is fairly accurate.
However, the lower average recall tells me that we do sometimes miss vehicles that are in the image. For our purposes this shouldn't affect our goal: since a vehicle will be driving by, we have the opportunity to perform multiple classifications. The lower average recall is also likely due in part to misclassifying vehicles, most commonly between SUVs that look like cars and vice versa.

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.19s).
Accumulating evaluation results...
DONE (t=0.06s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.823
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.813
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.863
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.863
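For intuition, the two headline numbers reduce to simple ratios over detection counts. The counts below are hypothetical, chosen only so the toy calculation lands near the reported values:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: of everything we detected, how much was right.
    Recall: of everything actually present, how much we detected."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts: of 100 vehicles, 83 are detected correctly,
# 17 are missed, and 18 detections are spurious or misclassified.
p, r = precision_recall(83, 18, 17)
print(round(p, 3), round(r, 3))  # 0.822 0.83
```

This is why a correct detection with a wrong class (car vs. SUV) hurts both numbers: it counts as a false positive for the predicted class and a false negative for the true one.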


During training you can start up TensorBoard to view the model results as checkpoints are completed; see how to start running TensorBoard for more details. Below are sample screenshots from my runs, which trained for multiple hours on a GPU. As you can see, within the first couple of steps the average precision and average recall climb steadily before dropping off to a much lower rate of improvement.
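Starting TensorBoard amounts to pointing it at the training output directory; the path below is an assumption, so substitute your own model directory.

```shell
# model_dir here is a placeholder for wherever your checkpoints are written
tensorboard --logdir=/mnt/storage/development/count-cars/model_dir
```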

Tensorboard Scalars Output Tensorboard showing the scalars output.

Tensorboard Images Output Tensorboard showing a subset of the evaluation data being labeled.

Continue on to Part 4 to export, optimize and deploy our model to the AWS DeepLens.




