Counting Vehicles - Setup TensorFlow Training - Part 3
Published on 11/25/2018
7 min read
In this part we will set up TensorFlow, get the training data into the right format, start training and monitor tensorboard for the model's performance.
Depending on your setup, installing TensorFlow may be tricky; luckily there is plenty
of community support on this topic. To start, check out the tensorflow/models git repository and follow the TensorFlow Object Detection installation guide.
TensorFlow's own installation guide may help as well.
It is also a good idea to check your installation by running a sample locally;
trying out their Jupyter notebook
is a good introduction as well. Lastly, a lot of the steps that we will take are similar to the process
for Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud. When everything is complete
you should have a
models/research directory that we will work out of.
In order to use our training data with the object detection API we need to convert it to their TFRecord format. The first requirement is a proper label map; label_map.pbtxt is the one that we will use. The next step is to actually create the TFRecord files, but before we do that it is necessary to split the training data into two subsets: training data and evaluation data. Here is a rudimentary script to do that; be sure to update the paths accordingly.
```python
import os
import shutil

import numpy as np

# Take random files from basedir and move them into outdir
basedir = '/mnt/storage/development/count-cars/train-labels'
files = os.listdir(basedir)
outdir = '/mnt/storage/development/count-cars/eval-labels'
random_indexes = np.random.permutation(len(files))

# Take 10% of our data
for i in range(int(len(files) * 0.10)):
    pos = random_indexes[i]
    to_move = os.path.join(basedir, files[pos])
    shutil.move(to_move, outdir)
```
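As an aside, the label map itself is a simple pbtxt file. A minimal sketch (the class names here are illustrative; the real label_map.pbtxt enumerates all 11 vehicle classes, and ids start at 1 because 0 is reserved for the background class):

```
item {
  id: 1
  name: 'car'
}
item {
  id: 2
  name: 'suv'
}
```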
As mentioned before, an advantage of LabelIMG is its ability to output labels compliant with the Pascal VOC format.
The object detection API repository has a Python script to convert Pascal VOC records into the required TFRecord format.
I have updated this script to work with our LabelIMG output instead of the Pascal VOC files; it is located in the deeplens-count-vehicles repository.
Copy that Python file to the
models/research/object_detection/dataset_tools/ directory (the same location
as create_pascal_tf_record.py) and run it like so for the training data
```shell
python object_detection/dataset_tools/create_labelimg_tf_record.py \
    --label_dir=/mnt/storage/development/count-cars/training-labels \
    --label_map_path=/mnt/storage/development/count-cars/label_map.pbtxt \
    --output_path=/mnt/storage/development/count-cars/data/train
```
and for the evaluation data
```shell
python object_detection/dataset_tools/create_labelimg_tf_record.py \
    --label_dir=/mnt/storage/development/count-cars/eval-labels \
    --label_map_path=/mnt/storage/development/count-cars/label_map.pbtxt \
    --output_path=/mnt/storage/development/count-cars/data/eval
```
and copy over the label mapping to the data directory

```shell
cp /mnt/storage/development/count-cars/label_map.pbtxt /mnt/storage/development/count-cars/data/label_map
```
Once it is complete you should have three files within the
data directory that are required for training within TensorFlow:
- label_map - mapping of label ids to label strings
- eval - TFRecord file of the evaluation data
- train - TFRecord file of the training data
Next we will set up the requirements for training locally. The object detection API uses
a configuration file, called a model pipeline, that describes the composition of the deep learning
neural network. The model config that we will use
is based on the ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config within the object detection API repository.
More details are available on configuring the object detection training pipeline.
The main differences between the two configurations are the
image_resizer (we are using 429x240 images),
the removal of the data augmentation option that horizontally flips the input images, and
num_classes, which is set to 11.
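Sketched as a fragment of the pipeline config, those changes look roughly like this (field placement follows the base config; everything not shown is unchanged):

```
model {
  ssd {
    num_classes: 11
    image_resizer {
      fixed_shape_resizer {
        height: 240
        width: 429
      }
    }
    # remaining model fields unchanged from the base config
  }
}
```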
Before training be sure to download the ssd_mobilenet_v1_fpn_coco model from the model zoo.
This model serves as the base for our transfer learning. Also, make sure the paths within the config
file correspond to the locations of the files on your computer.
To start the training, run the script run.sh after copying it to the
models/research directory and updating the file paths.
This script will start the training locally. If you would like to train the model within Google Cloud instead, follow the page on running on Google Cloud ML Engine.
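I won't reproduce run.sh here, but its core is presumably an invocation of the API's model_main.py along these lines (the config path, model directory and step count are placeholders to adapt):

```shell
PIPELINE_CONFIG_PATH=/path/to/your/pipeline.config
MODEL_DIR=/path/to/your/model_dir

python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=25000 \
    --alsologtostderr
```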
If you get the error
```
Traceback (most recent call last):
  File "object_detection/model_main.py", line 25, in <module>
    from object_detection import model_hparams
ImportError: No module named object_detection
```
Make sure to properly export the python path like so
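This is presumably the standard export from the object detection installation guide, run from the models/research directory:

```shell
# Run from models/research so that both the object_detection and slim
# packages are importable.
export PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/slim
```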
You can see the results of my training runs below. For the most part the model
doesn't do too badly, with around 82% average precision for IoU 0.5 to 0.95 and an average
recall of 83.4% for the same IoU range. I believe the -1.000 visible for area=small
is due to no bounding boxes being considered small.
Also, since the lens on the DeepLens is wide-angle, the vehicles on the outer edges of the image
have smaller widths, which I think may be causing issues at high IoU thresholds.
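To make that IoU sensitivity concrete, here is a small sketch of the intersection-over-union computation (the box coordinates are made up): the same five-pixel shift costs a narrow box far more overlap than a wide one.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Coordinates of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# The same 5-pixel horizontal shift between prediction and ground truth:
wide = iou((0, 0, 100, 50), (5, 0, 105, 50))    # ~0.90, passes an IoU=0.75 threshold
narrow = iou((0, 0, 20, 50), (5, 0, 25, 50))    # ~0.60, fails the same threshold
```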
I don't have much expertise
in the area of statistics; however, after reading more about the evaluation statistics I came to the conclusion
that when we detect something we are very good at determining the correct vehicle class,
and what we do detect is pretty much always a vehicle, with a fairly
accurate bounding box.
However, due to the lower average recall, we do sometimes miss vehicles that are in the image. For our purposes this shouldn't affect our goal: since a vehicle will be driving by, we have the opportunity to perform multiple classifications across frames. The lower average recall is also likely due to misclassifying vehicles, most commonly SUVs that look like cars and vice-versa.
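That multiple-classification idea can be sketched simply: collect the predicted class for each frame a vehicle appears in and take a majority vote. The helper below is a hypothetical illustration, not part of the project code.

```python
from collections import Counter

def aggregate_detections(frame_labels):
    """Majority vote over per-frame class predictions for one vehicle pass.

    frame_labels: list of class strings, one per frame the vehicle was
    detected in (frames with no detection are simply absent from the list).
    """
    if not frame_labels:
        return None
    # most_common(1) returns [(label, count)] for the plurality label.
    label, _ = Counter(frame_labels).most_common(1)[0]
    return label

# A car misread as an SUV in one frame is still counted as a car overall.
print(aggregate_detections(["car", "car", "suv", "car"]))  # car
```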
```
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.19s).
Accumulating evaluation results...
DONE (t=0.06s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.823
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.813
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] =  0.863
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] =  0.863
```
During training you can start up
tensorboard to view the model results as checkpoints are completed.
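For example (the path here is an assumption; point it at whatever model directory your training run writes checkpoints into):

```shell
tensorboard --logdir=/path/to/your/model_dir
```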
See how to start running tensorboard for more details. Below are sample screenshots from my TensorFlow runs that
have trained for multiple hours on a GPU. As you can see, within the first couple of steps the
average precision and average recall steadily climb before dropping off to a much lower rate of improvement.