Counting Vehicles - Model Improvements - Part 5

We will look into some techniques for improving our model, both in terms of precision and recall and in terms of inference performance. We will also explore some oddities of the model.

Make it better

Improving inference performance

One of the drawbacks of our original model is its inference time. Due to the size of the model it only averaged about 1-2 frames per second, so if a vehicle was driving by quickly we would only get one or two chances to classify it. This was also an issue because the sweet spot for the model was right in front of the camera, and with a low FPS the probability of capturing the vehicle in that sweet spot was low. If you look at the TensorFlow Object Detection API model zoo you can see that the base model we chose, ssd_mobilenet_v1_fpn_coco, has a reported speed of 56 ms on a powerful Nvidia GeForce Titan GPU. Since the DeepLens has a much less capable GPU, the Intel Gen9, you can hypothesize that it would be roughly 10x slower. That lines up with what we observed: 1-2 FPS equates to an inference time of roughly 500 ms, as a very rough measure.
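If you want to check this for yourself, below is a minimal sketch of how the per-frame latency could be measured on the device. The get_frame and run_inference functions are placeholders for however frames are grabbed and the model is invoked (on the DeepLens these would come from its SDK), not actual API calls.

```python
import time

def measure_fps(get_frame, run_inference, num_frames=50):
    """Time the end-to-end inference loop and report average latency and FPS.

    get_frame and run_inference are placeholders for the device's frame-grab
    and model-invocation calls.
    """
    start = time.time()
    for _ in range(num_frames):
        frame = get_frame()
        run_inference(frame)
    elapsed = time.time() - start
    latency_ms = elapsed / num_frames * 1000.0
    print("avg latency: %.1f ms, approx %.1f FPS" % (latency_ms, 1000.0 / latency_ms))
```

As a sanity check on the arithmetic, a 500 ms average latency works out to about 2 FPS, which matches what we saw on the device.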

With that in mind, one way to improve the performance of the model would have been to opt for smaller input images. We chose images that were 429x240 in RGB; we could have used smaller images, for example 356x200 in grayscale. This would have reduced the number of computations the GPU has to do. Take note that this may also have reduced the model's accuracy and recall.
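As a rough illustration, here is a minimal OpenCV sketch of that kind of preprocessing, assuming frames arrive as BGR arrays; the model would of course have to be retrained to accept the smaller, single-channel input.

```python
import cv2

def preprocess(frame):
    """Downscale a BGR frame to 356x200 and convert it to grayscale.

    Roughly 0.69x the pixels of a 429x240 frame, and one channel instead of
    three, so noticeably fewer multiply-adds per frame. The model must be
    trained with matching preprocessing for this to work.
    """
    small = cv2.resize(frame, (356, 200), interpolation=cv2.INTER_AREA)
    return cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
```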

Another option is to use a different model pipeline with a different base model. This is the option I chose to implement: I opted for the ssdlite_mobilenet_v2_coco base model, which has a reported speed of 27 ms, so we could expect it to be roughly twice as fast. When I implemented it I was able to get 4-6 FPS; while still slower than the camera's input rate of 10 FPS, it allowed for capturing more instances of a vehicle driving by.
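When swapping base models it is easy to retrain against the wrong pipeline by accident. Below is a small sketch, using the TensorFlow Object Detection API's protos, that loads a pipeline config and prints which feature extractor and pretrained checkpoint it will actually use; the config path here is an assumption and should point at whatever config you retrain from.

```python
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# Assumed path: the pipeline.config shipped with the downloaded base model.
config_path = "ssdlite_mobilenet_v2_coco/pipeline.config"

config = pipeline_pb2.TrainEvalPipelineConfig()
with open(config_path) as f:
    text_format.Merge(f.read(), config)

# Confirm the backbone and starting weights before kicking off training.
print(config.model.ssd.feature_extractor.type)   # e.g. 'ssd_mobilenet_v2'
print(config.train_config.fine_tune_checkpoint)  # pretrained COCO weights
```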

Improving model performance

Another area I wanted to improve upon was the actual performance of the model, in terms of its accuracy and, more importantly, its recall. As always, collecting and labeling more training data would likely have done this for us, but I opted not to perform that time-consuming step. Instead I took the lazy route and explored more data augmentation options. One area where I noticed lower performance was when the sun was shining directly into the window. When I first collected training data in September the sun was at a higher angle in the sky, so it didn't really shine into the window, and most of my training data was centered around mid-day. Now that it is November the sun sits at a lower angle, shining into the window and causing different lighting conditions in general. One way I opted to get around this was to introduce some data augmentation. The options I chose to implement were random_adjust_brightness, random_adjust_contrast, random_jitter_boxes and ssd_random_crop. The brightness and contrast adjustments were meant to handle the different lighting that may occur within the image. If you look below, these are two captures taken about 40 days apart at the same time of day.

Capture on 11/22/2018 at 16:45 UTC
Capture on 10/14/2018 at 16:45 UTC

I also opted to use the random jitter boxes to account for any inaccuracies in labeling the bounding boxes. The random crop was chosen in case I ever change the location of the camera. I'm not sure how much, if at all, these data augmentation options actually improved the model; for anything you want to improve you really need to measure and evaluate to determine its effectiveness, while keeping everything else the same.
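For reference, these augmentation options live in the train_config section of the TensorFlow Object Detection API pipeline config. Below is a minimal sketch of one way to append them to an existing config programmatically; the file names are assumptions, no parameters are set so the library defaults apply, and editing the pipeline.config by hand achieves the same thing.

```python
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# The four augmentation steps, in pipeline.config text format; parameter
# values are omitted, so each op falls back to its library defaults.
AUGMENTATION = """
train_config {
  data_augmentation_options { random_adjust_brightness { } }
  data_augmentation_options { random_adjust_contrast { } }
  data_augmentation_options { random_jitter_boxes { } }
  data_augmentation_options { ssd_random_crop { } }
}
"""

config = pipeline_pb2.TrainEvalPipelineConfig()
with open("pipeline.config") as f:          # assumed input config path
    text_format.Merge(f.read(), config)

# Merge appends to the repeated data_augmentation_options field, keeping
# whatever augmentation the pipeline already defines.
text_format.Merge(AUGMENTATION, config)

with open("pipeline_augmented.config", "w") as f:
    f.write(text_format.MessageToString(config))
```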

Model Comparison

Here are the final results of each model, trained on the training data to 50k steps and evaluated on the evaluation data. As you can see, MobileNet V2 does achieve lower average precision and lower average recall. This is expected since the model is smaller, but remember that MobileNet V2 is also much faster when running on the DeepLens. One thing to keep in mind is that the MobileNet V2 model also includes the additional data augmentation options; I am unsure whether this actually helped or hurt the model, and to evaluate that I would have to train a third model without the augmentation options to compare against.

MobileNet V1 SSD final results 50000 steps

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.823
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.813
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.863
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.834
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.863

MobileNet V1 SSD 50k Steps

MobileNet V2 SSDLite final results 50000 steps

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.658
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.818
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.802
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.819
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.678
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.678
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.678
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.643
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.825

MobileNet V2 SSDLite 50k Steps

Sample Examples

Here are some sample classification examples on data outside the training and evaluation sets, to compare and contrast the models on a handful of cases. As you can see, MobileNet V2 SSDLite does have some classification issues: it has a hard time distinguishing between SUVs and cars, but gets the direction right every time. There is one example that MobileNet V2 fails to classify, image 6; I imagine this is due to the brightness of the image and the scenery behind the car causing issues. Overall, both models don't do too badly; they handle the season changing from summer, to fall, to winter, showing how much the model really focuses on the road. This is also evident in image 10, where my neighbor's SUV is parked in their driveway but is not detected, despite appearing in the same orientation as cars on the road.


Image 1 - MobileNet V1: SUV left 99% (correct); MobileNet V2: SUV left 99% (correct)
Image 3 - MobileNet V1: SUV left 81% (correct); MobileNet V2: car left 92% (incorrect)
Image 6 - MobileNet V1: car left 95% (correct); MobileNet V2: unlabeled (incorrect)
Image 9 - MobileNet V1: SUV left 98% (correct); MobileNet V2: car left 92% (incorrect)
Image 10 - MobileNet V1: car left 93% (correct); MobileNet V2: car left 91% (correct)
Image 11 - MobileNet V1: car right 97% (correct); MobileNet V2: car right 92% (correct)
Image 14 - MobileNet V1: SUV right 99% (correct); MobileNet V2: car right 97% (incorrect)


Oddities

One of the drawbacks of machine learning in general is that a model is only as good as the data you train it with. Also, with some ingenuity it is possible to construct data that fools the model.

Misclassifications

Below is an example of a misclassification: an SUV driving left misclassified as a van driving left.

In this case the probability is only 84%, so we could easily have thrown this detection away by increasing our detection threshold to something like 90%. By increasing the detection threshold, though, we would potentially miss more vehicles, while having better accuracy on the vehicles we did detect. There is always a trade-off with these detection thresholds, and the right value will be project specific. In this case the model likely mispredicts because the SUV does have roughly the shape of a van; I think the windows in the back seat give it away as an SUV. Another reason for the misprediction could be the color of the SUV: the majority of the vans in the training data are white, while for SUVs white is probably evenly distributed among the other colors. If so, the model is actually relying on the color of the vehicle to aid classification. One way to avoid this would be to convert the images to grayscale.
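To make the trade-off concrete, here is a small sketch of threshold filtering, assuming detections have already been parsed into (label, score, box) tuples; that tuple format is an assumption for illustration, not the exact structure the project uses.

```python
def filter_detections(detections, threshold=0.90):
    """Keep only detections whose score meets the threshold.

    detections is assumed to be a list of (label, score, box) tuples.
    Raising the threshold drops borderline calls like the 84% "van left"
    above, at the cost of missing some real vehicles entirely.
    """
    return [d for d in detections if d[1] >= threshold]

# The 84% misclassification is discarded at a 0.90 threshold, but a correct
# detection at, say, 88% would be discarded right along with it.
detections = [("van left", 0.84, (120, 80, 260, 170)),
              ("car right", 0.97, (300, 90, 420, 160))]
print(filter_detections(detections))  # only the 97% car right remains
```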

Played for a fool

Example of fooling the TensorFlow model with a toy car

This example was hand constructed: I had a pink toy car that I placed in front of the camera to see if I could get the model to detect anything. In this case it labeled the toy car as car left with a probability of roughly 76%. While the toy car is actually a car, it isn't going left, it is going right. The key to fooling the model was presenting something that looked like the training data, the toy car, and making it appear the same size in the scene as the training data. If you look above at the misclassified van left, you can see the general area where vehicles drive and their scale; I tried to replicate this with the toy car, placing it near the center of the scene at roughly the same size. In the model's defense, it only assigned a probability of 76%, so it didn't get fooled that badly. In addition, since the training data never contained toy cars, the model didn't really know any better than to call it a real car, further emphasizing the importance of training data that accurately depicts what you want to detect.

Thank You

Thank you for reading this series of blog posts on counting vehicles. Feel free to add comments below or email me if you have questions or comments. I plan on adding more content in the coming months, so be sure to subscribe to the RSS feed.
