Introduction

Our Machine Learning system is used to tag the images obtained in the SRPScrape process.

We try to extract two features from the images:

Machine Learning pipes and models

To extract these features from the images, we use two different pipes built from three different models:

Notes:

Machine Learning Pipe and Model History

First Iteration (End of 2018)

The first implementation was not handled by Hoot, and we have little information about it. Our current Color detection model was trained during this first iteration and has not been retrained or changed since.


Second Iteration (Feb 2020)

The second iteration was handled by Hoot, and it introduced the idea of using the crop model. In the end, this version used Pipe 1 with a newly trained Image Type model.

To build this new model, we used a Transfer Learning approach: we took a ResNet50 architecture pretrained on the ImageNet dataset, froze its inner layers, and modified the output layers to fit a 3-class classification problem.
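The transfer learning setup described above can be sketched as follows. This is a minimal sketch under stated assumptions: the framework actually used is not recorded, so TensorFlow/Keras is assumed; the 224x224 input size, the 256-unit hidden layer, and the dropout default are illustrative values; and `build_image_type_model` is a hypothetical name.

```python
from typing import Optional

import tensorflow as tf


def build_image_type_model(num_classes: int = 3,
                           dropout: float = 0.5,
                           hidden_units: int = 256,
                           weights: Optional[str] = "imagenet") -> tf.keras.Model:
    """Hypothetical transfer-learning classifier: frozen ResNet50 backbone plus a new head."""
    # Pretrained convolutional backbone without its original 1000-class ImageNet head.
    base = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=(224, 224, 3),
        pooling="avg",  # global average pooling: one 2048-d vector per image
    )
    # Freeze every backbone layer so that only the new output layers are trained.
    base.trainable = False

    inputs = tf.keras.Input(shape=(224, 224, 3))
    # training=False keeps the backbone's BatchNorm layers in inference mode.
    x = base(inputs, training=False)
    # Assumed head: one hidden layer plus dropout (the tuned architecture is not recorded).
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Images would need to go through `tf.keras.applications.resnet50.preprocess_input` before training or inference, because the pretrained ImageNet weights expect that normalization. Swapping the head (hidden units, dropout) while keeping the backbone frozen is what makes it cheap to compare several output layer variants.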

To obtain the best possible model, we experimented with different output layer architectures and dropout values. Given the nature of the problem, we want recall on placeholders to be as high as possible: every placeholder that the model catches is tagged correctly and is therefore never used in a video. We created a scoring system that evaluates each model on both its overall accuracy and its recall on the placeholder data.
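The exact scoring formula is not documented beyond combining accuracy and placeholder recall. The sketch below assumes a simple weighted mean of the two, with a hypothetical `recall_weight` parameter, and assumes the three class labels are `dealer`, `stock`, and `placeholder`.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Share of true `positive` examples that the model also labelled `positive`."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0


def model_score(y_true: Sequence[str], y_pred: Sequence[str],
                placeholder: str = "placeholder", recall_weight: float = 0.5) -> float:
    """Hypothetical combined score: weighted mean of accuracy and placeholder recall."""
    return ((1 - recall_weight) * accuracy(y_true, y_pred)
            + recall_weight * recall(y_true, y_pred, placeholder))


def select_best(y_true: Sequence[str], candidates: Dict[str, Sequence[str]]) -> str:
    """Name of the candidate model (name -> predictions) with the highest score."""
    return max(candidates, key=lambda name: model_score(y_true, candidates[name]))
```

For example, two candidates with the same accuracy are separated by placeholder recall, so the one that misses fewer placeholders wins. Raising `recall_weight` pushes the selection further toward placeholder recall at the expense of overall accuracy.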

The data set was also extended with examples of other vehicle types, such as boats, bikes, quads, and RVs. Because the different image types were unbalanced, some image augmentation techniques were applied.
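The augmentation techniques actually applied are not listed. As a minimal sketch, assuming images are plain lists of 8-bit pixel rows and using two hypothetical augmentations (a horizontal flip and a brightness shift), minority classes can be topped up with augmented copies until every class matches the size of the largest one.

```python
import random
from collections import defaultdict


def hflip(img):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def brighten(img, delta=20):
    """Add a constant to every 8-bit pixel, clamped to [0, 255]."""
    return [[min(255, max(0, px + delta)) for px in row] for row in img]


def oversample_with_augmentation(samples, augmentations=(hflip, brighten), seed=0):
    """Return (image, label) pairs in which every label has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for img, label in samples:
        by_label[label].append(img)
    target = max(len(imgs) for imgs in by_label.values())

    balanced = []
    for label, imgs in by_label.items():
        balanced.extend((img, label) for img in imgs)  # keep every original image
        for _ in range(target - len(imgs)):            # top up the smaller classes
            augment = rng.choice(augmentations)
            balanced.append((augment(rng.choice(imgs)), label))
    return balanced
```

Oversampling with augmented copies, rather than plain duplicates, gives the minority classes some variation instead of repeating identical images.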

Third and Fourth Iterations (August 2020)

This iteration was prompted by very poor model results for some dealers. To improve them, examples from all the affected dealers were added to the data set, and a new model was trained following the same approach as the previous one.
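Identifying the dealers with poor results requires evaluating the model per dealer. The sketch below is hypothetical: it assumes evaluation records of the form `(dealer_id, true_label, predicted_label)` and uses an illustrative accuracy threshold of 0.8 to flag a dealer as faulty.

```python
from collections import defaultdict


def per_dealer_accuracy(records):
    """records: iterable of (dealer_id, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dealer, truth, pred in records:
        total[dealer] += 1
        correct[dealer] += int(truth == pred)
    return {dealer: correct[dealer] / total[dealer] for dealer in total}


def faulty_dealers(records, threshold=0.8):
    """Dealers whose accuracy falls below the (hypothetical) threshold, worst first."""
    scores = per_dealer_accuracy(records)
    return sorted((d for d, acc in scores.items() if acc < threshold), key=scores.get)
```

Images from the flagged dealers are then the candidates to add to the training data set.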

During the testing phase of the Third iteration, we noticed that images from one particular advertiser negatively affected the results for other advertisers, so we removed them from the data set for the Fourth iteration.

Fifth Iteration (October 2021)

This iteration introduces a change in approach: testing whether Pipe 2 can improve the Dealer vs Stock accuracy and recall.

To improve the Stock vs Dealer classification, we are creating datasets of cropped images that remove as much background information as possible. This tests whether the vehicles themselves contain enough information to tell the two classes apart.
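The crop model's output format is not described here. As a minimal sketch, assuming it returns a pixel bounding box `(top, left, bottom, right)` with exclusive bottom and right edges, and that images are lists of pixel rows, the background can be cut away like this (`crop_to_box` and its `margin` parameter are hypothetical):

```python
def crop_to_box(img, box, margin=0):
    """Crop an image (list of pixel rows) to a bounding box.

    box is (top, left, bottom, right) with exclusive bottom/right edges, as a
    hypothetical crop model might return; margin pads the box on every side
    and the result is clamped to the image borders.
    """
    height, width = len(img), len(img[0])
    top, left, bottom, right = box
    top, left = max(0, top - margin), max(0, left - margin)
    bottom, right = min(height, bottom + margin), min(width, right + margin)
    return [row[left:right] for row in img[top:bottom]]
```

A small margin keeps the vehicle's outline intact while still discarding most of the dealer background.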

The new dataset was created as follows:

Sixth Iteration (November 2021)

This iteration keeps the same approach as the previous one. It was prompted by two situations:

The new dataset was created as follows:
