It has been a while since I last looked at the machine learning model developed in Part 1. This time around I wanted to explore the model's performance a little more, and see about extending it into something that could be used in the actual EnclosureGenerator application. In case you are jumping in here and skipped the first blog post: this post focuses on a more in-depth exploration of the problem of identifying which face (if any) of a circuit component needs a cutout in an enclosure for that circuit board.
In the last blog post the model's performance was measured only with a general accuracy score and the confusion matrix generated when validating the model. There are other useful metrics and methods for evaluating model performance; one of these is the ROC curve. ROC curves (shoutout to Luke for recommending these!) are a graphical way to understand a binary classification model's tradeoff between true and false positives as its decision threshold varies. They can also be extended to multi-class classification models, but that won't be necessary here.
For a given instance of whatever is being classified (an image, in this case), a binary classification model assigns a "score" to that instance, generally a value between 0 and 1 (often the score is simply a probability). In this classification task, a score close to 1 means the model thinks the image IS NOT a cutout face, while a score close to 0 means the model thinks the image IS a cutout face. As the score approaches 0.5, the model becomes less confident in the classification.
Given a list of images, the model assigns a score to each image. Once these scores are determined, it is possible to alter the final classifications by changing the threshold, or cut-off, that a score must pass in order to receive a given classification. For instance, if an image in this task had a score of 0.6, it would be classified as "not a cutout face" if the threshold was 0.5. However, if the threshold were raised to 0.7, it would be classified as a cutout face instead. Altering the classification threshold changes the number of true positives and false positives for a given set of classifications. There is an inherent trade-off here: as the threshold moves, more instances shift from one class to the other. This is useful in classification problems where a false negative is very costly (medical diagnoses, for example), or vice versa.
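The thresholding idea above can be sketched in a few lines. This is a toy illustration, not the real pipeline: the scores are made-up placeholders, where in practice they would come from the trained classifier.

```python
# Toy sketch of threshold-based classification. Scores are hypothetical;
# in the real pipeline they would come from the trained FastAI model.

def classify(scores, threshold):
    """Label each score: 1 = "not a cutout face", 0 = "cutout face"."""
    return [1 if s >= threshold else 0 for s in scores]

scores = [0.10, 0.45, 0.60, 0.80, 0.95]

print(classify(scores, 0.5))  # [0, 0, 1, 1, 1]
print(classify(scores, 0.7))  # [0, 0, 0, 1, 1]
```

Note how the image with score 0.6 flips from "not a cutout face" to "cutout face" when the threshold moves from 0.5 to 0.7, exactly the trade-off described above.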
An ROC curve graphs the true positive rate vs. the false positive rate for a binary classification model as the threshold is swept across its range. By looking at the shape of the plotted line, it is possible to get a sense of how accurate a model is. The ideal model sits at the upper left of an ROC curve, meaning it has all true positives and no false positives, so it is easy to see how close the model is to random classification as opposed to being perfectly accurate. The ROC curve for the component classifier looks like this:
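For reference, here is a minimal sketch of computing the points of an ROC curve with scikit-learn (assuming it is available); the labels and scores below are illustrative placeholders, not the classifier's actual outputs.

```python
# Hedged sketch: compute ROC curve points and AUC with scikit-learn.
# y_true and y_score are made-up placeholder data for illustration.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])  # 1 = "not a cutout face"
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3])

# Each (fpr, tpr) pair corresponds to one candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
```

The area under the curve (AUC) summarizes the whole curve as a single number, with 1.0 being a perfect classifier and 0.5 being random guessing.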
I was pretty pleased with the ROC curve and the performance of the model in general, but there is still a significant problem with the data: class imbalance. In the context of a classification dataset, this means there are more instances of one class than of the other class(es). There are ~7000 total images (each representing one orthographic face of a component), but of these only 818 are cutout face images. On top of this, there is significant imbalance in the types of connectors used to generate component faces - this is primarily the result of a limited capability on my part to curate a well-rounded library of component CAD models (if anyone has any large libraries of component CAD models, let me know!). In any case, my dataset contains many pin header components but relatively few USB or Ethernet connectors, which makes classifying the faces generated from those connectors more difficult.
Looking at this paper, the authors attempted to improve model performance on an imbalanced dataset in a variety of ways, primarily by manipulating the number of instances of each class used to train the model. In summary, they found the greatest improvement came from simply oversampling the minority class. That was simple enough to try: all I needed to do was create enough copies of the cutout face images to roughly match the number of non-cutout face images, and run the classifier again. See the results in the figure below:
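Random oversampling is straightforward to sketch. The file names below are hypothetical stand-ins for the actual dataset, and the counts follow the ~7000/818 split mentioned earlier: the minority class is duplicated (sampled with replacement) until it matches the majority class.

```python
# Rough sketch of random oversampling over lists of image file paths.
# File names are hypothetical placeholders; counts mirror the dataset's
# rough split (818 cutout faces out of ~7000 total images).
import random

random.seed(42)

majority = [f"face_{i}.png" for i in range(6182)]    # non-cutout faces
minority = [f"cutout_{i}.png" for i in range(818)]   # cutout faces

# Pad the minority class with random duplicates until the classes match.
oversampled = minority + random.choices(minority, k=len(majority) - len(minority))

print(len(majority), len(oversampled))  # 6182 6182
```

In practice I did this at the file level by copying images, but the balancing logic is the same.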
The results were quite promising: an improvement from ~95% accuracy (per the last blog post) to 97%, with no tuning of the model or other improvements! I have not yet balanced the expanded dataset to create more copies of the connector types with few images; that will likely come in a later blog post. For now, 97% accuracy is acceptable.
The final problem that needed to be solved was how to actually use the FastAI model developed within a web application. FastAI makes it easy to export a model file using the built-in Learner.export() function, which contains all parts of the model - classes, transformations, weights, etc. in a single file.
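Under the hood, Learner.export() serializes the entire Learner to a single file that load_learner() can read back later. The stdlib sketch below mimics that single-file round trip with a hypothetical stand-in class (SimpleModel is invented for illustration, not part of FastAI).

```python
# Illustrative only: a stand-in "model" serialized to one file and loaded
# back, mimicking the single-file round trip of FastAI's Learner.export()
# and load_learner(). SimpleModel is a hypothetical placeholder class.
import os
import pickle
import tempfile

class SimpleModel:
    def __init__(self, classes, weights):
        self.classes = classes      # class names travel with the model
        self.weights = weights      # so do the learned parameters

    def predict(self, score):
        if score >= self.weights["threshold"]:
            return self.classes[1]
        return self.classes[0]

model = SimpleModel(["cutout_face", "not_cutout_face"], {"threshold": 0.5})

path = os.path.join(tempfile.mkdtemp(), "export.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)           # everything lands in one file

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict(0.8))  # not_cutout_face
```

The appeal of the single-file approach is that the web application only needs the exported file and the library, not the training code or dataset.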
Future work - Model Use in Production
If you have any recommendations on new features, please take this survey and let me know what you would like in a future release!
In the next post I'll go over the process of deploying this model into a demo web application. Some other improvements I may make to the model:
Classify images as belonging to specific component types, rather than simply "cutout face" or "not a cutout face".
Balance the dataset using a dynamic approach in which the least-represented connector types are copied more times.