How to Perform Better on Machine Learning Projects
Comedian Chris Rock comes up with novel ideas in small steps. Before his hugely successful shows, he spends as many evenings as possible in small comedy clubs, trying out new jokes. Of course, most of them fall flat, and the audience laughs at him rather than with him. But among the hundred failed ideas, there will inevitably be a handful of fireworks. This is how Chris Rock prepares his new shows: up to a year of daily grind for one hour of (superb) entertainment. Because: after the show is before the show.
We could learn a lot from Rock’s creation process: discipline, humility, and exposure to a critical, even hostile audience. However, I would like to single out three questions that we can use after concluding a (machine learning) project: What have I learned?, What should I avoid the next time?, and What should I do again the next time? Answering these three questions allows us to perform better on the next project. And, because even minor improvements compound, habitually going through them will give us significant momentum. So, on to the first question.
What have I learned?
When Chris Rock tests his new jokes, he is often seen with a notepad, scribbling down the audience’s reactions. This careful analysis lets him see what he has to revise. Similarly, we ask What have I learned? first because the answers form the basis for the next steps. We are looking for an honest evaluation of all our experiences and actions during the project. To make this more concrete, let me refer to an example from a previous post. After implementing an augmentation routine on top of a TFRecord dataset, my model’s performance improved by a large margin, as measured by the loss and accuracy scores. Content with this boost, I gave no further thought to the input pipeline and focused my attention on other features. Only later did I notice that the augmentation steps had introduced a severe bottleneck, increasing the per-epoch time from one minute to a staggering 15 minutes.
Let’s analyze this incident together and see what we can learn from it:
- A sole focus on the model’s predictive power obscured the severe bottleneck. Thus, the first lesson learned is that we should monitor not only primary performance indicators (loss, accuracy, etc.) but also secondary metrics (pipeline throughput, hardware utilization, etc.).
- The augmentation package that I used clearly stated that all operations are performed on the CPU. I remember having read this, but I did not consider its impact. Thus, the second lesson learned is that we should not mindlessly add cool stuff.
- We should also focus on what we have learned while implementing the augmentation routine. In this case, it was reading data from the TFRecord format and passing it through a custom preprocessing pipeline. Thus, the third lesson learned is how to read data from this specific format and preprocess it.
From this single incident alone, we can learn at least three things: check secondary metrics, be cautious when adding new features, and know how to read and preprocess TFRecord data; the sketch below illustrates the pipeline side. By repeating such a meticulous analysis for all our experiences, we will find numerous helpful pointers for the next project.
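To make the pipeline lessons tangible, here is a minimal sketch in TensorFlow. The feature keys, the file name train.tfrecord, and the chosen augmentations are illustrative assumptions rather than the original project’s code, and the EpochTimer callback shows one simple way to track a secondary metric:

```python
import time
import tensorflow as tf

# Hypothetical feature spec; the actual keys depend on how the TFRecords were written.
FEATURE_SPEC = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    # Decode one serialized tf.train.Example into an image tensor and a label.
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    return image, parsed["label"]

def augment(image, label):
    # Lightweight TF-native ops; heavy CPU-only augmentations are exactly
    # what caused the bottleneck described above.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

dataset = (
    tf.data.TFRecordDataset("train.tfrecord")  # placeholder file name
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with training
)

class EpochTimer(tf.keras.callbacks.Callback):
    """Secondary-metric monitor: logs wall-clock time per epoch."""

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f"Epoch {epoch}: {time.time() - self._start:.1f}s")
```

Parallel mapping and prefetching let the CPU-bound preprocessing overlap with training, which might well have softened the jump from one-minute to 15-minute epochs; the callback makes such a regression visible immediately.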
Then, after having compiled a (lengthy) list, we can answer the second question.
What should I avoid the next time?
After documenting our missteps and correct decisions in the previous step, we divide the list into two categories. The first category, Avoid, holds all incidents and lessons learned that we should avoid the next time.
Drawing from the example, we can already put two items into this category. The first item is that we should not blindly focus on a single metric’s value. The second item is that we should not add features just because we deem them cool.
To clarify this category, let me give another example from a previous post. For neural networks, we can optimize countless parameters; the learning rate is one of them. Trying to be clever, I thought letting the learning rate vary throughout training would be brilliant. After searching the relevant research, I found Leslie Smith’s paper on cyclical learning rates. Excited about this cool feature, I jumped right into implementing it for my training. However, after several days of coding and tuning parameters, I realized that I did not actually need this feature for the project. Moreover, letting the learning rate vary introduced additional hyperparameters and increased the project’s complexity. In the end, I dropped the scheduling strategies altogether.
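For reference, the triangular policy from Smith’s paper fits in a few lines. This is a minimal sketch; the default values for base_lr, max_lr, and step_size are arbitrary examples, not recommendations:

```python
import numpy as np

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    # Triangular cyclical learning rate (Smith, 2017): the rate ramps linearly
    # from base_lr to max_lr over step_size iterations, then back down, and repeats.
    cycle = np.floor(1 + iteration / (2 * step_size))
    x = np.abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

Note that every argument is one more hyperparameter to tune, which is exactly the complexity cost that made me drop the feature.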
Analyzing this incident, we rediscover a previous lesson: do not mindlessly add cool stuff. This is natural; many incidents teach similar lessons. Beyond that, we can also find a second, related hint: do not get lost in details at the wrong time and place. In the example, I meticulously tweaked the learning rate and refactored much of the underlying code. Since all that labor was for naught, the invested time was spent in the wrong place (a secondary feature) and, in the end, lost. This lesson thus fits the Avoid category.
Throughout a project, we might take several actions that do not benefit the overall progress. By collecting them all in the Avoid category, however, we build a powerful resource for our future careers. Even better, in the next step, we compile a list of recommended actions that serve as guidelines.
What should I do again the next time?
For comedian Chris Rock, trialing his jokes in front of a critical audience has the additional benefit of telling him which ones to repeat. He might have considered some of his sketches absolute fireworks, only to see them pass by unnoticed. But some of his seemingly second-tier jokes crack the audience up; it is these jokes that he can then polish further.
In our case, a careful analysis of the project allows us to find similar insights. Thus, the Avoid category filled by answering the previous question has a sibling, the Repeat category. This category holds all the experiences and actions that contributed positively to the project’s progress or that can be used as blueprints for future projects. From the two examples given, we can also derive positive instructions.
One is that, after having implemented augmentation techniques on top of a TFRecord pipeline, we can reuse our knowledge and build a similar setup the next time. In other words, we might write down: Use the TFRecord format for storing data and build a pipeline on top (the writing side is sketched below). The second instruction is to focus our attention, energy, and time, which are exceptionally critical resources, on the core features, as the learning rate example taught us. Only after having constructed a solid base should we advance to the fine details.
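To complement the reading sketch above, here is the writing side of such a setup. The serialize_example helper and the dummy samples are hypothetical, and the feature keys are assumed to match the earlier sketch:

```python
import tensorflow as tf

def serialize_example(image_bytes, label):
    # Pack one (image, label) pair into a serialized tf.train.Example.
    features = tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    })
    return tf.train.Example(features=features).SerializeToString()

# Two dummy samples; in practice, iterate over the real dataset.
samples = [
    (tf.io.encode_jpeg(tf.zeros([8, 8, 3], tf.uint8)).numpy(), 0),
    (tf.io.encode_jpeg(tf.fill([8, 8, 3], tf.constant(255, tf.uint8))).numpy(), 1),
]

with tf.io.TFRecordWriter("train.tfrecord") as writer:  # placeholder file name
    for image_bytes, label in samples:
        writer.write(serialize_example(image_bytes, label))
```

Storing encoded image bytes next to an integer label keeps the file compatible with the parse_example function from the first sketch.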
As before, the Repeat category gives us a handy list of ideas that we can follow to improve our performance on a project.
TL;DR
After a project, analyze it and gather all lessons learned. Divide them into those to avoid next time and those to repeat.
As a concluding note, I want to emphasize that we can apply this question-and-answer scheme already during the project, not only after it has finished. By doing so, we optimize our performance on the go and turn mistakes into actionable instructions instead of repeating them.