The anti-pattern of ad-hoc training
In my last article, I wrote about how the use of Jupyter notebooks harms MLOps. Because everyone seems to adore notebooks, I expected most people to strongly disagree. In reality, my thoughts resonated with a lot of people. Because the topic is important, I decided to elaborate on it and, in this post, talk about the ad-hoc training phenomenon in general.
Before we start, let’s get the terminology straight. What is ad-hoc training anyway?
A classic example of ad-hoc training is when you experiment with your model without explicitly tracking code and data. It’s not limited to notebooks; plain Python scripts are used just as often.
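To make it concrete, here is a minimal sketch of what an ad-hoc training run often looks like in practice. The file names and hyperparameters are made up for illustration; the point is that nothing records which code and which data produced the saved model:

```python
# train.py – a hypothetical ad-hoc training script: paths and hyperparameters
# get edited by hand between runs, and nothing records which code or data
# produced which model.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/train_v3_final_FIXED.csv")  # which version was this, again?
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=300)    # was 100 in the previous run... probably
model.fit(X_train, y_train)
print("val accuracy:", model.score(X_val, y_val))

joblib.dump(model, "model.pkl")                     # silently overwrites yesterday's model
```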
The major downside of ad-hoc training is that it makes iterating on such models difficult. It’s hard to see the full difference between experiments, or to go back to the baseline when needed. Not to mention the “bus factor” (e.g. if the author of the trained model leaves the team). Despite these downsides, ad-hoc training is surprisingly popular, and there is more than one reason why.
The first reason is, of course, tooling. There is no standard way to track data. Also, committing changes to Git before every run is often overkill.
The second reason is that experimentation is typically preceded by exploration. Before we can come up with a specific idea of how to prepare the data or iterate on a trained model, we want to explore the data and play with the model. Exploration always requires a short feedback loop, which is why people reach for notebooks and SSH sessions in the first place.
As opposed to ad-hoc training, continuous training is when you explicitly track both code and data for every experiment. While this approach guarantees reproducibility, it’s much harder to follow, for the very same reasons the ad-hoc approach predominates: a lack of tooling and “interactivity addiction”.
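For illustration, here is a minimal sketch of what “tracking code and data for every experiment” can mean, using nothing but the Python standard library: record the current Git commit and a content hash of the training data next to the metrics of every run. The file names and the JSON-lines layout are my own assumptions, not a prescription for any particular tool:

```python
# track_run.py – a minimal sketch of per-experiment tracking: pin the exact
# code (Git commit) and data (content hash) that produced a set of metrics.
import hashlib
import json
import subprocess
import time
from pathlib import Path

def data_hash(path: str) -> str:
    """Content hash of the training data, so the exact dataset can be verified later."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def git_commit() -> str:
    """Commit the code was run from (assumes a clean working tree)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def log_run(data_path: str, params: dict, metrics: dict) -> None:
    """Append one run record to a local runs.jsonl file."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "commit": git_commit(),
        "data_sha256": data_hash(data_path),
        "params": params,
        "metrics": metrics,
    }
    with Path("runs.jsonl").open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage after a training run:
# log_run("data/train.csv", {"n_estimators": 300}, {"val_accuracy": 0.91})
```

Real experiment trackers and data versioning tools do this (and much more) for you; the point is simply that every metric becomes traceable to an exact state of the code and the data.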
So why is ad-hoc training an anti-pattern? The problem is that the moment you’re about to deploy your model to production, reproducibility stops being a choice and becomes a hard requirement. The bad news is that if you’re not involved in deploying your model, you tend not to see reproducibility as your problem.
Because ad-hoc training is easier, and because reproducibility doesn’t affect you directly while you train your model, you’re much less incentivized to care about it until the very last moment. This is what makes it a perfect anti-pattern: the approach that seems the easiest and the most promising eventually turns into a trap.
Is there a way out? I would argue that it lies in building better tools to track both code and data, and, of course, in avoiding anti-patterns early on, regardless of how attractive they look in the beginning.
Did you like the article? Subscribe to MLOps Fluff, and I promise to post more about developer tools, AI, and particularly how to apply it to MLOps.