From Creator to Caretaker with MLOps - Part 1

The recent meteoric rise of generative AI models doesn’t leave much room for argument: the AI winters are behind us, and machine learning models will continue to permeate every aspect of our lives. And AE is here for it!

At the same time, the speed at which these models are evolving can be daunting, and it’s hard not to get flustered by the pace of the changes we are confronted with. If we want a chance of keeping the AI rocket on course, we need rocket fuel. That's where MLOps comes in, a cutting-edge practice that combines the best of Machine Learning with the best of DevOps.

Now, you might be wondering, what exactly are MLOps and DevOps? No worries, let's ease into it and break it down.

Setting the Scene

Imagine yourself as a mad (data) scientist, content to craft your model creations with the passionate precision of a veritable Victor Frankenstein. Some model architecture tinkering here, some hyperparameter tuning there, et voilà, a new specimen is revealed. And every once in a while, your creation passes a final test (set): it lives! A proper, viable model, neat!

DALL-E 2’s version of ‘A mad scientist in a tower, surrounded by computers, in orange watercolor’
(you know we had to).

Imagine that, as news of your outlandish endeavors travels, you begin to receive some attention from industrialists. These creations of yours... wouldn't it just be swell to implement them in, say, a factory? Your creations, tireless and robust, would be ideal assistants for their workforce, taking productivity and efficiency to a whole new level. What a dream! With all these advantages in mind, these industry folk would be quite willing to spend a few quid on such an acquisition.

Now, as a mad data scientist, you probably aren't all that used to thinking about these kinds of practicalities. Especially if you're of the academic, Ivory Tower variety. No, just giving birth to all these model creations was a goal in itself. But the new challenge does sound intriguing! So you begin thinking about how to navigate it. And the more you ponder, the more potential problems you encounter.

  • The industry will want your best creations. Unfortunately, being a scientist, you often lose yourself in experimentation. You tend to lose track of what you tried before, and more importantly, what worked. If you want any hope of delivering something performant, you had better start taking notes. Imagine concocting the perfect model, but forgetting how you got to that point. Nope, you definitely don't want to be that mad data scientist.

  • As they will now be expected to maintain their performance (and not just perform well on that one little self-gratifying test moment), the model creations will need food. You can't expect them to plod on indefinitely with no data calories to burn. And you can't just feed them anything either; you’re dealing with picky eaters. A challenge in its own right!

  • If these industry folk want help with their heavy lifting, they probably want more than one measly model. You will need to scale up your creation process, and quickly! Time to think of ways to make everything run a little more smoothly. You do have some ideas about that already though, something with slides? Pipelines? Curly straws? You'll figure it out.

  • What if (Andrew Ng help us) one of your creations goes rogue? What if its behavior suddenly becomes erratic? Maybe because of bad food, maybe because the job description changed; there are so many ways to drift! But in the end, you're responsible, so you should be able to spot when this happens (preferably before things get ugly). You certainly don't want to ruin this good deal with your new patrons.

Aah, metaphors: the poor man's literary device. Nevertheless, you caught most of the principles and ideas that were being referred to, right? No? Sigh, let's go again.

Productionizing Machine Learning Models

Surely you wouldn’t be the first data scientist who, up to a recent point, did not have to worry about putting their models into production. For many data scientists and ML researchers, finding a new model or set of weights is the end goal. Perhaps the model can then be handed off to others (e.g., ML or software engineers), or perhaps it will be documented in a scientific publication. Either way, the job is done. Once these data scientists transition into an industrial setting, however, the game changes (or rather, the game should change), as a variety of obstacles 'in the wild' can weigh the model down in short order.

Challenges for production

So what could such obstacles be, exactly? A quick Google search turns up some historical failures, and what we can learn from them. Here are some interesting ones.

  • Remember Covid-19? Apart from turning our lives upside down, the pandemic and accompanying lockdowns also managed to dramatically disrupt the performance of machine learning algorithms. One example: as people were stuck at home, their spending behavior changed (1000 rolls of toilet paper, anyone?), which tripped up fraud detection models. This concept drift, that is, the change in the relationship between the data (purchases) and the outcome (fraud check), effectively rendered the models largely impotent, calling for immediate retraining using fresh, representative data.

  • ChatGPT wasn't the first language model to hit the mainstream internet. Does the name Tay ring a bell? This AI-driven chatbot, which was released in the Twitter ecosystem under the name TayTweets in 2016, survived the real world for about 16 hours. That's how long it took for Tay to turn into a racist, hate-spewing, sexually inappropriate creature. The reason? Since the bot was designed to mimic (or learn from) the behavior of other Twitter users, many of them saw it as a challenge to corrupt the bot with deliberately offensive prompts, and they succeeded in record time. Moreover, the design team did not account for such adversarial attacks and therefore did not include any mechanisms to cope with them.

Poor Tay, you were taken too soon.

  • Speaking of bad publicity: a computer vision algorithm designed to verify passport photo requirements was found to systematically reject pictures of Asian people for having 'closed eyes'. Not a good look for the model developers, and definitely something you would want to catch early on. Diversity is important in training and testing, as it helps avoid this kind of bias.
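To make the first example a little more tangible, here is a minimal sketch of how a shift in incoming data could be flagged. It uses the Population Stability Index (PSI), a common drift score that we are swapping in for illustration; it is not what the affected fraud teams used, as far as we know. Note that PSI only looks at the distribution of one input feature, which often accompanies concept drift but is not the same thing. The function name, the toy purchase amounts, and the bin count are all assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Compare two samples of one feature; larger values mean a larger shift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Hypothetical purchase amounts: spread evenly before, skewed upward after.
reference_amounts = [float(i % 100) for i in range(1000)]
current_amounts = [float(50 + i % 50) for i in range(1000)]

psi = population_stability_index(reference_amounts, current_amounts)
if psi > 0.25:  # a widely quoted rule of thumb, not a hard law
    print(f"PSI = {psi:.2f}: the feature has shifted, time to investigate")
```

A score like this would typically be computed per feature on a schedule, with the threshold tuned to how noisy the data normally is.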

Looking for answers

What do these scenarios have in common? They illustrate how Machine Learning models can fail, and why continuous model monitoring is important. Keeping a faulty model in production can harm a business in many ways. Once a problem has been detected, determining its root cause remains challenging. It requires a thorough understanding of what goes into the creation and deployment of a model. In addition, it implies that we should have version control over all its components, so we can spot where the problem was introduced, and which steps we can take to mitigate it.

Once identified, the issue should then be resolved swiftly. Bugs should be taken care of, faulty data should be replaced, infrastructure issues should be fixed, and all of this in a hurry. This implies the existence of a quasi-automated flow from data ingestion through model training to model deployment, which in itself must be made safe and robust using CI/CD principles.
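As a rough illustration of such a flow, the sketch below chains a data check, a training step, and a quality gate that only promotes a new model if it does at least as well as the one currently in production. Everything in it is hypothetical: the toy threshold "model", the function names, and the accuracy-based gate are assumptions made for the example, not a prescribed setup.

```python
from dataclasses import dataclass
from typing import Optional

Row = tuple[float, int]  # (feature value, label), a toy stand-in for real data


@dataclass
class Model:
    threshold: float

    def predict(self, x: float) -> int:
        return int(x >= self.threshold)


def validate(rows: list[Row]) -> list[Row]:
    """Data check: reject the batch rather than train or test on bad input."""
    if {label for _, label in rows} != {0, 1}:
        raise ValueError("batch must contain both labels, and only 0 and 1")
    return rows


def train(rows: list[Row]) -> Model:
    """Toy training step: put the threshold midway between the class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return Model(threshold=(sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def accuracy(model: Model, rows: list[Row]) -> float:
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


def run_pipeline(
    train_rows: list[Row], test_rows: list[Row], production: Optional[Model]
) -> Model:
    """Ingest -> validate -> train -> evaluate -> gate; returns the model to serve."""
    candidate = train(validate(train_rows))
    held_out = validate(test_rows)
    if production is not None and (
        accuracy(candidate, held_out) < accuracy(production, held_out)
    ):
        return production  # gate failed: keep serving the current model
    return candidate  # gate passed: promote the candidate


fresh_rows = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
test_rows = [(3.0, 0), (7.0, 1)]
stale_model = Model(threshold=10.0)  # predicts 0 for everything in this range

served = run_pipeline(fresh_rows, test_rows, production=stale_model)
print(served)
```

In a real setup each step would be its own versioned, tested pipeline stage, and the gate would look at more than a single accuracy number, but the shape of the flow stays the same.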

Clearly, managing Machine Learning models in production can be a complex task that involves many considerations. Ensuring that the models are accurate, reliable, and up-to-date requires careful planning and execution. Fortunately, ML engineering can learn from the well-established principles of software engineering to address these challenges.

Many of these principles fall under the DevOps umbrella, which is a set of practices that seeks to unify software development and operations teams to build, test, and deploy software more efficiently and effectively. In the context of ML engineering, DevOps can be applied to the entire Machine Learning lifecycle, from data collection and preparation to model training and deployment. The resulting philosophy, fine-tuned to the world of Machine Learning, was christened MLOps, short for 'Machine Learning Operations'.

One of the goals of DevOps is to promote collaboration and communication between different teams and stakeholders. An MLOps team typically includes at least:

  • 👷‍♂️ a data engineer, who will create and maintain the data pipelines, ensuring that the data arrives at its destination properly formatted and cleaned,

  • 🧑🏻‍🔬 a data scientist/ML engineer, responsible for the selection, training, optimization, and shipping of the ML models,

  • 👩🏽‍💻 and a software/DevOps engineer, responsible for managing the deployment and operation of the Machine Learning models in a production environment.

Depending on the size of the project, responsibilities may be distributed in other, more granular ways (e.g., by adding an infrastructure engineer, business analyst, or panda-themed group mascot). Still, regardless of how the load is shared, MLOps strives to break down silos. It encourages cross-domain collaboration, with faster and more reliable deployments of ML models as a result.

Up to this point, we’ve been painting our MLOps picture in broad strokes. But how do these concepts translate into practice? In the next installments of this series, we will delve deeper into some of the core principles of MLOps:

  • Reproducibility

  • Automation

  • CI/CD pipelines

  • Monitoring & continuous training

Finally, we will take a look at how AE tackles these subjects, and go over our tools and frameworks of choice. So strap in!

DALL-E 2’s take on ‘a scared, screaming panda tied to a rocket, in orange watercolor.’
Who said beating a dead horse couldn’t be fun?