Machine Learning Ops (MLOps) describes a suite of best practices that help a business run artificial intelligence successfully. It consists of the skills, workflows, and processes used to create, run, and maintain machine learning models that support various operational processes within organizations.
- Understanding Machine Learning Ops
- The four guiding principles of Machine Learning Ops
- Implementing MLOps into business operations
- MLOps and AIaaS
- The ML Process
- Machine learning ops examples
- Key takeaways
Understanding Machine Learning Ops
Machine Learning Ops is a relatively new concept because the commercial application of artificial intelligence (AI) is itself still emerging.
Indeed, AI rose to commercial prominence only after 2012, when researchers used a deep neural network to win an image-recognition contest.
Since then, artificial intelligence has been put to work in:
- Translating websites into different languages.
- Calculating credit risk for mortgage or loan applications.
- Re-routing customer service calls to the appropriate department.
- Assisting hospital staff in analyzing X-rays.
- Streamlining supermarket logistics and supply chain operations.
- Automating the generation of text for customer support, SEO, and copywriting.
As AI becomes more ubiquitous, so too must the machine learning that powers it. MLOps was created in response to a need for businesses to follow a developed machine learning framework.
Based on DevOps practices, MLOps seeks to address a fundamental disconnect between carefully crafted code and unpredictable real-world data. This disconnect can lead to issues such as slow or inconsistent deployment, low reproducibility, and a reduction in performance.
The four guiding principles of Machine Learning Ops
As noted, MLOps is not a single technical solution but a suite of best practices, or guiding principles.
The following looks at each principle, in no particular order:
- Machine learning should be reproducible. That is, teams must be able to audit, verify, and reproduce every production model. Version control for code is standard in software development, but in machine learning the data, parameters, and metadata must all be versioned as well. By storing model training artifacts, the model can also be reproduced if required.
- Machine learning should be collaborative. MLOps advocates that machine learning model production should be visible and collaborative. Everything from data extraction to model deployment should be approached by transforming tacit knowledge into code.
- Machine learning should be tested and monitored. Since machine learning is an engineering practice, testing and monitoring should not be neglected. Performance in the context of MLOps covers predictive performance as well as technical performance. Standards for model adherence must be set and expected behaviour made visible. The team should not rely on gut feelings.
- Machine learning should be continuous. It’s important to realize that a machine learning model is temporary: its lifecycle depends on the use case and on how dynamic the underlying data is. Because the performance of even a fully automated system may degrade over time, machine learning must be seen as a continuous process where retraining is made as easy as possible.
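As a rough illustration of the reproducibility principle, the sketch below fingerprints the training data, parameters, and metadata of a run so that any change to them produces a different run ID. The function names (`fingerprint`, `build_run_manifest`), the sample rows, and the metadata fields are hypothetical and chosen only for this example; a real team would more likely rely on dedicated data-versioning and experiment-tracking tools.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest for any JSON-serializable object."""
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_run_manifest(training_rows, params, metadata):
    """Collect what is needed to audit or re-create one training run."""
    manifest = {
        "data_version": fingerprint(training_rows),
        "params": params,
        "metadata": metadata,
    }
    # The run ID changes whenever data, parameters, or metadata change.
    manifest["run_id"] = fingerprint(manifest)[:12]
    return manifest


# Hypothetical inputs, for illustration only.
rows = [{"x": 1.0, "y": 2.1}, {"x": 2.0, "y": 3.9}]
params = {"learning_rate": 0.1, "epochs": 20}
metadata = {"code_commit": "abc1234", "owner": "pricing-team"}

manifest = build_run_manifest(rows, params, metadata)
print(json.dumps(manifest, indent=2))
```

Storing a manifest like this next to each trained model is one simple way to tie a production model back to the exact inputs that produced it.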
Implementing MLOps into business operations
In a very broad sense, businesses can implement MLOps by following a few steps:
Step 1 – Recognise stakeholders
MLOps projects are often large, complex, multi-disciplinary initiatives that necessitate the contributions of different stakeholders. These include obvious stakeholders such as machine learning engineers, data scientists, and DevOps engineers. However, these projects will also require collaboration and cooperation from IT, management, and data engineers.
Step 2 – Invest in infrastructure
There is a raft of infrastructure products on the market, and not all are created equal.
In deciding which product to adopt, a business should consider:
- Reproducibility – the product must make data science knowledge retention easier. In practice, ease of reproducibility is governed by data version control and experiment tracking.
- Efficiency – does the product result in time or cost savings? For example, can machine learning remove manual work to increase pipeline capability?
- Integrability – will the product integrate nicely with existing processes or systems?
Step 3 – Automation
Before moving into production, machine learning projects must be split into smaller, more manageable components. These components must be related but able to be developed separately.
The process of separating a problem into various components forces the product team to follow a joint process. This encourages the formation of a well-defined language between engineers and data scientists, who work collaboratively to create a product capable of updating itself automatically. This ability is akin to the DevOps practice of continuous integration (CI).
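One small component of a product that updates itself automatically could be a check that decides when retraining is needed. The sketch below is a minimal example under assumed conditions: the helper names, the 20% tolerance, and the sample numbers are all hypothetical, and mean absolute error stands in for whatever metric a team actually monitors.

```python
from statistics import mean


def mean_absolute_error(actuals, predictions):
    """Average absolute gap between observed values and predictions."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def should_retrain(baseline_error, recent_error, tolerance=0.2):
    """True when recent error exceeds the baseline by more than `tolerance` (relative)."""
    return recent_error > baseline_error * (1 + tolerance)


# Hypothetical numbers: error measured at deployment time vs. on fresh data.
baseline = mean_absolute_error([10, 12, 14], [11, 12, 13])
recent = mean_absolute_error([10, 12, 14], [13, 15, 10])

if should_retrain(baseline, recent):
    print("Performance degraded: trigger the retraining pipeline")
else:
    print("Model still within tolerance")
```

Because the check is a separate, small function, it can be developed and tested independently of the training code and then wired into a scheduler or CI job.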
MLOps and AIaaS
MLOps consists of various phases built on top of an AI platform, where models need to be prepared (via data labeling, BigQuery datasets, Cloud Storage), built, validated, and deployed.
MLOps is a vast world made up of many moving parts.
Indeed, before the ML code can be operated, as highlighted on Google Cloud, a lot of effort is spent on “configuration, automation, data collection, data verification, testing and debugging, resource management, model analysis, process and metadata management, serving infrastructure, and monitoring.”
The ML Process
ML models move through several steps. A typical sequence is: Data extraction > Data analysis > Data preparation > Model training > Model evaluation > Model validation > Model serving > Model monitoring.
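To make the sequence concrete, here is a toy sketch in which each step is a small function that reads from and writes to a shared context dictionary. Everything in it is an assumption made for illustration: the data, the one-parameter model, the 1.0 error threshold, and the function names. A production pipeline would normally run on an orchestration tool rather than a plain loop.

```python
from statistics import mean


def extract(ctx):
    # Stand-in for reading from a database or file: (feature, target) pairs.
    ctx["raw"] = [(1, 2.0), (2, 4.1), (3, None), (4, 8.2), (5, 9.9)]
    return ctx


def analyze(ctx):
    # Data analysis: count rows with a missing target.
    ctx["missing_targets"] = sum(1 for _, y in ctx["raw"] if y is None)
    return ctx


def prepare(ctx):
    # Data preparation: drop incomplete rows, keep the last row as a holdout.
    clean = [(x, y) for x, y in ctx["raw"] if y is not None]
    ctx["train"], ctx["holdout"] = clean[:-1], clean[-1:]
    return ctx


def train(ctx):
    # Model training: least-squares slope through the origin (y ~ slope * x).
    num = sum(x * y for x, y in ctx["train"])
    den = sum(x * x for x, _ in ctx["train"])
    ctx["slope"] = num / den
    return ctx


def evaluate(ctx):
    # Model evaluation: mean absolute error on the holdout rows.
    ctx["mae"] = mean(abs(y - ctx["slope"] * x) for x, y in ctx["holdout"])
    return ctx


def validate(ctx):
    # Model validation: approve only if the error is under an assumed threshold.
    ctx["approved"] = ctx["mae"] < 1.0
    return ctx


def serve(ctx):
    # Model serving: expose a prediction function only for approved models.
    if ctx["approved"]:
        ctx["predict"] = lambda x: ctx["slope"] * x
    return ctx


def monitor(ctx):
    # Model monitoring: record a summary line that could feed a dashboard.
    ctx["log"] = f"mae={ctx['mae']:.3f} approved={ctx['approved']}"
    return ctx


PIPELINE = [extract, analyze, prepare, train, evaluate, validate, serve, monitor]


def run_pipeline(steps):
    ctx = {}
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_pipeline(PIPELINE)
print(result["log"])
```

Keeping each step as its own function mirrors the order listed above and lets a team swap or retest one stage without touching the others.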
Machine learning ops examples
Below are a couple of examples of how machine learning ops is being applied at companies such as Uber and Booking.com.
Michelangelo is the name given to Uber’s machine learning platform, which standardizes the workflow across teams and improves coordination.
Before Michelangelo was developed, Uber faced difficulties implementing machine learning models because of the vast size of the company and its operations.
While data scientists were developing predictive models, engineers were also creating bespoke, one-off systems that used these models in production.
Ultimately, the impact of machine learning at Uber was limited to whatever scientists and engineers could build in a short timeframe with predominantly open-source tools.
Michelangelo was conceived to provide a system where reliable, uniform and reproducible pipelines could be built for the creation and management of prediction and training data at scale.
Today, the MLOps platform standardizes workflows and processes via an end-to-end system where users can easily build and operate ML systems.
While Michelangelo manages dozens of models across the company for countless use cases, its application to UberEATS is worth a quick mention.
Here, machine learning was incorporated into meal delivery time predictions, restaurant rankings, search rankings, and search autocomplete.
Calculating meal delivery time is particularly complex and involves many moving parts. Michelangelo uses gradient boosted decision tree regression models to make end-to-end delivery estimates based on multiple current and historical metrics.
Booking.com is the largest online travel agency website in the world, with users able to search millions of different accommodation options.
Like Uber, Booking.com needed a complex machine learning solution that could be deployed at scale.
To understand the company’s predicament, consider a user searching for accommodation in Paris.
At the time of writing, there are over 4,700 establishments – but it would be unrealistic to expect the user to look at all of them.
So how does Booking.com know which options to show?
At a somewhat basic level, machine learning algorithms list hotels based on inputs such as location, review rating, price, and amenities.
The algorithms also consider available data about the user, such as their propensity to book certain types of accommodation and whether or not the trip is for business or pleasure.
More complex machine learning is used to avoid the platform serving up results that consist of similar hotels.
It would be unwise for Booking.com to list ten 3-star Parisian hotels at the same price point on the first page of the results.
To counter this, machine learning incorporates aspects of behavioral economics such as the exploration-exploitation trade-off.
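The exploration-exploitation trade-off can be illustrated with an epsilon-greedy strategy: most of the time the best-performing option so far is shown (exploitation), and occasionally a random option is shown to keep learning about the others (exploration). This is a generic textbook sketch, not a description of Booking.com's actual algorithm; the click-through rates, the 10% exploration rate, and the function names are all hypothetical.

```python
import random


def epsilon_greedy(clicks, impressions, epsilon, rng):
    """Pick an item index: explore at random with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(clicks))
    # Observed click rate per item; items never shown count as 0.0.
    rates = [c / n if n else 0.0 for c, n in zip(clicks, impressions)]
    return max(range(len(rates)), key=rates.__getitem__)


def simulate(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate showing one item per round and recording whether it was clicked."""
    rng = random.Random(seed)
    clicks = [0] * len(true_rates)
    impressions = [0] * len(true_rates)
    for _ in range(rounds):
        i = epsilon_greedy(clicks, impressions, epsilon, rng)
        impressions[i] += 1
        if rng.random() < true_rates[i]:
            clicks[i] += 1
    return clicks, impressions


# Hypothetical click-through rates for three listings.
clicks, impressions = simulate([0.05, 0.12, 0.08])
print("impressions per listing:", impressions)
print("clicks per listing:", clicks)
```

A higher `epsilon` gathers information about less-shown listings faster, at the cost of showing weaker options more often.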
The algorithm will also collect data on the user as they search for a place to stay.
Perhaps they spend more time looking at family-friendly hotels with a swimming pool, or maybe they are showing a preference for a bed and breakfast near the Eiffel Tower.
An important but sometimes overlooked group on the Booking.com website is the accommodation owners and hosts.
This user group has its own set of interests that sometimes conflict with those of holidaymakers and the company itself.
In the case of the latter, machine learning will play an increasingly important role in Booking.com’s relationship with its vendors and by extension, its long-term viability.
Booking.com today is the culmination of 150 successful customer-centric machine learning applications developed by dozens of teams across the company.
These were exposed to hundreds of millions of users and validated via randomized controlled trials.
The company concluded that the iterative, hypothesis-driven process that looked to other disciplines for inspiration was key to the initiative’s success.
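A trial of the kind described above typically compares a metric such as conversion rate between a control group and a group that sees the new model. The sketch below uses a standard two-proportion z-test as one possible way to judge such a result; the source does not say which statistical method Booking.com uses, and the visitor and booking counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical trial: group A sees the current ranking, group B the new model.
z, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be due to chance alone, which is what allows a hypothesis about a new model to be accepted or rejected.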
Key takeaways
- Machine Learning Ops encompasses a set of best practices that help organizations successfully incorporate artificial intelligence.
- Machine Learning Ops seeks to address a disconnect between carefully written code and unpredictable real-world data. In so doing, MLOps can improve the efficiency of machine learning release cycles.
- Machine Learning Ops implementation can be complex and as a result, relies on input from many different stakeholders. Investing in the right infrastructure and focusing on automation are also crucial.