
MLOps: Machine Learning Ops And Why It Matters In Business

Machine Learning Ops (MLOps) describes a suite of best practices that help a business run artificial intelligence successfully. It consists of the skills, workflows, and processes needed to create, run, and maintain machine learning models that support various operational processes within organizations.

Understanding Machine Learning Ops

Machine Learning Ops is a relatively new concept because the commercial application of artificial intelligence (AI) is also an emerging process.

Indeed, AI burst onto the scene less than a decade ago after a researcher employed it to win an image-recognition contest.

Since that time, artificial intelligence has been put to work in:

  • Translating websites into different languages.
  • Calculating credit risk for mortgage or loan applications.
  • Re-routing customer service calls to the appropriate department.
  • Assisting hospital staff in analyzing X-rays.
  • Streamlining supermarket logistics and supply chain operations.
  • Automating the generation of text for customer support, SEO, and copywriting.

As AI becomes more ubiquitous, so too must the machine learning that powers it. MLOps was created in response to a need for businesses to follow a developed machine learning framework. 

Based on DevOps practices, MLOps seeks to address a fundamental disconnect between carefully crafted code and unpredictable real-world data. This disconnect can lead to issues such as slow or inconsistent deployment, low reproducibility, and a reduction in performance.

The four guiding principles of Machine Learning Ops

As noted, MLOps is not a single technical solution but a suite of best practices, or guiding principles.

Following is a look at each in no particular order:

  1. Machine learning should be reproducible. That is, teams must be able to audit, verify, and reproduce every production model. Version control for code is standard in software development, but in machine learning, data, parameters, and metadata must all be versioned as well. By storing model training artifacts, the model can also be reproduced if required.
  2. Machine learning should be collaborative. MLOps advocates that machine learning model production is visible and collaborative. Everything from data extraction to model deployment should be approached by transforming tacit knowledge into code.
  3. Machine learning should be tested and monitored. Since machine learning is an engineering practice, testing and monitoring should not be neglected. Performance in the context of MLOps incorporates predictive importance as well as technical performance. Model adherence standards must be set and expected behaviour made visible. The team should not rely on gut feelings.
  4. Machine learning should be continuous. It’s important to realize that a machine learning model is temporary, with a lifecycle that depends on the use case and how dynamic the underlying data is. While the performance of a fully automated system may diminish over time, machine learning must be seen as a continuous process where retraining is made as easy as possible.
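To make the reproducibility principle concrete, here is a minimal Python sketch of a training-run manifest that fingerprints code version, data, and parameters together. The function names (`fingerprint`, `build_manifest`), the field names, and the sample values are all hypothetical and chosen for illustration; real teams typically rely on dedicated data versioning and experiment tracking tools rather than a hand-rolled hash.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    # sort_keys makes the digest independent of dictionary ordering.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_manifest(code_version: str, data_rows: list, params: dict) -> dict:
    """Record what is needed to audit or reproduce one training run."""
    manifest = {
        "code_version": code_version,         # e.g. a version control commit ID
        "data_hash": fingerprint(data_rows),  # version of the training data
        "params": params,                     # hyperparameters used for the run
    }
    # The run ID changes whenever code, data, or parameters change.
    manifest["run_id"] = fingerprint(manifest)[:12]
    return manifest


# Hypothetical example values.
rows = [[1.0, 2.0, 0], [3.0, 4.0, 1]]
run_a = build_manifest("abc123", rows, {"learning_rate": 0.1, "max_depth": 3})
run_b = build_manifest("abc123", rows, {"max_depth": 3, "learning_rate": 0.1})
run_c = build_manifest("abc123", rows + [[5.0, 6.0, 1]],
                       {"learning_rate": 0.1, "max_depth": 3})

print(run_a["run_id"] == run_b["run_id"])  # same inputs, same run ID
print(run_a["run_id"] == run_c["run_id"])  # changed data, different run ID
```

Storing a manifest like this next to each model artifact is one simple way to tell whether two production models were trained from the same code, data, and parameters.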

Implementing MLOps into business operations

In a very broad sense, businesses can implement MLOps by following a few steps:

Step 1 – Recognise stakeholders

MLOps projects are often large, complex, multi-disciplinary initiatives that necessitate the contributions of different stakeholders. These include obvious stakeholders such as machine learning engineers, data scientists, and DevOps engineers. However, these projects will also require collaboration and cooperation from IT, management, and data engineers.

Step 2 – Invest in infrastructure

There is a raft of infrastructure products on the market, and not all are created equal.

In deciding which product to adopt, a business should consider:

  • Reproducibility – the product must make data science knowledge retention easier. Indeed, ease of reproducibility is governed by data version control and experiment tracking.
  • Efficiency – does the product result in time or cost savings? For example, can machine learning remove manual work to increase pipeline capability?
  • Integrability – will the product integrate nicely with existing processes or systems?

Step 3 – Automation

Before moving into production, machine learning projects must be split into smaller, more manageable components. These components must be related but able to be developed separately. 

The process of separating a problem into various components forces the product team to follow a joint process. This encourages the formation of a well-defined language between engineers and data scientists, who work collaboratively to create a product capable of updating itself automatically. This ability is akin to the DevOps practice of continuous integration (CI).
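One way to picture a product that updates itself automatically is a promotion gate: after each retraining run, the candidate model replaces the production model only if it scores measurably better on a shared validation set. The Python sketch below is illustrative only; `ModelVersion`, `promote_if_better`, the `min_gain` threshold, and the accuracy figures are assumptions, not part of any specific MLOps product.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    accuracy: float  # score on a shared, held-out validation set


def promote_if_better(current: ModelVersion, candidate: ModelVersion,
                      min_gain: float = 0.01) -> ModelVersion:
    """Return the model that should serve traffic after a retraining run."""
    # Promote only when the candidate beats production by a clear margin.
    if candidate.accuracy >= current.accuracy + min_gain:
        return candidate
    return current


# Hypothetical retraining history.
production = ModelVersion("v1", 0.90)
production = promote_if_better(production, ModelVersion("v2", 0.93))   # promoted
production = promote_if_better(production, ModelVersion("v3", 0.935))  # rejected
print(production.name)
```

Because the gate is a small, separate component, it can be developed and tested independently of the training code, which mirrors the component split described above.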

MLOps and AIaaS

Artificial Intelligence as a Service (AIaaS) helps organizations incorporate artificial intelligence (AI) functionality without the associated expertise. Usually, AIaaS services are built upon cloud-based providers like Amazon AWS, Google Cloud, Microsoft Azure, and IBM Cloud, used as IaaS. The AI service, framework, and workflows built upon these infrastructures are offered to final customers for various use cases (e.g., inventory management services, manufacturing optimizations, text generation).
AI Platform diagram
Source: cloud.google.com

MLOps consists of various phases built on top of an AI platform, where models will need to be prepared (via data labeling, BigQuery datasets, Cloud Storage), built, validated, and deployed.

And MLOps is a vast world, made of many moving parts.

Source: cloud.google.com

Indeed, before the ML code can be operated, as highlighted on Google Cloud, a lot of effort is spent on “configuration, automation, data collection, data verification, testing and debugging, resource management, model analysis, process and metadata management, serving infrastructure, and monitoring.”

The ML Process

ML models follow several steps. One example sequence is: Data extraction > Data analysis > Data preparation > Model training > Model evaluation > Model validation > Model serving > Model monitoring.
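The sequence above can be sketched as a chain of small stage functions that pass a shared context along. This Python example is a toy under stated assumptions: it covers only five of the stages (extraction, preparation, training, evaluation, validation), and the data, the one-weight model, and the 0.5 error threshold are invented for illustration.

```python
def extract(ctx: dict) -> dict:
    # Hypothetical toy data: (input, target) pairs.
    ctx["rows"] = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    return ctx


def prepare(ctx: dict) -> dict:
    # Hold out the last row for evaluation.
    ctx["train"], ctx["test"] = ctx["rows"][:3], ctx["rows"][3:]
    return ctx


def train(ctx: dict) -> dict:
    # Least-squares slope for the model y = weight * x (no intercept).
    numerator = sum(x * y for x, y in ctx["train"])
    denominator = sum(x * x for x, _ in ctx["train"])
    ctx["weight"] = numerator / denominator
    return ctx


def evaluate(ctx: dict) -> dict:
    # Mean absolute error on the held-out rows.
    errors = [abs(ctx["weight"] * x - y) for x, y in ctx["test"]]
    ctx["mae"] = sum(errors) / len(errors)
    return ctx


def validate(ctx: dict) -> dict:
    # Approve the model for serving only if the error is acceptably low.
    ctx["approved"] = ctx["mae"] < 0.5
    return ctx


def run_pipeline(stages: list) -> dict:
    """Run each stage in order and log which stages completed."""
    ctx = {"log": []}
    for stage in stages:
        ctx = stage(ctx)
        ctx["log"].append(stage.__name__)
    return ctx


result = run_pipeline([extract, prepare, train, evaluate, validate])
print(result["log"], result["weight"], result["approved"])
```

Keeping each stage as its own function means a stage can be swapped or re-run in isolation, and the log gives a simple record of what ran for a given model.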

Machine learning ops examples

Below are a couple of examples of how machine learning ops are being applied at companies such as Uber and Booking.com.

Uber

Uber is a two-sided marketplace, a platform business model that connects drivers and riders, with an interface that has elements of gamification, that makes it easy for two sides to connect and transact. Uber makes money by collecting fees from the platform’s gross bookings.

Uber Michelangelo is the name given to Uber’s machine learning platform that standardizes the workflow across teams and improves coordination.

Before Michelangelo was developed, Uber faced difficulties implementing machine learning models because of the vast size of the company and its operations.

While data scientists were developing predictive models, engineers were also creating bespoke, one-off systems that used these models in production.

Ultimately, the impact of machine learning at Uber was limited to whatever scientists and engineers could build in a short timeframe with predominantly open-source tools.

Michelangelo was conceived to provide a system where reliable, uniform and reproducible pipelines could be built for the creation and management of prediction and training data at scale.

Today, the MLOps platform standardizes workflows and processes via an end-to-end system where users can easily build and operate ML systems.

While Michelangelo manages dozens of models across the company for countless use cases, its application to UberEATS is worth a quick mention.

Uber Eats is a three-sided marketplace connecting a driver, a restaurant owner, and a customer with the Uber Eats platform at the center. The three-sided marketplace revolves around three players: restaurants pay commission on the orders to Uber Eats; customers pay small delivery charges and, at times, cancellation fees; drivers earn by making reliable deliveries on time.

Here, machine learning was incorporated into meal delivery time predictions, restaurant rankings, search rankings, and search autocomplete. 

Calculating meal delivery time is seen as particularly complex and involves many moving parts, with Michelangelo using tree regression models to make end-to-end delivery estimates based on multiple current and historical metrics.
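To give a feel for what a tree regression model does, the Python sketch below fits the simplest possible regression tree, a single split (a "stump"), on one made-up feature. It is not Uber's model: the feature (distance in kilometres), the data, and the function names `fit_stump` and `predict` are all assumptions for illustration, and a production system would combine many deeper trees over many features.

```python
def fit_stump(xs: list, ys: list) -> tuple:
    """Fit a depth-1 regression tree: one threshold and one mean per side.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Try every distinct value (except the smallest) as a split threshold.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump: tuple, x: float) -> float:
    """Predict with the mean of whichever side of the split x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


# Hypothetical data: delivery distance in km and delivery time in minutes.
distance_km = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
minutes = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]

stump = fit_stump(distance_km, minutes)
print(stump)                # (6.0, 12.0, 32.0)
print(predict(stump, 2.5))  # 12.0
print(predict(stump, 7.5))  # 32.0
```

The tree learns that short and long deliveries behave differently and predicts the average time for each group; adding more splits and more features refines those groups.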

Booking.com

Booking.com is the largest online travel agency in the world, with users able to search millions of different accommodation options.

Like Uber, Booking.com needed a complex machine learning solution that could be deployed at scale.

To understand the company’s predicament, consider a user searching for accommodation in Paris.

At the time of writing, there are over 4,700 establishments – but it would be unrealistic to expect the user to look at all of them.

So how does Booking.com know which options to show? 

At a somewhat basic level, machine learning algorithms list hotels based on inputs such as location, review rating, price, and amenities.

The algorithms also consider available data about the user, such as their propensity to book certain types of accommodation and whether or not the trip is for business or pleasure.

More complex machine learning is used to avoid the platform serving up results that consist of similar hotels.

It would be unwise for Booking.com to list ten 3-star Parisian hotels at the same price point on the first page of the results.

To counter this, machine learning incorporates aspects of behavioral economics such as the exploration-exploitation trade-off.
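The exploration-exploitation trade-off can be illustrated with an epsilon-greedy rule, one of the simplest strategies for balancing the two. The Python sketch below is an assumption-laden toy and not Booking.com's method: the hotel names, the click rates, the `choose_hotel` function, and the epsilon value of 0.2 are all hypothetical.

```python
import random


def choose_hotel(click_rates: dict, epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: show any hotel, so less-seen options still gather data.
        return rng.choice(sorted(click_rates))
    # Exploit: show the hotel with the best observed click rate so far.
    return max(click_rates, key=click_rates.get)


rng = random.Random(0)  # fixed seed so the run is repeatable
observed = {"hotel_a": 0.12, "hotel_b": 0.08, "hotel_c": 0.05}
picks = [choose_hotel(observed, epsilon=0.2, rng=rng) for _ in range(1000)]

# Most slots go to the current best option, but every hotel gets some exposure.
print({name: picks.count(name) for name in sorted(observed)})
```

With epsilon set to 0 the rule always shows the current favourite and never learns about alternatives; raising epsilon trades some short-term clicks for more varied results and better data.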

The algorithm will also collect data on the user as they search for a place to stay.

Perhaps they spend more time looking at family-friendly hotels with a swimming pool, or maybe they are showing a preference for a bed and breakfast near the Eiffel Tower.

An important but sometimes overlooked group on the Booking.com website is the accommodation owners and hosts.

This user group has its own set of interests that sometimes conflict with those of holidaymakers and the company itself.

Where the interests of hosts and the company conflict, machine learning will play an increasingly important role in Booking.com’s relationship with its vendors and, by extension, its long-term viability.

Booking.com today is the culmination of 150 successful customer-centric machine learning applications developed by dozens of teams across the company.

These were exposed to hundreds of millions of users and validated via randomized controlled trials.

The company concluded that the iterative, hypothesis-driven process that looked to other disciplines for inspiration was key to the initiative’s success.

Key takeaways

  • Machine Learning Ops encompasses a set of best practices that help organizations successfully incorporate artificial intelligence.
  • Machine Learning Ops seeks to address a disconnect between carefully written code and unpredictable real-world data. In so doing, MLOps can improve the efficiency of machine learning release cycles.
  • Machine Learning Ops implementation can be complex and as a result, relies on input from many different stakeholders. Investing in the right infrastructure and focusing on automation are also crucial. 

Connected Agile Frameworks

AIOps

AIOps is the application of artificial intelligence to IT operations. It has become particularly useful for modern IT management in hybridized, distributed, and dynamic environments. AIOps has become a key operational component of modern digital-based organizations, built around software and algorithms.

Agile Methodology

Agile started as a lightweight development method in contrast to heavyweight software development, which was the core paradigm of the previous decades of software development. In 2001, the Manifesto for Agile Software Development was born as a set of principles that defined the new paradigm for software development as a continuous iteration. This would also influence the way of doing business.

Agile Project Management

Agile project management (APM) is a strategy that breaks large projects into smaller, more manageable tasks. In the APM methodology, each project is completed in small sections – often referred to as iterations. Each iteration is completed according to its project life cycle, beginning with the initial design and progressing to testing and then quality assurance.

Agile Modeling

Agile Modeling (AM) is a methodology for modeling and documenting software-based systems. Agile Modeling is critical to the rapid and continuous delivery of software. It is a collection of values, principles, and practices that guide effective, lightweight software modeling.

Agile Business Analysis

Agile Business Analysis (AgileBA) is a certification, delivered as guidance and training, for business analysts seeking to work in agile environments. It was developed to ensure that analysts have the necessary skills and expertise for this shift. AgileBA also helps the business analyst relate Agile projects to a wider organizational mission or strategy.

Business Model Innovation

Business model innovation is about increasing the success of an organization with existing products and technologies by crafting a compelling value proposition able to propel a new business model to scale up customers and create a lasting competitive advantage. And it all starts by mastering the key customers.

Continuous Innovation

Continuous innovation is a process that requires a continuous feedback loop to develop a valuable product and build a viable business model. It is a mindset where products and services are designed and delivered to tune them around the customers’ problem and not the technical solution of its founders.

Design Sprint

A design sprint is a proven five-day process where critical business questions are answered through speedy design and prototyping, focusing on the end-user. A design sprint starts with a week-long challenge and should finish with a prototype, a test, and a lesson learned to iterate on.

Design Thinking

Tim Brown, Executive Chair of IDEO, defined design thinking as “a human-centered approach to innovation that draws from the designer’s toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success.” Therefore, desirability, feasibility, and viability are balanced to solve critical problems.

DevOps

DevOps refers to a series of practices used to automate software development processes. It is a combination of the terms “development” and “operations” to emphasize how functions integrate across IT teams. DevOps strategies promote seamless building, testing, and deployment of products. It aims to bridge the gap between development and operations teams to streamline development altogether.

Dual Track Agile

Product discovery is a critical part of agile methodologies, as its aim is to ensure that products customers love are built. Product discovery involves learning through a raft of methods, including design thinking, lean start-up, and A/B testing to name a few. Dual Track Agile is an agile methodology containing two separate tracks: the “discovery” track and the “delivery” track.

Feature-Driven Development

Feature-Driven Development is a pragmatic software process that is client and architecture-centric. Feature-Driven Development (FDD) is an agile software development model that organizes workflow according to which features need to be developed next.

eXtreme Programming

eXtreme Programming was developed in the late 1990s by Kent Beck, Ron Jeffries, and Ward Cunningham. During this time, the trio was working on the Chrysler Comprehensive Compensation System (C3) to help manage the company payroll system. eXtreme Programming (XP) is a software development methodology. It is designed to improve software quality and the ability of software to adapt to changing customer needs.

Lean vs. Agile

The Agile methodology was conceived primarily for software development, although other business disciplines have also adopted it. Lean thinking is a process improvement technique where teams prioritize value streams in order to improve them continuously. Both methodologies look at the customer as the key driver of improvement and waste reduction, and both treat improvement as something continuous.

Lean Startup

A startup company is a high-tech business that tries to build a scalable business model in tech-driven industries. A startup company usually follows a lean methodology, where continuous innovation, driven by built-in viral loops, is the rule. This strategy drives growth and builds network effects as a consequence.

Kanban

Kanban is a lean manufacturing framework first developed by Toyota in the late 1940s. The Kanban framework is a means of visualizing work as it moves through a process, making potential bottlenecks easier to identify. It does that through a process called just-in-time (JIT) manufacturing to optimize engineering processes, speed up manufacturing products, and improve the go-to-market strategy.

Rapid Application Development

RAD was first introduced by author and consultant James Martin in 1991. Martin recognized and then took advantage of the endless malleability of software in designing development models. Rapid Application Development (RAD) is a methodology focusing on delivering rapidly through continuous feedback and frequent iterations.

Scaled Agile

Scaled Agile Lean Development (ScALeD) helps businesses discover a balanced approach to agile transition and scaling questions. The ScALeD approach helps businesses successfully respond to change. Inspired by a combination of lean and agile values, ScALeD is practitioner-based and can be completed through various agile frameworks and practices.

Spotify Model

The Spotify Model is an autonomous approach to scaling agile, focusing on culture, communication, accountability, and quality. The Spotify model was first recognized in 2012 after Henrik Kniberg and Anders Ivarsson released a white paper detailing how streaming company Spotify approached agility. Therefore, the Spotify model represents an evolution of agile.

Test-Driven Development

As the name suggests, TDD is a test-driven technique for delivering high-quality software rapidly and sustainably. It is an iterative approach based on the idea that a failing test should be written before any code for a feature or function is written. Test-Driven Development (TDD) is an approach to software development that relies on very short development cycles.

Timeboxing

Timeboxing is a simple yet powerful time-management technique for improving productivity. Timeboxing describes the process of proactively scheduling a block of time to spend on a task in the future. It was first described by author James Martin in a book about agile software development.

Scrum

Scrum is a methodology co-created by Ken Schwaber and Jeff Sutherland for effective team collaboration on complex products. Scrum was designed primarily for software development projects, to deliver new software capability every 2-4 weeks. It is a subset of agile that is also used in project management to improve startups’ productivity.

Scrum Anti-Patterns

Scrum anti-patterns describe any attractive, easy-to-implement solution that ultimately makes a problem worse. They are therefore practices to avoid in order to prevent issues from emerging. Some classic examples of scrum anti-patterns include absent product owners, pre-assigned tickets (making individuals work in isolation), and discounting retrospectives (where review meetings are not used to make real improvements).

Scrum At Scale

Scrum at Scale (Scrum@Scale) is a framework that Scrum teams use to address complex problems and deliver high-value products. Scrum at Scale was created through a joint venture between the Scrum Alliance and Scrum Inc. The joint venture was overseen by Jeff Sutherland, a co-creator of Scrum and one of the principal authors of the Agile Manifesto.

FourWeekMBA