Developing AI/ML Projects for Business - Best Practices

Published May 12, 2021

Only 10% of AI/ML projects have created positive financial impact, according to a recent survey of 3,000 executives.

Given these odds, building a profit-generating ML project clearly requires sustained work across the entire organization, from planning to production.

In this article, I’ll share best practices for businesses to ensure that their investments in Machine Learning and Artificial Intelligence are actually profitable, and create significant value for the entire organization. 

Best practices for identifying AI use cases

Most AI projects fail at the very first hurdle: a poor understanding of which business problems can actually be solved with AI. This is the main bottleneck in the successful deployment of AI.

This problem is compounded by the fact that organizational intuition for AI, and for how it can be leveraged to solve critical business problems, is still at an early stage [2].

What does this mean? Well, not every problem can be feasibly solved with AI. To understand whether your particular problem can, you need tried-and-tested practices and approaches.

If you’re looking for best practices on building AI teams that deliver successful outcomes, see my previous article [3] How to Build Machine Learning Teams That Deliver.

AI use cases

AI has transformed industries. It automates routine and manual processes, and provides crucial predictive insights to almost all business functions. Table 1 lists some of the business use cases that have been successfully addressed using AI.

| INPUT | OUTPUT | APPLICATION | INDUSTRY |
|---|---|---|---|
| Voice recording | Text transcript | Speech recognition | Customer service |
| Faces | Identity | Face recognition | Security |
| Transactions | Fraud transactions | Fraud detection | Banking |
| Purchase history | Recommendations | Recommender | Ecommerce |
| Clinical symptoms | Diagnosis | Diagnosis prediction | Healthcare |
| Customer reviews | Customer sentiment | Sentiment analysis | Service industry |
| News articles | Summary of news | Text summarization | Press & Media |

Table 1. Business use cases solved by AI

Brainstorming appropriate business problems should ideally be done together with business leaders, product managers, and any available subject matter experts. The list of business problems sourced across the organization should then be vetted and analyzed for potential AI solutions.

Not every business problem should be solved with AI. Oftentimes, a rule-based or engineered solution is good enough. Additionally, a lot of business problems can be mined from customer reviews or feedback, which typically point to broken business processes that need to be fixed.

In Table 2, you can see a checklist of questions, both technical and commercial, to determine whether a business problem is relevant for AI.

| DATA | AI RELEVANCE | BUSINESS |
|---|---|---|
| Kind of data | Market validation | User research |
| Size of data | Research validation | Product / feature |
| Domain | Choice of models | Business metrics |
| Frequency of data | Model deployment | Timeline |
| Annotations | Inference type | Roadmap |
| Data flow | Model performance | Budget |
| Privacy | Production metrics | Team |
|  |  | Bandwidth |

Table 2. A checklist of data, model and business questions to validate a business problem with a potential AI solution.

KPIs and Metrics

As part of the planning process, the appropriate model and business metric for each potential use case should be discussed. Work backwards from the expected outcome, and it’ll be easier to crystallize which particular metric to optimize. 

To illustrate this, Table 3 lists example AI use cases with their corresponding model and business metrics. For the success of an AI project, it's ultimately the business metrics and goals that must be achieved.

| INPUT | OUTPUT | TECHNICAL METRIC | BUSINESS METRIC |
|---|---|---|---|
| Voice recording | Text transcript | Word Error Rate | CSAT, AHT, NPS |
| Faces | Identity | Recognition Rate | Domain-dependent, e.g. identification of criminals or stalkers |
| Transactions | Fraud transactions | F1, Precision, Recall | Revenue loss, Fraud-to-Sales Ratio |
| Purchase history | Recommendations | Mean Average Precision at K | Uplift in Average Revenue Per User, or number of items added to cart |
| Clinical symptoms | Diagnosis | F1, Precision, Recall | Savings in doctors' time and number of appointments |
| Customer reviews | Customer sentiment | F1, Precision, Recall | NPS |
| News articles | Summary of news | ROUGE Score | CTR, Views |

Table 3. Example AI use cases and their technical and business metrics (CSAT: customer satisfaction; AHT: average handle time; NPS: Net Promoter Score; CTR: click-through rate).
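
To make this concrete, here's a minimal Python sketch built around the fraud detection row of Table 3. It computes the technical metrics (precision, recall, F1) alongside one possible business metric: revenue lost to fraud the model failed to catch. The labels and transaction amounts are made up purely for illustration.

```python
# Technical vs business metrics for the fraud detection example in Table 3.
# Labels and transaction amounts below are illustrative, not real data.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                # 1 = fraudulent transaction
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]                # model predictions
amounts = [20, 35, 500, 1200, 15, 750, 40, 300]  # transaction values in $

# Technical metrics.
print(f"precision={precision_score(y_true, y_pred):.2f}",
      f"recall={recall_score(y_true, y_pred):.2f}",
      f"f1={f1_score(y_true, y_pred):.2f}")

# Business metric: revenue lost to fraud the model failed to catch.
missed_loss = sum(a for t, p, a in zip(y_true, y_pred, amounts)
                  if t == 1 and p == 0)
print(f"revenue lost to missed fraud: ${missed_loss}")
```

Working backwards, a target on the business metric (acceptable fraud loss) then translates into a minimum recall the model must reach.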

Prioritization 

We have a set of business problems. They’ve been reviewed and documented after careful consideration of the criteria listed in Table 2, and analysis of appropriate business metrics as in Table 3. The candidate list of use cases needs to be prioritized, or ranked, in terms of impact and relevance to the overarching business strategy and goals.

Beyond a detailed written document describing each business use case and its potential AI-based solutions, it's useful to have objective criteria for scoring all the proposed use cases on the same scale. Here, it's crucial for product managers and business leaders to have their own intuition about how AI works in practice, or to rely on the judgment of a product-focused technical or domain expert. Whilst it's easy to rank projects on certain success criteria, it's not so straightforward to rate the risk associated with AI projects.

A balanced metric ought to weigh the likelihood and impact of a successful outcome against the risk of the project failing or not generating enough impact. Risks might relate to organizational aspects, to domain-specific aspects of the AI problem, or to external factors beyond the remit of the business. Once a suitable balanced metric is defined, it aligns all stakeholders and leadership, who can then form their own subjective views based on the objective scores.
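
As an illustration, here's a hypothetical sketch of such a balanced metric in Python: each candidate use case gets 1-5 scores for impact, feasibility and risk, and a weighted score ranks them all on one scale. The use cases, scores and weights are invented for the example.

```python
# A hypothetical balanced prioritization score: weigh expected impact and
# feasibility against estimated risk, so candidate use cases share one scale.
use_cases = {
    "churn prediction":   {"impact": 5, "feasibility": 4, "risk": 2},
    "ticket triage":      {"impact": 3, "feasibility": 5, "risk": 1},
    "demand forecasting": {"impact": 4, "feasibility": 2, "risk": 4},
}

W_IMPACT, W_FEASIBILITY, W_RISK = 0.5, 0.3, 0.2  # illustrative weights

def balanced_score(uc):
    # Higher impact and feasibility raise the score; higher risk lowers it.
    return (W_IMPACT * uc["impact"]
            + W_FEASIBILITY * uc["feasibility"]
            - W_RISK * uc["risk"])

for name, uc in sorted(use_cases.items(), key=lambda kv: -balanced_score(kv[1])):
    print(f"{name}: {balanced_score(uc):.2f}")
```

The weighted score gives stakeholders a common, if imperfect, basis for debate; the subjective judgment then happens on top of the objective numbers.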

Many factors need to be considered before a 'yes' or 'no' decision is made on a particular AI project, and before settling on how many AI projects to take on in a given period. Securing buy-in from leadership is difficult, and certain final executive decisions might appear subjective or not data-driven. It's still absolutely critical to go through the planning process above, so that each AI project is presented in the best possible light and has the maximum likelihood of being selected for execution.

Best practices for planning AI use cases

As part of the planning process with cross-functional teams, it’s important for organizations to have a streamlined mechanism for defining the AI product vision or roadmap, the bandwidth, specific roles and responsibilities of individual contributors and managers in each team, as well as the technical aspects (data pipelines, modeling stack, infrastructure for production and maintenance).

In this section, I’ll describe the details of specific planning steps essential to build a successful AI product. 

AI product requirements 

For each identified use case, it's necessary to draw up a roadmap for how the product will evolve from its baseline version to a more mature product over time. In Table 4, I outline a set of essential questions and criteria to fulfil when creating a comprehensive AI roadmap for each use case.

AI ROADMAP
  • PR-FAQ / User stories
  • PRD
  • Customer surveys
  • Milestones
  • Risk factors
  • Business metrics
  • Technical metrics
  • Release criteria

Table 4. A list of factors to address for the AI product roadmap.

The PR-FAQ (Press Release – Frequently Asked Questions), a format pioneered by Amazon, and the PRD (Product Requirements Document) are two critical documents generally prepared during the initial stages of product ideation and conception. They serve as the north star that all concerned teams align with as they build and scale the product. It's absolutely essential that all stakeholder teams contribute meaningfully to these documents and bring their specific domain expertise to craft a meticulous document for executive review.

It’s necessary for all stakeholder team managers to review and contribute to the document, so that any team- or domain-specific intrinsic biases of product development are laid bare and addressed accordingly. Typically, teams should rely on data-driven intuition for product development. In the absence of in-house data, intuition for the AI product can be borrowed from work done by other companies or research in the same field [2, 4]. 

Data requirements

Once the roadmap is defined and finalized after stakeholder meetings, it's beneficial to have an MVP or basic prototype of the AI product ready, both to validate initial assumptions and to present to leadership. This exercise also helps to streamline the data and engineering pipelines needed to acquire, clean and process the data, and to train the model behind the MVP.

The MVP should not be a highly sophisticated model. It should be basic enough to successfully transform the input data to a model prediction, and trained on a minimal set of training data. If the MVP is hosted as an API, each of the cross-functional stakeholder teams can explore the product and build intuition for how the AI product might be better developed for the end customer.
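
For example, here's a minimal sketch of hosting an MVP model as an API with FastAPI. It assumes a pre-trained scikit-learn baseline saved as mvp_model.joblib; the endpoint name and feature schema are hypothetical placeholders.

```python
# A minimal sketch of hosting an MVP model as an API so stakeholder teams
# can explore it. Assumes a pre-trained scikit-learn baseline saved as
# "mvp_model.joblib"; endpoint and feature schema are placeholders.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("mvp_model.joblib")  # assumed pre-trained baseline

class Features(BaseModel):
    values: List[float]  # placeholder flat feature vector

@app.post("/predict")
def predict(payload: Features):
    prediction = model.predict([payload.values])[0]
    return {"prediction": str(prediction)}

# Run locally (assuming this file is mvp_api.py):
#   uvicorn mvp_api:app --reload
```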

From a data perspective, the machine learning team can dive deeper into the minimal training data, and do a careful analysis of the data as listed in Table 5. 

DATA CHECKS
  • Features
  • Distribution of data
  • Outliers, missing and null values
  • Feature selection / engineering
  • Data labels
  • Data augmentation
  • Data splits
  • Data versioning
  • Data format
  • Data storage and access
  • Data pipeline

Table 5. A list of data quality and feasibility checks for the AI MVP.
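
As a rough illustration of the first few checks in Table 5, here's a short pandas sketch, assuming the minimal training data lives in a hypothetical training_data.csv file:

```python
# A rough sketch of the first checks in Table 5, assuming the minimal
# training data lives in a (hypothetical) CSV file.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")  # assumed input file

# Distribution of data: summary statistics per column.
print(df.describe(include="all"))

# Missing and null values per column.
print(df.isna().sum())

# Outliers: flag numeric values more than 3 standard deviations from the mean.
numeric = df.select_dtypes(include="number")
outliers = (numeric - numeric.mean()).abs() > 3 * numeric.std()
print(outliers.sum())

# Data splits: reproducible train / validation / test partitions.
train, rest = train_test_split(df, test_size=0.3, random_state=42)
val, test = train_test_split(rest, test_size=0.5, random_state=42)
```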

Model requirements

After systematic assessment of the data quality, features, statistics, labels and other checks as listed in Table 5, the Machine Learning team can start building the prototype / MVP model. The best approach at the early stages of product development is to act with speed rather than accuracy. The initial (baseline) model should be simple enough to demonstrate that the model works, the data and modeling pipelines are bug-free, and the model metrics indicate that the model performs significantly better than chance. 
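
One simple way to check the "better than chance" bar is to benchmark the baseline against a majority-class dummy model on the same validation split. Here's a sketch on synthetic data:

```python
# Verify a baseline beats chance: compare it against a majority-class
# dummy classifier on the same validation split. Data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

chance = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"chance accuracy:   {chance.score(X_val, y_val):.3f}")
print(f"baseline accuracy: {baseline.score(X_val, y_val):.3f}")
# The baseline should clearly exceed the chance score before investing
# in more complex models.
```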

Machine learning use cases and products have become increasingly complex over the years. Whilst linear regression and binary or multi-class classification models were once the norm, there are now newer classes of models that are faster to train and generalize better on real-world test data. For the ML scientist or engineer, no two use cases may call for an identical tech stack of tools and libraries. Depending on the characteristics of the data relevant to the AI use case (see Table 2), the data science team must define a modeling stack specific to each use case (see Table 6 below).

AI MODEL CHECKS
  • ML problem statement
  • Classical ML models
  • Deep learning models
  • Ensemble models
  • Hyperparameter optimization
  • Model versioning and formats
  • Experimental metadata
  • Error analysis
  • Modeling pipeline
  • Acceptance criteria
  • Deployment pipeline
  • Model monitoring

Table 6. A list of AI model and feasibility checks for the AI MVP.

Best practices for executing AI use cases

After identifying and planning for promising AI use cases, the next step is to actually execute the projects. It might seem that execution is a straightforward process where the machine learning team gets to weave their magic. But simply 'building models' is not enough for successful deployment. Model building has to be done in a collaborative and iterative fashion:

  • involving feedback from users of the product as well as cross-functional teams,
  • incorporating any new or revised feature requests from product teams, 
  • updating initial hypotheses for the use case based on any changes in the business or operating environment, 
  • only then launching the product to users.

Shipping the model to production is a major milestone to celebrate, document and share within the organization – but the work doesn't stop there. It's crucial to monitor how the model performs on real-world data from customers, and periodically apply fixes or update the model so that it doesn't go stale amid changes in:

  • distribution of data,
  • nature of use cases,
  • customer behaviour, 
  • etc.

In the next section, I will discuss the best practices for the operational aspects of executing and deploying AI models successfully and realizing the proposed commercial value.

Reviews and feedback

Once the AI project has kicked off, it's essential for the machine learning team to hold both periodic and ad-hoc review meetings with stakeholders, including product teams and business leadership. The documents prepared during the planning phase (PR-FAQ and PRD) serve as the context in which any updates or changes should be addressed.

The goal of regular meetings is to assess the state of progress vis-a-vis the product roadmap, and address any changes in:

  • product or business strategy, 
  • organizational structure,
  • resources allocated to the project. 

While planning is important, most corporate projects don’t go as initially planned. It’s important to be nimble and agile, respond to any new information (regarding technical, product or business aspects), and re-align towards a common path forward. For example, the 2020 lockdowns severely impacted the economy. In light of such high-impact unexpected events, it’s critical to adapt and change strategy for AI use cases as well.

In addition to regular internal feedback, it's good to keep in touch with the end users of the product throughout the AI lifecycle: in the initial stages (user research, definition of target user personas and their demographics), and especially during product design and interaction with the model predictions. A core group of users from the target segment should be maintained to provide regular feedback across all stages of product development.

Once an MVP is ready, users can be very helpful in providing early feedback that can often bring to light several insights and uncover any biases or shortcomings. When the AI model is ready to be shipped and different model versions are to be evaluated, user feedback can again be very insightful. User insights about the design, ease of use, perceived speed and overall user flow can help the product team to refine the product strategy as needed.

Building iteratively

From the technical perspective, the model building process is usually an iterative one. After establishing a robust baseline, the team gets insight into how far the model performance is from the established acceptance criteria. In the early stages of model building, the focus should primarily be on accuracy rather than latency. 

At each stage of model development, a comprehensive analysis of model errors on the validation set can reveal important insights into the model shortcomings, and how to address them. The errors should also be reviewed in conjunction with subject matter experts, to evaluate any errors in data annotation as well as any specific patterns in the errors. 

If the model is prone to a particular kind of error, it might need additional features, or it might need to be swapped for a model with a different objective function or underlying principle. This iterative process helps the machine learning team consolidate their intuition about the use case, think outside the box, and propose creative new ideas or algorithms to achieve the desired metrics.
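
As an illustration, here's a sketch of such routine error analysis for a classification use case: a confusion matrix plus a table of misclassified validation examples to review with subject matter experts. It assumes a trained model and X_val, y_val carried over from earlier steps.

```python
# A sketch of routine error analysis on the validation set. Assumes
# `model`, `X_val`, `y_val` exist from prior training steps.
import pandas as pd
from sklearn.metrics import confusion_matrix

y_pred = model.predict(X_val)
print(confusion_matrix(y_val, y_pred))

# Collect errors for manual review: subject matter experts can check
# whether the label or the model is at fault, and whether the errors
# share a pattern that suggests new features or a different model.
errors = pd.DataFrame(X_val)
errors["true"] = y_val
errors["pred"] = y_pred
errors = errors[errors["true"] != errors["pred"]]
print(errors.head(10))
```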

During the course of model building, machine learning practitioners should systematically document every experiment and the corresponding results. A structured approach is helpful not only for the particular use case, but also helps build organizational knowledge that can be helpful to onboard new hires, or serve as shining examples of successful AI deployment.
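
A lightweight way to start is one structured record per experiment; dedicated trackers such as MLflow offer the same idea with more tooling. Here's an illustrative sketch, where the logged names, parameters and metric values are placeholders:

```python
# A lightweight sketch of structured experiment logging: one JSON record
# per run, capturing data version, hyperparameters and metrics.
import json
import pathlib
import time

def log_experiment(name, params, metrics, data_version, log_dir="experiments"):
    record = {
        "name": name,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "data_version": data_version,
        "params": params,
        "metrics": metrics,
    }
    path = pathlib.Path(log_dir)
    path.mkdir(exist_ok=True)
    with open(path / f"{name}_{int(time.time())}.json", "w") as f:
        json.dump(record, f, indent=2)

# Example usage with placeholder values.
log_experiment(
    name="fraud_baseline",
    params={"model": "logistic_regression", "C": 1.0},
    metrics={"f1": 0.81, "precision": 0.78, "recall": 0.84},
    data_version="v3",
)
```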

Deployment and maintenance

Once the candidate machine learning model is ready and benchmarked thoroughly on the validation and test sets, its errors analyzed, and the acceptance criteria met, the model may be taken to production. There's a huge difference between the model training and deployment environments: the format in which the model is trained may not be compatible with production, so the model may need to be appropriately serialized and converted to the right format.
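
As one example of such a conversion, here's a sketch of exporting a trained scikit-learn model to the portable ONNX format using the skl2onnx package; the input dimensionality is an assumed placeholder:

```python
# Converting a trained scikit-learn model into a portable production
# format (ONNX). Assumes the skl2onnx package and a trained `model`
# taking `n_features` numeric inputs.
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

n_features = 20  # assumed input dimensionality
onnx_model = convert_sklearn(
    model,  # the trained scikit-learn model from earlier
    initial_types=[("input", FloatTensorType([None, n_features]))],
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
# The ONNX file can then be served by a runtime (e.g. onnxruntime)
# that is independent of the training environment.
```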

In an environment that simulates production settings, model accuracy and latency should be validated again on the hold-out dataset. Deployment should be done incrementally, surfacing the model to a small portion of real-world traffic, ideally tested first by internal or core user groups.

Once the deployment pipeline has been rigorously tested and vetted by the MLOps team, more traffic can be directed to the model. In scenarios where two or more candidate models are available, A/B testing should be done systematically and evaluated for statistically significant differences to determine the winning model.
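
For example, a two-proportion z-test is one common way to compare conversion rates between two candidate models in an A/B test. Here's a sketch with made-up conversion counts:

```python
# Evaluating an A/B test between two candidate models with a
# two-proportion z-test. Counts below are illustrative.
from statsmodels.stats.proportion import proportions_ztest

conversions = [530, 584]    # successes for model A and model B
exposures = [10000, 10000]  # users routed to each model

stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.3f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Difference is statistically significant: pick the winner.")
else:
    print("No significant difference: gather more data or keep the incumbent.")
```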

Post-deployment, it's important to ensure that all input-output pairs are collected and archived appropriately within the data ecosystem. The launched model should be periodically assessed, and the distribution of the real-world data compared with the distribution of the training data to check for data and model drift. When drift appears, an active learning pipeline that feeds some of the real-world test samples back into the original training dataset helps to alleviate the shortcomings of the deployed model.
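
As one way to run such a periodic check, here's a sketch using a two-sample Kolmogorov-Smirnov test to compare a feature's training distribution against recent production traffic; the data here is synthetic:

```python
# A periodic drift check: compare the distribution of a feature in recent
# production traffic against the training set with a two-sample
# Kolmogorov-Smirnov test. Arrays are synthetic stand-ins for real data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
live_feature = rng.normal(loc=0.3, scale=1.1, size=5000)  # drifted

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift detected (KS={stat:.3f}): consider retraining with "
          "recent samples fed back into the training set.")
```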

Finally, once the model production environment and all pipelines are stable, the machine learning and product teams should evaluate the business metrics and KPIs against the predefined success criteria. Only if these are met can the use case be deemed a success; a summary of the overall use case and its results should then be documented and shared internally with all stakeholders and business leadership.

Wrapping up

If machine learning, product and business teams in startups and enterprises adopt a systematic approach and follow the best practices laid out in this article, the likelihood of successful AI outcomes can only increase.

Adequate upfront preparation is crucial. Without it, teams won't be able to rectify errors or respond to changes, nor realize the massive commercial potential that AI can deliver.

References
