Machine Learning: Why Scaling Matters

One of the major technological advances of the last decade is the progress in machine learning research and the rise in its applications. We frequently hear about machine learning algorithms performing real-world tasks with human-like (or, in some cases, even better) proficiency.

While you might already be familiar with how various machine learning algorithms work and how to implement them using libraries and frameworks like PyTorch, TensorFlow, and Keras, doing so at scale is a trickier game.

This two-part series answers why scalability is such an important aspect of real-world machine learning and sheds light on the architectures, best practices, and some optimizations that are useful when doing machine learning at scale. In this first post, we'll talk about scalability, its importance, and the machine learning process.

In part 2, we'll go more in-depth about the common issues that you may face, such as picking the right framework/language, data collection, model training, different types of architecture, and other optimization methods.

This post assumes basic familiarity with machine learning, i.e., an understanding of terms and concepts like neural network (NN), convolutional neural network (CNN), and ImageNet.

The Need for Scalability

Applications of machine learning such as automatic translation, image colorization, game playing (chess, Go, even Dota 2), and generating realistic faces all require training models on massive amounts of data (often hundreds of gigabytes or more) and very high processing power, on specialized hardware accelerators like GPUs and ASICs.

We can't simply feed the ImageNet dataset to the CNN we trained on our laptop to recognize handwritten MNIST digits and expect it to achieve decent accuracy after a few hours of training.

Machine learning has existed for years, but given the pace of developments in machine learning and associated fields, scalability is becoming a prominent topic of focus.


The spread of the internet

The internet has been reaching the masses, network speeds are rising exponentially, and the data footprint of the average "internet citizen" is rising too, which means more data for the algorithms to learn from. Products related to the internet of things are ready to gain mass adoption, eventually providing even more data for us to leverage.

Rise of hardware

Thanks to better fabrication techniques and advances in technology, storage is getting cheaper by the day. Moore's law held for decades, although it has been slowing recently. Still, the efficiency and performance of processors have grown at a good rate, enabling us to run computation-intensive tasks at low cost.

Rise of DevOps

The last decade has seen not only the rise of machine learning algorithms, but also the rise of containerization, orchestration frameworks, and other tooling that makes managing a distributed set of machines easy.

Why Scalability Matters

Scalability matters in machine learning because:

  • Training a model can take a long time.


  • A model can be so big that it can't fit into the working memory of the training device.
  • Even if we decide to buy a big machine with lots of memory and processing power, it is usually more expensive than using many smaller machines. In other words, vertical scaling is expensive.

Why you should care about scaling

Scalability is about handling huge amounts of data and performing a lot of computations in a cost-effective and time-saving way. Here are the inherent benefits of caring about scale:

  • Productivity: A lot of machine learning these days happens in the form of experiments, such as solving a novel problem with a novel architecture (algorithm). A pipeline with fast executions of every stage (training, evaluation, and deployments) will enable us to try more things and be more creative.
  • Modularity, portability, and composition: It'd be beneficial if the results of the training and the trained model can be leveraged by other teams.
  • Cost reduction: It never hurts to optimize for cost. Scaling helps utilize available resources to the maximum and lets us trade off marginal cost against accuracy.
  • Minimizing human involvement: The pipeline should be as automated as possible so that humans can step out and enjoy coffee while machines do their tasks.

For instance, 25% of engineers at Facebook work on training models, training 600k models per month, and their online prediction service makes 6M predictions per second. Baidu's Deep Search model training involves computing power of 250 TFLOP/s on a cluster of 128 GPUs. We can imagine how important it is for such companies to scale efficiently, and why scalability in machine learning matters these days.

Let's explore the areas we should focus on to make our machine learning pipeline scalable. First, let's go over the typical process.

The Machine Learning Process

To better understand the opportunities to scale, let's quickly go through the general steps involved in a typical machine learning process:

1. Domain understanding

The first step is usually to gain an in-depth understanding of the problem and its domain. In this step, we consider the constraints of the problem, think about the inputs and outputs of the solution we're trying to develop, and how the business is going to interpret the results.

2. Data collection and warehousing

The next step is to collect and preserve the data relevant to our problem. The amount of data we need depends on the problem we're trying to solve. For example, training a general image classifier over thousands of categories will need a huge amount of labeled image data (just like ImageNet).
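To make this concrete, here's a minimal sketch of loading a labeled image dataset with PyTorch's torchvision. The `data/train` directory layout and the 224x224 input size are illustrative assumptions; `ImageFolder` expects one subdirectory per class.

```python
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

# Hypothetical layout: data/train/<class_name>/<image>.jpg
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # a common input size for CNNs
    transforms.ToTensor(),
])
train_set = ImageFolder("data/train", transform=transform)
print(len(train_set), "labeled images across", len(train_set.classes), "classes")
```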

3. Exploratory data analysis and feature engineering

The next step is usually to perform some statistical analysis on the data: handling outliers, handling missing values, and removing highly correlated features, to arrive at the subset of data that we'll feed to our machine learning algorithm.
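As a rough sketch of what this step can look like for tabular data with pandas (the `data.csv` file and the 0.95 correlation threshold are illustrative assumptions):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical tabular dataset

# Fill missing numeric values with each column's median
df = df.fillna(df.median(numeric_only=True))

# Drop one feature from each highly correlated pair (|r| > 0.95 is arbitrary)
corr = df.corr(numeric_only=True).abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df = df.drop(columns=to_drop)
```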

4. Modeling (training)

Now comes the part when we train a machine learning model on the prepared data. Depending on our problem statement and the data we have, we might have to try a bunch of training algorithms and architectures to figure out what fits our use-case the best.
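As a minimal sketch of this step in PyTorch, one training epoch boils down to a loop of forward pass, loss computation, backward pass, and weight update. The `model`, `train_loader`, and `optimizer` arguments are assumed to be defined elsewhere (e.g., a CNN, a DataLoader, and `torch.optim.Adam(model.parameters())` created once so its state persists across epochs):

```python
import torch
import torch.nn as nn

def train_one_epoch(model, train_loader, optimizer, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()   # compute gradients
        optimizer.step()  # update weights
```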

5. Evaluation (testing)

It's time to evaluate model performance. Usually, we have to go back and forth between modeling and evaluation a few times (after tweaking the models) before getting the desired performance for a model.
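A minimal sketch of the evaluation step for a classifier, assuming a trained `model` and a held-out `test_loader` exist:

```python
import torch

@torch.no_grad()  # no gradients needed during evaluation
def evaluate(model, test_loader, device="cpu"):
    model.eval()
    correct = total = 0
    for inputs, labels in test_loader:
        preds = model(inputs.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
    return correct / total  # test accuracy
```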

6. Deploying (inference)

Finally, we prepare our trained model for the real world. We may want to integrate it into existing software or create an interface that exposes its predictions.
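One common hand-off path in PyTorch is serializing the trained model with TorchScript, so a serving process can load it without the original training code. This is just one option among many; the file path is a placeholder:

```python
import torch

def export_model(model, path="model.pt"):
    # Serialize the trained model for use outside the training code.
    model.eval()
    torch.jit.script(model).save(path)

def serve_prediction(path, input_tensor):
    # Load the serialized model and run inference on a prepared input.
    serving_model = torch.jit.load(path)
    with torch.no_grad():
        return serving_model(input_tensor)
```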

Scaling Challenges

Okay, now let's list some focus areas for scaling at the various stages of the machine learning process. We'll go into more detail about the challenges (and potential solutions) to scaling in the second post.

Data handling

Data is iteratively fed to the training algorithm during training, so the memory representation and the way we feed it to the algorithm play a crucial role in scaling. The journey of the data from its source to the processor performing the computations offers many opportunities for us to optimize.
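In PyTorch, for instance, the DataLoader controls much of this journey: background worker processes overlap data loading with computation, and pinned host memory speeds up copies to the GPU. A sketch reusing the hypothetical `train_set` from the data collection step above:

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(
    train_set,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # parallel data-loading processes
    pin_memory=True,  # faster host-to-GPU transfers
)
```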

Model Training

Model training consists of a series of mathematical computations applied to different (or the same) data over and over again. This iterative nature can be leveraged to parallelize the training process and, eventually, reduce the time required for training by deploying more resources.
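As a minimal sketch of this idea in PyTorch, `nn.DataParallel` splits each batch across the visible GPUs and aggregates the gradients automatically (for serious multi-machine work, `DistributedDataParallel` is the usual choice):

```python
import torch
import torch.nn as nn

def parallelize(model):
    # Replicate the model across all visible GPUs so each one
    # processes a slice of every batch in parallel.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return model.to(device)
```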

However, simply deploying more resources is not a cost-effective approach. We also need to focus on improving the computation power of individual resources, which means faster and smaller processing units than existing ones.

By focusing on research into newer, more efficient algorithms, we can reduce the number of iterations required to achieve the same performance, thereby enhancing scalability.

We can also try to reduce the memory footprint of our model training for better efficiency. There are also questions like these to answer (a simple early-stopping sketch follows the list):

  • Are the extra X layers worth it?
  • Is an extra Y amount of data really improving the model performance?
  • When should we stop training?
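The last question is commonly answered with early stopping. A minimal sketch, assuming the training loop records a validation loss each epoch:

```python
def should_stop(val_losses, patience=5):
    # Stop when validation loss hasn't improved on its previous best
    # for `patience` consecutive epochs.
    if len(val_losses) <= patience:
        return False
    best_so_far = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_so_far
```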

Evaluation, experimentation, and deployment

Apart from being able to calculate performance metrics, we should have a strategy and a framework for trying out different models and figuring out optimal hyperparameters with less manual effort.
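The simplest such framework is a grid search over candidate hyperparameters. A bare-bones sketch, where `train_and_evaluate` is a hypothetical function that trains a model with the given settings and returns a validation score:

```python
from itertools import product

grid = {"lr": [1e-2, 1e-3, 1e-4], "batch_size": [32, 64, 128]}

best_score, best_params = float("-inf"), None
for lr, batch_size in product(grid["lr"], grid["batch_size"]):
    score = train_and_evaluate(lr=lr, batch_size=batch_size)  # hypothetical
    if score > best_score:
        best_score, best_params = score, {"lr": lr, "batch_size": batch_size}
```

In practice, smarter strategies (random search, Bayesian optimization) tend to find good hyperparameters with fewer runs, but the automation pattern is the same.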

The models we deploy might have different use-cases and usage patterns. Our systems should be able to scale effortlessly as demand for model inference changes.

Continued in Part 2
