A practical introduction to TensorFlow Serving

Published Aug 02, 2018

TensorFlow has been open to the public for about a year now and the buzz is real. Everyone is building models with TensorFlow, and we even have some state-of-the-art examples such as Parsey McParseface. As a data scientist at Old Street Labs (OSL), one of the most common questions I’m asked is how we deploy and manage models in a production environment.

Enter TensorFlow Serving.

TensorFlow Serving makes it quick and simple to deploy your models into a production environment, not too dissimilar to how you would deploy any other microservice. It also works with TensorFlow models out of the box without much modification.

The purpose of this tutorial is to provide a brief introduction to TensorFlow Serving by walking through the process of deploying a TensorFlow model inside a Docker container.

We’re going to be using the TensorFlow Inception model and gRPC. The Inception model provides everything we need to restore the inference graph from well-trained variables, so we can export it without spending hours training something ourselves.


Prerequisites:

  • Docker installed and running on your machine

See here for instructions on installing/running Docker for your machine.

TensorFlow Serving

Let’s get started and build a container for our TensorFlow serving model.

git clone --recursive https://github.com/tensorflow/serving

cd serving

Building from a Dockerfile means we don’t have to manually build all the required dependencies. I would suggest that you build from source at least once if you’re serious about TensorFlow. The next command might take some time, so go grab a coffee.

docker build --pull -t $USER/tensorflow-serving-devel -f ./tensorflow_serving/tools/docker/Dockerfile.devel .

You will know everything is working if your output looks something like the following:

Removing intermediate container 2d49b9326244
Successfully built 0e4f25623cd1

Let’s go ahead and run that container:

docker run --name=tensorflow_container -it $USER/tensorflow-serving-devel

It’s now time to clone TensorFlow Serving inside our container so we can ensure that everything is running as expected.

git clone --recursive https://github.com/tensorflow/serving

cd serving/tensorflow

./configure

At this stage you will have to complete a brief setup for TensorFlow. For the purposes of this tutorial there’s no need to build GPU support, and the default values should be fine.

cd ..

bazel build -c opt tensorflow_serving/...

The build will take some time to complete. Once it has finished, we’re ready to test it out by running the model server.

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server

Your output should be something similar to:

Usage: model_server [--port=8500] [--enable_batching] [--model_name=my_name] --model_base_path=/path/to/export

Inception Model

Now we can go ahead and export the Inception model, ready for deployment.

curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz

tar xzf inception-v3-2016-03-01.tar.gz

bazel-bin/tensorflow_serving/example/inception_export --checkpoint_dir=inception-v3 --export_dir=inception-export

At this stage you should see a notification that the model was successfully loaded and exported:

Successfully loaded model from inception-v3/model.ckpt-157585 at step=157585.
Successfully exported model to inception-export

If you want to export your own model, you’ll need to make use of the TensorFlow Serving Exporter module. A short tutorial can be found here. If you exported your own model rather than the Inception example, you can still follow the rest of this tutorial by replacing the relevant names.
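For reference, here is a minimal sketch of what exporting your own model can look like, using the TF 1.x SavedModelBuilder API rather than the older Exporter module. The toy graph, the tensor names ('images', 'scores') and the export path below are placeholders for illustration, not the Inception example:

import tensorflow as tf

# Toy graph standing in for your own model: swap in your real inputs and outputs.
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, w) + b, name='y')

# The model server expects a numbered version subdirectory under --model_base_path.
export_dir = '/tmp/my-model-export/1'
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... train, or restore your variables from a checkpoint, here ...

    # Describe how clients will call the model: named input and output tensors.
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'images': x}, outputs={'scores': y})

    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature
        })

builder.save()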

Putting it all together

Time to run our Inception model and the gRPC server locally.

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log &

You now have a Docker container serving your model, ready to be consumed over gRPC.

Go ahead and try it out!

sudo apt-get -qq update
sudo apt-get install wget

wget https://upload.wikimedia.org/wikipedia/en/a/ac/Xiang_Xiang_panda.jpg

bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=./Xiang_Xiang_panda.jpg
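If you’re curious what the example client is doing under the hood, it’s roughly the following. This is a sketch based on the beta gRPC API used by the bundled inception_client.py around this time (newer releases use the plain grpc channel API, but the shape of the request is the same). The model name matches the --model_name flag above; the 'predict_images' signature and 'images' input key are taken from the example export and may differ for your own model:

from grpc.beta import implementations
import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# Connect to the model server started above.
channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Read the raw JPEG bytes and wrap them in a PredictRequest.
with open('./Xiang_Xiang_panda.jpg', 'rb') as f:
    image_data = f.read()

request = predict_pb2.PredictRequest()
request.model_spec.name = 'inception'  # must match --model_name
request.model_spec.signature_name = 'predict_images'
request.inputs['images'].CopyFrom(
    tf.contrib.util.make_tensor_proto(image_data, shape=[1]))

# The second argument is the deadline in seconds (see the note on timeouts below).
result = stub.Predict(request, 10.0)
print(result)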

If you run into any issues here, it’s more than likely down to a hard timeout set in the example client. For those on less-than-cutting-edge hardware, you can change this timeout in the client file. By default it is set to 10 seconds.

The relevant section of the client file is line 51 inside inception_client.py, located here: bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py.

result = stub.Predict(request, 10.0)  # the 10.0 is the deadline in seconds; increase it on slower hardware

If everything worked as expected you should see output from your model!

outputs {
  key: "classes"
  value {
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 5
      }
    }
    string_val: "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca"
    string_val: "indri, indris, Indri indri, Indri brevicaudatus"
    string_val: "lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens"
    string_val: "gibbon, Hylobates lar"
    string_val: "sloth bear, Melursus ursinus, Ursus ursinus"
  }
}
outputs {
  key: "scores"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 5
      }
    }
    float_val: 8.98223686218
    float_val: 5.39600038528
    float_val: 5.00718212128
    float_val: 2.93680524826
    float_val: 2.78477811813
  }
}

Moving forward

I hope that I’ve demonstrated some of the power of TensorFlow Serving. Most people I speak to are using TensorFlow in one way or another, and TensorFlow Serving provides an interface for managing production models that works seamlessly with it. There’s little justification for using one without the other.

If you like what you’ve seen so far, I suggest you check out the TensorFlow Serving documentation. Also have a look at the advanced example.

We’re always on the lookout for data scientists and talented engineers at Old Street Labs. You can find open roles here. We’re also happy to create roles for exceptional individuals; if this describes you, feel free to get in touch!

mark.mckenzie@oldstlabs.com
