TensorFlow Serving our retrained image classifier

Here we’ll export our previously trained dog and cat classifier and call it with local or remote image files to test it out. To do this, I’ll run TensorFlow Serving in a Docker container and use a Python client to call the serving host.

_Update 12th June, 2018: I used the gRPC interface here, but TensorFlow Serving now has a REST API that may be of more interest._
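
For anyone curious about the REST route mentioned in the update, here is a minimal sketch of what a classify request could look like. It only builds the URL and JSON body and does not send anything. It assumes a model server version that includes the REST API, started with --rest_api_port=8501 (the port is my assumption), and the default model name 'default'. The helper names build_classify_body and classify_url are hypothetical and not part of any library.

```python
import base64
import json


def build_classify_body(image_bytes, feature_name='image'):
    """Build a JSON body for a REST classify call (sketch, not sent anywhere).

    Binary feature values are wrapped as {"b64": "<base64 string>"}.
    """
    encoded = base64.b64encode(image_bytes).decode('ascii')
    body = {'examples': [{feature_name: {'b64': encoded}}]}
    return json.dumps(body)


def classify_url(host='localhost', port=8501, model_name='default'):
    """URL of the classify endpoint; host, port and model name are assumptions."""
    return 'http://{}:{}/v1/models/{}:classify'.format(host, port, model_name)


# Placeholder bytes stand in for real JPEG data here
body = build_classify_body(b'placeholder image bytes')
url = classify_url()
```

The body would then be sent as an HTTP POST to the URL with any HTTP client.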


  • TensorFlow serving client pip package:
    • python3: pip3 install -U tensorflow-serving-api-python3
    • python2: pip install -U tensorflow-serving-api
  • Docker

Exporting our trained model

The first step is to take the estimator we trained previously and export the model. The main thing we need to do here is define a serving_input_receiver_fn, which describes how calls to our model are deserialized and mapped to input features. First the code, then an explanation:

def serving_input_receiver_fn():
    # Features expected in each incoming serialized tf.Example
    feature_spec = {
        'image': tf.FixedLenFeature([], dtype=tf.string)
    }
    default_batch_size = 1
    # Entry point of the serving graph: a batch of serialized tf.Example strings
    serialized_tf_example = tf.placeholder(
        dtype=tf.string, shape=[default_batch_size])
    received_tensors = { 'images': serialized_tf_example }
    features = tf.parse_example(serialized_tf_example, feature_spec)
    # Decode and resize each image as in the training input pipeline
    fn = lambda image: _img_string_to_tensor(image, input_img_size)
    features['image'] = tf.map_fn(fn, features['image'], dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(features, received_tensors)

estimator.export_savedmodel('exports', serving_input_receiver_fn)

First we define the features in the incoming protobuf. These should match the feature dictionary you returned from your dataset when training. In our case, that’s just one tensor with the key ‘image’:

feature_spec = {
    'image': tf.FixedLenFeature([], dtype=tf.string)
}

Next we define a placeholder for this incoming tensor to be fed into. We aren’t using our dataset tensors now, so we must define this entry point at the start of our graph:

default_batch_size = 1
serialized_tf_example = tf.placeholder(
    dtype=tf.string, shape=[default_batch_size])

Next we parse the incoming example, then map the incoming image data so that it is decoded and resized as it was in our input data pipeline:

received_tensors = { 'images': serialized_tf_example }
features = tf.parse_example(serialized_tf_example, feature_spec)
fn = lambda image: _img_string_to_tensor(image, input_img_size)
features['image'] = tf.map_fn(fn, features['image'], dtype=tf.float32)

For reference, the mapping function as we defined previously is

def _img_string_to_tensor(image_string, image_size=(299, 299)):
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    # Convert from full range of uint8 to range [0,1] of float32.
    image_decoded_as_float = tf.image.convert_image_dtype(image_decoded, dtype=tf.float32)
    # Resize to expected
    image_resized = tf.image.resize_images(image_decoded_as_float, size=image_size)
    return image_resized

Finally we return the input receiver and pass our function to export_savedmodel as the entry point for serving. This saves our model and its variables into a timestamped subfolder of a local folder called ‘exports’.
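
Because each export lands in its own timestamped subfolder, it can be handy to look up the newest one before inspecting or copying it. The following is a small sketch using only the standard library. The function name latest_export_dir is hypothetical, and it assumes the subfolders are named with plain integer timestamps, as in the paths shown in this post.

```python
import os


def latest_export_dir(base_dir='exports'):
    """Return the path of the newest timestamped export under base_dir."""
    # Keep only subfolders whose names are plain integers (the timestamps)
    versions = [
        name for name in os.listdir(base_dir)
        if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))
    ]
    if not versions:
        raise FileNotFoundError('no exported models found in ' + base_dir)
    # The largest timestamp is the most recent export
    return os.path.join(base_dir, max(versions, key=int))
```

Calling latest_export_dir('exports') gives a path that can be passed to the --dir option of the CLI described next.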

Inspecting our saved model

TensorFlow comes with a CLI for inspecting and executing a saved model, available as saved_model_cli. Taking a look at what we’ve got:

$ saved_model_cli show --dir exports/1525524220

The given SavedModel contains the following tag-sets:
serve

$ saved_model_cli show --dir exports/1525524220 --tag_set serve

The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "classification"
SignatureDef key: "predict"
SignatureDef key: "regression"
SignatureDef key: "serving_default"

$ saved_model_cli show --dir exports/1525524220 --all

# A lot of output, but shows expected input shapes and outputs tensors.

This CLI can be very handy when building your client code, so that it calls the gRPC endpoints with the expected inputs. It can also be used to run the saved model, but I’ve gone straight to doing that with Python below.

Hosting our model

We’ll build a Docker image as our serving host. Below is the Dockerfile I used, which installs the TensorFlow model server. It also runs the step ADD exports /model to copy our exports into the container:

FROM ubuntu:16.04

# Install general packages
RUN apt-get update && apt-get install -y \
        curl \
        libcurl3-dev \
        unzip \
        wget \
        && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*

# New installation of tensorflow-model-server
RUN TEMP_DEB="$(mktemp)" \
        && wget -O "$TEMP_DEB" 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-1.5.0/t/tensorflow-model-server/tensorflow-model-server_1.5.0_all.deb' \
        && dpkg -i "$TEMP_DEB" \
        && rm -f "$TEMP_DEB"

ADD exports /model

ENTRYPOINT [ "tensorflow_model_server", "--port=8500", "--model_base_path=/model" ]

With this we build the image and then run a host that will be available at localhost:8500 with:

docker build -q -t damienpontifex/dogscats-serving .
docker run --rm -d -p 8500:8500 --name dogscats damienpontifex/dogscats-serving
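
Since the container is started detached, the server may need a moment before it accepts calls. One rough way to wait for it is to poll the port from Python, as in this sketch. The helper name wait_for_port is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host='localhost', port=8500, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Returns True if a connection was made, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as wait_for_port('localhost', 8500) before creating the client stub avoids failing on the very first request.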

Client calling our classifier

Calling our serving host requires gRPC, which I explained when I set up serving with a canned estimator. The main difference is that we are passing a local or remote image path, which we read in and send off as binary data to our server:

import os
import urllib.request
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2, model_pb2

def make_request(stub, file_path):
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'default'
    # Read the image bytes from a URL or from a local file
    if file_path.startswith('http'):
        data = urllib.request.urlopen(file_path).read()
    else:
        with open(file_path, 'rb') as f:
            data = f.read()
    # Wrap the raw bytes in a tf.Example under the 'image' key
    feature_dict = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[data]))
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
    serialized = example.SerializeToString()
    request.inputs['inputs'].CopyFrom(tf.contrib.util.make_tensor_proto(serialized, shape=[1]))
    # Call the server with a 10 second timeout
    result_future = stub.Predict.future(request, 10.0)
    prediction = result_future.result()
    predicted_classes = list(zip(prediction.outputs['classes'].string_val, prediction.outputs['scores'].float_val))
    # Highest score first
    predicted_classes = sorted(predicted_classes, key=lambda p: p[1], reverse=True)
    return predicted_classes

channel = implementations.insecure_channel('localhost', 8500)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

With this setup, we can make a couple of calls to our serving host, one with a local dog picture and one with a remote cat picture.

dog_path = os.path.expanduser('~/Downloads/Dog_CTA_Desktop_HeroImage.jpg')
output = make_request(stub, dog_path)
# [(b'dogs', 0.9998657703399658), (b'cats', 0.00013415749708656222)]

output = make_request(
# [(b'cats', 0.9999951124191284), (b'dogs', 4.922635525872465e-06)]

Each of these calls took about 175ms against localhost running on my MacBook Pro and returned a > 99% prediction for the correct class :)
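
To check latency figures like this on your own machine, a small timing wrapper is enough. This is a sketch rather than how I measured the numbers above, and the helper name time_call is hypothetical.

```python
import time


def time_call(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times; return (last_result, durations_ms)."""
    durations_ms = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        # perf_counter returns seconds, so convert to milliseconds
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return result, durations_ms


# Example use with the client above (needs a running serving host):
# output, timings = time_call(make_request, stub, dog_path)
```

Repeating the call a few times helps, since the first request can be slower than the rest.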