
How to serve ludwig models from a non python runtime? #6

Hi, great project! Thanks for open sourcing it!

There is a discussion on model serving here: https://github.com/uber/ludwig/issues/55#issuecomment-463415378. That thread also offers some insights into pre- and postprocessing. It sounds like both the SavedModel and the mapping between user input and tensors (train_set_metadata) are saved.

My question is whether train_set_metadata contains all of the preprocessing and postprocessing logic, with everything else saved inside the SavedModel. Specifically, how could one recreate the preprocessing and postprocessing steps for prediction from a non-Python runtime that TensorFlow supports (say, Java)?

Also, for model serving internally, is Uber exposing Ludwig models via Python with some sort of service layer on top of that?


Thanks for your interest Peter, I believe that between train_set_metadata.json and hyperparameters.json you have the full specification. In hyperparameters you should also have the preprocessing parameters that a hypothetical reimplementation in another language would need (for instance, the maximum number of words in a text feature), while in train_set_metadata you have mappings such as string-to-integer, used to turn a piece of text into the integer ndarrays that are then fed into the TensorFlow model itself.
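
For a JVM reimplementation, the very first step would just be reading those two files. Below is a minimal Java sketch of that idea, assuming Jackson for JSON parsing; the exact layout (for instance word_str2idx nested under the text feature's name in train_set_metadata.json) is an assumption that should be checked against a real Ludwig experiment directory:

```java
// Hypothetical sketch: load Ludwig's two JSON artifacts from a Java service.
// Assumes Jackson on the classpath; the key names below are assumptions to
// verify against an actual experiment directory.
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.util.Map;

public class LudwigArtifacts {
    public final Map<String, Object> trainSetMetadata;
    public final Map<String, Object> hyperparameters;

    @SuppressWarnings("unchecked")
    public LudwigArtifacts(String experimentDir) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        trainSetMetadata = mapper.readValue(
                new File(experimentDir, "train_set_metadata.json"), Map.class);
        hyperparameters = mapper.readValue(
                new File(experimentDir, "hyperparameters.json"), Map.class);
    }

    // e.g. the string-to-integer mapping for a text feature named featureName
    @SuppressWarnings("unchecked")
    public Map<String, Integer> wordStr2Idx(String featureName) {
        Map<String, Object> featureMeta =
                (Map<String, Object>) trainSetMetadata.get(featureName);
        return (Map<String, Integer>) featureMeta.get("word_str2idx");
    }
}
```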

Let me add that in future versions, also depending on the feedback from you users and the use cases you have, I may decide to organize things in a more convenient way, so feel free to suggest.

At Uber we have two strategies: one is using https://eng.uber.com/michelangelo-pyml/ and the other one is actually loading models with Java TensorFlow and using a Spark data transformation pipeline that is equivalent to the preprocessing Ludwig does. The first solution is more elegant and more generic (it also supports PyTorch models and any Python thing really), while the second one is more clunky and error prone, but enables increased scalability and a more resource-effective way to deal with high numbers of queries per second.

#3

Thanks for the detailed explanation. I will try to create a prototype handling Ludwig model serving with Java (using Java TF). For additional context: in our case, we would need to deploy models in Java services due to latency requirements, similar to your Spark deployment option.

while the second one is more clunky and error prone, but enables increased scalability and a more resource effective way to deal with high numbers of queries per second.

These tradeoffs resonated with me. We had similar experiences when we switched to serving tf models via java.

#4

I’m not an expert in the matter, but if latency is paramount, have you considered serving with C++? The amount of work for implementing the preprocessing and postprocessing should be the same in C++ and Java.

Anyway, let me know if anything is unclear. You can take a look directly at the data/preprocessing.py and data/postprocessing.py modules (and the different pre/postprocessing done by each feature module); it should be pretty easy to replicate.
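
On the postprocessing side, for a category output the bulk of the work is turning the predicted class index (or the argmax of the probability vector) back into a label via the idx2str list stored in train_set_metadata.json. A rough Java sketch of that idea, with the idx2str key name and the metadata layout as assumptions to double-check against the feature modules:

```java
// Hypothetical sketch of postprocessing a category output: map the predicted
// class index back to its label via the idx2str list from train_set_metadata.
// The "idx2str" key and the metadata structure are assumptions.
import java.util.List;
import java.util.Map;

public class CategoryPostprocessor {
    private final List<String> idx2str;

    @SuppressWarnings("unchecked")
    public CategoryPostprocessor(Map<String, Object> trainSetMetadata, String outputFeature) {
        Map<String, Object> featureMeta =
                (Map<String, Object>) trainSetMetadata.get(outputFeature);
        idx2str = (List<String>) featureMeta.get("idx2str");
    }

    // argmax over the model's probability vector, then look up the label
    public String decode(float[] probabilities) {
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return idx2str.get(best);
    }
}
```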

If by any chance you end up releasing it as open source, it would be really cool to have the first piece built on top of Ludwig! :)

I’m not an expert in the matter, but if latency is paramount, have you considered serving with C++? The amount of work for implementing the preprocessing and postprocessing should be the same in C++ and Java.

Good point on suggesting C++; sadly, however, our service environment only supports JVM runtimes, so I will start with Java for now. As you said though, a C++ implementation should be straightforward after mine.

Anyway, here is where I am right now:

  1. created a Text Classification Ludwig model (the first example). (I wanted to start with an example that has a string input, since that's usually mapped to multiple tensors internally and the preprocessing logic can be relatively complex.)

  2. restored the trained model from the checkpoint and inspected the TF graph. Identified the following placeholder as the relevant one:

(<tf.Tensor 'text/text_placeholder:0' shape=(?, ?) dtype=int32>,)

this informed me that the text is encoded as int32 tensors
  3. then identified the features that are being created for text features:

  4. then looked at train_set_metadata, and it seems like the logic I would need to implement is something like this:
  • split the input string across word boundaries
  • create a word_max_sequence_length int32 tensor by finding the max word sequence length in my Collection<String> or the runtime argument word_sequence_length_limit (whichever is smaller)
  • create a word_vocab_size tensor by parsing train_set_metadata.json, finding the word_idx2str array and returning its length
  • then, word by word, create two tensors word_str2idx and word_str2freq using train_set_metadata.

Three questions:

  1. Is the above accurate?
  2. What's the order of these tensors?
  3. Also, as for the char features, do I need to parse each char in a word and create char_str2idx and char_str2freq tensors too?

🙏

OK, just to update this thread: it seems the above is definitely the wrong track, since that code path is mainly about how to populate train_set_metadata, not about how to convert an input string into tensors. Right now I am investigating

where we seemingly pass the dataset dict we are building here:

but any pointer on how input values generally get mapped to tensors would be greatly appreciated.

#7

You are on the right track, I believe. What you should look at is the TextInputFeature class inside features/text_features.py. There you'll find both the function that creates the metadata mappings (word_str2idx etc.), called feature_meta (there's also get_feature_meta, but the only thing it does is set the values from the other function), and the function that uses this metadata to actually create the tensors, which is feature_data. At prediction time you don't need feature_meta, as you already have all those mappings inside the train_set_metadata.json file. So you just need to replicate the feature_data function that uses those mappings to obtain numpy ndarrays.
Let me know if you need further clarifications.
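
To make that concrete, here is a rough Java sketch of what replicating feature_data for a word-level text feature, and feeding the result into the SavedModel, might look like. The whitespace tokenizer, the "<UNK>"/"<PAD>" symbols, the padding behaviour, and the hypothetical outputTensorName parameter are all assumptions to verify against text_features.py; the placeholder name 'text/text_placeholder' is the one observed earlier in this thread.

```java
// Hypothetical end-to-end sketch: tokenize, map words to integer ids with
// word_str2idx, pad/truncate to a fixed length, and feed the resulting int32
// matrix into the SavedModel via the TensorFlow Java API.
// Tokenizer, "<UNK>"/"<PAD>" symbols, max-length handling, and the output
// tensor name are assumptions to check against Ludwig's text_features.py.
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

import java.util.List;
import java.util.Map;

public class TextFeaturePredictor {
    private final Map<String, Integer> wordStr2Idx;
    private final int maxSequenceLength;
    private final SavedModelBundle model;

    public TextFeaturePredictor(Map<String, Integer> wordStr2Idx,
                                int maxSequenceLength,
                                String savedModelDir) {
        this.wordStr2Idx = wordStr2Idx;
        this.maxSequenceLength = maxSequenceLength;
        this.model = SavedModelBundle.load(savedModelDir, "serve");
    }

    // Roughly what feature_data does for word-level text: ids padded to a fixed length.
    int[] encode(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+"); // assumed tokenizer
        int unk = wordStr2Idx.getOrDefault("<UNK>", 0);           // assumed symbol
        int pad = wordStr2Idx.getOrDefault("<PAD>", 0);           // assumed symbol
        int[] ids = new int[maxSequenceLength];
        for (int i = 0; i < maxSequenceLength; i++) {
            ids[i] = i < words.length ? wordStr2Idx.getOrDefault(words[i], unk) : pad;
        }
        return ids;
    }

    // Feed a batch of encoded rows into the placeholder observed in the graph.
    public float[][] predict(List<String> texts, String outputTensorName) {
        int[][] batch = new int[texts.size()][];
        for (int i = 0; i < texts.size(); i++) {
            batch[i] = encode(texts.get(i));
        }
        try (Tensor<?> input = Tensor.create(batch)) {
            try (Tensor<?> output = model.session().runner()
                    .feed("text/text_placeholder", input)
                    .fetch(outputTensorName) // e.g. the class probabilities tensor
                    .run().get(0)) {
                long[] shape = output.shape();
                float[][] result = new float[(int) shape[0]][(int) shape[1]];
                output.copyTo(result);
                return result;
            }
        }
    }
}
```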