Exciting Directions for Mobile ML
Focusing on the Developer Experience
Welcome to NatML Weekly! This is the very first post of the series, going out every Wednesday at 11am ET. The goal of this newsletter is to give you an eclectic view of the latest in interactive media, machine learning, and mobile development.
Today, we’ll be doing things a bit differently: we’ll be imagining the future. Specifically, we will be exploring what applied ML on mobile might look like for developers months to years from now, then working backwards to the present day. Let’s begin:
Machine Learning for All
Coming from a computer vision research background, I’ve been keeping up with the latest from research labs across the country and around the world. Every day, hundreds of research papers are uploaded to arXiv (pronounced “archive”), a very popular open repository of research. Many of these are deep learning papers, often accompanied by open-source code and pre-trained models. As a result, there is a near-abundant supply of models for the most common ML use cases today: face detection, background removal, and so on.
But there is a massive lag in applying these models in industry, especially in interactive media applications. In a perfect world, all of these models would be available for developers to integrate and deploy very quickly. And because media developers often produce some of the most creative uses of technology, we would expect to see entirely novel applications of these ML models across apps, games, AR/VR, and more.
NatML and NatML Hub are designed with this in mind. NatML standardizes how ML models are used in code, exposing primitives called “Predictors” that perform model pre- and post-processing for you, returning familiar data types:
```csharp
// Fetch the MediaPipe BlazeFace model data from Hub
var modelData = await MLModelData.FromHub("@natsuite/blazeface");
// Deserialize the model
var model = modelData.Deserialize();
// Create a BlazeFace predictor
var predictor = new BlazePredictor(model);
// Detect faces in an image
Texture2D image = ...;
Rect[] faces = predictor.Predict(image);
```
NatML Hub, in turn, provides a marketplace to find or publish predictors for different tasks.
Any Model, Anywhere
Some ML models are very (very) heavy, so much so that it’s practically impossible to run them on today’s mobile devices. Top of mind is a certain class of image style transfer models, like StyleGAN.
Mobile developers shouldn’t be prevented from using these models just because mobile devices can’t run them. In a perfect world, a developer should be able to run ML models irrespective of where the models actually run.
NatML Hub will soon offer the ability to make server-side ML predictions, with an almost-identical API for developers:
```csharp
// Fetch the StyleGAN model data from Hub
var modelData = await MLModelData.FromHub("@natsuite/style-gan");
// Create the Hub model
// This is actually being created server-side
var model = modelData.Deserialize();
// Create the StyleGAN Hub predictor
var predictor = new StyleGANHubPredictor(model);
// Run style transfer
Texture2D content = ..., style = ...;
Texture2D result = await predictor.Predict(content, style);
```
Different Environments, Same Code
The final and most exciting area of development for mobile ML has to do with numerical computing. Almost every model you have come across has been developed with one of the popular frameworks used by researchers in the field: PyTorch, TensorFlow, or NumPy. Each of these frameworks has a very elaborate API for performing math and tensor operations.
When bringing ML models to mobile platforms—whether using TensorFlow Lite, NatML, or anything else—developers often have to write very low-level loops and indexing code to convert the model’s output data into familiar types. It’s often much better to think in terms of higher-level tensor ops; in fact, these high-level tensor ops are how the models are used in their original deep learning frameworks:
```python
import torch
import torchvision

# Load the pre-trained MobileNet v3 model
model = torchvision.models.mobilenet_v3_small(pretrained=True).eval()
# Perform inference
input = torch.randn(1, 3, 224, 224)
logits = model(input)
# Get the output label
result = logits.argmax(dim=1).item()
labels = ["cat", "dog", ...]
result_label = labels[result]
```
In this example, our goal is to find the index with the highest probability value (a.k.a. `argmax`). But because these math operations don’t exist in mobile machine learning runtimes, developers have to resort to writing very complex code.
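To see what that manual work looks like, here is a minimal sketch in Python (for illustration only; on mobile this same logic would be written in C#, Kotlin, or Swift against a raw output buffer). The model’s `(1, 1000)` logits arrive as a flat float array, so even a simple `argmax` along dimension 1 requires hand-written index arithmetic. The `argmax_dim1` helper is hypothetical, not part of any runtime:

```python
def argmax_dim1(buffer, rows, cols):
    """Compute argmax along dim 1 of a (rows, cols) tensor
    stored as a flat row-major buffer, by hand."""
    results = []
    for r in range(rows):
        best_col = 0
        best_val = buffer[r * cols]
        for c in range(1, cols):
            v = buffer[r * cols + c]
            if v > best_val:
                best_val = v
                best_col = c
        results.append(best_col)
    return results

# A toy (2, 3) logits buffer: row 0 peaks at column 1, row 1 at column 0
print(argmax_dim1([0.1, 2.0, 0.5, 3.0, 1.0, 0.2], rows=2, cols=3))  # [1, 0]
```

Multiply this by every reshape, transpose, or softmax a model’s post-processing needs, and the code quickly becomes hard to write and harder to maintain.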
NatML will be adding tensor and math operations that mirror the PyTorch Tensor API:
```csharp
// Create an array feature
var logits = new MLArrayFeature<float>(...); // shape: (1, 1000)
// Perform argmax
var result = logits.ArgMax(dim: 1).Item;
var resultLabel = labels[result];
```
With a full-featured tensor API, ML engineers bringing their models to mobile apps can copy their raw PyTorch code into C# almost verbatim, then refactor a few methods to follow C# naming conventions. The tensor API is still very early, but it will be fleshed out in upcoming releases.
As you can probably tell, these predictions for the future of mobile ML are what drive the design decisions for NatML and Hub today. In our next post, we’ll be taking a step back from the nitty-gritty details and focusing on interesting models that have caught our eye, and how they might be applied in creative prototypes. If you’d like us to write about something or have any thoughts, make sure to comment on this post or reach out over email.