Happy hump day! Today, we'll be discussing exciting developments coming soon to NatDevice. Many of these developments are designed specifically for building high-performance machine learning applications, just like Snapchat.
If you are unfamiliar, NatDevice is our cross-platform media device API, providing an extensive yet simple interface for working with device cameras and microphones. With the introduction of NatML, it became clear that seamless integration between the two would let app developers build computer vision apps in Unity very quickly. So let's jump right into some of these developments:
Device Outputs: Modular Computer Vision
Taking inspiration from NatCorder, our cross-platform video recording API, NatDevice will be introducing the concept of device outputs. A device output is a modular, lightweight primitive that receives raw pixel buffers or sample buffers from a camera or audio device (respectively), and transforms them into a different form:
// Create a device query to discover the rear camera
var query = new MediaDeviceQuery(MediaDeviceCriteria.RearCamera);
// Get the rear camera
var device = query.current as CameraDevice;
// Create a texture output to convert pixel buffers to textures
var textureOutput = new TextureOutput();
// Start the camera preview
device.StartRunning(textureOutput);
// Get the preview texture from the texture output
Texture2D previewTexture = await textureOutput;
This opens up a wide variety of processing capabilities, including image resizing, image encoding (JPEG, PNG), video recording (by connecting to NatCorder, for example), audio analysis (FFT and more), and so on. Better yet, all outputs are extremely modular; the only requirement is that they define an implicit conversion to an Action<CameraImage> or Action<AudioBuffer>:
// We can perform an FFT on the microphone data...
var device = ... as AudioDevice;
// Using an audio spectrum output...
var spectrumOutput = new AudioSpectrumOutput();
// But we can also do more with the audio data from the mic...
device.StartRunning(audioBuffer => {
    // Update the spectrum output
    spectrumOutput.Update(audioBuffer);
    // But do other stuff with the audio buffer
    DoOtherStuff(audioBuffer);
});
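Given the implicit-conversion requirement above, writing a custom output should be straightforward. Here is a minimal sketch of what one might look like; the `CameraImage` type and the exact contract are assumptions based on the description above, and the actual NatDevice API may differ:

```csharp
using System;

// Hypothetical custom output that simply counts received frames.
// `CameraImage` is assumed from the description above, not a confirmed API.
class FrameCounterOutput {

    public int frameCount { get; private set; }

    // Receive each pixel buffer from the camera device
    private void OnFrame (CameraImage image) => frameCount++;

    // The only requirement: an implicit conversion to Action<CameraImage>
    public static implicit operator Action<CameraImage> (FrameCounterOutput output) => output.OnFrame;
}
```

Because the output converts implicitly, it could then be passed straight to `device.StartRunning(...)` like the built-in outputs shown above.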
Camera, Lens, and Frame Metadata
NatDevice will also be extended to provide more metadata with each camera frame, including the camera intrinsic matrix, exposure duration, and any other information exposed by the native camera API. This information can be critical for certain computer vision and machine learning pipelines, like those that simulate a weak perspective camera for 3D bounding box estimation or perform 2D-to-3D deprojection.
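To see why the intrinsic matrix matters, consider 2D-to-3D deprojection: with the focal lengths and principal point from the intrinsics, plus a depth estimate, a pixel coordinate maps back to a point in camera space. This is standard pinhole-camera math rather than a NatDevice API, and all names here are illustrative:

```csharp
// Standard pinhole-camera deprojection: map a pixel (u, v) at depth z
// to a 3D point in camera space, given focal lengths (fx, fy) and
// principal point (cx, cy) taken from the camera intrinsic matrix.
static (float x, float y, float z) Deproject (
    float u, float v, float z,
    float fx, float fy, float cx, float cy
) => ((u - cx) * z / fx, (v - cy) * z / fy, z);
```

Without per-frame intrinsics from the native camera API, pipelines like 3D bounding box estimation have to guess these parameters, which degrades accuracy.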
Furthermore, the camera API will be redesigned to take advantage of zero-copy memory, giving the application access to the native camera preview buffer provided directly by the underlying camera API, with no processing applied. We will then provide device outputs that perform the post-processing currently done opaquely, like YUV-to-RGBA conversion and rotation. With this fully-transparent design, developers will have much more control over the performance of their vision pipelines.
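For context, the YUV-to-RGBA conversion mentioned above boils down to a per-pixel linear transform. A minimal sketch using the standard BT.601 coefficients (this is the textbook conversion, not NatDevice's actual implementation, and buffer layout concerns like NV12 interleaving are omitted):

```csharp
using System;

// Convert one YUV pixel to RGBA using BT.601 full-range coefficients.
static (byte r, byte g, byte b, byte a) YuvToRgba (byte y, byte u, byte v) {
    float Y = y, U = u - 128f, V = v - 128f;
    byte Clamp (float x) => (byte)Math.Clamp(x, 0f, 255f);
    return (
        Clamp(Y + 1.402f * V),                  // red
        Clamp(Y - 0.344f * U - 0.714f * V),     // green
        Clamp(Y + 1.772f * U),                  // blue
        255                                     // opaque alpha
    );
}
```

Moving this step into an explicit device output means an app that feeds YUV directly to an ML model can skip the conversion entirely.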
With the introduction of NatML, it has become quite clear to us that the future of computer vision on mobile is modular, high-performance, highly configurable, and highly shareable. With an abundance of ML models for doing everything from changing your hair color to detecting objects, the reign of classic computer vision algorithms, implemented with the likes of OpenCV, is quickly coming to an end.