Successful deployment of AI into edge devices can deliver many benefits, including lower latency, reduced network bandwidth requirements, and increased efficiency, privacy, and security. But there are also substantial challenges: data must be collected from heterogeneous sources and made available to users, the AI model must be deployed and then protected on the edge device, and the fleet of deployed devices needs to be managed.
Eclipse Kura was created to help tackle these challenges. And release 5.1.0, published in March 2022, brought a powerful new capability to users: a set of APIs for managing Inference Engines, the first implementation of which is based on the Nvidia Triton Inference Server. In this article, we’ll explain how this new feature and its implementation make Kura more powerful and useful.
Build Powerful Tools Using New Capabilities
A great example of how these APIs and their first implementation empower users is the ability to build complex but highly useful tools, like deep-learning anomaly detectors. A full explanation of the entire process is available in our documentation, so in this article we’ll just highlight how Kura’s new capabilities enable the process.
Anomalies, of course, are data samples or observations that differ so much from the average or expected values that they suggest some hidden or unknown mechanism is at work in your system. There are many applications of anomaly detection, from intrusion detection to quality monitoring. But the main reason anomaly detection is important is that anomalies in data translate to significant, even critical, actionable information. Naturally, there are various ways to detect anomalies. In this example, we’ll focus on using autoencoders.
Autoencoders are semi-supervised or self-supervised ML algorithms that are trained to encode data and then reconstruct it back to its original representation. They rely on two components: an Encoder and a Decoder. The Encoder maps the input to a lower-dimensional representation, while the Decoder maps the encoded data back to the input space. Because an autoencoder trained on normal data learns to reconstruct normal samples well, a high reconstruction error on a new sample is a strong hint that the sample is anomalous.
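To make that concrete, here is a minimal autoencoder sketch in TensorFlow/Keras. It is not the exact model from our documentation; the layer sizes, the eight-feature input, and the anomaly threshold are illustrative assumptions.

```python
# Minimal autoencoder sketch (layer sizes, the 8-feature input,
# and the threshold below are illustrative assumptions).
import numpy as np
from tensorflow.keras import layers, models

n_features = 8  # hypothetical number of sensor channels

# Encoder: maps the input to a lower-dimensional representation.
encoder = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(2, activation="relu"),
])

# Decoder: maps the encoded data back to the input space.
decoder = models.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(n_features, activation="linear"),
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# Train on normal (non-anomalous) data only.
x_train = np.random.rand(1000, n_features).astype("float32")  # placeholder data
autoencoder.fit(x_train, x_train, epochs=20, batch_size=32, verbose=0)

# Score new samples: a high reconstruction error suggests an anomaly.
x_new = np.random.rand(5, n_features).astype("float32")
reconstruction = autoencoder.predict(x_new, verbose=0)
errors = np.mean((x_new - reconstruction) ** 2, axis=1)
threshold = 0.05  # would be calibrated on held-out normal data
print(errors > threshold)
```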
The Triton Inference Server Standardizes Model Deployment
Many of the key functionalities necessary for autoencoder deployment were already available in Eclipse Kura. The platform can capture the needed field data, process it at the edge, and publish it to the cloud over MQTT.
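Kura’s cloud connection services handle that publishing step for you, but to make the data flow concrete, here is a minimal sketch of the equivalent using the plain Eclipse Paho MQTT client. This is not Kura’s API; the broker host, topic, and payload are illustrative assumptions.

```python
# Minimal MQTT publish sketch using Eclipse Paho (not Kura's API).
# Broker host, topic, and payload below are illustrative assumptions.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com", 1883)  # hypothetical broker

# A field-data sample as it might be collected on the gateway.
sample = {"sensor": "vibration-01", "value": 0.42, "ts": 1648000000}

# Publish the JSON-encoded sample to a hypothetical telemetry topic.
client.publish("telemetry/gateway-01/vibration", json.dumps(sample))
client.disconnect()
```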
In many cases, just deploying AI models is not enough: you also need to make inference fast and scalable. The ability to deploy your anomaly detector across your whole fleet of devices is crucial. This is where the Nvidia Triton Inference Server comes in.
The first step in using Triton to serve your models is exporting them in a compatible format and storing them in a model repository. The model repository is a file-system-based repository of the models that Triton will make available for inferencing. Each model may also require a model configuration: a Google Protocol Buffers (protobuf) definition that contains the runtime configuration information as well as the input/output shapes the model accepts.
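As a sketch, a repository holding our hypothetical autoencoder might be laid out like this; the model name, backend, and tensor shapes are illustrative assumptions carried over from the example above.

```
model_repository/
└── autoencoder/              # hypothetical model name
    ├── config.pbtxt          # model configuration (protobuf text format)
    └── 1/                    # version 1 of the model
        └── model.savedmodel/ # e.g., a TensorFlow SavedModel export
```

The accompanying config.pbtxt declares the backend and the input/output shapes:

```
# config.pbtxt (all values are illustrative)
name: "autoencoder"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 8 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 8 ]
  }
]
```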
Triton Inference Server simultaneously executes multiple models, from the same or different backends, on a single CPU or GPU, and it can also create multiple instances of a model in a multi-GPU environment. Using a model control API, Triton can serve tens or hundreds of models at once. These base capabilities are further enhanced by Triton’s flexibility: it’s available as a Docker container and can integrate with Kubernetes, Kubeflow, and KServe to provide everything from orchestration to end-to-end AI workflows. And because it exports Prometheus metrics, you can monitor indicators like GPU utilization.
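For example, Triton’s documentation shows launching the server against a model repository with a single Docker command; the version tag and host path here are illustrative. Port 8000 serves HTTP inference requests, 8001 serves gRPC, and 8002 exposes the Prometheus metrics endpoint.

```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.03-py3 \
  tritonserver --model-repository=/models
```

Once the server is up, a client can request inferences. Here is a sketch using Triton’s Python HTTP client, with the model and tensor names matching the hypothetical config.pbtxt above:

```python
# Query the served autoencoder via Triton's Python HTTP client
# (pip install "tritonclient[http]"); model and tensor names match
# the hypothetical config.pbtxt above.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One eight-feature sample, shaped [batch, features].
batch = np.random.rand(1, 8).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer("autoencoder", inputs=[infer_input])
reconstruction = result.as_numpy("OUTPUT0")

# The reconstruction error is the anomaly score from earlier.
print("error:", float(np.mean((batch - reconstruction) ** 2)))
```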
One way Triton can enhance inference performance while also simplifying model handling is through Ensemble Models. An ensemble represents a pipeline of models together with the connections between their input and output tensors, and it is generally used to encapsulate procedures that involve multiple models. The advantage of using Ensemble Models is that they avoid the overhead of transferring intermediate tensors and minimize the number of requests that must be sent to Triton.
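As an illustration, an ensemble that first runs a hypothetical “preprocess” model and feeds its output into the autoencoder could be declared like this; all model and tensor names here are assumptions.

```
# Ensemble config.pbtxt: chains a hypothetical "preprocess" model
# into the autoencoder (all names and shapes are illustrative).
name: "anomaly_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_DATA" data_type: TYPE_FP32 dims: [ 8 ] }
]
output [
  { name: "RECONSTRUCTION" data_type: TYPE_FP32 dims: [ 8 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_DATA" }
      output_map { key: "OUTPUT" value: "normalized" }
    },
    {
      model_name: "autoencoder"
      model_version: -1
      input_map { key: "INPUT0" value: "normalized" }
      output_map { key: "OUTPUT0" value: "RECONSTRUCTION" }
    }
  ]
}
```

The intermediate “normalized” tensor is passed between the two steps inside Triton rather than over the network, which is exactly the overhead savings described above.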
Standardized Deployment Enhances Kura’s Utility
These tools for standardizing deployment make Eclipse Kura that much more useful and comprehensive for AI edge deployment.
Eclipse Kura’s feature set goes way beyond AI. It includes the following key features, among others:
- API access to the hardware interfaces of IoT gateways, including serial ports, GPS, watchdog, and others
- Ready-to-use field protocols such as Modbus, OPC-UA, and S7
- An OSGi-based application container
- Web-based visual data flow programming (Kura Wires)
Combining enhanced model deployment with a comprehensive and extensible Java-based open source IoT Edge Framework makes Kura both more powerful and more straightforward to use in the field.
If you’re interested in checking out Eclipse Kura and maybe even getting involved, our website is a great place to start, as is the documentation. If you’d like to watch our EclipseCon 2022 talk where we explain how to use Eclipse Kura to build a deep-learning anomaly detector, you can find it here.