User Guide
Introduction
Vathos offers a computer vision application programming interface (API), i.e., a wealth of algorithms and data that clients can use in their industrial automation projects by making requests over the network. The system is hybrid in the sense that it runs on a combination of cloud and edge infrastructure. This lets customers tap into the vast resources of the cloud for computationally demanding workloads (e.g., data synthesis, training) while resting assured that, on the shop floor, inference is executed safely and efficiently. To make the most of its capabilities, it is important to understand the various components involved and the interactions between them. These are roughly outlined in the following architecture diagram:
Part of the API runs on servers in a public or private cloud, as shown in the upper right corner of the diagram. A typical client could be any piece of software capable of issuing HTTP requests, e.g., a mobile app or a web site such as our main user portal at https://www.vathos.vision. The client exchanges encrypted messages with the API over the internet. Such requests could, for example, initiate the training of a neural network on a previously uploaded data set and subsequently submit images for analysis (e.g., determining the class of objects present in the image and their locations/poses).
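Concretely, talking to the cloud API amounts to issuing authenticated HTTPS requests. The sketch below builds such a request with Python's standard library; the base URL, resource path, and token are illustrative placeholders, not the actual Vathos endpoints:

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # hypothetical base URL, not the real endpoint
TOKEN = "YOUR_ACCESS_TOKEN"              # placeholder credential

def build_training_request(dataset_id: str) -> urllib.request.Request:
    """Construct (but do not send) a request that would start a training job."""
    body = json.dumps({"dataset": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/trainings",  # assumed resource name for illustration
        data=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",  # token-based auth, typical for HTTP APIs
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Actually sending it would be:
#     urllib.request.urlopen(build_training_request("my-dataset"))
```

Any other HTTP-capable client (curl, a browser, a PLC gateway) follows the same pattern: an authenticated request to a resource URL with a JSON body.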
Since there are valid concerns about relying on cloud services in automated manufacturing systems, inference can also be run locally in the machine network. Here, the implementation of the API runs on an appliance (or “edge device”) which we provision for the customer. The client can now be a robot, a programmable logic controller (PLC), or even an enterprise IT system such as an MES or SCADA system. The advantages of restricting communication to the edge are threefold:
- Requests between clients on the shop floor (bottom left) and the edge device (in the middle of the top row) do not suffer from network latency or even a loss of connection to remote servers.
- Remote control from the cloud is intentionally prohibited. The effect of a request, in particular robot motions that may put human health or life at risk, must be supervised by the person initiating it.
- Requests are 100% secure, as man-in-the-middle attacks, i.e., attempts to steal confidential information on its way between client and server, are impossible.
The last ingredient is the imaging sensor, which can be a depth, monochrome, or even thermographic camera. It is connected directly to the edge device and exposes image data to the service installed thereon. This makes processing even faster than exchanging the data over the local network.
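For illustration, a shop-floor client such as a PLC gateway might receive a localization result from the edge device as a small JSON document over HTTP. The payload shape below is an assumption made for the sake of the example, not the documented schema:

```python
import json

# Hypothetical response from the edge device for one localized object;
# the field names and units are illustrative, not the actual API schema.
SAMPLE_RESPONSE = """
{
  "object_class": "flange",
  "score": 0.97,
  "pose": {
    "translation": [0.412, -0.087, 0.655],
    "rotation": [0.0, 0.0, 0.3826834, 0.9238795]
  }
}
"""

def parse_detection(payload: str) -> tuple[str, list[float], list[float]]:
    """Extract class label, translation (metres), and unit quaternion (x, y, z, w)."""
    doc = json.loads(payload)
    pose = doc["pose"]
    return doc["object_class"], pose["translation"], pose["rotation"]

label, translation, quaternion = parse_detection(SAMPLE_RESPONSE)
```

A real client would read the payload from an HTTP response body on the machine network; decoding it is the same.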
How to Read this Guide
- All services begin with a product and a camera to take images of it. In that sense, a good place to start studying this documentation is the section on managing products.
- Most available services are related in some way to localizing objects for picking with an (industrial) robot. Collision-free picking requires a geometrical model of the tool used for gripping. How to prepare such a model and supply it to the API is the subject of the section on grippers.
- What ties together a gripper and a localized product (i.e., an object) is one or multiple alternative grasps or grips. Since the API is mainly geared toward machine tending applications, where precise placement is required, we rely mainly on user input to define these unambiguously, as described in the section recommended for reading next.
- Make yourself acquainted with the variety of sensor hardware supported by the API. Which sensor is ideal in terms of price and performance for the automation problem at hand depends very much on the target object (particularly its geometry and reflectance), the environmental conditions (lighting, available space, etc.), and, last but not least, the process itself (e.g., cycle time).
- Now it is time to pick the right service for the (object localization/classification) problem at hand. Most algorithms we provide leverage machine learning in one way or another. This means there is a training phase before the vision system starts operating on the shop floor. Make sure to study the subsection on training for each service before moving on to the subsection giving a high-level introduction to running inference during operation. If you plan to pick localized items with a robot, you cannot avoid estimating the relationship between the camera and robot coordinate systems by means of hand-eye calibration.
- Section Edge API then makes running inference on your specific robot model (or other clients in a network on the shop floor) concrete. A description of available protocols will help you identify the right communication channel for your client device. Finally, you will find information on integrating our services with your specific robot brand.
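To make the role of hand-eye calibration mentioned above tangible: once calibration has produced a rigid transform from the camera frame to the robot base frame, every localized pick point must be mapped through that transform before the robot can move to it. The 4×4 matrix below is made-up example data, not a real calibration result:

```python
# Applying a hand-eye calibration result: a sketch with made-up numbers.
# T_BASE_CAM maps homogeneous points from camera coordinates to robot base
# coordinates; a real matrix would come from the calibration procedure.
T_BASE_CAM = [
    [0.0, -1.0, 0.0, 0.50],  # rotation part: 90 degrees about z (example only)
    [1.0,  0.0, 0.0, 0.10],
    [0.0,  0.0, 1.0, 0.75],  # translation part in the last column (metres)
    [0.0,  0.0, 0.0, 1.00],
]

def camera_to_base(p_cam: list[float]) -> list[float]:
    """Map a 3D point from the camera frame to the robot base frame."""
    x, y, z = p_cam
    ph = [x, y, z, 1.0]  # homogeneous coordinates
    return [sum(T_BASE_CAM[r][c] * ph[c] for c in range(4)) for r in range(3)]

# A pick point detected at (0.2, 0.3, 1.0) in camera coordinates:
p_base = camera_to_base([0.2, 0.3, 1.0])
```

The same multiplication applies to grasp orientations (via the rotation part of the matrix); only the point mapping is shown here for brevity.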
The expert wishing to integrate API functions into their own applications should have a solid understanding of the HTTP protocol and will likely benefit from studying the concepts section: