File Handling

Downloading files

A file in the realm of the Vathos REST-API consists of two parts:

metadata such as size and type
the actual file contents, a blob

A list of downloadable files is obtained by listing their metadata, in pseudo code:

GET https://api.vathos.net/v1/files
Authorization: Bearer $TOKEN

Note that this and all following requests must carry a valid bearer token in the Authorization header, as explained in the authentication section. The response will be an array of metadata objects

[
  {
    "_id": "6304d06de6a02100127e2086",
    "blob": "3d342b67-af05-483f-a93b-db821bfc6360",
    "hash": "3a56227cd4b83d395ae83d550dba6e9358ca6a9924bb60fbf6dfec44869dd902",
    "device": null,
    "size": 6509368,
    "originalName": "image.png",
    "contentType": "image/png",
    "createdAt": "2022-08-23T13:04:45.343Z",
    "updatedAt": "2022-08-23T13:04:45.343Z",
    "__v": 0
  },
  {}
]

with the following properties:

_id: the metadata and blob ID
hash: SHA-2 hash of the file contents
device: ID of the edge device the file is synchronized with, null otherwise
size: file size in bytes
originalName: original name on the file system before upload
contentType: media (aka MIME) type

In this example, to actually download the blob associated with the first file metadata object, the file ID must be sent to a different API endpoint:

GET https://api.vathos.net/v1/blobs/6304d06de6a02100127e2086
Authorization: Bearer $TOKEN

Note that the file ID can be used with both paths, /files and /blobs: the former returns the metadata, the latter the actual file contents.
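The two-step download described above can be sketched with the Python standard library alone. The helper names (authorized_request, list_files, download_blob, sha256_matches) are hypothetical and not part of any Vathos client. The integrity check assumes that hash is the hex-encoded SHA-256 digest of the contents; the 64-character value in the example is consistent with that, but the text above only says SHA-2.

```python
import hashlib
import json
import shutil
import urllib.request

BASE_URL = "https://api.vathos.net/v1"


def authorized_request(path, token):
    """Build a GET request for `path` that carries the bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": "Bearer " + token},
    )


def list_files(token):
    """Return the array of file metadata objects from /files."""
    with urllib.request.urlopen(authorized_request("/files", token)) as response:
        return json.load(response)


def download_blob(file_id, token, target_path):
    """Stream the blob that belongs to `file_id` into `target_path`."""
    request = authorized_request("/blobs/" + file_id, token)
    with urllib.request.urlopen(request) as response, \
            open(target_path, "wb") as target:
        shutil.copyfileobj(response, target)


def sha256_matches(path, expected_hex):
    """Compare a local file against the `hash` property.

    Assumption: `hash` is the hex-encoded SHA-256 digest of the contents.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as source:
        # read in chunks of 1 MiB so large files do not have to fit in memory
        for chunk in iter(lambda: source.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

A typical sequence would call list_files(token), pick one metadata object, pass its _id and originalName to download_blob, and finally compare the result against its hash with sha256_matches.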

Uploading files

One or multiple files are uploaded via a multipart/form-data request against the /blobs endpoint. The anatomy of such a request is best explained by means of the following example, where two PNG images stored on the local disk are uploaded (e.g., to run some calibration task):

POST https://api.vathos.net/v1/blobs
Authorization: Bearer $TOKEN
Content-Type: multipart/form-data; boundary="random_separation_string"

--random_separation_string
Content-Disposition: form-data; name="files"; filename="img_000.png"
Content-Type: image/png

< img_000.png
--random_separation_string
Content-Disposition: form-data; name="files"; filename="img_001.png"
Content-Type: image/png

< img_001.png
--random_separation_string--

Apart from the header containing the token for authorization, a header specifying the content type to be multipart/form-data; boundary="random_separation_string" must be set. The Content-Type header must define the boundary string that separates one file from the rest of the body. It can be chosen arbitrarily as long as it does not occur in the actual contents of the request. Each segment/part of the body is enclosed by two instances of the boundary string, each prefixed with a double hyphen --. The end of the entire body is marked by yet another double hyphen appended to the final boundary. Each part starts off with a small header section of its own: as usual, Content-Type denotes the file’s media type (formerly known as MIME type). The Content-Disposition must be form-data, followed by an arbitrary name for the form part and the original name of the file on the source file system. The part header is followed by an empty line and the raw (encoded) bytes of the actual file content, here denoted by the input operator < and the file name.
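To make this anatomy concrete, the body can be assembled by hand. The following is a minimal sketch using only the standard library. The function name encode_multipart and its parameters are hypothetical, and the part name "files" is simply taken over from the example request above.

```python
import uuid


def encode_multipart(files, field_name="files", boundary=None):
    """Assemble a multipart/form-data body.

    `files` is an iterable of (filename, content_type, data) tuples with
    `data` given as bytes. Returns the value for the Content-Type header
    and the encoded body.
    """
    # a random boundary makes a clash with the file contents very unlikely
    boundary = boundary or uuid.uuid4().hex
    body = bytearray()
    for filename, content_type, data in files:
        # every part opens with the boundary prefixed by a double hyphen
        body += f"--{boundary}\r\n".encode()
        body += (f'Content-Disposition: form-data; name="{field_name}"; '
                 f'filename="{filename}"\r\n').encode()
        body += f"Content-Type: {content_type}\r\n".encode()
        # an empty line separates the part header from the file contents
        body += b"\r\n" + data + b"\r\n"
    # the final boundary carries an additional trailing double hyphen
    body += f"--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", bytes(body)
```

The returned header value and body would then be sent in a POST request against /blobs together with the Authorization header. In practice, an HTTP library usually performs this encoding.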

Assembling the above pseudo-code request in Python is straightforward with the help of the requests module:

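import requests

# `token` holds the bearer token (see the authentication section)
# the dictionary keys become the names of the form parts
files = {
  'img000': open('img_000.png', 'rb'),
  'img001': open('img_001.png', 'rb'),
}

upload_response = requests.post(
  'https://api.vathos.net/v1/blobs',
  files=files,
  headers={'Authorization': 'Bearer ' + token})
# raise an exception if the server reports an error
upload_response.raise_for_status()

uploaded_files = upload_response.json()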

Images

Images are at the heart of Vathos’ computer vision services, which is why they receive special treatment beyond the basic file metadata described above. A search is executed with a GET request against the /images endpoint:

GET https://api.vathos.net/v1/images
Authorization: Bearer $TOKEN

Since there are possibly tens of thousands of images, the search needs to be restricted further. First of all, by default, the maximum number of images returned by the server is 100. This value can be increased by setting the query parameter $limit. Care must be taken when forming the corresponding URL because $ is a reserved character in URLs and must be percent-encoded (as %24) as described in the HTTP primer:

GET https://api.vathos.net/v1/images?%24limit=200
Authorization: Bearer $TOKEN

Analogously, one can skip $n$ “pages” of 100 items by adding $skip=n to the query string (properly encoded, of course). Secondly, images can be sorted in reverse order of acquisition, putting the most recent ones at the beginning of the list, by setting the $sort parameter to

{
    "timestamp": -1
}

The timestamp is the precise acquisition time as UNIX epoch in milliseconds (see below). The sorting order is controlled by the sign of the value: 1 sorts in ascending, -1 in descending order. Images captured later than a given point in time can be obtained with the following query object:

{
    "timestamp": {
        "$gt": 1667474108673
    }
}

or with a request against the encoded URL:

GET https://api.vathos.net/v1/images?timestamp%5B%24gt%5D=1667474108673
Authorization: Bearer $TOKEN
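The encoding of such query objects does not have to be done by hand. The sketch below flattens a nested query object into bracket notation and percent-encodes it with urllib.parse.urlencode from the standard library. The helpers flatten_query and images_url are hypothetical names. It is also an assumption that $sort follows the same bracket convention as the timestamp[$gt] example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.vathos.net/v1"


def flatten_query(query, prefix=""):
    """Turn a nested query object into (key, value) pairs in bracket notation."""
    pairs = []
    for key, value in query.items():
        name = f"{prefix}[{key}]" if prefix else key
        if isinstance(value, dict):
            # descend into nested objects such as {"$gt": ...}
            pairs.extend(flatten_query(value, name))
        else:
            pairs.append((name, value))
    return pairs


def images_url(query):
    """Build the percent-encoded URL for a search against /images."""
    return f"{BASE_URL}/images?{urlencode(flatten_query(query))}"


limit_url = images_url({"$limit": 200})
newer_url = images_url({"timestamp": {"$gt": 1667474108673}})
# assumption: $sort is encoded like any other nested query object
sorted_url = images_url({"$sort": {"timestamp": -1}, "$limit": 200})
```

Here, limit_url and newer_url reproduce the two encoded URLs shown above. Several restrictions can be combined by putting them into the same query object.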

As usual, a request

GET https://api.vathos.net/v1/images/6363a2e27c281d2c8041d3c0
Authorization: Bearer $TOKEN

yields a single image, here with ID 6363a2e27c281d2c8041d3c0:

{
    "size": {
      "width": 80,
      "height": 60,
      "channels": 1
    },
    "type": "uint16",
    "session": "foo",
    "_id": "6363a2e27c281d2c8041d3c0",
    "timestamp": 1667474146718,
    "device": "6252940d1344d300191bbc80",
    "file": "6363a2e39cf66000199e3d11"
}

Only the most relevant properties are shown in the above response:

size: lateral image size in pixels and number of channels
type: data type or precision of a single channel
session: user-defined string under which images can be grouped
timestamp: acquisition time as UNIX epoch in ms
device: ID of the device with which the image was captured
file: reference to the underlying file
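Since timestamps are given in milliseconds rather than seconds, a conversion is needed when working with Python's datetime objects, for instance to build a $gt query from a calendar date. A minimal sketch follows; the helper names from_epoch_ms and to_epoch_ms are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(milliseconds):
    """Convert a UNIX epoch in milliseconds into a timezone-aware datetime."""
    # integer arithmetic avoids rounding errors of float seconds
    return EPOCH + timedelta(milliseconds=milliseconds)


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime into a UNIX epoch in milliseconds."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# timestamp of the image in the example response above
acquired = from_epoch_ms(1667474146718)
```

Note that to_epoch_ms expects a timezone-aware datetime; passing a naive one raises a TypeError.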

Further properties of the image schema are explained in the API documentation. Note that the properties session and device in particular can be used to limit the number of results when searching for images as described above.

Image compression

So far images from a range of different sensing modalities are supported:

“natural” images (color/gray-scale)
infrared images
depth images

When captured with an edge device, all of these are stored in a compressed PNG format. Color and gray-scale images consist of one or multiple channels of 8 bits of precision, which is indicated by the value of the type parameter in the image metadata object. They can be downloaded and opened like any other PNG file. Infrared and depth images have higher precision and make an additional decompression step necessary.

Unsigned short precision

In an image of type uint16 (e.g., from an infrared sensor), each pixel consists of 2 bytes. The least significant byte (LSB) is packed into the red channel of the PNG-compressed RGB image, the most significant byte (MSB) into the green channel. These two channels in the output of the PNG decoder must be recombined into the original uint16 image as follows:

from imageio import imread
raw_rgb_image = imread('image.png').astype('uint16')
uint16_image = raw_rgb_image[:, :, 0] + 256 * raw_rgb_image[:, :, 1]

To avoid numeric overflow, make sure to cast the raw image to uint16 before adding the channels.
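The packing scheme can be checked without a PNG file by running it in both directions on a small synthetic array. The following is a sketch, not the implementation used on the edge device. The function names are hypothetical, and it is an assumption that the unused blue channel is set to zero.

```python
import numpy as np


def pack_uint16(image):
    """Pack an (H, W) uint16 image into an (H, W, 3) uint8 RGB array."""
    image = np.asarray(image, dtype=np.uint16)
    rgb = np.zeros(image.shape + (3,), dtype=np.uint8)
    rgb[:, :, 0] = image & 0xFF  # least significant byte -> red
    rgb[:, :, 1] = image >> 8    # most significant byte -> green
    # assumption: the blue channel stays unused (zero)
    return rgb


def unpack_uint16(rgb):
    """Recombine the red and green channels into a uint16 image."""
    # cast first so that the multiplication cannot overflow 8 bits
    wide = np.asarray(rgb).astype(np.uint16)
    return wide[:, :, 0] + 256 * wide[:, :, 1]


original = np.array([[0, 1, 255], [256, 4660, 65535]], dtype=np.uint16)
restored = unpack_uint16(pack_uint16(original))
```

The array restored should equal original element by element, including the extreme values 0 and 65535.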

Floating point precision

Point clouds are compressed by omitting the $x$- and $y$-coordinates, packing the depth, which typically has single-float precision, into the four channels of an RGBA image, and storing the latter in PNG format. To recover the point cloud from the PNG, two steps are necessary:

Unpacking the channels of the RGBA image and converting them into an image of type float32:

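import numpy as np
from imageio import imread

# decoded PNG: array of shape (height, width, 4) and type uint8
raw_rgba_image = imread('image.png')
# reinterpret the 4 bytes of each pixel as one float32 value
depth_image = np.reshape(np.frombuffer(raw_rgba_image.flatten().tobytes(), dtype='f'),
                         raw_rgba_image.shape[:2])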

Re-projection, i.e., inverting the projection described by the camera matrix K:

import numpy as np
K = np.reshape(
    np.array([1779.80041, 0, 0, 0, 1782.1018, 0, 986.39929, 597.44458, 1]),
    (3, 3), 'F')
size = depth_image.shape[::-1]
u, v = np.meshgrid(np.arange(0, size[0]), np.arange(0, size[1]))
xyz_image = np.zeros(depth_image.shape + (3,))
xyz_image[:,:,0] = depth_image/K[0,0]*(u - K[0,2])
xyz_image[:,:,1] = depth_image/K[1,1]*(v - K[1,2])
xyz_image[:,:,2] = depth_image
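Written out, the assignments above invert the pinhole projection for each pixel $(u, v)$ with depth $z$. The symbols $f_x$, $f_y$, $c_x$, $c_y$ are names introduced here for the entries K[0,0], K[1,1], K[0,2], and K[1,2], respectively:

$$
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
x = \frac{(u - c_x)\, z}{f_x},
\qquad
y = \frac{(v - c_y)\, z}{f_y}
$$

With the example values, $f_x = 1779.80041$, $f_y = 1782.1018$, $c_x = 986.39929$, and $c_y = 597.44458$, so a pixel located exactly at $(c_x, c_y)$ is mapped to $x = y = 0$ regardless of its depth.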

To be consistent with the case of (un)packing depth images with unsigned-short precision, depths are by convention stored in millimeters. An implementation of all relevant packing and unpacking algorithms is available in the open-source Vathos client library for Python.
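For illustration only, the float packing can be run in both directions on a synthetic depth map. This is a sketch, not the code of the client library. The function names are hypothetical, and it is an assumption that the four bytes of each value appear in the R, G, B, A channels in the machine's native byte order, as the unpacking code above implies.

```python
import numpy as np


def pack_depth(depth):
    """Pack an (H, W) float32 depth map into an (H, W, 4) uint8 RGBA array."""
    depth = np.ascontiguousarray(depth, dtype=np.float32)
    # reinterpret each 4-byte float as four consecutive 8-bit channels
    return depth.view(np.uint8).reshape(depth.shape + (4,))


def unpack_depth(rgba):
    """Recover the float32 depth map from the four 8-bit channels."""
    rgba = np.ascontiguousarray(rgba, dtype=np.uint8)
    flat = np.frombuffer(rgba.tobytes(), dtype=np.float32)
    return flat.reshape(rgba.shape[:2])


# depths in millimeters, following the convention stated above
depth_mm = np.array([[0.0, 1.5], [812.25, 1234.5678]], dtype=np.float32)
restored_mm = unpack_depth(pack_depth(depth_mm))
```

Because the bytes are only reinterpreted and never rounded, restored_mm should match depth_mm exactly.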