Never Lose Data With Powerful Global Metadata Search

Written By Stefano Maffulli

On August 8, 2018

Storing data in multiple clouds without a global metadata search engine is like storing unlabeled wine bottles on random shelves: the wine may be safe, but you’ll never know which bottle is right for dinner. A single object storage system can easily become complex, and once you start uploading files to multiple clouds, things can turn into an inextricable mess where nobody knows what is stored where. The good thing about object storage is that objects are usually stored with metadata that describes them. For example, a video production company can mark a video file as “production ready”, or record the department that produced it, when the raw footage was shot, or the rock star featured in it. The tags we used to identify pictures of melons in our Machine Box example are metadata, too.

Zenko offers a way to search metadata on objects stored in any cloud: whether your files are in Azure, Google Cloud, Amazon, Wasabi, DigitalOcean or Scality RING, you’ll be able to find all the videos classified for production or all the images of watermelons.

The global metadata search capability is one of the core design principles of Zenko: one endpoint to control all your data, regardless of where it’s stored. The first implementation used Apache Spark, but the team found it wasn’t performing as expected and switched to MongoDB. Metadata searches can be performed from the command line or from the Orbit graphical user interface. Both use a common SQL-like syntax that drives a MongoDB query.

The Metadata Search feature extends the standard S3 GET Bucket API. Users run a metadata search by adding a custom Zenko query-string parameter, search, to the request. The value of search is structured as a pseudo-SQL WHERE clause and supports basic SQL operators, for example “A=1 AND B=2 OR C=3”. More complex queries can be built by nesting expressions in parentheses, “(” and “)”.
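Because the whole expression travels as the value of a single query-string parameter, it has to be percent-encoded; that is why the request in step 1 below carries key%3Dsearch-item rather than key=search-item. Here is a minimal Python sketch using only urllib.parse. The helper name search_request_target is hypothetical, and it deliberately leaves out request signing (the Authorization header), which a real client still needs:

```python
from urllib.parse import quote


def search_request_target(bucket: str, search: str) -> str:
    """Request target for a GET Bucket call carrying a Zenko search parameter."""
    # safe="" percent-encodes every reserved character, including "=", "/",
    # spaces and parentheses, so the whole expression arrives as one value.
    return f"/{quote(bucket, safe='')}?search={quote(search, safe='')}"


print(search_request_target("bucketname", "key=search-item"))
# /bucketname?search=key%3Dsearch-item
print(search_request_target("zenkobucket", "A=1 AND (B=2 OR C=3)"))
# /zenkobucket?search=A%3D1%20AND%20%28B%3D2%20OR%20C%3D3%29
```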

The search process is as follows:

1. Zenko receives a GET request containing a search parameter:

GET /bucketname?search=key%3Dsearch-item HTTP/1.1
Host: 127.0.0.1:8000
Date: Wed, 18 Oct 2018 17:50:00 GMT
Authorization: <authorization string>

2. CloudServer parses and validates the search string. If the string is invalid, CloudServer returns an InvalidArgument error; if it is valid, CloudServer generates an abstract syntax tree (AST) from it.

3. CloudServer passes the AST to the MongoDB backend as the query filter for retrieving the objects in the bucket that satisfy the search conditions.

4. CloudServer parses the filtered results and returns them as the response. Search results are structured the same as GET Bucket results:

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
   <Name>bucketname</Name>
   <Prefix/>
   <Marker/>
   <MaxKeys>1000</MaxKeys>
   <IsTruncated>false</IsTruncated>
   <Contents>
      <Key>objectKey</Key>
      <LastModified>2018-04-19T18:31:49.426Z</LastModified>
      <ETag>&quot;d41d8cd98f00b204e9800998ecf8427e&quot;</ETag>
      <Size>0</Size>
      <Owner>
         <ID>79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be</ID>
         <DisplayName>Bart</DisplayName>
      </Owner>
      <StorageClass>STANDARD</StorageClass>
   </Contents>
   <Contents>
     ...
   </Contents>
</ListBucketResult>
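Step 2 turns the search string into an abstract syntax tree. CloudServer's own parser isn't shown here, so as a rough illustration of what parsing, validating and building an AST involve, here is a small recursive-descent parser in Python. The grammar is an assumption based on the examples in this post: AND binds tighter than OR, parentheses nest, the comparison operators are =, !=, <, <=, >, >= and LIKE, values may be double-quoted, and field names containing dashes may be wrapped in backticks. The AST shape (nested tuples) and the InvalidArgument exception class are hypothetical stand-ins:

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)


class InvalidArgument(ValueError):
    """Hypothetical stand-in for the S3 InvalidArgument error."""


TOKEN_RE = re.compile(
    r'(?P<lparen>\()|(?P<rparen>\))'
    r'|(?P<op><=|>=|!=|=|<|>)'
    r'|(?P<quoted>"[^"]*")'        # "double-quoted value"
    r'|(?P<backtick>`[^`]*`)'      # `field-name` containing dashes
    r'|(?P<word>[^\s()=<>!"`]+)'   # bare field name, value or keyword
)
KEYWORDS = {"AND", "OR", "LIKE"}


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise InvalidArgument(f"unexpected character {text[pos]!r} at {pos}")
        kind = match.lastgroup
        value = match.group(kind)
        if kind in ("quoted", "backtick"):
            value = value[1:-1]  # strip the surrounding quote characters
        elif kind == "word" and value.upper() in KEYWORDS:
            kind, value = "keyword", value.upper()
        tokens.append((kind, value))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent: OR binds loosest, then AND, then comparisons."""

    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("end", "")

    def take(self) -> Token:
        token = self.peek()
        if token[0] == "end":
            raise InvalidArgument("unexpected end of search string")
        self.pos += 1
        return token

    def expr(self):  # expr := term ("OR" term)*
        node = self.term()
        while self.peek() == ("keyword", "OR"):
            self.take()
            node = ("OR", node, self.term())
        return node

    def term(self):  # term := factor ("AND" factor)*
        node = self.factor()
        while self.peek() == ("keyword", "AND"):
            self.take()
            node = ("AND", node, self.factor())
        return node

    def factor(self):  # factor := "(" expr ")" | field op value
        kind, field = self.take()
        if kind == "lparen":
            node = self.expr()
            if self.take()[0] != "rparen":
                raise InvalidArgument("expected ')'")
            return node
        if kind not in ("word", "backtick"):
            raise InvalidArgument(f"expected a field name, got {field!r}")
        op_kind, op = self.take()
        if not (op_kind == "op" or (op_kind, op) == ("keyword", "LIKE")):
            raise InvalidArgument(f"expected an operator after {field!r}")
        val_kind, value = self.take()
        if val_kind not in ("word", "quoted"):
            raise InvalidArgument(f"expected a value after {op!r}")
        return (op, field, value)


def parse_search(text: str):
    """Validate a search string and return its AST, or raise InvalidArgument."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.pos != len(parser.tokens):
        raise InvalidArgument(f"unexpected {parser.peek()[1]!r}")
    return tree
```

With standard SQL precedence, A=1 AND B=2 OR C=3 parses as (A=1 AND B=2) OR C=3, and anything the grammar rejects raises InvalidArgument, mirroring the error CloudServer returns for an invalid search string.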
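Step 3 hands the AST to MongoDB as a query filter. The post doesn't show how CloudServer performs that translation or how object metadata is laid out in its MongoDB collections, so the following Python sketch is only an assumption: it maps a tuple-based AST (leaves are (op, field, value), inner nodes are ("AND" | "OR", left, right)) onto MongoDB's standard $and, $or, $ne, $lt, $lte, $gt, $gte and $regex operators, passes field names through unchanged, and keeps every value as a string. The function name to_mongo_filter and the MONGO_OPS table are hypothetical:

```python
from typing import Any, Dict

# Hypothetical mapping from search operators to MongoDB query operators.
# "=" is handled separately: MongoDB's implicit equality is {field: value}.
MONGO_OPS = {"!=": "$ne", "<": "$lt", "<=": "$lte",
             ">": "$gt", ">=": "$gte", "LIKE": "$regex"}


def to_mongo_filter(node: tuple) -> Dict[str, Any]:
    """Translate an AST of nested tuples into a MongoDB filter document."""
    op = node[0]
    if op in ("AND", "OR"):
        # $and / $or take a list of sub-filters
        return {"$" + op.lower(): [to_mongo_filter(node[1]), to_mongo_filter(node[2])]}
    _, field, value = node
    if op == "=":
        return {field: value}
    if op in MONGO_OPS:
        return {field: {MONGO_OPS[op]: value}}
    raise ValueError(f"unsupported operator {op!r}")


query = ("OR", ("AND", ("=", "A", "1"), ("=", "B", "2")), ("=", "C", "3"))
print(to_mongo_filter(query))
# {'$or': [{'$and': [{'A': '1'}, {'B': '2'}]}, {'C': '3'}]}
```

With PyMongo, a dictionary like this is what you would pass to Collection.find(). MongoDB's $regex uses Perl-compatible regular expressions, which is consistent with the PCRE search syntax mentioned below; keep in mind that a $regex pattern is unanchored unless it starts with ^.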
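Because step 4 returns an ordinary ListBucketResult document, any S3-aware XML handling works on search results too. A minimal Python sketch with the standard library's xml.etree.ElementTree; the function name parse_search_results and the shape of the returned dictionary are our own choices, not part of Zenko:

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Every element in an S3 listing lives in this default namespace
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_search_results(body: bytes) -> Dict[str, Any]:
    """Extract the bucket name, truncation flag and object list."""
    root = ET.fromstring(body)
    objects = []
    for item in root.findall("s3:Contents", S3_NS):
        objects.append({
            "key": item.findtext("s3:Key", namespaces=S3_NS),
            "last_modified": item.findtext("s3:LastModified", namespaces=S3_NS),
            # the parser turns &quot; back into literal double quotes
            "etag": item.findtext("s3:ETag", namespaces=S3_NS),
            "size": int(item.findtext("s3:Size", default="0", namespaces=S3_NS)),
            "owner": item.findtext("s3:Owner/s3:DisplayName", namespaces=S3_NS),
        })
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    return {"bucket": root.findtext("s3:Name", namespaces=S3_NS),
            "truncated": truncated, "objects": objects}
```

Each path needs the s3: prefix because of the default namespace; if truncated is True, more objects match than this single response contains.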

You can perform metadata searches by entering a query in the Orbit Search tool or by running the search_bucket command-line tool. S3 Search is a Zenko extension to the AWS S3 API: it is MongoDB-native and expresses each search as a query encapsulated in a SQL WHERE predicate, using Perl-Compatible Regular Expression (PCRE) syntax for pattern matching. In the following examples, Zenko is accessible at the endpoint http://127.0.0.1:8000 and contains the bucket zenkobucket.

Search for objects whose x-amz-meta-color user metadata is “blue”:

$ node bin/search_bucket -a accessKey1 -k verySecretKey1 -b zenkobucket -q "x-amz-meta-color=blue" -h 127.0.0.1 -p 8000

Search for objects tagged with “type=color”:

$ node bin/search_bucket -a accessKey1 -k verySecretKey1 -b zenkobucket -q "tags.type=color" -h 127.0.0.1 -p 8000

Search for objects modified on March 23, 2018:

$ node bin/search_bucket -a accessKey1 -k verySecretKey1 -b zenkobucket -q '`last-modified` LIKE "2018-03-23.*"' -h 127.0.0.1 -p 8000
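Quoting is the easiest part of these commands to get wrong: inside double quotes, a POSIX shell treats backticks as command substitution and a nested double quote ends the string, so a query like the LIKE search belongs in single quotes. If you generate these commands from a script, Python's shlex module can do the quoting for you. A minimal sketch; the helper name search_bucket_argv is hypothetical, and its defaults are simply the sample credentials and endpoint used above:

```python
import shlex
from typing import List


def search_bucket_argv(bucket: str, query: str,
                       access_key: str = "accessKey1",
                       secret_key: str = "verySecretKey1",
                       host: str = "127.0.0.1", port: int = 8000) -> List[str]:
    """Argument vector for bin/search_bucket, one list item per argument."""
    return ["node", "bin/search_bucket", "-a", access_key, "-k", secret_key,
            "-b", bucket, "-q", query, "-h", host, "-p", str(port)]


# shlex.join (Python 3.8+) single-quotes any argument that needs it, so the
# backticks and double quotes in the LIKE query reach search_bucket untouched.
for query in ["x-amz-meta-color=blue", "tags.type=color",
              '`last-modified` LIKE "2018-03-23.*"']:
    print(shlex.join(search_bucket_argv("zenkobucket", query)))
```

Even simpler, passing the list straight to subprocess.run() without shell=True bypasses the shell and its quoting rules altogether.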

Zenko’s global metadata search capabilities play a fundamental role in guaranteeing your freedom to choose the best cloud storage solution while keeping control of your data.

Photo by Nick Karvounis on Unsplash