Cooking up the bright future of the multi-cloud data controller

Just like you want to keep control of your kitchen and of your ingredients while cooking, we believe that everybody needs to keep control of their data. Zenko was designed from the ground up to be the chef’s choice for those who create value with data. Like picking the best ingredients from the shelf, Zenko’s multi-cloud support for the S3, Azure and Google APIs makes it easy to prepare cloud applications with the best storage for the task.

We’re happy to share what’s in Zenko 1.0 and a brief view of what’s cooking for the rest of 2018.

Multi-Cloud Storage Backends

Zenko is the Esperanto of cloud storage APIs. At the moment, Zenko supports Amazon S3, Google Cloud Storage (GCS), Microsoft Azure, Scality S3Connector RING, Scality sproxyd RING, DigitalOcean Spaces and Wasabi Storage. Other S3-compatible backends are easy to add by extending the open source CloudServer component, with Backblaze support already under development.
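Because Zenko speaks the S3 API, any standard S3 client or SDK can be pointed at a CloudServer endpoint. Here is a minimal sketch with Python and boto3; the endpoint URL, credentials and bucket name are placeholders for this example, not real Zenko defaults.

```python
import boto3

# Point a standard S3 client at a Zenko CloudServer endpoint.
# Endpoint and credentials below are placeholders for illustration.
s3 = boto3.client(
    "s3",
    endpoint_url="http://zenko.example.com:8000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Any S3-compatible tooling works the same way: create a bucket, put an object.
s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello multi-cloud")
```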

One-to-Many Replication

One-to-many cross-region replication (CRR) is a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different clouds, private or public. This style of replication works particularly well for securing data in different locations, for example copying video files to the closest distribution networks or keeping multiple copies for backups. Our favorite CRR use case, of course, is using replication to protect against rising cloud egress costs. As others have painfully discovered, once multiple terabytes of data are stored in one cloud, moving away incurs a very high bill. By starting with multiple copies of the data, most of the control goes back to the user: switching from AWS to Azure won’t be as expensive or time-consuming if the same data is uploaded to both.
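For reference, this is roughly what a bucket-level replication configuration looks like through the standard S3 API, shown here with boto3 as a sketch of the API shape only. The bucket names and role ARN are placeholders, and Zenko-specific setup details (such as mapping the destination to a cloud location) may differ.

```python
import boto3

# Placeholder endpoint and credentials, as in the earlier sketch.
s3 = boto3.client("s3", endpoint_url="http://zenko.example.com:8000",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

# Standard S3 bucket-level replication configuration (illustrative values).
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",          # empty prefix: replicate all objects
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }
        ],
    },
)
```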

Zenko Orbit offers an easy UI to track the progress of all replication processes across each target cloud. Replication jobs can be paused when a target cloud is known to be down, and then resumed (manually or after a set time) when it’s back up. Replication jobs that fail are automatically retried until they succeed.

Aside from replication-related data such as status and throughput, Orbit also provides other useful statistics: total data managed, Zenko capacity, memory usage, the number of objects and object versions managed, and the amount of data managed across cloud destinations.

Local Cache (w/ Storage Limit)

Without a local cache, each object would be stored immediately in a bucket on a public cloud. In that case, any replication would incur egress costs because each object would have to travel from the first cloud destination to the new ones, like a daisy chain. With a local cache, objects are replicated straight to each destination, like a broadcast to the clouds.

Overview of Zenko’s replication engine.

Zenko’s local cache avoids those charges by queuing objects in a user-defined local cache before they’re uploaded to the target clouds. Users can set a local cache location on their preferred storage backend and set a storage limit for how large the cache can become. All objects are automatically cleared from the cache after the replication process completes.

Metadata Search

Objects are usually stored with metadata that describes the object itself. For a video production company, for example, metadata can include details like “production ready”, the department that produced the file, or the rockstar featured in a video.

With metadata search, users can search the metadata of objects written through Zenko: simply add tags and metadata to the object (in Orbit), write the object to the target cloud, and search for it later.
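For illustration, attaching metadata and tags through the standard S3 API looks like the sketch below (boto3). The key names and tag values are invented for the example; the search itself is then performed through Orbit as described above.

```python
import boto3

# Placeholder endpoint and credentials, as in the earlier sketch.
s3 = boto3.client("s3", endpoint_url="http://zenko.example.com:8000",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

# Write an object with user-defined metadata (example keys/values).
s3.put_object(
    Bucket="demo-bucket",
    Key="episode-42.mp4",
    Body=open("episode-42.mp4", "rb"),
    Metadata={"status": "production-ready", "department": "post-production"},
)

# Add a tag to the same object (example tag).
s3.put_object_tagging(
    Bucket="demo-bucket",
    Key="episode-42.mp4",
    Tagging={"TagSet": [{"Key": "artist", "Value": "rockstar-name"}]},
)
```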

Future versions will allow users to also ingest metadata for objects that are written natively to the clouds, so that Zenko users will be able to easily import existing catalogues.

Lifecycle: Expiration Policies

Reclaim storage space by applying lifecycle expiration policies (as specified in Amazon’s S3 API) to any bucket managed through Zenko. The policies can be applied to both versioned and non-versioned buckets, and are triggered after a user-defined number of days have passed since the object’s creation.
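As a sketch, a lifecycle expiration rule expressed through the standard S3 API (boto3) looks like the following; the bucket name, prefix and day count are placeholders for this example.

```python
import boto3

# Placeholder endpoint and credentials, as in the earlier sketch.
s3 = boto3.client("s3", endpoint_url="http://zenko.example.com:8000",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

# Expire objects under the "logs/" prefix 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="demo-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-objects",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```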

Zenko’s features give users the ability to personalize data workflows based on metadata, treating Zenko as an S3-compatible target that replicates data to remote servers in the cloud.

Future versions will add storage targets based on network filesystems like NFS and SMB. There are also plans to radically improve Zenko’s workflow management capabilities with a powerful API and a graphical UI. Get involved with Zenko development by following Zenko on GitHub, and try Zenko Orbit to test all of Zenko’s features.

A free library for Number Theoretic Transform-based Erasure Codes

Our team at Scality has been working for years on QuadIron, an open source library that implements Fast Fourier Transform (FFT) based erasure codes in binary, prime, and extension fields, along with a set of optimizations that make them practical for a very large number of symbols. These FFT-based erasure codes could be useful for applications like decentralized storage over the Internet. We’ll be presenting QuadIron at the SNIA Storage Developer Conference in Santa Clara, CA on Tuesday, September 25.

As drive density continues to increase in keeping with Moore’s Law, the price of storage continues to fall. This makes extra data copies cheaper: data can be stored reliably on relatively unreliable Internet servers simply by making extra copies. For example, generating and spreading hundreds of fragments of a file makes it possible to reconstruct the data while having only a fraction of the total fragments available. The QuadIron library provides a framework to generate such erasure codes efficiently.

Erasure codes are a form of error-correcting code designed for bit erasures: they transform a message into a longer message made of pieces, such that the original message can be recovered from a subset of those pieces.

A C(n,k) erasure code is defined by n = k + m, where k is the number of data fragments and m is the number of desired erasure fragments. An application transmits all n fragments. A Maximum Distance Separable (MDS) code guarantees that any k fragments can be used to decode the file. Erasure codes can be either systematic or non-systematic. Systematic codes generate n-k erasure fragments and therefore keep the k data fragments intact; non-systematic codes generate n erasure fragments. With systematic codes, we try to retrieve the k data fragments first, because then there is nothing to decode; decoding is necessary only if one or more data fragments are missing. With non-systematic codes, we always need to decode from k fragments. Erasure codes can also be compared by their sensitivity to the rate r = k/n, which may or may not impact encoding and decoding speed. Another comparison criterion is support for an adaptive rate: does the erasure code allow changing k and m dynamically, without having to regenerate the whole set of erasure fragments? Another critical property is confidentiality: can an attacker who obtains fewer than k fragments partially decode the data? Finally, we can also compare erasure codes by their repair bandwidth, i.e. the number of fragments required to repair a fragment. To sum up, here is the list of code properties that are of interest to us:

  • MDS/non-MDS
  • Systematic/Non-systematic
  • Encoding/Decoding speed according to various n
  • Encoding/decoding speed predictability and stability with respect to n
  • Rate sensitivity
  • Adaptive vs non-adaptive rate
  • Confidentiality
  • Repair bandwidth

Reed-Solomon (RS) codes are MDS codes constructed from Vandermonde or Cauchy matrices that support both the systematic and adaptive-rate properties. The RS encoding process is traditionally performed in ways that lead to high complexity. The topic of optimizing those codes has been widely discussed, but mostly around hardware optimizations.
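As a toy illustration of a C(n,k) MDS code, the following Python sketch encodes k data symbols as evaluations of a polynomial over a small prime field (a Vandermonde construction) and decodes from any k fragments by Lagrange interpolation. It is purely didactic and deliberately unoptimized; it is not how QuadIron or production RS codes are implemented.

```python
# Toy C(n, k) MDS erasure code over GF(p), p prime (didactic only).
p = 257  # small prime field for the example

def encode(data, n):
    """Treat the k data symbols as polynomial coefficients and evaluate the
    polynomial at x = 1..n (a Vandermonde construction). Returns n fragments."""
    return [(x, sum(c * pow(x, i, p) for i, c in enumerate(data)) % p)
            for x in range(1, n + 1)]

def decode(fragments, k):
    """Recover the k data symbols from any k fragments (x, y) by Lagrange
    interpolation over GF(p)."""
    xs, ys = zip(*fragments)
    coeffs = [0] * k
    for j in range(k):
        # Build the coefficients of the Lagrange basis polynomial L_j(x).
        basis, denom = [1], 1
        for m in range(k):
            if m == j:
                continue
            new = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):      # multiply basis by (x - xs[m])
                new[i] = (new[i] - xs[m] * b) % p
                new[i + 1] = (new[i + 1] + b) % p
            basis = new
            denom = denom * (xs[j] - xs[m]) % p
        scale = ys[j] * pow(denom, -1, p) % p  # ys[j] / prod(xs[j] - xs[m])
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % p
    return coeffs

data = [42, 7, 13]                      # k = 3 data symbols
frags = encode(data, n=6)               # n = 6 fragments, so m = 3 erasure fragments
assert decode(frags[3:6], k=3) == data  # any 3 of the 6 fragments decode (MDS)
```

The naive matrix/interpolation approach above costs O(n*k) per encode and O(k^2) per decode, which is exactly the complexity that FFT-based constructions discussed below aim to reduce.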

Low-Density Parity-Check (LDPC) codes are another important class of erasure codes, constructed over sparse parity-check matrices. Although initially used in networking applications, researchers have recently shown that it is possible to use them in distributed storage scenarios. Those codes still store n = k + m fragments (like MDS codes), but need to retrieve k*f fragments to recover the data (instead of only k for MDS codes), where f is called the overhead or inefficiency. In general f oscillates between 1.10 and 1.30 for n < 100, and approaches 1.0 when n > 1000. Those codes are more sensitive to network latency because of the extra fragments they need to retrieve due to the overhead, so in cases where latency matters, RS codes seem more interesting than LDPC codes. More recently, hybrid LDPC schemes have reduced the overhead to k + f with a very small f. It is also possible to design LDPC codes that beat RS codes when taking repair bandwidth into account, because RS codes always need to retrieve k fragments to repair the data, while LDPC codes can be designed to require fewer than k fragments for the repair process. However:

  • LDPC codes are not MDS: it is always possible to find a pattern (e.g. stopping sets) that cannot be decoded even with k fragments out of n available.
  • You can always find or design an LDPC code optimized for a few properties (i.e. tailored for a specific use case) that beats other codes on those properties, but there is no silver bullet: it will be sub-optimal for the other properties. It’s a trade-off, e.g. good for large n and with optimal repair bandwidth, but not good for small n and unable to support an adaptive rate. Such codes cannot be used in a generic library.
  • Designing a good LDPC code is something of a black art that requires a lot of fine tuning and experimentation. Ultimately, an LDPC code that is optimal for all the properties of interest in a given use case could exist, but it would be very complex and/or only available in a commercial library.

Recently, other types of codes, called Locally Repairable Codes (LRC), have tackled the repair bandwidth issue of RS codes. They combine multiple layers of RS: local codes and global codes. However, those codes are not MDS and they require a higher storage overhead than MDS codes.

Fast Fourier transform (FFT) based RS codes remain relatively simple and can be used to perform encoding over finite fields with clearer and lower complexities, therefore offering a good set of desirable properties. We focus our research on optimizing their encoding and decoding speed.

Since the chosen finite fields dominate the computational complexity of FFT operations, we investigate two types of finite field: prime fields and binary extension fields. For each type, there are different approaches to accelerating FFT operations.
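To make the FFT idea concrete, here is a minimal number theoretic transform (NTT) over the prime field GF(257), used to evaluate the data polynomial at n roots of unity in O(n log n), which is the core of a non-systematic FFT-based encode. The field, generator and sizes are chosen purely for illustration and are not QuadIron’s actual parameters.

```python
# Minimal radix-2 NTT over GF(257) (illustrative sketch, not QuadIron's code).
P = 257   # Fermat prime 2^8 + 1; its multiplicative group has order 256
G = 3     # 3 is a primitive root modulo 257

def ntt(a, invert=False):
    """Recursive radix-2 NTT of a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    w_n = pow(G, (P - 1) // n, P)      # primitive n-th root of unity in GF(257)
    if invert:
        w_n = pow(w_n, P - 2, P)       # inverse root for the inverse transform
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out, w = [0] * n, 1
    for i in range(n // 2):            # Cooley-Tukey butterflies
        t = w * odd[i] % P
        out[i] = (even[i] + t) % P
        out[i + n // 2] = (even[i] - t) % P
        w = w * w_n % P
    return out

def encode_fft(data, n):
    """Non-systematic FFT-based RS encode: pad the k coefficients to length n
    and evaluate the polynomial at the n-th roots of unity with one NTT."""
    return ntt(data + [0] * (n - len(data)))

fragments = encode_fft([42, 7, 13, 5], n=8)   # k = 4 data symbols, n = 8 fragments
# Sanity check: the inverse NTT (scaled by n^-1 mod P) recovers the padded data.
n_inv = pow(8, P - 2, P)
assert [x * n_inv % P for x in ntt(fragments, invert=True)] == [42, 7, 13, 5, 0, 0, 0, 0]
```

Because the n outputs are evaluations of a degree-(k-1) polynomial at distinct points, any k of them still determine the data (the MDS property); recovering from an arbitrary subset of k fragments requires an interpolation step, which the optimized decoders in the library handle far more efficiently than the toy Lagrange version shown earlier.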

These codes offer superior encoding performance compared to matrix-based RS erasure codes for applications requiring more than 24 symbols, and are simpler than LDPC codes, while supporting all the desirable properties: they are fast, MDS, systematic or non-systematic, confidential when non-systematic, predictable, and rate-insensitive. Systematicity is not critical for a decentralized storage application, because as k grows the chance of losing a data fragment also grows. Optimizing repair bandwidth is not critical for a decentralized storage archive application either, since people download files in their entirety. The most important property for us remains the MDS property, as a rock-solid contract: being sure that if k fragments are available, then the data is recoverable.

All of these properties make us think that the QuadIron library and NTT-based erasure codes are well suited for a decentralized storage archival application over the Internet.

The library is open source (BSD 3-clause license) and available on GitHub. Ask questions on the forum.

Photo by Ant Rozetsky on Unsplash.