How to replicate data across clouds with Zenko Orbit

The video below shows two demos of Zenko solving real-life issues faced by developers who need to replicate data across multiple clouds. Developers must support multiple storage options for their applications, and dealing with the complexity of multiple APIs is hard. Even for applications written without multi-cloud support, the egress costs of transferring large amounts of data across clouds can force choices that reduce options. Zenko is designed to empower developers, giving them the freedom to choose the best storage solution for their application while keeping control of where data is stored.

The first use case is that of a developer who prefers Amazon Glacier as the archive of choice but wants to use Google Cloud's machine-learning services. Some of the data needs to be available in Google Cloud for faster processing, but Glacier is slow and egress costs can be expensive. Zenko makes it easy to manage this multi-cloud scenario with a single policy. It also lets developers pick the most cost-effective combination of storage options: without Zenko, if data is stored in AWS but analyzed in Google Cloud, you would incur expensive charges every time data moves out of Amazon.
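
To see why this matters, consider a back-of-the-envelope estimate of the one-time cost of moving a data set out of one cloud for processing in another (the per-GB rate here is an illustrative assumption; actual pricing varies by vendor, tier, and volume):

    # Back-of-the-envelope egress cost for moving data between clouds.
    # The $0.09/GB internet-egress rate is an illustrative assumption.
    data_gb = 100_000            # e.g., 100 TB of archived data
    egress_per_gb = 0.09         # assumed $/GB out of the source cloud
    print(f"one-time transfer cost: ${data_gb * egress_per_gb:,.0f}")  # ~$9,000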

The second demo shows Zenko replicating data across three AWS regions. A known limitation of S3 is that replication is limited to two regions; replicating data to more than two regions increases data resiliency and security. For example, companies that need a point of presence in Asia, Europe, and the US West can keep data closer to the point of consumption. Companies that collect data and must comply with data-sovereignty regulations like GDPR face similar challenges. Zenko's replication augments AWS's basic capability.

Enjoy the demos and try them out for free on Zenko Orbit.

Achieving Ultimate Data Durability through Multi-Cloud

In our last blog post, we introduced the new Zenko Orbit portal. Orbit is designed to radically simplify the management of multi-cloud storage through easy point-and-click actions. Now it's time to look at the business potential and impact with a very interesting use case for multi-cloud storage.

So first, what is today's typical model of using cloud storage services? In most cases, applications are written to use a single cloud such as AWS S3, Microsoft Azure Blob Storage, or Google Cloud Storage. All of these clouds are intrinsically highly durable, with Service Level Agreements (SLAs) of up to an incredible "eleven 9's" (99.999999999%) data durability. That is an incredibly high number, but it is also important to understand that it still means data (objects) can be lost. This is a numbers game: the more objects you store, the greater the chance that you will lose some data even at this level of durability. The recourse offered by cloud vendors for SLA violations is to provide service credits for future use of their cloud.
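
To make the numbers game concrete, here is a quick sketch of the expected annual object loss at eleven 9's of durability at different scales (treating the durability figure as an annual per-object loss probability):

    # "Eleven 9's" durability still implies a nonzero expected loss at scale.
    durability = 0.99999999999           # 99.999999999%
    p_loss = 1 - durability              # ~1e-11 per object per year
    for n_objects in (1e6, 1e9, 1e12):
        print(f"{n_objects:.0e} objects -> {n_objects * p_loss:.0e} expected losses/year")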

Another key consideration is data availability. Durability means "my data is safe," whereas availability is about "can I get to my data." Typical availability SLAs are in the range of 99.0% to 99.95%, considerably lower than data durability. Over the course of the last few years, some smaller cloud storage services have disappeared entirely, and some of the bigger ones have suffered brief outages (minutes to several hours) of one or more of their regions, or even of the entire service. This has led to some very prominent and publicized outages of well-known applications and services we all know and love, including popular online video and entertainment services, ride-sharing, travel, and communications services, and even enterprise SaaS applications that many business customers depend on for their own operations.

Is Multi-Cloud Storage a Better Solution?

This problem made us ask the question: what can we do to improve both the durability AND the availability of our data in the cloud? What happens if, instead of storing data in just one cloud region or cloud service, we use multi-cloud replication to store two copies of the data?

In terms of durability, we now have two independent services, each with a durability of eleven 9's. By storing data across both clouds, we increase our data durability to "22 9's," which makes a data loss event statistically negligible. Furthermore, we can take advantage of immutability through object versioning in one or more of the cloud services for even greater protection. We also gain disaster recovery (D/R) protection, meaning the data is protected in the event of a total site disaster or loss. So in the end, this is essentially bulletproof data protection against most known events.
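
A quick sketch of where "22 9's" comes from, treating the per-object loss probabilities of the two services as independent (both copies must be lost for the object to be lost):

    import math

    # Two independent copies: the object is lost only if BOTH copies are lost.
    p_loss_one = 1e-11                   # eleven 9's durability per service
    p_loss_both = p_loss_one ** 2        # 1e-22
    print(f"combined durability: {-math.log10(p_loss_both):.0f} nines")  # 22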

In terms of data availability, what are the chances that two cloud regions in one service (for example, AWS US East and AWS US West) are unavailable at the same time? Stretching this further, what are the chances that two INDEPENDENT cloud services such as AWS S3 and Azure Blob Storage are unavailable at the same time? The probability is exceedingly small, which essentially ensures that the data will be available.
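
The same independence argument applies to availability. A quick sketch with illustrative SLA figures:

    # Chance that two independent services are down at the same moment.
    avail_a, avail_b = 0.999, 0.9995     # illustrative SLA availabilities
    p_both_down = (1 - avail_a) * (1 - avail_b)
    print(f"both down at once:     {p_both_down:.1e}")        # 5.0e-07
    print(f"combined availability: {1 - p_both_down:.5%}")    # 99.99995%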

What is the Cost of this Ultimate Data Protection?

We can likely agree that multi-cloud storage has positive benefits for data durability and availability, but as with all good things, it comes at a cost we need to investigate. The cost of storing in multiple clouds has a few components: storage capacity ($ per GB per month), bandwidth ($ per GB transferred), and transactions (the number of PUTs, GETs, and DELETEs).

Most (but not all) cloud storage services do not charge for data INGESTED (written) into the cloud, while data EGRESS (read) out of the cloud to the internet incurs fees. We therefore need to look at not only the cost of storing the data but also the cost of EGRESS to replicate it to the second cloud. Note that with Zenko running in the cloud, we get the first copy into the cloud without any bandwidth charges.

One way to look at the bandwidth charges is to create a simplified TCO model. To keep the numbers easy, we modeled 1 petabyte (admittedly a lot of data) stored in the cloud for a period of three years. We added the cost of storage, bandwidth, and the transaction fees charged by most cloud services, and then compared the cost of one copy stored in one cloud versus two copies in two clouds. Our first observation was that the cost of bandwidth is a relatively small percentage of the overall TCO:

[Infographic: multi-cloud TCO calculator]

View the full infographic

The results of course depend on the cloud vendors analyzed, since vendor charges vary and, most interestingly, some cloud vendors do NOT charge for egress out of their cloud. The model looks at a baseline in AWS S3 (a single copy in US East alone) and compares it to two replicated copies in the following services:

  • AWS US East and AWS US West (using S3 Cross Region Replication, which incurs egress charges across regions)
  • AWS US East and Azure Blob Storage Hot Tier (incurs Internet egress charges out of AWS S3)
  • Azure Hot Tier and AWS US East (this avoids bandwidth charges, since Azure Hot Tier does not have data egress bandwidth charges)
  • Azure Hot Tier and Wasabi (Azure Hot Tier with no egress charges to the newer Wasabi low-cost storage service)
  • Azure Hot Tier and Backblaze B2 (Azure Hot Tier with no egress charges to the well-known low-cost storage service from Backblaze)

Our second observation: it is indeed possible to store two replicated copies of data in two cloud services cost-effectively. In fact, it is even possible to intelligently construct a scenario where two replicated copies have a lower TCO than a single copy in the better-known major cloud services.
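
To make that comparison concrete, here is a minimal sketch of the simplified TCO model described above. All prices are illustrative assumptions rather than quoted vendor rates, and transaction fees are omitted for brevity:

    # Minimal sketch of the simplified 3-year TCO model (illustrative prices).
    GB = 1_000_000        # 1 PB expressed in GB
    MONTHS = 36           # 3-year horizon

    def tco(gb, storage_per_gb_month, egress_per_gb=0.0):
        """Storage rent over the horizon plus a one-time egress to replicate out."""
        return gb * storage_per_gb_month * MONTHS + gb * egress_per_gb

    # Baseline: a single copy at an assumed ~$0.023/GB/month.
    single = tco(GB, 0.023)

    # Two copies: the source cloud bills an assumed ~$0.09/GB to egress the
    # replica (waived when the source cloud does not charge for egress), and
    # the second cloud stores it at an assumed ~$0.018/GB/month.
    two_copies = tco(GB, 0.023, egress_per_gb=0.09) + tco(GB, 0.018)

    print(f"single copy : ${single:,.0f}")                        # $828,000
    print(f"two copies  : ${two_copies:,.0f}")                    # $1,566,000
    print(f"egress share: {GB * 0.09 / two_copies:.1%} of TCO")   # ~5.7%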

Zenko

Zenko and the Orbit platform simplify this new world of multi-cloud storage. As we've seen, leveraging multi-cloud brings dramatic data protection benefits, and solutions such as Zenko Orbit will provide ever simpler ways to take advantage of them.

View the full infographic

Zenko Roadmap: what's next for our open source multi-cloud data controller

Zenko Multi-Cloud Data Controller

Our Zenko Multi-Cloud Data Controller was launched as open source on July 11, 2017; you can read the full press release here. In a nutshell, it's a new solution for managing data both in public cloud services and in local storage.

With Zenko under an Apache 2.0 open source license, our goal is for developers to freely use the unified S3 API and cloud storage capabilities in new applications. This means it's free for use and distribution in your enterprise and embedded apps, edge devices, and any "next great thing" you can think of. Zenko provides your apps with access to the AWS S3 public cloud (supported now with the launch product), and later we'll support Microsoft Azure Blob Storage and Google Cloud Storage too (rollout details are below). The purpose is to make it as easy as possible for your apps to access any cloud, even those that do not natively support the AWS S3 API. Right now, Zenko can store Bucket data locally on your machine in Docker Volumes, optionally in-memory (useful for fast transient processing or for testing), and in our Scality RING object store for on-premises and "private cloud" style storage.
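
As a minimal sketch of what this looks like from an application's point of view, here is a standard boto3 client pointed at a local Zenko endpoint. The endpoint URL, credentials, bucket name, and location-constraint name are illustrative assumptions; substitute the values from your own deployment:

    import boto3

    # Point a standard S3 client at a local Zenko/S3 Server endpoint.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8000",       # assumed local endpoint
        aws_access_key_id="accessKey1",             # assumed demo credentials
        aws_secret_access_key="verySecretKey1",
    )

    # The bucket's location constraint selects the configured backend
    # (Docker volume, in-memory, Scality RING, or an AWS S3 region).
    s3.create_bucket(
        Bucket="demo-bucket",
        CreateBucketConfiguration={"LocationConstraint": "aws-us-east-1"},
    )
    s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello, multi-cloud")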

The first release of Zenko is based on our previously launched open source S3 Server Docker instance and uses Docker Swarm to manage deployment and orchestrate HA/failover across S3 Server containers. This is documented here on the Zenko.io website, and we are also working to provide more docs on configuring public cloud integration, with AWS now and the others soon.

Looking at the bigger picture, we'll also enhance Zenko in the coming months with new features that will make it even more capable. These include a new open source policy-based data management engine called Backbeat and a metadata search engine called Clueso. Backbeat is all about enabling movement and mobility of data from on-premises Buckets to cloud Buckets through asynchronous replication. Later this year we'll also provide Lifecycle management for auto-expiration and for transitioning (tiering) objects to the cloud. Clueso lets you search across clouds using the S3 metadata attributes you can already store with your objects.
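
As a hypothetical sketch of how such replication could be configured through the standard S3 replication API (the bucket names, role ARN, and destination below are illustrative assumptions; Backbeat would perform the asynchronous copying):

    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8000",       # assumed Zenko endpoint
        aws_access_key_id="accessKey1",
        aws_secret_access_key="verySecretKey1",
    )

    # Replication requires versioning on the source bucket.
    s3.put_bucket_versioning(
        Bucket="source-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Replicate every object (empty prefix) to the destination bucket.
    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/replication-role",
            "Rules": [{
                "Prefix": "",
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }],
        },
    )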

To help you plan your roadmaps with these new features in mind, here is the rollout plan for these new capabilities in the Zenko open source.

Zenko Rollout Plan

Our philosophy is to offer features early, providing access to them as soon as possible so we can get your feedback, comments, and contributions as a community project. With that background, Zenko and its features will roll out as follows.

Zenko open source features supported at July Launch

  • Unified S3 API
  • HA/failover across two S3 containers managed by Docker Swarm
  • AWS v4 & v2 authentication (with access keys stored in a credentials file)
  • Bucket location control for object data storage in:
    • Local storage / Docker volumes
    • In-memory (fast transient processing)
    • Scality RING
    • AWS S3 (any S3 region endpoint)

In the late July 2017 time frame we will publish the following new capabilities:

  • Bucket location control and data storage in Microsoft Azure Blob Storage
  • Backbeat for Zenko to Zenko Cross-Region Replication (CRR) with local storage

In the September 2017 time frame we are targeting to deliver:

  • Clueso engine for federated searches on S3 metadata attributes (independent of data location)
  • Bucket Lifecycle for object expiration
  • Backbeat for Zenko replication to the AWS S3 cloud (CRR)

And by the end of 2017:

  • Backbeat for Zenko replication to Microsoft Azure Blob Storage (CRR)
  • Bucket Lifecycle for tiering to AWS S3

If you don’t see what you need, let us know what other cool features we should plan for Zenko!

GitHub is the best place for contributions, user comments, and questions. Thank you!