Many junior devops engineers have floated the pipe dream that you could simply point any application at any cloud storage service without ever touching the code. As it turns out, that’s not such a pie-in-the-sky idea. Zenko abstracts all major clouds behind a single namespace and a single API, namely the AWS S3 API, which removes the headache of supporting multiple APIs from the get-go.
It’s a common misconception that cloud storage APIs are similar enough that moving from one provider to another is just a matter of changing a hostname in a configuration file. That may have been mostly true in the early days of the cloud but, as you’ll see, it’s far from true now. Consider four areas where Google Cloud Storage (GCS) and Amazon S3 diverge:
- Multipart Upload or how to efficiently upload large pieces of data
- Object-level tagging or how to assign easily searchable metadata to objects
- Object versioning or protecting against accidental deletion and providing rollback to your users
- Replication or how to make sure there’s always a copy of your data somewhere else
| | Google Cloud Storage | Amazon S3 |
|---|---|---|
| Multipart upload | The application needs to implement the logic | The API tracks the pieces for you |
| Object-level tagging | Not available | Supported since November 2016 |
| Object versioning | A DELETE request without a version moves the object from “master” to “archive”; there is no concept of a “version stack” | A DELETE without a version specified applies a DELETE marker to the master; older versions remain retrievable even after the master is deleted |
| Replication | Data stored redundantly with Multi-Regional Storage, in a fixed manner | Flexible, dynamic control with the Cross-Region Replication API |
Though GCS does have a method for merging multiple objects into a single larger one, it lacks a counterpart to AWS’s popular multipart upload API. Here’s how multipart upload (MPU) works on S3:
- You initiate the upload by creating a multipart upload object
- You upload the object parts in parallel over multiple HTTP requests
- After you have uploaded all the parts, you complete the multipart upload.
- Upon receiving the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts.
In that model, S3 keeps track of all the uploaded parts of an MPU. For example, aborting an MPU removes all associated parts; S3 manages the state of your upload for you, and the object only appears in your bucket once the upload is complete.
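The steps above can be sketched with boto3. Everything below the client comment is an illustrative sketch: the bucket and key names are hypothetical, and every part except the last must be at least 5 MiB.

```python
# Sketch of an S3 multipart upload using a boto3 S3 client, e.g.:
#   import boto3
#   s3 = boto3.client("s3")
# Bucket/key names passed in are hypothetical.

def split_into_parts(data, part_size=5 * 1024 * 1024):
    """Split data into MPU parts; every part except the last must be >= 5 MiB."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def multipart_upload(s3, bucket, key, data):
    # 1. Initiate the upload: S3 returns an UploadId that tracks state server-side
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        # 2. Upload each part; S3 identifies them by PartNumber and ETag
        parts = []
        for n, chunk in enumerate(split_into_parts(data), start=1):
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                  PartNumber=n, Body=chunk)
            parts.append({"PartNumber": n, "ETag": resp["ETag"]})
        # 3. Complete: S3 assembles the final object from the parts
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Aborting discards all uploaded parts server-side
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Note how little bookkeeping the client does: S3 only needs the list of part numbers and ETags back at completion time.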
In GCS, you’re in charge of keeping track of each part and piecing them together, and you have to write the corresponding logic:
- You upload “parts” of your object as individual objects in a bucket
- You perform a compose operation on that list of objects, limited to 32 items per operation
- You repeat the compose operation in batches of 32 until the final object is stitched together.
This is clearly a cumbersome process. It’s possible to merge in parallel to stitch a large object together faster, but it’s not trivial and requires somewhat complex logic on the client side.
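The batching logic above has to live in your application. Here’s a rough sketch using the google-cloud-storage library; the bucket, object names, and temporary naming scheme are all hypothetical, and cleanup of the intermediate objects is omitted for brevity.

```python
# Sketch of client-side compose batching for a google-cloud-storage bucket, e.g.:
#   from google.cloud import storage
#   bucket = storage.Client().bucket("my-bucket")

MAX_COMPONENTS = 32  # GCS compose accepts at most 32 source objects per call

def batches(items, size=MAX_COMPONENTS):
    """Pure helper: split a list into consecutive chunks of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def compose_parts(bucket, part_names, final_name):
    """Repeatedly compose uploaded parts, 32 at a time, into `final_name`."""
    blobs = [bucket.blob(name) for name in part_names]
    round_num = 0
    while len(blobs) > 1:
        merged = []
        for i, batch in enumerate(batches(blobs)):
            # Hypothetical temporary name for each intermediate object
            dest = bucket.blob(f"{final_name}.compose-{round_num}-{i}")
            dest.compose(batch)  # server-side concatenation of up to 32 objects
            merged.append(dest)
        blobs = merged
        round_num += 1
    # Copy the single remaining blob to its final name
    bucket.copy_blob(blobs[0], bucket, final_name)
```

With 1,024 parts this takes two rounds of composition (32 intermediate objects, then one), all of which your application must orchestrate and clean up.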
Developers also need to keep in mind that GCS allows a maximum of 1,024 components, while S3 allows 10,000 parts; both share the same 5 TB maximum object size.
Update: on June 21, 2018, GCS removed the limit on the number of components in a composite object. Learn more on our forum.
Object tagging is a way to categorize data with multiple key-value pairs. It’s a useful way to locate data and is much more powerful than prefix-based searches on object names. You can think of object tags as similar to Gmail labels, as opposed to filesystem folders. Object tags can also influence S3 Lifecycle and cross-region replication policies. This API is relatively new to S3, but unfortunately it has no equivalent in GCS yet.
This functionality cannot be migrated from S3 to GCS, so check whether your application requires tagging.
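For applications that do rely on tagging, the S3 calls look like this. A boto3 client is assumed, and the bucket, key, and tag values are hypothetical.

```python
# Tagging an existing S3 object with a boto3 client, e.g.:
#   import boto3
#   s3 = boto3.client("s3")

def to_tag_set(tags):
    """Convert a plain dict into the TagSet structure the S3 tagging API expects."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}

def tag_object(s3, bucket, key, tags):
    # Attach key-value tags to an existing object
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging=to_tag_set(tags))
    # Tags can be read back independently of the object data
    return s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
```

There is simply no GCS call to translate `put_object_tagging` into; the closest you can get is stuffing tags into custom object metadata, which is not searchable in the same way.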
Both GCS and S3 support object versioning, enabling the retrieval of objects that have been deleted or overwritten. But the two implementations differ in subtle ways that make them not fully interchangeable.
Think of AWS object versioning as a stack of versions ordered by time:
- Each object has a master version that always points to the most recent entry in the stack
- Any operation that doesn’t specify a version works on that master version
- This includes delete operations: deleting an object without specifying a version creates a DELETE MARKER
- It’s possible to get or delete a specific version by using a version ID
GCS behaves differently: for each object, it maintains a MASTER version and an ARCHIVE version:
- Deleting an object without specifying a version ID moves it from master to archive and does not create a DELETE MARKER
- Deleting a master object by using its version ID permanently destroys its data and does not move it to the archive
- There’s no concept of a stack, so even if an archive version of an object exists, deleting the master version does not promote the archive to master. A GET operation on the object will return a 404 Not Found code.
These differences are not obvious, and the two versioning implementations are not interchangeable.
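To make the difference concrete, here is a toy in-memory model of the two behaviors for a single key. This is an illustration only, not a client library; version IDs and error handling are simplified.

```python
DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker

class S3ObjectModel:
    """Toy model of S3's version stack for a single key."""
    def __init__(self):
        self.stack = []   # (version_id, data) pairs, newest last
        self._next = 0

    def _new_version_id(self):
        vid = f"v{self._next}"
        self._next += 1
        return vid

    def put(self, data):
        vid = self._new_version_id()
        self.stack.append((vid, data))
        return vid

    def delete(self, version_id=None):
        if version_id is None:
            # No version given: push a delete marker on top of the stack
            vid = self._new_version_id()
            self.stack.append((vid, DELETE_MARKER))
            return vid
        # Version given: permanently remove that entry (marker or data)
        self.stack = [e for e in self.stack if e[0] != version_id]

    def get(self):
        # GET returns the top of the stack; 404 if it is a delete marker
        if not self.stack or self.stack[-1][1] is DELETE_MARKER:
            raise KeyError("404 Not Found")
        return self.stack[-1][1]

class GCSObjectModel:
    """Toy model of GCS's master/archive versioning for a single object."""
    def __init__(self):
        self.master = None
        self.archive = []

    def put(self, data):
        if self.master is not None:
            self.archive.append(self.master)  # old master is archived
        self.master = data

    def delete(self):
        # Moves master to archive; no delete marker, no promotion back
        if self.master is not None:
            self.archive.append(self.master)
            self.master = None

    def get(self):
        # Archived versions never answer a plain GET
        if self.master is None:
            raise KeyError("404 Not Found")
        return self.master
```

In the S3 model, deleting the delete marker by its version ID “undeletes” the object; in the GCS model, the archived copy must be explicitly restored, because nothing is ever promoted back to master.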
Replication is a way to copy objects across buckets in different geographical locations, increasing both data protection and availability. It’s a storage best practice: keeping a remote copy is one of the best insurance policies you can have for your data’s durability.
S3 supports replication through its Cross-Region Replication (CRR) API, including two-way synchronization of buckets.
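Enabling CRR boils down to one bucket-level configuration call. A sketch with boto3, where the IAM role ARN and destination bucket ARN are placeholders, and the source bucket must already have versioning enabled:

```python
# Enabling Cross-Region Replication on a source bucket with a boto3 client, e.g.:
#   import boto3
#   s3 = boto3.client("s3")
# The IAM role and destination bucket ARNs are placeholders.

def crr_config(role_arn, destination_bucket_arn, rule_id="replicate-everything"):
    """Build a replication configuration for put_bucket_replication."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [{
            "ID": rule_id,
            "Status": "Enabled",
            "Prefix": "",  # empty prefix: replicate every object
            "Destination": {"Bucket": destination_bucket_arn},
        }],
    }

def enable_crr(s3, source_bucket, role_arn, destination_bucket_arn):
    # Versioning must be enabled on the source bucket beforehand
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=crr_config(role_arn, destination_bucket_arn))
```

The destination bucket can be in whatever region you choose, which is precisely the control GCS does not expose.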
GCS doesn’t have a replication API and lacks the flexibility of S3’s CRR, but it can still store data redundantly across locations if you specify a Multi-Regional Storage bucket location. GCS then stores your data redundantly in at least two geographic places separated by at least 100 miles within the bucket’s multi-regional location, but you cannot precisely control which regions are used, as you can with AWS.
Both GCS and S3 provide geo-redundant storage, but the AWS implementation offers more locations, more flexibility, and API-level control.
Key takeaway: two incompatible cloud storage protocols
The GCS and AWS S3 APIs are not interchangeable; migrating from one to the other requires significant adaptation of your application and client logic. Among object-storage-compatible applications, S3 is by far the most widely supported API. That’s why we decided to implement the Amazon S3 API for our multi-cloud controller, Zenko.
Try a sandbox version of Zenko in minutes with Zenko Orbit, our hosted management portal.