The ultimate guide to object storage and IAM in AWS, GCP and Azure

Here is a brief overview of the architectural differences between AWS, GCP and Azure for data storage and authentication, with additional links if you wish to dive deeper into specific topics.

Working on Zenko at Scality, we have to deal with multiple clouds on a day-to-day basis. Zenko might make these clouds seem very similar, as it simplifies the inner complexities and gives us a single interface to deal with buckets and objects across all clouds. But the way actual data is stored and accessed on these clouds is very different.

Disclaimer: These cloud providers have numerous services, multiple ways to store data and different authentication schemes. This blog post deals only with storage whose purpose is simple: give me some data and I will give it back to you. This means it addresses only object storage (no database or queue storage) that holds actual data, and the authentication needed to manipulate and access that data. The intent is to discuss the key differences to help you decide which one suits your needs.

Storage

Each cloud has its own hierarchy to store data. For any type of object storage, everything comes down to objects and buckets/containers. The table below gives a bottom-up comparison of how objects are stored in AWS, GCP and Azure.

| Category | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Base entity | Objects | Objects | Objects (also called blobs) |
| Containers | Buckets | Buckets | Containers |
| Storage class | S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, S3 Glacier, S3 Glacier Deep Archive | Multi-Regional Storage, Regional Storage, Nearline Storage, Coldline Storage | Hot, Cool, Archive |
| Region | Regions and AZs | Multi-regional | Azure locations |
| Underlying service | S3, S3 Glacier | Cloud Storage | Blob Storage |
| Namespace | Account | Project | Storage account |
| Management | Console, programmatic | Console, programmatic | Console, programmatic |

Keys

Following the traditional object storage model, all three clouds (AWS, GCP and Azure) store objects. Objects are identified using ‘keys’. Keys are essentially names/references to the objects, with the ‘value’ being the actual data. Each cloud has its own metadata engine that allows us to retrieve data using keys. In Azure Storage these objects are also called “blobs”. Any key that ends with a slash (/), or with the delimiter in the case of AWS, is treated as a prefix for the underlying objects. This helps in grouping objects in a folder-like structure and can be used for organizational simplicity.
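
To make the prefix/delimiter behavior concrete, here is a minimal sketch using boto3 (the AWS SDK for Python); the bucket name and the "reports/" prefix are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Keys sharing the "reports/" prefix behave like files in a folder; the delimiter
# makes S3 roll deeper "subfolders" up into CommonPrefixes instead of listing them.
resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="reports/", Delimiter="/")

for obj in resp.get("Contents", []):
    print("object:", obj["Key"])
for cp in resp.get("CommonPrefixes", []):
    print("sub-prefix:", cp["Prefix"])
```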

Limitations:

  • AWS: 5 TB object size limit with a 5 GB part size limit
  • GCP: 5 TB object size limit
  • Azure: 4.75 TB blob size limit with a 100 MB block size limit

Containers

In object storage everything is stored under containers, also called buckets. Containers can be used to organize the data or provide access to it but, unlike a typical file system architecture, buckets cannot be nested.

Note that in AWS and GCP containers are referred to as buckets and in Azure they are actually called containers.
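
As a quick illustration of this container layer, here is a hedged sketch in Python; the bucket/container names and the connection string are placeholders, and it assumes the official SDKs (boto3, google-cloud-storage, azure-storage-blob) with default credentials configured:

```python
import boto3
from google.cloud import storage as gcs
from azure.storage.blob import BlobServiceClient

# AWS: create a bucket (in the client's default region)
boto3.client("s3").create_bucket(Bucket="my-aws-bucket")

# GCP: create a bucket inside the project the client is authenticated against
gcs.Client().create_bucket("my-gcp-bucket")

# Azure: create a container inside an existing storage account
blob_service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
blob_service.create_container("my-azure-container")
```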

Limitations:

  • AWS: 1000 buckets per account
  • GCP: No documented limit on the number of buckets, but there are rate limits on bucket operations.
  • Azure: No limit on the number of containers

Storage Class

Each cloud solution provides different storage tiers based on your needs.

AWS:

  • S3 Standard: Data is stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of two facilities concurrently, with 99.99% availability and 99.999999999% durability.
  • S3 Intelligent-Tiering: Designed to optimize costs by automatically transitioning data to the most cost-effective access tier, without performance impact or operational overhead.
  • S3 Standard-IA: Used for data which is accessed less frequently, but requires rapid access when needed. Lower fee than S3 Standard, but you are charged a retrieval fee.
  • S3 One Zone-IA: Same as Standard-IA, but data is stored in only one availability zone and will be lost if that availability zone is destroyed.
  • S3 Glacier: Cheap storage suitable for archival data or infrequently accessed data.
  • S3 Glacier Deep Archive: Lowest cost storage, used for data archival and retention which may be accessed only twice a year.

GCP:

  • Multi-Regional Storage: Typically used for storing data that is frequently accessed (“hot” objects) around the world, such as serving website content, streaming videos, or gaming and mobile applications.
  • Regional Storage: Data is stored in the same region as your Google Cloud Dataproc resources. It has a higher SLA (99.99%) than Multi-Regional.
  • Nearline Storage: Available in both multi-regional and regional locations. Very low-cost storage used for archival data or infrequently accessed data, with higher operation and data retrieval costs.
  • Coldline Storage: Lowest cost storage, used for data archival and retention which may be accessed only once or twice a year.

Azure:

  • Hot: Designed for frequently accessed data. Higher storage costs but lower retrieval costs.
  • Cool: Designed for data that is typically accessed about once a month. It has lower storage costs and higher retrieval costs compared to Hot storage.
  • Archive: Long term backup solution with the cheapest storage costs and highest retrieval costs.
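
To show how these tiers surface in code, here is a hedged Python sketch of choosing a storage class/tier at write time; every name below is a placeholder and the calls assume the official SDKs:

```python
import boto3
from google.cloud import storage as gcs
from azure.storage.blob import BlobClient, StandardBlobTier

# AWS: the storage class is chosen per object at PUT time
boto3.client("s3").put_object(
    Bucket="my-aws-bucket",
    Key="archive/report.csv",
    Body=b"...",
    StorageClass="STANDARD_IA",
)

# GCP: the class is usually set as the bucket default at creation time
gcs_client = gcs.Client()
bucket = gcs_client.bucket("my-gcp-bucket")
bucket.storage_class = "NEARLINE"
gcs_client.create_bucket(bucket, location="us-east1")

# Azure: the Hot/Cool/Archive tier can be chosen when uploading the blob
BlobClient.from_connection_string(
    "<storage-account-connection-string>", "my-azure-container", "archive/report.csv"
).upload_blob(b"...", standard_blob_tier=StandardBlobTier.Cool)
```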

Regions

Each cloud provider has multiple data centers, facilities and availability zones divided by regions. Usually, a specific region is used for better latency, and multiple regions are used for HA/geo-redundancy. You can find more details about each cloud provider's storage regions in their respective documentation.

Underlying service

AWS, GCP and Azure combined offer hundreds of services which are not limited to storage. They include, but are not limited to, compute, databases, data analytics, traditional data storage, AI, machine learning, IoT, networking, IAM, developer tools, migration, etc. Here is a cheat sheet that I follow for GCP. As mentioned before, we are only going to discuss the actual data storage services.

AWS provides the Simple Storage Service (S3) and S3 Glacier, GCP uses its Cloud Storage service, and Azure uses Blob Storage. All these services provide a massively scalable storage namespace for unstructured data along with their own metadata engines.

Namespace

This is where the architecture of each cloud deviates from the others. Every cloud has its own hierarchy. Be aware that we are only discussing the resource hierarchy for object storage solutions; for other services, this might be different.

AWS: Everything in AWS lives under an “account”. In a single account there is one S3 service which holds all the buckets and corresponding objects. Users and groups can be created under this account. An administrator can provide access to the S3 service and the underlying buckets to users and groups using permissions, policies, etc. (discussed later). There is no hard limit on the amount of data that can be stored under one account. The only limit is on the number of buckets, which defaults to 100 but can be increased to 1,000.

GCP: GCP’s hierarchy model is built around ‘projects’. A project can be used to organize all your Google Cloud services/resources, and each project has its own set of resources. All projects are eventually linked to a domain. A typical layout has a folder for each department, with multiple projects under each folder; depending on its requirements and current usage, each project can consume different resources. It’s important to note that every service is available to every project, and each project has its own set of users, groups, permissions, etc. By default you can create about 20 projects on GCP; this limit can be increased on request. I have not seen any storage limits specified by GCP other than the 5 TB single-object size limit.

Azure: Azure is different from both GCP and AWS. In Azure we have the concept of storage accounts. An Azure storage account provides a unique namespace for all your storage. This entity consists only of data storage; all other services are accessed by the user separately and are considered separate entities from storage accounts. Authentication and authorization are managed by the storage account.

A storage account is limited to 2 PB of storage for the US and Europe and 500 TB for all other regions (including the UK). The number of storage accounts per region per subscription, including both standard and premium accounts, is 250.

Management

All cloud providers have the option of console access and programmatic access.

Identity and Access Management

Information security should ensure that data flows only where it is supposed to. Per the CIA triad, you should not be able to view or change data that you are not authorized to, and you should be able to access the data you have a right to. This ensures confidentiality, integrity and availability (CIA). The AAA model of security requires authentication, authorization and accounting; here, we will cover authentication and authorization. There are other things to keep in mind while designing secure systems. To learn more about the design considerations, I highly recommend reading the security design principles by OWASP and the OWASP Top 10.

AWS, GCP and Azure provide solid security products with reliable security features. Each one has its own way of providing access to the storage services. I will provide an overview of how users can interact with the storage services; there is a lot more that goes on in the background than what will be discussed here. For our purpose, we will stick to everything needed for using the storage services. I will assume that you already have an AWS, GCP and Azure account with the domain configured (where needed). This time I will use a top-down approach:

| Category | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Underlying service | AWS IAM | GCP IAM | AAD, ADDS, AADDS |
| Entities | Users/groups per account | Users/groups per domain per project | Users/groups per domain |
| Authentication | Access key / secret key | Access key / secret key | Storage endpoint, access key |
| Authorization | Roles, permissions, policies | Cloud IAM permissions, Access Control Lists (ACLs), Signed URLs, Signed Policy Documents | Domain user permissions, shared keys, shared access signatures |
| Required details for operations | Credentials, bucket name, authorization | Credentials, bucket name, authorization | Credentials, storage account name, container name |

Underlying Service

AWS: AWS Identity and Access Management (IAM) is an AWS web service that helps you securely manage all your resources. You can use IAM to create IAM entities (users, groups, roles) and then grant them access to various services using policies. IAM handles both authentication and authorization for users, groups and resources. In other clouds there can be multiple IAM services for multiple entities, but in AWS there is only one point of authentication and authorization per account.

GCP: GCP IAM is similar to AWS IAM, but every project has its own IAM portal and its own set of IAM entities (users, groups, resources).

Azure: Azure uses the same domain services as Microsoft and is known to have a very stable authentication service. Azure supports three types of services: Azure AD (AAD), Active Directory Domain Services (ADDS, used with Windows Server 2016/2012 via DCPromo) and Azure Active Directory Domain Services (AADDS, managed domain services).

Azure AD is the most modern of the three services and should be used for any enterprise solution. It can sync with cloud as well as on-premises services. It supports various authentication modes such as cloud-only, password hash sync + seamless SSO, pass-through authentication + seamless SSO, ADFS, and third-party authentication providers. Once you have configured your AD, you use RBAC to allow your users to create storage accounts.

Entities

All cloud providers have the concept of users and groups. In AWS there is a single set of users and groups across an account. In GCP there is a single set of users and groups in every project. In Azure the users and groups depend upon how the domain was configured. Azure AD can sync all users from the domain or an admin can add users on the fly for their particular domain.

Authentication

Credentials are how an end user proves their identity. By now you might have figured out that the services that help us create users also provide access to the storage services. This is true in the case of AWS and GCP, but not for Azure.

For AWS and GCP, their respective IAM services allow us to generate an Access Key / Secret Key pair for any user. These keys can later be used by the users to authenticate themselves to cloud services, including AWS S3 and GCP Cloud Storage. For Azure, authentication for the containers is managed by the storage account. When a storage account is created, a set of keys and an endpoint are created along with it. These keys and the endpoint, or the domain credentials, are used for authentication.
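
As a rough sketch of where these credentials come from (the user name and service-account email below are hypothetical):

```python
import boto3
from google.cloud import storage as gcs

# AWS: IAM issues an access key / secret key pair for an existing IAM user
resp = boto3.client("iam").create_access_key(UserName="storage-user")
print(resp["AccessKey"]["AccessKeyId"], resp["AccessKey"]["SecretAccessKey"])

# GCP: an HMAC (interoperability) key pair tied to a service account
metadata, secret = gcs.Client().create_hmac_key(
    service_account_email="storage-user@my-project.iam.gserviceaccount.com"
)
print(metadata.access_id, secret)

# Azure: the two account keys are created together with the storage account itself;
# retrieve them from the portal (Access keys blade) or the azure-mgmt-storage SDK.
```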

Authorization

Once a user has proved their identity, they need proper access rights to interact with the S3 buckets or GCP buckets or Azure containers.

AWS: In AWS this can be done in multiple ways. A user can first be given access to the S3 service using roles/permissions/policies and can then be given bucket-level permissions using bucket policies or ACLs. Here is a small tutorial on how a user can be granted permissions for an S3 bucket, and a minimal bucket-policy sketch follows below. There are many other ways you can access buckets, but it’s always good to use some kind of authentication and authorization.
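
Here is that minimal bucket-policy sketch with boto3; the account ID, user and bucket name are hypothetical:

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowUserReadWrite",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:user/storage-user"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# Attach the policy to the bucket; the user can now read and write its objects.
boto3.client("s3").put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```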

GCP: In GCP every project has its own IAM instance. Similar to AWS, you can control who can access a resource and how much access they have. For our use case, this can be done using Cloud IAM permissions, Access Control Lists (ACLs), Signed URLs or Signed Policy Documents. GCP has very thorough guides and documentation on these topics. Here is the list of permissions that you might want to use.

Azure: Azure has a lot of moving pieces, considering it uses Azure AD as the default authentication mechanism. For now, we will assume that you are already authenticated to AD and only need to access the resources inside a storage account. Every storage account has its own IAM through which you can give a domain user permissions to access resources under the storage account. You can also use shared keys or shared access signatures for authorization, as in the sketch below.
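
For example, a shared access signature scoped to a single blob might be generated like this (a hedged sketch; the account name, key, container and blob are placeholders):

```python
from datetime import datetime, timedelta
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

sas_token = generate_blob_sas(
    account_name="scalitydemo",
    container_name="my-azure-container",
    blob_name="archive/report.csv",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)

# Anyone holding this URL can read the blob until the signature expires.
url = (
    "https://scalitydemo.blob.core.windows.net/"
    "my-azure-container/archive/report.csv?" + sas_token
)
print(url)
```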

Required Details for Operations

Now that we are authenticated and authorized to our storage services, we need a few details to actually access our resources. Below are the details required for programmatic access (a short sketch using them follows the list):

  • AWS S3: Access Key, Secret Key, Bucket name, region(optional)
  • GCP Cloud storage: Access Key, Secret Key, Bucket Name
  • Azure: Storage Account name, Storage endpoint, Access Key, container name
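
Putting these details together, here is a hedged sketch of building one client per cloud in Python; every key, name and endpoint below is a placeholder (and note that GCP's HMAC interoperability keys work with an S3-style client pointed at storage.googleapis.com):

```python
import boto3
from azure.storage.blob import BlobServiceClient

# AWS S3: access key + secret key (+ optional region); the bucket name is given per call
s3 = boto3.client(
    "s3",
    aws_access_key_id="<aws-access-key>",
    aws_secret_access_key="<aws-secret-key>",
    region_name="us-east-1",
)
s3.get_object(Bucket="example-bucket", Key="archive/report.csv")

# GCP Cloud Storage: HMAC access key + secret key against the interoperability endpoint
gcs_s3 = boto3.client(
    "s3",
    aws_access_key_id="<gcp-hmac-access-key>",
    aws_secret_access_key="<gcp-hmac-secret>",
    endpoint_url="https://storage.googleapis.com",
)
gcs_s3.list_objects(Bucket="my-gcp-bucket")

# Azure Blob Storage: storage account name + endpoint + access key, then the container name
blob_service = BlobServiceClient(
    account_url="https://<storage-account-name>.blob.core.windows.net",
    credential="<azure-access-key>",
)
for blob in blob_service.get_container_client("my-azure-container").list_blobs():
    print(blob.name)
```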

This concludes my take on the key differences I noticed in a multi-cloud storage environment while working with the multi-cloud data controller, Zenko.

Let me know what you think or ask me a question on the forum.

This is how to avoid bad mistakes using Azure and Amazon storage API

It’s easy to make mistakes when developing multi-cloud applications, even when dealing only with object storage APIs. Amazon S3 and Azure Blob Storage are similar models but with differing semantics and APIs, just like the Google Cloud Storage API. Amazon S3 is a RESTful API providing command syntax for create (PUT), access (GET), and delete (DELETE) operations on both buckets and objects, plus access to bucket metadata (HEAD).

Applications that need to support both APIs have to be developed very carefully to manage all the corner cases and the clouds’ differing implementations. Luckily, Zenko’s team is dedicated to finding those corner cases and solving them once for everybody. Zenko CloudServer translates standard Amazon S3 calls to Azure Blob Storage, abstracting the complexity. The design philosophy of CloudServer’s translations is:

  • S3 API calls follow the Amazon S3 API specification for mandatory and optional headers, and for response and error codes.
  • The Azure Blob Storage container is created when the application calls S3 PUT bucket, and the container is assigned the name given in the PUT bucket request.
  • Bucket names must follow AWS naming conventions and limitations.
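
To illustrate what this looks like from the application side, here is a hedged sketch of pointing a standard S3 SDK at a local CloudServer instance; it assumes the Docker image's default development endpoint (http://localhost:8000) and credentials (accessKey1 / verySecretKey1), so adjust for your deployment:

```python
import boto3

cloudserver = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="accessKey1",
    aws_secret_access_key="verySecretKey1",
)

# A plain S3 PUT bucket call; if the location is backed by Azure, CloudServer
# creates the corresponding Blob Storage container behind the scenes.
cloudserver.create_bucket(Bucket="zenko-demo-bucket")
cloudserver.put_object(Bucket="zenko-demo-bucket", Key="hello.txt", Body=b"hello")
```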

Try Zenko S3-to-Azure in a free sandbox


Non-exhaustive API Comparison: AWS versus Azure

Put Bucket / Create Container

  • Bucket naming restrictions are similar but not the same.
  • CloudServer returns an InvalidBucketName error for a bucket name containing “.”, even though this is allowed on AWS S3.
  • Canned ACLs can be sent as part of the header in an AWS S3 bucket put call.
  • CloudServer uses Azure metadata x-ms-meta-scality_md_x-amz-acl header to store canned ACLs in Azure containers.

Get Bucket / List Blobs

  • The AWS S3 “Marker” parameter expects an object key value, but Azure has no way to retrieve object listings starting after a given key name alphabetically (it can only retrieve blobs after an opaque continuation token).
  • AWS S3 sends back the object owner in each listing entry XML but Azure does not include object owner information in listings.

Delete Bucket / Delete Container

  • While AWS S3 returns an error if a bucket is non-empty, Azure deletes containers regardless of contents. Zenko CloudServer first makes a call to list blobs in the container and returns the AWS S3 BucketNotEmpty error if it is not empty.

Put Object / Put Blob

  • CloudServer only allows canned ACLs, except aws-exec-read and log-delivery-write. ACLs are stored as blob metadata. From the Azure side, there are no object ACLs so behavior is based on container settings.
  • Only the STANDARD setting is allowed as “storage class”
  • Setting object-level encryption is not allowed through headers. The user must set encryption through Azure on an account basis.

Delete Object / Delete Blob

  • AWS S3 supports deleting specific object versions and offers an MFA requirement for deletes. The MFA header is not supported in CloudServer.

Get Service / ListContainers

  • AWS S3 returns a creation date in its listing, while Azure only stores the last-modified date.

Initiate Multi-part Upload (MPU) / no correspondent on Azure

  • An MPU is treated as a regular Put Blob call in Azure. CloudServer cannot allow users to initiate more than one MPU at a time because there is no way of renaming or copying a committed block blob to the correct name efficiently, and any uncommitted blocks on a blob are deleted when the block blob is committed (preventing an upload to the same key name). To allow for initiate MPU, Zenko CloudServer creates a “hidden” blob with a unique prefix that is used for saving the metadata/ACL/storage class/encryption of the future object and for listing ongoing MPUs.

Put Part / Put Block

  • Azure has a size limit of 100 MB per block blob. AWS S3 has a max part size of 5 GB.
  • Azure also has a 50,000-block maximum. At 100 MB max per block, this comes out to around 5 TB, which is the maximum size for an AWS S3 MPU. Putting the same part number to an MPU multiple times may also risk running out of blocks before the 5 TB size limit is reached.

The easiest way to write multi-cloud applications is to use the open source projects Zenko and Zenko CloudServer.

How to find keys and account info for AWS, Azure and Google

When configuring storage locations in Zenko Orbit, you need to enter some combination of access key, secret key, and account name. All this information varies by cloud provider, and it can be annoyingly complicated to track it all down. This cheatsheet will help you configure access to AWS, Azure and Google for Zenko Orbit.

This document assumes you know how to log into the AWS, Azure and Google cloud portals and that you know how to create a storage location in Zenko Orbit.

AWS

Location name = any descriptive name; “aws-location” “aws-demo” etc.

AWS Access Key and AWS Secret Key: https://console.aws.amazon.com/iam/home?#/security_credential

  • You may or may not have IAM roles set up in your account. If not, press the Continue to Security Credentials button
  • Press the “+” sign next to “Access keys (access key ID and secret access key)”
  • Press the Create New Access Key button
  • A window will appear with your Access Key and Secret Key (screenshot below)
  • Copy/paste your Secret Key somewhere safe.

Target Bucket Name = name of existing Bucket in Amazon S3

Azure

Location name = any descriptive name; “azure-location” “azure-demo” etc.

Azure Storage Endpoint = the “Blob Storage Endpoint” located on the top right of the Overview tab for your storage account (screenshot below).

Azure Account Name = the name of your Azure storage account located on the top of the Azure Portal (screenshot below – “scalitydemo” is Azure Account Name).

Azure Access Key = the “key1 / Key” visible when you select Access Key in the Azure Portal.

Target Bucket Name = name of existing Container in Azure

Google

Location name = any descriptive name; “gcs-location” “gcs-demo” etc.

GCP Access Key and Secret Key = navigate to GCP Console / Storage / Settings / Interoperability tab (see screenshot below)

Target Bucket Name = name of existing Bucket in Google Cloud Storage

Target Helper Bucket Name for Multi-part Uploads = name of existing Bucket in Google Cloud Storage

(Google Cloud Storage handles MPU in such a way that Zenko requires a second bucket for temporary staging purposes)

Richard Payette

How to Replicate Data Between Digital Ocean Spaces and Amazon S3 buckets

Zenko’s multi-cloud capabilities keep expanding with the addition of Digital Ocean Spaces as a new endpoint within the Orbit management panel. Digital Ocean is a cloud hosting company that has experienced explosive growth since 2013 with a very simple-to-use product. Spaces is the latest addition to the popular Digital Ocean cloud offering: a simple object storage service compatible with the Amazon S3 API.

The video below demonstrates how to set up data replication between an Amazon S3 bucket and a Digital Ocean Space. One might want to set up such replication to keep multiple copies of a backup file, for example, or to increase the resilience of an application in those rare cases when a cloud is not available. Another typical use case is to optimize costs, using the best features of many clouds while keeping costs under control.

The newly released version of Zenko Orbit lets you seamlessly replicate objects between Amazon S3, Digital Ocean, Google Cloud, Azure, Wasabi, Scality RING and local storage. You can test the replication capabilities easily by creating an account on Orbit and then connecting it to your accounts on Amazon AWS and Digital Ocean Spaces. Watch the video for more details. If you have questions, don’t hesitate to ask on the Zenko forums.