How to deploy Zenko 1.1 GA on bare metal, private or public cloud

We have been working hard on the Zenko 1.1 release, and finally, it is here! Thanks to the dedicated and tireless work of the Zenko team, our newest release comes with an array of useful new features. Now is a good time to try Zenko: you can deploy it on a managed Kubernetes service (Azure, Amazon, Google) or on Minikube for a quick test. But if you want to run Zenko on bare metal or on your own cloud, we suggest you deploy it on MetalK8s. MetalK8s is an open source, opinionated distribution of Kubernetes with a focus on long-term on-prem deployments. It is developed at Scality to provide great functionality while reducing complexity for users and delivering efficient access to local stateful storage.

This tutorial comes from our core engineering team, and we use it on a daily basis to deploy and test Zenko. This guide has been developed as a collective effort from contributions made in this forum post.

Here are the steps we use to deploy Zenko 1.1 on our OpenStack-based private cloud. Let’s do this!

Part 1: Deploying MetalK8s

This tutorial creates a Zenko instance distributed across three nodes, but you can always repurpose it for as many servers as you wish.

1. Create three instances with the following characteristics:

  • Operating system: CentOS-7.6
  • Size: 8 CPUs and 32GB of RAM

2. If you are deploying on a private cloud, create the following volumes (type: SSD):

  • one volume with a 280GB capacity
  • two volumes with a 180GB capacity

3. Attach a volume to each instance

4. SSH into a node:

$ ssh -A centos@<node-ip>

Pro-tip: If you use ssh -A from your computer into the first node, this will forward your authentication agent connection and allow native SSH access to the remaining nodes in your cluster.
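
For example (with placeholder IPs), a hop from the first node to the others might look like this:

$ ssh -A centos@10.200.3.179   # from your workstation to node-01
$ ssh centos@10.200.3.164      # from node-01 to node-02, using the forwarded agent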

5. $ sudo yum install git vim -y
   $ git clone https://github.com/scality/metalk8s
   $ cd metalk8s/

6. Check out the current stable version of MetalK8s and create the inventory directory:

$ git checkout tags/1.1.0
$ mkdir -p inventory/zenko-cluster/group_vars
$ cd inventory/zenko-cluster/
7. $ vim hosts

Copy the following in your hosts file and update the IPs to your instance IPs:

# Floating IP addresses can be specified using the var `access_ip=<ip-address>` on the line corresponding to the attached server
node-01 ansible_host=10.200.3.179 ansible_user=centos # server with the larger volume attached
node-02 ansible_host=10.200.3.164 ansible_user=centos # server with the smaller volume attached
node-03 ansible_host=10.200.2.27  ansible_user=centos # server with the smaller volume attached

[bigserver]
node-01

[smallserver]
node-02
node-03

[kube-master]
node-01
node-02
node-03

[etcd]
node-01
node-02
node-03

[kube-node:children]
bigserver
smallserver

[k8s-cluster:children]
kube-node
kube-master
8. $ vim group_vars/bigserver.yml

Run this command and copy the following into bigserver.yml (this is for the server that will provision the Zenko Local Filesystem):

metalk8s_lvm_drives_vg_metalk8s: ['/dev/vdb']
metalk8s_lvm_lvs_vg_metalk8s:
  lv01:
    size: 100G
  lv02:
    size: 54G
  lv03:
    size: 22G
  lv04:
    size: 12G
  lv05:
    size: 10G
  lv06:
    size: 6G

Note: /dev/vdb on the first line is the default location of a newly attached drive. If this location is already in use on your machine, change it to the appropriate device. For example:

/dev/vda
/dev/vdb
/dev/vdc
etc...
9. $ vim group_vars/smallserver.yml

Run this command and copy the following into smallserver.yml:

metalk8s_lvm_drives_vg_metalk8s: ['/dev/vdb']
metalk8s_lvm_lvs_vg_metalk8s:
  lv01:
    size: 54G
  lv02:
    size: 22G
  lv03:
    size: 12G
  lv04:
    size: 10G
  lv05:
    size: 6G

10. This step is optional but highly recommended:

$ vim group_vars/all

Paste this into the group_vars/all and save:

kubelet_custom_flags:
  - --kube-reserved cpu=1,memory=2Gi
  - --system-reserved cpu=500m,memory=1Gi
  - --eviction-hard=memory.available<500Mi

This adds resource reservations for system processes and the k8s control plane, along with a pod eviction threshold, preventing out-of-memory issues that typically lead to node/system instability. For more info, see this issue.

11. Return to the metalk8s folder:

$ cd ~/metalk8s

12. Run the MetalK8s virtual environment:

$ make shell

13. Make sure that you have SSH access to every other node in your cluster and run the following:

$ ansible-playbook -i inventory/zenko-cluster -b playbooks/deploy.yml

Deployment typically takes 15 to 30 minutes. Once it is done, you will see a URL for Kubernetes dashboard access, along with a username and password, in the output of the last task.

Notes

If you forget this password or need access to it again, it is saved under:

metalk8s/inventory/zenko-cluster/credentials/kube_user.creds

The MetalK8s installation created an admin.conf file:

metalk8s/inventory/zenko-cluster/artifacts/admin.conf

This file can be copied from your deployment machine to any other machine that requires access to the cluster (for example, if you did not deploy from your laptop).
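
For example, one way to use that file from another machine (assuming kubectl is installed there) is to point KUBECONFIG at it:

$ export KUBECONFIG=~/metalk8s/inventory/zenko-cluster/artifacts/admin.conf
$ kubectl get nodes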

MetalK8s 1.1 is now deployed!

Part 2: Deploying Zenko 1.1

1. Clone Zenko repository:

$ git clone https://github.com/scality/zenko ~/zenko
$ cd ~/zenko

2. Grab the fresh Zenko 1.1 release:

$ git checkout tags/1.1.0
$ cd kubernetes/

3. The MetalK8s installation from Part 1 provides the latest version of Helm. Now it’s time to deploy the Zenko instance on the three nodes we have prepared.

Run this command:

$ helm install --name zenko --set ingress.enabled=true \
  --set ingress.hosts[0]=zenko.local \
  --set cloudserver.endpoint=zenko.local zenko

4. Wait about 15-20 minutes while the pods stabilize.

5. You can confirm that the Zenko instance is ready when all pods are in the Running state. To check:

$ kubectl get pods

Note

It is expected that the queue-config pods will multiply until one succeeds. Any “Completed” or “Error” queue-config pods can then be deleted.
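
If you want to clean them up, a one-liner along these lines should work (a sketch; double-check which pods it matches before deleting):

$ kubectl get pods | grep queue-config | grep -E 'Completed|Error' | awk '{print $1}' | xargs -r kubectl delete pod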

Zenko is now deployed!

Part 3: Registering your Zenko instance with Orbit

Orbit is a cloud-based GUI portal to manage the Zenko instance you deployed in the previous two parts. It gives you insight into metrics and lets you create policies and rules to manage the data and replicate it between different public clouds. Here are the steps to register Zenko with Orbit.

1. Find cloudserver manager pod:

$ kubectl get pods | grep cloudserver-manager

2. Use the pod name to find the Zenko instance ID:

$ kubectl logs zenko-cloudserver-manager-7f8c8846b-5gjxk | grep 'Instance ID'

3. Now, take that Instance ID and head to Orbit to register your Zenko instance.

Your Orbit instance is now registered!

If you successfully launched a Zenko 1.1 instance with MetalK8s and Orbit using this tutorial, let us know. If you get stuck or have any questions, visit the forum and we can troubleshoot any issues together. Your input will also help us refine and update this tutorial along the way. We’re always looking for feedback on our features and tutorials.

What are Kubernetes Operators and why you should use them

First, containers and microservices transformed the way we create and ship applications, shifting the challenge to orchestrating many moving pieces at scale. Then Kubernetes came to save us. But the salty “helmsman” needs a plan to steer a herd of microservices, and Operators are the best way to provide one.

Hello Operator, what are you exactly?

The most commonly used definition online is: “Operators are a way of packaging, deploying and managing your application that runs atop Kubernetes.” In other words, Operators help in building cloud-native applications by automating deployment, scaling, backup and restore, all while being Kubernetes-native applications themselves and therefore almost completely independent from the platform where they run.

CoreOS (which originally proposed the Operator concept in 2016) suggests thinking of an Operator as an extension of the software vendor’s engineering team that watches over your Kubernetes environment and uses its current state to make decisions in milliseconds. An Operator is essentially codified knowledge of how to run a Kubernetes application.

Why Operators?

Kubernetes has been very good at managing stateless applications without any custom intervention.

But think of a stateful application, such as a database running on several nodes. If a majority of the nodes go down, you’ll need to restore the database from a specific point by following specific steps. Scaling nodes up, upgrading, or disaster recovery: these kinds of operations require knowing what the right thing to do is. Operators help you bake those difficult patterns into a custom controller.

Some perks you get:

  • Less complexity: Operators simplify the processes of managing distributed applications. They take the Kubernetes promise of automation to its logical next step.
  • Transferring human knowledge to code: very often application management requires domain-specific knowledge. This knowledge can be transferred to the Operator.
  • Extended functionality: Kubernetes is extensible – it offers interfaces to plug in your network, storage and runtime solutions. Operators make it possible to extend the K8s APIs with application-specific logic!
  • Useful in most modern settings: Operators can run wherever Kubernetes can run: public, hybrid, private or multi-cloud, as well as on-premises.

Diving deeper

An Operator is basically a Kubernetes custom controller managing one or more custom resources. Kubernetes introduced custom resource definitions (CRDs) in version 1.7, making the platform extensible. The application you want to watch is defined in Kubernetes as a new object: a CRD that has its own YAML spec and an object type that the API server can understand. That way, you can define any specific criteria in the custom spec to watch out for.

A CRD is a means to specify a configuration. The cluster then needs controllers to monitor its state and reconcile it with that configuration. Enter Operators. They extend Kubernetes functionality by allowing you to declare a custom controller that keeps an eye on your application and performs custom tasks based on its state. The way an Operator works is very similar to native Kubernetes controllers, but it mostly uses custom components that you defined.
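
To make this concrete, here is a minimal, hypothetical CRD that an operator might register; the group, names and fields are invented for illustration and use the v1beta1 CRD API that was current at the time:

kubectl apply -f - <<'EOF'
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com               # hypothetical API group
  version: v1alpha1
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
EOF

Once the CRD is registered, a custom controller can watch Backup objects and reconcile the cluster toward whatever each object’s spec declares.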

This is a more specific list of what you need in order to create your custom operator:

  • A custom resource (CRD) spec that defines the application we want to watch, as well as an API for the CR
  • A custom controller to watch our application
  • Custom code within the new controller that dictates how to reconcile our CR against the spec
  • An operator to manage the custom controller
  • Deployment for the operator and custom resource

Where to start developing your Operator

Writing a CRD schema and its accompanying controller can be a daunting task. Currently, the most commonly used tool to create Operators is the Operator SDK, an open-source toolkit that makes it easier to build and manage Kubernetes-native applications (Operators). The framework also includes the ability to monitor and collect metrics from operator-built clusters and to administer multiple Operators with a lifecycle manager.

You should also check this Kubernetes Operator Guidelines document on design, implementation, packaging, and documentation of a custom Operator.

Operator development usually starts by automating an application’s installation and then matures to perform more complex automation. So I would suggest starting small and getting your feet wet by creating a basic Operator that deploys an application or does something equally simple.

The framework has a maturity model for the provided tools that you can use to build an Operator. As you can see, using the Helm Operator kit is probably the easiest way to get started, but it is not as powerful if you wish to build a more sophisticated tool.

Operator maturity model from Operator SDK

Explore other operators

The number of custom operators for well-known applications is growing every day. In fact, Red Hat, in collaboration with AWS, Google Cloud and Microsoft, launched OperatorHub.io just a couple of months ago. It is a public registry for finding Kubernetes Operator-backed services. You might find one that is useful for some components of your application, or list your custom operator there.

Wrapping up

Kubernetes coupled with Operators provides cloud-agnostic application deployment and management. It is so powerful that it might lead us to treat cloud providers almost like a commodity, as you will be able to freely migrate between them and offer your product on any platform.

But is it a step toward making Kubernetes easier, or does it actually add even more complexity? Is it yet another available tool that just makes things more complicated for someone new? Is it all just going to explode in our faces? So many questions…

If you have any thoughts or questions stop by the forum 🙂

How to move data to Google Cloud Storage with Zenko

If you want to use the strengths of different public clouds, you often have to move your data. Take machine learning, where Google Cloud Platform seems to have taken the lead: if you want to use TensorFlow as a service, your training datasets have to be copied to GCP. Moreover, managing data at the application level (in my case, an ML application) was something that gave me a headache.

I used to move data to the cloud with ad-hoc solutions, but that is inefficient and can leave a large quantity of abandoned data occupying space. With Zenko, you can copy or move data to Google Cloud while keeping track of stray files, controlling your costs and making the process less painful.

The limits of uploading data straight into GCP

A common objection to installing Zenko is: why not simply upload data into the cloud directly?

It depends on what you are doing. Google offers the gsutil CLI tool and Google Storage Transfer Service. The first is slow and good for small, one-time transfers, though you have to make sure you don’t end up terminating your command because gsutil can’t resume the transfer. Storage Transfer Service is scheduled as a job on GCP, so you don’t have to babysit it; but if you transfer data from an external source, you pay egress and operational GCP fees for using the service. It’s also worth mentioning rclone: it is handy for transferring data to GCP but doesn’t manage the transfers at the object level.
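
For reference, a one-off copy with gsutil typically looks like this (the bucket name is a placeholder; -m parallelizes the transfer):

gsutil -m cp -r ./training-data gs://my-gcp-bucket/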

Zenko is an open source tool you can use to transfer and manage data between your on-prem location and any public cloud. The key difference is that you can use one tool to continuously manage, move, back up, change and search the data.



Move that data

Step 0 – Setup Zenko

You will need to set up your Zenko instance and register it on Zenko Orbit to proceed with this tutorial. If you haven’t completed that step, follow the Getting Started guide.

Step 1 – Create a bucket in Zenko local filesystem

This bucket (or multiple buckets) will be a transfer point for your objects. The general naming rules of the AWS object storage world apply here; follow the same rules when naming buckets on Zenko.

Creating a bucket on Zenko local filesystem

Step 2 – Add GCP buckets to Zenko

For each bucket in GCP storage that you want to add to Zenko, create another bucket with the name ending in “-mpu”. For example, if you want a bucket in GCP named “mydata”, you’ll have to create two buckets: one called “mydata” and another called “mydata-mpu”. We need to do this because of the way Zenko abstracts away the differences between public cloud providers. The S3 protocol splits big objects into parts and uploads them in parallel to speed up the process; when all the parts are uploaded, it stitches them back together. GCP doesn’t have this concept, so Zenko needs an extra bucket to simulate multipart upload (it’s one of the four differences between the S3 and Google Storage APIs we discussed before).

Creating “-mpu” bucket on GCP for multipart upload
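
If you prefer the command line to the GCP console, the pair of buckets can also be created with gsutil (names and location are only examples):

gsutil mb -l asia-northeast1 gs://mydata
gsutil mb -l asia-northeast1 gs://mydata-mpu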

Find or create your access and secret keys to the GCP storage service to authorize Zenko to write to it.

Creating/getting access and secret keys from GCP Storage

Step 3 – Add your Google Cloud buckets to Zenko

You need to authorize access to the newly created GCP buckets by adding the keys (follow the instructions in the animation above). In this example, I have three buckets on GCP, all in different regions. I will add all three to Zenko and later set the rules for the data to follow, which will let me decide which data goes to which region on GCP.

Adding GCP buckets to “Storage locations” in Zenko

Now you can set up rules and policies that will move objects to the cloud. If your objective is moving data to GCP, you have two options: replication or transition policies.

You can replicate data to Google Cloud Storage, with as many rules as you like for different kinds of data. Zenko creates a replication queue using Kafka for each new object, and if replication fails it will retry again and again.

Here is how to set a rule for replication. I am not specifying any prefixes for the objects I wish to replicate, but you can use prefixes to distinguish between objects that should follow different replication rules.

Setting up object replication rules to GCP Storage

Another way to move data with Zenko is through a transition policy. You specify when and where an object will be transferred. In this case, the current version of the object in the Zenko local bucket is transferred to a specified cloud location, a GCP data center in Tokyo in my example.

Creating a transition policy from Zenko to GCP Storage

As you can see, there is no need for manual work. You just have to set up your desired storage locations once and create the rules that all incoming data will follow. That data could be produced by your application every day (Zenko is just an S3 endpoint), or it could be a big dataset you wish to move to GCP without sitting there hypnotizing the migration.
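
Because Zenko exposes a standard S3 endpoint, any S3 client can write to the local bucket and let the replication or transition rules take over. A hedged example with the AWS CLI, assuming the zenko.local endpoint from your deployment and credentials created in Orbit:

aws s3 cp ./dataset s3://my-zenko-bucket/ --recursive --endpoint-url http://zenko.local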



For more information ask a question on the forum.

How I made a Kubernetes cluster with five Raspberry Pis

Working as a DevOps engineer at Scality, I’m exposed to Kubernetes clusters and CI/CD pipelines across the major clouds. My day-to-day tasks include maintaining Zenko, so I typically have large amounts of compute and storage resources at my disposal to test and deploy new infrastructure.

I love Kubernetes and would try to deploy a cluster on anything from a couple of toasters to AWS. Then one day I heard the announcement from Rancher about their micro Kubernetes distribution called K3s (five less than K8s).

I was immediately hit with an undeniable desire to set up a small, physically portable cluster and test the guts out of K3s. Being a long-time Raspberry Pi enthusiast, I naturally saw this as an opportunity for a passion project.

The idea is simple but interesting: take some Raspberry Pis and string them together as a Kubernetes cluster. It’s far from a unique idea, as this has been done before; however, combining it with this lightweight Kubernetes would leave enough room to fit some real workloads. I started to dream about Zenko on some remote edge device where asynchronous replication to the cloud would thrive. I thought: “Let’s do this!”

The shopping list for a tiny Kubernetes cluster

Start with the shopping list:

  • Five Raspberry Pi 3B+ boards (plus memory cards)
  • C4 Labs “Cloudlet” 8-bay case
  • Portable TP-Link router
  • Anker 6-port 60-watt USB charger
  • 8-port switch

Operating System hustle

There are countless great guides on how to set up a Raspberry Pi with the various OSes available. For the initial setup, I started with basic Raspbian to see if I could find or build ARM images for all the Zenko services. I was able to easily build the key components, the CloudServer and Backbeat images, with the ‘arm32v6/node’ Docker image as a base.

After that was successful, I decided to test MongoDB, the core database we use for our metadata engine. Here’s where I hit my first problem: MongoDB 3.x only supports 64-bit operating systems. This is something I’ve taken for granted for so long that I forgot it could be an issue. Fortunately, the newer Raspberry Pis (including the 3B+) use 64-bit ARM chips, but I still had to find a new OS, since Raspbian only comes in a 32-bit flavor.

While there is no definitive list, most distributions have an ‘aarch64’ version that typically works with the newer Raspberry Pis. I settled on Fedora 29, mostly because it has a CLI tool to load the image onto the SD card, add an SSH public key, and resize the root filesystem to fill the SD card. These are all manual configurations that typically need to be done after you first boot up your Pi. This also meant that I could set up all five of my Pis without hooking up a keyboard and monitor, and immediately have headless servers running.

Note: you can download Fedora from here.
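The tool in question is arm-image-installer; a typical invocation looks roughly like this (the image file and SD-card device are placeholders, so double-check them for your setup):

sudo arm-image-installer --image=Fedora-Server-29-1.2.aarch64.raw.xz \
  --target=rpi3 --media=/dev/sdX \
  --addkey=$HOME/.ssh/id_rsa.pub --resizefs --norootpass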

So with all my Pis set up, I was essentially left with just setting up the Kubernetes cluster. While I’ve deployed countless clusters on virtual machines and bare-metal servers, to the point that I feel like I could do it in my sleep, this time was completely unlike any deployment I’ve done before. Thanks to the K3s installer, I had a cluster with four dedicated nodes and one master/node deployed in under five minutes (not including my RPi setup time). Their bootstrap script makes it as easy as this:

# On the control server node
curl -sfL https://get.k3s.io | sh -

# Kubeconfig is written to /etc/rancher/k3s/k3s.yaml
k3s kubectl get node

# To set up an agent node, run the command below. The token (K3S_TOKEN) comes from /var/lib/rancher/k3s/server/node-token on your server
curl -sfL https://get.k3s.io | K3S_URL=https://master-node-hostname:6443 K3S_TOKEN=XXX sh -

Putting Kubernetes on a mini-rack

With the 5-node Pi cluster operational, it was time to set everything up in a portable format. The goals were to have only a single power cable for everything and to easily connect to WiFi wherever we take it, without going through the hassle of manually connecting each Raspberry Pi to the WiFi at every new location. The solution was simple: make the network itself equally portable with a small switch and a portable router.

The Cloudlet case from C4 Labs is very well thought out, with wire management in mind, and well put together, with straightforward instructions for installing all the Raspberry Pis.

In our case, I wanted to leave room for the portable router, switch and power brick as well. Fortunately, and purely by accident, the length of the switch we ordered exactly matched the internal height of the case, allowing us to mount the switch vertically. This left room underneath the Pis for the power brick and let us mount the portable TP-Link router in one of the remaining bays.

With all the fans mounted, Pis plugged in and wires managed, we still had one very obvious issue: both the 8-port switch and the USB power brick needed their own plugs. Looking over the switch, I quickly noticed that it ran off 5V, which means it could easily run off USB. But I had already used up all six ports of the power brick for the five RPis and the portable router.

What’s next?

While this is it for me today, the goal is now to put this diminutive cluster through some workloads to gauge performance, and eventually to turn the setup process into some simple Ansible playbooks to streamline the bootstrapping of multiple nodes. Let me know what you think or ask me anything on the forum.

The ultimate guide to object storage and IAM in AWS, GCP and Azure

Here is a brief overview of the architectural differences between AWS, GCP and Azure for data storage and authentication, with additional links if you wish to dive deeper into specific topics.

Working on Zenko at Scality, we have to deal with multiple clouds on a day-to-day basis. Zenko might make these clouds seem very similar, as it simplifies the inner complexities and gives us a single interface to deal with buckets and objects across all clouds. But the way actual data is stored and accessed on these clouds is very different.

Disclaimer: These cloud providers have numerous services, multiple ways to store data and different authentication schemes. This post deals only with storage whose purpose is: give me some data and I will give it back to you. That means object storage only (no database or queue storage), plus the authentication needed to manipulate and access that data. The intent is to discuss the key differences to help you decide which one suits your needs.

Storage

Each cloud has its own hierarchy for storing data, but for any type of object storage everything comes down to objects and buckets/containers. The table below gives a bottom-up comparison of how objects are stored in AWS, GCP and Azure.

| Category | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Base entity | Objects | Objects | Objects (also called blobs) |
| Containers | Buckets | Buckets | Containers |
| Storage class | S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, S3 Glacier, S3 Glacier Deep Archive | Multi-Regional Storage, Regional Storage, Nearline Storage, Coldline Storage | Hot, Cool, Archive |
| Region | Regions and AZs | Multi-regional | Azure locations |
| Underlying service | S3, S3 Glacier | Cloud Storage | Blob Storage |
| Namespace | Account | Project | Storage account |
| Management | Console, programmatic | Console, programmatic | Console, programmatic |

Keys

Following the traditional object storage model, all three clouds (AWS, GCP and Azure) store objects. Objects are identified using ‘keys’. Keys are basically names/references to the objects, with the ‘value’ being the actual data. Each cloud has its own metadata engine that allows us to retrieve data using keys. In Azure storage these objects are also called “blobs”. Any key that ends with a slash (/), or with the delimiter in the case of AWS, is treated as a prefix for the underlying objects. This helps with grouping objects in a folder-like structure and can be used for organizational simplicity.
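
As an illustration with the AWS CLI (bucket and prefix names are made up), listing with a prefix and delimiter returns exactly that folder-like view:

aws s3api list-objects-v2 --bucket my-bucket --prefix photos/2019/ --delimiter /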

Limitations:

  • AWS: 5TB object size limit with 5GB part size limit
  • GCP: 5 TB object size limit
  • Azure: 4.75 TB blob size limit with 100 MB block size limit

Containers

In object storage everything is stored in containers, also called buckets. Containers can be used to organize the data or control access to it but, unlike a typical file system, buckets cannot be nested.

Note that AWS and GCP refer to containers as buckets, while Azure actually calls them containers.

Limitations:

  • AWS: 1000 buckets per account
  • GCP: No documented limit on the number of buckets, but there are limits on the rate of certain operations.
  • Azure: No limit on the number of containers

Storage Class

Each cloud solution provides different storage tiers based on your needs.

AWS:

  • S3 Standard: Data is stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of two facilities concurrently, with 99.99% availability and 99.999999999% durability.
  • S3 Intelligent-Tiering: Designed to optimize costs by automatically transitioning data to the most cost-effective access tier, without performance impact or operational overhead.
  • S3 Standard-IA: Used for data which is accessed less frequently, but requires rapid access when needed. Lower fee than S3 Standard, but you are charged a retrieval fee.
  • S3 One Zone-IA: Same as Standard-IA, but data is stored in only one availability zone. It will be lost if that availability zone is destroyed.
  • S3 Glacier: Cheap storage suitable for archival data or infrequently accessed data.
  • S3 Glacier Deep Archive: Lowest cost storage, used for data archival and retention which may be accessed only twice a year.

GCP:

  • Multi-Regional Storage: Typically used for storing data that is frequently accessed (“hot” objects) around the world, such as serving website content, streaming videos, or gaming and mobile applications.
  • Regional Storage: Data is stored in the same region as your other Google Cloud resources (for example, Dataproc). Has a higher SLA than Multi-Regional (99.99%).
  • Nearline Storage: Available in both multi-regional and regional flavors. Very low-cost storage used for archival data or infrequently accessed data, with higher operation and data retrieval costs.
  • Coldline Storage: Lowest cost storage, used for data archival and retention which may be accessed only once or twice a year.

Azure:

  • Hot: Designed for frequently accessed data. Higher storage costs but lower retrieval costs.
  • Cool: Designed for data that is typically accessed about once a month. It has lower storage costs and higher retrieval costs compared to Hot storage.
  • Archive: Long term backup solution with the cheapest storage costs and highest retrieval costs.

Regions

Each cloud provider has multiple data centers, facilities and availability zones divided by regions. Usually a specific region is used for better latency, and multiple regions are used for HA/geo-redundancy. You can find more details about each cloud provider’s storage regions in their documentation.

Underlying service

AWS, GCP and Azure combined have thousands of services which are not limited to storage: they include compute, databases, data analytics, traditional data storage, AI, machine learning, IoT, networking, IAM, developer tools, migration, and more. Here is a cheat sheet that I follow for GCP. As mentioned before, we are only going to discuss actual data storage services.

AWS provides Simple Storage Service (S3) and S3 Glacier, GCP has its Cloud Storage service, and Azure uses Blob Storage. All of these services provide a massively scalable storage namespace for unstructured data along with their own metadata engines.

Namespace

Here is where the architectures of the clouds deviate from each other. Every cloud has its own hierarchy. Be aware that we are only discussing the resource hierarchy for object storage; for other services this might be different.

AWS: Everything in AWS lives under an “account”. In a single account there is one S3 service which holds all the buckets and corresponding objects. Users and groups are created under this account, and an administrator can grant them access to the S3 service and the underlying buckets using permissions, policies, etc. (discussed later). There is no hard limit on the amount of data that can be stored under one account; the only limit is on the number of buckets, which defaults to 100 but can be increased to 1,000.

GCP: GCP’s hierarchy model is built around ‘projects’. A project can be used to organize all your Google Cloud services/resources; each project has its own set of resources, and all projects are eventually linked to a domain. In the image below, there is a folder for each department and each folder contains multiple projects. Depending on their requirements and current usage, the projects can use different resources (the image shows the current resource utilization of each project). It’s important to note that every service is available to every project, and each project has its own set of users, groups, permissions, etc. By default you can create about 20 projects on GCP; this limit can be increased on request. I have not seen any storage limits specified by GCP except for the 5 TB single-object size limit.

Graph credits

Azure: Azure is different from both GCP and AWS. In Azure we have the concept of storage accounts: an Azure storage account provides a unique namespace for all your storage. This entity consists only of data storage; all other services are accessed by the user and are considered separate entities from storage accounts. Authentication and authorization are managed by the storage account.

A storage account is limited to 2 PB of storage for the US and Europe and 500 TB for all other regions (including the UK). The number of storage accounts per region per subscription, including both standard and premium accounts, is 250.

Management

All cloud providers have the option of console access and programmatic access.

Identity and Access Management

Information security should ensure that data flows properly and only to the right places. Per the CIA triad, you shouldn’t be able to view or change data that you are not authorized to access, and you should be able to access the data you have a right to. This ensures confidentiality, integrity and availability (CIA). The AAA model of security covers authentication, authorization and accounting; here, we will cover authentication and authorization. There are other things to keep in mind while designing secure systems; to learn more about the design considerations, I highly recommend going through the security design principles by OWASP and the OWASP Top 10.

AWS, GCP and Azure provide solid security products with reliable security features, and each one has its own way of providing access to its storage services. I will provide an overview of how users can interact with the storage services; there is a lot more going on in the background than what is discussed here, but for our purpose we will stick to everything needed to use the storage services. I will assume that you already have an AWS, GCP and Azure account with the domain configured (where needed). This time I will use a top-down approach:

 

| Category | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Underlying service | AWS IAM | GCP IAM | AAD, ADDS, AADDS |
| Entities | Users/groups per account | Users/groups per domain, per project | Users/groups per domain |
| Authentication | Access keys / secret keys | Access keys / secret keys | Storage endpoint, access key |
| Authorization | Roles, permissions, policies | Cloud IAM permissions, access control lists (ACLs), signed URLs, signed policy documents | Domain user permissions, shared keys, shared access signatures |
| Required details for operations | Credentials, bucket name, authorization | Credentials, bucket name, authorization | Credentials, storage account name, container name |

Underlying Service

AWS: AWS Identity and Access Management (IAM) is an AWS web service that helps you securely manage all your resources. You can use IAM to create IAM entities (users, groups, roles) and then grant them access to various services using policies. IAM handles both authentication and authorization for users, groups and resources. In other clouds there can be multiple IAM scopes for different entities, but in AWS a single account has only one point of authentication and authorization.

GCP: GCP IAM is similar to AWS IAM, but every project has its own IAM portal and its own set of IAM entities (users, groups, resources).

Azure: Azure uses the same domain services as Microsoft and is known to have a very stable authentication service. Azure supports three types of services: Azure AD (AAD), Active Directory Domain Services (ADDS, used with Windows Server 2016 or 2012 with DCPromo) and Azure Active Directory Domain Services (AADDS, managed domain services).

Azure AD is the most modern of the three services and should be used for any enterprise solution. It can sync with cloud as well as on-premise services, and it supports various authentication modes such as cloud-only, password hash sync + seamless SSO, pass-through authentication + seamless SSO, ADFS, and third-party authentication providers. Once you have configured your AD, you use RBAC to allow your users to create storage accounts.

Entities

All cloud providers have the concept of users and groups. In AWS there is a single set of users and groups across an account; in GCP there is a single set of users and groups in every project; in Azure, users and groups depend on how the domain was configured. Azure AD can sync all users from the domain, or an admin can add users on the fly for their particular domain.

Authentication

Credentials are a way for the end user to prove their identity. By now you might have figured out that the services that help us create users also provide access to the storage services. This is true in the case of AWS and GCP, but not for Azure.

For AWS and GCP, their respective IAM services let us generate a pair of access key and secret key for any user. These keys can later be used by the users to authenticate themselves to cloud services, including AWS S3 and GCP Cloud Storage. For Azure, authentication for the containers is managed by the storage account: when a storage account is created, it creates a set of keys and an endpoint along with it. These keys and the endpoint, or the domain credentials, are used for authentication.
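
In AWS, for example, such a key pair can be generated for an IAM user from the CLI (the user name is hypothetical):

aws iam create-access-key --user-name storage-app-user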

Authorization

Once users have proved their identity, they need proper access rights to interact with S3 buckets, GCP buckets or Azure containers.

AWS: In AWS this can be done in multiple ways. A user can first be given access to the S3 service using roles/permissions/policies and can then be given bucket-level permissions using bucket policies or ACLs. Here is a small tutorial on how a user can grant permissions on an S3 bucket. There are other ways to access buckets, but it’s always good to use some kind of authentication and authorization.
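
As a minimal sketch of the bucket-policy route (the account ID, user and bucket names are invented for the example):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::123456789012:user/storage-app-user" },
    "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
  }]
}
EOF
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json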

GCP: In GCP every project has its own IAM instance. Similar to AWS, you control who can access a resource and how much access they have. For our use case, this can be done using Cloud IAM permissions, access control lists (ACLs), signed URLs or signed policy documents. GCP has very thorough guides and documentation on these topics; here is the list of permissions that you might want to use.

Azure: Azure has a lot of moving pieces, considering it uses Azure AD as the default authentication mechanism. For now, we will assume that you are already authenticated to AD and only need to access the resources inside a storage account. Every storage account has its own IAM, in which you can grant a domain user permissions to access resources under the storage account. You can also use shared keys or shared access signatures for authorization.

Required Details for Operations

Now that we are authenticated and authorized to our storage services, we need a few details to actually access our resources. Below are the details required for programmatic access; a short sketch of how to supply them from the command line follows the list:

  • AWS S3: access key, secret key, bucket name, region (optional)
  • GCP Cloud Storage: access key, secret key, bucket name
  • Azure: storage account name, storage endpoint, access key, container name
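
A rough sketch of how these details are typically supplied from the command line (all names and keys below are placeholders):

# AWS: access key, secret key, bucket (region optional)
export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
export AWS_SECRET_ACCESS_KEY=secretexample
aws s3 ls s3://my-bucket --region us-east-1

# Azure: storage account name, access key, container
export AZURE_STORAGE_ACCOUNT=mystorageaccount
export AZURE_STORAGE_KEY=base64keyexample
az storage blob list --container-name my-container --output table

# GCP: gsutil normally rides on gcloud auth; HMAC access/secret keys are only needed for the interoperability (S3-compatible) API
gsutil ls gs://my-bucket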

 

This concludes my take on the key differences I noticed in a multi-cloud storage environment while working with Zenko, the multi-cloud data controller.

Let me know what you think or ask me a question on forum.

How to deploy Zenko on Azure Kubernetes Service

In the spirit of the Deploy Zenko Anywhere series, today I would like to guide you through deploying Zenko on AKS (Azure Kubernetes Service). Azure is a constantly expanding worldwide network of data centers maintained by Microsoft.

You can find previous tutorials on how to deploy Zenko here:

Prerequisites

Initial VM

We are going to create an initial virtual machine on Azure that will be used to spin up and manage a Kubernetes cluster later. But first, create a resource group: Azure uses resource groups to group related resources together, and we will create our computational resources within this resource group.

az group create \
  --name=<YourResourceGroupName> \
  --location=centralus

After that, you can follow this tutorial to create a virtual machine. It is pretty straightforward.

Things I want to mention:

  • choose the resource group within which the virtual machine will be created
  • choose the CentOS operating system
  • create a public IP address
  • expose at least the SSH and HTTP ports
  • add the SSH public key of your local computer (the one you will use to connect to the VM), as we need a way to connect to the machine later

Once the VM is created, you can connect to it from your local computer through SSH using the public IP address.

Azure CLI

To use the Kubernetes Service on Azure, we need a command line tool to interact with it. You can choose between the Azure interactive shell or installing the command line tool locally. In this case, I find the CLI far easier to work with.

Install the Azure CLI tool on the new VM and try to log in to Azure. This command will take you to a web browser page where you can confirm the login info:

az login

Create a Kubernetes cluster

To keep things neat, I suggest creating a directory inside the VM:

mkdir <ClusterName>
cd <ClusterName>

To secure your future cluster, generate SSH keys:

ssh-keygen -f ssh-key-<ClusterName>

It will prompt you to add a passphrase, which you can leave empty if you wish. This creates a public key named ssh-key-<ClusterName>.pub and a private key named ssh-key-<ClusterName> in the folder we created.

The following command will request a Kubernetes cluster within the resource group that we created earlier:

az aks create --name <ClusterName> \
              --resource-group <YourResourceGroupName> \
              --ssh-key-value ssh-key-<ClusterName>.pub \
              --node-count 3 \
              --node-vm-size Standard_D2s_v3

In the code above:

  • --name is the name you want to use to refer to your cluster
  • --resource-group is the resource group you created in the beginning
  • --ssh-key-value is the SSH public key created for this cluster
  • --node-count is the number of nodes you want in your Kubernetes cluster (I am using 3 for this tutorial)
  • --node-vm-size is the size of the nodes you want to use, which varies based on what you are using your cluster for and how much RAM/CPU each of your users needs. There is a list of all possible node sizes to choose from, but not all of them may be available in your location; if you get an error while creating the cluster, try changing either the region or the node size (a command to list the available sizes follows this list).
  • The default version of Kubernetes will be installed; you can pass --kubernetes-version to install a different version.
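
One way to check which VM sizes are available in your region is the following (the location is an example):

az vm list-sizes --location centralus --output table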

This might take some time. Once it is ready, information about the new Kubernetes cluster will be printed in the terminal.

Install Kubernetes CLI

To work with the cluster we need to install kubectl, the Kubernetes command line tool. Run the following command:

az aks install-cli

The next step is to get credentials from Azure:

az aks get-credentials \
  --name <ClusterName> \
  --resource-group <YourResourceGroupName>

Now if I run kubectl get nodes, I get all my nodes and the status of each. It looks good, so we can move on.

Install Helm

Helm is the first application package manager running atop Kubernetes, and we can use the official Zenko Helm charts to deploy Zenko to our cluster. It lets you describe the application structure through convenient Helm charts and manage it with simple commands.

1. Download helm v2.13.1

2. Unpack it and move it to its desired destination:

tar -zxvf helm-v2.13.1-linux-386.tar.gz
mv linux-386/helm /usr/local/bin/helm
helm version

The first service we need is Tiller; it runs inside your Kubernetes cluster and manages releases (installations) of your charts. Create a service account for Tiller:

kubectl create serviceaccount tiller --namespace kube-system

Create an rbac-config.yaml file that will configure the Tiller service:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

Lastly, apply the rbac-config.yaml file:

kubectl apply -f rbac-config.yaml
helm init --service-account tiller

Install Zenko

Get the latest release of Zenko, or the one that you prefer, from here:

wget https://github.com/scality/Zenko/archive/1.0.2-hotfix.1.zip
unzip 1.0.2-hotfix.1.zip

Go to the kubernetes folder and run the following commands to deploy Zenko:

cd Zenko-1.0.2-hotfix.1/kubernetes
helm init
helm install --name zenko --set ingress.enabled=true \
  --set ingress.hosts[0]=zenko.local \
  --set cloudserver.endpoint=zenko.local zenko

This step may take up to 10 minutes. Once the setup is done, you can run this command to see all Zenko pods and their status:

kubectl get pods

Wait a few more minutes for all services to start, then run this command to get your Instance ID; you will need it to connect to Orbit:

kubectl logs $(kubectl get pods --no-headers=true -o custom-columns=:metadata.name | grep cloudserver-manager) | grep Instance | tail -n 1

Connect your instance to Orbit

Once you have the Instance ID, copy it and go to the Orbit signup page. After signing up, you will have the choice to start a sandbox or connect an existing instance; the latter is what you need. Enter your ID and create a name for your Zenko cluster. Done! Start managing data.