How to move data to Google Cloud Storage with Zenko


If you want to take advantage of the strengths of different public clouds, you often have to move your data. Take machine learning, where Google Cloud Platform seems to have taken the lead: if you want to use TensorFlow as a service, your training datasets have to be copied to GCP. On top of that, managing data at the application level (in my case, an ML application) was giving me a headache.

I used to move data to the cloud with ad-hoc solutions, but that is inefficient and tends to leave a lot of abandoned data occupying space. With Zenko, you can copy or move data to Google Cloud while keeping track of stray files, controlling your costs and making the process less painful.

The limits of uploading data straight into GCP

A common objection to installing Zenko is: why not simply upload the data straight into the cloud?

It depends on what you are doing. Google offers the gsutil CLI tool and the Storage Transfer Service. The first is slow and good for small, one-time transfers, though you have to make sure you don’t end up terminating your command, because gsutil can’t resume the transfer. Storage Transfer Service runs as a scheduled job on GCP, so you don’t have to babysit it, but if you transfer data from an external source you pay egress and operational GCP fees for using the service. It’s also worth mentioning rclone: it is handy for transferring data to GCP but doesn’t manage transfers at the object level.

Zenko is an open source tool you can use to transfer and manage data between your on-prem location and your desired locations in any public cloud. The key difference is that you can use one tool to continuously manage, move, back up, change and search the data.



Move that data

Step 0 – Setup Zenko

You will need to set up your Zenko instance and register it on Zenko Orbit to proceed with this tutorial. If you haven’t completed that step, follow the Getting Started guide.

Step 1 – Create a bucket in Zenko local filesystem

This bucket (or several buckets) will be the transfer point for your objects. The general bucket-naming rules of the AWS object storage world also apply when naming buckets on Zenko.

Creating a bucket on Zenko local filesystem
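
Orbit makes this a couple of clicks, but since Zenko exposes the S3 API you can also create the bucket with any S3 client. Below is a minimal boto3 sketch; the endpoint, keys and bucket name are placeholders for your own values.

import boto3

# Placeholders: use the endpoint and credentials of your own Zenko instance (shown in Orbit)
s3 = boto3.client(
    's3',
    endpoint_url='http://zenko.local',
    aws_access_key_id='YOUR_ZENKO_ACCESS_KEY',
    aws_secret_access_key='YOUR_ZENKO_SECRET_KEY',
)

# Bucket names must follow the usual S3 rules: lowercase, 3-63 characters, no underscores
s3.create_bucket(Bucket='zenko-local-transfer')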

Step 2 – Add GCP buckets to Zenko

For each bucket in GCP storage that you want to add to Zenko, create a second bucket with the same name ending in “-mpu”. For example, if you want a bucket in GCP named “mydata”, you’ll have to create two buckets: one called “mydata” and another called “mydata-mpu”. This is needed because of the way Zenko abstracts away the differences between public cloud providers. The S3 protocol splits big objects into parts, uploads the parts in parallel to speed up the process, and stitches them back together once all the parts are uploaded. GCP doesn’t have this concept, so Zenko needs the extra bucket to simulate multipart upload (it’s one of the four differences between the S3 and Google storage APIs we discussed before).

Creating “-mpu” bucket on GCP for multipart upload
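
For context, this is roughly what a multipart upload looks like through the S3 API that Zenko exposes (reusing the s3 client from the sketch above); Zenko takes care of the GCP side, including the “-mpu” bucket. The bucket, key and file names are placeholders.

# Start a multipart upload, send a single part, then ask S3 to stitch the parts together
mpu = s3.create_multipart_upload(Bucket='zenko-local-transfer', Key='big-dataset.tar')

with open('big-dataset.tar', 'rb') as f:
    part = s3.upload_part(Bucket='zenko-local-transfer', Key='big-dataset.tar',
                          PartNumber=1, UploadId=mpu['UploadId'], Body=f.read())

s3.complete_multipart_upload(
    Bucket='zenko-local-transfer', Key='big-dataset.tar', UploadId=mpu['UploadId'],
    MultipartUpload={'Parts': [{'PartNumber': 1, 'ETag': part['ETag']}]},
)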

Find or create your access and secret keys to the GCP storage service to authorize Zenko to write to it.

Creating/getting access and secret keys from GCP Storage

Step 3 – Add your Google Cloud buckets to Zenko

You need to authorize access to the newly created GCP buckets by adding the keys (follow the instructions in the animation above). In this example, I have three buckets on GCP, all in different regions. I will add all three to Zenko as storage locations and later set the rules that decide which data goes to which GCP region.

Adding GCP buckets to “Storage locations” in Zenko

Now you can set up the rules and policies that will move objects to the cloud. If your objective is moving data to GCP, you have two options: replication or transition policies.

You can replicate data to Google Cloud Storage, with as many rules as you like for different kinds of data. Zenko creates a replication queue using Kafka for each new object, and if replication fails it will retry again and again.

Here is how to set a rule for replication. I am not specifying any prefixes for the objects I wish to replicate, but you can use prefixes to distinguish between objects that should follow different replication rules.

Setting up object replication rules to GCP Storage
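
If you want to script this instead of clicking through Orbit, Zenko also accepts the standard S3 bucket replication API. The sketch below (reusing the boto3 client from the earlier sketches) is only an illustration: the role ARN and the destination storage class (the name of your GCP location in Zenko) are assumptions, so check the Zenko documentation for the exact values your instance expects.

s3.put_bucket_replication(
    Bucket='zenko-local-transfer',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::root:role/s3-replication-role',   # placeholder role ARN
        'Rules': [{
            'Prefix': '',                 # empty prefix: replicate every object
            'Status': 'Enabled',
            'Destination': {
                'Bucket': 'arn:aws:s3:::mydata',
                'StorageClass': 'gcp-us-east1',   # assumed name of the GCP location in Zenko
            },
        }],
    },
)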

Another way to move data with Zenko is through a transition policy. You can specify when and where an object will be transferred. In this case, the current version of the object in the Zenko local bucket will be transferred to a specified cloud location, a GCP data center in Tokyo in my example.

Creating a transition policy from Zenko to GCP Storage
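
The equivalent through the S3 API is a bucket lifecycle configuration with a transition rule. Again, a hedged sketch with the same client: the storage class stands in for the Tokyo location name configured in Zenko, and the number of days is arbitrary.

s3.put_bucket_lifecycle_configuration(
    Bucket='zenko-local-transfer',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'move-to-tokyo',
            'Filter': {'Prefix': ''},       # apply to all objects in the bucket
            'Status': 'Enabled',
            'Transitions': [{
                'Days': 7,                               # move objects older than 7 days
                'StorageClass': 'gcp-asia-northeast1',   # assumed Zenko location name
            }],
        }],
    },
)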

As you can see, there is no need for manual work. You just set up your desired storage locations once and create the rules that all incoming data will follow. It could be data produced by your application every day (to the application, Zenko is just an S3 endpoint) or a big dataset you wish to move to GCP without sitting there watching over the migration.



For more information ask a question on the forum.

Peeking into the new Zenko open source version


We have been working hard and are thrilled to give you a sneak peek of the Zenko 1.1 release. Here is the list of new treats for you:

Lifecycle management

Control over your data.

The new release brings you more tools to control your data workflow in a multi-cloud environment.

What is a lifecycle policy anyway? Basically, you can create rules that specify actions to be taken on objects in Zenko after a certain period of time. In the previous Zenko release you could set an expiration date on objects, meaning you could simply tell Zenko to delete certain objects after a specified period of time.

We are adding another powerful policy to lifecycle management: transition. Transition policies automatically move object data from one location to another instead of deleting it. Usually, you would move old or infrequently accessed data to a slower but cheaper location. On versioned buckets, you can apply this policy to current-version data as well as noncurrent versions.

Support for Ceph-based storage

Liberating freedom of choice

Zenko is striving to be up-to-date with all current cloud storage offerings on the market. Our goal is to provide as much flexibility as possible and avoid vendor lock-in. 

Ceph is open source software that provides highly scalable object, block and file-based storage in one unified system.

Zenko 1.1 adds Ceph to the list of possible storage locations. Just go to Orbit -> Storage Locations -> Add and you will find Ceph in the dropdown menu.

Out-of-band (OOB) updates from Scality RING to Zenko

If you are already using our on-prem object storage software, RING, we are excited to let you know that you can now go multi-cloud with all your stored data. Previously you could use Zenko capabilities to manage data across different clouds, but only for the inbound data stream. What about all the data you already had stored in the RING?

That’s where out-of-band updates come in to save the day.

Cosmos

Our team created an extensible framework (a.k.a. Cosmos) that allows Zenko to manage data stored on various kinds of backends such as filesystems, block storage devices, and other storage platforms. Pre-existing data on these storage systems, and data not created through Zenko, will be chronologically ingested and synchronized.

Enter SOFS (Scale-Out-File-System)

To accommodate a variety of architectures and use cases, RING allows native file system access to RING storage through the integrated SOFS with NFS, SMB and FUSE Connectors for access over these well-known file protocols. SOFS is more precisely a virtual file system on top of the RING’s storage services. It is a commercial product from Scality and you can learn more about it in this whitepaper.

These updates essentially enable Zenko to discover and import metadata from files stored on existing Scality RING file system(s), as well as to receive ongoing asynchronous (out-of-band) updates for changes to the target file systems such as new file creations, deletions and metadata updates. Once the metadata is imported into Zenko, key Zenko functionality can be used.

To name a few key features:

  • Cross-region replication
  • Lifecycle management
  • Metadata search

OOB from RING S3 Connector(S3C) to Zenko

The Scality S3 Connector provides a modern S3-compatible application interface to the Scality RING. The AWS S3 API has become the industry’s default cloud storage API and has furthermore emerged as the standard RESTful dialect for object storage.

Zenko 1.1 adds new services (ingestion-populator and ingestion-processor) to the Backbeat component to discover and import metadata from Scality S3C, as well as to receive ongoing asynchronous (out-of-band) updates for changes to the target bucket such as new object creation, deletion and metadata updates. Once the metadata is imported into Zenko, key Zenko functionality can be used on the associated RING object data.

Let us know

We would love to hear your thoughts on these updates to Zenko. If you want to contribute to the Zenko roadmap, check our GitHub repository and leave your comments on the forum.

How to deploy Zenko on Azure Kubernetes Service


In the spirit of the Deploy Zenko anywhere series, I would like to guide you through deploying Zenko on AKS (Azure Kubernetes Service) today. Azure is a constantly expanding worldwide network of data centers maintained by Microsoft.

You can find previous tutorials on how to deploy Zenko here:

Prerequisites

Initial VM

We are going to create an initial virtual machine on Azure that will be used to spin up and manage a Kubernetes cluster later. But first, create a resource group. Azure uses the concept of resource groups to group related resources together. We will create our computational resources within this resource group.

az group create \
  --name=<YourResourceGroupName> \
  --location=centralus

After that, you can follow this tutorial to create a virtual machine. It is pretty straightforward.

Things I want to mention:

  • choose your resource group within which the virtual machine is created
  • choose CentOS operating system
  • create a public IP address
  • expose SSH and HTTP ports at least
  • add the SSH public key of your local computer (the one you will use to connect to the VM), as we need a way to connect to the machine later

Once the VM is created, you can connect to it through SSH and the public IP address from your local computer.

Azure CLI

To use the Kubernetes Service on Azure, we need a command line tool to interact with it. You can choose between the Azure interactive shell and installing the command line tool locally. In this case, I find the CLI far easier to work with.

Install the Azure CLI tool on the new VM and log in to Azure. The following command will take you to a web browser page where you can confirm the login info.

az login

Create a Kubernetes cluster

To keep things neat, I suggest creating a directory inside the VM:

mkdir <ClusterName>
cd <ClusterName>

To secure your future cluster, generate a dedicated SSH key pair:

ssh-keygen -f ssh-key-<ClusterName>

It will prompt you to add a passphrase, which you can leave empty if you wish. This will create a public key named ssh-key-<ClusterName>.pub and a private key named ssh-key-<ClusterName> in the folder we created.

The following command will request a Kubernetes cluster within the resource group that we created earlier:

az aks create --name <ClusterName> \
              --resource-group <YourResourceGroupName> \
              --ssh-key-value ssh-key-<ClusterName>.pub \
              --node-count 3 \
              --node-vm-size Standard_D2s_v3

In the code above:

  • --name is the name you want to use to refer to your cluster
  • --resource-group is the resource group you created in the beginning
  • --ssh-key-value is the SSH public key created for this cluster
  • --node-count is the number of nodes you want in your Kubernetes cluster (I am using 3 for this tutorial)
  • --node-vm-size is the size of the nodes you want to use, which varies based on what you are using your cluster for and how much RAM/CPU each of your users needs. There is a list of all possible node sizes for you to choose from, but not all might be available in your location. If you get an error while creating the cluster, you can try changing either the region or the node size.
  • It will install the default version of Kubernetes. You can pass --kubernetes-version to install a different version.

This might take some time. Once it is ready you will see information about the new Kubernetes cluster printed in the terminal.

Install Kubernetes CLI

To work with the cluster we need to install kubectl, the Kubernetes command line tool. Run the following command:

az aks install-cli

The next step is to get credentials from Azure:

az aks get-credentials \
  --name <ClusterName> \
  --resource-group <YourResourceGroupName>

Now if I run kubectl get nodes, I get all my nodes and the status of each. It looks good, so we can move on.

Install Helm

Helm is the first application package manager running atop Kubernetes, and we can use the official Zenko helm charts to deploy Zenko to our cluster. Helm lets you describe the application structure through convenient helm charts and manage it with simple commands.

1. Download helm v2.13.1

2. Unpack it and move it to its desired destination:

tar -zxvf helm-v2.13.1-linux-386.tar.gz
mv linux-386/helm /usr/local/bin/helm
helm version

The first service we need is Tiller; it runs inside your Kubernetes cluster and manages releases (installations) of your charts. Create a service account for Tiller:

kubectl create serviceaccount tiller --namespace kube-system

Create an rbac-config.yaml file that will configure the Tiller service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
Lastly, apply the rbac-config.yaml file and initialize Helm with the Tiller service account:

kubectl apply -f rbac-config.yaml
helm init --service-account tiller

Install Zenko

Get the latest release of Zenko or the one that you prefer from here:

wget https://github.com/scality/Zenko/archive/1.0.2-hotfix.1.zip
unzip 1.0.2-hotfix.1.zip

Go to the kubernetes folder and run the following commands that will deploy Zenko:

cd Zenko-1.0.2-hotfix.1/kubernetes
helm install --name zenko \
  --set ingress.enabled=true \
  --set ingress.hosts[0]=zenko.local \
  --set cloudserver.endpoint=zenko.local zenko

This step may take up to 10 minutes. After the setup is done, you can run this command to see all Zenko pods and their availability:

kubectl get pods

Wait a few more minutes for all the services to start, then run this command to get your Instance ID; you will need it to connect to Orbit:

kubectl logs $(kubectl get pods --no-headers=true -o custom-columns=:metadata.name | grep cloudserver-manager) | grep Instance | tail -n 1

Connect your instance to Orbit

Once you have the Instance ID, copy it and go to the Orbit signup page. After signing up, you will have the choice to start a sandbox or connect an existing instance; the latter is what you need. Enter your ID and create a name for your Zenko cluster. Done! Start managing data.


Baking a multi-cloud RaspberryPi for DockerCon


About a month ago, I was walking around the office and saw this goofy pink toy microwave. It was just there for the team to take funny pictures, and needless to say it was a big hit at office parties. It started its life as a passion project and a demo of WordPress REST APIs, and with DockerCon19 on the horizon, we were thinking about how we could demonstrate Zenko to fellow developers at the event. We decided it should be interactive and fun, and suddenly our pink oven photo booth had a new purpose.

Zenko is a multi-cloud controller that enables developers to manage active workflows of unstructured data. It provides a single, unified API across all clouds to simplify application development. The data is stored in standard cloud format to make the data consumable directly by native cloud apps and services. With the photo booth, our intention is to create the data that we will manage using Zenko.

Setting up the RaspberryPi

This is the list of what we needed to make the photo booth:

  • Raspberry Pi (in this case a Raspberry 3 model B)
  • SD Card for the Raspberry Pi
  • Micro USB cable + power adapter 5V and 2A (to power the Raspberry)
  • Camera module for Raspberry
  • USB Hub
  • Pink toy microwave
  • 7 inch HDMI touch display
  • The decoration (yes, this is essential)

I would also like to mention that I ended up using wired internet access: a LAN cable works far better than wifi for a stable connection. The “Start” button is connected to the Raspberry Pi on GPIO pin 18 and the LED light on GPIO pin 7.

Install the Python dependencies

The operating system of choice is the latest version of Raspbian Stretch Lite. It was written to the SD card (32GB in this case, but it could be much smaller since all pictures are backed up to the cloud by Zenko). I used Etcher to write the operating system to the card.

All the necessary libraries:

  • Python
  • Boto3 (AWS SDK for Python)
  • Picamera (package to use the camera)
  • GraphicsMagick (a tool to create gifs)

How the demo flows

Step 1

The LED light indicates “Ready” status after the Pi is booted or the previous session is finished. The script runs in an endless loop and launches at boot.

Step 2

After the “Start” button is pressed, the script is executed. The user is guided to get ready and the Pi Camera Module will take 4 pictures in a row.
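
Here is a minimal sketch of that loop, assuming the RPi.GPIO and picamera libraries and the wiring described above (button on GPIO 18, LED on GPIO 7); the file paths and timing are illustrative.

import time
import RPi.GPIO as GPIO
from picamera import PiCamera

GPIO.setmode(GPIO.BCM)
GPIO.setup(18, GPIO.IN, pull_up_down=GPIO.PUD_UP)   # "Start" button
GPIO.setup(7, GPIO.OUT)                             # "Ready" LED

camera = PiCamera()

while True:
    GPIO.output(7, True)                    # LED on: ready for the next session
    GPIO.wait_for_edge(18, GPIO.FALLING)    # block until the button is pressed
    GPIO.output(7, False)
    for i in range(4):                      # take 4 pictures in a row
        camera.capture('/home/pi/photobooth/frame{}.jpg'.format(i))
        time.sleep(1)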

Step 3

All pictures are first saved in a local directory. An animated gif is then created with the GraphicsMagick tool:

gm convert -delay <delay between pictures> <input_files> <output_file>

Step 4

Next, the user is asked to enter their name and email. These two values will be used as metadata for the animated gif when uploading to Zenko.

Step 5

Upload the gif. Boto3 is the Amazon Web Services (AWS) SDK for Python. We create a low-level client with the service name ‘s3’, the keys to a Zenko instance and its endpoint. All of this info is available in Orbit once it is connected to the Zenko instance.

import boto3

session = boto3.session.Session()

s3_client = session.client(
    service_name='s3',
    aws_access_key_id='ACCESS KEY',
    aws_secret_access_key='SECRET KEY',
    endpoint_url='ZENKO ENDPOINT',
)

# Read the animated gif created in step 3 as a binary string ('output.gif' is a placeholder path)
data = open('output.gif', 'rb')

s3_client.put_object(
    Bucket='transfer-bucket',
    Key=user_name,
    Body=data,
    Metadata={'name': user_name, 'email': user_email, 'event': 'dockercon19'},
)

When putting the object to Zenko with the client, there are a few small details to keep in mind:

  • Key is a string (not a file path) that will be the name of the object.
  • Body is a binary string (that’s why there is a call to open() in binary mode).
  • Metadata is a set of key: value pairs to be added to the object.
  • “transfer-bucket” is the name of the target bucket in Zenko.

This bucket is a transient source bucket and appears as “temporary” in Orbit. The “isTransient” location property is set through Orbit. It is used for low-latency writes to local storage before CRR transitions the data asynchronously to cloud targets (GCP, AWS, Azure).

Step 6

If everything went well while putting the object to Zenko, preview mode starts and shows the resulting gif to the user a couple of times. Instant gratification is important 😉

Our freshly created data is ready to be managed!

Some of Zenko’s capabilities are:

  • Unified interface across Clouds
  • Data is stored in a cloud-native format
  • Global search using metadata
  • Policy-based data management
  • Single metadata namespace
  • Deploy-anywhere architecture

At this point, it is a good idea to check the animated gif in the Orbit browser and make sure that it was replicated to the different cloud storage locations (I already have a rule in place that replicates the object to all 3 cloud locations). Maybe create some new rules on where to replicate objects or when they expire. Have a peek at the statistics: memory usage, replication status, number of objects, total data managed, archived vs active data. Use the global search across all managed data in Orbit.
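
You can also check an object’s replication state programmatically: the standard S3 HeadObject call served by Zenko returns a ReplicationStatus field once CRR has picked the object up. A small sketch reusing the s3_client and names from the demo code above:

response = s3_client.head_object(Bucket='transfer-bucket', Key=user_name)

# 'PENDING' while CRR is still copying, 'COMPLETED' once every target location has the object
print(response.get('ReplicationStatus'))
print(response.get('Metadata'))   # the name/email/event metadata attached at upload time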

Check out the repository with the code for the demo. Come see me at DockerCon19! Look for the Zenko booth and our pink oven photo booth.

If you cannot make it to DockerCon this year, I will be happy to chat or answer any questions on the forum. Cheers!

Learn how to make your first pull request to Zenko in 5 steps


One of the best ways to improve your programming skills, get involved with a community, meet people, and find new opportunities is to collaborate with others on open source projects. If it’s your first time creating a pull request, it can be quite intimidating. I’m here to tell you not to be afraid of making even a tiny change, because it’s likely that your pull request will help make Zenko better.

Feel free to ask

The best idea is to reach out to us first. We can discuss what you want to contribute and check whether someone is already working on a similar change or whether you can get started right away. Wherever possible, we want to make sure you have a clear path to make your work easier, faster and more relevant. And if you are not sure what exactly you can do, we would be happy to help you find a way to contribute.

To do that you can create an issue on GitHub or ask your question on the Zenko forum.

Where you can find Zenko

If you visit the Zenko repository you will find that it includes installation resources (helm charts) to deploy the full Zenko stack over an orchestration system. A helm chart is a collection of files that describes a related set of Kubernetes resources.

The actual components of Zenko are spread across two repositories: Backbeat (the core engine for asynchronous replication, optimized for queuing metadata updates and dispatching work to long-running tasks in the background) and CloudServer (a Node.js implementation of the Amazon S3 protocol on the front end, with backend storage capabilities for multiple clouds, including Azure and Google).

Another great way to help is contributing to Zenko-specs, the repository that contains the design.md files of upcoming features, where you are more than welcome to suggest or comment. Additionally, every repository has a design.md describing its existing feature.

Let’s get down to it

Step 1

After you have chosen a repository to contribute to, go ahead and fork it to your GitHub account. In the forked repository, you have “write” access and can push changes. Eventually, you will contribute back to the original repository using pull requests.

Let’s say you want to add some changes to Backbeat.

Clone the forked repository to your local machine:

$ git clone https://github.com/dashagurova/backbeat.git
$ cd backbeat

Step 2

You will find yourself in the default development branch of some version (development/major.minor). There is no master branch. Want to know why? Learn more about Scality’s own GitWaterFlow delivery model here.

The next step is to create your own branch where all your work will be done:

$ git checkout -b type_of_branch/name_your_fix

Step 3

Important: “type_of_branch” should be one of these prefixes: feature/*, improvement/*, bugfix/*, hotfix/*.
Do your magic! Fix something, improve existing code, add a feature or document one.

Note: Scality follows the TDD (Test-Driven Development) model, so it is highly appreciated if any code submission comes with related unit tests or changes to the existing tests (more info), depending on the type of code submitted. You will find a tests/ folder in the root directory of every repository.

Step 4

While working in your branch, you might end up having many commits. In order to keep things easy to navigate, it is common practice to “squash” many small commits down to a few or a single logical changeset before submitting a pull request.

To squash three commits into one, you can do the following:

$ git rebase -i HEAD~3

where 3 is the number of commits you want to squash.

In the text editor that comes up, replace the words “pick” with “squash” next to the commits you want to squash into the commit before it.
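
For example, with three commits the todo list git opens might look like this (the hashes and messages are placeholders):

pick   a1b2c3d add retry logic to replication queue
squash e4f5a6b fix typo in retry logic
squash 9c8d7e6 address review comments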

Save and close the editor, and git will combine the squashed commits with the one before it. Git will then give you the opportunity to change your commit message to describe your fix or feature (in no more than 50 characters).

Step 5

If you’ve already pushed commits to GitHub and then squashed them locally, you will have to force the push to your branch.

$ git push -f origin type_of_branch/myfix

Otherwise just:

$ git push origin type_of_branch/myfix

Important: make sure that you push the changes to your type_of_branch/myfix!

Make the pull request

Now you’re ready to create a pull request. You can open the pull request from the upstream (original) repository or from your fork. One option is to create it in your fork: select your type_of_branch/myfix branch and hit “New pull request”.

After that, you are presented with the page where you can go into the details about your work.

After you click “Create pull request,” you are greeted by Bert-E. Bert-E is the gatekeeping and merging bot Scality developed in-house to automate GitWaterFlow. Its purpose is to help developers merge their feature branches onto multiple development branches.

Now it’s time to relax and have some tea. Our core developers will review your request and get back to you shortly. If you are willing to contribute code, docs, issues, proposals or just ask a question, come find me on the forum.