How to reset Zenko queue counters

The object counters for target clouds can get out of sync when objects are deleted before they are replicated across regions (CRR), or when deleted or old versions of objects are removed before the delete operations are executed on the target cloud. If this happens, you need to reset the Zenko queue counters in Redis; the instructions below show how.

Step-by-step guide

To clear the counters, first make sure the replication queues are empty, then reset the counters in Redis.

1) To check the queues, set maintenance.enabled=true and maintenance.debug.enabled=true for the deployment. You can do this by enabling the values in the chart and running a “helm upgrade”, or by setting them directly on an upgrade like this:

% helm upgrade my-zenko -f options.yml --set maintenance.enabled=true --set maintenance.debug.enabled=true zenko

This enables some extra pods for performing maintenance and debugging activities. After the deployment finishes, make sure the “my-zenko-zenko-debug-kafka-client” pod is running.

2) Then you can enter the pod and check the queues:

% kubectl exec -it [kafka-client pod] -- bash

# List the available queues (consumer groups), replacing "my-zenko-zenko-queue" with "[your name]-zenko-queue"
root@[pod-name]/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --list

3) Identify the target cloud replication groups relevant to the counters you want to reset and check the queue lag like this:

root@[pod-name]/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --group backbeat-replication-group-example-location --describe
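
The exact columns depend on your Kafka version, but the output looks something like this (illustrative values; topic names, offsets, and consumer IDs will differ in your deployment):

TOPIC                 PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID     HOST         CLIENT-ID
backbeat-replication  0          1042            1042            0    consumer-1-xxx  /10.0.12.34  consumer-1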

Check the “LAG” column for pending actions; the lag should be zero if a queue is empty. If the queues for all of the targets are quiescent, you can move on.

4) Now we can head over to a Redis pod and start resetting counters.

% kubectl exec -it my-zenko-redis-ha-server-0 bash
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS "[location constraint]*" | grep pending

# (for example: redis-cli KEYS "aws-eu-west-1*" | grep pending)
# This will return two keys, one for bytespending and one for opspending
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS "aws-eu-west-1*" | grep pending
aws-eu-west-1:bb:crr:opspending
aws-eu-west-1:bb:crr:bytespending

# Set the counters to 0
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli SET aws-eu-west-1:bb:crr:opspending 0
OK
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli SET aws-eu-west-1:bb:crr:bytespending 0
OK

Do this for each target location that you wish to clear.
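
If you have several target locations to reset, a small shell loop saves some typing. This is a minimal sketch run from inside the Redis pod; the location names below are hypothetical, so substitute your own:

# Reset the pending CRR counters for a list of locations (location names are examples)
for loc in aws-eu-west-1 azure-backup gcp-archive; do
  redis-cli SET "${loc}:bb:crr:opspending" 0
  redis-cli SET "${loc}:bb:crr:bytespending" 0
done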

Failed Object Counters

Failed object markers for a location will clear out in 24 hours (if they are not manually or automatically retried). You can force them to clear sooner by removing the “failed” counters: find the keys with “failed” in their names and delete them. Something like this:

##
# Grep out the redis keys that house the failed object pointers
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS "aws-eu-west-1*" | grep failed

##
# Now delete those keys
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli DEL [key name]
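
If a location has many failed keys, you can chain the lookup and the delete in one pass (assuming the key names contain no spaces, which is the case for these CRR keys):

# Delete every "failed" key for a location in one go
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS "aws-eu-west-1*" | grep failed | xargs redis-cli DEL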

Developing and debugging a highly distributed system can be hard, and sharing what we learn is a way to help others. For everything else, please use the forum to ask more questions 🙂

Photo by Nick Hillier on Unsplash

Learn how to make your first pull request to Zenko in 5 steps

Collaborating with others on open source projects is one of the best ways to improve your programming skills, get involved with a community, meet people, and find new opportunities. If it’s your first time creating a pull request, it can be quite intimidating. I’m here to tell you not to be afraid of making even a tiny change, because it’s likely that your pull request will help make Zenko better.

Feel free to ask

The best idea is to reach out to us first. We can discuss what you want to contribute and check whether someone is already working on a similar change or whether you can get started right away. Wherever possible, we want to make sure you have a clear path that makes your work easier, faster, and more relevant. And if you are not sure what exactly you can do, we would be happy to help you find a way to contribute.

To do that you can create an issue on GitHub or ask your question on the Zenko forum.

Where you can find Zenko

If you visit the Zenko repository you will find that it includes installation resources (helm charts) to deploy the full Zenko stack over an orchestration system. A helm chart is a collection of files that describes a related set of Kubernetes resources.

The actual components of Zenko are spread across two repositories: Backbeat (the core engine for asynchronous replication, optimized for queuing metadata updates and dispatching work to long-running tasks in the background) and CloudServer (a Node.js implementation of the Amazon S3 protocol on the front end, with back-end storage capabilities for multiple clouds, including Azure and Google).

Another great way to help is contributing to Zenko-specs, the repository that contains the design.md documents of upcoming features, where you are more than welcome to suggest changes or comment. Additionally, every repository has a design.md describing its existing features.

Let’s get down to it

Step 1

After you have chosen a repository to contribute to, go ahead and fork it to your GitHub account. In the forked repository, you have “write” access and can push changes. Eventually, you will contribute back to the original repository using pull requests.

Let’s say you want to add some changes to Backbeat.

Clone the forked repository to your local machine:

$ git clone https://github.com/dashagurova/backbeat.git
$ cd backbeat

Step 2

You will find yourself in the default development branch of some version (development/major.minor). There is no master branch. Want to know why? Learn more about Scality’s own GitWaterFlow delivery model here.

The next step is to create your own branch where all your work will be done:

$ git checkout -b type_of_branch/name_your_fix
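
For example (the branch name here is hypothetical):

$ git checkout -b improvement/document-crr-counters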

Step 3

Important: “type_of_branch” should be one of these prefixes: feature/*, improvement/*, bugfix/*, hotfix/*.
Do your magic! Fix something, improve existing code, add a feature or document one.

Note: Scality follows a TDD (Test-Driven Development) model, so it is highly appreciated if any code submission comes with related unit tests or changes to the existing tests (more info), depending on the type of code submitted. You will find a tests/ folder in the root directory of every repository.

Step 4

While working in your branch, you might end up having many commits. In order to keep things easy to navigate, it is common practice to “squash” many small commits down to a few or a single logical changeset before submitting a pull request.

To squash three commits into one, you can do the following:

$ git rebase -i HEAD~3
where 3 is the number of commits to squash

In the text editor that comes up, replace the word “pick” with “squash” next to each commit you want to squash into the commit before it.
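
For example, the rebase todo list might end up looking like this (commit hashes and messages are made up for illustration):

pick   a1b2c3d Add lifecycle rule parser
squash e4f5a6b Fix review comments on parser
squash 9c8d7e6 Add unit tests for parser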

Save and close the editor, and git will combine the squashed commits with the one before it. Git will then give you the opportunity to change your commit message to describe your fix or feature (in no more than 50 characters).

Step 5

If you’ve already pushed commits to GitHub and then squashed them locally, you will have to force the push to your branch.

$ git push -f origin type_of_branch/myfix

Otherwise just:

$ git push origin type_of_branch/myfix

Important: make sure that you push the changes to your type_of_branch/myfix!

Make the pull request

Now you’re ready to create a pull request. You can open a pull request to the upstream repository (the original repository) or from your fork. One option is to create it in your fork: select your bugfix/myfix branch and hit “New pull request”.

After that, you are presented with the page where you can go into the details about your work.

After you click “Create pull request,” you are greeted by Bert-E. Bert-E is the gatekeeping and merging bot Scality developed in-house to automate GitWaterFlow. Its purpose is to help developers merge their feature branches onto multiple development branches.

Now it’s time to relax and have some tea. Our core developers will review your request and get back to you shortly. If you are willing to contribute code, docs, issues, proposals or just ask a question, come find me on the forum.

How to manage data automatically with custom Backbeat extensions

Backbeat, a key Zenko microservice, dispatches work to long-running background tasks. Backbeat uses Apache Kafka, the popular open-source distributed streaming platform, for scalability and high availability. This gives Zenko functionalities like:

  • Asynchronous multi-site replication
  • Lifecycle policies
  • Metadata ingestion (supporting Scality RING today, with other backends coming soon)

As with the rest of the Zenko stack, Backbeat is an open-source project, with code organized to let you use extensions to add features. Using extensions, you can create rules to manipulate objects based on metadata logs. For example, an extension can recognize music files by artist and move objects in buckets named after the artist. Or an extension can automatically move objects to separate buckets, based on data type (zip, jpeg, text, etc.) or on the owner of the object.

All Backbeat interactions go through CloudServer, which means they are not restricted to one backend and you can reuse existing solutions for different backends.

The Backbeat service publishes a stream of bucket and object metadata updates to Kafka. Each extension applies its own filters to the metadata stream, picking only metadata that meets its filter criteria. Each extension has its own Kafka consumers that consume and process metadata entries as defined.

To help you develop new extensions, we’ve added a basic extension called “helloWorld.” This extension filters the metadata stream to select only object key names with the name “helloworld” (case insensitive) and, when processing each metadata entry, applies a basic AWS S3 putObjectTagging where the key is “hello” and the value is “world.”

This example extension shows:

  • How to add your own extension using the existing metadata stream from a Zenko 1.0 deployment
  • How to add your own filters for your extension
  • How to add a queue processor to subscribe to and consume from a Kafka topic

There are two kinds of Backbeat extensions: populators and processors. The populator receives all the metadata logs, filters them as needed, and publishes them to Kafka. The processor subscribes to the extension’s Kafka topic, thus receiving these filtered metadata log entries from the populator. The processor then applies any required changes (in our case, adding object tags to all “helloworld” object keys).

Example

Begin by working on the populator side of the extension. Within Backbeat, add all the configs needed to set up a new helloWorld extension, following the examples in this commit. These configurations are placeholders. Zenko will overwrite them with its own values, as you’ll see in later commits.

Every extension must have an index.js file in its extension directory (“helloWorld/” in the present example). This file must contain the extension’s definitions in its name, version, and configValidator fields. The index.js file is the entry point for the main populator process to load the extension.

Add filters for the helloWorld extension by creating a new class that extends the existing architecture defined by the QueuePopulatorExtension class. It is important to add this new filter class to the index.js definition as “queuePopulatorExtension”.

On the processor side of the extension, you need to create service accounts in Zenko to be used as clients to complete specific S3 API calls. In the HelloWorldProcessor class, this._serviceAuth is the credential set we pass from Zenko to Backbeat to help us perform the putObjectTagging S3 operation. For this demo, borrow the existing replication service account credentials.

Create an entry point for the new extensions processor by adding a new script in the package.json file. This part may be a little tricky, but the loadManagementDatabase method helps sync up Backbeat extensions with the latest changes in the Zenko environment, including config changes and service account information updates.

Instantiate the new extension processor class and finish the setup of the class by calling the start method, defined here.

Update the docker-entrypoint.sh file. These variables point to specific fields in the config.json file. For example, “.extensions.helloWorld.topic” points to the config.json value currently defined as “topic”: “backbeat-hello-world”.
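
As a rough sketch (not the literal file contents), the mapping in docker-entrypoint.sh follows a pattern along these lines: if an environment variable is set, it is folded into a jq filter that rewrites the corresponding field of config.json.

# Hypothetical sketch of the env-var-to-config.json mapping
if [[ -n "$EXTENSION_HELLOWORLD_TOPIC" ]]; then
  JQ_FILTERS_CONFIG="$JQ_FILTERS_CONFIG | .extensions.helloWorld.topic=\"$EXTENSION_HELLOWORLD_TOPIC\""
fi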

These variable names (e.g. EXTENSION_HELLOWORLD_TOPIC) are set when Zenko is upgraded or deployed as a new Kubernetes pod, which updates these config.json values in Backbeat.

Finally, add the new extension to Zenko. You can see the variables defined by the Backbeat docker-entrypoint.sh file in these Zenko changes.

Some config environment variables are less obvious to add because we did not include them in our extension configs, but they are necessary for running some of Backbeat’s internal processes. Also, because this demo borrows the replication service accounts, those variables (EXTENSIONS_REPLICATION_SOURCE_AUTH_TYPE, EXTENSIONS_REPLICATION_SOURCE_AUTH_ACCOUNT) must be defined as well.

Upgrade the existing Zenko deployment with:

$ helm upgrade --set ingress.enabled=true --set backbeat.helloworld.enabled=true zenko zenko

where the Kubernetes deployment name is “zenko”. You must also update the “backbeat” Docker image with the new extension changes.

With the Helm upgrade, you’ve added a new Backbeat extension! Now, whenever you create an object with the key name “helloworld” (case insensitive), Backbeat automatically adds an object tag with key “hello” and value “world” to the object.
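
To see it in action, you can upload a matching object and read its tags back with the AWS CLI (the bucket name and endpoint below are placeholders for your own setup):

$ aws s3api put-object --bucket my-bucket --key helloworld --body ./somefile --endpoint-url http://zenko.local
$ aws s3api get-object-tagging --bucket my-bucket --key helloworld --endpoint-url http://zenko.local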

Have any questions or comments? Please let us know on our forum. We would love to hear from you.

Photo by Jan Antonin Kolar on Unsplash

Deploy Zenko on Amazon EKS in 30 minutes

Do you have half an hour and an AWS account? If so, you can install Zenko and use Orbit to manage your data. Below is a step-by-step guide with time estimates to get started.

If you are an AWS user with appropriate permissions or policies to create EC2 instances and EKS clusters, you can dive into this tutorial. Otherwise, contact your administrator, who can add permissions (full documentation).

Initial Machine Setup (estimated time: 10 minutes):

For this tutorial, we use a jump-host EC2 instance with Amazon Linux to deploy and manage our Kubernetes cluster. A power user can use their own workstation or laptop to manage the Kubernetes cluster instead.

Follow this guide to set up your EC2 instance and connect to your new instance using the information here. Once connected to the instance, install applications that will help set up the Kubernetes cluster.

Install Kubectl, a command-line tool for running commands against Kubernetes clusters.

$ curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

$ chmod +x ./kubectl
$ sudo mv ./kubectl /usr/local/bin/kubectl

Verify that kubectl is installed (expect a similar output):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}

Download aws-iam-authenticator, a tool to use AWS IAM credentials to authenticate to a Kubernetes cluster.

$ curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/aws-iam-authenticator

$ chmod +x ./aws-iam-authenticator
$ mkdir bin
$ cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$HOME/bin:$PATH

Install eksctl. eksctl is a simple CLI tool for creating clusters on EKS – Amazon’s new managed Kubernetes service for EC2.

$ curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

$ sudo mv /tmp/eksctl /usr/local/bin

Configure AWS credentials:

$ mkdir ~/.aws
$ vim ~/.aws/credentials
$ cat ~/.aws/credentials
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
$ export AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials

Verify credentials work. If the output looks similar, you are ready to launch your Kubernetes cluster:

$ eksctl get clusters
No clusters found

Deploy a Three-Node Kubernetes Cluster for Zenko (estimated time: 10–15 minutes):

$ eksctl create cluster --name=zenko-eks-cluster --nodes=3 --region=us-west-2

Once you get the line below, your cluster is ready:

[✔]  EKS cluster "zenko-eks-cluster" in "us-west-2" region is ready
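
You can confirm that the three worker nodes have joined the cluster and are in the Ready state:

$ kubectl get nodes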

Install Helm:

$ curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh 
$ bash ./get_helm.sh
$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}

EKS requires role-based access control to be set up. The first step is to create a service account for Tiller:

$ kubectl create serviceaccount tiller --namespace kube-system

Next, bind the Tiller service account to the cluster-admin role: create an rbac-config.yaml file and apply it.

$ cat rbac-config.yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system

$ kubectl apply -f rbac-config.yaml
$ helm init --service-account tiller
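
Before moving on, verify that the Tiller pod is running:

$ kubectl get pods --namespace kube-system | grep tiller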

Deploy Zenko (estimated time: 10 minutes)

Install Git:

$ sudo yum install git

Clone Zenko:

$ git clone https://github.com/scality/Zenko/

Go to the kubernetes folder and deploy Zenko. This will take about 10 minutes.

$ cd Zenko/kubernetes/
$ helm init
$ helm install --name zenko --set ingress.enabled=true \
--set ingress.hosts[0]=zenko.local \
--set cloudserver.endpoint=zenko.local zenko
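
You can watch the Zenko pods come up; wait until they are all in the Running or Completed state (pod names will vary with your release name):

$ kubectl get pods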

Connect EKS Zenko to Orbit

Find the Instance ID to use for registering your instance:

$ kubectl logs $(kubectl get pods --no-headers=true -o \
custom-columns=:metadata.name | grep cloudserver-manager) | grep Instance | tail -n 1

{"name":"S3","time":1548793280888,"req_id":"a67edf37254381fc4781","level":"info","message":"this deployment's Instance ID is fb3c8811-88c6-468c-a2f4-aebd309707ef","hostname":"zenko-cloudserver-manager-8568c85497-5k5zp","pid":17}

Copy the ID and head to Orbit to paste it in the Settings page. Once the Zenko instance is connected to Orbit, you’ll be able to attach cloud storage from different providers.

If you have any questions or want to show off a faster time than 30 minutes, join us at the Zenko forum.

Photo by chuttersnap on Unsplash

How to do Event-Based Processing with CloudServer and Kubeless

We want to provide all the tools our customers need for data and storage, but sometimes the best solution is one the customer creates on their own. In this tutorial, available in full on the Zenko forums, our Head of Research Vianney Rancurel demonstrates how to set up a CloudServer instance to perform additional functions from a Python script.

The environment for this instance includes a modified version of CloudServer deployed in Kubernetes (Minikube will also work) with Helm, AWS CLI, Kubeless and Kafka. Kubeless is a serverless framework designed to be deployed on a Kubernetes cluster, which allows users to call functions in other languages through Kafka triggers (full documentation). We’re taking advantage of this feature to call a Python script that produces two thumbnails of any image that is uploaded to CloudServer.

The modified version of CloudServer will generate Kafka events in a specific topic for each S3 operation. When a user uploads a photo, CloudServer pushes a message to the Kafka topic and the Kafka trigger runs the Python script to create two thumbnail images based on the image uploaded.
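
For illustration, deploying such a function and wiring it to the topic looks roughly like this with the Kubeless CLI (the function name, file, handler, and topic below are hypothetical; see the full tutorial for the exact setup):

$ kubeless function deploy thumbnail --runtime python2.7 --from-file thumbnail.py --handler thumbnail.handler --dependencies requirements.txt
$ kubeless trigger kafka create thumbnail-trigger --function-selector function=thumbnail --trigger-topic s3-events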

This setup allows users to create scripts in popular languages like Python, Ruby and Node.js to configure the best solutions to automate their workflows. Check out the video below to see Kubeless and Kafka triggers in action.

For those of you who prefer a text description, follow Vianney’s full tutorial on the Zenko forum.

Photo by Yann Allegre on Unsplash

How to use Azure Video Indexer to add metadata to files stored anywhere

As the media and entertainment industry modernizes, companies are leveraging private and public cloud technology to meet the ever-increasing demands of consumers. Scality Zenko can be integrated with existing public cloud tools, such as Microsoft Azure’s Video Indexer, to help “cloudify” media assets.

Azure’s Video Indexer utilizes machine learning and artificial intelligence to automate a number of tasks, including face detection, thumbnail extraction and object identification. When paired with the Zenko Orbit multi-cloud browser, metadata can be automatically created by the Indexer and imported as tags into Zenko Orbit.

Check out the demo of Zenko Orbit and Video Indexer to see them in action. A raw video file—with no information on content beyond a filename—is uploaded with Zenko Orbit, automatically indexed through the Azure tool, and the newly created metadata is fed back into Zenko as tags for the video file. Note that Orbit also supports user-created tags, so more information can be added if Indexer misses something important.

Why is this relevant?

  • Applications don’t need to support multiple APIs to use the best cloud features. Zenko Orbit uses the S3 APIs and seamlessly translates the calls to Azure Blob Storage API.
  • The metadata catalog is the same wherever the data is stored. The metadata added by Video Indexer is available even if the files are expired from Azure and replicated to other locations, and it can be read back with a standard S3 call, as sketched below.
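
For example, once Video Indexer has run, the tags can be retrieved through the standard S3 tagging API against the Zenko endpoint, regardless of where the object currently lives (the bucket, key, and endpoint below are placeholders):

$ aws s3api get-object-tagging --bucket media-assets --key raw-footage.mp4 --endpoint-url https://zenko.example.com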

Enjoy the demo:

Don’t hesitate to reach out on the Zenko Forums with questions.

Photo by Kevin Ku on Unsplash