How I made a Kubernetes cluster with five Raspberry Pis


Working as a DevOps engineer at Scality, I’m exposed to Kubernetes clusters and CI/CD pipelines across the major clouds. My day-to-day tasks include maintaining Zenko, so I typically have large amounts of compute and storage resources at my disposal to test and deploy new infrastructure.

I love Kubernetes and would try to deploy a cluster on anything from a couple of toasters to AWS. Then one day I heard the announcement from Rancher about their micro Kubernetes distribution, K3s (five less than K8s).

I was immediately hit with an undeniable desire to set up a small, physically portable cluster and test the guts out of K3s. Being a long-time Raspberry Pi enthusiast, I naturally saw this as an opportunity for a passion project.

The idea is simple but interesting: take some Raspberry Pis and string them together as a Kubernetes cluster. It’s far from a unique idea, as this has been done before; however, a lightweight Kubernetes distribution would leave enough room to fit some real workloads. I started to dream about Zenko on a remote edge device, where asynchronous replication to the cloud would thrive. I thought: “Let’s do this!”

The shopping list for a tiny Kubernetes cluster

Start with the shopping list:

  • Five Raspberry Pi 3B+ boards (plus memory cards)
  • C4 Labs “Cloudlet” 8-bay case
  • Portable TP-Link router
  • Anker 6-port 60-watt USB charger
  • 8-port switch

Operating System hustle

There are countless great guides on how to set up a Raspberry Pi with the various OSes available. For the initial setup, I started with just a basic Raspbian to test things out and see if I could find or build ARM images for all the Zenko services. I was able to easily build key components – CloudServer and Backbeat images – with the ‘arm32v6/node’ Docker image as a base.

After that was successful I decided to test out MongoDB, which is the core database we use for our metadata engine. Here’s where I hit my first problem: MongoDB 3.x only supports 64-bit operating systems. This is something I’ve taken for granted for so long that I forgot it could be an issue. Fortunately, the Raspberry Pi 3 and newer use 64-bit ARM chips, but I still had to find a new OS since Raspbian only comes in a 32-bit flavor.

While there is no definitive list, most distributions have an ‘aarch64’ version that typically works with the newer Raspberry Pis. I settled on Fedora 29, mostly because it has a CLI tool to load the image onto the SD card, add an SSH public key, and resize the root filesystem to fill the SD card. These are all manual configurations that typically need to be done after you first boot up your Pi. This also meant that I could set up all five of my Pis without hooking up a keyboard and monitor, and immediately have headless servers running.

Note: you can download Fedora from here.

So with all my Pis set up, I was essentially left with just setting up the Kubernetes cluster. While I’ve deployed countless clusters on virtual machines and bare-metal servers, to the point that I feel like I could do it in my sleep, this time was completely unlike any I’ve done before. Thanks to the K3s installer, I had a cluster with four dedicated agent nodes and one master/node deployed in under five minutes (not including my RPi setup time). Their bootstrap script makes the setup as easy as this:

# On the control server node
curl -sfL https://get.k3s.io | sh -

# Kubeconfig is written to /etc/rancher/k3s/k3s.yaml
k3s kubectl get node

# To set up an agent node, run the below. The K3S_TOKEN value comes from /var/lib/rancher/k3s/server/node-token on your server
curl -sfL https://get.k3s.io | K3S_URL=https://master-node-hostname:6443 K3S_TOKEN=XXX sh -

Putting Kubernetes on a mini-rack

With the 5-node Pi cluster operational, it was time to set everything up in a portable format. The goals were to have a single power cable for everything and to easily connect to WiFi wherever we take it. We didn’t want to go through the hassle of manually connecting each Raspberry Pi to the WiFi at every new location we brought it to. The solution was simple: make the network itself equally portable with a small switch and a portable router.

The Cloudlet case from C4Labs is well thought out, with wire management in mind, and well put together, with straightforward instructions for installing all the Raspberry Pis.

In our case, I wanted to be sure to leave room for the portable router, switch, and power brick as well. Fortunately, and purely by accident, the length of the switch we ordered matched the exact internal height of the case, allowing us to mount the switch vertically. This left room underneath the Pis for the power brick and let us mount the portable TP-Link router in one of the remaining bays.

With all the fans mounted, Pis plugged in, and wires managed, we still had one very obvious issue: both the 8-port switch and the USB power brick needed their own plugs. Looking over the switch, I quickly noticed that it ran off 5V, which means it could easily run off USB. But I had used up all six ports of the power brick for the five RPis and the portable router.

What’s next?

While this is it for me today, the goal is now to put this diminutive cluster through some workloads to gauge performance, and eventually to turn the setup process into some simple Ansible playbooks to streamline the bootstrapping of multiple nodes. Let me know what you think or ask me anything on the forum.

The ultimate guide to object storage and IAM in AWS, GCP and Azure


Here is a brief overview of the architectural differences between AWS, GCP and Azure for data storage and authentication, and additional links if you wish to further deep dive into specific topics.

Working on Zenko at Scality, we have to deal with multiple clouds on a day-to-day basis. Zenko might make these clouds seem very similar, as it simplifies the inner complexities and gives us a single interface to deal with buckets and objects across all clouds. But the way actual data is stored and accessed on these clouds is very different.

Disclaimer: These cloud providers have numerous services, multiple ways to store data and different authentication schemes. This blog post only deals with storage whose purpose is simple: give me some data and I will give it back to you. That means it addresses only object storage (not database or queue storage) that deals with actual data, and the authentication needed to manipulate/access that data. The intent is to discuss the key differences to help you decide which one suits your needs.

Storage

Each cloud has its own hierarchy to store data. For any type of object storage, everything comes down to objects and buckets/containers. The table below gives a bottom-up comparison of how objects are stored in AWS, GCP and Azure.

Category | AWS | GCP | Azure
Base Entity | Objects | Objects | Objects (also called blobs)
Containers | Buckets | Buckets | Containers
Storage Class | S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, S3 Glacier, S3 Glacier Deep Archive | Multi-Regional Storage, Regional Storage, Nearline Storage, Coldline Storage | Hot, Cool, Archive
Region | Regions and AZs | Multi-regional | Azure Locations
Underlying service | S3, S3 Glacier | Cloud Storage | Blob Storage
Namespace | Account | Project | Storage Account
Management | Console, Programmatic | Console, Programmatic | Console, Programmatic

Keys

Following the traditional object storage model, all three clouds (AWS, GCP and Azure) can store objects. Objects are identified using ‘keys’. Keys are basically names/references to the objects, with the ‘value’ being the actual data. Each cloud has its own metadata engine that allows us to retrieve data using keys. In Azure Storage these objects are also called “blobs”. Any key that ends with a slash (/), or with the delimiter in the case of AWS, is treated as a prefix for the underlying objects. This helps with grouping objects in a folder-like structure and can be used for organizational simplicity.
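
To make the key/prefix idea concrete, here is a minimal sketch against the S3 API with boto3 (the bucket and key names are hypothetical, and credentials are assumed to be configured):

import boto3

s3 = boto3.client('s3')

# Keys such as 'photos/2019/cat.gif' are plain strings; 'photos/' is not a real folder.
response = s3.list_objects_v2(
    Bucket='my-example-bucket',
    Prefix='photos/',   # only keys starting with this prefix
    Delimiter='/',      # group deeper "sub-folders" into CommonPrefixes
)

for obj in response.get('Contents', []):
    print(obj['Key'])        # objects directly under photos/
for cp in response.get('CommonPrefixes', []):
    print(cp['Prefix'])      # e.g. photos/2019/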

Limitations:

  • AWS: 5 TB object size limit, with a 5 GB part size limit
  • GCP: 5 TB object size limit
  • Azure: 4.75 TB blob size limit, with a 100 MB block size limit

Containers

In object storage everything is stored under containers, also called buckets. Containers can be used to organize the data or provide access to it but, unlike a typical file system architecture, buckets cannot be nested.

Note that in AWS and GCP containers are referred to as buckets and in Azure they are actually called containers.

Limitations:

  • AWS: 100 buckets per account by default, which can be raised to 1,000
  • GCP: No documented limit on the number of buckets, but there are rate limits on certain operations
  • Azure: No limit on the number of containers

Storage Class

Each cloud solution provides different storage tiers based on your needs.

AWS:

  • S3 Standard: Data is stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of two facilities concurrently, with 99.99% availability and 99.999999999% durability.
  • S3 Intelligent-Tiering: Designed to optimize costs by automatically transitioning data to the most cost-effective access tier, without performance impact or operational overhead.
  • S3 Standard-IA: Used for data that is accessed less frequently but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee.
  • S3 One Zone-IA: Same as Standard-IA, but data is stored in only one availability zone; it will be lost if that availability zone is destroyed.
  • S3 Glacier: Cheap storage suitable for archival data or infrequently accessed data.
  • S3 Glacier Deep Archive: Lowest cost storage, used for data archival and retention which may be accessed only twice a year.

GCP:

  • Multi-Regional Storage: Typically used for storing data that is frequently accessed (“hot” objects) around the world, such as serving website content, streaming videos, or gaming and mobile applications.
  • Regional Storage: Data is stored in a single region, typically co-located with your other Google Cloud resources (such as Compute Engine or Dataproc) for lower latency. Designed for 99.99% typical monthly availability.
  • Nearline Storage: Available in both multi-regional and regional flavors. Very low-cost storage used for archival or infrequently accessed data, with higher per-operation and data retrieval costs.
  • Coldline Storage: Lowest cost storage, used for data archival and retention which may be accessed only once or twice a year.

Azure:

  • Hot: Designed for frequently accessed data. Higher storage costs but lower retrieval costs.
  • Cool: Designed for data that is typically accessed about once a month or less. It has lower storage costs and higher retrieval costs compared to hot storage.
  • Archive: Long term backup solution with the cheapest storage costs and highest retrieval costs.

Regions

Each cloud provider has multiple data centers, facilities and availability zones divided by regions. Usually, a specific region is used for better latency, and multiple regions are used for HA / geo-redundancy. You can find more details in each cloud provider’s storage region documentation.

Underlying service

AWS, GCP and Azure combined have thousands of services, which are not limited to storage. They include, but are not limited to, compute, databases, data analytics, traditional data storage, AI, machine learning, IoT, networking, IAM, developer tools, migration and more. Here is a cheat sheet that I follow for GCP. As mentioned before, we are only going to discuss actual data storage services.

AWS provides the Simple Storage Service (S3) and S3 Glacier, GCP uses its Cloud Storage service and Azure uses Blob Storage. All these services provide a massively scalable storage namespace for unstructured data, along with their own metadata engines.

Namespace

Here is where the architecture of each cloud deviates from the others. Every cloud has its own hierarchy. Be aware that we are only discussing the resource hierarchy for object storage solutions; for other services, this might be different.

AWS: Everything in AWS lives under an “account”. In a single account there is one S3 service, which holds all the buckets and corresponding objects. Users and groups can be created under this account. An administrator can give users and groups access to the S3 service and the underlying buckets using permissions, policies, etc. (discussed later). There is no hard limit on the amount of data that can be stored under one account; the only limit is on the number of buckets, which defaults to 100 but can be increased to 1,000.

GCP: GCP’s hierarchy model is built around ‘projects’. A project can be used to organize all your Google Cloud services/resources. Each project has its own set of resources, and all projects are eventually linked to a domain. A typical layout is a folder for each department, with multiple projects in each folder; depending on its requirements and current usage, each project can consume different resources. It’s important to note that every service is available to every project, and each project has its own set of users, groups, permissions, etc. By default you can create roughly 20 projects on GCP; this limit can be increased on request. I have not seen any storage limit specified by GCP except for the 5 TB single-object size limit.


Azure: Azure is different from both GCP and AWS. In Azure we have the concept of storage accounts. An Azure storage account provides a unique namespace for all your storage. This entity consists only of data storage; all other services are accessed by the user as entities separate from storage accounts. Authentication and authorization are managed by the storage account.

A storage account is limited to 2 PB of storage for the US and Europe, and 500 TB for all other regions, including the UK. The number of storage accounts per region per subscription, including both standard and premium accounts, is 250.

Management

All cloud providers have the option of console access and programmatic access.

Identity and Access Management

Information security should ensure that data flows properly and only to the right places. Per the CIA triad, you shouldn’t be able to view or change data you are not authorized to, and you should be able to access the data you have a right to. This ensures confidentiality, integrity and availability (CIA). The AAA model of security covers authentication, authorization and accounting; here, we will cover authentication and authorization. There are other things to keep in mind while designing secure systems. To learn more about the design considerations, I would highly recommend reading OWASP’s security design principles and the OWASP Top 10.

AWS, GCP and Azure provide solid security products with reliable security features. Each one has its own way of providing access to its storage services. I will give an overview of how users can interact with the storage services; there is a lot more going on in the background than what is discussed here. For our purpose, we will stick to everything needed for using storage services. I will assume that you already have an AWS, GCP and Azure account with the domain configured (where needed). This time I will use a top-down approach:

 

Category | AWS | GCP | Azure
Underlying Service | AWS IAM | GCP IAM | AAD, ADDS, AADDS
Entities | Users/groups per account | Users/groups per domain per project | Users/groups per domain
Authentication | Access keys / secret keys | Access keys / secret keys | Storage endpoint, access key
Authorization | Roles, permissions, policies | Cloud IAM permissions, Access Control Lists (ACLs), signed URLs, signed policy documents | Domain user permissions, shared keys, shared access signatures
Required details for operations | Credentials, bucket name, authorization | Credentials, bucket name, authorization | Credentials, storage account name, container name

Underlying Service

AWS: AWS Identity and Access Management (IAM) is an AWS web service that helps you securely manage all your resources. You can use IAM to create IAM entities (users, groups, roles) and then give them access to various services using policies. IAM handles both authentication and authorization for users, groups and resources. In other clouds there can be multiple IAM services for multiple entities, but in AWS a single account has only one point of authentication and authorization.

GCP: GCP IAM is similar to AWS IAM, but every project has its own IAM portal and its own set of IAM entities (users, groups, resources).

Azure: Azure uses the same domain services as the rest of Microsoft and is known to have a very stable authentication service. Azure supports three types of directory services: Azure AD (AAD), Active Directory Domain Services (ADDS, used with Windows Server 2012/2016 via DCPromo) and Azure Active Directory Domain Services (AADDS, managed domain services).

Azure AD is the most modern of the three and should be used for any enterprise solution. It can sync with cloud as well as on-premises services. It supports various authentication modes such as cloud-only, password hash sync + seamless SSO, pass-through authentication + seamless SSO, ADFS, and third-party authentication providers. Once you have configured your AD, you use RBAC to allow your users to create storage accounts.

Entities

All cloud providers have the concept of users and groups. In AWS there is a single set of users and groups across an account. In GCP there is a single set of users and groups in every project. In Azure the users and groups depend upon how the domain was configured. Azure AD can sync all users from the domain or an admin can add users on the fly for their particular domain.

Authentication

Credentials are a way for the end user to prove their identity. By now you might have figured out that the services that help us create users also provide us access to the storage services. This is true in the case of AWS and GCP, but not for Azure.

For AWS and GCP, the respective IAM services allow us to generate an access key / secret key pair for any user. These keys can later be used by the users to authenticate themselves to cloud services, including AWS S3 and GCP Cloud Storage. For Azure, authentication for the containers is managed by the storage account: when a storage account is created, a set of keys and an endpoint are created along with it. These keys and the endpoint, or the domain credentials, are used for authentication.

Authorization

Once a user has proved their identity, they need proper access rights to interact with the S3 buckets or GCP buckets or Azure containers.

AWS: In AWS this can be done in multiple ways. A user can first be given access to the S3 service using roles/permissions/policies, and can then be given bucket-level permissions using bucket policies or ACLs. Here is a small tutorial on how a user can grant permissions for an S3 bucket. There are other ways to access buckets, but it’s always good to use some form of authentication and authorization.
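
As an illustration of the bucket-policy approach, here is a minimal sketch using boto3; the account ID, user name and bucket name are hypothetical, and the policy only grants read access:

import json
import boto3

s3 = boto3.client('s3')

# Hypothetical policy: allow one IAM user to read objects from one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:user/example-user"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::my-example-bucket/*",
    }],
}

s3.put_bucket_policy(Bucket='my-example-bucket', Policy=json.dumps(policy))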

GCP: In GCP every project has its own IAM instance. Similar to AWS, you can control who can access a resource and how much access they have. For our use case, this can be done using Cloud IAM permissions, Access Control Lists (ACLs), signed URLs or signed policy documents. GCP has very thorough guides and documentation on these topics. Here is the list of permissions that you might want to use.

Azure: Azure has a lot of moving pieces, considering it uses Azure AD as the default authentication mechanism. For now, we will assume that you are already authenticated to AD and only need to access the resources inside a storage account. Every storage account has its own IAM, through which you can give a domain user permissions to access resources under the storage account. You can also use shared keys or shared access signatures for authorization.

Required Details for Operations

Now that we are authenticated and authorized to our storage services, we need a few details to actually access our resources. Below are the details required for programmatic access, followed by a short sketch of how they plug into the SDKs:

  • AWS S3: Access key, secret key, bucket name, region (optional)
  • GCP Cloud Storage: Access key, secret key, bucket name
  • Azure: Storage account name, storage endpoint, access key, container name
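
Here is a minimal Python sketch of how those details are typically supplied to each service. All key values, account names and bucket/container names are placeholders; for GCP it uses HMAC interoperability keys through the S3-compatible endpoint, which is one of several ways to reach Cloud Storage, and for Azure it uses the azure-storage-blob SDK:

import boto3
from azure.storage.blob import BlobServiceClient

# AWS S3: access key, secret key, bucket name, optional region.
aws_s3 = boto3.client(
    's3',
    aws_access_key_id='AWS_ACCESS_KEY',
    aws_secret_access_key='AWS_SECRET_KEY',
    region_name='us-east-1',
)
aws_s3.list_objects_v2(Bucket='my-aws-bucket')

# GCP Cloud Storage: HMAC access key / secret key and bucket name, accessed here
# through the S3-compatible XML API endpoint.
gcs = boto3.client(
    's3',
    aws_access_key_id='GCP_HMAC_ACCESS_KEY',
    aws_secret_access_key='GCP_HMAC_SECRET_KEY',
    endpoint_url='https://storage.googleapis.com',
)
gcs.list_objects_v2(Bucket='my-gcp-bucket')

# Azure Blob Storage: storage account name, endpoint, access key and container name.
azure = BlobServiceClient(
    account_url='https://mystorageaccount.blob.core.windows.net',
    credential='AZURE_ACCOUNT_KEY',
)
container = azure.get_container_client('my-container')
for blob in container.list_blobs():
    print(blob.name)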

 

This concludes my take on the key differences I noticed in a multi-cloud storage environment while working with the multi-cloud data controller, Zenko.

Let me know what you think or ask me a question on the forum.

Baking a multi-cloud RaspberryPi for DockerCon


About one month ago, I was walking around the office and saw this goofy pink toy microwave. It was just there for the team to take funny pictures, and needless to say it was a big hit at office parties. It started its life as a passion project and a demo of WordPress REST APIs, and with DockerCon19 on the horizon we were thinking about how to demonstrate Zenko to fellow developers at the event. We decided it should be interactive and fun, and suddenly our pink oven photo booth had a new purpose.

Zenko is a multi-cloud controller that enables developers to manage active workflows of unstructured data. It provides a single, unified API across all clouds to simplify application development. The data is stored in standard cloud format to make the data consumable directly by native cloud apps and services. With the photo booth, our intention is to create the data that we will manage using Zenko.

Setting up the RaspberryPi

This is the list of what we needed to make the photo booth:

  • Raspberry Pi (in this case a Raspberry Pi 3 Model B)
  • SD card for the Raspberry Pi
  • Micro USB cable + 5V/2A power adapter (to power the Raspberry Pi)
  • Camera module for the Raspberry Pi
  • USB hub
  • Pink toy microwave
  • 7-inch HDMI touch display
  • The decoration (yes, this is essential)

I would also like to mention that I ended up using a wired connection to the internet; a LAN cable works infinitely better than WiFi for a stable connection. The “Start” button is connected to the Raspberry Pi on GPIO pin 18 and the LED light on GPIO 7.
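
For illustration (not the exact demo code), here is a minimal sketch of how the button and LED can be handled with the RPi.GPIO library; the BCM pin numbering and pull-up wiring are assumptions:

import RPi.GPIO as GPIO

BUTTON_PIN = 18   # "Start" button
LED_PIN = 7       # "Ready" LED

GPIO.setmode(GPIO.BCM)   # assumption: BCM pin numbering
GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
GPIO.setup(LED_PIN, GPIO.OUT)

try:
    while True:
        GPIO.output(LED_PIN, GPIO.HIGH)                # signal "ready"
        GPIO.wait_for_edge(BUTTON_PIN, GPIO.FALLING)   # block until the button is pressed
        GPIO.output(LED_PIN, GPIO.LOW)                 # busy: take pictures, build the gif, upload
        # ... photo booth session runs here ...
finally:
    GPIO.cleanup()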

Install the Python dependencies

The operating system of choice is the latest version of Raspbian Stretch Lite. It was written to the SD card (32 GB in this case, but it could be much smaller, as all pictures are backed up to the cloud by Zenko). I used Etcher to write the operating system to the card.

All the necessary libraries:

  • Python
  • Boto3 (AWS SDK for Python)
  • Picamera (package to use the camera)
  • GraphicsMagick (a tool to create gifs)

How the demo flows

Step 1

The LED light indicates “Ready” status after the Pi is booted or the previous session is finished. The script runs in an endless loop and launches at boot.

Step 2

After the “Start” button is pressed, the script is executed. The user is guided to get ready and the Pi Camera Module will take 4 pictures in a row.
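
A minimal sketch of this step with the picamera package (resolution, timing and file names are assumptions):

from time import sleep
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (640, 480)   # assumed resolution

frames = []
for i in range(4):
    sleep(1)                                  # give the user a moment to pose
    filename = '/tmp/frame{}.jpg'.format(i)
    camera.capture(filename)                  # take one still picture
    frames.append(filename)

camera.close()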

Step 3

All pictures are first saved to a local directory. An animated gif is then created with the GraphicsMagick tool:

gm convert -delay <delay between pictures> <input_files> <output_file>
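
From a Python script, this can be done with a subprocess call; a small sketch, assuming the frame files captured above and an arbitrary delay value:

import subprocess

frames = ['/tmp/frame0.jpg', '/tmp/frame1.jpg', '/tmp/frame2.jpg', '/tmp/frame3.jpg']
output = '/tmp/booth.gif'

# Equivalent to: gm convert -delay <delay> <input_files> <output_file>
subprocess.run(['gm', 'convert', '-delay', '50'] + frames + [output], check=True)
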
Step 4

Next, the user is asked to enter their name and email. These two values will be used as metadata for the animated gif when uploading to Zenko.

Step 5

Upload the gif. Boto3 is the Amazon Web Services (AWS) SDK for Python. We create a low-level client with the service name ‘s3’ and the keys to a Zenko instance, along with the endpoint. All of this info is available in Orbit, connected to the Zenko instance.

import boto3

session = boto3.session.Session()

# Low-level S3 client pointed at the Zenko endpoint
s3_client = session.client(
    service_name='s3',
    aws_access_key_id='ACCESS KEY',
    aws_secret_access_key='SECRET KEY',
    endpoint_url='ZENKO ENDPOINT',
)

s3_client.put_object(
    Bucket='transfer-bucket',
    Key=user_name,
    Body=data,
    Metadata={'name': user_name, 'email': user_email, 'event': 'dockercon19'},
)

When putting the object to Zenko using the client, there are a few small details to keep in mind:

  • Key – a string (not a file path) that will be the name of the object.
  • Body – is a binary string (that’s why there is a call to open()).
  • Metadata – key: value pairs to be added to the object.
  • “transfer-bucket” – is the name of the target bucket in Zenko.

This bucket is a transient source bucket and appears as “temporary” in Orbit. The “isTransient” location property is set through Orbit. It is used for low-latency writes to local storage before CRR (cross-region replication) asynchronously transitions the data to cloud targets (GCP, AWS, Azure).

Step 6

If everything went well while putting the current object to Zenko then preview mode will start and show the resulting gif to the user a couple of times. Instant gratification is important 😉

Our freshly created data is ready to be managed!

Some of  Zenko’s capabilities are:

  • Unified interface across Clouds
  • Data is stored in a cloud-native format
  • Global search using metadata
  • Policy-based data management
  • Single metadata namespace
  • Deploy-anywhere architecture

At this point, it is a good idea to check the animated gif in the Orbit browser and make sure that it was replicated to the different cloud storage locations (I already have a rule in place that replicates the object to all three cloud locations). Maybe create some new rules on where to replicate objects or when they expire. Have a peek at the statistics: memory usage, replication status, number of objects, total data managed, archived vs. active data. Or use the global search across all managed data in Orbit.

Check out the repository with the code for the demo. Come see me at DockerCon19! Look for the Zenko booth and our pink oven photo booth.

If you cannot make it to DockerCon this year, I will be happy to chat or answer any questions on the forum. Cheers!

Solving Connection Reset issue in Kubernetes


The DNS is not the root of all problems after all! It’s been a few months since Scality Release Engineering started noticing weird networking issues in Kubernetes. Our CI workloads were seeing unexplained connection-reset errors on HTTP communication between pods inside the cluster. We were left wondering whether the issue was triggered by the tests themselves (that is, our software) or by CI components.

We didn’t have a clear understanding of it, so we were left with good old patience and trial and error. As a first step, we configured some bandwidth limitations on the HTTP server, which seemed to have some effect. The issue was no longer happening often enough for us to justify investing more time in a deeper investigation.

A few weeks ago, we integrated an HTTP proxy cache to reduce repetitive downloads of software dependencies and system packages, and found a way to cache more of that content. At this point, the nasty bug started to hit back, and this time it was angry. Out of patience, we had to take out the microscope and really understand it. I put my head down and created a tool to reproduce connection resets in Kubernetes, available on my GitHub repository.

Pulling in more brainpower, we reached out to Google Cloud support, and that’s where we found out that we had uncovered a bug in Kubernetes. Following our ticket, a Google employee opened an issue on the official K8s repo, as well as a PR with a proper explanation:

“Network services with heavy load will cause ‘connection reset’ from time to time. Especially those with big payloads. When packets with sequence number out-of-window arrived k8s node, conntrack marked them as INVALID. kube-proxy will ignore them, without rewriting DNAT. The packet goes back to the original pod, who doesn’t recognize the packet because of the wrong source ip, end up RSTing the connection.”

So, beware of connection reset errors, either in your dev environments or worse, reported by a customer. It might be worth checking that you’re not hitting the issue in your environment. I wrote this code to help you with this task.

Reproducing the issue is easy: basically, any internal communication between pods in a cluster that sends a fair amount of data triggers it. I think there’s a good chance we hit it due to the type of workload we have in Zenko. The “why” is tricky.
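
For illustration only (the actual reproduction tool is linked above), a client loop along these lines can surface the problem inside a cluster; the service URL, payload size and request count are assumptions:

import requests

# Hypothetical in-cluster service that returns a large payload on each request.
URL = 'http://big-payload-service.default.svc.cluster.local/blob'

resets = 0
for attempt in range(1000):
    try:
        with requests.get(URL, stream=True, timeout=60) as response:
            for _ in response.iter_content(chunk_size=1 << 20):   # drain the body
                pass
    except requests.exceptions.ConnectionError as exc:
        resets += 1
        print('attempt {}: connection error: {}'.format(attempt, exc))

print('{} connection errors out of 1000 requests'.format(resets))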

Unfortunately, the PR is still in review, but until it’s fixed for good, there’s a simple workaround that can be found in Docker’s libnetwork project. You can remove it once the proper fix is included in Kubernetes itself.

Now we can go back to blaming the DNS!

Photo by Scott Umstattd on Unsplash

Hitting the ground running: My first days at Scality


For my first week at Scality as the Technical Community Manager, I was lucky enough to join the yearly Open Source Leadership Summit hosted by the Linux Foundation. Around 400 leaders of the open source community met to “drive digital transformation with open source technologies and learn how to collaboratively manage the largest shared technology investment of our time.”

Here are some interesting insights from the three-day event in Half Moon Bay.

Open Data is as important as open source

The open data movement is becoming more and more prominent with the advancements in machine learning technologies. The idea is that data should be freely shared, used and redistributed by anyone. It should not be personal data, of course, or contain information about specific individuals. Many impressive tools for machine learning (like TensorFlow) have been open sourced, which is great, but the real value for machine learning is data. Even if many valuable datasets remain carefully guarded secrets in big corporations, lots of good is coming from governments, research institutions and organizations like Mozilla. With the growing interest in machine learning solutions, open data looks like a younger sibling of open source.

Hybrid cloud and multi-cloud are real

I was curious about what these world-class leaders and thinkers had to say on the future of the cloud and cloud storage. The most powerful comment I heard was from Sarah Novotny, program manager for the Kubernetes community: “Multi-cloud is something that people want nowadays.” The more I thought about it, the more obvious it became: multi-cloud is absolutely essential to preserve freedom, avoid vendor lock-in, and give the open source community an opportunity to implement great ideas. Nobody wants to depend on a single vendor.

On that note, a surprising discovery was The Permanent Legacy Foundation. They have a small but dedicated team working to provide permanent and accessible data storage to all people, for all time. It is a nonprofit organization and they use storage services on multiple public clouds to lower the cost (as they store your data forever) and stay flexible to archive precious memories for the next generations to come. An interesting use case and a business model that is between a charity (like Internet Archive) and a for-profit company (like ancestry.com).

Open source is made of people

The Linux Foundation’s Executive Director announced CommunityBridge, a platform created to empower open source developers and the individuals and organizations who support them, and to advance sustainability, security and diversity in open source technology. It’s a big step toward supporting people who want to contribute but don’t know where to start or don’t have the opportunity. CommunityBridge provides grants and access to world-class specialists to help people contribute to open source communities, grow and innovate together. Go check it out.

More food for thought from fellow community managers

As a community manager, I want to foster a strong and healthy community for Zenko. The Summit was a goldmine of ideas about what can be done better to sustain a happy, engaged community. The four most important nuggets of wisdom I gathered:

  • Diversity means strength – we have to make sure that clear routines and a nurturing environment are established for our communities to thrive.
  • Transparency and openness is the key – plans and roadmaps are open and every opinion in the community is heard and taken into consideration.
  • Anyone can contribute – anything new or existing members want to bring to the table is important and appreciated.
  • The future is in open source communities – the number of people involved as members with diverse expertise from all over the world grows every day, collaborating and creating together, it’s a fascinating place to be in right now.

The location was a treat!

The event took place at the Ritz hotel right on the coast in Half Moon Bay. You can check out more beautiful pictures on the Linux Foundation website. It was a great three days at the beginning of my Zenko journey. Now you’ll see me more regularly on this and other channels! Let’s create some exciting things together!

Images © Linux Foundation – (CC BY-NC 2.0)

Why setting standards for software engineers matters


When I work on my own personal software projects, I enjoy total freedom: in the technologies I use, in how I document design – typically on a piece of paper I’ve since lost – in my coding style, and in the kind of testing I do. I also don’t have any deadlines or customers waiting for my code. I practice my craftsmanship for my own enjoyment. Mind you, the team is made up of me, myself and I, focused on building a database of game-theory-optimal strategies for poker. And every once in a while, I still look at old code I wrote and wonder what the hell I meant to do!

On the other hand, when I work as part of a team, or more often as part of a team of teams… Well, we accomplish a lot more, in our case open source and mission-critical software for 250 of the largest companies in the world. It’s great to have this feeling that I contribute to solving big problems. But that also means I’ve had to maintain code written a few years ago by colleagues, often wasting hours trying to understand how it should work, what it does, what doesn’t work and guessing at the intent. I’ve picked up old trouble tickets with no understanding of what the customer impact was or what had been tried so far to resolve the issue. It can feel like the Tower of Babel, ready to topple at any moment.

On the right track

This isn’t exactly a new idea. A famous example is the gauge of train tracks in the United States. We all know the country was built on railways. Initially some Northern states simply adopted the English gauge standard – already a quirky 4 feet 8½ inches – but many other states, more rebellious and individualistic, came up with their own gauges, perhaps more efficient for their geography, perhaps for no particular reason, who knows? What happened is obvious: when a train had a load to carry from North to South, it had to stop at the state border, unload one train and put everything back on the next one. What a waste of time and energy! This is the reality for many software engineers today: they are trained and motivated to create elegant software that makes trains go super fast, but spend more than half their days offloading parcels from stopped trains. These fantastic problem solvers spend most of their time on menial, stupid tasks: a poor return on investment for the employer, and totally demotivating for engineers.

Standards allow engineers to do great work and stop wasting time on frustrating, mindless tasks.

That doesn’t mean standards are easy to implement. Software engineers are certainly problem solvers, great with numbers, but they are also very creative and artists at heart. I guarantee that someday soon we’ll see a piece of code hung on a wall at the Louvre. (It’s already getting closer.) As creative artists, we’re opinionated. Sometimes standards are just a best practice and it’s reasonably easy to agree on them. Sometimes they’re just a convention, no better than another, but we need everybody to respect them. It’s much harder to get developers to agree to stick to them consistently, but it’s worth the effort.

Set it, then forget it

The process we use at Scality for Zenko and RING is humble and semi-democratic. We shared a Google doc where engineers provided their input. As VP of engineering, I asked my team to think of cases where they wasted time and where standards would have avoided it. Well, I got loads of opinions. We had three categories: coding standards, design standards and ticket standards. My task now is to turn this input into a small initial set of actionable standards. By starting with a small set that is easy for all to agree on and act on, I feel we can build a strong habit of enjoying standards and sticking to them. Then over time we can gradually improve and add to The Standard. That way all of our work – from tickets to pull requests – will be consistent. We’ll make these available online on the Zenko forums so the larger community can participate in the discussion and use/modify them, too.

I’m confident this approach will help engineers spend less time on meaningless work and more on what they really want to do. At the end of the day, this is the most important task for an engineering manager: not telling engineers what to do, but removing all the idiot work so they can focus on creativity. Stay tuned for the next update to see how we’re doing!

Photo by Mike Enerio on Unsplash