Exploring open source license compliance for Docker containers

Open Source Software (OSS) Compliance is all about giving credit where credit is due and following the terms of use. Not doing so destroys others’ trust in you.

Complying is as easy as:

  1. Find your software’s build/run dependencies
  2. Find licenses for them
  3. Do what the licenses tell you to do

Sounds pretty straightforward, right? Enter the complicated world of containers!

Don’t get me wrong, we love them. Zenko is a distributed application packaged in Docker containers and orchestrated by Kubernetes. This architecture makes it possible to deploy your software almost anywhere, benefit from agile development workflows and ensure continuous uptime. But with great power comes great trouble, too.

Anatomy of a container

In established areas of open source software, it is well known that distributing software means inheriting all the license obligations of the dependencies. But when it comes to cutting-edge technology like containers it is almost like the “wild wild west” out there. Why is the landscape of tools and practices for OSS compliance in containerized applications so limited?

Although we have been using containers to ship applications for a relatively short time, adoption was so fast that the compliance part has been almost ignored. The ghosts of failing to comply with OSS licenses are now coming back to haunt us.

I covered Docker container security issues in last week’s post, and I think those two issues (security and OSS compliance) are related and have similar challenges. Both security vulnerabilities and failing to comply with open source licenses come from the same place – the shady nature of container images.

How a Docker container is usually created:

  1. We download some images from a public repository to use as a base. At that point, all that gives us is a collection of tarballs containing files.
  2. We copy our application to the container (probably the only files that we know anything about).
  3. We execute scripts to install dependencies, build binaries, etc.
  4. We upload the resulting image to a repository.

This image is now shipped to the customer or other users. Now we are distributing not only our application but everything inside this image, and we are responsible for complying with all the licenses.
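
To make that concrete, here is a minimal sketch of the flow above using standard Docker commands; the image names are hypothetical placeholders.

# Step 1: pull a base image (someone else's layers) from a public registry
$ docker pull node:10-slim

# Steps 2-3: the Dockerfile copies our application and installs dependencies during the build
$ docker build -t myorg/myapp:1.0 .

# Step 4: publish the resulting image - we now distribute every layer inside it
$ docker push myorg/myapp:1.0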

OSS compliance for containers

You should set up your compliance strategy from at least two angles.

Angle 1 – Control the process of creating and building images

Use only “known-good” OS base images, where you know you can get the list of all installed software along with licenses and sources (for example, Debian’s package manager can do that). On top of that, use build manifests to install the software you need and keep track of it. Avoid using Docker multi-stage build functions, as they are not reproducible.
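
For example, with a Debian-based base image you can pull the package list and the per-package license information straight out of the image. A rough sketch (the image tag is just an example):

# List every installed package and its version inside the base image
$ docker run --rm debian:stable dpkg-query -W -f '${Package} ${Version}\n'

# Debian keeps license/copyright information per package under /usr/share/doc/<package>/copyright
$ docker run --rm debian:stable cat /usr/share/doc/coreutils/copyright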

Angle 2 – Employ scanning tools

I want to make one thing clear right away – there is no tool that will take care of everything. You need to put a process in place that combines scanning Docker images with manual checks and research.

Tools from Linux Foundation

The Linux Foundation recently launched the Automated Compliance Tooling (ACT) project in an attempt to drive and help with OSS license compliance. Four projects are part of ACT; here are the first three you should check out (the fourth, Tern, is covered in the next section):

  • FOSSology – an open source license compliance toolkit that lets users run and export license and copyright scans (a quick way to try it is sketched just after this list).
  • QMSTR – this tool creates an integrated open-source toolchain that implements industry best practices of license compliance management.
  • SPDX Tools – Software Package Data Exchange (SPDX) is an open standard for communicating software bill of material information including components, licenses, copyrights and security references.
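
Speaking of FOSSology, if you want to kick the tires, its maintainers publish a Docker image. This is a minimal sketch; the port mapping is just an example:

# Run the FOSSology web UI locally, then browse to http://localhost:8081/repo and upload packages to scan
$ docker run -p 8081:80 fossology/fossology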

Container compliance scanners

  • VMware also donated its Tern project to ACT to help cover compliance automation in the container space. It offers a few different formats for the scan results. Deployment options: Docker, Vagrant, Linux.
  • Another tool I came across is Container-diff – it analyzes images in a similar way to Tern but also offers a comparison between two images. You can use it to keep track of changes made between different versions of your images. Deployment options: Linux, macOS, Windows. A quick look at how both tools are invoked follows this list.
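
These invocations are a rough sketch based on the tools' documentation at the time of writing; double-check the flags with --help and treat the image names as placeholders.

# Tern: generate a license/component report for an image (formats include SPDX tag-value)
$ tern report -f spdxtagvalue -i debian:buster -o image-report.spdx

# container-diff: list apt packages in one image, or compare packages between two image versions
$ container-diff analyze daemon://myorg/myapp:1.0 --type=apt
$ container-diff diff daemon://myorg/myapp:1.0 daemon://myorg/myapp:1.1 --type=apt --type=pip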

Container security scanners

Security scanners are another set of software that can help you with container images. The variety of tools in this domain is more robust, maybe because security breaches can get far messier than OSS compliance issues. You can use these scanners not only to check against known vulnerabilities but also to produce a very thorough SBOM (software bill of materials). In essence, these tools are scanners that decompose container images into their building blocks (dependencies). In my humble opinion, a handful of tools lead the pack.

To learn more and see the longer list, check out my previous post on Docker container security tools.

What did we learn?

We should be more mindful when reusing Docker images and building on top of them. It is also necessary to create a workflow that includes some automation tools (scanners) and manual checks.

With more developers and leaders talking about the importance of open source compliance, we will hopefully soon see shipping fully compliant containers become a normal, more streamlined practice. As that happens, tooling will evolve to automate the process and make it easier.

Share your thoughts or ask a question on the forum.

5 free tools to navigate through Docker containers’ security

In this day and age, you are either already using Docker containers or considering using them. Containers have made a huge impact on the way teams architect, develop and ship software. No wonder – they are lightweight and scalable, and they help us create an extremely portable environment to run our applications anywhere.

The problem with containers

To understand the problem we need to get our basics down. A container is an instance of an executable package that includes everything needed to run an application: code, configuration files, runtime, libraries and packages, environment variables, etc.

A container is launched based on something called an image, which consists of a series of layers. For Docker, each layer represents an instruction in a text file called a Dockerfile. A parent image is the base on which your custom image is built. Most Dockerfiles start from a parent image.

When talking about container images, we often focus on one particular piece of software that we are interested in. However, an image includes the whole collection of software that plays a supporting role to the featured component. Even a developer who regularly works with a particular image may have only a superficial understanding of everything in the image.

It’s time-consuming to track all the libraries and packages included in an image once it’s built. Moreover, developers casually pull images from public repositories where it is impossible to know who built an image, what they used to build it and what exactly is included in it. But when you ship your application along with everything that is in the container, you are responsible for security. If there is a security breach, it is your reputation that could be destroyed.
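
Even before reaching for a dedicated scanner, Docker itself can give you a first, shallow look at what an image contains and how it was built. A quick sketch, with the image name as a placeholder:

# Show the instruction behind each layer of the image
$ docker history --no-trunc myorg/myapp:1.0

# Dump image metadata: layers, environment variables, labels, entrypoint, etc.
$ docker image inspect myorg/myapp:1.0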

Container Scanners

It is so difficult to track what is going on under the hood of a container image. Image scanners have emerged to address this issue, giving users varying degrees of insight into Docker container images. Most of the tools execute the same set of actions:

  • Perform a binary scan of the Docker image, deconstruct it into layers and put together a detailed bill of materials of its contents.
  • Take a snapshot (index) of the OS and packages.
  • Compare this bill of materials against a database of known vulnerabilities and report any matches.

Even though those tools are similar they are not the same. And when choosing one to use, you need to consider how effective they are:

  • How deep can the scan go? In other words, the scanner’s ability to see inside the image layers and their contents (packages and files).
  • How up-to-date the vulnerability lists are.
  • How the results of the scan are presented, in which form/format.
  • Capabilities to reduce noisy data (duplication).

5 tools to consider

Clair – a tool from the well-known and loved CoreOS. It is a scanning engine for static analysis of vulnerabilities in containers or clusters of containers (like Kubernetes). Static means that the actual container image doesn’t have to be executed, so you can catch security threats before they enter your system.

Clair maintains a comprehensive vulnerability database built from configured CVE resources. It exposes APIs that clients can invoke to perform scans of images. A scan indexes the features present in the image and stores them in the database. Clients can then use the Clair API to query the database for vulnerabilities of a particular image.

Anchore – a well-maintained and powerful automated scanning and policy enforcement engine that can be integrated into CI/CD pipelines to scan Docker images. Users can create whitelists and blacklists and enforce rules.

It is available as a free online SaaS navigator to scan public repositories, and as an open source engine for on-prem scans. The on-prem engine can be wired into your CI/CD through CLI or REST to automatically fail builds that don’t pass defined policies.

Below is an example of Anchore scan results for the Zenko CloudServer Docker image (the list of Node.js dependencies).

Anchore Engine ultimately provides a policy evaluation result for each image: pass/fail against policies defined by the user. Even though it comes with some predefined security and compliance policies, functions and decision gates, you can also write your own analysis modules and reports.
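
As a rough idea of what the on-prem workflow looks like with the Anchore CLI (this assumes a running Anchore Engine and the anchore-cli client; the image name is just an example, so check the CLI help for your version):

# Submit an image for analysis and wait for it to finish
$ anchore-cli image add docker.io/library/debian:latest
$ anchore-cli image wait docker.io/library/debian:latest

# List known vulnerabilities (OS and non-OS packages)
$ anchore-cli image vuln docker.io/library/debian:latest all

# Evaluate the image against the configured policy bundle: pass/fail
$ anchore-cli evaluate check docker.io/library/debian:latest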

Dagda – a tool for static analysis of known vulnerabilities in Docker images and containers. Dagda retrieves information about the software installed in your Docker images, such as OS packages and programming-language dependencies, and checks each product and version against known-vulnerability information previously stored in a MongoDB instance. This database includes known vulnerabilities as CVEs (Common Vulnerabilities and Exposures), BIDs (Bugtraq IDs), RHSAs (Red Hat Security Advisories) and RHBAs (Red Hat Bug Advisories), as well as known exploits from the Offensive Security database.

On top of that, it uses ClamAV to detect viruses and malware. I also want to note that all reports from scanning the image/container are stored in MongoDB, where the user can access them.

Docker Bench for Security – the Center for Internet Security came up with a solid step-by-step guide on how to secure Docker. As a result, the Docker team released a tool (a shell script) that runs as a small container and checks for these best practices around deploying Docker containers in production.
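
Running it is as simple as cloning the repository and executing the script on the Docker host; a minimal sketch:

$ git clone https://github.com/docker/docker-bench-security.git
$ cd docker-bench-security
# Runs the CIS Docker Benchmark checks against the local daemon, images and running containers
$ sudo sh docker-bench-security.sh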

OpenSCAP – this is a full ecosystem of tools that assist with measurement and enforcement of a security baseline. They have a specific container-oriented tool, oscap-docker, that performs the CVE scans of containers and checks it against predefined policies.

OpenSCAP Base is the NIST-certified command-line scanner. SCAP Workbench is a graphical user interface that presents the scanner’s results and aims to be intuitive and user-friendly.
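
Here is a rough sketch of a CVE scan with oscap-docker, assuming the OpenSCAP packages are installed on the host; the image and container names are examples, so check oscap-docker --help for the exact subcommands in your version:

# Scan an image (without running it) against the distribution's CVE feed
$ sudo oscap-docker image-cve docker.io/library/centos:7

# Scan an already running container by name or ID
$ sudo oscap-docker container-cve my-running-container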

Wrap up

These tools appeared because Docker’s popularity has grown so fast. Only two years ago, it would have been hard to trust them, as they were only starting to pop up. Today, they are more mature and better adapted to Docker containers and the challenges that came with the rise of this technology.

Next week, I will go through other tools and scanners that are more OSS compliance-oriented.

That’s it for today. Stay safe and let’s chat on the forum.

A primer on open source license compliance

With open source software ubiquitous and irreplaceable, setting up a license compliance and procurement strategy for your business is indispensable. No software engineer I know voluntarily wants to talk about open source compliance, but not doing it can lead to a lot of pain. Do you remember the gpl-violations.org lawsuits against D-Link, TomTom and many others in the 2000s?

It’s better to keep in mind open source license compliance from the early stages of development when creating a product: you want to know where all its parts are coming from and if they are any good. Nobody thinks they will be asked for the bill of material of their software product until they are.

“Open source compliance is the process by which users, integrators, and developers of open source software observe copyright notices and satisfy license obligations for their open source software components.” (The Linux Foundation)

Objectives for open source software (OSS) compliance in companies:

  • Protect proprietary IP
  • Facilitate the effective use of open source software
  • Comply with open source licensing
  • Comply with the third-party software supplier/customer obligations

What is a software license?

Put very simply, a software license is a document that states what users are permitted to do with a piece of software. Open source software (OSS) licenses are licenses that the Open Source Initiative (OSI) has reviewed and approved as conforming to the Open Source Definition. There are approximately 80 open source licenses (OSI maintains a list and so does the Free Software Foundation, although it calls them “free software” licenses), split between two larger families:

  • So-called “copyleft” licenses (such as GPLv2 and GPLv3) are designed to guarantee users long-term freedoms and to make it harder to lock the code into proprietary/non-free applications. The most important clause is that if you want to modify software under a copyleft license, you have to share your modifications under a compatible license.
  • Permissive/BSD-like open source licenses guarantee the freedom to use the source code, modify it and redistribute it, including as part of a proprietary product (for example, MIT and Apache).

Despite the variety of existing licenses, companies sometimes invent new ones, modify them with new clauses and apply them to their products. This creates even more confusion among engineers. If your company is looking to use open source software, tracking and complying with every open source license and hybrid can be a nightmare.

Establish an open source license compliance policy

The goal is to have a full inventory of all the open source components in use and their dependencies. It should be clear that there are no conflicts between licenses, that all clauses are met and that the necessary attributions to the authors are made.

Whether you have an open source project using other open source components, or a proprietary project using open source components, it is important to establish a clear policy regarding OSS compliance. You want to create a solid, repeatable policy to outline what licenses are acceptable for your specific project.

Ways to execute OSS compliance

Manual

A surprising number of companies are still using this approach. Basically, you create a spreadsheet, manually fill it in with components, versions and licenses, and analyze it against your policy.

This works out well for smaller projects if they establish a compliance policy (a list of licenses or clauses acceptable in the company) from the beginning to spare themselves trouble in the future. In this scenario, every developer must review and log a component’s license before introducing it.

The downside of this approach is that as the quantity of OSS components in the project grows, it becomes more difficult to keep track of the relationships between licenses (whether they all work together or there are conflicts). It is vital to list dependencies as well, as a dependency might have a different license than the library you are actually using.
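
For example, in a Node.js project the manual check might look like the following small sketch; the package names are arbitrary examples:

# Show the full dependency tree, including transitive dependencies
$ npm ls --all

# Check the declared license of a direct dependency and of one of its transitive dependencies
$ npm view express license
$ npm view qs license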

Semi-Automated OSS Compliance

This is a more reliable approach and is becoming more popular as the importance of open source compliance grows along with the risks of ignoring it. There are so many tools available that it can get overwhelming. Why semi-automated? Because there are always false positives when the license is not explicitly referenced in the header, and you still have to read through some licenses to discover special terms or conditions.

Of the tools I’ve seen, there seem to be four main approaches used:

  1. File scanners – these usually involve all sorts of heuristics to detect licenses or components that would otherwise be missed by developers, and they usually offer several output formats (see the ScanCode sketch just after this list).
  2. Code scanners – exactly what it sounds like. You can use them periodically to check for new open source components.
  3. CI (continuous integration) scanners – these tools integrate into continuous integration or build tools and automatically detect all open source components in the code every time you run a build. The idea is to create a unique identifier for each open source component in the build and reference it against a database of existing components. You can also set policies to break the build if a blacklisted license is found.
  4. Component identification tools – these tools can help you produce SBOM (software bill-of-material), the list of OSS components in your product.
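
As one example of the file-scanner approach, ScanCode Toolkit can be pointed at a source tree to detect licenses and copyrights. A minimal sketch, with the output path chosen arbitrarily:

# Detect licenses and copyright statements and write a pretty-printed JSON report
$ scancode --license --copyright --json-pp scan-results.json path/to/your/code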

Tools worth checking

Open source and free tools:

Proprietary Tools:

And so many more…

Organizations:

Conclusions

For smaller projects, fully manual tracking might be sufficient to achieve license compliance. For more complex projects, especially ones built in an agile style with regular releases, automation is better. Whichever way you choose to handle OSS compliance, you should not ignore it, for the sake of your project and of sustaining the open source community. Come by our forum to discuss or ask me questions.

How to deploy Zenko 1.1 GA on bare metal, private or public cloud

We have been working hard on the Zenko 1.1 release, and finally it is here! Thanks to the dedicated and tireless work of the Zenko team, our newest release comes with an array of useful new features. Now is a good time to try Zenko: you can deploy it on a managed Kubernetes (Azure, Amazon, Google) or on Minikube for a quick test. But what if you want to run Zenko on bare metal or in your own cloud? We suggest you deploy it on MetalK8s, an open source opinionated distribution of Kubernetes with a focus on long-term on-prem deployments. MetalK8s is developed at Scality to provide great functionality while reducing complexity for users and delivering efficient access to local stateful storage.

This tutorial comes from our core engineering team, and we use it on a daily basis to deploy and test Zenko. This guide has been developed as a collective effort from contributions made in this forum post.

Here are the steps we use to deploy Zenko 1.1 on our OpenStack-based private cloud. Let’s do this!

Part 1: Deploying MetalK8s

This tutorial creates a Zenko instance distributed across three nodes, but you can always repurpose it for as many servers as you wish.

1. Create three instances with the following characteristics:

  • Operating system: CentOS-7.6
  • Size: 8 CPUs and 32GB of RAM

2. If you are deploying on a private cloud, create the following volumes (type: SSD):

  • one volume with a 280GB capacity
  • two volumes with a 180GB capacity

3. Attach a volume to each instance

4. SSH into a node:

$ ssh -A centos@<node-ip>

Pro-tip: If you use ssh -A from your computer into the first node this will forward your authentication agent connection and allow native ssh access to the remaining nodes in your cluster.

5. $ sudo yum install git vim -y
   $ git clone https://github.com/scality/metalk8s
   $ cd metalk8s/

6. Check out the current stable version of MetalK8s:

$ git checkout tags/1.1.0
$ mkdir -p inventory/zenko-cluster/group_vars
$ cd inventory/zenko-cluster/
7. $ vim hosts

Copy the following into your hosts file and update the IPs to your instance IPs:

# Floating IP addresses can be specified using the var `access_ip=<ip-address>` on the line corresponding to the attached server
node-01 ansible_host=10.200.3.179 ansible_user=centos # server with the larger volume attached
node-02 ansible_host=10.200.3.164 ansible_user=centos # server with the smaller volume attached
node-03 ansible_host=10.200.2.27  ansible_user=centos # server with the smaller volume attached

[bigserver]
node-01

[smallserver]
node-02
node-03

[kube-master]
node-01
node-02
node-03

[etcd]
node-01
node-02
node-03

[kube-node:children]
bigserver
smallserver

[k8s-cluster:children]
kube-node
kube-master
8. $ vim group_vars/bigserver.yml

Run this command, then copy the following into bigserver.yml (this is for the server that will provision the Zenko Local Filesystem):

metalk8s_lvm_drives_vg_metalk8s: ['/dev/vdb']
metalk8s_lvm_lvs_vg_metalk8s:
  lv01:
    size: 100G
  lv02:
    size: 54G
  lv03:
    size: 22G
  lv04:
    size: 12G
  lv05:
    size: 10G
  lv06:
    size: 6G

Note: /dev/vdb on the first line is the default location of a newly attached drive. If this location is already in use on your machine, you need to change this part. For example:

/dev/vda
/dev/vdb
/dev/vdc
etc...
9. $ vim group_vars/smallserver.yml

Run this command, then copy the following into smallserver.yml:

metalk8s_lvm_drives_vg_metalk8s: ['/dev/vdb']
metalk8s_lvm_lvs_vg_metalk8s:
  lv01:
    size: 54G
  lv02:
    size: 22G
  lv03:
    size: 12G
  lv04:
    size: 10G
  lv05:
    size: 6G

10. This step is optional but highly recommended:

$ vim group_vars/all

Paste this into the group_vars/all and save:

kubelet_custom_flags:
  - --kube-reserved cpu=1,memory=2Gi
  - --system-reserved cpu=500m,memory=1Gi
  - --eviction-hard=memory.available<500Mi

This adds resource reservations for system processes and k8s control plane along with a pod eviction threshold, thus preventing out-of-memory issues that typically lead to node/system instability. For more info see this issue.

11. Return to the metalk8s folder

$ cd ~/metalk8s

12. And run the virtual environment

$ make shell

13. Make sure that you have ssh access to every other node in your cluster and run the following:

$ ansible-playbook -i inventory/zenko-cluster -b playbooks/deploy.yml

Deployment typically takes between 15 and 30 minutes. Once it is done, you will see a URL for Kubernetes dashboard access along with a username/password in the output of the last task.

Notes

If you forget this password or need access to it again, it is saved under:

metalk8s/inventory/zenko-cluster/credentials/kube_user.creds

The MetalK8s installation created an admin.conf file:

metalk8s/inventory/zenko-cluster/artifacts/admin.conf

This file can be copied from your deployment machine to any other machine that requires access to the cluster (for example if you did not deploy from your laptop)

MetalK8s 1.1 is now deployed!

Part 2: Deploying Zenko 1.1

1. Clone Zenko repository:

$ git clone https://github.com/scality/zenko ~/zenko
$ cd zenko/

2. Grab the fresh Zenko 1.1 release:

$ git checkout tags/1.1.0
$ cd kubernetes/

3. The MetalK8s installation from Part 1 provides you with the latest version of helm. Now it’s time to actually deploy the Zenko instance on the three nodes we have prepared.

Run this command:

$ helm install --name zenko --set ingress.enabled=true \
  --set ingress.hosts[0]=zenko.local \
  --set cloudserver.endpoint=zenko.local zenko

4. Wait about 15-20 minutes while the pods stabilize.

5. You can confirm that the Zenko instance is ready when all pods are in the running state. To check:

$ kubectl get pods

Note

It is expected that the queue-config pods will multiply until one succeeds. Any “Completed” or “Error” queue-config pods can be deleted.
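
A quick way to find and remove those finished pods, assuming default kubectl access to the cluster (the pod name below is a placeholder):

# List the queue-config pods and their states
$ kubectl get pods | grep queue-config

# Delete any that show Completed or Error
$ kubectl delete pod <queue-config-pod-name>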

Zenko is now deployed!

Part 3: Registering your Zenko instance with Orbit

Orbit is a cloud-based GUI portal to manage the Zenko instance you deployed in the previous two parts. It gives you insight into metrics and lets you create policies and rules to manage the data and replicate it between different public clouds. Here are the steps to register Zenko with Orbit.

1. Find cloudserver manager pod:

$ kubectl get pods | grep cloudserver-manager

2. Use the pod name to find the Zenko instance ID:

$ kubectl logs zenko-cloudserver-manager-7f8c8846b-5gjxk | grep 'Instance ID'

3. Now take the Instance ID from the logs and head to Orbit to register your Zenko instance.

Your Orbit instance is now registered!

If you successfully launched a Zenko 1.1 instance with MetalK8s and Orbit using this tutorial, let us know. If you get stuck or have any questions, visit the forum and we can troubleshoot any issues together. Your input will also help us refine and update this tutorial along the way. We’re always looking for feedback on our features and tutorials.

What are Kubernetes Operators and why you should use them

First, containers and microservices transformed the way we create and ship applications, shifting the challenge to orchestrating many moving pieces at scale. Then Kubernetes came to save us. But the salty “helmsman” needs a plan to steer a herd of microservices, and Operators are the best way to do that.

Hello Operator, what are you exactly?

The most commonly used definition online is: “Operators are the way of packaging, deploying and managing your application that runs atop Kubernetes.” In other words, Operators help in building cloud-native applications by automating deployment, scaling, backup and restore – all while being a Kubernetes-native application itself, and thus almost completely independent of the platform it runs on.

CoreOS (which originally proposed the Operator concept in 2016) suggests thinking of an Operator as an extension of the software vendor’s engineering team that watches over your Kubernetes environment and uses its current state to make decisions in milliseconds. An Operator is essentially codified knowledge of how to run a Kubernetes application.

Why Operators?

Kubernetes has been very good at managing stateless applications without any custom intervention.

But think of a stateful application, like a database running on several nodes. If a majority of nodes go down, you’ll need to restore the database from a specific point by following specific steps. Scaling nodes up, upgrading or disaster recovery – these kinds of operations require knowing the right thing to do. And Operators help you bake those difficult patterns into a custom controller.

Some perks you get:

  • Less complexity: Operators simplify the processes of managing distributed applications. They take the Kubernetes promise of automation to its logical next step.
  • Transferring human knowledge to code: very often application management requires domain-specific knowledge. This knowledge can be transferred to the Operator.
  • Extended functionality: Kubernetes is extensible – it offers interfaces to plug in your own network, storage and runtime solutions. Operators make it possible to extend the K8s APIs with application-specific logic!
  • Useful in most modern settings: Operators can run wherever Kubernetes can run: on public, hybrid or private clouds, multi-cloud or on-premises.

Diving deeper

An Operator is basically a Kubernetes Custom Controller managing one or more Custom Resources. Kubernetes introduced custom resource definitions (CRDs) in version 1.7 and the platform became extensible. The application you want to watch is defined in K8s as a new object: a CRD that has its own YAML file and object type that the API server can understand. That way, you can define any specific criteria in the custom spec to watch out for.

A CRD is a means of specifying a configuration. The cluster needs controllers to monitor its state and match it with that configuration. Enter Operators. They extend K8s functionality by letting you declare a custom controller that keeps an eye on your application and performs custom tasks based on its state. The way an Operator works is very similar to native K8s controllers, but it mostly uses custom components that you define.
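
To make that concrete, here is a minimal, hypothetical CRD applied from the shell; the group, kind and schema are made up purely for illustration, and a real Operator would pair this with a controller that reconciles MyApp objects.

$ cat <<'EOF' | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: myapps
    singular: myapp
    kind: MyApp
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
EOF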

This is a more specific list of what you need in order to create your custom operator:

  • A custom resource (CRD) spec that defines the application we want to watch, as well as an API for the CR
  • A custom controller to watch our application
  • Custom code within the new controller that dictates how to reconcile our CR against the spec
  • An operator to manage the custom controller
  • Deployment for the operator and custom resource

Where to start developing your Operator

Writing a CRD schema and its accompanying controller can be a daunting task. Currently, the most commonly used tool to create Operators is the Operator SDK. It is an open-source toolkit that makes it easier to build and manage Kubernetes-native applications (Operators). The framework also includes the ability to monitor and collect metrics from Operator-built clusters and to administer multiple Operators with a lifecycle manager.
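
As an illustration of the scaffolding workflow, recent Operator SDK releases work roughly like this (older releases used operator-sdk new instead; the domain, group and kind below are hypothetical, so check operator-sdk --help for your version):

# Scaffold a new Go-based operator project
$ operator-sdk init --domain example.com --repo github.com/example/myapp-operator

# Generate a custom resource API and its controller skeleton
$ operator-sdk create api --group demo --version v1alpha1 --kind MyApp --resource --controller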

You should also check this Kubernetes Operator Guidelines document on design, implementation, packaging, and documentation of a custom Operator.

Creating an Operator usually starts with automating an application’s installation and then matures to perform more complex automation. So I would suggest starting small and getting your feet wet by creating a basic Operator that deploys an application or does something small.

The framework has a maturity model for the provided tools you can use to build an Operator. As you can see, using the Helm Operator Kit is probably the easiest way to get started, but it is not as powerful if you wish to build a more sophisticated tool.

Operator maturity model from Operator SDK

Explore other operators

The number of custom Operators for well-known applications is growing every day. In fact, Red Hat, in collaboration with AWS, Google Cloud and Microsoft, launched OperatorHub.io just a couple of months ago. It is a public registry for finding Kubernetes Operator-backed services. You might find one that is useful for some components of your application, or you can list your custom Operator there.

Wrapping up

Kubernetes coupled with Operators provides cloud-agnostic application deployment and management. It is so powerful that it might lead us to treat cloud providers almost like a commodity, as you will be able to migrate freely between them and offer your product on any possible platform.

But is this a step toward making Kubernetes easier, or does it actually add even more complexity? Is it yet another available tool that just makes things more complicated for someone new? Is it all just going to explode in our face? So many questions…

If you have any thoughts or questions stop by the forum 🙂