The many questions that help guide software engineers building a proof-of-concept

Engineering teams are often asked by a marketing team or product manager to develop a minimal viable product (MVP) based on customer needs. It is the engineering team’s task to demonstrate the feasibility of the product by building a proof-of-concept (PoC). The path that engineers take at this stage can have a major impact on the quality of the product because there is no other time when it is as simple to discard code in the interest of a better solution. For this reason, it’s ideal to use an iterative process when working on the PoC, where each iteration helps answer questions that serve the product in the long-term. Below are some questions to help guide that process.

Application Behavior

  • Have we clearly defined the expected behavior of the application (i.e. provided an operational definition of “correctness”)?
  • Have we documented input/output expectations of the program? (This has the added benefit of helping to define a test plan/acceptance criteria.)
  • Have we defined what measurements can be used to evaluate the success of our system?
  • Have we defined a service level agreement? Are there any inherent conflicting requirements (e.g. an API that needs a response time guarantee that cannot be met)? If so, what high-level design or expectation changes can we make?
  • Have we defined performance and time complexity requirements?
  • Have we defined the sizing limits of the system? How can we ensure they are optimal?
  • Have we considered potential scalability issues? What is the optimal run time for each critical part of the application? Where could potential bottlenecks be, and can they be avoided or mitigated?
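
One way to make the input/output expectations and a response-time requirement concrete is to write them down as data and check them automatically. The sketch below is only an illustration: `slugify`, the example cases, and the 0.5 second budget are all hypothetical stand-ins for whatever the PoC actually does.

```python
import time


def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())


# Documented input/output expectations, doubling as acceptance criteria.
EXPECTATIONS = [
    ("Hello World", "hello-world"),
    ("  Proof of   Concept ", "proof-of-concept"),
]

# Assumed response-time budget per call, in seconds.
LATENCY_BUDGET_S = 0.5


def check(fn, cases, budget_s):
    """Return a list of human-readable failures; an empty list means success."""
    failures = []
    for given, expected in cases:
        start = time.perf_counter()
        actual = fn(given)
        elapsed = time.perf_counter() - start
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
        if elapsed > budget_s:
            failures.append(f"{given!r}: took {elapsed:.3f}s, budget {budget_s}s")
    return failures


failures = check(slugify, EXPECTATIONS, LATENCY_BUDGET_S)
print(failures or "all expectations met")
```

Keeping the expectations in a plain list makes it cheap to throw away the implementation between iterations while keeping the definition of "correct".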

Product Delivery

  • How much time do we have to implement the application?
  • Have we clearly defined what a “proof-of-concept” solution looks like, and how can we know when we’ve arrived at one?
  • What is the most minimal way of implementing the product without sacrificing readability and maintainability?
  • Are there any languages or frameworks that can support the implementation? What are the pros and cons of using each language or framework?

Process Evaluation

  • How can we be confident we’ve arrived at a good answer to our questions?
  • Have we resolved all the known unknowns (e.g. have we done the needed research)?
  • Have we minimized unknown unknowns by seeking different perspectives, brainstorming potential failures, or confirming expectations with stakeholders?

What questions do you ask when developing a PoC? Let us know in the comments below.

Photo by Bonnie Kittle on Unsplash

Spread Data Across 100s of Datacenters With No Performance Hit

We talked about QuadIron in more detail during our #ZenkoLive chat. In case you missed the talk, here is the recording of the session. The main feature of QuadIron is its speed compared to other erasure coding libraries. In the previous blog post we highlighted the math behind it; the ZenkoLive session demonstrated its power. We took a video file of ~90MB and split it into 90 fragments of 1MB each. For these fragments we generated 160 parities, for a total of 250 pieces and a storage overhead of about 2.78x (250/90). We then deleted 100 parities to simulate the loss of 100 disks. Using QuadIron, we rebuilt the video file in seconds.
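
The arithmetic behind the demo can be checked with a few lines of Python. This is a sketch of the bookkeeping only, not of QuadIron itself, and it assumes the code is maximum distance separable, meaning that any 90 of the 250 pieces are enough to rebuild the file. The function name is made up for this example.

```python
def erasure_summary(data_fragments: int, parity_fragments: int, lost: int) -> dict:
    """Summarize an erasure-coded layout after `lost` pieces disappear."""
    total = data_fragments + parity_fragments
    surviving = total - lost
    return {
        "total_pieces": total,
        # Storage overhead: stored pieces per piece of original data.
        "overhead": total / data_fragments,
        "surviving": surviving,
        # Assumption: any `data_fragments` surviving pieces suffice (MDS code).
        "recoverable": surviving >= data_fragments,
    }


summary = erasure_summary(data_fragments=90, parity_fragments=160, lost=100)
print(summary)
```

With these numbers, 150 pieces survive, which is still more than the 90 needed, so the file can be rebuilt. Losing 161 pieces would leave only 89 and make recovery impossible.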

Watch the recording below and ask questions on the forum.

How to port S3 apps to Azure with no changes

Zenko Connect for Azure enables developers to immediately consume Azure Blob Storage with Amazon S3 applications without any application modifications. Based on the open source Zenko CloudServer code, it’s a free and easy tool to jump from S3 to Azure Blob Storage quickly.

Zenko Connect for Azure provides an Amazon Web Services (AWS) S3 API-compatible front end translator to Microsoft’s cloud storage service, Azure Blob Storage. The core capability of Zenko Connect is translation of S3 API calls into Azure Blob Storage API calls, for application-driven operations on S3 Buckets and Objects. This enables S3-enabled applications to access Azure Blob Storage services natively, without changing their storage API calls.

Zenko Connect for Azure is offered as a free application in the Microsoft Azure Marketplace (the only charges are for Azure infrastructure costs).

Step into a multi-cloud world with the free app Zenko Connect for Azure

Zenko Connect is a stateless service. It maintains and stores all data and metadata in the associated Azure Blob Storage account. The advantages of this stateless model are scale-out, load balancing, and simplified failover.

Zenko Connect maps Amazon S3 buckets to Azure Blob Storage accounts and containers. As an application creates S3 objects in an S3 bucket, Zenko Connect stores them as blobs in the associated Azure Blob Storage account or container.
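
To picture the mapping, here is a minimal sketch that assumes the simplest possible scheme: the S3 bucket name becomes the Azure container name and the S3 object key becomes the blob name. This is an illustration, not Zenko Connect's actual implementation; `blob_url` and the account name are hypothetical. The only fixed part is the standard Azure Blob Storage URL layout, `https://<account>.blob.core.windows.net/<container>/<blob>`.

```python
from urllib.parse import quote


def blob_url(account: str, bucket: str, key: str) -> str:
    """Hypothetical mapping: S3 bucket -> Azure container, S3 key -> blob name."""
    # quote() percent-encodes unsafe characters but keeps "/" in the key.
    return f"https://{account}.blob.core.windows.net/{bucket}/{quote(key)}"


url = blob_url("myaccount", "photos", "2018/cat 1.jpg")
print(url)
```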

In this release of Zenko Connect for Azure (v. 1.0), API support focuses on:

  • Core create, read, update, and delete (CRUD) S3 operations on buckets and objects
  • Efficient upload of large objects through the S3 multi-part upload (MPU) APIs

Check the full Zenko Connect for Azure startup guide and full documentation to learn more.

Photo by rawpixel on Unsplash

Funky Foxes series: Meet Julien!

Julien took part in the 2017 Zenko x 42Paris hackathon, and was a member of the winning team! One of our open source contributors, he agreed to have his portrait featured here!

What’s your role in Zenko? How did you get involved in it? How long have you been involved with it?
Scality organized a hackathon at my school, 42 Paris. My team carried on with the hackathon’s project for 3 months to allow Zenko to support the Backblaze B2 API for object storage. Our work is currently a pull request, and we got school credit for it.

What other open source projects have you worked on? Which ones are you currently involved with?

For now I have only worked on an open source project once but, soon, I would like to learn Golang to work on Terraform, a project by HashiCorp.

Which of your skills helped you the most with Zenko?

Before working on Zenko’s CloudServer, I had never used Node.js so, at the beginning, it was tricky! It’s very difficult to understand callbacks in a huge codebase like Zenko’s CloudServer, so I would say my main skills were bravery and obstinacy.

What do you think makes Zenko special?

Zenko gives you the possibility to work with multiple cloud providers. That is essential to keep independence from companies like Microsoft, Google or Amazon. It also lets you add support for other cloud providers like Backblaze, with much cheaper rates than the big ones.

Favorite food?

Raclette !

Favorite place?

Paris (France)!

Would you rather start every sentence with “This was going to be a joke” or end every sentence with “and then I wish I knew better”?

“This was going to be a joke”

Would you rather everything you say never happens or everything you think always happens?

Everything I say never happens

Would you rather start a new Github account after you reached fame and prosperity with the current one, or have to rename that famous account i-make-great-bugs?

Have it renamed i-make-great-bugs

Would you rather never go outside again, or never go inside again?

Never go inside again.

Find Julien on GitHub @jjourdai.

Funky Foxes Series: Meet Giacomo!

Giacomo is one of Zenko’s core team developers and he works from the SF office. Below, a profile and interview:

Giacomo Guiulfo, Scality

What’s your role in Zenko? What made you decide to get involved in it? What are you most excited about?

My role in Zenko is simple: to implement new features and ensure their proper functioning to deliver an outstanding product. As a Software Developer, the tasks that fulfill this role include: writing well-documented code, designing and implementing different kinds of tests, and applying DevOps practices to improve infrastructure and continuously deliver Zenko.

The first time I heard about Zenko was during a hackathon that Scality organized at my school, 42 Silicon Valley. At that time, my exposure to the tech industry and especially the cloud landscape was still fairly new, and even though Zenko looked very promising, I didn’t quite understand its full potential. Nevertheless, half a year later I joined Scality and couldn’t be more excited to work on Zenko, this time with more knowledge and a better picture of what it means to the tech world. I’m excited to grow with a community that has great potential to cause an impact on how we think about the cloud. On the technical side, I’m excited to work with Kubernetes (especially metal-k8s, our own flavour of Kubespray), a very promising open-source container orchestration system that is used to deploy Zenko.

Why does multi-cloud storage matter?

These days we live in an application-centric world where the enterprise needs to solve problems using the best and most appropriate tools out there. Different clouds, whether public or private, have their own benefits and provide services that can be particularly useful for an application. That’s why multi-cloud is becoming the norm for designing cloud infrastructures. However, in terms of storage, many companies still want to manage their data locally in their own data centers, which prevents them from leveraging the true potential that clouds offer. Multi-cloud storage solves this problem by allowing data mobility between public or private clouds and letting enterprises choose where their data sets need to be. It also offers enhanced data availability and durability with data spread in the different clouds, cost optimization with the ability to use the most appropriate cloud for each application and all this without any concern of vendor lock-in.

What do you think makes Zenko special?

Multi-cloud storage and therefore, data mobility and true freedom, can only exist if there is a way to interact with the different cloud platforms using the right communication protocols, perhaps in a standardized way. This is where Zenko, a multi-cloud data controller, comes in. With Zenko, you can use the AWS S3 API as one interface for multi-cloud data storage. Better yet, all data written through Zenko is kept in its native format and it can be easily used by services on the underlying clouds. It also has some amazing features like metadata search, multi-cloud replication, and more on the way. Zenko is simply a one of a kind product.

Favorite song?
Fin del Tiempo, by Amen

Favorite food?
Tiradito Nikkei

Favorite place?
Machu Picchu

Follow Giacomo on GitHub (giacomoguiulfo), Twitter (@giacomoguiulfo), and Instagram (@giacomoguiulfo).

Four critical differences between Google Cloud Storage and Amazon S3 APIs

Many junior DevOps engineers have floated the pipe dream that you could simply point any application to any cloud storage without ever touching the code. As it turns out, that’s not such a pie-in-the-sky idea. Zenko abstracts all major clouds under a single namespace and a single API, namely the AWS S3 API, which removes the headaches of supporting multiple APIs from the get-go.

It’s a common misconception that cloud storage APIs are similar enough that moving from one provider to another is just a matter of changing a host name in a configuration file. This might have been mostly true in the early days of cloud storage but, as you’ll see, it’s far from true now.

Let’s compare key elements of Google Cloud Storage API (GCS) to AWS S3 API (S3):

  1. Multipart Upload or how to efficiently upload large pieces of data
  2. Object-level tagging or how to assign easily searchable metadata to objects
  3. Object versioning or protecting against accidental deletion and providing rollback to your users
  4. Replication or how to make sure there’s always a copy of your data somewhere else

| Feature | Google Cloud Storage | Amazon S3 |
|---|---|---|
| Multipart upload | The application needs the logic | API tracks the pieces |
| Object-level tagging | Not available | Supported since Nov, 2016 |
| Object versioning | DELETE request without version moves from ‘master’ to ‘archive’. There is no concept of ‘version stack’. | DELETE without version specified applies DELETE marker to master. You still get the latest version of an object if master is deleted. |
| Replication | Data stored redundantly with Multi-Regional Storage in a fixed manner. | Flexible and dynamic control with Cross Region Replication API |

Multipart upload

Though GCS does have a method for merging multiple objects into a single larger one, it lacks a counterpart to AWS’s popular multipart upload API. Here’s how multipart upload (MPU) works on S3:

  1. You initiate the upload by creating a multipart upload object
  2. You upload the object parts in parallel over multiple HTTP requests
  3. After you have uploaded all the parts, you complete the multipart upload.
  4. Upon receiving the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts.

In that model, S3 keeps track of all the uploaded parts of an MPU and manages the state of your upload for you. For example, aborting an MPU removes all associated parts. Objects only appear in your bucket after all the uploading is done.
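
The life cycle described above can be sketched as a toy in-memory model. This is not an S3 client and makes no network calls; the class and method names are invented for this example. The real API also expects the list of part numbers and ETags when completing an upload, which this sketch leaves out.

```python
class ToyMultipartStore:
    """In-memory toy model of the S3 multipart upload life cycle."""

    def __init__(self):
        self.objects = {}   # completed objects: key -> bytes
        self.uploads = {}   # in-flight uploads: upload_id -> (key, {part_number: bytes})
        self._next_id = 0

    def initiate(self, key):
        """Start an upload and return its identifier."""
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self.uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Parts may arrive in any order; the store tracks them by number.
        self.uploads[upload_id][1][part_number] = data

    def complete(self, upload_id):
        key, parts = self.uploads.pop(upload_id)
        # The object is assembled in part-number order and only now becomes visible.
        self.objects[key] = b"".join(parts[n] for n in sorted(parts))

    def abort(self, upload_id):
        # Aborting discards every part that belonged to the upload.
        self.uploads.pop(upload_id)


store = ToyMultipartStore()
upload_id = store.initiate("video.mp4")
store.upload_part(upload_id, 2, b"world")
store.upload_part(upload_id, 1, b"hello ")
visible_before_complete = "video.mp4" in store.objects
store.complete(upload_id)
print(visible_before_complete, store.objects["video.mp4"])
```

The point of the model is that the client only holds an upload identifier; everything else is state kept on the storage side.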

In GCS, you’re in charge of keeping track of each part, of piecing them together and you have to write the corresponding logic:

  1. You upload “parts” of your object as individual objects in a bucket
  2. You perform a compose operation on that list of objects, limited to 32 items per operation
  3. You repeat the compose operation in batches of 32 until the entire final object is stitched together.

This is clearly a cumbersome process. It’s possible to merge in parallel to stitch a large object together faster, but doing so is not trivial and requires somewhat complex logic on the client side.
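
To give a feel for that client-side logic, here is a minimal planning sketch. It only computes which compose operations would be needed, in rounds, and calls no GCS API. The function name and the temporary object names (`tmp-r0-0` and so on) are hypothetical. Operations within one round are independent of each other, which is what makes parallel merging possible.

```python
def plan_compose(part_names, limit=32):
    """Plan compose rounds; return (rounds, final_object_name).

    Each round is a list of (sources, destination) tuples. Source order is
    preserved so the final object has its parts in the right sequence.
    """
    if not part_names:
        raise ValueError("at least one part is required")
    if limit < 2:
        raise ValueError("limit must be at least 2")
    rounds = []
    current = list(part_names)
    round_no = 0
    while len(current) > 1:
        operations = []
        next_level = []
        for i in range(0, len(current), limit):
            sources = current[i:i + limit]
            if len(sources) == 1:
                # A lone leftover needs no compose; carry it to the next round.
                next_level.append(sources[0])
                continue
            destination = f"tmp-r{round_no}-{i // limit}"
            operations.append((sources, destination))
            next_level.append(destination)
        rounds.append(operations)
        current = next_level
        round_no += 1
    return rounds, current[0]


parts = [f"part-{i:04d}" for i in range(100)]
rounds, final_name = plan_compose(parts)
for number, operations in enumerate(rounds):
    print(f"round {number}: {len(operations)} compose operation(s)")
print("final object:", final_name)
```

For 100 parts, the plan needs two rounds: four compose operations (32, 32, 32 and 4 sources), then one operation that merges the four intermediate objects. A real client would also have to rename the final object and delete the intermediates.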

Developers also need to keep in mind that GCS allows a maximum of 1024 parts, while S3 allows 10,000 and both share the same 5TB maximum file size.

Update: On June 21, 2018 GCS removed the limit on the number of components in a composite object. Learn more on our forum.

Object-level tagging

Object tagging is a way to categorize data with multiple key-value pairs. It’s a useful way to locate data and is much more powerful than object name prefix-based search. You can think of object tagging as similar to Gmail labels, as opposed to filesystem folders. Object tags can also influence S3 Lifecycle and cross region replication policies. This API is relatively new for S3 but unfortunately it has no equivalent in GCS yet.

This functionality cannot be migrated over from S3 to GCS, so check if your application requires tagging.
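
The difference between prefix-based and tag-based lookup can be shown on a made-up listing. The object keys, tag names, and helper functions below are all hypothetical, and the lookup is a plain in-memory filter rather than a call to any storage API.

```python
# Hypothetical bucket listing: object key -> tags (key-value pairs).
OBJECTS = {
    "2018/06/report.pdf": {"team": "finance", "status": "final"},
    "2018/06/draft.pdf": {"team": "finance", "status": "draft"},
    "2017/12/summary.pdf": {"team": "finance", "status": "final"},
    "2018/06/photo.jpg": {"team": "marketing", "status": "final"},
}


def find_by_prefix(objects, prefix):
    """Folder-style lookup: only the position in the key hierarchy matters."""
    return sorted(key for key in objects if key.startswith(prefix))


def find_by_tags(objects, **wanted):
    """Label-style lookup: every requested tag must match, wherever the key lives."""
    return sorted(
        key
        for key, tags in objects.items()
        if all(tags.get(name) == value for name, value in wanted.items())
    )


by_prefix = find_by_prefix(OBJECTS, "2018/06/")
by_tags = find_by_tags(OBJECTS, team="finance", status="final")
print(by_prefix)
print(by_tags)
```

The prefix query returns everything under `2018/06/` regardless of owner or status, while the tag query picks out the final finance documents across both years.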

Object versioning

Both GCS and S3 support object versioning and enable the retrieval of objects that are deleted or overwritten. But the two implementations differ in subtle ways that make them not fully interchangeable.

Think of the AWS object versioning as a stack of versions ordered by time:

  • Each object has a master version that always points to the most recent entry in the stack
  • Any operation that doesn’t specify a version works on that master version
  • This includes delete operations, i.e. deleting an object without specifying a version creates a DELETE MARKER
  • It’s possible to get or delete a specific version by using a version ID
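
The stack behaviour in this list can be modelled in a few lines. This is a toy in-memory model for illustration only, with invented class and method names and made-up version IDs; it does not talk to S3.

```python
import itertools


class ToyS3VersionedObject:
    """Toy model of one versioned S3 object as a time-ordered stack."""

    DELETE_MARKER = object()

    def __init__(self):
        self.stack = []  # (version_id, payload) pairs, oldest first
        self._ids = itertools.count(1)

    def put(self, data):
        """Push a new version; it becomes the master version."""
        version_id = f"v{next(self._ids)}"
        self.stack.append((version_id, data))
        return version_id

    def get(self, version_id=None):
        if version_id is None:
            # No version given: read the master, i.e. the top of the stack.
            if not self.stack or self.stack[-1][1] is self.DELETE_MARKER:
                raise KeyError("404 not found")
            return self.stack[-1][1]
        for vid, data in self.stack:
            if vid == version_id and data is not self.DELETE_MARKER:
                return data
        raise KeyError("404 not found")

    def delete(self, version_id=None):
        if version_id is None:
            # No version given: push a delete marker, keep older versions.
            return self.put(self.DELETE_MARKER)
        # Version given: remove exactly that entry from the stack.
        self.stack = [(v, d) for v, d in self.stack if v != version_id]
        return version_id


obj = ToyS3VersionedObject()
v1 = obj.put(b"first")
v2 = obj.put(b"second")
marker = obj.delete()          # hides the object behind a delete marker
old = obj.get(v1)              # older versions stay reachable by ID
obj.delete(marker)             # removing the marker exposes v2 again
print(old, obj.get())
```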

GCS behaves differently. For each object, it maintains a MASTER version and an ARCHIVE version:

  • Deleting an object without specifying a version id moves it from master to archive and does not create a DELETE MARKER
  • Deleting a master object by using its version ID permanently destroys its data and does not move it to the archive
  • There’s no concept of a stack so even if an archive version of an object exists, deleting the master version does not promote the archive to master. A get operation on the object will return a 404 not found code.
These differences are not obvious and these two versioning implementations are not interchangeable.
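
The GCS rules listed above can be captured in a similar toy model. It encodes only the behaviour as described in this article, not the GCS API itself. The class name, method names, and version IDs are invented for the example, and it assumes that overwriting an object moves the previous master to the archive.

```python
import itertools


class ToyGcsVersionedObject:
    """Toy model of one versioned GCS object: a master plus archived versions."""

    def __init__(self):
        self.master = None   # (version_id, data) or None
        self.archive = []    # archived (version_id, data) pairs
        self._ids = itertools.count(1)

    def put(self, data):
        # Assumption: overwriting moves the previous master to the archive.
        if self.master is not None:
            self.archive.append(self.master)
        version_id = f"g{next(self._ids)}"
        self.master = (version_id, data)
        return version_id

    def get(self, version_id=None):
        if version_id is None:
            # No stack: if there is no master, nothing is promoted in its place.
            if self.master is None:
                raise KeyError("404 not found")
            return self.master[1]
        candidates = self.archive + ([self.master] if self.master else [])
        for vid, data in candidates:
            if vid == version_id:
                return data
        raise KeyError("404 not found")

    def delete(self, version_id=None):
        if version_id is None:
            # No version given: move the master to the archive, no delete marker.
            if self.master is None:
                raise KeyError("404 not found")
            self.archive.append(self.master)
            self.master = None
        elif self.master is not None and self.master[0] == version_id:
            # Master deleted by version ID: data is destroyed, not archived.
            self.master = None
        else:
            self.archive = [(v, d) for v, d in self.archive if v != version_id]


gcs = ToyGcsVersionedObject()
g1 = gcs.put(b"first")
g2 = gcs.put(b"second")
gcs.delete(g2)                 # destroys the master; g1 stays in the archive
try:
    gcs.get()
    plain_get_found = True
except KeyError:
    plain_get_found = False
print(plain_get_found, gcs.get(g1))
```

Running the same sequence against an S3-style version stack would expose the older version again, whereas here a plain get fails even though an archived version still exists.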

Replication

Replication is a way to copy objects across buckets in different geographical locations, increasing both data protection and availability. It’s a storage best practice: keeping a remote copy is one of the best forms of insurance against data loss and improves your data durability.

S3 supports replication through its Cross Region Replication (CRR) API and supports two-way synchronisation of buckets.

GCS doesn’t have a replication API and lacks the flexibility of S3 CRR, but it can still store data redundantly across locations when you specify a Multi-Regional Storage bucket location. This means that GCS stores your data redundantly in at least two geographic places separated by at least 100 miles within the multi-regional location of the bucket, but you cannot control precisely which regions are used, as you can with AWS.

Both GCS and S3 provide geo-redundant storage, but the AWS implementation offers more locations, flexibility, and API control.

Key takeaway: two incompatible cloud storage protocols

The GCS and AWS S3 APIs are not interchangeable and require significant adaptation of your application and client logic to migrate from one to the other. When looking at object storage compatible applications, S3 is by far the most widely supported API. That’s why we decided to implement the Amazon S3 API for our multi-cloud controller, Zenko.

Try a sandbox version of Zenko very quickly from Zenko Orbit, our hosted management portal.