CI/CD worker nodes as virtual machines or K8s Containers?

Tags

architecture, CICD, Containers, decision matrix, devops, jenkins, maven, reference architecture, stress test

When it comes to CI/CD deployments, something that doesn’t appear very often in documentation is a comparison of the pros and cons of running your worker nodes as containers in a Kubernetes environment versus as (virtual) machines in a cloud environment.

You don’t need to be adopting microservices solutions to potentially benefit from using Kubernetes environments to build apps. That said, if you use CI/CD tooling that is K8s aware, like Argo, Tekton, and Jenkins X, you won’t be able to maximize the use of their features without running them on K8s.

Many of the tools used in CI/CD pipelines, SonarQube for example, have prebuilt containers. Providing container images for tools is a great way to help people see and try a tool without first going through the investment of setup. But having a container is only half the story. Ideally, you need K8s config to ensure that the (secondary) artifacts, such as code coverage data and lint outputs, are also retained.

If these considerations aren’t addressed, will the CI/CD pipeline(s) really enable the acceleration in delivery and help eliminate first-time errors?

So one choice is a more advanced K8s configuration, which ensures we don’t lose coverage data, for example, from our tools, BUT working/transient folders such as maven’s (.m2) repository aren’t persisted. With this comes the fact that we can treat containers as cattle and reduce the risks of transient dependencies etc., plus the potential for a great deal more elasticity in handling build demands. Or we can keep things simple and sacrifice the benefits containers offer. This latter approach means the only gain is preinstalled software, and even that gain is small, as most CI/CD tools can have their deployment automated without too much effort, or be manually deployed and imaged.
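To make the first option more concrete, here is a minimal sketch in Python that builds a worker Pod manifest in which the reports folder is backed by a persistent volume claim, while the .m2 folder is an emptyDir volume that disappears with the Pod. The image name, claim name, and mount paths are hypothetical and only there for illustration; the claim would need to exist already in your cluster, and your tooling may well expect different paths.

```python
import json


def build_worker_pod(name, image, reports_claim):
    """Return a Pod manifest (as a dict) for a single build worker.

    Secondary artifacts (coverage, lint output) go to a persistent claim;
    maven's local repository lives in an emptyDir, so it is discarded
    whenever the Pod is deleted.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"role": "ci-worker"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "build",
                    "image": image,
                    "volumeMounts": [
                        # Hypothetical paths: adjust to where your tools write.
                        {"name": "reports", "mountPath": "/workspace/reports"},
                        {"name": "m2", "mountPath": "/root/.m2"},
                    ],
                }
            ],
            "volumes": [
                # Retained: survives the worker being blown away.
                {
                    "name": "reports",
                    "persistentVolumeClaim": {"claimName": reports_claim},
                },
                # Transient: a fresh, empty .m2 for every new Pod.
                {"name": "m2", "emptyDir": {}},
            ],
        },
    }


# Hypothetical image and claim names, purely for illustration.
manifest = build_worker_pod(
    "maven-worker", "example.com/ci/maven-worker:latest", "ci-reports"
)
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests as JSON as well as YAML, so the printed output could, in principle, be applied to a cluster directly. In practice you would more likely express the same idea in the pod template of whichever CI/CD tool you use.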

As an observation, it is interesting to note that GitHub Actions tends to steer people towards VMs rather than containers for the workers, although with some effort you could realize the workers as containers. A K8s Operator for GitHub Actions would make for a very powerful solution (an unofficial implementation can be found here).

Pros (container on OS or pure VM)

Deploying just a container is easier
Operational effort for managing containerized tools isn’t much different from native deployment (assuming you don’t change the container definition)
The learning curve for running a Docker image is smaller than that for K8s, particularly if you are not using an opinionated environment or a managed K8s service

Pros (Container on K8s)

Builds that only work because of locally staged dependencies not explicitly identified in the build files (a classic problem with maven’s .m2 folder) are easily exposed and fixed by blowing away containers – this goes back to the cattle vs. pets idea
Scaling in a secure manner can be easier – depending upon the underlying infrastructure (spinning up extra workers can be handled by K8s)
Container config can be kept simple in terms of performance tuning the tools (e.g., when to thread etc.) by keeping the container as simple as possible and letting K8s do the work
Segmentation of resources via namespaces rather than network

Cons (container on OS or pure VM)

To scale we need to extend networks and instantiate new servers
More likely to run fewer instances and have to tune the tooling on each
Persistence of secondary artifacts likely to reside within the Container – meaning greater impact if the container needs to be reset

Cons (Container on K8s)

The learning curve of managing K8s is greater
Need to monitor K8s

As you can see, the benefits of both approaches are pretty strong. So how do you decide? We can use a stress test, also sometimes called a decision matrix – an idea I’ve written about before (Decision Matrix aka ‘Stress Test’ as a vehicle to make decisions easier).

Consideration	VMs	Containers
The team has expertise with Docker & K8s		Y
The majority/ all pipeline tools have ready-made containers or are easy to containerize	*1	Y
The workload is relatively consistent / you don’t want the compute workload to be too elastic in behavior, so that operating costs are very predictable.	Y	*1
Test environments used with the CI/CD pipelines can be automated in their setup and teardown.	*1	Y
There is an acceptable ROI for establishing a K8s environment. For example, if you’re approaching building a K8s environment from first principles, and there is little or no other K8s use then making a case for the ROI will be very difficult.	Y

Stress Test

*1 – can be done, but the benefits of the approach are reduced
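One way to apply the matrix is to tally it. The following is a minimal sketch in Python, assuming a scoring scheme that is not part of the original matrix: a Y counts as 1, a *1 as 0.5, and a blank as 0, and only the considerations that hold true for your situation are added up. The cell values are copied from the table as they appear above, the row labels are shortened, and the answers in the example are hypothetical.

```python
# Assumed scoring scheme (not from the matrix itself).
CELL_SCORES = {"Y": 1.0, "*1": 0.5, "": 0.0}

# (consideration, VMs cell, Containers cell), copied from the table as shown.
MATRIX = [
    ("Team has Docker & K8s expertise", "", "Y"),
    ("Pipeline tools have ready-made containers", "*1", "Y"),
    ("Workload is consistent / costs must be predictable", "Y", "*1"),
    ("Test environments have automated setup and teardown", "*1", "Y"),
    ("Acceptable ROI for establishing K8s", "Y", ""),
]


def score(answers):
    """Sum the cell scores of every consideration that applies.

    `answers` maps a consideration to True when it holds for your
    situation; considerations that are missing count as False.
    """
    totals = {"VMs": 0.0, "Containers": 0.0}
    for consideration, vm_cell, container_cell in MATRIX:
        if answers.get(consideration, False):
            totals["VMs"] += CELL_SCORES[vm_cell]
            totals["Containers"] += CELL_SCORES[container_cell]
    return totals


# Hypothetical team: K8s skills, containerized tools, automated test
# environments, but an elastic workload and no ROI case made yet.
answers = {
    "Team has Docker & K8s expertise": True,
    "Pipeline tools have ready-made containers": True,
    "Test environments have automated setup and teardown": True,
}
totals = score(answers)
print(totals)
```

The totals are only a prompt for discussion. A real stress test would probably weight the considerations differently, and a single consideration may well outweigh all the others.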

As you can see from our matrix, there is a fair case to be made for either approach, but there are some really significant questions to be addressed.

As part of the Oracle Reference Architecture (RA) team, we’ve taken this further and outlined these ideas, then mapped them onto the different cloud CI/CD architectures to make it easy to determine which RA will fit your needs best, and therefore allow you to leverage the guidance and automation created to bootstrap the process. Take a look at the fleshed-out view in the Architecture Center.