Infrastructure as Code (IaC) should be treated the same way as any other code. That means considering configuration management, testing, regression, code quality, and coverage. We should address these points for the same reasons we address them in our application code, such as ensuring that we don't introduce bugs as things evolve and that the code remains maintainable over a prolonged period.
The problem is that the only real way to test IaC is to run it, particularly with the likes of Terraform, which is largely declarative rather than containing a lot of logic. This point is nicely conveyed by Yevgeniy Brikman's presentation (below).
The presentation goes on to illustrate Terratest, which has the look and feel of JUnit or any other xUnit test framework. Terratest is implemented in Golang, but to be honest, given the nature of Terraform (largely declarative, enabling ideas of composition rather than sophisticated logic), writing the tests isn't going to demand anything clever like achieving polymorphic behavior through Go's type structures.
While Yevgeniy focused on testing by invoking an application on the deployed infrastructure, something we've described through our Platform Test logic (more here), you may wish to test things further by interrogating infrastructure components. For example, do I have the right number of nodes in a dynamic group, or are container or server logs going into the cloud monitoring services?
Performing such checks is very easy with OCI as it provides a Golang SDK, making it straightforward to write tests that call the OCI APIs and interrogate the setup. Better still, checking whether the Terraform configuration will behave correctly to support dynamic/auto-scaling can be done without modifying the Terraform configurations, as the Terratest logic can include Go API calls that temporarily modify scaling triggers or invoke code that stimulates OCI's dynamic features.
Testing App Configuration
There is an interesting question to be considered. There is a point of separation between when to use Terraform (or Pulumi and others for that matter) and tools better suited to application deployment and configuration like Ansible and Chef. Therefore, should we separate the testing of these details? Maybe I am too much of a purist, but I dislike seeing local and remote execs in Terraform, as these actions are very opaque and can be used to conceal things or unwittingly depend on the way Terraform handles its dependency graph.
Of course, Ansible has its own test framework (ansible-test) and the means to measure test coverage. So one possibility is to treat Ansible as a separate module, independently test it, and then integrate its use in the wider picture of deploying infrastructure.
When building Node solutions, even if you're not going to publish the code to a public repository, you're likely to be using package.json to declare the dependencies for your app. Doing this makes it easier to build and deploy a utility. But if you're conversant with several languages, there is a tendency to just adapt your existing skills to work with others. The downside of this is that small tooling nuances can catch you off guard and consume time while you figure them out. The workings of packages with NPM (as shown below) is one possible case.
If you create the package.json using npm init, it is fairly common to accept the default values for the initial version of the file. In the case of the license, the default is an ISC license, and this is easily forgotten. The problem here is twofold:
Does the license set reflect the constraints of the dependencies and their licenses?
Does the default license reflect the position you want?
Looking at the latter point first, this is important as organizations have matured (and tooling has greatly improved) when it comes to understanding how open source licensing can impact them. This is particularly important for any organization leveraging open source as part of its revenue-generating activities, whether 'as a service' or by selling software solutions. If you put the wrong license here, the license-checking tools that often protect code repositories may reject your code, even in internal-only use cases (yes, this tripped me up).
To help overcome this issue you can install a tool that will analyze the dependencies (and optionally their dependencies) and report back on your license exposure. This tool is called license-report. Once installed (npm install -g license-report) we just need to point the tool to the package.json file, e.g. license-report package.json. We can make the results a lot more consumable by outputting the content in a number of formats, for example a simple text table:
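As a sketch of the usage (the --output flag values here come from my reading of the tool's documentation and may differ between versions):

npm install -g license-report
license-report package.json
license-report --package=./package.json --output=table
license-report --package=./package.json --output=csv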
From this, you could set your license declaration in package.json or validate that your preferred license won't conflict.
I’ve been a fan of Railroad syntax diagrams for a long time. I’ve always found them an easy way to understand the syntactical options and the reserved/keywords in an efficient manner.
Example of Railroad Syntax diagram
I have been digging around in the documentation to find a keyword in the OCI Policies syntax that the common cases don't use. After a bit of rooting around, I found what I needed. But a Railroad representation would have helped me get the expression correct without so much effort.
Once I solved my problem, I decided to see if I could find something that could easily create the railroad diagrams and encountered a fantastic bit of code on GitHub from Tab Atkins Jr. It’s a neat bit of JavaScript, which can even be run from their GitHub pages – go here. Tab has taken the time to document the tool well, so working out the syntax to define the diagram is straightforward (not that you need to read it much as the tool is well written).
The following diagrams show the syntax for writing OCI Policies, first in a single image and then with the full syntax broken into two images to make it a little easier to read on the screen, and to reflect the fact that you often don't need the Where clause.
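In simplified textual form, the shape of a policy statement that the diagrams capture is roughly as follows (this is my paraphrase of the documented grammar rather than the complete syntax):

Allow <subject> to <verb> <resource-type> in <location> where <conditions>

Here the subject can be a group, a dynamic-group, any-user, or a service; the verb is one of inspect, read, use, or manage; the location is a compartment or the tenancy; and the where clause is optional.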
If the diagrams need to be updated, the source to use with the tool is in my GitHub repository. A really cool feature of the utility is that the information to populate the editor view is included in the URL (which does make for a long URL), but it means this link will take you directly to the view & editor if you want to tinker with the definition. So the links are:
I've designed a variety of GraphQL schemas and developed microservice backends, but I have not done much with configuring the Apollo implementation of a GraphQL server until recently. This may reflect the fact that my understanding of JavaScript doesn't extend into the world of Node.js as much as I'd like (the problem with being a multi-language developer is you're likely to find your way around many languages but never be a master of one). Anyway, the following content is about the implementation of the GraphQL server part of a solution. It may be that these pointers are just for my benefit, but you might find them helpful as well.
To make it easy to reference the code, we've added entries (n) into the code, where n is a number. These are not part of the code but are there to make the different lines referenceable. Where code should go but is not relevant to the point being made, I've added an ellipsis (…).
Dynamic loading and server configuration
import { ApolloServer } from 'apollo-server';
import { loadFilesSync } from '@graphql-tools/load-files';
import { resolvers } from './resolvers.js'; (1)
import ProviderInternalAPI from './ProviderInternalAPI.js'; (1)
import EventsInternalAPI from './EventsInternalAPI.js'; (1)
const server = new ApolloServer({
  debug : true, (2)
  typeDefs: loadFilesSync('./schema.graphql'), (3)
  resolvers,
  dataSources: () => {
    return {
      eventsInternalAPI: new EventsInternalAPI(), (4)
      providerInternalAPI: new ProviderInternalAPI() (4)
    };
  }
});
There is the potential to dynamically load the resolvers rather than importing each JavaScript file as we see on the lines marked (1). The mechanics to do this are documented here. It would be cool if an opinionated implementation was provided. As shown by (3), we can load an independent schema file. The Apollo example approach for this didn't seem to work for us, although both approaches make use of graphql-tools in a synchronous manner.
We can switch on debugging (2) for the GraphQL server, although the level of information published doesn’t appear to be significant. Ideally this setting is changed for production.
Defining the resolvers
The prefix for each resolver (1) must correlate to the name in the schema of the mutator or query (not the type, as you might expect coming from Java). Often we don't need all the parameters for the resolver. The documentation describes replacing each unused parameter with one or more underscores (i.e. _, __), the underscore denoting a field not in use. However, we can satisfy the indication of not being used but keep the meaning of each position by using an underscore followed by a name (i.e. _parent, _args), as shown in (2).
By taking the response into a variable (3) we can optionally log it; trying to return directly from the invocation line would result in the handler object rather than the payload itself. With the result in a variable we can log the content if desired and then return it.
The use of the backtick is a JavaScript (ES6) template literal feature. It allows us to incorporate variables into a string by referencing them within ${} (4).
We need to supply the GraphQL server with instances of the layer of code (the data sources) that the resolvers will interact with. We can instantiate the instances in the declaration. The naming of each object is important (4), as it has to match what the resolvers.js declarations use.
import { useLogger } from "@graphql-yoga/node";
...
latestEvent (1): async (_parent, _args, { dataSources }, _info) (2) => {
  if (log) { console.log("resolvers - get latest event"); }
  let responseValue = await dataSources.eventsInternalAPI.getLatestEvent(); (3)
  if (log) { console.log(`(4) Resolver response for latest event:\n ${responseValue}`); }
  return responseValue;
},
To handle the use of a resolver within a larger resolver (resolver chaining), we need to declare the resolution outside of the Query and Mutator blocks, but inside the whole declaration block (1). The name provided needs to match the parent type that the query resolver contributes to.
To then provide values from the outer resolution to the chained resolver, we use the naming as represented in the GraphQL schema, as shown by (2). The GraphQL engine will resolve the mapping values.
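As a hedged sketch only, with illustrative names (Event, provider, and providerCode are assumptions rather than the original schema), the chained resolver sits alongside the Query block like this:

Event: { (1)
  provider: async (parent, _args, { dataSources }, _info) => {
    // parent carries the outer resolution's result; its field drives the chained lookup
    return dataSources.providerInternalAPI.getProvider(parent.providerCode); (2)
  }
},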
Web resolver URL
// GET
async getProvider(code) {
  console.log("getProvider (%s) directing to %s", code, this.baseURL);
  return this.get(`provider?code=${code}`); (1)
}
The URL parameters need to be appended to the base URL path for the parent class to use in the invocation, as shown by (1). The Apollo examples showed a setter option, but we didn't see the URI being addressed properly with that; this approach produces the required behavior.
Let's be honest, we're not all command-line warriors when it comes to Kubernetes. I can get around kubectl, but in the time it takes to key in a CLI command you can get the same information in a couple of clicks of the UI. For me, kubectl is for automating my tasks, for example pushing a local build into an image repository, initiating a refresh deployment, and ensuring old container instances are flushed out.
Lens view
K8s Dashboard
The only problem is that the K8s dashboard requires a lot of config work to secure its deployment, and do you want to be deploying such tools in a production environment? A colleague suggested I look at Lens, a tool that offers both Personal (free) and Team licensed versions. Both versions deploy to Windows, Linux, and Mac natively, so installation doesn't require any messing around.
I have to say I have been very impressed with Lens. Everything useful about the K8s dashboard is here, but without needing to deploy anything to your cluster, as Lens runs as a local thick app. Just like the K8s dashboard, you need the privileges to talk to the K8s APIs, but the visualization is all local and the way the data is retrieved means the UI is very reactive.
Lens supports extensions, although to date I've not tried any of the extensions personally – you can see a list of extensions here. I will be trying out a couple of extensions in due course. For example:
Network Policy Viewer
Certificate Info (via K8s secrets)
Lens goes further in that you can connect to multiple clusters from a single viewer instance, so there is no need for multiple deployments of the dashboard or for creating an additional management cluster.
I only have one minor grumble today with the implementation. When using a console facility to access a container it is not possible to paste into the console any text/script or copy out any of the log contents. The latter can make generating things like JIRA tickets a bit annoying. So far I’ve worked around it by creating screenshots.
When configuring Fluentd we often need to provide credentials to access event sources, targets, and associated services such as notification tools like Slack and PagerDuty. The challenge is that we don’t want the credentials to be in clear text in the Fluentd configuration.
Using Env Vars
In the Logging in Action with Fluentd book, we illustrated how we can take the sensitive values from environment variables so the values don't show up in the configuration file. But we've regularly seen the question: how secure is this; can't the environment variable be seen by everyone on that machine?
The answer to this question comes down to having a deeper understanding of how environment variables work. There is a really good explanation here. The long and short of it is that environment variables can only be seen by the process that creates the variable and any child process will receive a copy of the parent’s variables.
This means that if we create the variable in a shell, only that shell and any processes launched by that shell can see the environment variable. So as long as we don't set variables up as part of a system-level configuration, we already have a level of security. We could therefore wrap the start of Fluentd with a script that sets the environment variables needed, and then launch everything through that script.
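A minimal sketch of such a wrapper, assuming the sensitive value is a Slack token that the Fluentd configuration reads with the usual "#{ENV['SLACK_TOKEN']}" expression (the variable name and configuration path here are illustrative):

#!/bin/sh
# Set the sensitive value in this shell only; Fluentd inherits it as a child process.
export SLACK_TOKEN="xoxb-not-a-real-token"
exec fluentd -c /etc/fluent/fluent.conf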
The following isn't unique to OCIR, as it will hold true for any K8s Deployment YAML configuration that works with an Open Container Initiative compliant registry. To define the containers part of the YAML file, we need to provide an attribute that can be used to confirm the legitimacy of the image pull request. To do this we need to supply a token. However, we don't want this token to be visible in plain sight in our YAML. The solution to this is to set up a secret within Kubernetes.
In the following YAML extract, we can see the secret is named.
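A representative sketch of such an extract, where the secret name (ocir-secret), container name, and image path are illustrative values rather than the originals:

spec:
  template:
    spec:
      containers:
        - name: graphql-svr
          image: iad.ocir.io/<tenancy name>/graphql-svr:v0.1-dev
      imagePullSecrets:
        - name: ocir-secret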
This does mean we need to create the secret. As this is a one-off task, the easiest step is to create the secret by hand. To do that we use a command of the following form:
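A hedged sketch of that command, with the secret name matching the YAML sketch above and the username, token, and email left as placeholders:

kubectl create secret docker-registry ocir-secret \
  --docker-server=iad.ocir.io \
  --docker-username='<tenancy name>/identitycloudservice/<username>' \
  --docker-password='<auth token>' \
  --docker-email='<email>'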
This naturally leads to the next question: where do we get the secret?
This step is straightforward. Navigate using the user icon at the top right (highlighted in the screenshot below) and select the User Settings option to get to the screen shown below. Then use the highlighted right-hand menu option (Auth Tokens). This displays a section of the UI showing your current auth tokens and provides a button that will pop up a window to guide you through creating a new auth token.
In a previous blog (here) I wrote about the structure and naming of assets to be applied to OCIR. What I didn't address is the interesting challenge of what happens if my development machine has a different architecture to my target environment. For example, as a developer, I have a nice shiny MacBook Pro with the M1 chipset, which uses an ARM architecture, but my target cloud environment has been built and runs with an AMD64 chipset. As we're creating binary images, this does raise some interesting questions.
As we're creating our containers with Docker, this post addresses how to solve the problem with Docker; other OCI-compliant container tools will address the problem differently.
Buildx
Buildx is a development feature in Docker which provides a cross-platform build capability. When using buildx we can specify one or more build platform types. These are specified using the --platform parameter. In the code below we use it to define the Linux AMD64 architecture mentioned (linux/amd64), but we can make the parameter a comma-separated list targeting different platform types. When that is done, multiple images will be built. By default, the builds happen in sequence, but it is possible to switch on additional process threads for the Docker build so the builds run concurrently.
The following example is only intended for one platform; if you are building for multiple platforms, it would be recommended that the name include the platform type the image will work for. For production builds we would promote that idea regardless, just as we see with installer and package manager-related artifacts.
If you compare this version of the code to the previous blog (here), there are some additional differences. Now I've switched to setting the target tag as part of the build. As we're not interested in hanging onto any images built, we've included the target repository in the build statement and immediately push it to OCIR; after all, the images won't work on our machine.
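A sketch of that build, under the assumptions above and reusing the registry path from the tagging example later in this post:

docker buildx build --platform linux/amd64 \
  --tag iad.ocir.io/ociobenablement/graphql-svr:v0.1-dev --push .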
A container registry is as essential as the Kubernetes service itself, as you need somewhere to manage the deployable resources. That registry could be the public Docker repository or something else. In most cases, the registry needs to be private, as you don't want to expose your product assets to potential external tampering. As a result, we need a service such as Oracle's container registry, OCIR.
The rest of this blog is going to walk through how to push a container you've built into OCIR, and a gotcha that can trip up users if you make assumptions about how the registry works.
Build container
Let's assume you're building your microservices locally or retrieving vetted 3rd party services for use. In both cases, you want to push your assets into OCIR manually rather than have an automated build pipeline do it for you.
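As a sketch, assuming a Dockerfile in the working directory and the image name used in the tagging step later, the build looks something like:

docker build -t graph-svr:latest .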
This creates a container image locally, and we can see it listed using the command:
docker images
Setup of OCIR
We need an OCIR to target, so the easiest thing is to manually create an OCIR instance in one of the regions; for the sake of this illustration we'll use Ashburn (short code IAD). To help with visibility, we can put the registry in a separate compartment as a child of the root. Let's assume we're going to call the registry GraphQL, so before creating your OCIR, set up the compartment as necessary.
fragment of the compartment hierarchy
In the screenshot, you can see I’ve created a registry, which is very quick and easy in the UI (in the menu it’s in the Developer Services section).
The Oracle menu to navigate to the OCIR service, and the UI to create an OCIR
Finally, we click on the button to create the specific OCIR.
Deployment…
Having created the image, and with a repo ready we can start the steps of pushing the container to OCIR.
The next step is to tag the created image. This has to be done carefully, as the tag needs to reflect where the image is going, using the formula <region address>/<tenancy name>/<registry name>:<version>. All the registries are addressed as <region short code>.ocir.io; in our case, that is iad.ocir.io.
docker tag graph-svr:latest iad.ocir.io/ociobenablement/graphql-svr:v0.1-dev
As you may have realized the tag being applied effectively tells OCI which instance of OCIR to place the container in. Getting this wrong can be the core of the gotcha previously mentioned and we’ll elaborate upon it shortly.
To sign in you'll need an auth token, as that is passed as the password. For simplicity, I've passed the token in the docker command, which Docker will warn you is insecure and suggest you supply via a prompt instead. Note my token will have been changed by the time this is published. The username is built on the structure <cloud tenancy name>/identitycloudservice/<username>. The identitycloudservice piece only needs to be included if your authentication is managed through IDCS, as is the case here. The final bit is the URI for the appropriate regional OCIR address, as we've used previously.
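A representative sketch of the login, with the username and token left as placeholders:

docker login iad.ocir.io -u '<cloud tenancy name>/identitycloudservice/<username>' -p '<auth token>'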
With a successful authentication response we can push the container. It is worth noting that the authenticated Docker connection will time out, which is why we've put everything in place before connecting. The push command is very simple; its argument is the tag name assigned to the artifact, including the version number.
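Using the tag applied earlier, the push is simply:

docker push iad.ocir.io/ociobenablement/graphql-svr:v0.1-dev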
When we deal with repositories, from Git to SVN or Apache Archiva to Nexus, we work with a repository that holds multiple different assets, each with multiple versions. As a result, when we identify an asset uniquely we would expect to name things based on server/location, repository, asset name, and version. Here, however, each repository is designed for one type of asset but multiple versions. In reality, a Docker repository works in the same manner (but the extended path impact is different).
This means it becomes easy to accidentally define a tag with an extra element. Depending upon your OCI tenancy privileges, if you get the path wrong, OCI creates a new container repository in the root compartment with a name that is a composite of the name elements after the tenancy, and puts your artifact in that repository, not the one you expected.
We can address this in several ways. The first and probably best option is to automate the process of loading assets into OCIR; once the process is correct, it will remain correct. Another is to adopt a principle of never holding repositories at the root of a tenancy, which means you can then explicitly remove the permissions to create repositories in that compartment (you'll need to explicitly grant the permissions elsewhere in the compartment hierarchy because of policy inheritance). This will cause the push of a container to fail on privileges if the tag is wrong.
Visual representation of structure differences
Repository Structure
Registry Structure
Condensed to a simple script
These steps can be condensed to a simple platform neutral script as follows:
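A sketch of such a script, pulling together the commands shown above (the username and auth token remain placeholders, and you may prefer to parameterize the image name and version):

docker build -t graph-svr:latest .
docker tag graph-svr:latest iad.ocir.io/ociobenablement/graphql-svr:v0.1-dev
docker login iad.ocir.io -u '<cloud tenancy name>/identitycloudservice/<username>' -p '<auth token>'
docker push iad.ocir.io/ociobenablement/graphql-svr:v0.1-dev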
This script would need modifying for each container being built, but you could easily make it parameterized or configuration driven.
A Note on Registry Standards
Oracle's Container Registry has adopted the Open Registries standard for OCIR. Open Registries come under the Linux Foundation's governance. This standard has been adopted by all the major hyperscalers (Google, AWS, Azure, etc). All the technical spec information for the standard is published through GitHub rather than the main website.