Logging In Action – almost there

We’ve got the peer review comments back on the completed first draft of the book. So I’d like to take this opportunity to thank those who have been involved as peer reviewers, particularly those involved in the previous review cycles. I hope the reviewers found it satisfying to see between iterations that their suggestions and feedback have been taken on board where we can.

The feedback is really exciting to read. There are some tweaks and refinements to do to address the suggestions made.

The work on the Kubernetes and Docker elements and the chapter which has become available on MEAP has helped round that aspect off. But importantly, the final chapters help address the wider challenges of logging, and some of the feedback positively reflects this.

To paraphrase the comments, we’ve addressed the issues of logging that don’t get the attention they deserve, which for me is a success.

Oracle Cloud + GitHub Actions

While there is a deserved amount of publicity around the introduction of ARM compute onto OCI with the ARM Ampere CPU offering, and the amazing level of always free compute being provided (24GB of memory and 4 cores, which can be used in any combination of servers), there have been some interesting announcements that perhaps haven’t drawn as much attention as they deserve. This includes OCI support for GitHub Actions, plus several new DevOps services and an Artifact Registry. We’ll come back to the new services in another post. Today, let’s look at GitHub Actions.

Support for GitHub Actions

GitHub recently introduced a new feature called GitHub Actions. In essence it has become possible to establish a basic event-driven CI/CD pipeline driven directly by events in GitHub. The Actions (sometimes also referred to as workflows) are executed by what are known as GitHub Runners; if you’re familiar with Jenkins, then you could think of these as agent (slave) nodes. The smart thing is that you can either use GitHub-provided (aka GitHub-hosted) compute to execute the actions (effectively a bit of Azure) and pay for the service to execute them if you’re using GitHub professionally, or set up your own workers in other clouds or even potentially on-premise; these are known as self-hosted GitHub runners. The other clouds include Oracle, who have provided a configuration that can build runner nodes (and yes, you can use the Ampere free compute).

Launching an OCI Always Free GitHub Runner

There are a number of ways to get a runner working, using either paid VMs or the always free capacity.

For someone who produces utilities, makes them freely available, and wants to run at least unit tests and code checks, the ability to automatically run the unit tests is particularly handy. When extending an existing piece of functionality I tend to focus on testing that bit of code; this means I don’t need to rerun lots of tests, and I can set the actions up to repeat those tests as I make changes, so if I break something then I’ll know immediately. So combining the free compute with the ability to auto execute tests on a commit, without needing to run my own dedicated Jenkins server, is perfect.

The compute can be set up either by using the scripts generated by the GitHub UI for setting up runners, which can be seen when you navigate to the Runner configuration (settings menu at the top of the page, and Runners on the left menu), as highlighted here with 1 & 2.

We need to set the Operating System and Architecture for the runner (3); this will impact the binaries needed and the code offered in the centre of the screen (4). This does mean that you need to run these scripts and ensure that the processes start and stop safely within the VM. The alternative in Oracle Cloud is to use a predefined image using a link like this (https://cloud.oracle.com/resourcemanager/stacks/create?zipUrl=https://github.com/oracle-quickstart/oci-github-actions-runner/releases/download/orm-deploy/orm.zip). As you can see from the URL itself, it passes OCI a stack that has already been created (and made freely available in GitHub).
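
For reference, the script route shown at (4) boils down to commands along the following lines. This is only an outline; the runner release version, repository and registration token are placeholders for the exact values the GitHub UI presents.

# outline of the GitHub-generated self-hosted runner setup (placeholder values)
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-arm64-2.x.y.tar.gz -L https://github.com/actions/runner/releases/download/v2.x.y/actions-runner-linux-arm64-2.x.y.tar.gz
tar xzf ./actions-runner-linux-arm64-2.x.y.tar.gz
# register the runner against the repository using the token shown in the GitHub UI
./config.sh --url https://github.com/<owner>/<repo> --token <RUNNER_TOKEN>
# start listening for jobs (or install it as a service so it restarts with the VM)
./run.sh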

The library of Oracle quick start configurations can be found at https://github.com/oracle-quickstart

Following the URL will take us into the Resource Manager, which presents a multi-page form capturing the relevant details that will enable it to generate the server and required supporting infrastructure in the correct compartment. The first step is to acknowledge any constraints and licensing conditions that the package imposes, as highlighted with the red square.

Once complete we can move onto the next page. In the example screens below we can see the A1 (Ampere) CPU being selected, using 4 GB of RAM and 1 OCPU of the 4 available as free.

VM Shape set to the new Arm-based Ampere chipset. The final step is whether the node gets positioned within an existing part of our network.

Aside from choosing the compute shape, I’ve selected a version of Oracle Linux 8 suitable for the processor type.

You’ll note that you will also be prompted for the GitHub details, with the repository to be supported and the associated token, which comes from the script values presented in the initial Runner setup. We highlighted the value in the first diagram as (5), and it should correlate to (A) in the details below.

There is a gotcha to be careful of. When we put our repository URL into a browser it isn’t bothered by whether or not the name has a terminating slash, but as shown below at (B) we have included the terminating slash. The authentication of the communication is very sensitive to such details. People have reported seeing issues where the Runner is connected to GitHub, but all executions of actions fail (when we saw this it was within 1-2 seconds), and when you try to drill in there is no information. To confirm the issue, you need to SSH into the runner server and navigate into the /actions-runner/_diags folder to see whether the log files report things going awry.
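
As a sketch of that check (assuming the default opc user of the Oracle Linux image; the IP address, key and log file name are placeholders):

# SSH to the runner VM and review the runner's diagnostic logs
ssh -i <path-to-private-key> opc@<runner-public-ip>
cd /actions-runner/_diags
# list the newest log files and look through them for registration or authentication errors
ls -lt
tail -n 50 <most-recent-log-file>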

Once the runner is deployed, it will show up in the list of runners after a little while. Note that in this screenshot we have a fresh runner, having experienced the previously described connectivity issue. To see the runners defined, again we go through Settings (1) on the top menu and Actions (2) on the left. Once the runner has made successful contact it will show in the list of runners.

Runner’s Job

With the runner ready we need to create a Job. This bit is a little tricky. You can develop your own Job definitions from scratch, but it will require some effort in mastering the YAML notation. The alternative is to see if there is a prepared job definition. It is possible to use pre-existing templates by clicking on the New workflow button, as highlighted below. This will take you to the catalogue of predefined flows.

The following image shows the predefined list of workflows. GitHub will make a recommendation based on its analysis of the repository; in this case it has filtered the list to show Python workflows. But you’re not bound to these, this is just GitHub trying to simplify the effort involved.

To get a job definition started, from the workflow templates, select the Set up this workflow button. This will result in a copy of the YAML configuration file being put into a folder within the project as you can see in the next image.

Looking more closely at the YAML file describing the action, we see the following, which describes when the action will be executed, in this case a push or pull request on the main branch.

The default example is geared to Ubuntu, which brings us to one of the niggles we’ve encountered. The job will try to set up the environment for us.

GitHub Runner Constraint

You’ll note that there is a step that uses python-version: ${{ matrix.python-version }} – this is going to try to set up Python for the three versions listed in the matrix defined. That doesn’t sound problematic until you dig into the documentation provided by the link https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions. This web page describes how Python works with GitHub Runners. For Python on a self-hosted runner you have to do the setup yourself. Fortunately the VM configuration provided means Python 3.6 is installed, which will work on the ARM architecture. But we do need to remove the chunk of logic that sets up Python on the runner. Incidentally, if you look at the link to the supported Python versions, they work for both Windows and Linux, but only on an x86 CPU.

We do still want to retain the use of pip to ensure that the latest versions of the packages are always in use.

# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: [3.7, 3.8, 3.9]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest

Having removed the Python setup configuration, we need to ensure the utilities Flake8 and PyTest can be called, either by modifying the PATH variable or by tweaking the way the utilities are invoked. Rather than messing with the VM’s out-of-the-box configuration or writing a script that is pulled onto the runner to perform further setup, I’ve opted to adjust the call. These changes can be seen at https://github.com/mp3monster/oci-utilities/blob/main/.github/workflows/python-package.yml, and …

# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches: [ main ]


jobs:
  build:

    runs-on: oci
    strategy:
      fail-fast: false


    steps:
    - uses: actions/checkout@v2
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        python -m flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        python -m flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        python -m pytest

We’ve also made use of the requirements.txt to ensure the OCI Python SDK is installed for use. We now have an action that performs code analysis with Flake8 (details also here), which will spot any obvious coding errors, and pytest, which includes the ability to locate and run the native Python unittest-based tests. All the results are then bundled up and returned to GitHub.
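
For illustration, a minimal requirements.txt for these jobs could contain just the OCI SDK package (assuming nothing else is needed):

# minimal requirements.txt - just the OCI Python SDK
oci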

By clicking on the build we can review all of the steps the run takes, and the results of flake8 and pytest, for example.

Possible Risk

GitHub’s documentation raises the risk of connecting Actions to public repositories. This comes down to a couple of things. Firstly, depending upon your configuration, the actions of public users could trigger the workflow (which is why we’ve removed the pull request from the triggers). Within GitHub you can limit who can push on a public project; in this case, we’ve put the maximum limits into the Settings, so only approved collaborators are allowed to perform any of the approved operations such as forking.

This reduces the opportunities for someone to keep triggering the workflow and consuming your resources, which may cost money but could also amount to a form of denial of service, in that it disrupts your genuine workflow runs.

There are a couple of other mitigations that could be applied to further tighten this up:

  • schedule the Runner VM so that it is shut down and started up only when people are likely to be working on the code base (see the sketch after this list)
  • Attach the job not to a public view of the code, but to a private branch. All development is against the private branch, and a successful result then triggers the runner to merge the changes into the publicly visible branch.
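
As a sketch of the first mitigation, the runner VM could be stopped and started on a schedule using the OCI CLI from any simple scheduler; the cron entries below are purely illustrative and the instance OCID is a placeholder.

# start the runner VM on weekday mornings and stop it again in the evening (placeholder OCID)
0 8 * * 1-5  oci compute instance action --instance-id ocid1.instance.oc1..example --action START
0 19 * * 1-5 oci compute instance action --instance-id ocid1.instance.oc1..example --action STOP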

Conclusion

Knowing the limitations, and with the syntax of the workflows not looking difficult to master, we’ll probably develop more unit tests and deploy them, plus also go back to the Groovy stuff as well. The big test will be how easy it is to reconstruct a container image of the Log Simulator.

Crazy streaming music service idea?

I don’t often write about music, but this demented idea came to mind largely because Mrs Monster is more of a visual person whereas I prefer to have music on. The obvious intersection is music videos, so why can’t Spotify keep the metadata for videos on YouTube? If you play a song on Spotify that has a video, then YouTube is told to stream it to you if you have a session active.

The hardest bit of this would be linking the media together with the metadata, but I’m sure artists and fans alike would crowdsource that metadata quickly enough. Should YouTube actually pay artists for streams, then it helps bolster the musicians’ streaming income.

It would be interesting to see what such an idea would do to the interest in music videos if it took off, as there would be an incentive to get videos for every track. Perhaps with a little luck it would encourage artists to support grassroots film makers.

Some might say, why not just stream video playlists? Well, how many people’s TV speakers get anywhere near the fidelity of a good hi-fi? The audio streams within many YouTube videos are often inferior because the key element is the visual, not the audio.

The other thing is that music on YouTube tends to be either the entire album or just the tracks with videos recorded. But artists often have incredible B-sides or remixes, and these currently don’t get loaded onto YouTube.

Brain dump over.

The Air Gap a Security Fallacy?

Securing systems through an air gap is an idea that goes back decades, and from the 50s to as recently as the 2000s the idea that you could safely and successfully run and maintain systems by simply not connecting them to a network, thereby protecting them from vulnerabilities, may have been true. But in the last ten or more years, I would argue it has become a fallacy, one that can lull people into a false sense of security and therefore make them more likely to take risks (or at least not be as careful as they should be). This is a well established piece of psychology with modern cars (here for example), known as Risk Compensation (also more here) or an aspect of behavioural adaptation.

Whilst this is rather theoretical, perhaps we should illustrate in practical terms why it is both a fallacy, and in the modern day simply impractical.

The need for data

In the 50s, 60s, and possibly 70s, the rate of change and the amount of data that any software system needed to move securely across the air gap was comparatively small. Master data (be that map data or configuration files for the software) and even the amount of code needed was small and didn’t change at the frequency it does today. As a result it was practical for such data to be shipped into an air-gapped system entirely by rekeying, at least every bit and byte could be eyeballed, and the frequency of that data movement was slow.

Today the data volumes are simply in a different league: gigabytes, terabytes and petabytes, not bytes and kilobytes. As a result, data transfer into or out of an air-gapped solution is done by the use of a data device, be that a floppy disk or a USB stick. For example, maps aren’t simply a coastal outline; they are typically fully featured with topographic detail, even overlaid with imagery at incredible levels of precision. We can no longer eyeball the data being moved to ensure nothing lurks within.

This inevitably means something malicious stands a chance of being transferred without being spotted. Of course the counter to this is to use anti-malware tools, but whilst that reduces the risk it isn’t a guarantee, and I’ll explain why in a moment.

We build solutions from many many parts

Software today is built from tens and hundreds of millions of lines of code. It is reported that the Linux repository contains almost 30 million lines of code (here), having grown by a million lines in just a single year. When Java was formally released as a programming language, I bet no one thought they’d be releasing language updates every 6 months, but that’s what happens now. Even with the open source principle that by having the code open to all, many eyes make all bugs shallow, there will inevitably be bugs, found after the code has been deployed. Fortunately major open source projects tend to benefit from best practices and sponsored tools to help, as well as plenty of eyes, so the bugs are found and addressed. But this does mean that patches are needed, and patches need to be applied quickly before the bug can cause a problem or, worse, a vulnerability.

Software complexity has reached a state where we build solutions by aggregating other parts, which in turn are built through dependencies. Through this accumulation we see frameworks being compiled. This is the essence of Spring Boot and many other technologies: using tooling to define our framework needs and having it pull together the accumulation of libraries required.

It would be easy to say, well, don’t build software through the use of components, but the reality is that we want to build our secure systems in a cost effective manner; that means cloud, that means using modern software development techniques, and that means we elevate our progress by using building blocks.

Defence follows attack for malware

We tend to assume safety by having malware tools, and the premise is that we protect our air gap by using anti-malware on the devices we use to transfer data across the gap. The problem is that when it comes to malware we only get new detection fingerprints for each new attack that is discovered, and that detection is not guaranteed to be formulated before at least a few organizations have become victims or recognized something suspicious. It is only after the cause or suspect payload has been analyzed by the malware providers that the way to fingerprint the malware is determined. If malware is identified attacking some organizations, the chances are it has already penetrated others.

It is because defence follows attack that the malware tools on our personal machines not only scan payloads as they arrive but also run scheduled scans of complete systems.

The problem is that malware tooling can also drive risk compensation, in part because not many of us are knowingly impacted by malware before a fingerprint is rolled out; as a result we tend to trust our malware tools to protect us regardless.

The human factor

Content traversing the air gap is by its very nature a human process, and humans are prone to error, such as failing to follow the processes that are in place to protect the gap. Worse, the human processes create an opportunity for someone with ill intent to exploit process loopholes or errors. Stuxnet was the ultimate proof of this kind of failure (see here). Other examples of human elements involved in breaches: Snowden, Chelsea Manning and many, many others.

Breached trusted sources

Sometimes referred to as a supply chain breach (more here), this is where we put trust in the third parties who provide software, even through manual channels. It is an aspect of the composition problem, and trying to catch it is problematic. But those who have fallen foul of SolarWinds, whilst the victims of a supplier solution being subverted, are also to a degree victims of human error. It has been identified that had the victims’ networks been constrained to control outbound flows, the SolarWinds compromise would not have worked.

Tempest and other technology biproducts

The Tempest style of attack (more here) has been around for many years, and basically works because old style cathode ray screens (for those of us who have been around a few years) give off radiation. If you’re within a reasonable distance (tens of metres) then it is possible to tune into that radiation and end up seeing what the intended screen is showing. Anything that appears on screen appears on the listening device. Whilst this particular problem has been solved, from Faraday cage environments to just using LCD style screens, it is naive to think that there aren’t other possible similar attacks. Malware can turn speakers into high frequency transmitters, and even microphones can be subverted to emit data.

Realisation

The key point I’m driving at is that operating in isolation is unrealistic: too much data, too much code, too many dependencies. Even if we wrote everything from the ground up and had the resources to patch and maintain all that code, there are still ways to jump the gap. So let’s accept we’re connected, and not sucker ourselves with Risk Compensation or use language that suggests we aren’t connected, such as describing mechanisms that mitigate the connectivity with oxymorons like virtual air gaps.

What is the solution?

I’m not advocating we expose ourselves to the ‘wild west’ and accept the inevitable consequences, far from it.

When I was learning to drive, the most valuable piece of advice I was given was to assume everyone else on the road is a complete idiot, and that way the chances of an accident will be far lower. Both accidents I’ve been involved in came about because I put too much trust in the other road user to obey the Highway Code. Fortunately neither incident was serious.

But if we take this approach of assuming everyone else is an idiot when developing, then we position ourselves well. I look at all the conditions in which I might find my code, and what the other code around me could do, and do wrongly, so that I can put the appropriate defences in place assuming that these things will happen. If we all work that way, then the chances are that if I make a mistake, someone else in their part of the solution is likely to have a mitigation. In practical terms, the UI should validate data, but my mid tier and backend should validate the data as well.

There is the question of cost, so there does need to be a cost-benefit decision: the cost of development vs the cost of a security breach.

There are a broad range of tools to help us develop security into code and configuration, from SonarQube to Anchore and WhiteSource. But these tools are only of value to us when we understand what they bring and how to use them effectively, and most crucially when we appreciate their limits. Blindly trusting the tools will only take us back to the original problem of risk compensation.

A final thought

Whilst I have pointed to trying to develop in a way that prevents all issues, ultimately we come to a cost/benefit trade off, and those trade offs need to be understood. The principle of a RAID log in software projects has been around for a long time, but culturally there is a real challenge here. A large number of risks is seen as a very bad thing, particularly when there is no mitigation, and as a result only the significant risks tend to be logged. The truth is the risks themselves aren’t the issue; the challenge is whether we understand the consequences of each risk and the probability of it occurring. The current status quo means that the sum of lots of small risks is never seen. We should encourage people to identify risks during development, just as we can take code with TODO annotations and pull them together. Then those cost benefit trade offs are visible, and if there is available capacity, the security trade offs can be revisited on a cost benefit basis.

A large RAID log is not a bad place to be, it is an informed place, and being informed about (and by implication understanding) all the risks, big and small, allows for effective judgement.

New and coming to a screen near you soon

Last night saw the final chapter of Logging in Action with Fluentd go back to my editor. The next step is that the chapter (and others I hope) will go to MEAP, so early readers not only get the final chapter, but also the raft of improvements we’ve made. Along with that, the manuscript goes for a full peer review. Once that’s back, it’s time for a round of edits as I address the feedback, then into copy editing and the Manning sign-off review.

As you might have guessed, we’ve kept busy with an article in the 25th edition of OraWorld. This follows Part 1 talking about GraphQL with a look at considerations for API Security.

In addition to that we’re working on a piece around automation of OCI management activities such as setting up developers, allowing them a level of freedom to experiment without accidentally burning through all your credits by spinning up Exadata servers or 500 node Kubernetes clusters.

We might even have some time to write more about APIs and integration.

Restriction on custom logging for OCI always free

OCI Always Free compute nodes have a restriction that isn’t clearly documented or obvious when you go to instantiate such compute resources. That restriction is the absence of OCI Custom Logging. This is a little surprising given that this capability is based on Fluentd, and the compute footprint needed by Fluentd is so small. In the following screenshot, as you can see when configuring the compute, there is no reason to believe you can’t use OCI Logging for custom logs.

Configuration for a logging agent on the Always Free VM

But when you go to configure custom logging on your running compute, you can see that the feature is disabled, with a message about the restriction. It would have been nice to have the warning at the creation phase; if I’d manually set up the VM and then gone to switch on OCI Logging, knowing where I’d deployed my applications, I’d have wasted time on the setup.

Custom Logging Limitation

Solution: use one of the AMD Flex or Previous Generation shapes to minimize the footprint to your needs.

UPDATE 09th June 2021

We’ve been told that this constraint has been addressed. In addition, Oracle also introduced the new Ampere offering, which allows for nodes with a form factor of up to 4 OCPUs and 24GB of RAM using the new ARM chips. You can also use variations on this, such as 4x 1 OCPU 6GB RAM nodes.
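
As a sketch, such a flexible Ampere node can be requested through the OCI CLI by passing the shape configuration; the availability domain and OCIDs below are placeholders.

# illustrative launch of a 1 OCPU / 6GB always free Ampere node (placeholder values)
oci compute instance launch \
  --availability-domain "<availability-domain>" \
  --compartment-id ocid1.compartment.oc1..example \
  --subnet-id ocid1.subnet.oc1..example \
  --image-id ocid1.image.oc1..example \
  --shape VM.Standard.A1.Flex \
  --shape-config '{"ocpus": 1, "memoryInGBs": 6}'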

Creating screenshots of application shells – easing the writing process

If you hadn’t noticed, I have been involved with writing several books as well as various blog and journal contributions. One of the challenges, particularly when it comes to books, is wanting to share a screenshot of a shell/console window, be that a Linux shell (bash, zsh, korn etc.), Windows cmd or PowerShell.

All the different shells configured, note Git and Ubuntu in the list and integration to support Azure as well.

The issue is that by default these UIs have black or dark backgrounds with white text. For a blog or online content it isn’t really an issue, other than possibly for aesthetic reasons. But when it comes to printing, you’re likely to find the book editors asking if the colours can be reversed to avoid quality problems for printing (and cost, i.e. less ink).

Until recently, I hadn’t found an elegant way to toggle colour settings back and forth, as I prefer the dark background when working normally (for a start, it gives all the visual cues about what the screen is). Microsoft has been working on a new terminal app called Windows Terminal. I have to admit to being suspicious; as I understood it, PowerShell and its UI were meant to do away with the cmd shell. Windows Terminal is meant to supersede the cmd shell, and having worked with it, I think it will comfortably tick that box and more. Microsoft have made the beta edition and supporting tools available via GitHub if you’re so inclined, as they’re running the development as an open source project.

Whilst it is now possible to configure the look of the terminal, that’s just the beginning, as we can configure the drop-down options and create tabs of the shells with different configurations.

Windows Terminal running two different types of shell – one connected to Oracle cloud and a second local classic cmd shell.
Configuration options for colour etc within the tool
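
To give a feel for this, the profiles and colour schemes live in the terminal’s settings.json; a sketch of a print-friendly profile (the names here are illustrative rather than my actual setup) looks something like the following.

// illustrative fragment of a Windows Terminal settings.json
{
    "profiles": {
        "defaults": {
            "colorScheme": "Campbell"
        },
        "list": [
            {
                // a profile kept on a light scheme for book screenshots
                "name": "cmd (print friendly)",
                "commandline": "cmd.exe",
                "colorScheme": "Solarized Light"
            }
        ]
    }
}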

Blog Post on Oracle.com and more

We recently received an invite to write a guest blog post for Oracle. We’re pleased to say it has gone live and can be found at https://blogs.oracle.com/cloud-infrastructure/oracle-cloud-infrastructure-logging-and-alert-rapid-smoke-testing-of-config-and-alerts. A little different to my typical posts. Hope you find it interesting.

Opening of the blog post on blogs.oracle.com
my Author Profile on blogs.oracle.com

World Festival Conference

We’ve also scored another success; this time we’ve been invited to speak at WorldFestival in August, an online conference organized by the same team behind DeveloperWeek. This is the first time outside of an Oracle-linked event that I’ve been amongst the first few named speakers, so I’m proud of that. The conference looks really interesting as it looks beyond just core developer themes, with conference tracks on Space & Transportation, Smart Cities, Robotics, and Digital Health, to name a few of the 12 streams. Worth checking out.

WorldFestival Conference
World Festival Conference Themes