Securing systems through an air gap is an idea that goes back decades. From the 50s until as recently as the 2000s, the idea that you could safely and successfully run and maintain systems by simply not connecting them to a network, and so protect them from vulnerabilities, may have been true. But for the last ten or more years, I would argue it has been a fallacy, one that can lull people into a false sense of security and therefore make us more likely to take risks (or at least not be as careful as we should be). This effect is a well-established piece of psychology with modern cars (here for example). It is known as Risk Compensation (also more here), or an aspect of behavioural adaptation.
Whilst this is rather theoretical, perhaps we should illustrate in practical terms why it is both a fallacy and, in the modern day, simply impractical.
The need for data
In the 50s, 60s, and possibly 70s, the rate of change and the amount of data that needed to cross the air gap securely for any software system were comparatively small. Master data (be that map data or configuration files for the software) and even the amount of code needed was small and didn’t change at the frequency it does today. As a result, it was practical for such data to be brought into an air-gapped system entirely by rekeying; at least every bit and byte could be eyeballed, and the data movement was infrequent.
Today the data volumes are simply in a different league: gigabytes, terabytes and petabytes, not bytes and kilobytes. As a result, data transfer into or out of an air-gapped solution is done using a storage device, be that a floppy disk or a USB stick. For example, maps aren’t simply a coastal outline; they are typically fully featured with topographic detail, and even overlaid with imagery at incredible levels of precision. We can no longer eyeball the data being moved to ensure nothing lurks within.
This inevitably means something malicious stands a chance of being transferred without being spotted. Of course the counter to this is to use anti-malware tools, but whilst that reduces the risk it isn’t a guarantee, and I’ll explain why in a moment.
We build solutions from many many parts
Software today is built from tens and hundreds of millions of lines of code. It is reported that the Linux repository contains almost 30 million lines of code (here), having grown by a million lines in just a single year. When Java was formally released as a programming language, I bet no one thought they’d be releasing language updates every 6 months, but that’s what happens now. Even the open source principle that, with the code open to all, many eyes make all bugs shallow doesn’t mean there won’t be bugs, or that they won’t be found only after the code has been deployed. Fortunately, major open source projects tend to benefit from best practices and sponsored tools to help, as well as plenty of eyes, so the bugs are found and addressed. But this does mean that patches are needed, and patches need to be applied quickly, before the bug can cause a problem or, worse, be exploited as a vulnerability.
Software complexity has reached a state where we build solutions by aggregating other parts, which in turn are built through dependencies. Through this accumulation we see frameworks being assembled. This is the essence of Spring Boot and many other technologies: we use tooling to define our framework needs, and it pulls together the accumulation of libraries needed.
It would be easy to say, well, don’t build software through the use of components. But the reality is that we want to build our secure systems in a cost-effective manner. That means cloud, that means using modern software development techniques, and that means we accelerate our progress by using building blocks.
Defence follows attack for malware
We tend to assume safety by having anti-malware tools, and the premise is that we protect our air gap by using anti-malware on the devices we use to transfer data across the gap. The problem is that we only get a new detection fingerprint for each new attack once it has been discovered, and that detection is not guaranteed to be formulated before at least a few organizations have become victims or recognized something suspicious. It is only after the cause or suspect payload has been analyzed by the anti-malware providers that they can determine how to fingerprint the malware. If malware is identified attacking some organizations, the chances are it has already penetrated others.
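The lag between attack and fingerprint can be shown with a deliberately simplified sketch. It assumes detection is nothing more than matching a SHA-256 hash against a list of known payloads; real anti-malware products use far richer techniques, and the class and payload names here are purely illustrative.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Reduce a payload to a SHA-256 hex digest (our stand-in 'fingerprint')."""
    return hashlib.sha256(payload).hexdigest()


class SignatureScanner:
    """Toy scanner: flags a payload only if its fingerprint is already known."""

    def __init__(self) -> None:
        self._known = set()

    def add_signature(self, payload: bytes) -> None:
        # In reality this step happens only after a vendor has analysed a sample.
        self._known.add(fingerprint(payload))

    def is_flagged(self, payload: bytes) -> bool:
        return fingerprint(payload) in self._known


scanner = SignatureScanner()
old_threat = b"payload seen and analysed last year"  # hypothetical sample
new_threat = b"payload nobody has analysed yet"      # hypothetical sample

scanner.add_signature(old_threat)
print(scanner.is_flagged(old_threat))  # True: the fingerprint already exists
print(scanner.is_flagged(new_threat))  # False: it crosses the gap unnoticed

scanner.add_signature(new_threat)      # the vendor catches up after the event
print(scanner.is_flagged(new_threat))  # True: but only for later transfers
```

Anything carried across before the final `add_signature` call is already on the inside, which is the window the scanner cannot close.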
It is because defence follows attack that the anti-malware tools on our personal machines not only scan arriving payloads but also run scheduled scans of complete systems.
The problem is that anti-malware tools can also drive risk compensation, in part because not many of us are knowingly impacted by malware before a fingerprint is rolled out. As a result, we tend to trust our anti-malware tools to protect us regardless.
The human factor
Moving content across the air gap is by its very nature a human process, and humans are prone to error, such as failing to follow the processes that are in place to protect the gap. Worse, the human processes create an opportunity for someone with ill intent to exploit process loopholes or errors. Stuxnet was the ultimate proof of this kind of failure (see here). Other examples of human elements involved in breaches include Snowden, Chelsea Manning and many, many others.
Breached trusted sources
This is sometimes referred to as a supply chain breach (more here). We put trust in the third parties who provide software, even through manual channels. This is an aspect of the composition problem, and trying to catch it is difficult. But those who fell foul of SolarWinds, whilst the victims of a supplier’s solution being subverted, were also to a degree victims of human error. It has been identified that, had their networks been constrained to control outbound flows, the SolarWinds compromise would not have worked.
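Constraining outbound flows comes down to a default-deny rule: a destination is reachable only if someone has explicitly approved it. The sketch below shows that decision in isolation. The host names and the allowlist are hypothetical, and in practice the rule would be enforced by a firewall or proxy rather than in application code.

```python
# Hypothetical allowlist of approved (host, port) destinations.
ALLOWED_EGRESS = frozenset({
    ("patches.example.internal", 443),
    ("time.example.internal", 123),
})


def is_egress_allowed(host: str, port: int, allowed=ALLOWED_EGRESS) -> bool:
    """Default deny: only explicitly listed (host, port) pairs may leave."""
    # Normalise case and any trailing dot so trivial variations still match.
    normalised = host.strip().lower().rstrip(".")
    return (normalised, port) in allowed


for destination in [
    ("patches.example.internal", 443),
    ("unknown-server.example.com", 443),
]:
    verdict = "allow" if is_egress_allowed(*destination) else "deny"
    print(destination, verdict)
```

A subverted component that tries to call home to an unlisted destination is refused, even though the component itself was trusted.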
Tempest and other technology by-products
The Tempest style of attack (more here) has been around for many years. It basically works by exploiting old-style cathode ray screens (for those of us who have been around a few years) giving off radiation. If you’re within a reasonable distance (tens of metres), it is possible to tune into the radiation and end up seeing what the targeted screen is showing. Anything that appears on screen appears on the listening device. Whilst this problem has been solved, with measures ranging from Faraday cage environments to just using LCD-style screens, it is naive to think that there aren’t other possible similar attacks. Malware can turn speakers into high-frequency transmitters, and even microphones can be subverted to emit data.
The key point I’m driving at is that operating in isolation is unrealistic: too much data, too much code, too many dependencies. Even if we wrote everything from the ground up and had the resources to patch and maintain all that code, there are still ways to jump the gap. So let’s accept we’re connected, and not sucker ourselves with Risk Compensation or use language that suggests we aren’t connected, such as describing mechanisms that mitigate the connectivity with oxymorons like virtual air gaps.
What is the solution?
I’m not advocating we expose ourselves to the ‘wild west’ and accept the inevitable consequences, far from it.
When I was learning to drive, the most valuable piece of advice I was given was to assume everyone else on the road is a complete idiot, as that will mean the chances of an accident will be far lower. Both accidents I’ve been involved in came about because I put too much trust in the other road user to obey the Highway Code. Fortunately, neither incident was serious.
But if we take this approach of assuming everyone else is an idiot when developing, then we position ourselves well. If I look at all the conditions in which I might find my code, and at what other code around me could do, and do wrongly, then I can put the appropriate defences in place, assuming that these things will happen. If we all work that way, then the chances are that if I make a mistake, someone else in their part of the solution is likely to have a mitigation. In practical terms, the UI should validate data, but my mid tier and backend should validate the data as well.
There is the question of cost, so there does need to be a cost-benefit decision: the cost of development vs the cost of a security breach.
There is a broad range of tools to help us build security into code and configuration, from SonarQube to Anchore and WhiteSource. But these tools are only of value to us when we understand what they bring and how to use them effectively, and, most crucially, appreciate their limits. Blindly trusting the tools will only take us back to the original problem of risk compensation.
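As a minimal sketch of validating at every tier, the example below has three functions standing in for the UI, mid tier and backend. Each one re-checks the input rather than trusting its caller. The tier functions, the `quantity` field and the 1 to 100 rule are all invented for illustration.

```python
class ValidationError(ValueError):
    """Raised when a value breaks the (hypothetical) business rule."""


def validate_quantity(raw) -> int:
    """Illustrative rule: quantity must be a whole number from 1 to 100."""
    if isinstance(raw, bool) or not isinstance(raw, (int, str)):
        raise ValidationError(f"unsupported type: {type(raw).__name__}")
    try:
        value = int(raw)
    except ValueError:
        raise ValidationError(f"not a whole number: {raw!r}") from None
    if not 1 <= value <= 100:
        raise ValidationError(f"out of range: {value}")
    return value


def backend_store(quantity) -> str:
    # The backend assumes the mid tier may have been bypassed or be buggy.
    quantity = validate_quantity(quantity)
    return f"stored {quantity}"


def mid_tier_handle(request: dict) -> str:
    # The mid tier assumes the request may not have come from our UI at all.
    quantity = validate_quantity(request.get("quantity"))
    return backend_store(quantity)


def ui_submit(raw: str) -> str:
    # The UI validates first so the user gets fast feedback.
    return mid_tier_handle({"quantity": validate_quantity(raw)})


print(ui_submit("5"))

try:
    # A caller that skips the UI is still stopped further down.
    mid_tier_handle({"quantity": 500})
except ValidationError as error:
    print("rejected:", error)
```

The repeated check looks redundant when everything works, which is exactly the point: it only earns its keep when another tier gets something wrong.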
A final thought
Whilst I have pointed to trying to develop to prevent all issues, ultimately we come to a cost/benefit trade-off. Those trade-offs need to be understood. The principle of a RAID log in software projects has been around for a long time. But culturally there is a real challenge here. A large number of risks is seen as a very bad thing, particularly when there is no mitigation. As a result, only the significant risks tend to be logged. The truth is that risks themselves aren’t the issue; the challenge is whether we understand the consequences of a risk and the probability of it occurring. The current status quo means that the sum of lots of small risks is never seen. We should encourage people to identify risks during development, just as we can mark code with TODO annotations that can be pulled together. Then those cost-benefit trade-offs are visible. If there is available capacity, those security trade-offs can be revisited on a cost-benefit basis.
A large RAID log is not a bad place, it is an informed place, and being informed about (and by implication understanding) all the risks, big and small, allows for effective judgement.
Last night saw the final chapter of Logging in Action with Fluentd go back to my editor. The next step is that the chapter (and others I hope) will go to MEAP, so early readers not only get the final chapter, but also the raft of improvements we’ve made. Along with that, the manuscript goes for a full peer review. Once that’s back, it’s time for a round of edits as I address the feedback, then into copy editing and Manning sign-off review.
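One way to picture risks being pulled together like TODO annotations is a small script that harvests them from source comments. The `# RISK(level): text` convention used here is an assumption made up for this sketch, not an established standard, and the sample code is invented.

```python
import re
from pathlib import Path

# Assumed convention: "# RISK(level): description" or "# RISK: description".
RISK_PATTERN = re.compile(r"#\s*RISK(?:\((?P<level>\w+)\))?:\s*(?P<text>.+)")


def collect_risks(source: str, origin: str = "<string>") -> list:
    """Return one dict per RISK annotation found in the given source text."""
    risks = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = RISK_PATTERN.search(line)
        if match:
            risks.append({
                "origin": origin,
                "line": number,
                "level": match.group("level") or "unrated",
                "text": match.group("text").strip(),
            })
    return risks


def collect_risks_from_tree(root: str) -> list:
    """Walk a directory of Python files and gather every annotation."""
    risks = []
    for path in sorted(Path(root).rglob("*.py")):
        risks.extend(collect_risks(path.read_text(encoding="utf-8"), str(path)))
    return risks


sample = """
def load(path):
    # RISK(low): no size limit on the file being read
    data = open(path).read()
    return data  # RISK: caller input is not sanitised
"""

for risk in collect_risks(sample, "sample.py"):
    print(f'{risk["origin"]}:{risk["line"]} [{risk["level"]}] {risk["text"]}')
```

Run across a whole code base, the output becomes the raw material for the RAID log, including the small risks that would otherwise never be written down.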
As you might have guessed, we’ve kept busy with an article in the 25th edition of OraWorld. This follows Part 1 talking about GraphQL with a look at considerations for API Security.
In addition to that we’re working on a piece around automation of OCI management activities such as setting up developers, allowing them a level of freedom to experiment without accidentally burning through all your credits by spinning up Exadata servers or 500 node Kubernetes clusters.
We might even have some time to write more about APIs and integration.
The OCI Always Free compute node has a restriction that isn’t clearly documented or obvious when you go to instantiate such compute resources. That restriction is the absence of OCI Custom logging. This is a little surprising given that this capability is based on Fluentd and the compute footprint needed by Fluentd is so small. In the following screenshot, as you can see when configuring the compute, there is no reason to believe you can’t use OCI Logging for custom logs.
But when you go to configure the custom logging on your running compute, you can see that the feature is disabled, with a message about the restriction. It would have been nice to have the warning in the creation phase. If I’d manually set up the VM and deployed my applications, and only then gone to switch on OCI Logging, I’d have wasted time in the setup.
The solution is to use one of the AMD Flex or Previous Generation shapes, sized to minimize the footprint to your needs.
UPDATE 09th June 2021
We’ve been told that this constraint has been addressed. In addition, Oracle has introduced the new Ampere offering, which allows for nodes with a form factor of up to 4 OCPU and 24GB of RAM using the new ARM chips. You can also use variations on this, such as 4x 1 OCPU 6GB RAM.
If you hadn’t noticed, I have been involved with writing several books as well as various blogs and journal contributions. One of the challenges, particularly when it comes to books, is wanting to share a screenshot of a shell/console window, be that a Linux shell (bash, ZSH, korn etc.), Windows cmd or PowerShell.
The issue is that by default these UIs have black or dark backgrounds with white text. For a blog, or online content it isn’t really an issue, other than possibly aesthetic reasons. But when it comes to printing you’re likely to find the book editors asking if the colours can be reversed to avoid quality problems for printing (and cost i.e. less ink).
Until recently, I hadn’t found an elegant way to toggle colour settings back and forth, as I prefer the dark background when working normally (for a start, it’s one of the visual cues about what the screen is). Microsoft has been working on a new terminal app called Windows Terminal. I have to admit to being suspicious because, as I understand it, PowerShell and its UI were meant to do away with the cmd shell. Windows Terminal is meant to supersede the cmd shell and, having worked with it, I think it will comfortably tick that box and more. Microsoft has made the beta edition and support tools available via GitHub, if you’re so inclined, as they’re running the development as an open source project.
Whilst it is now possible to configure the look of the terminal, that’s just the beginning, as we can also configure the drop-down options to create tabs of shells with different configurations.
We recently received an invite to write a guest blog post for Oracle. We’re pleased to say it has gone live, and can be found at https://blogs.oracle.com/cloud-infrastructure/oracle-cloud-infrastructure-logging-and-alert-rapid-smoke-testing-of-config-and-alerts. A little different to my typical posts. Hope you find it interesting.
World Festival Conference
We’ve also scored another success, this time we’ve been invited to speak at WorldFestival in August, this is an online conference organized by the same team behind DeveloperWeek. This is the first time outside of an Oracle linked event where I’ve been amongst the first few named speakers, so proud of that. The conference looks really interesting as it looks beyond just core developer themes with conference tracks on Space & Transportation, Smart Cities, Robotics, Digital Health to name a few of the 12 streams. Worth checking out.
The latest edition of OraWorld has become available today, with its blend of insight into the Oracle community and Oracle technologies, from database to modern apps. I have to own up and say I mention the magazine not only because of the beautifully crafted independent insights, but also because it includes an article from myself, taking a look at GraphQL, what it is, and how recent new Oracle product features could make a big difference to the GraphQL adoption opportunities.
The next edition should include a follow up article to this focussing on API security considerations.
The book has had a title change, as Manning found that the book’s title was clashing with other solutions using the term ‘Unified Logging’. The name change also helps bring the book in line with the naming of Manning’s In Action series. This means the book website is now https://www.manning.com/books/logging-in-action.
With the name change, we’ve agreed that an additional chapter should be added. I’d written the book with the view that everything we cover applies both to modern solutions such as microservices coming from the CNCF camp and to more traditional IT landscapes. Within the book we have explained how things are positioned and can be used in Kubernetes, but it was agreed with our editorial team that not tackling the configuration of Fluentd with Kubernetes and Docker was to an extent ignoring a key community that will be using Fluentd. So the new chapter will be introduced to address this aspect.
In terms of progress we’re into the 1’s – 1 Chapter to start (the new one), 1 Chapter back from the Technical Editor (Logging Best Practises) – some edits to be done, 1 Chapter now with the editor (How To Create Custom Plugins), 1 Chapter being finished (Logging Frameworks) and finally 1 peer review cycle to go.
Given the lovely review comments that have been quoted on the book’s page, I can only recommend that, if you have an interest in logging and monitoring, you check it out through the Manning Early Access Programme (MEAP).
I was fortunate enough to record a podcast with the team at Adventures In Dev Ops just before Christmas. The recording has been fine-tuned and is now available on their web site here. From my perspective, the discussion was really interesting and explored a wide range of areas around the challenges of monitoring.
As the podcast is linked to the book we’re writing for Manning (Unified Logging With Fluentd), there is a discount code currently running – poddevopsadv20.
Thanks to Charles Wood and Jeffrey Groman for having me on as a guest.
Other news …
I will be presenting at the online conference Blueprint LDN, check out the subjects being covered, looks very interesting.