Securing systems through an air gap is an idea goes back decades, and through the 50s to as recently as the 2000s the idea that you could safely and successfully run and maintain systems by simply not connecting to a network protects them from vulnerabilities may have been true. But in the last ten or more years, I would argue it is a fallacy, that can lull people into a false sense of security and therefore more likely to take more risks (or at least not be as careful as we should be). This idea is a well established piece of psychology with modern cars (here for example). This is known as Risk Compensation (also more here) or an aspect of behaviour adaptation.
Whilst this is rather theoretical, perhaps we should illustrate in practical terms why it is both a fallacy, and in the modern day simply impractical.
The need for data
In the 50s, 60s, and possibly 70s, the rate of change and amount of data that any software system that needed to be securely cross through the air gapping was comparatively small. Master data (be that map data, configuration files for the software) and even the amount of code needed was small and didnt change at a frequency it does today. As a result it was practical for such data to be shipped into an airgapped system entirely by rekeying and atleast every bit and byte could be eye balled and the frequency of that data movement was slow.
Today the data volumes are simply in a different league, gigabytes, terabytes and petabytes, not bytes and kilobytes. As a result data transfer into or out of an air gapped solution is done by the use of a data device be that a floppy disk or a USB stick. For example maps aren’t simply a coastal outline, the are typically fully featured with topographic detail, even over laid with imagery with incredible levels of precision. We can no longer eyeball the data being moved to ensure nothing lurks within.
This inevitably means something malicious stands a chance of being transferred without being spotted. Of course the counter is to this is to use anti-malware tools, but whilst it reduces the risk it isn’t a guarantee, and I’ll explain why in a moment.
We build solutions from many many parts
Software today is built from tens and hundreds of millions of lines of code. It is reported that Linux repository contains almost 30 million lines of code (here) having grown by a million lines in just a single year. When Java as a programming language was formally released, I bet no one thought they’d be releasing language updates every 6 months, but that’s what happens now. Even with the open source principle that by having the code open to all, many eyes make all bugs shallow, doesn’t mean there won’t inevitably be bugs, and found after code has deployed. Fortunately major open source projects tend to benefit from best practises and sponsored tools to help, as well as plenty of eyes so the bugs are found and addressed. But this does mean that patches are needed, and patches need to be applied quickly before the bug can cause a problem or worse a vulnerability.
Software complexity has reached a state where we build solutions by aggregating other parts which in turn are built through dependencies. Through this accumulation we see frameworks being compiled. This is the essence of Springboot and many other technologies, using tooling to define our framework needs and it pulling together the accumulation of libraries needed.
It would be easy to say, well don’t build software through the use of components, but reality is that we want to build our secure systems in a cost effective manner, that means cloud, that means using modern software development techniques, that we elevate our progress by using building blocks.
Defence follows attack for malware
We tend to assume safety by having malware tools, and the premise is that we protect our air gap by using anti malware on the devices we use to transfer data across the gap. The problem is that when it comes to malware we only get new detection finger prints for each new attack that is discovered, and that detection is not guaranteed to be formulated before at least a few organizations have become victim or recognized something suspicious. It is only after the cause or suspect payload has been analyzed by malware providers and determined the way to finger print the malware. If malware is identified attacking some organizations the chances are it has penetrated others.
It is the fact that defence follows attack that our personal machines are why malware tools not only scan payloads arriving but also as scheduled scans of complete systems.
The problem is that malware can also drive risk compensation, impart because not because not many of us are knowingly impacted by malware before a fingerprint is rolled, as a result we tend to trust our malware to protect us regardless.
The human factor
Content traversing the air gap is by the very nature a human process, and humans are both prone to error, such as failing to follow processes that are in place to protect the gap. Worse, the human processes create an opportunity for someone intending ill intent to exploit process loopholes or errors. Stuxnet was the ultimate proof of this kind of failure (see here). Other examples of human elements involved in breaches – Snowden, Chelsea Manning and many, many others.
Breached trusted sources
Sometimes referred to as a supply chain breach (more here). we put trust into the third parties who provide software even through manual channels, this is an aspect of the composition problem and trying to catch it is problematic. But those who have fallen foul of Solarwinds whilst the victim of a supplier solution being subverted, are also to a degree as a victim of human error. It has been identified that had the users outbound network had been constrained to control outbound flows would not have had he Solarwinds compromise work.
Tempest and other technology biproducts
The Tempest style of attack (more here) has been around many years, which basically works by old style cathode ray screens (for those of us who have been around a few years) giving of radiation. If you’re in a reasonable distance (tens of metres) then it is possible to dial into the radiation and end up seeing what the intended screen is seeing. Anything appear on screen, appears on the listening device. Whilst this problem has been solved – from Faraday cage environments to just using LCD style screens, it is nieve to think that there aren’t other possible similar attacks. Malware turns speakers into high frequency transmitters, even microphones can be subverted to emit data.
The key point I’m driving at, is operating in isolation is unrealistic, too much data, too much code, too many dependencies. Even if we wrote everything from the ground up, had the resources to patch and maintain all that code, then there are still ways to jump the gap. So let’s accepted we’re connected, don’t sucker our selves with Risk Compensation or use language that suggests we aren’t connected like describing mechanisms that mitigate the connectivity as oxymoron’s like virtual air gaps.
What is the solution?
I’m not advocating we expose ourselves to the ‘wild west’ and accept the inevitable consequences, far from it.
When I was learning to drive, the most valuable piece of advice I was given was assume everyone else on the road is a complete idiot, and that will mean the chances of an accident will far lower. Both accident’s I’ve been involved with came about because I put too much trust in the other road user to obey the Highway Code. Fortunately neither incident was serious.
But if we take this approach of assuming everyone else is an idiot when developing, then we position ourselves well. I look at all the conditions in which I might find my code, and what other code around me could do, and do wrongly then I can take the appropriate defences assuming that these things will happen. If we all work that way, then the chances are that if I make a mistake then someone else in their part of the solution is likely to have a mitigation. In practical terms the UI should validate data, but my mid tier and backend should validate the data.
There is the question of cost, so there does need to be a cost-benefit decision, the cost of development vs the cost of a security breach.
There are a broad range of tools to help us develop security into code and configuration from SonarQube to Anchore and WhiteSource. But these tools are only of value to us when understand what they bring, how to use them effectively but most crucially appreciate the limits. Blindly trusting to the tools will only take us back to the original problem of risk compensation.
A final thought
Whilst I have pointed to trying to develop to prevent all issues, but ultimately we come to a cost/benefit trade off. Those trade offs need to be understood. The principle of a RAID log in software projects has been around for a long time. But culturally there is a real challenge here. A large number of risks is seen as a very bad thing, particularly when there is no mitigation. As a result only the significant risks tend to be logged. The truth is risks themselves aren’t the issue, the challenge is whether we understand the consequences of the risk and the probability of that risk occurring. The current status quo means that the sum of lots of small risks are never seen. We should encourage people to identify in the development the risks, just like we can take code with TODO annotations that can be pulled together. Then those cost benefit trade offs are visible. If there is available capacity, then those security trade offs can be revisited on a cost benefit basis.
A large RAID log is not a bad place, it is an informed place, and being informed (and by implication understanding) of all the risks big and small allows for effective judgement.