Securing systems through an air gap is an idea that goes back decades, and from the 1950s to as recently as the 2000s, the idea that you could safely run and maintain systems simply by not connecting them to a network may well have been true. But for the last ten or more years, I would argue it has been a fallacy, one that can lull people into a false sense of security and therefore make them more likely to take risks (or at least not be as careful as we should be). This is a well-established piece of psychology with modern cars (here for example), known as risk compensation (also more here), or an aspect of behavioural adaptation.
Whilst this is rather theoretical, perhaps we should illustrate in practical terms why it is both a fallacy and, in the modern day, simply impractical.
A couple of years ago I got to discuss some of the design ideas behind API Platform Cloud Service. One of the points we discussed was how API Platform CS kept the configuration of APIs entirely within the platform, which meant some version management tasks couldn’t be applied like any other code (although we’ve since solved that problem, and you can see the various tools for this here: API Platform CS tools). The argument made was that your API policies are pretty important: if they get into the public domain, then people can better understand how to go about attacking your APIs, and possibly infer more.
Move on a couple of years: Oracle’s 2nd generation cloud (OCI) is established and maturing rapidly, and the organisational changes within Oracle mean PaaS was aligned either to SaaS (Oracle Integration Cloud and Visual Builder CS as examples) or to more cloud native IaaS. The gateway, which had a strong foot in both camps, eventually became aligned to IaaS (note that this doesn’t mean the latest evolution of the API platform (Oracle Infrastructure API) will lose its cloud agnostic capabilities, as this is one of the unique values of the solution, but over time the underpinnings can be expected to evolve).
Any service that has elements of infrastructure associated with it has been mandated to use Terraform as the foundation for definition and configuration. The Terraform mandate is good: we get some consistency across products with something that is becoming a de facto standard. However, adopting the Terraform approach does mean all of our API configurations are held outside the product, raising the security risk that policy configuration is no longer hidden away; conversely, configuration management becomes a lot easier.
This has had me wondering for a long time: with the use of Terraform, how do we mitigate the risks that API CS’s approach was trying to secure against? Ultimately it comes down to the fundamental question of security vs standardisation.
Any security expert will tell you the best security is layered, so if one layer is found to be vulnerable, then as long as the next layer is different, you’re not immediately compromised.
What this tells us is that we should look for ways to mitigate, or create additional layers of security to protect, the API configuration. These principles probably need to extend to all Terraform files; after all, they describe the security of not just the OCI APIs, but also the WAF, which networks are public, and how they connect to private subnets (this isn’t an issue unique to Oracle; it’s equally true for AWS and Azure). Some mitigation actions worth considering:
- Consider using a repository that can’t be accidentally exposed to the net – configuration errors are in the OWASP Top 10, so let’s avoid the mistake if possible. If this isn’t an option, then consider how to mitigate, for example …
- Strong restrictions on who can set or change visibility/access to the repo
- Configure a simple regular check that looks to see if your repos have been accidentally made publicly visible. The more frequent the check, the smaller the potential exposure window
- Make sure the Terraform configurations don’t contain any hard-coded credentials; tools exist to scan for this kind of error, so use them.
- Think about access control to the repository. It is well known that a lot of security breaches start within an organisation.
- Terraform supports the ability to segment and inject configuration elements; using this will allow you to reuse configuration pieces, but could also be used to minimize the impact of a breach.
- Of course the odds are you’re going to integrate the Terraform into a CI/CD pipeline at some stage, so make sure the credentials for the Terraform repo are also secure, otherwise you’ve undone your previous security steps.
- Minimize breach windows through credential tokens and certificate rotation. If you use Let’s Encrypt (an automated certificate-issuing solution supported by the Linux Foundation), then 90-day certificates aren’t new.
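Some of these mitigations lend themselves to automation. As a minimal sketch (the regex patterns here are illustrative assumptions, not a substitute for dedicated secret scanners such as gitleaks or trufflehog), a pre-commit check for hard-coded credentials in Terraform files might look like:

```python
import re
from pathlib import Path

# Naive patterns for secrets that should never appear in Terraform files.
# Real scanners use far larger rule sets plus entropy analysis.
SECRET_PATTERNS = [
    re.compile(r'(password|secret|token|api_key)\s*=\s*"[^"\s]+"', re.I),
    re.compile(r'AKIA[0-9A-Z]{16}'),  # AWS access key id shape
    re.compile(r'-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----'),
]

def scan_text(text: str) -> list[str]:
    """Return fragments that look like hard-coded credentials."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]

def scan_repo(root: str) -> dict[str, list[str]]:
    """Scan every .tf file under root; map file path -> suspicious fragments."""
    findings = {}
    for tf in Path(root).rglob("*.tf"):
        hits = scan_text(tf.read_text(errors="ignore"))
        if hits:
            findings[str(tf)] = hits
    return findings
```

Wired into CI, a non-empty result from `scan_repo` would fail the build; note that injecting the real values via Terraform variables and a secrets manager is the proper fix, and the scan is only a safety net.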
This may sound a touch paranoid, but as the joke goes….
Just because I’m paranoid, it doesn’t mean they’re not out to get me
Fundamental Security vs Standardisation?
As it goes, standardisation is actually a dimension of security (this article illustrates the point, and you can find many more). The premise is: which can be assured as the more secure environment – one that is consistent, using standards (de facto or formal), or one that is non-standard and hard to understand?
To varying degrees, most techies are aware of the security vulnerabilities identified in the OWASP Top 10 (SQL injection, trying to homebrew identity management, etc.), although I still sometimes have conversations where I feel the need to get the yellow or red card out. But the bottom line is that these risks are perhaps better appreciated because it is easier to understand external entities seeking direct attacks to disrupt or access information. There are, however, subtler and often more costly to repair attacks, such as internal attacks and indirect attacks like compromising software deployment mechanisms.
This latter attack is not a new risk; as you can see from the following links, it has been recognised by the security community for some time (you can find academic papers going back 10+ years looking at the security risks for Yum and RPM, for example):
- Survivable Key Compromise in Software Update Systems
- Consequences of Insecure Software Updates
- Attacks on Package Manager
- The Problem of Package Manager Trust
But software is becoming ever more pervasive, and we’re more aware than ever that maintaining software at the latest releases means that known vulnerabilities are closed. As a result, we have seen a proliferation of mechanisms for recognising the need to update and for deploying updates. Ten years ago, updating frameworks were typically small in number and linked to vendors who could (or had to) invest in making the mechanisms as secure as possible – think Microsoft or Red Hat. However, this has proliferated: any browser worthy of attention has automated updating, let alone the wider software tools. As development has become more polyglot, every language has its central repos of framework libraries (Maven Central, npm, Chocolatey …). Add to this the growth in multi-cloud and the emphasis on micro deployments to support microservices, and the deployment landscape gets larger, ever more complex, and therefore more vulnerable.
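The bare minimum any update mechanism should do is verify what it downloaded against integrity data published out of band; the frameworks referenced above layer cryptographic signing (GPG, or designs like TUF) on top of this. A minimal sketch of just the checksum step (function name and workflow are illustrative assumptions):

```python
import hashlib
import hmac

def verify_artifact(artifact: bytes, expected_sha256: str) -> bool:
    """Check a downloaded artifact against a checksum published out of band.

    This proves integrity only, not authenticity: if an attacker controls
    both the artifact and the checksum file, signing is the missing layer.
    """
    digest = hashlib.sha256(artifact).hexdigest()
    # Constant-time comparison avoids leaking how much of the digest matched.
    return hmac.compare_digest(digest, expected_sha256.lower())
```

An update client would refuse to install anything for which `verify_artifact` returns `False`; the papers listed above explore what goes wrong when the checksum or signing keys themselves are compromised.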
What to do?
At the time of writing the Oracle API Platform doesn’t support the use of socket connections for handling API data flows. Whilst the API Platform does provide an SDK, as we’ve described in other blogs and our book, it doesn’t allow the extension of how connectivity is managed.
The use of API gateways and socket-based connectivity is something that can engender a fair bit of debate. On the one hand, when a client is handling a large volume of data, or expects data updates, but doesn’t want to poll or utilize webhooks, then a socket strategy will make sense – think of an app wanting to listen to a Kafka topic. Conversely, API gateways are meant to be relatively lightweight components, not intended to let a single call incur massive latency as the back end produces or waits to forward on data, as this is very resource-intensive and inefficient. However, socket-based data transmission should be subject to the same kinds of security controls, and home-brewing security solutions from scratch is generally not the best idea, as you become responsible for the continual re-verification of the code being secure, handling dependency patching, and mitigating vulnerabilities in other areas.
So how can we solve this?
As a general rule of thumb, web sockets are our least preferred way of driving connectivity; aside from the resource demand, it is a fairly fragile approach, as connections are subject to the vagaries of network links, which can drop, etc., and it can be difficult to manage state (i.e. knowing what data has or hasn’t reached the socket consumer). But sometimes it just is the right answer. Therefore we have developed the pattern the following diagram illustrates.
How it works …
The client initiates things by contacting the gateway to request a socket, with the details of the data wanted to flow through the socket. The gateway can then validate both that the request is legitimate (API tokens, OAuth, etc.) and that the requester is entitled to the data wanted, by analyzing the request metadata.
The gateway works in conjunction with a service component and, if the request is approved, will acquire a URI from the socket manager component. This component provides a URL for the client to use for the socket request. The URL contains a randomly generated string, which means port scans of the exposed web service are going to be difficult. These URLs are held in a cache, which ideally has a TTL (Time To Live); using something like Redis, with its native TTL capabilities, means we can expire the URL if it is not used.
With the provided URL we could further harden the security by associating a second token with it.
Having received the response, the client can then establish the socket-based connection, which gets routed around the API Gateway to the socket component. This takes the randomly generated part of the URL and looks up the value in the cache; if it exists in the cache and the secondary token matches, then the request for the socket is legitimate. With the socket connection accepted, the logic that will feed the socket can commence execution.
If the request is some form of malicious intent such as a scan, probe or brute force attempt to call the URL then the attempt should fail because …
- If the socket URL has never existed in, or has expired from, the cache, the request is rejected.
- If a genuine URL is obtained, then the secondary key must verify correctly; if it is incorrect, again the request is rejected.
- Ironically, any malicious attack seeking to overload components is most likely to affect the cache, and if the cache fails, a brute-force attempt gets harder, as the persistence of all keys will be lost, i.e. there is nothing to try to locate by brute force.
You could of course craft in more security checks, such as IP whitelisting, etc., but every time this is done the socket service gets ever more complex, and we take on more of the capabilities expected of the API Gateway. Aside from deploying a cache, we’ve not built much more than a simple service that creates some random strings and caches them, combined with a cache query and a comparison. All the hard security work is delegated to the gateway during the handshake request.
I’ve started to subscribe to the APISecurity.io newsletter. The newsletter includes analysis of recent API-based security breaches along with other useful API-related news. Some of the details of the breaches make for interesting reading and provide some good examples of what not to do. It is rather surprising how regularly good practices fail to be applied, including:
- Checking the payload is valid against the definition,
- Checking the payload size to ensure it is within the expected bounds,
- Using strong typing on the content received, which will help validate the content and limit the chances of poisonous content like injected SQL,
- Ensuring the API has mitigations against the classic OWASP Top 10 – SQL injection, poor authentication implementation, etc.
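The first three checks in the list above amount to very little code, which makes their regular absence all the more surprising. A minimal sketch (the schema, field names, and size bound are invented for illustration; a real API would declare these in JSON Schema or OpenAPI and let the gateway enforce them):

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # reject anything outside expected bounds

# Expected shape of the payload: field name -> required JSON-mapped type.
SCHEMA = {"customer_id": int, "email": str, "opt_in": bool}

def validate_payload(raw: bytes) -> dict:
    """Parse and validate an incoming JSON payload, raising ValueError on
    anything that falls outside the declared definition."""
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds size bound")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"payload is not valid JSON: {exc}") from None
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        raise ValueError("payload fields do not match the definition")
    for field, expected in SCHEMA.items():
        # Exact type match: a string of SQL cannot pass as customer_id.
        if type(data[field]) is not expected:
            raise ValueError(f"field {field!r} has the wrong type")
    return data
```

Because `customer_id` must arrive as a JSON number, a classic injection string such as `"1 OR 1=1"` is rejected before any business logic (or database) ever sees it.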
More broadly, we see that people will recognise the need for penetration testing and look to external organisations to perform it, but when such work is commissioned, what the pen tester will actually do is often not understood by those commissioning the tests (SANS paper on security scoping), so they can’t know whether all the risks are checked. Add to that the temptation to keep such costs down, resulting in the service provider not necessarily probing your APIs to the fullest extent. Not all penetration test services are equal, so simply working to a budget isn’t wise; yes, there is a need for pragmatism, but only when you understand the cost/risk trade-off.
But also remember that application logic, API definitions, and the security controls in place change over time, as does the discovery of new vulnerabilities in the stack you’re using, along with evolving compliance requirements. All of which means that a penetration test at the initial go-live is not enough; testing should be an inherent part of an API’s lifecycle.
When it comes to payload checks, products like Oracle’s API Platform make them easy to implement, or provide out-of-the-box checks for factors such as size limits, so better to use them.
If you ever need to be reminded of why best practices are needed and should be implemented, a mindset of when, not if, a breach will happen will ensure you’re prepared and the teams are motivated to put the good practices in.
You don’t need to be a geek or a security expert to understand what is being said here and, more importantly, reading between the lines as they say, the likely root causes. For me, this all points to cultural challenges, where organisational pressures, or a lack of appreciation by mid-level decision makers, mean the need to invest in non-functional factors such as security, patching, and maintenance is underestimated.
Sadly, Experian aren’t the first with this challenge, and won’t be the last. With DevSecOps etc., the people building the software will understand the issue. But I think we need to be working on educating the business stakeholders on the need to deal with NFRs, and the need to prioritise certain types of issues.
Oracle Cloud is growing and maturing at a tremendous rate if the breadth of PaaS capabilities is any indication. However, there are a few gotchas out there, that can cause some headaches if they get you. These typically relate to processes that impact across different functional areas. A common middleware stack (API CS, SOA CS, OIC etc) will look something like the following:
As the diagram shows, when you build the cloud services, the layers get configured with credentials to the lower layers they need (although Oracle has in the pipeline Oracle-managed versions of many services, where this is probably going to be hidden from us).
I was reading a blog post from the Cloud Security Alliance (here) about the on-going mess and disinformation around Equifax’s security breach.
The article makes a very good point. Sadly, security is seen as just a cost, and whilst people have that mindset we will see decisions being made that favour ‘high share value now’ over the long-term assurance of sensitive data that stops the ‘now value’ nose-diving later.
Even with today’s legislation, in many countries it is a legal obligation to disclose the details of a security breach. The only problem here is that ignorance is bliss: if I don’t know I’m being compromised, then there is nothing to report. The blog post also points out that security investment is often only recognised when a breach occurs, and even then that information doesn’t always propagate within an organisation. This got me thinking: why can’t companies also disclose how many attempts on their security have been mitigated, in the same way companies have to declare profit and loss?
It could produce some interesting information, as you could compare data from different companies of a similar profile. When plotting the data, any outliers would suggest something may be wrong. It would also give consumers a means to decide whether they trust their data with X over Y, when they get a chance to influence the decision. We’re now moving into territory where security becomes a positive measure; if nothing else it may engender an ‘arms race’ of who has the best protection.
As with all things, the way you measure something influences behaviour. This sort of measurement may encourage companies to invest in more ‘white hat’ attacks. That’s no bad thing, as if a white hat attack succeeds, the vulnerability has been found.
The interesting thing is that, as the article points out, Equifax and other large companies that have been breached have been certified as ISO 9001 compliant, PCI DSS compliant, and so on. The issue here is that these accreditations have a strong emphasis on process and policy, and come down to the auditor spotting non-compliance. In a large organisation, the opportunity exists to steer the auditor towards what is good. But more importantly, process requires people to know and follow it. Following process, and being prepared to uphold it, requires an organizational culture that engenders adherence. I can have a rulebook as big as the Encyclopedia Britannica, but if my boss and his boss apply constant pressure to say we have to deliver, and there are no repercussions for bending the rules – well, then I’m going to start bending.
Leaders like Gray understand the value of an organization’s culture. This can be defined as the set of deeply embedded, self-reinforcing behaviors, beliefs, and mind-sets that determine “how we do things around here.” People within an organizational culture share a tacit understanding of the way the world works, their place in it, the informal and formal dimensions of their workplace, and the value of their actions. Though it seems intangible, the culture has a substantial influence on everyday actions and on performance.
This brings us back to the idea that hard data on execution (not just having a process for execution) will give strong indications of compliance. This kind of data is difficult to fudge, and with a good sample set, fudges are more likely to stand out.
Practical? I don’t know, but worth exploring? If we are to change security thinking then yes.
So, an interesting piece of research was published by the Cloud Security Alliance. The research shows the growth of document sharing in the enterprise through the use of cloud services. The interesting thing is that one of the positives of adopting SaaS and PaaS is easing the challenge of ensuring environments are patched for security, but at the same time it highlights the need to educate the wider employee community even more on being security aware.
It also raises the question of managing the accidental or deliberate leakage in such an environment. As the article says, some sharing of documents to the public or 3rd parties to enable cross business collaboration may well be legitimate so businesses are going to need strategies to address this.