We’ve added a new mindmap to our catalogue here. This covers the core of GraphQL. The catalogue contains both the image and a Word representation. The map is built based on a reading of Learning GraphQL by Eve Porcello & Alex Banks on O’Reilly.
When it comes to deployment of API Gateways, there are a couple of well-known patterns, that of the Internal Gateway and External Gateway (described in several resources including here).
These two deployments essentially reflect the considerations of offering endpoints up to less secure network segments such as the internet (external gateways) and trusted network zones (internal gateways). But in addition to the physical deployment within a network, these deployments are likely to host APIs with different characteristics, reflecting levels of trust, and emphasis on enterprise decoupling/abstraction (internal) – the reason why APIs are sometimes associated with the idea of SOA 2.0. Compared with security sensitivity, and potentially monetization or at least usage metrics to help protect specific attack vectors.
These deployment patterns can be seen in the following diagram.
Both the internal and external gateways are reflective of interest in the origin of the API traffic. However a rarer 3rd pattern does exist.
This pattern of crops up when you need to consider the ability to manage how internal solutions connect to outside services, for reasons such as:
When it comes to development, we have had coding standards for almost as long as we have been coding. We tend to look at coding standards for purposes of helping to promote good quality code and reduce the likelihood of bugs and so on. But they also help with readability, making it easy to navigate a code base and so on. This is sufficiently important that there is a vast choice of tools to help us ensure we align with good practices.
That readability etc, when it comes to code interfaces lends to making it easier to use an interface as it promotes consistency and as Don Norman would say avoids ‘cognitive load‘, in other words, the effort involved in performing actions with the interface. Any Java Developer will tell you, want to print out an object (any object) you get a string representation using the .toString() method and direct it using the io packages.
That consistency and predictability are important not just for code if you look at any API best practises documents you’ll encounter directly or indirectly the need to use conventions that drive consistency – use of singular or plural for the name of entities, application of case – camel case, snake case etc. Good naming etc and we’ll see related things appear together in the documentation. Products such as Apiary and SwaggerHub include tooling to help police this in our API design work.
But what about policies that we use to define how an API Gateway handles the receipt and routing of API invocations? Well yes, we should have standards here as well. Some might say, governance gone mad. But gateways are often shared services, so making it easy to see and logically group APIs together at very least by using a good naming convention will help as a minimum. If API management is being administered in a more DevOps fashion, then information security professionals will probably want assurance that developers are applying policies in a recommended manner.
The API Platform when you configure IDCS to provide the option to authenticate users against a corporate Identity Provider such as Active Directory will automatically update the Management Portal Login screen accordingly. However today it doesn’t automatically update the Developer Portal login page. Whilst perhaps an oversight, it is very easy to fix manually when you know how. As result you can have a login that looks like:
The rest of this blog will show what’s needed to fix the problem.
There are circumstances in which notifications from the Oracle API Platform CS could be seen as desirable. For example, if you wish to ensure that the developers are defining good APIs and not accidentally implementing APIs that hit the OWASP Top 10 for APIs. Then you will probably configure things such that developer users can design the APIs, configure the policies, but only request an API to be deployed.
However, presently notifications through mechanisms such as email or via collaboration platforms such as Slack aren’t available. But implementing a solution isn’t difficult. For the rest of this blog we’ll explore how this might be implemented, complete with a Slack implementation.
At the time of writing the Oracle API Platform doesn’t support the use of Socket connections for handling API data flows. Whilst the API Platform does provide an SDK as we’ve described in other blogs and our book it doesn’t allow the extension of how connectivity is managed.
The use of API Gateways and socket-based connectivity is something that can engender a fair bit of debate – on the one hand, when a client is handling a large volume of data, or expects data updates, but doesn’t want to poll or utilize webhooks then a socket strategy will make sense. Think of an app wanting to listen to a Kafka topic. Conversely, API gateways are meant to be relatively lightweight components and not intended to deal with a single call to result in massive latency as the back-end produces or waits to forward on data as this is very resource-intensive and inefficient. However, a socket-based data transmission should be subject to the same kinds of security controls, and home brewing security solutions from scratch are generally not the best idea as you become responsible for the continual re-verification of the code being secure and handling dependency patching and mitigating vulnerabilities in other areas.
So how can we solve this?
As a general rule of thumb, web sockets are our least preferred way of driving connectivity, aside from the resource demand, it is a fairly fragile approach as connections are subject to the vagaries of network connections, which can drop etc. It can be difficult to manage state (i.e. knowing what data has or hasn’t reached the socket consumer). But sometimes, it just is the right answer. Therefore we have developed the following pattern as the following diagram illustrates.
How it works …
The client initiates things by contacting the gateway to request a socket, with the details of the data wanted to flow through the socket. This can then be validated as both a legitimate request or (API Tokens, OAuth etc) and that the requester can have the data wanted via analyzing the request metadata.
The gateway works in conjunction with a service component and will if approved acquire a URI from the socket manager component. This component will provide a URL for the client to use for the socket request. The URL is a randomly generated string. This means that port scans of the exposed web service are going to be difficult. These URLs are handled in a cache, which ideally has a TTL (Time To Live). By using Something like Redis with its native TTL capabilities means that we can expire the URL if not used.
With the provided URL we could further harden the security by associating with it a second token.
Having received the response by the client, it can then establish the socket-based connection which gets routed around the API Gateway to the Socket component. This then takes the randomly-generated part of the URL and looks up the value in the cache, if it exists in the cache and the secondary token matches then the request for the socket is legitimate. With the socket connection having been accepted the logic that will feed the socket can commence execution.
If the request is some form of malicious intent such as a scan, probe or brute force attempt to call the URL then the attempt should fail because …
- If the socket URL has never existed in or has been expired from the Cache and the request is rejected.
- If a genuine URL is obtained, then the secondary key must correctly verify. If incorrect again the request is rejected.
- Ironically, any malicious attack seeking to overload components is most likely to affect the cache and if this fails, then a brute access tempt gets harder as the persistence of all keys will be lost i.e. nothing to try brute force locate.
You could of course craft in more security checks such as IP whitelisting etc, but every-time this is done the socket service gets ever more complex, and we take on more of the capabilities expected from the API Gateway and aside from deploying a cache, we’ve not built much more than a simple service that creates some random strings and caches them, combined with a cache query and a comparison. All the hard security work is delegated to the gateway during the handshake request.
The Oracle API Platform adopted an intelligent pricing model by basing costs on API call volumes and Logical gateway node groupings per hour. In our book about the API Platform (more here). We suggested that a good logical grouping would be to reflect the development, test, pre-production and production model. This makes it nice and easy to use gateway based routing to different environments without needing to change the API policy configuration as you promote your solution through environments.
We have also leveraged naming and Role/Group Based Access Controls to make it easy to operate the API Platform as a shared service, rather than each team having its own complete instance. In doing so the number of logical gateways needed is limited (I.e. not logical gateway divisions on per team models needed). Group management is very easy through the leveraging of Oracle’s Identity Cloud Service – which is free for managing users on the Oracle solutions, and also happens to a respected product in its own right.
Most organisations are not conducting development and testing 365 days a year, for 24 hours a day (yes in an ideal world prolonged soak and load tests would be run to help tease out cumulative issues such as memory leaks, but even then it isn’t perpetual). As a result, it would be ideal to not be using logical gateways for part of the day such as outside the typical development day, and weekends.
Whilst out of the out of hours traffic may drop to zero calls and we may even shutdown the gateway nodes, this alone doesn’t effectively reduce the number of logical gateways as the logical gateway aspect of the platform counts as soon as you create the logical group in the management portal. This in itself isn’t a problem as the API Platform drinks it’s own Champagne as the saying goes, and everything in the UI is actually available as a published REST endpoint. Something covered in the book, and in previous blog posts (for example Making Scripts Work with IDCS Deployed PaaS and Analytics and Stats for APIs). Rather than providing all the code, you can see pretty much all the calls necessary in the other utilities published.
Before defining the steps, there are a couple of things to consider. Firstly, the version of the API deployed to a specific logical gateway may not necessarily be the latest version (iteration) and when to delete the logical gateway this information is lost, so before deleting the logical gateway we should record this information to allow us to reinstate the logical gateway later.
As deleting logical gateways will remove the gateway from the system, when recreating the gateway we can use the same name, but the gateway is not guaranteed to get the same Id as before, as a result, we should when rebuilding always discover the Id from the name to be safe.
A logical gateway can not be deleted until all the physical nodes are reallocated, so we need to iterate through the nodes removing them. When it comes to reconnecting the nodes, this is a little more tricky as reconnecting the gateway appears to only be achievable with information known to the gateway node. Therefore the simplest thing is when bringing the node back online we take the information from the gateway-props.json file and run a script that determines whether the management tier knows about the node. If not then just re-run the create, start, join cycle., otherwise just run the start command.
As with the logical gateway, re-running the create, deploy, start cycle will result in the node having a new Id. This does mean that whilst the logical gateway name and even the node names will remain the same, the analytics data is likely to become unavailable, so you may wish to extract the analytics data. But then, for development and test, this data is unlikely to provide much long term value.
So based on this, our sequence for releasing the logical gateway needs to be…
- Capture the deployed APIs and the iteration numbers,
- Ideally shutdown the gateway node process itself,
- Delete all the gateway nodes from the logical gateway,
- Delete the logical gateway,
Recover would then be …
- Construct the logical gateway,
- Redeploy the APIs with the correct iteration numbers to the logical gateway using the recorded information- if no nodes are connected at this stage, the UI will provide a warning
- As gateway nodes, come back online, determine if it is necessary to execute the create, start, join or just start
Of course, these processes can be all linked to scheduling such as a cron job and/or server startup and shutdown processes.
It’s been a quiet month for this blog, but I’ve been pretty busy with a raft of other activities…
- a recent article on our sister site – oracle-integration.cloud on RPA.
- I also appear in an interview with K21 Academy here.
- Reviewing a new book on Enterprise API Management for Packt which we would very highly recommend if you want to understand the more Enterprise perspectives of adopting APIs, particularly if you’re considering APIs as a potential new revenue stream.
- UK Oracle User Group committees for TechFest (having been reviewing the paper submissions it looks like it’s going to be an excellent conference in December) and Southern Summit (next week).
- Just launched a number of sessions for the Oracle London Developer Meetup, with another to be announced soon (Blockchain) and potentially two more before the end of the year (we’re working on the speakers now).
I’ve started to subscribe to the APISecurity.io newsletter. The newsletter includes the analysis of recent API based security breaches along with other useful API related news. Some of the details of the breaches make for interesting reading and provide some good examples of what not to do. It is rather surprising how regularly the lack of the application of good practises is, including:
- Checking the payload is valid to the definition,
- Checking the payload size to ensure it is in the expected bounds,
- Use strong typing on the content received it will help validate the content and limit the chances of poisonous content like injected SQL,
- Ensuring the API has mitigation’s against the classic OWASP Top 10 – SQL Injection, poor authentication implementation.
More broadly, we see that people will recognise the need for applying penetration testing, and look to external organisations to perform the testing, when such work is commissioned the understanding of what the pen tester does is not understood by those commissioning the tests (SANS paper of security scoping), therefore know whether all the risks are checked. When you add to that, the temptation to keep such costs down resulting in the service provider not necessarily probing your APIs to the fullest extent. Not all penetration test services are equal, so simply working to a budget isn’t wise, yes there is a need for pragmatism, but only when you understand the cost/risk trade-off.
But also remember application logic and API definitions and the security controls in place change over time as do the discovery of new vulnerabilities on the stack you’re using, along with evolving compliance requirements. All meaning that a penetration test at the initial go-live is not enough and should be an inherent part of an APIs lifecycle.
When it comes to payload checks etc, products like Oracle’s API Platform make it easy to realise or provide out of the box checks for factors such as size limits, implementing payload checks, so better to use them.
If you ever need to be reminded that of why best practises are needed and should be implemented; a mindset of when not if a breach will happen will ensure you’re prepared and the teams are motivated to put the good practises in.