Handling Socket connectivity with API Gateway

Tags

API, event streams, gateway, OAuth, Oracle, Security, socket

At the time of writing the Oracle API Platform doesn’t support the use of Socket connections for handling API data flows. Whilst the API Platform does provide an SDK as we’ve described in other blogs and our book it doesn’t allow the extension of how connectivity is managed.

The use of API Gateways and socket-based connectivity is something that can engender a fair bit of debate – on the one hand, when a client is handling a large volume of data, or expects data updates, but doesn’t want to poll or utilize webhooks then a socket strategy will make sense. Think of an app wanting to listen to a Kafka topic. Conversely, API gateways are meant to be relatively lightweight components and not intended to deal with a single call to result in massive latency as the back-end produces or waits to forward on data as this is very resource-intensive and inefficient. However, a socket-based data transmission should be subject to the same kinds of security controls, and home brewing security solutions from scratch are generally not the best idea as you become responsible for the continual re-verification of the code being secure and handling dependency patching and mitigating vulnerabilities in other areas.

So how can we solve this?

As a general rule of thumb, web sockets are our least preferred way of driving connectivity, aside from the resource demand, it is a fairly fragile approach as connections are subject to the vagaries of network connections, which can drop etc. It can be difficult to manage state (i.e. knowing what data has or hasn’t reached the socket consumer). But sometimes, it just is the right answer. Therefore we have developed the following pattern as the following diagram illustrates.

API Protected Sockets

How it works …

The client initiates things by contacting the gateway to request a socket, with the details of the data wanted to flow through the socket. This can then be validated as both a legitimate request or (API Tokens, OAuth etc) and that the requester can have the data wanted via analyzing the request metadata.

The gateway works in conjunction with a service component and will if approved acquire a URI from the socket manager component. This component will provide a URL for the client to use for the socket request. The URL is a randomly generated string. This means that port scans of the exposed web service are going to be difficult. These URLs are handled in a cache, which ideally has a TTL (Time To Live). By using Something like Redis with its native TTL capabilities means that we can expire the URL if not used.

With the provided URL we could further harden the security by associating with it a second token.

Having received the response by the client, it can then establish the socket-based connection which gets routed around the API Gateway to the Socket component. This then takes the randomly-generated part of the URL and looks up the value in the cache, if it exists in the cache and the secondary token matches then the request for the socket is legitimate. With the socket connection having been accepted the logic that will feed the socket can commence execution.

If the request is some form of malicious intent such as a scan, probe or brute force attempt to call the URL then the attempt should fail because …

If the socket URL has never existed in or has been expired from the Cache and the request is rejected.
If a genuine URL is obtained, then the secondary key must correctly verify. If incorrect again the request is rejected.
Ironically, any malicious attack seeking to overload components is most likely to affect the cache and if this fails, then a brute access tempt gets harder as the persistence of all keys will be lost i.e. nothing to try brute force locate.

You could of course craft in more security checks such as IP whitelisting etc, but every-time this is done the socket service gets ever more complex, and we take on more of the capabilities expected from the API Gateway and aside from deploying a cache, we’ve not built much more than a simple service that creates some random strings and caches them, combined with a cache query and a comparison. All the hard security work is delegated to the gateway during the handshake request.

Thanks to James Neate and Adrian Lowe for kicking around the requirement and arriving at this approach with us.