Defining Boundaries for Logical Gateways on the API Platform in a multi cloud / multi region context

The Oracle API Platform takes a different approach to licensing from many platforms: rather than being based on CPU, it works on Logical Gateways and blocks of 25 million successful API calls per month. This means you can have as many actual gateway nodes as you like within a logical group to ensure resilience. How widely you deploy the gateways is essentially more of a maintenance consideration (i.e. more nodes means more gateways to take through a maintenance process, from the OS through to the gateway itself).

In our book we described the use of logical gateways (groups of gateway nodes operating together) based on the classic development model, which provides a solid foundation and can leverage the gateway based routing policy very effectively.

But things get a little trickier if you move into the cloud and elect to distribute the back-end services geographically, rather than having a single global instance of the back-end implementation and leveraging technologies such as Content Delivery Networks (CDNs), with their caching of data at the cloud edge and rapid routing capabilities, to offset performance factors.

Classic Global split of geographies

Some of the typical reasons for geographically distributing solutions are …

A low hit rate on data, meaning caching solutions like CDNs are unlikely to yield the performance benefits wanted, and considerable additional work is needed to ‘warm’ the cache,
Different regions require different back-end implementations: ordering of products in one part of the world may be fulfilled using a partner, but in another it is directly satisfied,
Data is subject to residency/sovereignty rules: consider China, for example, but Germany and India also have special considerations.

So our Global splits start to look like:

Global Split now adding extra divisions for India, China, Russia etc

The challenge that comes is this: on the Internet side of things, regional routing can be resolved through Geo Routing, such as the facilities provided by AWS Route53 and Oracle’s Dyn DNS, so that a client finds the nearest local gateway. However, Geo DNS may not be achievable internally (certainly not for AWS). As a result, routing to the nearest local back-end needs to be handled by the gateway. Gateway based routing can solve the problem based on logical gateways, so if we logically group gateways regionally then that works. But this then conflicts with the use of gateway based routing for separation of Development, Test etc.

Routing Options

So, what are the options? Here are a few …

Make your logical divisions both by environment and by region. This is fine if you’re processing very high volumes, i.e. hundreds of millions of calls or more, so that the cost of the additional Logical Gateways is relatively small in the total budget.

Taking the geo split and applying the traditional layers as well has increased the number of Logical gateways

This problem can be further exacerbated if you consider that many larger organisations are likely to end up with different cloud vendors in the same part of the world, for example AWS and Azure, or Oracle and Google. So continuing the segmentation can become an expensive challenge, as the following view helps show:

It is possible to contract things slightly by only having development and test cloud services wherever your core development centre is based. Note that in the previous and next diagrams we’ve removed the region/country-specific gateway drivers.

Don’t segment based on environment, but only on the region – but then how do you control changes in the API configuration so they don’t propagate immediately into production?
Keep the existing model but clone APIs for each region. Certainly the tooling we’ve shared (Managing API Policy Versioning in Oracle API Platform) makes this possible, but it’s pretty inelegant and error-prone, as it would be easy to forget to clone a change, and the cloning logic needs to be extended to take into account the bits that must be region-specific.
Assuming you have a DNS address for the target, you could effectively rewrite the resolution of the address by changing its meaning in each gateway node’s hosts file. Inelegant, but effective if you have automated deployment and configuration of your gateway servers.
Header based routing with the region and environment as header attributes. This does require either the client to set the values (not good as you’re revealing to your API consumer traits of the implementation), or you apply custom policies before the header-based routing that insert those attributes based on the gateway’s location etc.
Build a new type of gateway based routing which allows both the environment (dev, test etc) and the location (region) to inform the routing.

Or, and this is the point of this blog, use gateway based routing combined with some intelligent DNS naming, an understanding of how the API Platform works, and a little bit of Groovy or a custom Java policy.

Enhancing Gateway Based Routing

Let’s start with the DNS aspect, as this is what we want to leverage. Establish the back-end implementations with a consistent naming convention, for example <environment e.g. dev | test | prod>.<region name>.myService.myDomain.com/entity/. With this convention, if you’re using AWS Ireland (eu-west-1) then for a Development gateway the DNS would be dev.eu-west-1.myService.myDomain.com/entity/, and the same pattern applies with the region name for Oracle’s Phoenix region.
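To make the convention concrete, here is a minimal Java sketch of how a provisioning or test utility might assemble such an address. The class and method names (BackendAddress, forEntity) are hypothetical and exist only for illustration; the environment, region and domain values are the examples used in this post.

```java
import java.util.Locale;

/** Hypothetical helper, for illustration only: builds addresses that follow the naming convention. */
final class BackendAddress {

    private BackendAddress() { }

    /** Returns <environment>.<region>.<domain>/<entity>/ with environment and region in lower case. */
    static String forEntity(String environment, String region, String domain, String entity) {
        return String.format("%s.%s.%s/%s/",
                environment.toLowerCase(Locale.ROOT),
                region.toLowerCase(Locale.ROOT),
                domain,
                entity);
    }

    public static void main(String[] args) {
        // prints dev.eu-west-1.myService.myDomain.com/entity/
        System.out.println(forEntity("dev", "eu-west-1", "myService.myDomain.com", "entity"));
    }
}
```

Keeping the environment and region as separate inputs means the same function can describe every deployment, which is what allows the gateway to swap just the region part later.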

Assuming the consistent DNS addressing is applied to the back-end API implementation deployments then it is possible to formulate an algorithm to tweak the gateway based addressing to be region-specific. But we’ll come back to that in a moment.

The last pay-off of using a good DNS naming convention is that, if you deploy servers with Terraform, CloudFormation, Oracle’s Cloud Stack Manager, or an equivalent feature, you can build on the naming to give each server a meaningful hostname and system-level environment variable. For example, take the DNS address (after all, in the script you’ll have worked this out), add an incrementing number or logical server role, e.g. 001.dev.eu-west-1.myService.myDomain.com, and use this for the hostname. Whilst it may appear a little clunky, it is helpful operationally: if you need to confirm which server you’re currently connected to, the hostname command alone gives all the information necessary, i.e. whether it’s a production machine, what job the server performs etc.
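As an illustration of how much a name like 001.dev.eu-west-1.myService.myDomain.com carries, the following Java sketch splits it back into its parts. ServerName and its fields are hypothetical names, and the sketch assumes the first three dot-separated labels are always the server index (or role), the environment and the region, in that order, with "prod" as the production environment label.

```java
/** Hypothetical value class, for illustration only: the parts encoded in a server's hostname. */
final class ServerName {
    final String index;        // incrementing number or logical role, e.g. "001"
    final String environment;  // e.g. "dev", "test", "prod"
    final String region;       // e.g. "eu-west-1"
    final String domain;       // everything after the region label

    private ServerName(String index, String environment, String region, String domain) {
        this.index = index;
        this.environment = environment;
        this.region = region;
        this.domain = domain;
    }

    /** Parses <index>.<environment>.<region>.<domain>; rejects names with fewer than four parts. */
    static ServerName parse(String hostname) {
        // Limit of 4 keeps the remaining domain (which itself contains dots) in one piece.
        String[] parts = hostname.split("\\.", 4);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Hostname does not follow the convention: " + hostname);
        }
        return new ServerName(parts[0], parts[1], parts[2], parts[3]);
    }

    /** Assumes the production environment label is "prod". */
    boolean isProduction() {
        return "prod".equalsIgnoreCase(environment);
    }

    public static void main(String[] args) {
        ServerName name = parse("001.dev.eu-west-1.myService.myDomain.com");
        System.out.println(name.environment + " / " + name.region + " / production=" + name.isProduction());
    }
}
```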

To help our policy logic, which we’ll see in a moment, we need to know about our server’s identity using the DNS/server naming. Given that we’ve just highlighted the value of using the naming to set the hostname, why not just use that to get the information? Well, this all comes down to security privileges for Java and the signing of the gateway jars. If you try to run a Groovy policy that looks up the hostname, for example with InetAddress.getLocalHost().getHostName(), there is a strong chance it will fail because of environment security. Whilst it is possible to alter permissions or add your own jar signing to the gateway, it is a fair bit of extra effort and would be invasive of the gateway deployment. Typically host security is more forgiving of environment variable access, as this is a common approach to configuring apps and you need to know the environment variable name to read it. If we have a script to set the hostname, it isn’t difficult to also create a system-wide environment variable that can be accessed by Groovy (remember the variable must have a scope that means it is seen by the shell session that runs the gateway, and it must be configured before the gateway is started).
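A small Java sketch of the environment variable lookup, written so the missing-variable case is explicit. RegionLookup is a hypothetical name, and REGION is the variable name used by the Groovy policy later in this post. The lookup takes the environment as a map (System.getenv() returns one) purely so that it can be exercised without changing the real environment.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, for illustration only: reads the region from an environment map. */
final class RegionLookup {
    static final String VARIABLE = "REGION";

    private RegionLookup() { }

    /** Returns the trimmed region, or empty when the variable is missing or blank. */
    static Optional<String> region(Map<String, String> env) {
        String value = env.get(VARIABLE);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        // System.getenv() supplies the variables visible to the process that started the JVM.
        System.out.println(region(System.getenv()).orElse("REGION is not set for this process"));
    }
}
```

Running the main method from the same shell session that starts the gateway is a quick way to check that the variable has the scope described above.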

So let’s look at how the policies work. When an API call reaches the gateway, it is represented by an object that records the request and the response to provide, along with constructs for the service (back-end) request; again, the book goes into more depth on this. Importantly, the service is not invoked until all the policies are complete. This means we can configure the gateway based routing to have destinations such as

dev.xxx.myService.myDomain.com/entity and test.xxx.myService.myDomain.com/entity and so on. Then, in a Groovy policy or a custom Java policy positioned after the gateway based routing policy, we can replace xxx with the appropriate value taken from our environment variable. So the Groovy policy may look something like:

// read the region from the environment variable set when the server was provisioned
String region = System.getenv("REGION")

// if the variable is missing, leave the routing untouched rather than fail on a null reference
if (region != null) {

    // set an HTTP header value so we can see the region retrieved from the environment; this is purely for making traceability easier
    context.getServiceRequest().setHeader("region", region)

    // determine if we're sure this isn't an on-prem server; with on-prem we're probably working with a single location
    if (!region.toLowerCase().contains("on-prem")) {

        // get the currently defined target address as set by the gateway based routing policy
        def requestURL = context.getServiceRequest().getRequestURL()
        context.getServiceRequest().setHeader("requestURL", requestURL)

        // now tweak the URL as described, replacing xxx with the region information derived from the environment variable
        String targetURL = requestURL.replace("xxx", region)
        context.getServiceRequest().setHeader("targetURL", targetURL)
        context.getServiceRequest().setRequestURL(targetURL)
    }
}

This Groovy policy modifies the gateway’s record of the address to call for the back-end, which is then used once all the policies have been processed.
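If the same logic were moved into a custom Java policy, the substitution itself could be isolated in a plain function so that it can be unit tested without a gateway. The sketch below deliberately does not use the API Platform SDK; RegionRewriter and rewrite are hypothetical names. It also makes one design choice that differs from the Groovy version: it only replaces xxx when it appears as a complete host label (between two dots), so an xxx elsewhere in the path would be left alone.

```java
import java.util.Locale;

/** Hypothetical helper, for illustration only: swaps the xxx placeholder label for the region. */
final class RegionRewriter {
    static final String PLACEHOLDER = "xxx";

    private RegionRewriter() { }

    /**
     * Returns the address with ".xxx." replaced by ".<region>.".
     * The address is returned unchanged when the region is unknown or marks an on-prem server.
     */
    static String rewrite(String requestUrl, String region) {
        if (region == null || region.toLowerCase(Locale.ROOT).contains("on-prem")) {
            return requestUrl;
        }
        return requestUrl.replace("." + PLACEHOLDER + ".", "." + region + ".");
    }

    public static void main(String[] args) {
        // prints https://dev.eu-west-1.myService.myDomain.com/entity
        System.out.println(rewrite("https://dev.xxx.myService.myDomain.com/entity", "eu-west-1"));
    }
}
```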

So what will happen if the back-end address resolves to something that doesn’t exist? This is the kind of scenario where the back-end implementation does not exist in a region, or perhaps even should not exist. This case is actually very simple: the gateway will fail to call the back-end and return a service unavailable error, the same behaviour you would expect to see if the back-end system wasn’t alive. So this doesn’t give away anything about the environment implementing the back-end service.

Groovy or Java Custom Policy?

Should the mechanism be implemented as a Groovy Policy or Custom Java Policy? There are two factors to consider here, specifically:

Will the logic be re-used in other API policies? If the answer is yes, then implement it as a Java policy using the SDK. Allocating a custom policy rather than cutting and pasting the Groovy script will be more reliable, and any changes will automatically ripple through to all the APIs using the technique. With a Groovy approach, any change will need to be cut and pasted to each location it’s used.
Is execution speed absolutely critical? If so, the Java custom policy has a factory framework which is run for each policy during startup, so work such as getting the values of the environment variables can be done once during startup and held in memory, rather than being repeated for each API execution. The fact that a policy is written in Groovy is, in itself, less of an issue, as the gateway compiles the Groovy code to bytecode during policy deployment.
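The "read once at startup" idea can be sketched in plain Java as follows. This is not the custom policy SDK's factory API; CachedRegion is a hypothetical class that only shows the pattern of resolving the environment variable in a constructor (the startup step) and serving every later call from memory.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical class, for illustration only: resolves the region once and keeps it in memory. */
final class CachedRegion {
    private final String region;

    /** The supplier is consulted exactly once, when the object is constructed. */
    CachedRegion(Supplier<Map<String, String>> environment) {
        this.region = environment.get().get("REGION");
    }

    /** Returns the value captured at construction; may be null if REGION was not set. */
    String region() {
        return region;
    }

    public static void main(String[] args) {
        // System::getenv stands in for whatever the startup step has access to.
        CachedRegion cached = new CachedRegion(System::getenv);
        System.out.println("Region captured at startup: " + cached.region());
    }
}
```

One consequence of this pattern is that a change to the environment variable is only picked up when the object is rebuilt, which in the gateway's case would mean a restart or redeployment.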