We have been developing some advanced custom API policies for a client, and in the process picked up a few insights that didn’t even make it into the API book. One of these policies optimizes API calls through caching. The rest of this post covers the tricks we applied to link an API Gateway to a caching mechanism, and why.

Before I go into the details, I’d like to thank the Oracle product management team, and particularly Glenn Mi at Oracle, for their support in getting through the deeper, undocumented capabilities of the API Platform SDK.

Caching Options

Caching comes in many forms and is motivated by varying reasons, which do not always call for the same behaviours. When getting into the subject of caching, it is surprising how polarised people’s viewpoints can be about which cache strategies are correct. The following diagram illustrates the diversity of caches that could appear in an end-to-end solution.

Bringing together a caching technology in the Reverse Proxy model and an API Gateway makes a lot of sense. Data being provided to API consumers needs to be protected whether it comes from a cache or an active back-end system. At the same time you also want to exploit an API Gateway to provide analytics on API traffic, so any caching needs to sit behind the gateway. But if the cache is in front of the application layer, we can still reduce the application workload.

When it comes to a caching technology to partner with the gateway, there are a number of options available, from Coherence to Ehcache, Memcached and Redis. We have avoided Coherence: whilst the gateway currently runs on a WebLogic server, we don’t want to unduly distort the performance profile and configuration of the gateway by forcing a cache onto that server. In addition, as Coherence is a licensed addition to WebLogic, it raises difficult questions about licensing when deploying gateways (gateways are licensed based on logical groupings and API volumes, whereas Coherence is licensed by OCPU). We also know that Oracle is moving towards having a micro-gateway, which may mean we see the gateway engine moved onto something like Helidon (but this last point is my speculation).

We have elected to use Redis for several reasons:

  • Available as a PaaS service with several cloud providers (AWS & Azure), so there is no setup or management effort, but it can also be deployed on-premises,
  • Out of the box, cached entities can have a time to live (TTL), rather than us needing to implement separate processes to expire cached values,
  • The ability to make it scale through clustering,
  • Cost

This caching model also lets us optionally allow application development teams to push results directly into the cache. So rather than waiting on the TTL, the cache can be refreshed directly or even primed, without having to create fake requests to prime it.

Custom Policy

The API Platform custom policy needs to be able to do several things. Firstly, on the request flow, it must determine whether there might be a cached value and then try to retrieve it. If a cached value can be retrieved, we want to short-circuit the normal path so that the back-end implementation of the API isn’t called and the cached data is used to construct the response (along with additional header information showing the original date and time of the data creation).

This is where Glenn came to our rescue. We had been told it is possible to force the response to happen. However, how to do this is currently undocumented (neither in Oracle’s SDK documentation nor in our book on API Platform). But more on this in a moment.
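Leaving the forced response itself aside, the request-side lookup can be sketched roughly as follows. This is only an illustration under stated assumptions, not the policy we built: the class and method names are hypothetical, an in-memory map stands in for the external cache, only GET requests are considered, and the cache key is assumed to be the method plus the path and query string.

```java
import java.time.Instant;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Request-side sketch: find a usable cached entry, or signal that the back end must be called. */
final class CacheLookupSketch {

    /** Hypothetical cached entry: the payload, when it was created, and when it expires. */
    static final class Entry {
        final String body;
        final Instant createdAt;
        final Instant expiresAt;

        Entry(String body, Instant createdAt, long ttlSeconds) {
            this.body = body;
            this.createdAt = createdAt;
            this.expiresAt = createdAt.plusSeconds(ttlSeconds);
        }
    }

    // In-memory stand-in for the external cache; for illustration only.
    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    /** Assumed key scheme: upper-cased HTTP method plus path and query string. */
    static String cacheKey(String method, String pathAndQuery) {
        return method.toUpperCase(Locale.ROOT) + ":" + pathAndQuery;
    }

    /** Stores (or primes) an entry with a time to live in seconds. */
    void put(String key, String body, Instant now, long ttlSeconds) {
        store.put(key, new Entry(body, now, ttlSeconds));
    }

    /** Returns the cached entry if one is present and unexpired; empty means "call the back end". */
    Optional<Entry> lookup(String method, String pathAndQuery, Instant now) {
        if (!"GET".equalsIgnoreCase(method)) {
            return Optional.empty(); // assumption: only GET responses are cached
        }
        String key = cacheKey(method, pathAndQuery);
        Entry entry = store.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (!now.isBefore(entry.expiresAt)) {
            store.remove(key); // expired; a cache with native TTL support would do this for us
            return Optional.empty();
        }
        return Optional.of(entry);
    }
}
```

When lookup returns an entry, the policy would copy entry.body into the response and expose entry.createdAt in a response header before short-circuiting; an empty result lets the request continue to the back end as normal.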

On the response side, the rules about what can and can’t be cached need to be implemented. This includes interpreting header values that may describe caching behaviour. Then, if the response is cacheable, the payload needs to be stored in the cache with an appropriate TTL (either a default or the TTL taken from the caching rules in the headers). This meant taking the time to work through the Internet Engineering Task Force (IETF) RFCs on the subject (such as RFC 7234) to ensure the behaviour is correct.
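To give a feel for the response-side decision, here is a deliberately simplified sketch of turning a Cache-Control header value into a TTL. It is not a complete RFC 7234 implementation and not our policy code: the class name is hypothetical, it ignores Expires, Vary and request directives, and it makes the conservative assumption that no-store, no-cache and private all mean "do not store" for a shared cache.

```java
import java.util.Locale;
import java.util.OptionalLong;

/** Simplified sketch: derive a shared-cache TTL from a Cache-Control response header. */
final class CacheControlTtl {
    private CacheControlTtl() {}

    /**
     * Returns the TTL in seconds, or empty if the response should not be stored.
     * A missing header, or one without a max-age directive, falls back to defaultTtlSeconds.
     */
    static OptionalLong ttlSeconds(String cacheControl, long defaultTtlSeconds) {
        if (cacheControl == null || cacheControl.trim().isEmpty()) {
            return OptionalLong.of(defaultTtlSeconds);
        }
        Long maxAge = null;
        Long sMaxAge = null;
        for (String raw : cacheControl.split(",")) {
            // Directive names are case-insensitive, so normalise before comparing.
            String directive = raw.trim().toLowerCase(Locale.ROOT);
            if (directive.equals("no-store")
                    || directive.equals("no-cache") || directive.startsWith("no-cache=")
                    || directive.equals("private") || directive.startsWith("private=")) {
                return OptionalLong.empty(); // conservative: never store these
            }
            if (directive.startsWith("s-maxage=")) {
                sMaxAge = parseSeconds(directive.substring("s-maxage=".length()));
            } else if (directive.startsWith("max-age=")) {
                maxAge = parseSeconds(directive.substring("max-age=".length()));
            }
        }
        // s-maxage takes precedence over max-age for a shared cache.
        Long ttl = (sMaxAge != null) ? sMaxAge : maxAge;
        if (ttl == null) {
            return OptionalLong.of(defaultTtlSeconds);
        }
        return ttl > 0 ? OptionalLong.of(ttl) : OptionalLong.empty();
    }

    /** Parses delta-seconds; a malformed value is treated as 0, i.e. already stale. */
    private static Long parseSeconds(String value) {
        try {
            return Long.parseLong(value.replace("\"", "").trim());
        } catch (NumberFormatException e) {
            return 0L;
        }
    }
}
```

Treating a malformed or zero max-age as "do not store" errs on the side of calling the back end rather than serving data of unknown freshness.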

Force Response

Within the SDK is a hidden method called forceResponse() which, whilst not exposed by the Java interface definitions, can be accessed as follows:

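// Cast the runtime context back to its concrete implementation to reach the hidden method.
oracle.apiplatform.policies.sdk.ocsg.context.ApiRuntimeContextAdaptor ctx =
    (oracle.apiplatform.policies.sdk.ocsg.context.ApiRuntimeContextAdaptor) apiRuntimeContext;

// Force the response so the back-end implementation is not called.
ctx.mContext.forceResponse();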

It is something of a fudge, as we’re casting an object back to a specific implementation of the context object. But we have raised a request with Oracle to expose this operation through the interfaces.

To make this work, we need to bring an additional library into the SDK dependencies. Specifically, we need oracle.sdp.daf.jar, which can be found inside GATEWAY_HOME/ocsg/applications/daf.war in the gateway deployment.

With this we can build the policy and get it working.

Checking the Header for Cache-Control and Pragma

One of the challenges that the IETF’s RFCs (such as RFC 7234) present is that header names have no requirements in terms of case standardization, so CACHE-CONTROL is just as valid as Cache-Control or cache-control. As the headers are only provided as a Map, we’ve implemented our own operations to search the headers case-insensitively for the relevant values.
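A minimal sketch of such a case-insensitive lookup is shown below. The class and method names are hypothetical, and the headers are assumed to arrive as a Map<String, String>; the actual map type handed over by the SDK may differ.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch of a case-insensitive header lookup over a plain Map. */
final class HeaderLookup {
    private HeaderLookup() {}

    /** Returns the value of the first header whose name matches, ignoring case. */
    static Optional<String> find(Map<String, String> headers, String name) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            String key = entry.getKey();
            if (key != null && key.equalsIgnoreCase(name)) {
                return Optional.ofNullable(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

A linear scan is fine for the handful of headers on a typical request. An alternative would be to copy the headers into a TreeMap built with String.CASE_INSENSITIVE_ORDER, which trades a one-off copy for direct lookups.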

Connecting to Redis

To connect with Redis, we’ve taken advantage of the Jedis library, which has made the whole process straightforward: it provides a Java skin on the Redis query notation, creates a connection or connection pool to use, and allows the connections to be tuned if necessary. As the GitHub page says, it’s there to make it easy to use Redis and to be small (something we want for our use case).

Development & Testing Caching

Redis had an extra bonus to help the development process: Redis Labs (the producers of Redis) provide a small-footprint, free-tier service running on AWS infrastructure. This has meant that during development of the core policy logic we have been able to work pretty much in a skunkworks manner, with no need to spin up or configure any infrastructure or virtual networks, or to ensure the security lock-down isn’t preventing traffic to the cache; all things that you would expect a secured environment (cloud or on-premises) to include. That work can wait until we’re happy with the behavior and understand the network and infrastructure configuration requirements well enough to automate the setup into the proper environments using ARM (Azure), CloudFormation (AWS), Terraform or Ansible.