Different aspects of APIs and microservices from API Design to utilities to work with API Platform and microservices from mindmasps on design to paradigm concepts
A while back, I was invited to contribute to Devmio (the knowledge portal driven by the publishers involved with the JAX London and other events). After a little bit of delay from my end, I offered an article that they decided was sufficient to be incorporated into DevOps magazine.
The last week or so has been the DeveloperWeek 23 Conference – in Hybrid form, with the physical event last week and online this week. Circumstances prevented me from attending physically, but yesterday I was honored with the opportunity to present virtually. My session covered the adoption of API Streaming as an alternative approach to needing to poll with APIs to get the latest data state/updates.
One of the benefits of using cloud providers is the potential to scale solutions to meet demand by scaling out with additional compute nodes without concern for physical capacity limits (cost considerations are smaller, but still apply). Dynamic scaling is typically driven by monitoring defined groups of nodes for CPU use and when demand hits a threshold an additional node is spun up. Kubernetes can create some more nuanced scaling configurations but this is tends to be used for microservice style solutions.
This is well and fine, but if we need to finesse the configuration to ensure multiple container instances (such as microservices) can be allowed on a single node without compromising the deployment of containers across nodes of a cluster to provide resilience things can get a lot more challenging. We also want to manage the number of active containers – we don’t want to have unnecessary volumes of containers effectively idling. So how do we manage this?
In addition to this, what if we’re using services that demand tens of seconds or even minutes to be spun up to a point of readiness – such as instantiating database servers, adding new nodes to data caches such as Coherence which will need time to clone data, and adjust the demand balancing algorithms? Likewise, for more traditional application servers. Some service calls will impact upstream configuration changes, such as altering load-balancing configurations. Cloud-native Load Balancers typically can understand node groups, but what if your configuration is more nuanced? If you’re running services such as ActiveMQ you need to update all the impacted nodes in a cluster to be aware of the new node. Bottomline is some solutions or solution parts need lead time to handle increased demand?
This points to more advanced, nuanced metrics than simply current CPU load. And possibly the scaling algorithm needs to be aware of lead times of the different types of functionality involved. So how can we advance a more nuanced, and informed scaling logic? You could look at the inbound traffic on firewalls or load balancers. But this will require a fair bit of additional effort to apply application context. For example is the traffic just getting directed to static page content? We do need to derive some context so that the right rules are impacted. Even in simple K8s only hosted solutions a cluster may host services that aren’t related to the changes in demand and need to be scaled as a result.
A use case perspective
Let’s look at this from a hypothetical use case. We’re a music streaming service. Artists can control when their music is made available, but the industry norm is 12:01am on a Friday morning. There is always a demand spike at this time, but the size of the demand spike can vary, with some artists triggering larger spikes (this isn’t directly correlating to an artist’s popularity, some smaller groups can have very enthusiastic fans) – so dynamic scaling is needed, and reactive to demand. Reporting, analytics, and payment services see cyclical bumps around the monthly financial periods. These monthly cyclic activities can also be done during quieter operating hours. So it is clear we may wish to apply logical partitioning, but for maximum cost efficiency keep everything in the same cluster. There are correlations between different service demands. Certain data services see increased demand during the demand spikes and reporting period, but ‘The bottom line is we need to not only address dynamic scaling, but tailor the scaling to different services at different times.
Back to our question – how do we manage the scaling? We could monitor the firewall and load balancer logs and analyze the kinds of requests being received. That would need additional processing logic to determine which services are receiving the traffic. But our API gateway is likely to have that intelligence implicitly in place as we have different API policies for the different types of endpoints. So we can monitor specific endpoints or groups of endpoints very easily – meaning we can infer the types of traffic demand and how to respond. Not only that, we may have API plans in place so we can control priority and prevent the free versions of our service from using APIs to initiate high-quality media streams. So we already have business and specific process meaning. So linking scaling controls such as KEDA (Kubernetes Event-driven Autoscaling) directly to measurements such as plans and specific APIs creates a relatively easy way to control scaling. Further, we may also use the gateways to provide a rate throttle so it’s not possible to crash our backend with an instantaneous spike. This strategy isn’t that different from an approach we’ve demonstrated for scaling message processing backends (see here and here).
Representation of how we can use the metrics from a gateway to not only support the scaling of a K8s cluster but also other cloud services
We can also use the data KEDA can see in terms of node readiness to adjust the API Gateway rate limiting dynamically as well. Either as a direct trigger from the number of active container instances or by triggering a more advanced check because we still have to address our services that need more lead time before relaxing the rate limiting.
Handling slow scaling services
So we’ve got a way of targeting the scaling of services. But what do we do to address the slower scaling services we described? With the data dimensioned, we can do a couple of things. Firstly we can use the rate of change to determine how many nodes to add. A very sudden increase in demand and adding a node at a time will create the effect of the service performance stuttering as it scales, and all additional capacity is suddenly consumed and then scales again. Factoring in the rate of change to the workload threshold and the context of which services could generate the increased workload can be used to not only determine which databases may need scaling but also be used to adjust the threshold of workload that actually triggers the node introduction. So a sudden increase in demand on services that are known to create a lot of DB activity is then met with dropping the threshold that triggers new nodes from, say, 75% to 25% so we effectively start the nodes on a lower threshold, means the process is effectively started sooner.
With different rates of utilization growth, we can see that we need to alter the trigger threshold for launching new resources
For this to be fully effective we do need the Gateway tracking traffic both internally (East-West) and inbound (North-South). Using the gateway with East-West traffic means that we can establish an anti corruption layer.
Conclusion
Not all parts of a solution will be instantaneous in scaling – regardless of how fast the code startup cycle might be, some services have to address the need to move large chunks of data before becoming ready, e.g., in-memory databases. Some services may not have been built for the latest business demands and the ability to exploit cloud scale and dynamic scaling. Some services need time to adjust configurations.
We may also need to adjust our protection mechanisms if we’re protecting against service overloading.
Scaling capabilities that have response latency to the scaling process can be addressed by achieving earlier, more intelligent recognition of need than warning simply by CPU loads hitting a threshold. The intelligence that can be derived from implicit service context simplifies the effort in creating such intelligence and makes it easier to recognize the change. API Gateways, message queues, message streams, and shared data storage are all means that, by their nature, have implicit context.
Pushing the recognition towards the front of the process creates milliseconds or possibly seconds more warning of the demand than waiting for the impacted nodes to see compute spikes.
One area of API design that doesn’t get discussed much is the semantics of the payload. That is, the names we give our attributes and elements for the values being communicated. When developing single-use APIs (usually for client applications), this is unlikely to be an issue as the team(s) involved are likely to know each other and are able to interact and resolve clarity issues easily enough (although getting the semantics right makes this easier particularly in the long term). But when it comes to providing reusable endpoints, we may know the early adopters but are unlikely to interact with consumers beyond that unless there is a problem.
This makes getting the semantics right somewhat harder. How do we know if our early adopters represent the wider customer base (internally and externally)? Conversely, if we simply use our own company terminology, how do we know that it is representative of the wider user base? It isn’t unusual for organizations to develop their own variations of a term or apply assumed meaning. Even simple things, a ‘post code’ element of an address, other parts of the world use ‘zip codes’ or PINS are they the same? Perhaps if we said ‘postal code,’ we break the direct specific country associations with ‘post code.’ We can overcome these issues by providing a dictionary of meanings and lengthy explanations. Using the right term goes beyond simply understanding the data value; it will infer specific formatting and potential application behaviors. Taking our postcode/zip code example. In the UK data is published, which means it is possible to easily validate a postcode against the address line and vice versa. In fact, in the UK to get something delivered, you only need the property number and the postcode. A US 5-digit zipcode can’t do that. For that precision, the ZIP+4 needs to be used.
If we can address these issues, then life becomes easier for us in maintaining the information and for consumers in not needing to look up the details. The question is how can we be sure of using semantics that is consistent across our APIs and widely understood and, when necessary, already documented, so we don’t have to document the information again?
There is a shortcut to some of these problems. Many industries have agreed on data models for different industries. The bodies such as OASIS, OMG, and others are developed and maintained by multiple organizations. As a result, there is a commonality in the meaning achieved. So if you align with that meaning, then use that semantic. Not only can the naming of attributes become easier, but any documentation can be simplified to reference the published definitions. in most cases, these standards are publicly available as it promotes the widest adoption – one of the goals of developing such models. But there are some pitfalls to be mindful of using this approach:
Sometimes rather than arrive at a universal definition, the models will accommodate structural variations or aliased names – as a result, they may not necessarily be helpful to you.
The more well-known models are internationalized. If you have no intent to support international needs and not expecting to have international consumers, then the naming may not align with localized conventions.
If you use the semantics provided, ensure your data abides by the meaning. For example, don’t use ‘shipping address’ if you’re not shipping anything.
Don’t slavishly copy the data models provided – the model may not be intended for API use cases. At the same time, it doesn’t stop you from asking why the data in the model is there and whether your users may want such data (and whether it makes sense for you to provide that information).
Predefined APIs
Some organizations, such as TMForum have taken the public data model to the next step and provided predefined API specifications. This is ideal where you’re following industry standards and providing standardized/common services that aren’t a differentiator but need to be offered as part of doing business.
Data Catalogs
Larger, data-mature organizations will keep some form of Data Catalog. These catalogs are often held to help understand compliance needs, such as where personal data is held, how data issues can impact data accuracy and integrity, etc. It is possible that metadata may also be kept to address the semantic meaning of data or reference the definitions. Such information is used to help inform any data cleansing that may be needed. This offers a potentially good source of information for internal API use cases.
Vendor Led
If your business is delivery/service focussed so that your unique value isn’t in IT processes but perhaps something that the company manufactures or a specialist service such as consulting in a specific industry, then it is possible that the majority of your systems are SaaS or COTs based. If your business has opted to focus on a particular vendor, e.g., Oracle or SAP, for most services, then vendor-led data models are a possibility. These vendors are often involved with public data model development, so they won’t be too divergent in most situations – but awareness of differences is necessary, but as both models should be internally consistent, the differences will also be consistent. This approach will give you better alignment and reduce the chances of needing to address any divergence. The downside of this is a change of direction on strategic vendors can create additional work going forward as the alignment is disrupted. More work will be needed to map from your naming and semantics to the new core, and attempts to move away from the selected model to try to realign semantics with a new core will potentially create breaking changes for API consumers.
Don’t Forget
Regardless of the approach taken, there are some very simple but critical rules that will keep you in a good place:
Don’t use your underlying storage data models – this is a well-documented API anti-pattern.
Consistency of language across your APIs, regardless of whether they are internal or external, is important.
Information Sources
Regardless of approach – be careful not to lock your API semantics and data model to that of the storage layer – these can change and even create breaking changes that you shouldn’t expose to your users. Some sources to consider.
OAGIS – covers a broad variety of business data domains. Some ERP suppliers have used this as a foundation for their application data models.
The following isn’t unique to OCIR, as it will hold true for any K8s Deployment YAML configuration that works with an Open Container Initiative compliant registry. To define the containers part of the YAML file we need to provide an attribute that can be used to confirm the legitimacy of the request. To do this we need to supply a token. However, we don’t want this token to be visible in plain sight in our YAML. The solution to this is to set up a secret within Kubernetes.
In the following YAML extract, we can see the secret is named.
This does mean we need to create the secret. As this is a one-off task the easiest step is to create the secret by hand. To do that we use the command:
This naturally leads to the next question where do we get the secret?
This step is straightforward. Navigating using the user icon top right (highlighted in the screenshot below), select the User Settings option to get to the screen shown below. Then use the right-hand menu option highlight (Auth Tokens). This displays a section of the UI showing your current auth tokens and provides a button that will popup a window to guide you through creating a new auth token.
I got into a discussion with several people about the use of GraphQL and related API technologies and discovered that a presentation I’ve been using and evolving for a while now, didn’t appear in my blog. So here is a version of it used at an API Conference …
The presentation may appear again in the future as the perspective of API technologies evolves the presentation will need to evolve. For example, AsyncAPI is starting to make an impression now. Other variants to API technologies such as DRPC are showing up.
If you’re new to GraphQL you might find a couple of other posts on the subject helpful:
I’ve had some time to catch up on books I’d like to read, including Kubernetes Best Practises in the last few weeks. While I think I have a fair handle on Kubernetes, the development of my understanding has been a bit ad-hoc as I’ve dug into different areas as I’ve needed to know more. This meant reading a Dummies/Introduction to entry style guide would, to an extent, likely prove to be a frustrating read. Given this, I went for the best practises book because if I don’t understand the practises, then there are gaps in my understanding still, and I can look at more foundation resources.
As it goes, this book was perfect. It quickly covered the basics of the different aspects of Kubernetes helping to give context to the more advanced aspects, and the best practices become almost a formulated summary in each section. The depth of coverage and detail is certainly very comprehensive, explaining the background of CNI (Container Network Interface) to network-level security within Kubernetes.
The book touched upon Service Meshes such as Istio and Linkerd2 but didn’t go into great depth, but again this is probably down to the fact that Service Mesh ideas are still maturing, and you have initiatives like SMI (Service Mesh Interface still in the CNCF’s sandbox).
In terms of best practices, that really stood out for me:
Use of Taints and Tolerations for refined control of pod deployment (Allowing affinity to be controlled to optimise resilience, or direct types of pod deployment to nodes with specialist capabilities such as GPU).
There are a lot more differences and options then you might realize in terms of ingress controller capabilities, so take time to identify what you may need from an ingress controller.
Don’t forget pods can be scaled vertically with the VPA (Vertical Pod Autoscaler)as well as horizontally through the HPA.
While using a managed persistence service will make statement storage a lot easier, stateful sets will give you a very portable solution.
As with a lot of technical books I read. As I go through the book I build up a mind map of what I think are the key points. Doing so leaves me with a resource I can use as a quick reference, but creating the mind map helps reinforce the learning. So here is the mind map …
I recently presented at APIWorld about how API definitions go beyond the payload specification into providing details of terms and conditions and so on. You can see the presentation here (more about my presentations here).
One of the questions during the presentation did I have other examples of good APIs, reflecting the points I’d made. A very valid question, to which I didn’t have more examples to hand, hence this post.
So the easy answer would be to point to an excellent article on Nordic APIs (here) that address the question and explain why they rate the APIs. But that’s a little bit of a lazy answer and in all fairness, the examples provided are from organisations where APIs are recognised as a primary or important contributor to business revenue. So I’ve looked at areas where the API may not necessarily be seen by the business as the primary source of revenue. With the examples provided, we’ve described what we think is good, or not so good about them. Hopefully, through these examples, you’ll see why points are made in the presentation. So here are my reviews…
The latest edition of OraWorld is out which includes the second part of my part part articles relating to GraphQL and API Security. You can check it out at on page 22, along with lots of other great content here.
You must be logged in to post a comment.