API Gateway for data egress


Most larger organizations route their outbound web traffic through a web proxy. The primary motivations are to see where traffic is going, to log traffic for analysis so that attempts to egress data that should remain within the organization can be detected, and to prevent access to websites that are considered harmful in one form or another.

So why consider an API Gateway as part of an outbound traffic flow? After all, isn’t a Gateway there to protect us? There are several very good reasons. Let’s look at them:

  • Managing the use of an external paid service. You may have multiple solutions using a third-party service – for example, an SMS service. Rather than having each solution call the external API directly, each holding its own copy of the third-party credentials to manage, we can use the gateway as a single point to attach the credentials.
  • When it comes to being charged for a service, being able to identify the requests at the API level makes it easy to track your own consumption and forecast ahead of being billed. This is really helpful if you have an agreement that provides a good price for pre-booked capacity and a higher charge for overage (capacity not pre-booked).
  • Economies of scale from using third-party services can be very powerful, but they can also present two problems:
    • Switching providers quickly can be difficult, as there are multiple points that may need to change.
    • It is unclear how to partition the cost of the external service across different departments if everyone is using a common account.
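As a rough illustration of the consumption-tracking and cost-partitioning points, the sketch below assumes the gateway can tag every outbound call with the calling department. The department names, prices, and pre-booked allowance are all invented for the example; it simply splits a bill made up of pre-booked capacity plus overage in proportion to each department's usage.

```python
from collections import Counter


def partition_cost(calls, prebooked_units, prebooked_price, overage_price):
    """Split a shared bill across departments in proportion to usage.

    calls: iterable of department names, one entry per metered request.
    prebooked_units: number of requests covered by the pre-booked deal.
    prebooked_price: flat price paid for the pre-booked capacity.
    overage_price: price per request beyond the pre-booked capacity.
    """
    usage = Counter(calls)
    total = sum(usage.values())
    if total == 0:
        return {}
    overage_units = max(0, total - prebooked_units)
    bill = prebooked_price + overage_units * overage_price
    return {dept: bill * count / total for dept, count in usage.items()}


# Hypothetical month: 1,200 calls against 1,000 pre-booked requests.
calls = ["sales"] * 600 + ["support"] * 300 + ["ops"] * 300
shares = partition_cost(
    calls, prebooked_units=1000, prebooked_price=50.0, overage_price=0.25
)
```

Proportional splitting is only one option; a department that causes the overage could equally be charged for it directly.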

The first of these issues can be easily overcome using the anti-corruption layer pattern, where the gateway acts as the single route to the provider, so it can reformat the requests in one place to work with a different provider.
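A minimal sketch of that idea follows, assuming two made-up SMS providers with different request shapes. The provider names, field names, and the `EgressGateway` class are hypothetical and not taken from any real gateway product. Internal callers only ever see the neutral `SmsRequest`, and the credentials live only in the gateway.

```python
from dataclasses import dataclass


@dataclass
class SmsRequest:
    """Provider-neutral request shape used by internal callers."""
    to: str
    text: str


def to_provider_a(req, api_key):
    # Hypothetical provider A: bearer token in a header.
    return {
        "headers": {"Authorization": f"Bearer {api_key}"},
        "body": {"recipient": req.to, "message": req.text},
    }


def to_provider_b(req, api_key):
    # Hypothetical provider B: different field names, key in the body.
    return {
        "headers": {},
        "body": {"msisdn": req.to, "content": req.text, "key": api_key},
    }


ADAPTERS = {"provider_a": to_provider_a, "provider_b": to_provider_b}


class EgressGateway:
    """Single place that holds credentials and reshapes outbound requests."""

    def __init__(self, active_provider, credentials):
        self.active_provider = active_provider
        self.credentials = credentials  # never handed to callers

    def build_outbound(self, req):
        adapter = ADAPTERS[self.active_provider]
        return adapter(req, self.credentials[self.active_provider])


gateway = EgressGateway(
    "provider_a", {"provider_a": "key-a", "provider_b": "key-b"}
)
first = gateway.build_outbound(SmsRequest(to="+15550100", text="hello"))

# Switching provider is a one-line change here, not a change in every caller.
gateway.active_provider = "provider_b"
second = gateway.build_outbound(SmsRequest(to="+15550100", text="hello"))
```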

At the same time, the second issue can be addressed by making more intelligent use of the gateway’s metering mechanisms, rather than having to implement functionality to mine the proxy’s logs.

Of course, you can achieve the same effect without a gateway, but you don’t get the benefits that a gateway offers out of the box. In addition, the chances are that you already have an API Gateway running for your current North-South traffic.

Podcast with Anatolii Ulitovskyi of UNmiss


Just before the Christmas break, I got to record an excellent podcast with Anatolii of UNmiss. It was a great conversation about Cloud Integration, APIs, and approaches to Cloud-based integration. While I am not in a consulting role in the conventional sense, a lot of an Evangelist’s task is still to listen, understand, and, when necessary, challenge assumptions and help people understand how technologies can help address problems. This might include sketching out a journey of evolution and improvement. During the podcast, we discussed some of these ideas.

You can listen to the podcast audio or the live video stream of the conversation here.

In addition to some of the practices we’ve used, the conversation touched upon books. My books are on the sidebar, including links to Manning, who, as a publisher, I’d recommend. I’ve previously blogged some reading recommendations and written some book reviews, which may be of interest to anyone following up.

Published content


We haven’t blogged too much recently as we have been busy helping with and producing content for my employer, Oracle, working with Software Engineering Daily, and developing a collaborative book. So, I thought I’d pull together some links to these new resources.


API payload design getting the semantics right


One area of API design that doesn’t get discussed much is the semantics of the payload. That is, the names we give our attributes and elements for the values being communicated. When developing single-use APIs (usually for client applications), this is unlikely to be an issue as the team(s) involved are likely to know each other and are able to interact and resolve clarity issues easily enough (although getting the semantics right makes this easier particularly in the long term). But when it comes to providing reusable endpoints, we may know the early adopters but are unlikely to interact with consumers beyond that unless there is a problem.

This makes getting the semantics right somewhat harder. How do we know if our early adopters represent the wider customer base (internally and externally)? Conversely, if we simply use our own company terminology, how do we know that it is representative of the wider user base? It isn’t unusual for organizations to develop their own variations of a term or apply assumed meaning. Take even something as simple as the ‘post code’ element of an address: other parts of the world use ‘zip codes’ or PINs. Are they the same? Perhaps if we said ‘postal code,’ we would break the country-specific association of ‘post code.’ We can overcome these issues by providing a dictionary of meanings and lengthy explanations, but using the right term goes beyond simply understanding the data value; it also implies specific formatting and potential application behaviors. Take our postcode/zip code example. In the UK, postcode data is published, which means it is possible to easily validate a postcode against the address line and vice versa. In fact, in the UK, to get something delivered you only need the property number and the postcode. A US 5-digit ZIP code can’t do that; for that precision, the ZIP+4 needs to be used.
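To make the point that a name implies formatting a little more concrete, here is a small sketch of a neutral `postalCode` attribute whose validation depends on the country. The patterns are deliberately simplified assumptions for illustration (real UK postcode rules have more special cases), and the function name is made up.

```python
import re

# Simplified, illustrative patterns only; not complete national rules.
POSTAL_CODE_PATTERNS = {
    # US ZIP, optionally extended to ZIP+4 (e.g. 12345 or 12345-6789).
    "US": re.compile(r"^\d{5}(-\d{4})?$"),
    # UK postcode: outward code, optional space, inward code.
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),
}


def is_valid_postal_code(country, postal_code):
    """Check a postal code against the simplified pattern for its country."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        # No rule known for this country: accept rather than reject.
        return True
    return bool(pattern.match(postal_code.strip().upper()))
```

The same attribute name carries different precision in each country, which is exactly the kind of behavior a consumer would otherwise have to discover from documentation.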

If we can address these issues, then life becomes easier for us in maintaining the information and for consumers in not needing to look up the details. The question is, how can we be sure of using semantics that are consistent across our APIs, widely understood, and, when necessary, already documented, so we don’t have to document the information again?


Public Data Models

There is a shortcut to some of these problems. Many industries have agreed on data models for their domains. These models are developed and maintained by multiple organizations working through bodies such as OASIS, OMG, and others. As a result, a commonality of meaning is achieved. So if your data aligns with that meaning, use that semantic. Not only can the naming of attributes become easier, but any documentation can be simplified to reference the published definitions. In most cases, these standards are publicly available, as this promotes the widest adoption – one of the goals of developing such models. But there are some pitfalls to be mindful of when using this approach:

  • Sometimes rather than arrive at a universal definition, the models will accommodate structural variations or aliased names – as a result, they may not necessarily be helpful to you.
  • The more well-known models are internationalized. If you have no intent to support international needs and are not expecting to have international consumers, then the naming may not align with localized conventions.
  • If you use the semantics provided, ensure your data abides by the meaning. For example, don’t use ‘shipping address’ if you’re not shipping anything.
  • Don’t slavishly copy the data models provided – the model may not be intended for API use cases. At the same time, it doesn’t stop you from asking why the data in the model is there and whether your users may want such data (and whether it makes sense for you to provide that information).

Predefined APIs

Some organizations, such as TMForum, have taken the public data model to the next step and provided predefined API specifications. This is ideal where you’re following industry standards and providing standardized/common services that aren’t a differentiator but need to be offered as part of doing business.

Data Catalogs

Larger, data-mature organizations will keep some form of Data Catalog. These catalogs are often held to help understand compliance needs, such as where personal data is held, how data issues can impact data accuracy and integrity, etc. It is possible that metadata may also be kept to address the semantic meaning of data or reference the definitions. Such information is used to help inform any data cleansing that may be needed. This offers a potentially good source of information for internal API use cases.

Vendor Led

If your business is delivery/service focussed, so that your unique value isn’t in IT processes but perhaps in something that the company manufactures or a specialist service such as consulting in a specific industry, then it is possible that the majority of your systems are SaaS or COTS based. If your business has opted to focus on a particular vendor, e.g., Oracle or SAP, for most services, then vendor-led data models are a possibility. These vendors are often involved with public data model development, so their models won’t be too divergent in most situations. Awareness of the differences is still necessary, but as both models should be internally consistent, the differences will also be consistent. This approach will give you better alignment and reduce the chances of needing to address any divergence. The downside is that a change of direction on strategic vendors can create additional work going forward as the alignment is disrupted. More work will be needed to map from your naming and semantics to the new core, and attempts to move away from the selected model to realign semantics with a new core will potentially create breaking changes for API consumers.

Don’t Forget

Regardless of the approach taken, there are some very simple but critical rules that will keep you in a good place:

  • Don’t use your underlying storage data models – this is a well-documented API anti-pattern.
  • Consistency of language across your APIs, regardless of whether they are internal or external, is important.
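As a small sketch of the first rule, the function below maps a storage row onto the public payload instead of exposing the row directly. The column names and payload attribute names are hypothetical; the point is only that the storage layer can be renamed or restructured without the API contract changing.

```python
def to_api_address(row):
    """Map an internal storage row to the public address payload.

    The column names on the right are invented storage names; only the
    keys on the left are part of the API contract.
    """
    return {
        "addressLine1": row["addr_ln_1"],
        "city": row["cty_nm"],
        "postalCode": row["pst_cd"],
        "country": row["ctry_iso2"],
    }


stored_row = {
    "addr_ln_1": "10 Example Street",
    "cty_nm": "Exampleton",
    "pst_cd": "EX1 2MP",
    "ctry_iso2": "GB",
    "upd_ts": "2023-01-01T00:00:00Z",  # internal audit column, not exposed
}
payload = to_api_address(stored_row)
```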

Information Sources

Regardless of approach, be careful not to lock your API semantics and data model to that of the storage layer. These can change and even create breaking changes that you shouldn’t expose to your users. Some sources to consider:

  • OAGIS – covers a broad variety of business data domains. Some ERP suppliers have used this as a foundation for their application data models.
  • OASIS – covers many industries
  • TMForum APIs
  • ARTS (formerly hosted by NRF, now with the OMG). The full OMG standards catalog.
  • GS1 – lots here on shipping, supply chain, and product tracking

Some more reading on the subject:

API more than a payload – Cloud Lunch Learn


Today I was fortunate enough to present at one of the Cloud Lunch and Learn events (you can register for any of the events here and see previous sessions here). One of the questions asked at the end of the session was for recommended reading on APIs. So I’ve gathered up some links to books I’d suggest are worthwhile reading:

I should also mention an API book I’ve co-authored. While it focuses on an Oracle product, there is a lot of content that is relevant to any API development using an API Gateway (Amazon.co.uk). I’ve not looked at all the books at API-University, but from what I have seen, the content is worth examining.

The slides for my presentation can be found on slideshare, and here:

The Presentation recording can be found here:


OCI Notifications through the LogSimulator


We’ve been busy putting together a number of Oracle Architecture Center assets over the last week. This has included building LogSimulator extensions that can be run in a very simple manner using just a single file, although this limits the payloads that can be sent to OCI (if you take the appropriate custom file from the LogSimulator, you do need to make one minor tweak). The code has also been added to the Oracle GitHub repository here in a manner that doesn’t require the full tool. There is, of course, a price to pay for the simplified implementation. This comes in the form of the notifications being sent and received being hardwired into the code rather than driven through the simulator’s configuration options.

The decoupling has been done by implementing the interface for the custom methods in a class without the implements declaration, and then we extend the base class and apply the implements declaration at that level.
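The real code is Groovy, so the following is only a rough Python analogue of the structure just described, with an abstract base class standing in for the Groovy/Java interface. The class and method names (`CustomOutput`, `initialize`, `write_log_entry`, `clean`) are hypothetical and not taken from the LogSimulator source.

```python
from abc import ABC, abstractmethod


class CustomOutput(ABC):
    """Stand-in for the tool's plugin interface (names are hypothetical)."""

    @abstractmethod
    def initialize(self, props): ...

    @abstractmethod
    def write_log_entry(self, entry): ...

    @abstractmethod
    def clean(self): ...


class StandaloneNotifier:
    """Provides every interface method without declaring the interface,
    so this file can be used on its own without the full tool."""

    def initialize(self, props):
        self.topic = props.get("topic", "demo-topic")
        self.sent = []

    def write_log_entry(self, entry):
        # A real implementation would call the notification service here.
        self.sent.append((self.topic, entry))

    def clean(self):
        self.sent.clear()


class PluginNotifier(StandaloneNotifier, CustomOutput):
    """Adds only the interface declaration; all behaviour is inherited."""


plugin = PluginNotifier()
plugin.initialize({})
plugin.write_log_entry("disk nearly full")
```

The standalone class carries no dependency on the interface, while the subclass is what the full tool would load.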

While OCI Notifications can take log events, the service is better suited to JSON payloads. But as the simulator can tailor the content being sent using some formatting, it does not care whether the events provided are pre-formatted as JSON objects, making it an easy tool for testing the configuration of OCI Notifications.

Unit testing as well

In addition to the new channel, as previously mentioned, we have been making some code improvements. To support that, we have started to add unit tests and to double-check that the code will compile under Java. To keep the dependencies down, we’re making use of Java assert statements rather than something prettier such as JUnit, but the implementation ideas are very similar. As the tests use Java asserts, the use of asserts does need to be enabled on the command line; for example:

Groovy test.groovy -enableasserts

JMESPath is represented using Railroad diagrams


JMESPath is a mature syntax for traversing and manipulating JSON objects. The syntax is also supported with multiple language implementations available through GitHub (and other implementations exist). As a result, it has been very widely adopted; just a few examples include:

  • Azure CLI
  • AWS CLI and Lambda
  • Oracle Cloud WAF
  • Splunk

As the syntax is very flexible and recursive in its use, following the documented notation can be a little tricky to start with. The complete definition runs to 97 lines, of which 32 lines focus on the syntactical structure. The others describe the base types such as numbers, characters, accepted escaped characters, and so on. Nothing wrong with this, as the exhaustive definition is necessary to build parsers. But for the majority of the time, it is those 32 lines that we need to understand.

As the expression goes, ‘a picture says a thousand words’. There might not be a thousand words here, but there are enough to suggest a visual representation will help, even if the visual only helps us navigate the detailed syntax. So we’ve used our favoured visual representation, the railroad diagram, and the tool produced by Tab Atkins to create the representation. We’ve put the code and created images for the syntax in my GitHub repository here, continuing the pattern previously adopted.

Here is the resulting diagram …

To make it easy to trace back to the original syntax document, we’ve included groupings on the diagram that have names from the original specification.

Parts of the diagram make the expressions look rather simple, but you’ll note that it is possible for the sections to be iterative, which allows the expression to traverse a JSON object of undefined depth. What can be really challenging is that in many areas it is possible to nest expressions within expressions. Visually, there is no simple way to represent these possibilities in a linear manner, other than to be clear about where the nesting can take place.

LogSimulator New Feature – Custom Targets with OCI Logging example


Those who have been using my Logging in Action book will know that, to help test the configuration of monitoring tools including Fluentd, we have built a LogGenerator that can very easily play and replay logging events into a variety of destinations and formats. It is all written in Groovy to make the utility easy to run as a script and to extend without needing to set up a proper Java development environment.

With the number of different destinations built into the script, and the logic to load the source log events and format them, the utility is getting rather large for a single file. Rather than letting it continue to grow as we add more destinations to pump log events to, I’ve extended the implementation so you can point to a Groovy file that implements the logic to send the log events. It only requires three simple methods to be implemented.

To demonstrate the feature, we have created a custom extension and fully documented it. The extension allows you to send log events to the OCI Logging service. This includes an optional crude aggregation mechanism, as sending individual log events is a little inefficient over REST. By doing this, we can send synthetic or playback logs as if we’re a real-life application, to ensure that any alerting or routing for the logging works properly before we get anywhere near production, without needing to run the application and induce error events.
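To show the general idea behind a crude aggregation mechanism, here is a minimal sketch of buffering events and sending them in groups. It is not the extension's actual code (which is Groovy); the `BatchingSender` class, the batch size, and the `send_batch` callable are all assumptions made for illustration.

```python
class BatchingSender:
    """Buffer log events and hand them to `send_batch` in groups."""

    def __init__(self, send_batch, batch_size=3):
        self.send_batch = send_batch  # callable that takes a list of events
        self.batch_size = batch_size
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered; call once more at shutdown so the
        # final partial batch is not lost.
        if self.buffer:
            self.send_batch(list(self.buffer))
            self.buffer.clear()


# Stand-in for a REST call: just record each batch that would be sent.
batches = []
sender = BatchingSender(batches.append, batch_size=3)
for i in range(7):
    sender.add(f"event {i}")
sender.flush()
```

Seven events result in three calls rather than seven, at the cost of events waiting in the buffer until a batch fills or a flush happens.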

Beyond this, we’re also thinking about creating a plugin to fire log events at Prometheus so we can send events using the Prometheus pushgateway. As a result, we can tune Prometheus’ configuration.

More improvements – refactoring the existing code

We will refactor the existing code to use the same approach which should make the code more maintainable, but the changes won’t stop the utility from working as it always has (so we won’t break out the existing output channels from the core).

We have also started to improve the code commenting – so hopefully it will make the code a bit more navigable.

Practical Steps when it comes to writing a technical book


Following my article on Software Engineering Daily, here are some practical things that will help you if you’re considering taking on a technical book project.

Identifying a Publisher

While it is easy to self-publish today, the recognition comes from having worked with a traditional publisher, as they have processes that ensure a level of quality. Not all publishers are equal, and some carry more prestige than others. In addition to this, some publishers are willing to take a risk on a subject and/or author. Have a look at the titles already published, and whether there are any publishers you can connect to.

When it comes to contacting the publishers, most of their websites will have a page for recruiting authors. Some are easier to find than others. Here are a couple:

If, or when, you get to talk to a publisher, it is worth ensuring you understand how their editorial process works and what is expected from you, plus what happens if you find yourself unable to work to the original schedule. Day-to-day work can get in the way in ways you hadn’t expected.


Contributing to Software Engineering Daily


For a long time, I’ve tracked and read articles on Software Engineering Daily. Today represents what is hopefully the first of many articles that we will write for them. The article is about the kind of people that make technical book authors, and the perception we have of authors, so if you’re interested, check it out here.

Some more content on the subject of books …