Mastering FluentD configuration syntax


Getting to grips with FluentD configuration, which describes how FluentD handles the logging events it has to process, can feel a little odd (at least in my opinion) until you appreciate a couple of foundation points. Once those click, you’ll find it pretty easy to understand.

It would be hugely helpful if the online documentation provided some of the points I’ll highlight upfront, rather than throwing you straight into a simple example that tells you about the configuration but doesn’t elaborate as deeply as would be worthwhile. Of course, that viewpoint may be born of the fact that I have reviewed so many books I’ve come to expect things a certain way.

But before I highlight what I think are the key points of understanding, let me make the case for getting to grips with FluentD.

Why master FluentD?

FluentD’s purpose is to let you take log events from many sources and filter, transform and route them to the necessary endpoints. Whilst it forms part of a standard Kubernetes deployment (such as those provided by Oracle and Azure, for example), it can just as easily support monolithic environments, with connectors for common log formats and frameworks. You could view it as a lightweight middleware for logging (particularly if you use the FluentBit variant, which is effectively a pared-back implementation).

If this isn’t sufficient to convince you, and if Google searches are a reflection of adoption, then my previous post on the Observability – London Oracle Developer Meetup shows a plot reflecting FluentD’s steady growth. This is before taking into account that a number of cloud vendors, such as Google, have wrapped FluentD/FluentBit into their wider capabilities (see here).

Not only can you see it as middleware for logging, it can also have custom processors and adapters built through the use of Ruby Gems, making it very extensible.
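To give a flavour of what that configuration looks like before we get to the points themselves, here is a minimal sketch of a FluentD configuration – a source that tails a JSON log file, a filter that enriches each record, and a match that routes the events. The file paths and tags are illustrative assumptions, not taken from the demos.

```
# Minimal illustrative configuration - paths and tags are assumptions
<source>
  @type tail
  # follow an application log file as it is written
  path /var/log/app/app.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.access
  <parse>
    @type json
  </parse>
</source>

# enrich every event whose tag starts with app.
<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

# route the matched events - stdout here, but this is where an output
# plugin (Elasticsearch, file, another FluentD node, etc.) would go
<match app.**>
  @type stdout
</match>
```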


Remember these points and mastering the config should be a lot easier …


Observability – London Oracle Developer Meetup


Last night was the London Oracle Developer Meetup’s session around observability. Andrei Cioaca focused on the use of OpenTracing as provided by Jaeger in a standard Kubernetes deployment with Istio, realized with Oracle Kubernetes Engine (OKE). This was followed by my session on another pillar: logging via FluentD, which is also incorporated into standard Kubernetes deployments but equally able to support traditional monolithic use cases.

Andrei provided a great overview of the three pillars of observability and their respective strengths and weaknesses. With the basics covered, he then dove into the configuration and execution of Istio combined with Jaeger and the corresponding insights available, including a look at the kinds of visual insights that Jaeger and Kiali provide. Some probing conversations followed about the relationship to Spring Cloud Sleuth, OpenZipkin and OpenTracing as a concept more generally.

Andrei’s presentation material can be found in his GitHub repository here.


Google Analytics on Search Terms

My session followed a pizza break (there was a delay in the pizza's arrival). With everybody having chatted over pizza about OpenTracing, we picked up on FluentD and the logging aspect of observability. FluentD, as an open source project, has been growing steadily and is actually baked into several log analytics products and services – as the above analytics from Google shows.

The presentation looked at the growing challenges of modern software in terms of making sense of logging. We explored the capabilities of FluentD before drilling into real-world use cases and potential deployment models.

As you’ll see from the slides, we ran a couple of demos. The configuration for the demos can be found at https://github.com/mp3monster/fluentd-demos along with an example payload.

The next meetup we have organized is around Blockchain; all the details can be found at https://www.meetup.com/Oracle-Developer-Meetup-London/events/264661742/.

Other related info …

Article direct to LinkedIn – OpenTracing and API Gateways


Capgemini’s Oracle Expert Community – which includes myself – has been asked to publish articles directly to LinkedIn as part of the supporting activities for Oracle Open World. So here is my offering: https://www.linkedin.com/pulse/connection-between-api-gateways-opentracing-phil-wilkins/.

This is a short look at why API gateways at the boundary of your environment can offer more value when supporting OpenTracing.

Handling Socket connectivity with API Gateway


At the time of writing, the Oracle API Platform doesn’t support the use of socket connections for handling API data flows. Whilst the API Platform does provide an SDK, as we’ve described in other blogs and our book, it doesn’t allow you to extend how connectivity is managed.

The use of API gateways with socket-based connectivity is something that can engender a fair bit of debate. On the one hand, when a client is handling a large volume of data, or expects data updates but doesn’t want to poll or utilize webhooks, then a socket strategy makes sense – think of an app wanting to listen to a Kafka topic. Conversely, API gateways are meant to be relatively lightweight components and are not intended to have a single call result in massive latency while the back end produces or waits to forward on data, as this is very resource intensive and inefficient. However, socket-based data transmission should be subject to the same kinds of security controls, and home-brewing security solutions from scratch is generally not the best idea, as you become responsible for continually re-verifying that the code is secure, handling dependency patching, and mitigating vulnerabilities in other areas.

So how can we solve this?

As a general rule of thumb, web sockets are our least preferred way of driving connectivity. Aside from the resource demand, it is a fairly fragile approach, as connections are subject to the vagaries of the network and can drop, and it can be difficult to manage state (i.e. knowing what data has or hasn’t reached the socket consumer). But sometimes it is just the right answer. Therefore we have developed the pattern the following diagram illustrates.

API Protected Sockets

How it works …

The client initiates things by contacting the gateway to request a socket, with the details of the data it wants to flow through the socket. The gateway can then validate both that the request is legitimate (API tokens, OAuth, etc.) and that the requester is allowed the data wanted, by analyzing the request metadata.

The gateway works in conjunction with a service component and, if the request is approved, will acquire a URI from the socket manager component. This component provides a URL for the client to use for the socket request. The URL contains a randomly generated string, which means that port scans or guesses against the exposed web service are going to be difficult. These URLs are held in a cache, which should ideally have a TTL (Time To Live); using something like Redis, with its native TTL capabilities, means we can expire the URL if it isn’t used.

With the provided URL, we could further harden the security by associating a second token with it.

Having received the response, the client can then establish the socket-based connection, which gets routed around the API Gateway to the socket component. This takes the randomly generated part of the URL and looks up the value in the cache; if it exists in the cache and the secondary token matches, then the request for the socket is legitimate. With the socket connection accepted, the logic that will feed the socket can commence execution.
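To illustrate the mechanics described above, here is a minimal sketch of the socket-manager idea using Python and Redis. It is not the implementation we built; the key format, token lengths and the 15-minute TTL are illustrative assumptions.

```python
import secrets

import redis  # assumes the redis-py client is available

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
URL_TTL_SECONDS = 15 * 60  # assumed TTL: unused socket URLs expire after 15 minutes


def issue_socket_url(base_url: str) -> tuple[str, str]:
    """Called once the gateway has validated the client's request.
    Returns the one-off socket URL and the secondary token the client
    must present when it connects."""
    url_token = secrets.token_urlsafe(32)        # random path segment
    secondary_token = secrets.token_urlsafe(32)  # second factor for the handshake
    # Store the pairing with a TTL so unused URLs expire automatically.
    cache.setex(f"socket-url:{url_token}", URL_TTL_SECONDS, secondary_token)
    return f"{base_url}/socket/{url_token}", secondary_token


def validate_connection(url_token: str, presented_token: str) -> bool:
    """Called by the socket component when the client connects."""
    expected = cache.get(f"socket-url:{url_token}")
    if expected is None:
        return False  # never issued, or already expired from the cache
    return secrets.compare_digest(expected, presented_token)
```

Note that all the heavy lifting – authentication, authorization, rate limiting – still happens in the gateway policies before anything like issue_socket_url is ever called.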

If the request has some form of malicious intent, such as a scan, probe or brute-force attempt to call the URL, then the attempt should fail because …

  • If the socket URL has never existed in, or has expired from, the cache, the request is rejected.
  • If a genuine URL is obtained, the secondary key must still verify correctly; if it is incorrect, the request is again rejected.
  • Ironically, any malicious attack seeking to overload components is most likely to affect the cache, and if the cache fails, a brute-force attempt gets even harder, as all the keys will be lost – i.e. there is nothing left to locate by brute force.

You could of course craft in more security checks, such as IP whitelisting, but every time this is done the socket service gets ever more complex and takes on more of the capabilities expected of the API Gateway. As it stands, aside from deploying a cache, we’ve not built much more than a simple service that creates some random strings and caches them, combined with a cache query and a comparison. All the hard security work is delegated to the gateway during the handshake request.

Thanks to James Neate and Adrian Lowe for kicking around the requirement and arriving at this approach with us.

 

Costs in Multi-Cloud


Over the last couple of years we have seen growing references to multi-cloud. That is to say, people are recognizing that organisations, particularly larger ones, are ending up with cloud services from many different vendors. This has at least in part come from departments within an organisation being able to buy meaningful cloud resources within their local budgets.

There is a competitive benefit to the recent partnership agreement between Microsoft and Oracle, given the market share AWS has in comparison to everyone else. But irrespective of the positioning against AWS, this agreement has arisen because of the adoption of multi-cloud. It means highly resilient Oracle database setups using RAC, Data Guard and so on can be made available to Azure without compromising security or the all-important network performance that is essential to database operation. Likewise, Oracle’s SaaS offerings are sector leaders, if not best in class – something Microsoft can’t compete with. At the other end, regardless of Oracle’s offerings, organisations will often prefer the Microsoft development ecosystem because of its alignment to office tooling and the ease of building solutions quickly.

Multi-cloud, even with agreements like the Microsoft and Oracle one (see here), doesn’t mean there won’t be higher costs in crossing clouds. Let’s see where the costs reside …

  • Data egress from clouds (and in some cases ingress as well) costs money. Ingress costs have largely been eliminated, as they can be seen as a barrier to selling services, particularly around big data; data egress, however, can be an issue. Oracle have made this cost low enough to be almost negligible, but others have not necessarily done the same, as the following comparison shows …
  • Establishing the high-performance connections between Azure and Oracle Cloud that the agreement supports (the same tech used for cloud-to-ground connectivity) does incur a cost. In Oracle’s case there is a fee for the connection (not a large cost, but one that exists nonetheless) plus any traffic fees from the provider of the network connection spanning the data center locations – you’re leasing capacity on someone’s dedicated fiber or MPLS services. This should prove to be small, as part of the enabler of this offering is that the Oracle and Microsoft cloud data centers are often physically provided by the same provider, or are at least physically pretty close, a result of both companies gravitating to locations with optimal, highly available infrastructure (power, telecommunications), favourable legal and commercial factors, and the specialist skills needed.

If data egress is the key cost challenge, what drives data egress beyond the obvious content for user interfaces? …

  • Obviously you have the business data flows; some of these will be understood by the business community, but not all, as this comes down to the way data from one cloud is exposed to another. For example, inefficient services with APIs that require frequent polling and don’t express the request efficiently – rather than, say, expressing the request using HTTP header attributes and other efficiencies, or utilizing frameworks such as webhooks so data can be pushed.
  • High-speed data access often drives data replication, with databases in multiple clouds holding mirror-image data in each location even if the majority of the data is not needed there. This can also happen with technologies such as Kafka, where non-compacted topics mean every event can be replicated even if that event has a short lifetime.
  • One of the hidden costs is the operational task of gathering logs into a combined view so end-to-end insights can be obtained. A detailed log can actually yield more ‘data’ by volume than the business flows themselves, because it is semi-structured, intended to be very readable, and at its most granular level exists to help debug and test.

In addition to the data flows, you need to consider which other linkages, beyond the Oracle–Azure connection, are involved. As the detailed documentation makes clear, it is not possible to connect your on-premises location to one of the clouds (e.g. via Oracle FastConnect) and then assume your traffic can hop to Azure via the bridge between FastConnect and Azure’s ExpressRoute. To get good performance to your solution parts in both Azure and Oracle Cloud, you still need both FastConnect and ExpressRoute configured to your on-premises location. This may of course impact how bulk data for lift-and-shift app use cases such as EBS is handled. For example, if you choose to regularly bulk-transfer data between on-premises and EBS via the app/middleware tier rather than directly via the database, and that mid tier is running in Azure, you will need both routes established.

Conclusion

There is no doubt that the Oracle–Azure cloud-to-cloud linkage is a step forward, but ‘the devil is in the detail’, as the saying goes. To optimize the benefits and savings we’d suggest that you:

  • think through your use cases – understand data flow and volume (is someone bulk-syncing application data with a data warehouse?),
  • define a cloud data strategy – to lay out principles and approaches and identify compliance needs; this is particularly helpful for custom solution development, so that the right level of log data is consolidated with the important details, and data retention addresses compliance requirements without ratcheting up unnecessary costs (there is a tendency to hoard data just in case – if this is really wanted, think about how it’s stored),
  • based on common business usage models, define a simple forecasting formula (see the sketch after this list) – being able to quantify data costs always makes it easier to challenge the data-hoarding tendency,
  • confirm the inter-cloud network vendor charges when working with multi-cloud.
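As a rough illustration of the forecasting idea, the sketch below shows the kind of simple formula we mean. The rates, volumes and interconnect fee are placeholders, not vendor pricing.

```python
def monthly_egress_cost(gb_per_day: float,
                        egress_rate_per_gb: float,
                        interconnect_fee_per_month: float = 0.0,
                        days: int = 30) -> float:
    """Rough monthly cost of moving data between clouds."""
    return gb_per_day * days * egress_rate_per_gb + interconnect_fee_per_month


# Example: 50 GB/day of log consolidation at an assumed $0.05/GB,
# plus an assumed $100/month share of a dedicated interconnect.
print(monthly_egress_cost(50, 0.05, 100.0))  # -> 175.0
```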

Why do I have an Ace logo on my website?


For the observant, you’ll have noticed that I have a logo on the left side of my site saying Oracle Ace. Periodically I get asked what it is and what it means; those who are less involved in the Oracle community probably don’t know.

What

Most developers will probably have encountered the idea of Java Champions or perhaps Microsoft MVPs (Most Valuable Professionals). All of these badges – and other large vendors such as SAP have comparable ones – are a recognition of individuals outside of the organisation (in this case Oracle) who do a lot to support the community and wider technology ecosystem.

These contributions vary but typically take the form of activities such as writing blogs/articles/books, answering questions on Stack Overflow and other community sites where questions are raised and answered by experts, and organising and/or presenting at conferences.

This is the content that helps bridge the gap between the standard guidance, documentation and white papers that vendors produce, and real-world practical experience.

Whilst in theory you don’t have to be an expert to be part of these advocacy programmes, the reality is that to communicate meaningful value you need a level of experience and understanding beyond that of the majority. I know a number of people in the Ace community who would deny being experts, and the only thing that differentiates them from everyone else is being willing to stand up and share what they have learnt. I would say that inevitably they are experts, as the processes and resources (at least for Aces) inevitably enable that development of expertise, as I will try to illustrate shortly.

But before we progress, let me quickly summarise the advocacy communities that Oracle support …

  • Java Champions – these are people working in the pure Java ecosystem.
  • Oracle Aces – within this community there are three tiers of Ace; the tier reflects the amount of time actively involved in the Ace programme and how much you contribute. So you start out as an …
    • Ace Associate, progressing to
    • Ace, and then at the top there is a smaller community of
    • Ace Directors.

Aces generally focus on Oracle’s mainstream products, from database to middleware such as WebLogic, and Apps.

  • The final group are Groundbreaker Ambassadors – this group is comparable to Ace Directors and members actually progress through the Associate and Ace accreditations, but rather than focusing on more traditional Oracle offerings they tend to work with what could be described as modern app dev tech, from microservices and APIs to blockchain.

Why?

Why get involved in such a community? Whilst I can only speak for myself, I suspect some of my motivators hold true for others. For me it’s about …

  • There is a strong sense of community amongst the Aces, and obviously an inbuilt common interest. Given we often encounter each other at conferences, it makes attending them a lot easier socially. Stuck for a coffee conversation? Go say hello to someone you already know.
  • The value in knowledge and experience is in the sharing of that information, and you can’t beat the sense of validation when someone says – thank you, that really helped me.
  • Talking to other Aces means you may pick up useful insights. Certainly the Ace community are encouraged to develop relationships with Oracle product management (to be nominated for Ace Director / Groundbreaker Ambassador you need the sponsorship of a product manager).
  • These insights further your knowledge, which makes the day job easier. It also becomes easier to influence Oracle when it comes to having features or priorities set that are of interest to you.
  • Some employers and customers put value on the Ace recognition, as
    • there is the implicit expertise,
    • it gives indirect channels to product management,
    • there is a track record of sharing and enabling others,
  • … so it creates some extra career opportunities or a foot up. If you look at eProseed’s website you’ll see that they are very proud to employ a lot of Ace Directors.

Expertise

Coming back to the point of expertise: as you develop within the community, the chances of learning from others increase, and developing relationships with product management means getting to hear about what’s next, as well as hearing the product managers and their thinking. In fact, Ace Directors and Groundbreaker Ambassadors have dedicated briefing sessions and additional access that provide further insight into product, strategy and direction. These relationships can start to create a virtuous circle of knowledge accumulation.

Biased?

Carrying the badge of a vendor, and obviously contributing to a vendor’s community, carries the risk of being perceived as not being independent or impartial, or as not understanding the wider landscape. Having been part of the community, I can say this is deeply inaccurate. The community members I know take pride in being professional, which usually means being clearly impartial and appreciating the wider IT landscape in which they specialize. Being an Ace doesn’t mean you only know Oracle products; many Aces in the integration and development space are also certified on the Azure or AWS platforms, for example. What you won’t find is an Ace publicly calling Oracle out – but the access afforded into the organisation means that where there are concerns, challenges or issues, they are communicated through the relationships developed, and this input appears to be taken very seriously.

Benefits

When it comes to benefits, there are some, but I wouldn’t want people to think that they will ‘pay’ for the level of effort put in. The benefits are very much in the realm of acknowledgement for the contributions made. So yes, we get a few goodies – a nice polo shirt with the community logo and the like, and engraved glassware acknowledging your progression to Ace. The real reward for me is the community, having opportunities to share insights, and a bit of acknowledgement of the effort invested; everything else is a bonus.

London Oracle Developer Meetup – OIC Patterns and more


This meetup was put together quickly, as it presented an opportunity to align with other events happening in the Oracle offices. Despite the relatively short notice, we had a turnout that really made great use of our speaker, Sid Joshi, who walked through the enterprise-level patterns supported by Oracle’s Integration Cloud (OIC), including a demo showing how PaaS4SaaS works using Service Cloud and OIC, making use of the VBCS and Integration (formerly ICS) parts of the platform.

As with all the meetups, we allow the discussions to flow freely, so the conversation probed different aspects of OIC, with follow-up on several Capgemini use cases of OIC that have won the team awards.

You can see these use cases here. Sid’s presentation is available as AppIntegrationPatterns_MeetUp. Additional resources can also be obtained from https://oracle-integration.cloud.

As the conversation focused on OIC and the use cases rather than our ongoing Drones with APIs story, I have since had an interesting follow-on discussion about the application of drones. The drone story has many threads; the initial driver for the work on the drone has been about bringing something interesting and distinctive to the meetup. The drone is very tangible, and a source of amusement, which makes the meetups a lot more fun.


Managing API Gateway Costs with Oracle API Platform


The Oracle API Platform adopted an intelligent pricing model, basing costs on API call volumes and logical gateway node groupings per hour. In our book about the API Platform (more here), we suggested that a good logical grouping would reflect the development, test, preproduction and production model. This makes it nice and easy to use gateway-based routing to different environments without needing to change the API policy configuration as you promote your solution through the environments.

We have also leveraged naming and role/group-based access controls to make it easy to operate the API Platform as a shared service, rather than each team having its own complete instance. In doing so, the number of logical gateways needed is limited (i.e. no per-team logical gateway divisions are needed). Group management is very easy by leveraging Oracle’s Identity Cloud Service, which is free for managing users on Oracle solutions and also happens to be a respected product in its own right.

Most organisations are not conducting development and testing 24 hours a day, 365 days a year (yes, in an ideal world prolonged soak and load tests would be run to help tease out cumulative issues such as memory leaks, but even then it isn’t perpetual). As a result, it would be ideal not to be paying for logical gateways for part of the day, such as outside the typical development day and at weekends.

Whilst out-of-hours traffic may drop to zero calls and we may even shut down the gateway nodes, this alone doesn’t reduce the number of logical gateways, as the logical gateway aspect of the platform counts as soon as you create the logical group in the management portal. This in itself isn’t a problem, as the API Platform drinks its own champagne, as the saying goes, and everything in the UI is actually available as a published REST endpoint – something covered in the book and in previous blog posts (for example Making Scripts Work with IDCS Deployed PaaS and Analytics and Stats for APIs). Rather than providing all the code here, you can see pretty much all the calls necessary in the other utilities published.

Before defining the steps, there are a couple of things to consider. Firstly, the version of an API deployed to a specific logical gateway may not necessarily be the latest version (iteration), and when you delete the logical gateway this information is lost; so before deleting the logical gateway we should record this information to allow us to reinstate it later.

As deleting a logical gateway removes it from the system, when recreating the gateway we can use the same name, but it is not guaranteed to get the same Id as before; as a result, when rebuilding we should always discover the Id from the name, to be safe.

A logical gateway cannot be deleted until all the physical nodes have been removed from it, so we need to iterate through the nodes, deleting them. Reconnecting the nodes is a little more tricky, as reconnecting appears to be achievable only with information known to the gateway node. Therefore the simplest approach when bringing a node back online is to take the information from the gateway-props.json file and run a script that determines whether the management tier knows about the node. If not, re-run the create, start, join cycle; otherwise just run the start command.

As with the logical gateway, re-running the create, deploy, start cycle will result in the node having a new Id. This does mean that whilst the logical gateway name and even the node names remain the same, the analytics data is likely to become unavailable, so you may wish to extract the analytics data first. Then again, for development and test this data is unlikely to provide much long-term value.

So, based on this, our sequence for releasing the logical gateway needs to be …

  1. Capture the deployed APIs and the iteration numbers,
  2. Ideally shut down the gateway node processes themselves,
  3. Delete all the gateway nodes from the logical gateway,
  4. Delete the logical gateway,

Recovery would then be …

  1. Construct the logical gateway,
  2. Redeploy the APIs with the correct iteration numbers to the logical gateway using the recorded information – if no nodes are connected at this stage, the UI will provide a warning,
  3. As gateway nodes come back online, determine whether it is necessary to execute the create, start, join cycle or just the start command.

Of course, these processes can all be linked to scheduling, such as a cron job, and/or to server startup and shutdown processes.
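As a flavour of what the capture step (step 1 of the release sequence) might look like when scripted against the management tier, here is a minimal Python sketch. The endpoint paths, field names and credentials are illustrative placeholders – the real calls should be taken from the API Platform’s published REST documentation mentioned above.

```python
import json

import requests

MGMT = "https://management.example.com/apiplatform"  # placeholder management URL
AUTH = ("gateway-scheduler", "example-password")     # placeholder credentials


def capture_gateway_state(gateway_name: str, out_file: str) -> None:
    """Record which API iterations are deployed to a logical gateway so
    the gateway can be recreated later with the same deployments."""
    # Always look the logical gateway up by name, since Ids change on recreation.
    gateways = requests.get(f"{MGMT}/gateways", auth=AUTH).json()["items"]
    gateway_id = next(g["id"] for g in gateways if g["name"] == gateway_name)

    # Capture each deployed API and its iteration number.
    deployments = requests.get(
        f"{MGMT}/gateways/{gateway_id}/deployments", auth=AUTH).json()["items"]
    state = [{"apiId": d["apiId"], "iteration": d["apiIteration"]}
             for d in deployments]

    with open(out_file, "w") as f:
        json.dump({"gateway": gateway_name, "deployments": state}, f, indent=2)
```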

Mastering Distributed Tracing – book review

So recently we have been working on ‘knowing what I don’t know’ when it comes to OpenTracing and how such tech may intersect with traditional logging and the use of FluentD.

As part of that, I have read the Packt book Mastering Distributed Tracing, written by Yuri Shkuro, who has been key to the OpenTracing API and Jaeger and is the technical lead for Uber’s tracing team.

Whilst I have a good relationship with Packt, the fact they published the book is pretty much coincidental.

Understanding tracing, as opposed to traditional logging, is very important when moving into the world of microservices and reactive frameworks such as Node.js, where threads are picked up and put down and you don’t know where or when the next service in a solution will pick up the next related activity. Add to this that solutions are more polyglot than ever – not only in the sense of the different languages that may be used, but in the more diverse sources of middleware features; historically you’d probably use JMS-based messaging if you were a Java developer and MSMQ for .NET, whereas now you may be using AWS SNS as easily as Kafka. This means the mechanisms for passing and tracing events through these services need to be more unifying than ever.

Complexity of Observability
