Tags
AI, artificial-intelligence, Cloud, Data Drift, development, Fluent Bit, GenAI, Machine Learning, ML, observability, Security, Technology, TensorFlow Lite, TensorFlow
These days, everywhere you look there are references to Generative AI, to the point that you might wonder: what have Fluent Bit and GenAI got to do with each other? GenAI has the potential to help with observability, but it also needs observing itself, to measure its performance, check whether it is being abused, and so on. You may recall that a few years back Microsoft was trialing new AI features for Bing, and after only a couple of days of public use it had been recorded generating abusive comments (Microsoft’s earlier Tay chatbot is another such example).
But this isn’t the aspect of GenAI (or of its Machine Learning (ML) foundations) I was thinking about. Fluent Bit can be linked to GenAI through its TensorFlow plugin. Is this genuinely of value, or just a bit of ‘me too’?
There are plenty of backend use cases once the telemetry has been incorporated into an analytics platform, for example:
- Making it easy to query and mine the observability data, for example through natural language searching, simplifying how we express what we are looking for.
- Outlier/anomaly detection – when signals, particularly metrics, diverge from their normal patterns of behavior, we have the first signs of a problem. This is more Machine Learning than Generative AI.
- Using AI agents to tune monitoring thresholds and alerting scenarios.
But these are all backend, big-data style use cases, and they do not center on Fluent Bit’s core value: getting data from its sources to the appropriate destination systems for such analysis or visualization.
To incorporate AI into Fluent Bit pipelines, we need to overcome a key issue: AI tends to be computationally heavy, making it potentially too slow for the streams of signals our applications generate, and too expensive given that most logs reflecting ‘business as usual’ are, in effect, low value.
There are some genuine use cases where lightweight AI can deliver value. First, we should be a little more precise: the TensorFlow plugin actually uses TensorFlow Lite, also known as LiteRT. The name reflects the fact that it is a lightweight solution intended to be deployable on small devices (by AI standards), which fits the Fluent Bit model of having a small footprint.
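To make the ‘lightweight’ point concrete, here is a minimal sketch of invoking a TensorFlow Lite (LiteRT) model from Python. The model file (log_classifier.tflite) and the assumption that it takes a fixed-length float32 feature vector and returns class probabilities are hypothetical, not part of Fluent Bit or its plugin:

```python
import numpy as np
import tensorflow as tf  # tflite_runtime.interpreter is an even smaller alternative

# Hypothetical pre-trained TensorFlow Lite model (assumption: it accepts a
# fixed-length float32 feature vector and returns class probabilities).
interpreter = tf.lite.Interpreter(model_path="log_classifier.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def classify(features: np.ndarray) -> np.ndarray:
    """Run a single inference pass and return the class probabilities."""
    interpreter.set_tensor(
        input_details[0]["index"],
        features.astype(np.float32).reshape(input_details[0]["shape"]),
    )
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])[0]
```

The interpreter plus a quantized model typically weighs in at a few megabytes, which is what makes running inference alongside a telemetry pipeline plausible at all.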
So, where could such a use case fit?
- Translating stack traces into actionable information can be challenging. A trained ML or AI model can help classify and characterize the cause of a stack trace, so we can move from the log entry to triggering appropriate actions (a sketch of this follows the list).
- Targeted use cases where we’ve filtered out most of the signal data and want to analyze specific events – for example, preventing the propagation of PII data downstream. Some PII can be easily isolated through regex patterns: credit card numbers typically appear as four groups of four digits, and phone numbers and email addresses can also be identified fairly easily. However, postal addresses aren’t easy, particularly when handling multinational addresses, where the postal code/zip code can’t be used as an indicative pattern. Using AI to help with such checks means we must first filter the signals so that only messages that could plausibly carry such information are examined (a sketch of such a pre-filter also follows the list).
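For the first bullet, a rough sketch of what could sit in front of the classify() function above might look like the following; the feature dimension, the hashing-trick featurization, and the label set are all illustrative assumptions rather than anything prescribed by Fluent Bit or TensorFlow Lite:

```python
import numpy as np

FEATURE_DIM = 256  # must match whatever the hypothetical model was trained with
LABELS = ["null-pointer", "timeout", "resource-exhaustion", "unknown"]  # illustrative only

def stack_trace_features(trace: str) -> np.ndarray:
    """Hash the trace's tokens into a fixed-length bag-of-words style vector."""
    vec = np.zeros(FEATURE_DIM, dtype=np.float32)
    for token in trace.split():
        vec[hash(token) % FEATURE_DIM] += 1.0
    return vec

def classify_trace(trace: str, classify) -> str:
    """`classify` is the TensorFlow Lite-backed function from the earlier sketch."""
    probabilities = classify(stack_trace_features(trace))
    return LABELS[int(np.argmax(probabilities))]
```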
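For the second bullet, the filtering logic could be as simple as the sketch below: regular expressions catch the easy patterns (card numbers, emails, phone numbers), and only messages with address-like hints are passed to a model. The looks_like_address callable is a hypothetical stand-in for a TensorFlow Lite classifier, and the regexes are simplified illustrations, not production-grade PII detection:

```python
import re

# Cheap regex checks for the "easy" PII patterns discussed above (simplified).
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")     # four groups of four digits
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def contains_pii(message: str, looks_like_address) -> bool:
    """Regexes handle the easy cases; the model only sees the hard (address) cases."""
    if CARD_RE.search(message) or EMAIL_RE.search(message) or PHONE_RE.search(message):
        return True
    # Only messages with address-like hints (a number followed by words) are
    # worth the cost of a model invocation.
    if re.search(r"\d+\s+[A-Za-z]+", message):
        return looks_like_address(message)  # hypothetical TFLite-backed classifier
    return False
```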
When adopting AI in such scenarios, we have to be aware of the problems that can impact the use of ML and AI. These issues are less high profile than hallucinations but just as important. We are observing software, and that software will change over time; as a result, payloads shift (technically referred to as data drift) and the detection rate can drop. So we need to measure the efficacy of the model, and take data drift into account when doing so, as the scenario being detected may also change in volume, reflecting changes in how the software is used and/or how the solution works.

There are ways to help address such considerations, such as tracking false-positive outcomes and, if the model can provide confidence scoring, watching for a trend in the scores.
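As a minimal illustration of the confidence-score idea (assuming the model exposes a per-event confidence value, which is not a given), a monitor could compare a frozen baseline of early scores against a rolling window of recent ones and flag when the gap grows:

```python
from collections import deque

class ConfidenceDriftMonitor:
    """Flag possible data drift when recent model confidence falls
    noticeably below the baseline established during early operation."""

    def __init__(self, baseline_size=1000, recent_size=100, drop_threshold=0.15):
        self.baseline = deque(maxlen=baseline_size)
        self.recent = deque(maxlen=recent_size)
        self.drop_threshold = drop_threshold

    def record(self, confidence: float) -> bool:
        """Record one score; return True when drift should be investigated."""
        if len(self.baseline) < self.baseline.maxlen:
            self.baseline.append(confidence)  # still establishing the baseline
            return False
        self.recent.append(confidence)
        if len(self.recent) < self.recent.maxlen:
            return False
        baseline_avg = sum(self.baseline) / len(self.baseline)
        recent_avg = sum(self.recent) / len(self.recent)
        return (baseline_avg - recent_avg) > self.drop_threshold
```

The same principle applies to false-positive tracking: establish a baseline, then alert on sustained divergence from it.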
Conclusion
There are good use cases for using Machine Learning (and, to an extent, Artificial Intelligence) within an observability pipeline – but we have to be selective in its application, because:
- The cost of the computation can outweigh the benefits.
- The execution time for such computation can be notably slower than the rest of the pipeline, creating a risk of back pressure if it is applied to every event.
- Effectiveness can degrade as data drift occurs (we might initially see very good results, only for them to fall off).
Possibly the most useful application is when the AI/ML engine has been trained to recognize patterns of events that precede a serious operational issue (strictly speaking, this is a use of ML).
Forward-looking
The true potential of GenAI comes when we move beyond isolating potential faults through pattern recognition to using AI to recommend, or even trigger, remediation processes.



