Observing the Observer (Fluent Bit monitoring)

25 Thursday Apr 2024

Posted by mp3monster in Fluentbit, General, Technology

Tags

development, FluentBit, Grafana, Jaeger, logs, metrics, OpenSearch, Prometheus, Traces

In the Fluent Bit book I touch upon the point that we should be observing the observer. After all, if we don’t monitor our observability stack, then we’ll be operating blind and may never know until things go catastrophically wrong, and we’re getting complaints that production business solutions are down. One of the peer review comments was it would be really good to have a visual representation in the book. While I’d love to incorporate such diagrams, for them to be readable, they do use up a lot of space on the printed page, and very long chapters can also put some readers off. So, given the point wasn’t a key theme, we simply couldn’t incorporate the diagram.

But the suggestion is a good one. So we’ve created the visual representation here.

Annotated diagram showing how Fluent Bit could be used to monitor an open-source observation stack.

The diagram shows how we can monitor our observation tech stack so we don’t become ‘blind’ without knowing about it. The Outflow from the right of the diagram would send a signal to a notification service, which could be as simple as email or as user-friendly as Slack. We’ve indicated which kinds of metrics, logs, and traces could offer the most value from our observation tech stack.

If we’re running everything within a Kubernetes cluster, it would be easy to say we don’t need such a sophisticated setup as we can use Kubernetes liveness probes if the containers are well configured. While it is true if one of our services starts to fail, a liveness check should pick it up and recycle the container. But such probes only worry about the HTTP response code, not the cause. If we don’t monitor and capture more information we’ll never understand the problem. At worst we could end up seeing Kubernetes starting and then killing our containers in a vicious cycle and struggling to resolve the cause. So collecting the logs and metrics remains just as important.

How to Publish Fluent Bit Metrics and Logs

To publish Fluent Bit’s metrics to Prometheus, we need to configure the fluentbit-metrics input plugin (it does sound odd as an input, but there are reasons that become clearer in the book). We then route the output that supports using Fluent Bit as a Prometheus node exporter or makes use of the remote write API.

The log output for Fluent Bit can be configured via the command line or in the SERVICE blog (using the attributes log_file and log_level in the configuration file. Today this is setting the log threshold and identifying the log file. We can then, of course, configure a tail input plugin against the file if we want to send the logs to OpenSearch. We can also set plugin-specific logging thresholds as overrides to the Fluent Bit wide setting in the SERVICE block of configuration.

Configuring the other monitoring tools

Grafana‘s configuration will allow it to publish Prometheus scrapable metrics and Traces that are OTLP compliant can be found documented here.
Prometheus provides metrics on itself (details here) and logging controls as part of its command line and generates logfmt or JSON logs, details here.
OpenSearch‘s logs can be accessed as documented here. The Logs are created with Log4j2, which means out of the box, it will be easy to parse them. Configuring the output of slow query reports does need to be switched on. OpenSearch also illustrates a pre OpenTelemetry/OpenMetrics approach to sharing internal metrics by writing them as logs. However, there are ways to convert such log events to OTLP Metrics with Fluent Bit.
Jaeger provides metrics endpoints that are Prometheus-compatible, along with JSON-based logs, and are documented here. There is some support for tracing.

Fluent Bit the engine to power ChatOps – update

17 Sunday Mar 2024

Posted by mp3monster in chatbots, Fluentbit, General

≈ 1 Comment

Tags

chatops, Cloud, conference, demo, development, FluentBit, video

The other month, I described a presentation and demo (Fluent Bit – Powering Chat Ops) we’ll be doing for the Cloud Native Rejekts conference, which is the precursor event to KubeCon in Paris this week. Since that post, we’re excited to say that, with Patrick Stephens’s contributions from Chronosphere, the demo is now in the Fluent GitHub repo. It has been nicely packaged with a Docker Compose, so everything runs in a couple of containers.

In addition, if you want to see the presentation and hear us discuss the solution and explain how it works, we recorded part of the presentation dry run, which can be heard here (Demo) and here (Code overview).

I couldn’t be in Paris in person, so Patrick took the job of presenting in Paris, we tried to enable my remote participation but had audio issues. Hopefully, you’ll see the recording of Pat’s physical presentation here. But I did manage to collaborate in the demo:

This means that the original repo I mentioned can be viewed as a beta or upstream version (it’s cluttered with some generated code from Helidon, which we will eventually get around to exploiting and making the utility a native binary executable).

Fluent Bit with Kubernetes book update

05 Tuesday Mar 2024

Posted by mp3monster in Books, Fluentbit, manning, Technology

≈ Leave a comment

Tags

book, development, FluentBit, review

A quick update on the book – very early this morning or late last night (depending on your perspective), we sent our development editor the final chapter of the Fluent Bit with Kubernetes book. There is still a way to go before we’re completed (with multiple reviews to happen, appropriate edits to be made, copy editing, etc. Still, it is an important milestone from an author’s perspective.

For the keen readers who have signed up for the MEAP (Manning Early Access Programme) of the book, I can confirm that the editorial team (preparation for eBook and website formatting, checking the edits to address the Technical Editor and Development Editor haven’t introduced any obvious issues) are working on the preparation of Chapter 7 – so that should be available soon. When this chapter is available, the content covering all the foundational aspects of Fluent Bit will be available. The remaining chapters reflect the advanced features.

Cloud Observability in Action – Book Review

04 Thursday Jan 2024

Posted by mp3monster in Book Reviews, Books, General, manning

≈ Leave a comment

Tags

book, development, FluentBit, Fluentd, manning, Michael Hausenblas, o11y, observability, OpenTelemetry, Prometheus, review

With the Christmas holidays happening, things slowed down enough to sit and catch up on some reading – which included reading Cloud Observability in Action by Michael Hausenblas from Manning. You could ask – why would I read a book about a domain you’ve written about (Logging In Action with Fluentd) and have an active book in development (Fluent Bit with Kubernetes)? The truth is, it’s good to see what others are saying on the subject, not to mention it is worth confirming I’m not overlapping/duplicating content. So what did I find?

Cloud Observability in Action by Michael Hausenblas

Cloud Observability In Action has been an easygoing and enjoyable read. Tech books can sometimes get a bit heavy going or dry, not the case here. Firstly, Michael went back to first principles, making the difference between Observability and monitoring – something that often gets muddied (and I’ve been guilty of this, as the latter is a subset of the former). Observability doesn’t roll off the tongue as smoothly as monitoring (although I rather like the trend of using O11y). This distinction, while helpful, particularly if you’re still finding your feet in this space, is good. What is more important is stepping back and asking what should we be observing and why we need to observe it. Plus, one of my pet points when presenting on the subject – we all have different observability needs – as a developer, an ops person, security, or auditors.

Next is Michael’s interesting take on how much O11y code is enough. Historically, I’ve taken the perspective – that enough is a factor of code complexity. More complex code – warrants more O11y or logging as this is where bugs are most likely to manifest themselves; secondly, I’ve looked at transaction and service boundaries. The problem is this approach can sometimes generate chatty code. I’ve certainly had to deal with chatty apps, and had to filter out the wheat from the chaff. So Michael’s approach of cost/benefit and measuring this using his B2I ratio (how much code is addressing the business problems over how much is instrumentation) was a really fresh perspective and presented in a very practical manner, with warnings about using such a measure too rigidly. It’s a really good perspective as well if you’re working on hyperscaling solutions where a couple of percentage point improvements can save tens of thousands of dollars. Pretty good going, and we’re only a couple of chapters into the book.

The book gets into the underlying ideas and concepts that inform OpenTelemetry, such as traces and spans, metrics, and how these relate to Observability. Some of the classic mistakes are called out, such as dimensioning metrics with high cardinality and why this will present real headaches for you.

As the data is understood, particularly metrics you can start to think about how to identify what normal is, what is abnormal, or an outlier. That then leads to developing Service Level Objectives (SLOs), such as an acceptable level of latency in the solution or how many errors can be tolerated.

The book isn’t all theory. The ideas are illustrated with small Go applications, which are instrumented, and the generated metrics, traces, and logs. Rather than using a technology such as Fluentd or Fluent Bit, Michael starts by keeping things simple and directly connecting the gathering of the metrics into tools such as Prometheus, Zipkin, Jaeger, and so on. In later chapters, the complexity of agents, aggregators, and collectors is addressed. Then, the choices and considerations for different backend solutions from cloud vendor-provided services such as OpenSearch, ElasticSearch, Splunk, Instana and so on. Then, the front-end visualization of the data is explored with tools such as Grafana, Kibana, cloud-provided tools, and so on.

As the book progresses, the chapters drill down into more detail, such as the differences and approaches for measuring containerized solutions vs. serverless implementations such as Lambda and the kinds of measures you may want. The book isn’t tied to technologies typically associated with modern Cloud Native solutions, but more traditional things like relational databases are taken into account.

The closing chapters address questions such as how to address alerting, incident management, and implementing SLOs. How to use these techniques and tools can help inform the development processes, not just production.

So I would recommend the book, if you’re trying to understand Observability (regardless of a cloud solution or not). If you’re trying to advance from the more traditional logging to a fuller capability, then this book is a great guide, showing what, why, and how to evaluate the value of doing so.

To come back to my opening question. The books have small points of overlap, but this is no bad thing, as it helps show how the different viewpoints intersect. I would actually say that the Observability in Action shows how the wider landscape fits together, the underlying value propositions that can help make the case for implementing a full observability solution. Then, Logging in Action and the new book, Fluent Bit with Kubernetes, give you some of the common context, and we drill into the details of how and what can be done with Fluent Bit and Fluentd. All Manning needs now is content to deep dive into Prometheus, Grafana, Jaeger, and OpenSearch to provide an end-to-end coverage of first principles to the art of the possible in Observability.

I also have to thank Michael for pointing his readers and sections of Logging in Action that directly relate and provide further depth into an area.

Speeding Ruby

30 Monday Oct 2023

Posted by mp3monster in development, Fluentbit, Fluentd, General, languages, Technology

≈ Leave a comment

Tags

Cloud, development, FluentBit, Fluentd, Ruby, Ruvy, Shopify

Development trends have shown a shift towards precompiled languages like Go and Rust away from interpreted or Just-In-Time (JIT) compiled languages like Java and Ruby as it removes the startup time of the language virtual machine and the JIT compiler as well as a smaller memory footprint. All desirable features when you’re scaling containerized solutions and percentage point savings can really add up.

Oracle has been leading the way with its work on GraalVM for some years now, and as a result, not only can GraalVM be used to produce native binary images from Java code, GraalVM also supports TuffleRuby and GraalPy, among others. As TruffleRuby is an open-source project, Oracle isn’t the only vendor contributing to it, work effort has also come from Shopify.

Helping Ruby move forward isn’t new for the Shopify engineering team, and part of that investment is that they have just announced the open-sourcing of a toolchain called Ruvy. Ruvy takes Ruby and creates a WebAssembly (WASM) from it the code. This builds on the existing project ruby.wasm. In doing so they’ve addressed the Ruby startup overhead of the language VM we mentioned. They have also simplified the process of deployment, eliminating the need for Web Assembly System Interface (WASI) arguments, and overcome constraints of class loading by reading files by having the code bundled within the assembly and then accessing the content using WASI-VFS, a simple virtual file system.

The published benchmarks show a massive performance boost in the process of executing where the Ruby code needs to be executed by the packaged JIT. For me, this is interesting as one of the related cloud-native trends is the shift from Fluentd to Fluent Bit. Fluentd was built with Ruby and has a huge portfolio of third-party extensions. But Fluent Bit is built using C to get those performance gains previously described. But it does support plugins through WASM. This raises an interesting question can we take existing Ruby plugins and wrap them so the required interfacing works – which should be minimal and more likely to be impacted by the fact Fluent Bit v2 has refined the internal data structure that was common to both Fluentd and Fluent Bit to allow Fluent Bit to more easily engaged with OpenTelemetry.

If the extra bit of wrapping code isn’t complex, then applying Ruvy should mean the core plugin can then work with Fluent Bit. If this can be templated, then Fluent Bit is going to make a big leap forward with the number of available plugins.

Clickbait headlines on open-source project maintenance

29 Sunday Oct 2023

Posted by mp3monster in development, General, Technology

≈ Leave a comment

Tags

development, open source, Security

Infoworld published a rather clickbait incendiary new item the other week ‘few open source projects actively maintained’. Personally, I find these statements a little frustrating, as it would be easy for the less informed to assume that adopting open-source software is dangerous. There are several missed points here:

How well and frequently are close source solutions being maintained, and does that stop businesses from using end-of-life products? There is big business to be had in offering support to end-of-life solutions. Just look at companies like Rimini Street. Such organizations aren’t going to change software unless there is a major issue.
Not all open-source software is intended to be undergoing continuous maintenance? Shocking until you consider that open-source projects will remain open and available even when they have been declared end-of-life. Why? One of the things about open-source is you don’t know who is using the code, and suddenly pulling the code because the originator has decided they can no longer maintain their investment could put others in a difficult position. So, the right thing is to leave the source available and allow people to fork it so they can continue maintaining their own version of it or until they’ve migrated away. That way, the originator is not impacted by changes.
Next up, not all open-source projects need continued maintenance; many repositories exist to provide demo and sample solutions – so that developers can see how to use a product or service. These repositories shouldn’t need to change often. Frequent change could easily be a sign of an unstable product or service. These solutions may not be the most secure, as you don’t want to complicate the illustration with all the checks and balances that should be considered. Look at it this way: when we start learning a new language or tool, we start with the classic Hello World – which today means pointing your browser at a URL and seeing the words appear on the page. Do we insist that the initial implementation be secure? No, because it distracts from the basic message. For example, in my GitHub repository, I have multiple public repositories with Apache2 licenses attached to them – i.e., open-source. A number of them support the books I’ve written – they aren’t going to change – in fact, change would be a bad thing unless the associated book is corrected (this repo, for example).
When it comes to security vulnerabilities. This needs to be viewed with some intelligence. For several reasons:
- As mentioned, our demo examples are unlikely to be patched with the latest versions of dependencies all the time. The point is to see how the code works. Unless the demo relates directly to something that has to be patched and that changes the demo itself. I don’t think it is unreasonable to expect developers to apply some intelligence to ensure dependencies (and therefore the risk of known vulnerabilities) are checked rather than blindly cutting and pasting. The majority of the time, such content will be published with a minimum version number, not a maximum.
- Sometimes, a security vulnerability isn’t an issue. For example, I rarely run vulnerability checks on my LogSimulator. Not because I have a cavalier attitude to security but because I don’t expect it to ever be near a production environment, and the data flowing through the tool will be known and controlled by the user in advance of any activity. Secondly, it shouldn’t be using sensitive data, and thirdly, if there was any malicious intent intended, then I’d be more concerned about how secure its data source and configuration is. The tool is a command-line solution. That said, I still apply development practices that minimize potential exploitation.

Don’t get me wrong, there are risks with all software – closed and open-source, whether it is maintained or has security vulnerabilities. A software development team has a responsibility to make informed, risk-aware selections of software (open or closed source). If you have the means to check for risks, then they are best used. It is worth not only scanning our own code but also considering whether the dependencies we use have been scanned if appropriate (e.g. used in production). Utilizing innovations like SBOM, and exercising routine checks and reviews can also help.

While I can’t prove it, I suspect there are more risks being carried by organizations adopting a library that was considered sufficiently secure when downloaded, but subsequent vulnerabilities have been found, or selected mitigations to risks have been eroded over time.

Visualizing A Career Path

02 Wednesday Aug 2023

Posted by mp3monster in General

≈ 2 Comments

Tags

career, development, download, DZone, external publications

I recently wrote a piece for DZone about visualizing career paths. As an enabler for people to make use of the diagrams to help the visualization, we’ve made the original PowerPoint diagrams used available here:

SimpleOrg Download

Update

We’re excited to hear we’ve had another DZone article selected to be used on the homepage …

Simplifying the escaping of JSON strings

08 Thursday Jun 2023

Posted by mp3monster in development, General, Technology

≈ Leave a comment

Tags

CLI, development, JQ, JSON, software, Technology, utility

when you’re testing apps, it is pretty common to want to send JSON via CURL to a local endpoint. The problem is that this usually means that the string you provide curl needs to have characters escaped, such as quote marks. By hand, this can be irritating to sort out, particularly if you’re using an IDE to make sure the JSON is correct. I’d concluded this is hardly a new problem; someone must have produced a nice little multiple-platform command line utility that can do it for you. The result was a bit more surprising.

There are plenty of online utils that solve it, but if you’re working with data, you don’t want to publicly share (or the fiddling around with copy-pasting to your browser). Nothing wrong with these tools, but you can’t script them without resorting to RPA (Robotic Process Automation) either. Here are a couple of services I found that are straightforward, and when I’ve tried them, not plagued by annoying ads.

But finding command line tools, well, finding an answer, has proven a bit more challenging. For removing escaped characters, you could use jq, but we actually want to go the other way to use curl with JSON that has been escaped. I have come across conversations covering the use of bash (making use of awk and sed. Plus, details about how the manipulation could be done in various languages (so you could code your own solution if so inclined. Coding is unlikely to take much effort, but testing permutations is going to demand effort).

The one solution I have found that meant I could escape (or reverse) JSON locally is a plugin for VS Code called appropriately JSON-escaper, which does what is needed in a nice and clean manner. All credit to Joshua Poehls for the tool.

The solution JSON-escaper built on top of a more generic JavaScript utility which addresses escaping special characters which can be found here.

IAM and IDCS do more than support AuthZ

01 Monday May 2023

Posted by mp3monster in development, General, Oracle, Technology

≈ Leave a comment

Tags

data, development, OCI, Oracle, SCIM, Security, software

We could solve this with custom integrations, or we can exploit an IETF standard called SCIM (System for Cross-domain Identity Management). The beauty of SCIM is that it brings a level of standardization to the mechanics of sharing personal identity information, addressing the fact that this data goes through a life cycle.

While Oracle’s IDCS and IAM support identity management for authentication and authorization for OCI and SaaS such as HCM, SCM, and so on. Most software ecosystems need more than that. If you have personalized custom applications or COTS or non-Oracle SaaS that need more than just authentication and need some of your people’s data needs to be replicated.

The lifecycle would include:

Creation of users.
Users move in and out of groups as their roles and responsibilities change.
User details change, reflecting life events such as changing names.
Users leave as they’re no longer employees, deleted their account for the service, or exercise their right to be forgotten.

It means any SCIM-compliant application can be connected to IDCS or IAM, and they’ll receive the relevant changes. Not only does it standardize the process of integrating it helps handle compliance needs such as ensuring data is correct in other applications, that data is not retained any longer than is needed (removal in IDCS can trigger the removal elsewhere through the SCIM interface). In effect we have the opportunity to achieve master data management around PII.

SCIM works through the use of standardized RESTful APIs. The payloads have a standardized set of definitions which allows for customized extension as well. The customization is a lot like how LDAP can accommodate additional data.

The value of SCIM is such that there are independent service providers who support and aid the configuration and management of SCIM to enable other applications.

Securing such data flows

As this is flowing data that is by its nature very sensitive, we need to maximize security. Risks that we should consider:

Malicious intent that results in the introduction of a fake SCIM client to egress data
Use of the SCIM interface to ingress the poisoning of data (use of SCIM means that poisoned data could then propagate to all the identity-connected systems).
Identity hijacking – manipulating an identity to gain further access.

There are several things that can be done to help secure the SCIM interfaces. This can include the use of an API Gateway to validate details such as the identity of the client and where the request originated from. We can look at the payload and validate it against the SCIM schema using an OCI Function.

We can block the use of operations by preventing the use of certain HTTP verbs and/or URLs for particular or all origins.

Is The 12 Factor App right about Logging?

05 Wednesday Oct 2022

Posted by mp3monster in development, Fluentd, General, Technology

≈ Leave a comment

Tags

12 Factor, 12 Factor App, conference, development, Grafana, JAX, logging, London, OpenSearch, Prometheus, Splunk, stdout

The 12 Factor App definition is now ten years old. In the world of software that is a long time. So perhaps it’s time to revisit and review what it says. As I have spent a lot of time around Logging – I’ve focussed on Factor 11 – Logging.

I have been fortunate enough to present at the hybrid JAX London conference on this subject. It was great to get out and see people at a conference rather than just with a screen and a chat console of online-only events.

You can see my presentation here:

Continue reading →