
Phil (aka MP3Monster)'s Blog

~ from Technology to Music

Category Archives: Fluentbit

Fluent Bit and AI: Unlocking Machine Learning Potential

30 Monday Dec 2024

Posted by mp3monster in Fluentbit, General, Technology


Tags

AI, artificial-intelligence, Cloud, Data Drift, development, Fluent Bit, GenAI, Machine Learning, ML, observability, Security, Technology, Tensor Lite, TensorFlow

These days, everywhere you look there are references to Generative AI, so much so that you might ask what Fluent Bit and GenAI have to do with each other. GenAI has the potential to help with observability, but it also needs observing itself, to measure its performance, detect whether it is being abused, and so on. You may recall that a few years back Microsoft was trialing new AI features for Bing, and after only a couple of days in use it had been recorded generating abusive comments (Microsoft's Tay is an earlier example of the same problem).

But this isn’t the aspect of GenAI (or the foundations of AI with Machine Learning (ML)) I was thinking about. Fluent Bit can be linked to GenAI through its TensorFlow plugin. Is this genuinely of value or just a bit of ‘me too’?

There are plenty of backend use cases once the telemetry has been incorporated into an analytics platform, for example:

  • Making it easy to query and mine the observability data, such as natural-language searching, to simplify expressing what is being looked for.
  • Outlier/anomaly detection – when signals, particularly metrics, diverge from the normal patterns of behavior, we have the first signs of a problem. This is more Machine Learning than Generative AI.
  • Using AI agents to tune monitoring thresholds and alerting scenarios.

But these are all backend, big data style use cases and do not center on Fluent Bit’s core value of getting data sources to appropriate destination systems for such analysis or visualization.

To incorporate AI into Fluent Bit pipelines, we need to overcome a key issue – AI tends to be computationally heavy – making it potentially too slow for streams of signals being generated by our applications and too expensive given that most logs reflecting ‘business as usual’ are, in effect, low value.

There are some genuine use cases where lightweight AI can deliver value. First, we should be a little more precise: the TensorFlow plugin uses the TensorFlow Lite runtime, also known as LiteRT. The name comes from the fact that it is a lightweight solution intended to be deployable on small devices (by AI standards). This fits the Fluent Bit model of having a small footprint.

So, where can we apply such use cases?

  • Translating stack traces into actionable information can be challenging. A trained ML or AI model can help classify and characterize the cause of a stack trace, so we can move from the log to triggering appropriate actions (see the configuration sketch after this list).
  • Targeted use cases where we've filtered out most signal data to help analyze specific events – for example, preventing the propagation of PII data downstream. Some PII data can easily be isolated through patterns using regex: credit card numbers follow a pattern of four groups of four digits, and phone numbers and email addresses can also be readily identified. However, postal addresses aren't easy, particularly when handling multinational addresses, where the postal code/zip code can't be used as an indicative pattern. Using AI to help with such checks means we must first filter the signals so that only messages that could accidentally carry such information are examined.
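To make the stack trace scenario concrete, here is a minimal sketch of how a LiteRT model could be wired into a pipeline with the TensorFlow filter. The model file, field names, and the grep pre-filter are illustrative assumptions rather than a tested configuration; check the TensorFlow filter documentation for the attributes supported by your build.

[INPUT]
    name tail
    path /var/log/app/*.log
    tag app.logs

# only pass events that look like stack traces to the (relatively costly) model
[FILTER]
    name grep
    match app.logs
    regex log Exception

# hypothetical classifier trained to characterize stack traces
[FILTER]
    name tensorflow
    match app.logs
    input_field log
    model_file /models/stacktrace_classifier.tflite
    include_input_fields true
    normalization_value 255

[OUTPUT]
    name stdout
    match app.logs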

When adopting AI in such scenarios, we have to be aware of the problems that can impact the use of ML and AI. These issues are less high profile than hallucinations, but just as important. The software we're observing will change over time; as a result, payloads shift (technically referred to as data drift) and the detection rate can drop. So we need to measure the efficacy of the model, taking data drift into account, as the scenario being detected may change in volume, reflecting changes in software usage and/or changes in how the solution works.

There are ways to help address such considerations, such as tracking false-positive outcomes and, if the model can provide confidence scoring, watching for trends in the score.

Conclusion

There are good use cases for using Machine Learning (and, to an extent, Artificial Intelligence) within an observability pipeline – but we have to be selective in its application, as:

  • The cost of the computation can outweigh the benefits.
  • The execution time for such computation can be notably slower than the rest of the pipeline, leading to risks of back pressure if applied to every event.
  • Effectiveness can degrade as data drift occurs (we might initially see very good results, but then things fall off).

Possibly the most useful application is when the AI/ML engine has been trained to recognize patterns of events that precede a serious operational issue (strictly, this is a use of ML).

Forward-looking

The true potential for GenAI is when we move beyond isolating potential faults based on pattern recognition to using AI to help recommend, or even trigger, remediation processes.


Fluent Bit 3.2: YAML Configuration Support Explained

23 Monday Dec 2024

Posted by mp3monster in Fluentbit, General, Technology


Tags

book, Cloud, config, configuration, development, Fluent Bit, parsers, streams, stream_task, YAML

Among the exciting announcements for Fluent Bit 3.2 is that support for YAML configuration is now complete. Until now, there had been some outliers, such as parser and stream processor configurations, which hadn't been made YAML compliant.

As a result, the definitions for parsers and streams had to remain in separate files. That is no longer the case, and it is possible to incorporate parser definitions within the same configuration file. While separate configuration files for parsers make for easier reuse, they are more troublesome when incorporating the configuration into a Kubernetes deployment, particularly when using a sidecar deployment.

Parsers

With this advancement, we can define parsers like this:

Classic Fluent Bit

[PARSER]
    name myNginxOctet1
    format regex
    regex (?<octet1>\d{1,3})

YAML Configuration

parsers:
  - name: myNginxOctet1
    format: regex
    regex: '/(?<octet1>\d{1,3})/'

As the examples show, we swap [PARSER] for a parsers object, under which each parser is an array entry of attributes starting with the parser name. The attribute names follow a one-to-one mapping in most cases. This does break down for parsers where we can define a series of values, which in the classic format would simply be read in order.

Multiline Parsers

When using multiline parsers, we must provide different regular expressions for different lines. In this situation, each set of attributes becomes a list entry, as we can see here:

Classic Fluent Bit

[MULTILINE_PARSER]
  name multiline_Demo
  type regex
  key_content log
  flush_timeout 1000
  #
  # rule|<state name>|<regex>|<next state>
  rule "start_state" "^[{].*" "cont"
  rule "cont" "^[-].*" "cont"

YAML Configuration

multiline_parsers:
  - name: multiline_Demo
    type: regex
    rules:
    - state: start_state
      regex: '^[{].*'
      next_state: cont
    - state: cont
      regex: "^[-].*"
      next_state: cont

In addition to how the rules are nested, we have moved from several parameters within a single attribute (rule) to each rule having several discrete elements (state, regex, next_state). Note also the mixed use of single and double quote marks in the example – both are valid YAML.

If you want to keep the configurations for parsers and streams separate, you can continue to do so, referencing the file and name from the main configuration file. While converting the existing .conf content to YAML is the bulk of the work, in all likelihood you'll also change the file extension to .yaml, which means you must update the parsers_file reference in the service section of the main configuration file.

Streams

Streams follow very much the same path as parsers. However, we do have to be much more aware of the query syntax to remain within the YAML syntax rules.

Classic Fluent Bit

[STREAM_TASK]
  name selectTaskWithTag
  exec SELECT record_tag(), rand_value FROM STREAM:random.0;

[STREAM_TASK]
  name selectSumTask
  exec SELECT now(), sum(rand_value)   FROM STREAM:random.0;

[STREAM_TASK]
  name selectWhereTask
  exec SELECT unix_timestamp(), count(rand_value) FROM STREAM:random.0 where rand_value > 0;

YAML Configuration

stream_processor:
  - name: selectTaskWithTag
    exec: "SELECT record_tag(), rand_value FROM STREAM:random.0;"
  - name: selectSumTask
    exec: "SELECT now(), sum(rand_value) FROM STREAM:random.0;"
  - name: selectWhereTask
    exec: "SELECT unix_timestamp(), count(rand_value) FROM STREAM:random.0 where rand_value > 0;"

Note that it is pretty common for Fluent Bit YAML to use the plural form for each of the main blocks, although the stream definition is an exception. Additionally, both stream_processor and stream_task are accepted as the block name (although stream_task is not recognized in the main configuration file).

Incorporating Configuration directly into the core configuration file

To support directly incorporating these definitions into a single file, we can lift the YAML file contents and apply them as root elements (i.e., at the same level as pipeline and service).
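For example, a minimal self-contained file combining the parser shown earlier with a simple pipeline might look like the following sketch (the dummy input is there purely to make the example runnable):

service:
  flush: 1

parsers:
  - name: myNginxOctet1
    format: regex
    regex: '/(?<octet1>\d{1,3})/'

pipeline:
  inputs:
    - name: dummy
      dummy: '{"ip": "192.168.0.1"}'
      tag: demo
  outputs:
    - name: stdout
      match: '*'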

Fluent Bit book examples

Our Fluent Bit book (Manning, Amazon UK, Amazon US, and everywhere else) has several examples of using parsers and streams in its GitHub repo. We've added the YAML versions of the configurations illustrating parsers and stream processing to the repository's Extras folder.


Binary Large Objects with Fluent Bit

16 Monday Dec 2024

Posted by mp3monster in Fluentbit, General, Technology


Tags

3.2.2, Azure, Binary object, BLOB, configuration, Fluent Bit, use cases

When I first heard about Fluent Bit introducing support for binary large objects (BLOBs) in release 3.2, I was a bit surprised; handling such data structures is atypical for telemetry tooling, and some might see it as an anti-pattern. Certainly, trying to pass such large objects through the buffers could very quickly blow things up unless the buffers are suitably sized.

But rather than rushing to judgment, after a little thought the genuine use cases for handling blobs became clear. The scenarios where I'd look to blob support to help are:

  • Microsoft applications can create dump files (.dmp). These bundle not just the stack traces but also state, which can include a memory dump and contextual data. The file is binary in nature and, guess what, it can be rather large.
  • While logs, traces, and metrics can tell us a lot about why a component or application failed, sometimes we have to see the payload being processed – is there something in the data we never anticipated? Increasingly, even with remote and distributed devices, we are handling payloads such as images and audio. While we can compress these kinds of payloads, sometimes that isn't possible: we lose fidelity through compression, and the act of compression can remove the very artifact we need.

Real-world use cases

I'd encountered this latter scenario previously. We worked with a system designed to send small images as part of product data through a messaging system, so the data was distributed to many endpoints. The scenario we encountered involved the master data authoring system, which didn't have any restrictions on image size. As a result, when setting up some new products in the supply chain system, a new user uploaded the ultra-high-resolution marketing images before they'd been prepared for general use. As you can imagine, these were multi-gigabyte images, not the tens or hundreds of kilobytes expected. The messaging system's allocated storage structures couldn't cope with the payload.

At the time, we had to remotely access the failure points to see what was happening and work out the issue. While the environment was distributed, it wasn't as distributed as systems can be today, so remote access wasn't too problematic. But in a more distributed use case, or where the data could have been submitted from across the wider enterprise, we'd probably have had more problems. Here is a case where being able to move a blob would have helped.

A similar use case was described in the recent release webinar presented by Eduardo Silva Pereira. With modern cars, particularly self-driving vehicles, being able to transfer imagery back when the navigation software experiences a problem is essential.

Avoiding blowing up the buffers

To move the blob without blowing up the buffering, the input plugin tells the blob-consuming output plugin about the blob rather than trying to shunt the gigabytes through the buffer. The output plugin (e.g., Azure Blob) takes the signal and then copies the file piece by piece. By consuming the blob in parts, we reduce the possible impact of network disruption (ever tried to FTP a very large file over a network, only for the connection to briefly drop, forcing you to start again from scratch?). The sender and receiver use a database table to track the communication and progress of the pieces and to reassemble the blob. Unlike other plugins, there is a reverse flow from the output plugin back to the blob input plugin so the process can be monitored. Once complete, the input plugin can execute post-transfer activities.

This does mean that the output plugin must have a network 'line of sight' to the blob. That is a given when everything is handled within a single Fluent Bit node, but it is something to consider if you want to operate in a more distributed model.

A word to the wise

Binary objects are a well-known means by which malicious code can be transported within an organization. This means that while observability tooling can benefit from being able to centralize problematic data for further examination, we could unwittingly be helping a malicious actor.

We can protect ourselves in several ways. Firstly, we must understand and ensure that the source location for the blob can only contain content that we know and understand. Secondly, wherever the blob is put, make sure it is ring-fenced and that the content is subject to processes such as malware detection.

Limitations

As the blob is handled as a new payload type, the details transmitted aren't accessible to other plugins; but given how the mechanism works, trying to do such things wouldn't be very desirable anyway.

Input plugin configuration

At the time of writing, the plugin configuration details haven't been published, but from a combination of the CLI and looking at the code, we know the input plugin has these parameters:

Attribute name – Description

path – Location to watch for blob files, just like the path for the tail plugin.
exclude_pattern – Patterns defining files to exclude, so only our blob files are picked up. The pattern logic is the same as all other Fluent Bit patterns.
database_file – The database file used to track the communication and progress of each blob transfer.
scan_refresh_interval – How frequently the path is rescanned for new blob files.
upload_success_action – Tells the plugin what to do when an upload succeeds. The options are:
  0. Do nothing – the default action if no option is provided.
  1. delete – delete the blob file.
  2. add_suffix – add a suffix to the file, as defined by upload_success_suffix.
  3. emit_log – emit a Fluent Bit log record.
upload_success_suffix – If the success action is set to use a suffix, the value provided here is used as the suffix.
upload_success_message – This text will be incorporated into the Fluent Bit logs.
upload_failure_action – The same options as upload_success_action, but applied if the upload fails.
upload_failure_suffix – The failure equivalent of upload_success_suffix.
upload_failure_message – The failure equivalent of upload_success_message.
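Pulling those attributes together, an input declaration might look something like the following sketch. As the details above were inferred from the code, treat the plugin name and attribute spellings as provisional; the Azure Blob attributes shown are the standard ones for that output:

[INPUT]
    name blob
    path /var/app/dumps/*.dmp
    database_file blob-transfers.db
    scan_refresh_interval 5
    upload_success_action delete

[OUTPUT]
    name azure_blob
    match *
    account_name myaccount
    shared_key ${AZURE_SHARED_KEY}
    container_name diagnostics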

Output Options

Currently, the only blob-capable output is the Azure Blob output plugin, which works with the Azure Blob service, but support for the Amazon S3 API is being worked on. Once available, the feature will be far more widely usable, as the S3 API is widely supported, including by all the hyperscalers.

Note

The configuration information here has been figured out by looking at the code. We'll return to this subject when the S3 endpoint is provided, and use something like MinIO to create a local S3 storage capability.


Securing Fluent Bit operations

18 Monday Nov 2024

Posted by mp3monster in Fluentbit, General, Technology


Tags

Fluent Bit

I’ve been peer-reviewing a book in development called ML Workloads with Kubernetes for Manning. The book is in its first review cycle, so it is not yet available in the MEAP programme. I mention this because the book’s first few chapters cover the application of Apache Airflow and Juypter Notebooks on a Kubernetes platform. It highlights some very flexible things that, while pretty cool, could be seen by some organizations as potential attack vectors. I should say, the authors have engaged with security considerations from the outset). My point is that while we talk about various non-functional considerations, including security, there isn’t a section dedicated to security. So, we’re going to talk directly about some security considerations here.

It would be very easy to consider security as not being important when it comes to observability – but that would be a mistake, for a few reasons:

Logging Payloads

It is easy to incorporate all of an application's data payloads into observability signals such as traces and logs. It's an easy mistake to make during initial development – you just want to see that everything is being handled as intended, so you include the payload. While we can go back and clean this up, or even remove such output as we tidy up the code, these things can slip through the net. Just about any application today will want login credentials. Credentials identify who we are and determine whether, and what, we can see. The fact that they can uniquely identify us is where we usually run into data protection law.

It isn’t unusual for systems to be expected to record who does what and when – all part of common auditing activities. That means our identity is going to often be attached to data flowing through our application.

This makes anywhere that records this data a potential gold mine, and a lack of diligence will mean that our operational support tools and processes become soft targets.

Code Paths

Our applications will carry details of execution paths – from trace-related activities to exception stacks. We need this information to diagnose issues. It is even possible that the code will handle the issues itself, but it is typical to record the stack trace so we can see that the application has had to perform remediation (even if that is simply because we decided to catch an exception rather than write defensive code). So what? Well, that information tells us as developers what the application is doing – but in the wrong hands, it tells the reader how to induce errors and which third-party libraries we're using, from which they can deduce what vulnerabilities we have (see what OWASP says on the matter).

Sometimes, our answer to a vulnerability might not be to fix it but to introduce mitigation strategies – e.g., we'll block direct access to a system. The issue with such mitigations is that people will forget why they're there, or subvert them for the best of reasons, leaving us accidentally vulnerable again. So, minimizing exposure should be the second line of defense.

How does this relate to Fluent Bit?

Well, the first thing is to assume that Fluent Bit is handling sensitive data, remind ourselves of this from time to time, and even test for it. This alone immediately puts us in a healthier place; at the very least, we know what risks are being taken.

Fluent Bit supports SSL/TLS for network traffic

SSL/TLS traffic involves certificates, and setting up and maintaining them can be a real pain, particularly if the processes around managing certificates haven't been carefully thought through and automated. Imposing certificate management with manual processes is the fastest way to kill off adoption. Within an organization, certificates don't have to be expensive ones that offer big payouts if compromised, such as those provided by companies like Thawte and Symantec. The Linux Foundation's Let's Encrypt and protocols like ACME (Automated Certificate Management Environment) make certificates cost-free and provide automation for regular rotation.

Don’t get suckered by the idea that SSL stripping at the perimeter is acceptable today. It used to be an acceptable thing to do because, among other reasons, the overhead of the processing of certificates was a measurable overhead. Moore’s law has seen to it that such computational overhead is tolerable if not fractions of a percentage cost. If not convinced, then consider the fact that there is sufficient drive that Kubernetes supports mutual SSL between containers that are more than likely to be actually running on the same physical server.

Start by considering file system permissions on logs

If you’re working with applications or frameworks that direct logs to local files, you can do a couple of things. First, control the permissions on the files.

Many frameworks that support logging configuration attach no behavior to the log location (although some, like Airflow, do). For those cases where the log location has no behavioral impact, we can control where the logs are written. Structuring logs into a common part of the file system can make things easier to manage, certainly from a file system permissions viewpoint.

Watching for sensitive data bleed

If you’re using Fluent Bit to consolidate telemetry into systems like Loki, etc., then we should be running regular scans to ensure that no unplanned sensitive data is being caught. We can use tools like Telemetrygen to inject values into the event stream to test this process and see if the injected values are detected.

If or when such a situation occurs, the ideal solution is to fix the root cause. But this isn't always possible: the issue may come from a third-party library, the organization may be reluctant to make changes, or production changes may be slow. In these scenarios, as discussed in the book, we can use Fluent Bit configurations to mitigate the propagation of such data. But as we said earlier, if you use mitigations, it is worth verifying they haven't been accidentally undone, which takes us back to the start of this point.
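As a simple illustration of such a mitigation, a filter can strip a known-sensitive field before it leaves the node. This is only a sketch, assuming the offending attribute is called credit_card; real masking (e.g., a one-way hash) would more likely be done with a Lua filter:

[FILTER]
    name modify
    match app.*
    # drop the sensitive field entirely rather than forwarding it
    remove credit_card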

Classifying and Tagging data

Telemetry, particularly traces and logs, can be classified and tagged to reflect information about the origin and nature of the event. This is mostly done nearest the source, as understanding the origin helps the classification process. This is something Fluent Bit can easily do, routing events accordingly, as we show in the book.

Don’t run Fluent Bit as root

Not running Fluent Bit with root credentials is security 101. But it is tempting when you want to use Fluent Bit to tap in and listen to the OS and platform logs and metrics, particularly if you aren't a Linux specialist. It is worth investing in an OS base configuration that is secure while not preventing your observability. This doesn't automatically mean you must use containers. Bare metal, etc., can be secured by installing not from a vendor base image but from an image you've built – or, even simpler, taking the base image and then using tools like Chef, Ansible, etc., to impose a configuration over the top.

Bottom Line

The bottom line is that our observability processes and data should be subject to the same care and consideration as our business data. Security should never be an afterthought bolted on just before go-live, and it should be pervasive rather than applied only at the boundary.

When I learnt to drive (in the dark ages), one of the things I was told was: if you assume that everyone else on the road is a clueless idiot, you'll be OK. We should treat systems development and the adoption of security the same way – if you assume someone is likely to make a mistake and take defensive steps, we'll be OK – this will give us security in depth.


shhh – Fluent Bit book has gone to the printers, and …

13 Sunday Oct 2024

Posted by mp3monster in Books, Fluentbit, General, manning, Technology


Tags

book, ebook, FluentBit, manning, webinar

I thought you might like to know that last week the production process on the book (Logs and Telemetry with Fluent Bit, written under the working title Fluent Bit with Kubernetes) was completed, and the book should be on its way to the printers. In the coming weeks, you'll see the MEAP branding disappear, and the book will appear in the usual places.

If you’ve been brilliant and already purchased the book – the finished version will be available to download soon, and for those who have ordered the ‘tree’ media version – a few more weeks and ink and paper will be on their way.

As part of the promotion, we will be doing a webinar with the book's sponsor. To register for the webinar, go to https://go.chronosphere.io/fluent-bit-with-kubernetes-meet-the-author.html


Migrating from Fluentd to Fluent Bit

08 Tuesday Oct 2024

Posted by mp3monster in Fluentbit, Fluentd, General, Technology


Tags

devops, FluentBit, Fluentd, Kubernetes, mapping, migration, tooling, utility

Earlier in the year, I made a utility available that supported the migration from the Fluent Bit classic configuration format to YAML. I also mentioned I would explore the migration from Fluentd to Fluent Bit. I say explore because, while both tools have a common conceptual foundation, there are many differences in the structure of their configuration.

We discussed the bigger ones in the Logs and Telemetry book. But as we've been experimenting with creating a Fluentd migration tool, it is worth exploring the finer details and discussing how we've approached them as part of a utility to help with the transformation.

Routing

Many of the challenges come from the key difference in the routing and consumption of events from the buffer. Fluentd assumes that an event is consumed by a single output; if you want to direct the event to more than one output, you need to copy it. Fluent Bit looks at things very differently, with every output plugin having the potential to receive every event – the determination of output is controlled by the match attribute. These two approaches put a different emphasis on the ordering of declarations: Fluent Bit relies on tags and match declarations to control the routing of output. The following Fluentd fragment shows the copy approach:

  <match *>
    @type copy
    <store>
      @type file
      path ./Chapter5/label-pipeline-file-output
      <buffer>
        delayed_commit_timeout 10
        flush_at_shutdown true
        chunk_limit_records 50
        flush_interval 15
        flush_mode interval
      </buffer>
      <format>
        @type out_file
        delimiter comma
        output_tag true
      </format> 
    </store>
    <store>
      @type relabel
      @label common
    </store>
  </match>

Hierarchical

We can also see that Fluentd's directives are more hierarchical (e.g., buffer and format sit within the store) than the structures used by Fluent Bit, so we need to be able to 'flatten' the hierarchy. As a result, it makes sense that where a copy occurs, each store in the copy declaration is defined as its own output plugin, as sketched below.
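As an illustration, the copy example above might flatten to something like this in Fluent Bit YAML. This is a hand-built sketch rather than the converter's actual output; the relabel store has no direct equivalent, so it is represented here by a placeholder stdout output:

pipeline:
  outputs:
    - name: file
      match: '*'
      path: ./Chapter5
      file: label-pipeline-file-output
      format: csv
    - name: stdout
      match: '*'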

Buffering

There is a notable difference between the outputs' buffer configurations: in Fluent Bit, an output can only control how much filesystem storage may be used, whereas in Fluentd, as the preceding example shows, we can set the flushing frequency and control the number of chunks involved (regardless of storage type).

Pipelines

Fluentd allows us to implicitly define multiple pipelines of sources and destinations, as the ordering of declarations and event consumption is key. In addition, we can group plugin behavior with the Fluentd label attribute. But the YAML representation of a Fluent Bit configuration doesn't support this idea.

<source>
  @type dummy
  tag dummy
  auto_increment_key counter
  dummy {"hello":"me"}
  rate 1
</source>
<filter dummy>
 @type stdout
 </filter>
<match dummy>
  @id redisTarget
  @type redislist
  port 6379
</match>
<source>
  @id redisSource
  @type redislist
  tag redisSource
  run_interval 1
</source>
<match *>
  @type stdout
</match>

Secondary outputs

Fluentd also supports the idea of a secondary output, as the following fragment illustrates: if the primary output fails, the event can be written to an alternate location. Fluent Bit doesn't have an equivalent mechanism, so for the mapping tool we've taken the view that we should create a separate output.

<match *>
    @type roundrobin
    <store> 
      @type forward
      buffer_type memory
      flush_interval 1s  
      weight 50
      <server>
        host 127.0.0.1
        port 28080
      </server>  
    </store>
    <store>
      @type forward
      buffer_type memory
      flush_interval 1s        
        weight 50
      <server>
        host 127.0.0.1
        port 38080
      </server> 
    </store>
  <secondary>
    @type stdout
  </secondary>
</match>

The reworked structure requires consideration of the matching configuration, which isn't so easily automated and can require manual intervention. To help with this, we've included an option to add comments linking the new output back to the original configuration.

Configuration differences

While the plugins have a degree of consistency, a closer look shows that there are attributes, and as a result features of plugins, that don't translate. To address this, we comment out such attributes so they remain visible in the new configuration, allowing manual modification.

Conclusion

While the tool we’re slowly piecing together will do a lot of the work in converting Fluentd to Fluent Bit, there aren’t exact correlations for all attributes and plugins. So the utility will only be able to perform the simplest of mappings without developer involvement. But we can at least help show where the input is needed.

Resources

  • Fluent Bit from Classic to YAML
  • https://github.com/mp3monster/fluent-bit-classic-to-yaml-converter
  • Fluent Bit
  • Fluentd
  • https://github.com/mp3monster/fluent-bit-classic-to-yaml-converter/tree/fluentd-experimental
  • Logs and Telemetry book


Fluent Bit – using Lua script to split up events into multiple records

20 Friday Sep 2024

Posted by mp3monster in Fluentbit, General, Technology


Tags

FluentBit, Lua, plugins

One of the really advanced features of Fluent Bit's use of Lua scripts is the ability to split a single log event so that downstream processing sees multiple log events. In the Logs and Telemetry book, we didn't have the space to explore this possibility. Here, we'll build upon our understanding of how to use Lua in a filter. Before we look at how it can be done, let's consider why it might be done.

Why Split Fluent Bit events

This case primarily focuses on the handling of log events. There are several reasons that could drive us to perform a split, such as:

  • Log events contain metrics data (particularly application or business metrics). Older systems can emit some metrics through logging, such as the time to complete a particular process within the code. When data like this is generated, ideally we expose it to the tools best suited to measuring and reporting on metrics, such as Prometheus and Grafana. But doing this has several factors to consider:
    • A log record with metrics data is unlikely to present the data in a format that can be directed straight at Prometheus.
    • We could simply transform the log to use a metrics structure, but it is a good principle to retain a copy of the logs as they were generated so we don't lose any additional meaning, which points to creating a second event with a metrics structure. We may wish to monitor for the absence of such metrics, for example.
  • When transactional errors occur, the logs can sometimes contain sensitive details such as PII (Personally Identifiable Information). We really don't want PII data being unnecessarily propagated, as it creates additional security risks – so we mask the PII data in the event that goes downstream. But, at the same time, we want to keep the PII ID to make it easier to identify records that may need to be checked for accuracy and integrity. We can solve this by:
    • Copying the event and performing the masking with a one-way hash
    • Creating a second event with the PII data, which is limited in its propagation and written to a data store sufficiently secured for PII, such as a dedicated database

In both scenarios provided, the underlying theme is creating a version of the event to make things downstream easier to handle.

Implementing the solution

The key to this is understanding how the record construct is processed as it gets passed back and forth. When the Lua script receives an event, it arrives as a table construct (for Java developers, this approximates a HashMap), with the root elements of the table representing the event payload.

Typically, we’d manipulate the record and return it with a flag saying the structure has changed, but it is still a table. But we could return an array of tables. Now each element (array entry) will be processed as its own log event.

A Note on how Lua executes copying

When splitting up the record, we need to understand how Lua handles its data. If we tried to create the array with the code:

local record1 = record
local record2 = record
local newRecord = {record1, record2}

and then manipulated newRecord[1], we would still impact both records. This is because Lua, like its C underpinnings, assigns tables by reference rather than making deep copies. So we need to ensure we perform a deep copy before manipulating the records. You can see this in our example configuration (here on GitHub), or look at the following Lua code fragment:

-- recursively deep-copy a table; non-table values are returned as-is
function copy(obj)
  if type(obj) ~= 'table' then return obj end
  local res = {}
  -- copy both keys and values, recursing into any nested tables
  for k, v in pairs(obj) do res[copy(k)] = copy(v) end
  return res
end
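With the deep copy in place, the filter callback itself stays small. The following is a minimal sketch of what it could look like (the function name matches the call in the configuration below, but the field names are illustrative; returning a code of 1 with an array of tables is what triggers the split):

function cb_advanced(tag, timestamp, record)
  -- deep copy so that changing one record doesn't alter the other
  local record1 = copy(record)
  local record2 = copy(record)
  record1['variant'] = 'original'
  record2['variant'] = 'split'
  -- code 1 marks the records as modified; the array is emitted as two events
  return 1, timestamp, {record1, record2}
end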

The proof

To illustrate the behavior, we have created a configuration with a single dummy plugin that emits just one event. That event is picked up by a filter running our Lua script; after the filter, we have a simple output plugin. As a result of creating two records, we should see two output entries. To make it easy to compare, the Lua script has a flag called deepCopy: when set to true, the records are cloned with a deep copy before the payload values are modified and the split performed; when set to false, the shallow-reference problem described earlier can be observed.

[SERVICE]
  flush 1

[INPUT]
    name dummy
    dummy {   "time": "12/May/2023:08:05:52 +0000",   "remote_ip": "10.4.72.163",   "remoteuser": "-",   "request": {     "verb": "GET",     "path": " /downloads/product_2",     "protocol": "HTTP",     "version": "1.1"   },   "response": 304}
    samples 1
    tag dummy1

[FILTER]
    name lua
    match *
    script ./advanced.lua
    call cb_advanced
    protected_mode true

[OUTPUT]
    name stdout
    match *

Limitations and solutions

While we can easily split events up and return multiple records, we can't give them different tags or timestamps. Using the same timestamp is pretty sensible, but different tags would be helpful if we want to route the different records in different ways.

As long as each record contains the value we want to use as a tag, we can add the rewrite_tag filter to the pipeline and point it at that attribute with a regex. To keep things efficient, if we create an element that is just the tag when building the new record, the regex becomes a very simple expression to match the value, as the sketch below shows.
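A minimal sketch of that idea, assuming each record created by the Lua script carries a tag_hint element holding the desired tag:

[FILTER]
    name rewrite_tag
    match dummy1
    # rule <key> <regex> <new tag> <keep original>
    rule $tag_hint ^(.*)$ $1 false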

Conclusion

We’ve seen a couple of practical examples of why we might want to spin out new observability events based on what we get from our system. An important aspect of the process is how Lua handles memory.

Resources

  • Logs and Telemetry with Fluent Bit book
  • GitHub example
  • Tech resources


Fluent Bit with Chat Ops

12 Monday Aug 2024

Posted by mp3monster in Fluentbit, General, Technology


Tags

chatops, conference, Fluent, FluentBit, Open Source Monitoring Conference, osmc, osmc.de, Patrick Stephens, slack, tools

My friend Patrick Stephens, a Fluent Bit committer, will present at the Open Source Monitoring Conference in Germany later this year. Unfortunately, I won't be able to make it, as my day job is closing in on its MVP product release.

The idea behind the presentation is to improve the ability to detect and respond to Observability events, as the time between detection and action is the period during which your application is experiencing harm, such as lost revenue, data corruption, and so on.

The stable configuration and code base is in the Fluent GitHub repository; my upstream version is here. We first discussed the idea back in February and March, when we applied simpler rules to determine whether a log event was critical.

Advancing the idea

Now that my book is principally in the hands of the publishers (copy editing and print preparation, etc.), we can revisit this and exploit features in more recent releases to make it slicker and more effective, for example:

  • Using the stream processor, so that a high frequency of smaller issues can trigger a notification.
  • Using the stream processor to provide a more elegant option to avoid notification storms.
  • Exploiting the new processors, which make it easier to interact with metrics, so applications that natively produce metrics can be handled too.

Other tooling

With the book’s copy editing done, we have a bit more time to turn to our other Fluent Bit project … Fluent Bit configuration converter, both classic to YAML, and implementing a Fluentd to Fluent Bit 1st stage converter. You can see this in GitHub here and here.


Two weeks of Fluent Bit

30 Tuesday Jul 2024

Posted by mp3monster in Fluentbit, General


Tags

book, configurtation, FluentBit, logging, telemetry, tool, YAML

The last couple of weeks have been pretty exciting. Firstly, Fluent Bit 3.1 was released, bringing further feature development and making Fluent Bit even more capable in its handling of OpenTelemetry (OTel).

The full details of the release are available at https://fluentbit.io/announcements/v3.1.0/

Fluent Bit classic to YAML

We’ve been progressing the utility, testing and stabilizing it, and making several releases accordingly. The utility is packaged as a Docker image, and the regression test tool also runs as a Docker image.

Moving forward, we’ll start branching to develop significant changes to keep the trunk stable, including experimenting with the possibility of extending the tool to help port Fluentd to Fluent Bit YAML configurations. The tools won’t be able to do everything, but I hope they will help address the core structural challenges and flag differences needing manual intervention.

Book

The Fluent Bit book has moved into its last phase with the start of copy editing. We have also had a shift in the name, to Logs and Telemetry using Fluent Bit, Kubernetes, streaming, and more – or just Logs and Telemetry using Fluent Bit. The book fundamentally hasn't changed; there is still a lot of Kubernetes-related content, but the new title focuses on what Fluent Bit is all about rather than making this look like just another Kubernetes book.

[Book cover: Logs and Telemetry using Fluent Bit, Kubernetes, streaming and more]


Fluent Bit config from classic to YAML

02 Tuesday Jul 2024

Posted by mp3monster in Fluentbit, General, java, Technology


Tags

configuration, development, FluentBit, format, tool, YAML

Fluent Bit supports both a classic configuration file format and a YAML format. The support for YAML reflects the industry direction. But if you've come to Fluent Bit from Fluentd, or have been using Fluent Bit from the early days, you're likely to be using the classic format. The differences can be seen here:

#
# Classic Format
#
[SERVICE]
    flush 1
    log_level info
[INPUT]
    name dummy
    dummy {"key" : "value"}
    tag blah
[OUTPUT]
    name stdout
    match *

#
# YAML Format
#
service:
    flush: 1
    log_level: info
pipeline:
    inputs:
        - name: dummy
          dummy: '{"key" : "value"}'
          tag: blah
    outputs:
        - name: stdout
          match: "*"

Why migrate to YAML?

Beyond having a consistent file format, the driver is that some new features are not supported by the classic format. Currently, this predominantly means Processors; it is fair to assume that other major new features will follow suit.

Migrating from classic to YAML

The process for migrating from classic to YAML has two dimensions:

  • Change of formatting:
    • YAML indentation, with plugins as array elements
    • Addressing quirks such as the wildcard (*) needing to be quoted
  • Addressing constraints such as:
    • Use of include being more restrictive
    • Ordering of inputs and outputs being more restrictive – therefore match attributes may need to be refined

None of this is too difficult, but doing it by hand is laborious and error-prone. So, we've built a utility that can help with the process. At the moment, the solution is in an MVP state, but we hope to beef it up over the coming weeks. What we plan to do and how to use the utility are all covered in the GitHub readme.

The repository link (fluent-bit-classic-to-yaml-converter)

Update 4th July 24

A quick update to say that we now have a container configuration in the repository to make the tool very easy to use. All the details will be included in the readme, along with some additional features.

Update 7th July

We’ve progressed past the MVP state now. The detected include statements get incorporated into a proper include block but commented out.

We’ve added an option to convert the attributes to use Kubernetes idiomatic form, i.e., aValue rather than a_value.

The command line has a help option that outputs details such as the control flags.

Update 12th July

In the last couple of days, we pushed a little too quickly to GitHub and discovered we'd broken some cases. We're testing the development much more rigorously now, and it helps that we have the regression container image working nicely. The Javadoc is also generating properly.

We have identified some edge cases that need to be sorted, but most scenarios have been correctly handled. Hopefully, we’ll have those edge scenarios fixed tomorrow, so we’ll tag a release version then.

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X
  • Click to share on Pocket (Opens in new window) Pocket
  • Click to share on Reddit (Opens in new window) Reddit
  • Click to email a link to a friend (Opens in new window) Email
  • Click to share on WhatsApp (Opens in new window) WhatsApp
  • Click to print (Opens in new window) Print
  • Click to share on Tumblr (Opens in new window) Tumblr
  • Click to share on Mastodon (Opens in new window) Mastodon
  • Click to share on Pinterest (Opens in new window) Pinterest
  • More
  • Click to share on Bluesky (Opens in new window) Bluesky
  • Click to share on LinkedIn (Opens in new window) LinkedIn
Like Loading...
← Older posts
Newer posts →

    I work for Oracle, all opinions here are my own & do not necessarily reflect the views of Oracle

    • About
      • Internet Profile
      • Music Buying
      • Presenting Activities
    • Books & Publications
      • Logging in Action with Fluentd, Kubernetes and More
      • Logs and Telemetry using Fluent Bit
      • Oracle Integration
      • API & API Platform
        • API Useful Resources
        • Useful Reading Sources
    • Mindmaps Index
    • Monster On Music
      • Music Listening
      • Music Reading
    • Oracle Resources
    • Useful Tech Resources
      • Fluentd & Fluent Bit Additional stuff
        • Logging Frameworks and Fluent Bit and Fluentd connectivity
        • REGEX for BIC and IBAN processing
      • Java and Graal Useful Links
      • Official Sources for Product Logos
      • Python Setup & related tips
      • Recommended Tech Podcasts

    Oracle Ace Director Alumni

    TOGAF 9

    Logs and Telemetry using Fluent Bit


    Logging in Action — Fluentd

    Logging in Action with Fluentd


    Oracle Cloud Integration Book


    API Platform Book


    Oracle Dev Meetup London

    Blog Categories

    • App Ideas
    • Books
      • Book Reviews
      • manning
      • Oracle Press
      • Packt
    • Enterprise architecture
    • General
      • economy
      • ExternalWebPublications
      • LinkedIn
      • Website
    • Music
      • Music Resources
      • Music Reviews
    • Photography
    • Podcasts
    • Technology
      • AI
      • APIs & microservices
      • chatbots
      • Cloud
      • Cloud Native
      • Dev Meetup
      • development
        • languages
          • java
          • node.js
      • drone
      • Fluentbit
      • Fluentd
      • logsimulator
      • mindmap
      • OMESA
      • Oracle
        • API Platform CS
          • tools
        • Helidon
        • ITSO & OEAF
        • Java Cloud
        • NodeJS Cloud
        • OIC – ICS
        • Oracle Cloud Native
        • OUG
      • railroad diagrams
      • TOGAF
    • xxRetired
    • AI
    • API Platform CS
    • APIs & microservices
    • App Ideas
    • Book Reviews
    • Books
    • chatbots
    • Cloud
    • Cloud Native
    • Dev Meetup
    • development
    • drone
    • economy
    • Enterprise architecture
    • ExternalWebPublications
    • Fluentbit
    • Fluentd
    • General
    • Helidon
    • ITSO & OEAF
    • java
    • Java Cloud
    • languages
    • LinkedIn
    • logsimulator
    • manning
    • mindmap
    • Music
    • Music Resources
    • Music Reviews
    • node.js
    • NodeJS Cloud
    • OIC – ICS
    • OMESA
    • Oracle
    • Oracle Cloud Native
    • Oracle Press
    • OUG
    • Packt
    • Photography
    • Podcasts
    • railroad diagrams
    • Technology
    • TOGAF
    • tools
    • Website
    • xxRetired

    Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 2,555 other subscribers

    RSS

    RSS Feed RSS - Posts

    RSS Feed RSS - Comments

    January 2026
    M T W T F S S
     1234
    567891011
    12131415161718
    19202122232425
    262728293031  
    « Nov    

    Twitter

    Tweets by mp3monster

    History

    Speaker Recognition

    Open Source Summit Speaker

    Flickr Pics

    Gogo Penguin at the BarbicanGogo Penguin at the BarbicanGogo Penguin at the BarbicanGogo Penguin at the Barbican
    More Photos

    Social

    • View @mp3monster’s profile on Twitter
    • View philwilkins’s profile on LinkedIn
    • View mp3monster’s profile on GitHub
    • View mp3monster’s profile on Flickr
    • View mp3muncher’s profile on WordPress.org
    • View philmp3monster’s profile on Twitch
    Follow Phil (aka MP3Monster)'s Blog on WordPress.com

    Blog at WordPress.com.

    • Subscribe Subscribed
      • Phil (aka MP3Monster)'s Blog
      • Join 233 other subscribers
      • Already have a WordPress.com account? Log in now.
      • Phil (aka MP3Monster)'s Blog
      • Subscribe Subscribed
      • Sign up
      • Log in
      • Report this content
      • View site in Reader
      • Manage subscriptions
      • Collapse this bar
     

    Loading Comments...
     

    You must be logged in to post a comment.

      Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
      To find out more, including how to control cookies, see here: Our Cookie Policy
      %d