Phil (aka MP3Monster)'s Blog

~ from Technology to Music

Tag Archives: configuration

Fluent Bit Processors and Processor Conditions

18 Tuesday Nov 2025

Posted by mp3monster in Fluentbit, General, Technology

Tags

conditional, configuration, examples, FLB, Fluent Bit, Logs and Telemetry, processors

Fluent Bit’s processors were introduced in version 3, and the capability has continued to advance with the introduction of the ability to execute processors conditionally. Until conditionality could be defined directly on a processor, this control had to be fudged (placing processors further downstream and controlling them with filters, routing, and so on).

In the Logs and Telemetry book, we go through the basic Processor capabilities and constraints, such as:

  • Configuration file constraints
  • Performance differences compared to using filter plugins

So we’re not going to revisit those points here.

We can chain processors to run in a sequence by directly defining the processors in the order in which they need to be executed. You can see the chaining in the example below.

Scenario

In the following configuration, we have created several input plugins using both the Dummy plugin and the HTTP plugin. Using the HTTP plugin makes it easy for us to control execution speed and change test data values, which helps us see different filter behaviours. To make life easy, we’ve provided several payloads and a simple caller.[bat|sh] script that takes a single parameter [1, 2, 3, …] identifying the payload file to send.

All of these resources are available in the Book’s GitHub repository as part of the extras folder, which can be found here. This saves us from embedding everything into this blog.

Filters as Processors and Chaining

Filters can be used as processors as well as the dedicated processor types such as SQL and content modifier. The filters just need to be referenced using the name attribute, e.g., name: regex would use the REGEX filter.
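
As a hedged sketch (the dummy payload and the grep rule here are illustrative assumptions, not part of the scenario that follows), referencing a filter inside a processors block looks roughly like this:

pipeline:
  inputs:
    - name: dummy
      dummy: '{"log": "an error occurred"}'
      tag: demo
      processors:
        logs:
          - name: grep             # an ordinary filter plugin referenced by its name attribute
            regex: log \berror\b   # keep only records whose 'log' field mentions 'error'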

Processor Conditions

If you’re familiar with Kubernetes selector syntax, then defining conditions for a processor will feel familiar. A processor’s conditionality is expressed as a condition (#–A–), and the condition contains one or more rules (#–B–). Each rule defines an expression that yields a Boolean outcome by identifying:

  • An element of the payload (for example, a field/element in the log structure, or a metric, etc).
  • The value to evaluate the field against
  • The evaluation operator, which can be one of the typical operators, e.g., eq (equals); see below for the full list.

Since the rules are a list, you can include as many as needed. The condition, in addition to the rules, has its own operator (#–C–), which tells the condition how to combine the results of the individual rules. As we need a Boolean value, we can only use a logical and or a logical or. When there is a single rule, the operation is effectively applied to that rule alone; a sketch combining two rules is shown below.
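
For illustration, here is a condition combining two rules with a logical or; the fields reference the dummy payload used in the scenario configuration, but the processor itself is just a sketch:

- name: content_modifier
  action: insert
  key: matched
  value: true
  condition:
    op: or                            # combine the rule results with a logical OR
    rules:
      - field: "$request['method']"
        op: eq
        value: "GET"
      - field: "$request['path']"     # a second rule against another payload element
        op: eq
        value: "/api/v1/resource"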

In the following example, we have two inputs with processors to help demonstrate the different behavior. In the dummy source, we can see how a nested element can be accessed (i.e.  $<element name>[‘<child element name>‘] ), performing a string comparison. Here we’re using a normal filter plugin as a processor.

With our HTTP source, we’re demonstrating that we can have multiple processors, each with its own condition. The first processor is interesting, as it illustrates an exception to the convention: we could express the conditionality within the Lua code itself, ignoring the condition construct (#–D–). The value of a condition on the Lua processor is debatable, but it is worth considering, as there is an overhead in calling LuaJIT that can be avoided if the condition can be resolved quickly before the script is invoked.

service:
  flush: 1
  log_level: debug
  
pipeline:
  inputs:
    - name: dummy
      dummy: '{"request": {"method": "GET", "path": "/api/v1/resource"}}'
      tag: request.log
      Interval_sec: 60
      processors:
        logs:
          - name: content_modifier
            action: insert
            key: content_modifier_processor
            value: true
            condition:     #--A--
              op: and      #--C--
              rules:       #--B--
                - field: "$request['method']"
                  op: eq
                  value: "GET"    
    - name: http
      port: 9881
      listen: 0.0.0.0
      successful_response_code: 201
      success_header: x-fluent-bit received
      tag: http
      tag_key: token
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                new_record["conditional"] = "condition-triggered"
                return 1, timestamp, new_record
              end
            condition:   #--D--
              op: and
              rules:
                - field: "$classifier"
                  op: eq
                  value:  "1"
          - name: content_modifier
            action: insert
            key: content_modifier_processor2
            value: true
            condition:
              op: and
              rules:
                - field: "$classifier"
                  op: eq
                  value: "2"
          - name: sql
            query: "SELECT token, classifier FROM STREAM;"
            condition:
              op: and
              rules:
                - field: "$classifier"
                  op: eq
                  value: "3"
  outputs:
    - name: stdout
      match: "*"

To run the demonstration, we’ve provided several test payloads and a simple script that will call the Fluent Bit HTTP input plugin with the correct file. We just need to pass the number associated with the log file, e.g., log1.json is sent with caller.[bat|sh] 1, and so on. The script is a variation of:

set fn=log%1%.json
echo %fn%
curl -X POST --location 127.0.0.1:9881 --header Content-Type:application/json --data @%fn%

An example of one of the test payloads:

{"msg" : "dynamic tag", "helloTo" : "the World", "classifier" : 1, "token": "token1"}

Conclusion

Once you’ve got a measure of the condition structure, making the processors conditional is very easy.

Operators available

  • eq: Equals
  • neq: Not equals
  • gt: Greater than ( > )
  • gte: Greater than, or equal to ( >= )
  • lt: Less than ( < )
  • lte: Less than, or equal to ( <= )
  • in: Is the value in a defined set (array), e.g. op: in with value: ["a", "b", "c"]
  • not_in: Is the value not in a defined set (array), e.g. op: not_in with value: ["a", "b", "c"]
  • regex: Matches the regular expression defined, e.g. op: regex with value: ^a*z
  • not_regex: Does not match the regular expression provided, e.g. op: not_regex with value: ^a*z
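
As a short, hedged illustration (the field names here are hypothetical and not part of the scenario configuration), the set and pattern operators are written just like any other rule:

condition:
  op: or
  rules:
    - field: "$status"       # hypothetical field
      op: in
      value: ["4xx", "5xx"]
    - field: "$message"      # hypothetical field
      op: regex
      value: "timeout|refused"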


Observations on vibe coding Fluent Bit configurations

13 Thursday Nov 2025

Posted by mp3monster in AI, development, Fluentbit, General, Technology

Tags

AI, Cline, code generation, configuration, Fluent Bit, Grok, LLM, LLMs, observations, plugins, prompt engineering, prompts, vibe coding

With vibe coding (an expression now part of the Collins Dictionary) and tools like Cline penetrating the development process, I thought it would be worthwhile to see how much can be done with an LLM for building configurations. Since our preference for a free LLM for development tasks has been Grok (xAI), we’ve used that. We found that beyond the simplest of tasks, we needed to use the Expert level to get reliable results.

This hasn’t been a scientific study, as you might find in natural language-to-SQL research, such as the Bird, Archer, and Spider benchmarks. But I have explained below what I’ve tried and the results. Given that LLMs are non-deterministic, we tried the same question from different browser sessions to see how consistent the results were.

Ask the LLM to list all the plugins it knows about

To start simply, it is worth seeing how many plugins the LLM might recognize. To evaluate this, we asked the LLM to answer the question:

list ALL fluent bit input plugins and a total count

Grok did a pretty good job, counting around 40. But the results did show some variation: on one occasion, it identified some plugins as special cases, noting the influence of compilation flags; on another attempt, no references were made to this. In the first attempts, it missed the less commonly discussed plugins such as eBPF. Grok has the option to ask it to ‘Think Harder’ (which switches to Expert mode); when we applied this, it managed to get the correct count, but in another session it dramatically missed the mark, reporting only 28 plugins.

Ask the LLM about all the attributes for a Plugin

To see if the LLM could identify all the attributes of a plugin, we used the prompt:

list all the known configuration attributes for the Fluent Bit Tail plugin

Again, here we saw some variability, with the different sessions providing between 30 and 38 attributes (again, better results in Expert mode). In all cases, the main attributes were identified, but what is interesting is that there were variations in how they were presented, some in all lowercase, others with leading capitalization.

Ask the LLM to create a regular expression

Regular expressions are a key way of teasing out values that may appear in semi-structured or unstructured logs. To determine if the LLM could do this, we used the following prompt:

Create a regular expression that Fluent Bit could execute that will determine if the word fox appears in a single sentence, such as 'the quick brown fox jumped over the fence'

The results offered variations on the expression \bfox\b (sometimes allowing for the possibility that Fox might be capitalized). In each case where we went back to the LLM and added to the prompt that the expression should be case sensitive, it was consistently correct.
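
As a small, hedged sketch of how such an expression might then be applied (the log field name and the choice of the grep filter are assumptions, not part of the LLM exchange):

pipeline:
  filters:
    - name: grep
      match: "*"
      regex: log \bfox\b    # keep only records whose 'log' field contains the word 'fox'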

The obvious takeaway here is that to avoid any errors, we do need to be very explicit.

Ask the LLM to create a simple filter and output

Given that we can see the LLM understands the most commonly used Fluent Bit plugins, let’s provide a simple scenario with a source log that will have some rotation, and see how it processes it. The prompt provided is:

create a classic configuration for fluent bit that will read an input from a file called myApp which has a post fix with the date. The input needs to take into account that the file extension will change at midnight. Any log entries that show an error and reference an IP need to be filtered to go to a CSV file with columns in the order tag, IP, log, timestamp

The handling of this question differed significantly across sessions: one response implemented filtering and output structuring via a Lua function, while another maximized the use of multiple filters. None of the solutions recognized that the output could reference standard Fluent Bit attributes for the tag and timestamp.

What was interesting was that the LLM also picked up on the possibility of externalizing the parser expressions, such as the one for finding the IP. Notably, Fluent Bit includes a predefined set of filters, including an IPv4 filter.
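
As a hedged sketch of what externalizing the IP extraction could look like (the parser name and regex are illustrative assumptions, not the LLM’s output):

parsers:
  - name: extract_ip                         # hypothetical parser name
    format: regex
    regex: '/(?<ip>\d{1,3}(\.\d{1,3}){3})/'  # capture a dotted-quad IPv4 address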

Ask the LLM to use a stream processor

Stream Processors are a less commonly used feature of Fluent Bit, so it would be interesting to see how well the LLM can handle such use cases.

create a Fluent Bit YAML configuration which takes an NGINX log file input and uses a SQL stream processor to count the number of calls identified and the average time between these calls as coming from IP addresses starting 10. every minute. The NGINX logs need to be forwarded to a fluent bit node on the server named ops.myCorp.com and the output of the stream processing needs to go to a local log file called counter.txt and written to a Prometheus end point called ops_prometheus.myCorp.com

The result of this prompt comes in two parts. A main configuration:

service:
  log_level: info
  flush: 1

pipeline:
  inputs:
    - name: forward
      host: 0.0.0.0
      port: 24224
      tag: nginx

  filters:
    - name: parser
      match: nginx
      key_name: log
      parser: nginx
      reserve_data: true

  outputs:
    - name: file
      match: ip10.stats
      path: /var/log/counter.txt  # Adjust path as needed
      format: template
      template: '{ "timestamp": "{time}", "count": {count}, "avg_interval": {avg_interval} }'

    - name: prometheus_remote_write
      match: ip10.stats
      host: ops_prometheus.myCorp.com
      port: 9091  # Common port for Prometheus remote write; adjust if needed
      add_label:
        - job fluent-bit
        - instance ${HOSTNAME}

parsers:
  - name: nginx
    format: regex
    regex: '^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?: "(?<xforwardfor>[^\"]*)")?)?$'
    time_key: time
    time_format: '%d/%b/%Y:%H:%M:%S %z'

stream_processor:
  - name: ip10_stats
    exec: "CREATE STREAM ip10_stats WITH (tag='ip10.stats') AS SELECT COUNT(*) AS count, (MAX(RECORD_TIME()) - MIN(RECORD_TIME())) / (COUNT(*) - 1) AS avg_interval FROM TAG:'nginx' WHERE remote LIKE '10.%' WINDOW TUMBLING (1 MINUTE);"

The configuration for the second node provided is:

service:
  log_level: info

pipeline:
  inputs:
    - name: tail
      path: /path/to/nginx/access.log  # Adjust to your NGINX access log path
      tag: nginx
      db: /var/log/fluent-bit-tail.db  # For position tracking

  outputs:
    - name: forward
      match: nginx
      host: ops.myCorp.com
      port: 24224

As you can see, while the configuration is well-formed, the LLM has confused the input and output requirements. This is because we have not been explicit enough about the input. The thing to note is that if we tweak the prompt to be more explicit and resubmit it in the same conversation, we get the same mistake, whereas submitting the revised prompt in a fresh session produces the correct source. We can only assume that this is down to the conversational memory yielding a result that takes precedence over anything older. For transparency, here is the revised prompt with the change underlined.

create a Fluent Bit YAML configuration which takes an NGINX log file as an input. The NGINX input uses a SQL stream processor to count the number of calls identified and the average time between these calls as coming from IP addresses starting 10. every minute. The input NGINX logs need to be output by forwarding them to a fluent bit node on the server named ops.myCorp.com and the output of the stream processing needs to go to a local log file called counter.txt and written to a Prometheus end point called ops_prometheus.myCorp.com 

The other point to note in the generated configuration is that there doesn’t appear to be a step that matches the nginx tag and results in events with the tag ip10.stats. This is a far more subtle problem to spot.

What proved interesting is that, with a simpler calculation for the stream, the LLM in an initial session ignored the directive to use the stream processor and instead used the log_to_metrics filter, which is a simpler, more maintainable approach.
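
For reference, here is a minimal, hedged sketch of that alternative; the values are illustrative assumptions, so check the log_to_metrics filter documentation for the exact attribute set:

pipeline:
  filters:
    - name: log_to_metrics
      match: nginx
      tag: nginx.metrics                  # tag given to the generated metric events
      metric_mode: counter
      metric_name: calls_from_10_network  # hypothetical metric name
      metric_description: 'Count of NGINX calls from IPs starting 10.'
      regex: remote ^10\..*               # only count records whose remote field starts with 10.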

Creating Visualizations of Configurations

One of the nice things about using an LLM is that, given a configuration, it can create a visual representation directly or in formats such as Graphviz, Mermaid, or D2, which makes the configuration much easier to follow. Here is an example based on a corrected Fluent Bit configuration:

The above diagram was created by the LLM as a Mermaid file, as follows:

graph TD
  A["NGINX Log File<br/>/var/log/nginx/access.log"] -->|"Tail Input & Parse (NGINX regex)"| B["Fluent Bit Pipeline<br/>Tag: nginx.logs"]
  B -->|"Forward Output"| C["Remote Fluent Bit Node<br/>ops.myCorp.com:24224"]
  B -->|"SQL Stream Processor<br/>Filter IPs starting '10.'<br/>Aggregate every 60s: count & avg_time"| D["Aggregated Results<br/>Tag: agg.results"]
  D -->|"File Output<br/>Append template lines"| E["Local File<br/>/var/log/counter.txt"]
  D -->|"Prometheus Remote Write Output<br/>With custom label"| F["Prometheus Endpoint<br/>ops_prometheus.myCorp.com:443"]

Lua and other advanced operations

When it comes to other advanced features, the LLM appears to handle them well, although, given the amount of content available on the use of Lua, there is plenty the LLM could have been trained on. The only likely source of error would be the passing of attributes in and out of the Lua script, which it handled correctly.

We experimented with handling OpenTelemetry traces, which were processed without any apparent issues, although advanced options like compression control weren’t included (and neither were they requested).

We didn’t see any hallucinations when it came to plugin attributes. When we tried to ‘bait’ the LLM into hallucinating an attribute by asking it to use an attribute that doesn’t exist, it responded by adding the attribute along with a configuration comment that the value was not a valid Fluent Bit attribute.

General observations

One thing that struck me about this is that describing what is wanted with sufficient detail for the LLM requires a higher level of focus (albeit for a shorter period) than manually creating the configuration. The upside is that we are, in effect, generating part of a detailed design document. Today, I’d go so far as to suggest including the prompt(s) used in the configuration header(s) to help people understand what is needed from the configuration.

When the LLM generates the configuration, there will be no escaping the need to understand what the plugins do and the options they offer to determine whether the LLM has got it right. If the LLM gets it wrong, we would recommend starting a fresh session.

This does raise an interesting challenge. If you need a fair handle on how Fluent Bit and its common plugins work anyway, then actually writing your own configuration files may be better, as that approach provides more reinforcement for learning. We can, of course, use the LLM to help ‘peer review’ our configurations in the sense of checking syntax, and generating visual representations can help confirm the configuration is as we expect.

This may be a personal thing, as I find myself working with Fluent Bit in small bursts rather than continually. As a result, I remember what I can do with a plugin, but I often forget the exact names of the less commonly used attributes needed to effect the desired behavior. Looking up and keying in the attribute names can take time. One thing we have tried is to include statements in the prompt such as:

In the configuration include all the attributes even if they are only populated with their default or example values. Add a comment for each attribute to show its purpose and whether it is mandatory or not.

What you get then is all the possible options, with the correct names (something that is easy for the LLM, as it is essentially asking it to reformat the documentation for you). Then it becomes a case of removing the attributes that aren’t needed line by line—a lot quicker and easier than manually adding them.

LLM generated plugin with all attributes described (slightly modified output to make the point clearer).
[INPUT]
   Name tail  
     # Purpose: Specifies the input plugin to use. Mandatory: yes
   buffer_chunk_size 32k   
     # Purpose: Set the initial buffer size to read file data. The value must be according to the Unit Size specification. Mandatory: no
   buffer_max_size 32k  
    # Purpose: Set the limit of the buffer size per monitored file. If reading a file exceeds this limit, the file is removed from the monitored file list. The value must be according to the Unit Size specification. Mandatory: no
   path /var/log/blah.txt  
    # Purpose: Pattern specifying a specific log file or multiple ones through the use of common wildcards. Allows multiple patterns separated by commas. Mandatory: yes
    path_key filename  
     # Purpose: If enabled, it appends the name of the monitored file as part of the record. The value assigned becomes the key in the map. Mandatory: no
  exclude_path *.gz,*.zip  
     # Purpose: Set one or multiple shell patterns separated by commas to exclude files matching certain criteria, e.g., *.gz,*.zip. Mandatory: no
  offset_key offset  
     # Purpose: If enabled, Fluent Bit appends the offset of the current monitored file as part of the record. The value assigned becomes the key in the map. Mandatory: no
  read_from_head false  
    # Purpose: For new discovered files on start (without a database offset/position), read the content from the head of the file, not tail. Mandatory: no
  refresh_interval 60  
    # Purpose: The interval of refreshing the list of watched files in seconds. Mandatory: no
  rotate_wait 5  
    # Purpose: Specify the number of extra time in seconds to monitor a file once it is rotated in case some pending data is flushed. Mandatory: no
  ignore_older 1d  
    # Purpose: Ignores files older than ignore_older. Supports m, h, d (minutes, hours, days) syntax. Mandatory: no
  ignore_active_older_files  false  
    # Purpose: Ignore files that are older than the value set in ignore_older even if the file is being ingested. Mandatory: no
  skip_long_lines off  
    # Purpose: When a monitored file reaches its buffer capacity due to a very long line (buffer_max_size), the default behavior is to stop monitoring that file. skip_long_lines alters that behavior and instructs Fluent Bit to skip long lines and continue processing other lines that fit into the buffer size. Mandatory: no
  skip_empty_lines off  
    # Purpose: Skips empty lines in the log file from any further processing or output. Mandatory: no
  db /fluent-bit/tail/blah.db  
    # Purpose: Specify the database file to keep track of monitored files and offsets. Recommended to be unique per plugin. Mandatory: no
  db.sync normal  
    # Purpose: Set a default synchronization (I/O) method. Affects how the internal SQLite engine synchronizes to disk. Values: extra, full, normal, off. Mandatory: no
  db.locking false  
    # Purpose: Specify that the database will be accessed only by Fluent Bit. Enabling this increases performance but restricts external tools from querying the content. Mandatory: no
  db.journal_mode wal  
    # Purpose: Sets the journal mode for databases (wal). Enabling wal provides higher performance. Not compatible with shared network file systems. Mandatory: no
  db.compare_filename false  
    # Purpose: Determines whether to review both inode and filename when retrieving stored file information from the database. true verifies both; false checks only the inode. Mandatory: no
  mem_buf_limit 10MB  
    # Purpose: Set a memory limit that the Tail plugin can use when appending data to the engine. If the limit is reached, it will be paused. When the data is flushed, it resumes. Mandatory: no
  exit_on_eof false  
    # Purpose: When reading a file, exit as soon as it reaches the end of the file. Used for bulk load and tests. Mandatory: no
  parser none  
    # Purpose: Specify the name of a parser to interpret the entry as a structured message. Mandatory: no
  key log  
   # Purpose: When a message is unstructured (no parser applied), it's appended as a string under the key name log. This option lets you define an alternative name for that key. Mandatory: no
  inotify_watcher true  
   # Purpose: Set to false to use file stat watcher instead of inotify. Mandatory: no
  tag blah.logs  
    # Purpose: Set a tag (which may include regex-extracted fields) that will be placed on lines read. Supports tag expansion: if the tag includes an asterisk (*), that asterisk will be replaced with the absolute path of the monitored file, with slashes replaced by dots. Mandatory: no
  tag_regex (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$  
    # Purpose: Set a regular expression to extract fields from the filename. Mandatory: no
  static_batch_size 50M  
    # Purpose: Set the maximum number of bytes to process per iteration for the monitored static files (files that already exist upon Fluent Bit start). Mandatory: no
  file_cache_advise on  
    # Purpose: Set the posix_fadvise in POSIX_FADV_DONTNEED mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. Mandatory: no
  threaded false  
    # Purpose: Indicates whether to run this input in its own thread. Mandatory: no
  Unicode.Encoding none  
    # Purpose: Set the Unicode character encoding of the file data. Requests two-byte aligned chunk and buffer sizes. Supported values: UTF-16LE, UTF-16BE, and auto. Mandatory: no
  Generic.Encoding none  
    # Purpose: Set the non-Unicode encoding of the file data. Supported values: ShiftJIS, UHC, GBK, GB18030, Big5, Win866, Win874, Win1250, Win1251, Win1252, Win1254, Win1255, Win1256. Mandatory: no
  multiline.parser docker,cri  
    # Purpose: Specify one or multiple Multiline Parser definitions to apply to the content. When using a new multiline.parser definition, you must disable the old configuration from your tail section (e.g., parser, parser_firstline, parser_N, multiline, multiline_flush, docker_mode). Mandatory: no
  multiline off  
    # Purpose: If enabled, the plugin will try to discover multiline messages and use the proper parsers to compose the outgoing messages. When this option is enabled, the parser option isn't used. Mandatory: no
  multiline_flush 4  
    # Purpose: Wait period time in seconds to process queued multiline messages. Mandatory: no
  parser_firstline none  
    # Purpose: Name of the parser that matches the beginning of a multiline message. The regular expression defined in the parser must include a group name (named capture), and the value of the last match group must be a string. Mandatory: no
  parser_1 example_parser  
    # Purpose: Optional. Extra parser to interpret and structure multiline entries. This option can be used to define multiple parsers. For example, parser_1 ab1, parser_2 ab2, parser_N abN. Mandatory: no
  docker_mode off  
    # Purpose: If enabled, the plugin will recombine split Docker log lines before passing them to any parser. This mode can't be used at the same time as Multiline. Mandatory: no
  docker_mode_flush 4  
   # Purpose: Wait period time in seconds to flush queued unfinished split lines. Mandatory: no
  docker_mode_parser none  
    # Purpose: Specify an optional parser for the first line of the Docker multiline mode. The parser name must be registered in the parsers.conf file. Mandatory: no

[OUTPUT]
  Name   stdout
  Match  *
  Format json_lines


I would also suggest it is worth trying the same prompt in separate conversations/sessions to see how much variability in the approach to using Fluent Bit is thrown up. If the different sessions yield very similar solutions, then great. If the approaches adopted differ, then it is worth evaluating the options, but also considering what in your prompt (and therefore the problem being addressed) might need clarifying.

Generating a diagram to show the configuration is also a handy way to validate whether the flow of events through it will be as expected.

The critical thing to remember is that the LLM won’t guarantee that what it provides is good practice. The LLM is generating a configuration based on what it has seen, which can be both good and bad. Not to mention that applying good practice demands we understand why a particular practice is good and when it is to our benefit. It is so easy to forget that LLMs are primarily exceptional probability engines; their ‘reasoning’ is largely probabilities, feedback loops, and some deterministic tooling.


Fluentd Labels and Fluent Bit

31 Monday Mar 2025

Posted by mp3monster in Fluentbit, Fluentd, General, Technology

Tags

configuration, Fluent Bit, Fluentd, labels, migrating, migration, regex, relabelling, tag, tags

Recently, a question came up on the Fluent Bit Slack group about how the Fluentd Label feature and the associated relabel plugin map from Fluentd to Fluent Bit.

Fluentd Labels are a way to implement event routing, and we’ll look more closely at them in a moment. Both Fluent Bit and Fluentd support controlling which plugins respond to an event by using tags, which are set on a per-plugin basis.

To address this, let’s take a moment to see what Fluentd’s labels do.

What do Fluentd’s Labels Do?

Fluentd introduced the concept of labels, where one or more plugins could be grouped, creating a pipeline of plugins. Events can also be labeled, effectively putting them into a pipeline that can be defined with that label.

As a result, we would see something like:

<source>
  @type some_input_plugin
  @label @myLabel
</source>

<source>
  @type another_input_plugin
  @label @myOtherLabel
</source>

<label @myLabel>
    <filter **>
         # do something to the log event
    </filter>

    <filter **>
         # do something to the log event
    </filter>

    <match **>
         # output plugin
    </match>
</label>

<label @myOtherLabel>
    <filter **>
         # do something to the log event
    </filter>

    <match **>
         # output plugin
    </match>
</label>

<match tagName>
    # output for events not routed to a label
</match>

Fluentd’s labelling essentially tries to simplify the routing of events within a Fluentd deployment, particularly when multiple plugins are needed, aka a pipeline.

Fluent Bit’s routing

Fluent Bit doesn’t support the concept of labels. While both tools allow wildcards within tag matching, Fluent Bit has an extension that adds power to tag-based routing. Rather than introducing an utterly separate routing control, it extends how tags can be used by allowing the matching to be achieved through regular expressions, which is far more flexible. This could look like the following (using classic format):

[INPUT]
    Name plugin-name
    Tag  myTag

[INPUT]
    Name another-plugin-name
    Tag  myOtherTag

[INPUT]
    Name alternate-plugin-name
    Tag  myAlternateTag

[OUTPUT]
    Name   output-plugin-name
    Match  myTag

[OUTPUT]
    Name   another-output-plugin
    Match_regex  my(Other|Alternate)Tag

The only downsides to processing regular expressions this way are the potentially greater computational effort (depending on the sophistication of the regular expression) and the fact that the match is evaluated for every plugin.

Migration options

The original question was prompted by the idea of migrating from Fluentd to Fluent Bit. When considering this, Labels don’t have a natural like-for-like transition path.

There are several options …

  • Refactor to use a structured tag approach
  • Adopt REGEX matches against tags
  • Separate Fluent Bit deployments

Refactor

Often, tags originate from a characteristic or direct attribute of the event message (payload). Instead, treat a tag purely as a routing mechanism, design a hierarchical routing strategy based on domains, and then use your tags for just this purpose. Aligning tags to domains rather than technical characteristics will help.

This creates the opportunity to progressively refactor out the need for labels, which will then make the transition to Fluent Bit more straightforward, as sketched below.
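
A minimal sketch (the domain names and file paths are hypothetical) of what a hierarchical, domain-based tag strategy might look like in Fluent Bit YAML:

pipeline:
  inputs:
    - name: tail
      path: /var/log/payments/app.log     # hypothetical source
      tag: payments.app                   # domain-oriented tag rather than a technical label
    - name: tail
      path: /var/log/payments/gateway.log
      tag: payments.gateway
  outputs:
    - name: stdout
      match: "payments.*"                 # route everything in the payments domain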

REGEX

An alternative to this is to adopt regular expressions to select appropriate tags regardless of their structure, naming convention, or use of case. While this is very flexible, the expressions can be harder to maintain, and if some tags are driven by event data, there is an element of risk (although likely small) of an unexpected event being caught and processed by a plugin as it unwittingly matches a regular expression.

Multiple Fluent Bit Instances

Fluent Bit’s footprint is very small, notably smaller than that of Fluentd, as no runtime components like Ruby are involved. This means we could deploy multiple instances, with each instance acting as an implicit pipeline for events to be processed. The downside is that the equivalent of relabelling is more involved, as you’ll need a plugin to explicitly redirect the event to another instance, as sketched below. We also need to ensure that events start with the correct Fluent Bit node.
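
A hedged sketch of that hand-off between two instances, using the forward output and input plugins (the host name, port, and tag pattern are illustrative assumptions):

# Instance A: redirects matching events to the instance acting as the 'label' pipeline
pipeline:
  outputs:
    - name: forward
      match: "payments.*"
      host: fluentbit-payments.internal   # hypothetical address of the second instance
      port: 24224

# Instance B: receives those events and processes them as its own pipeline
pipeline:
  inputs:
    - name: forward
      listen: 0.0.0.0
      port: 24224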

Conclusion

When we try to achieve the same behaviour in two differing products that have feature differences, forcing the new product to reproduce exactly the same behaviour can result in decisions that feel like compromises. In these situations, we tend to forget that we may have made trade-off decisions that led to the use of a feature in the first place.

When we find ourselves in such situations, while it may feel more costly, it may be worth reflecting on whether it is more cost-effective to return to the original problem, design the solution based on the new product’s features, and maximize the benefit.

Fluentd to Fluent Bit Portability a possibility?

This may start to sound like Fluentd to Fluent Bit portability is impossible and requires us to build our monitoring stack from scratch. Based on my experiences, these tools are siblings, not direct descendants, so there will be differences. From the work on building a migration tool (a project I’ve not had an opportunity to conclude), I’d suggest there is an 85/15 match: migration is achievable, and the bulk of such a task can be automated. But certainly not all of it, and just seeking a push-button cutover means you’ll miss the opportunity to take advantage of the newer features.

Do you need a migration?

Fluentd may not be on the bleeding edge feature-wise now, but if your existing systems aren’t evolving and demanding or able to benefit from switching to Fluent Bit, then why force the migration? Let your core product path drive the transition from one tool to another. Remember, the two have interoperable protocols – so a mixed estate is achievable, and for the most part will be transparent.

More reading

  • Fluent Bit Format Converter (branch for Fluentd –> Fluent Bit)
  • More of Fluentd and Fluent Bit
  • Logging in Action (Fluentd)
  • Logs and Telemetry (Fluent Bit)


Fluent Bit 3.2: YAML Configuration Support Explained

23 Monday Dec 2024

Posted by mp3monster in Fluentbit, General, Technology

Tags

book, Cloud, config, configuration, development, Fluent Bit, parsers, streams, stream_task, YAML

Among the exciting announcements for Fluent Bit 3.2 is that support for YAML configuration is now complete. Until now, there had been some outliers, such as parser and stream processor configurations, which hadn’t been made YAML compliant.

As a result, the definitions for parsers and streams had to remain in separate files. That is no longer the case, and it is possible to incorporate parser definitions within the same configuration file. While separate configuration files for parsers make for easier re-use, they are more troublesome when incorporating the configuration into a Kubernetes deployment configuration, particularly when using a sidecar deployment.

Parsers

With this advancement, we can define parsers like this:

Classic Fluent Bit

[PARSER]
    name myNginxOctet1
    format regex
    regex (?<octet1>\d{1,3})

YAML Configuration

parsers:
  - name: myNginxOctet1
    format: regex
    regex: '/(?<octet1>\d{1,3})/'

As the examples show, we swap [PARSER] for a parsers object. Each parser then becomes an entry in an array, with its attributes starting with the parser name. The names follow a one-to-one mapping in most cases. This does break down for parsers where we can define a series of values, which in the classic format would just be read in order.

Multiline Parsers

When using multiline parsers, we must provide different regular expressions for different lines. In this situation, we see each set of attributes become a list entry, as we can see here:

Classic Fluent Bit

[MULTILINE_PARSER]
  name multiline_Demo
  type regex
  key_content log
  flush_timeout 1000
  #
  # rule|<state name>|<regex>|<next state>
  rule "start_state" "^[{].*" "cont"
  rule "cont" "^[-].*" "cont"

YAML Configuration

multiline_parsers:
  - name: multiline_Demo
    type: regex
    rules:
    - state: start_state
      regex: '^[{].*'
      next_state: cont
    - state: cont
      regex: "^[-].*"
      next_state: cont

In addition to how the rules are nested, we have moved from several parameters within a single attribute (rule) to each rule having several discrete elements (state, regex, next_state). Note as well that the example mixes single and double quote marks; YAML accepts either.

If you want to keep the configurations for parsers and streams separate, we can continue to do so, referencing the file and name from the main configuration file. While converting the existing .conf content to the YAML format is the bulk of the work, in all likelihood you’ll also change the file extension to .yaml, which means you must modify the parsers_file reference in the service section of the main configuration file.
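
A minimal sketch of that reference, assuming the converted parser definitions live in a file called custom_parsers.yaml (a hypothetical filename):

service:
  flush: 1
  parsers_file: custom_parsers.yaml   # point at the renamed YAML parser definitions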

Streams

Streams follow very much the same path as parsers. However, we do have to be a lot more aware of the query syntax to remain within the YAML syntax rules.

Classic Fluent Bit

[STREAM_TASK]
  name selectTaskWithTag
  exec SELECT record_tag(), rand_value FROM STREAM:random.0;

[STREAM_TASK]
  name selectSumTask
  exec SELECT now(), sum(rand_value)   FROM STREAM:random.0;

[STREAM_TASK]
  name selectWhereTask
  exec SELECT unix_timestamp(), count(rand_value) FROM STREAM:random.0 where rand_value > 0;

YAML Configuration

stream_processor:
  - name: selectTaskWithTag
    exec: "SELECT record_tag(), rand_value FROM STREAM:random.0;"
  - name: selectSumTask
    exec: "SELECT now(), sum(rand_value) FROM STREAM:random.0;"
  - name: selectWhereTask
    exec: "SELECT unix_timestamp(), count(rand_value) FROM STREAM:random.0 where rand_value > 0;"

Note that it is pretty common for Fluent Bit YAML to use the plural form for each of the main blocks, although the stream definition is an exception to this. Additionally, both stream_processor and stream_task are accepted (although stream_task is not recognized in the main configuration file).

Incorporating Configuration directly into the core configuration file

To support directly incorporating these definitions into a single file, we can lift the YAML file contents and apply them as root elements (i.e., at the same level as pipeline and service).
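
As an illustration (the plugin choices are arbitrary, reusing the examples above), a single file combining service, parsers, and pipeline at the root level could look like this:

service:
  flush: 1

parsers:
  - name: myNginxOctet1
    format: regex
    regex: '/(?<octet1>\d{1,3})/'

pipeline:
  inputs:
    - name: dummy
      dummy: '{"key" : "value"}'
      tag: demo
  outputs:
    - name: stdout
      match: "*"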

Fluent Bit book examples

Our Fluent Bit book (Manning, Amazon UK, Amazon US, and everywhere else) has several examples of using parsers and streams in its GitHub repo. We’ve added the YAML versions of the configurations illustrating parsers and stream processing to its repository in the Extras folder.


Binary Large Objects with Fluent Bit

16 Monday Dec 2024

Posted by mp3monster in Fluentbit, General, Technology

Tags

3.2.2, Azure, Binary object, BLOB, configuration, Fluent Bit, use cases

When I first heard about Fluent Bit introducing support for binary large objects (BLOBs) in release 3.2, I was a bit surprised; handling such data structures isn’t typical for a log processor, and some might see it as an anti-pattern. Certainly, trying to pass such large objects through the buffers could very quickly blow things up unless the buffers are suitably sized.

But rather than rush to judgment, I gave it a little thought and the use cases for handling blobs became clear. There are some genuine use cases; the scenarios where I’d look to blobs to help are:

  • Microsoft applications can create dump files (.dmp). These bundle not just the stack traces but also the state, which can include a memory dump and contextual data. The file is binary in nature and, guess what, can be rather large.
  • While logs, traces, and metrics can tell us a lot about why a component or application failed, sometimes we have to see the payload being processed – is there something in the data we never anticipated? Increasingly, even with remote and distributed devices, we are handling payloads such as images and audio. While we can compress these kinds of payloads, sometimes that isn’t possible, as we lose fidelity through compression, and the act of compression can remove the very artifact we need.

Real-world use cases

This latter scenario is one I’d encountered previously. We worked with a system designed to send small images as part of product data through a messaging system, so the data was distributed to many endpoints. A scenario we encountered involved the master data authoring system, which didn’t have any restrictions on image size. As a result, when setting up some new products in the supply chain system, a new user uploaded the ultra-high-resolution marketing images before they’d been prepared for general use. As you can imagine, these are multi-gigabyte images, not the 10s or 100s of kilobytes expected. The messaging system’s allocated storage structures couldn’t cope with the payload.

At the time, we had to remotely access the failure points to see what was happening and identify the issue. While the environment was distributed, it wasn’t as distributed as systems can be today, so remote access wasn’t so problematic. But in a more distributed use case, or where the data could have been submitted to the enterprise more widely, we’d probably have had more problems. Here is a case where being able to move a blob would have helped.

A similar use case, with these characteristics, was identified in the recent release webinar presented by Eduardo Silva Pereira. With modern cars, particularly self-driving vehicles, being able to transfer imagery back in the event the navigation software experiences a problem is essential.

Avoid blowing up buffers.

To move the blob without blowing up the buffering, the input plugin tells the blob-consuming output plugin about the blob rather than trying to shunt gigabytes through the buffer. The output plugin (e.g., Azure Blob) takes the signal and then copies the file piece by piece. By consuming the blob in parts, we reduce the possible impact of network disruption (ever tried to FTP a very large file over a network, only for the connection to briefly drop and force you to start again from scratch?). The sender and receiver use a database table to track the communication and the progress of the pieces, and to reassemble the blob. Unlike other plugins, there is a reverse flow from the output plugin back to the blob input plugin so the process can be monitored. Once complete, the input plugin can execute post-transfer activities.

This does mean that the output plugin must have a network ‘line of sight’ to the blob. That isn’t an issue when everything is handled within a single Fluent Bit node, but it is something to consider if you want to operate in a more distributed model.

A word to the wise

Binary objects are known to be a means by which malicious code can easily be transported within an organization. This means that while observability tooling can benefit from being able to centralize problematic data for us to examine further, we could unwittingly help a malicious actor.

We can protect ourselves in several ways. Firstly, we must understand and ensure that the source location for the blob can only contain content we know and understand. Secondly, wherever the blob is put, make sure it is ring-fenced and that the content is subject to processes such as malware detection.

Limitations

As the blob is handled as a new payload type, the details transmitted aren’t going to be accessible to any other plugins; but given how the mechanism works, trying to do such things wouldn’t be very desirable anyway.

Input plugin configuration

At the time of writing, the plugin configuration details haven’t been published, but with the combination of the CLI and looking at the code, we do know the input plugin has these parameters:

  • path: Location to watch for blob files – just like the path for the tail plugin.
  • exclude_pattern: One or more shell patterns, separated by commas, to exclude files other than our blob files. The pattern logic is the same as for all other Fluent Bit patterns.
  • database_file: The database file used to keep track of the blob files and the progress of their transfer.
  • scan_refresh_interval: How often the path is rescanned for new blob files.
  • upload_success_action: Tells the plugin what to do when the upload is successful. The options are:
    • 0 – Do nothing (the default action if no option is provided).
    • delete (1) – Delete the blob file.
    • add_suffix (2) – Add a suffix to the file, as defined by upload_success_suffix.
    • emit_log (3) – Emit a Fluent Bit log record.
  • upload_success_suffix: If the upload success action is set to add a suffix, then the value provided here is used as the suffix.
  • upload_success_message: This text will be incorporated into the Fluent Bit logs.
  • upload_failure_action: The same options as upload_success_action, but applied if the upload fails.
  • upload_failure_suffix: The failure version of upload_success_suffix.
  • upload_failure_message: The failure version of upload_success_message.
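
Pulling those parameters together, here is a heavily hedged sketch of what an input configuration might look like. The plugin name (blob) and every value shown are assumptions derived from the table above rather than from published documentation, so treat it as illustrative only:

pipeline:
  inputs:
    - name: blob                               # assumed plugin name; not yet confirmed by the docs
      path: /var/appdata/dumps/*.dmp           # hypothetical location of the dump files
      database_file: /var/fluent-bit/blob.db   # tracks files and transfer progress
      scan_refresh_interval: 30                # hypothetical rescan interval
      upload_success_action: delete            # remove the file once transferred
      upload_failure_action: add_suffix
      upload_failure_suffix: .failed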

Output Options

Currently, the only blob output option is the Azure Blob output plugin, which works with the Azure Blob service, but support for the Amazon S3 standard is being worked on. Once this arrives, the feature will be far more widely usable, as the S3 standard is supported by all the hyperscalers and beyond.

Note

The configuration information here has been figured out by looking at the code. We’ll return to this subject when the S3 endpoint is provided and use something like MinIO to create a local S3 storage capability.


Fluent Bit config from classic to YAML

02 Tuesday Jul 2024

Posted by mp3monster in Fluentbit, General, java, Technology

Tags

configuration, development, FluentBit, format, tool, YAML

Fluent Bit supports both a classic configuration file format and a YAML format. The support for YAML reflects industry direction. But if you’ve come from Fluentd to Fluent Bit or have been using Fluent Bit from the early days, you’re likely to be using the classic format. The differences can be seen here:

[SERVICE]
    flush 5
    log_level debug
[INPUT]
   name dummy
   dummy {"key" : "value"}
   tag blah
[OUTPUT]
   name stdout
   match *
#
# Classic Format
#
service:
    flush: 1
    log_level: info
pipeline:
    inputs:
        - name: dummy
          dummy: '{"key" : "value"}'
          tag: blah
    outputs:
        - name: stdout
          match: "*"
#
# YAML Format
#

Why migrate to YAML?

Beyond having a consistent file format, the driver is that some new features are not supported by the classic format. Currently, this is predominantly for Processors; it is fair to assume that any other new major features will likely follow suit.

Migrating from classic to YAML

The process for migrating from classic to YAML has two dimensions:

  • Change of formatting
    • YAML indentation and plugins as array elements
    • addressing any quirks such as wildcard (*) being quoted, etc
  • Addressing constraints such as:
    • Using include is more restrictive
    • Ordering of inputs and outputs is more restrictive – therefore match attributes need to be refined.

None of this is too difficult, but doing it by hand can be laborious and error-prone. So, we’ve built a utility that can help with the process. At the moment, this solution is in an MVP state, but we hope to beef it up over the coming few weeks. What we plan to do and how to use the utility are all covered in the GitHub readme.

The repository link (fluent-bit-classic-to-yaml-converter)

Update 4th July 24

A quick update to say that we now have a container configuration in the repository to make the tool very easy to use. All the details will be included in the readme, along with some additional features.

Update 7th July

We’ve progressed past the MVP state now. The detected include statements get incorporated into a proper include block but commented out.

We’ve added an option to convert the attributes to use Kubernetes idiomatic form, i.e., aValue rather than a_value.

The command line has a help option that outputs details such as the control flags.

Update 12th July

In the last couple of days, we pushed a little too quickly to GitHub and discovered we’d broken some cases. We’ve been testing the development a lot more rigorously now, and it helps that we have the regression container image working nicely. The Javadoc is also generating properly.

We have identified some edge cases that need to be sorted, but most scenarios have been correctly handled. Hopefully, we’ll have those edge scenarios fixed tomorrow, so we’ll tag a release version then.


Fluent Bit config – Railroad Diagrams

13 Wednesday Dec 2023

Posted by mp3monster in Fluentbit, General, railroad diagrams, Technology

Tags

configuration, diagram, FluentBit, railroad, stream processor, syntax

I’ve written about how railroad syntax diagrams (see here) are great for helping write code (or configuration files). Following the track through the diagram will give you the correct statement syntax, and generally, the diagrams don’t require you to jump around like BNF or eBNF representations.

As you may have seen, I’m currently writing a book on Fluent Bit, and guess what? We’re on the Stream Processor chapter. When broaching the use of the syntax, it just felt right to have a railroad diagram. As the diagram is fairly large, it won’t print well, so here are the diagrams.

The master content is in GitHub here. If you want a quick reference to how the diagrams work, check here.

Stream Processor Configuration

Fluent Bit Stream Processor Configuration Syntax

While Fluent Bit’s core syntax is pretty straightforward, the syntax for the stream processing is a bit more complex, with a strong resemblance to SQL. As SQL is declarative in nature and can contain iterative and nested elements, RailRoad diagrams can really help.

The original BNF definition of the Stream SQL syntax is here.

Fluent Bit Configuration RailRoad Diagrams

When it comes to the core configuration files, railroad diagrams aren’t as effective because the configuration is more declarative in nature. But we’ve tried to capture the core essence here. The only issue is that representing things like the use of @include, which can show up in most parts of the file, isn’t so easy, and a list of attributes for every possible standard plugin would make the diagram enormously large and unwieldy.

Fluent Bit Classic Format Configuration

We know there are gaps in the current diagrams, which will be addressed. Including:

  • YAML format
  • @include
  • We should show that environment variables can be referenced as attributes
  • A better way to show the required indentation and line separation


API Platform – Developer Portal Delegated Authentication

18 Monday Nov 2019

Posted by mp3monster in API Platform CS, Oracle, Technology

≈ 2 Comments

Tags

API, APIPlatformCS, Cloud Service, configuration, developer, federated, IDCS, login, OAuth, Oracle, portal

When you configure IDCS for the API Platform so that users can authenticate against a corporate Identity Provider such as Active Directory, it will automatically update the Management Portal login screen accordingly. However, today it doesn’t automatically update the Developer Portal login page. Whilst perhaps an oversight, it is very easy to fix manually once you know how. As a result, you can have a login page that looks like this:

The rest of this blog will show what’s needed to fix the problem.

Continue reading →


Mastering FluentD configuration syntax

19 Thursday Sep 2019

Posted by mp3monster in Cloud, Fluentd, General, Technology

≈ Leave a comment

Tags

configuration, Fluentbit, Fluentd, google, GPC, monitoring, observability, OKE, slack

Getting to grips with FluentD configuration, which describes how it should handle the logging events it has to process, can be a little odd (at least in my opinion) until you appreciate a couple of foundation points, at which point things start to click and you’ll find it pretty easy to understand.

It would be hugely helpful if the online documentation provided some of the points I’ll highlight upfront, rather than throwing you into a simple example that tells you about the configuration but doesn’t elaborate as deeply as might be worthwhile. Of course, that viewpoint may be born of the fact that I have reviewed so many books that I’ve come to expect things a certain way.

But before I highlight what I think are the key points of understanding, let me make the case for getting to grips with FluentD.

Why master FluentD?

FluentD’s purpose is to allow you to take log events from many sources and filter, transform and route them to the necessary endpoints. Whilst it forms part of a standard Kubernetes deployment (such as those provided by Oracle and Azure, for example), it can support monolithic environments just as easily, with connectors for common log formats and frameworks. You could view it as a lightweight middleware for logging (particularly if you use the FluentBit variant, which is effectively a pared-back implementation).
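That pipeline idea shows up in even the smallest configuration. As a minimal sketch (the app.** tag pattern is just illustrative), events arrive through a source, can be enriched by a filter, and are then routed by a match block:

    <source>
      # accept events from other Fluentd/FluentBit agents on the default forward port
      @type forward
      port 24224
    </source>

    <filter app.**>
      # enrich every matching record with the name of the host it passed through
      @type record_transformer
      <record>
        hostname "#{Socket.gethostname}"
      </record>
    </filter>

    <match app.**>
      # route anything tagged app.* to stdout (swap for a real destination plugin)
      @type stdout
    </match>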

If this isn’t sufficient to convince you, and if Google searches are a reflection of adoption, then my previous post reflecting upon the Observability - London Oracle Developer Meetup shows a plot of the steady growth. This is before taking into account that a number of cloud vendors, such as Google, have wrapped Fluentd/FluentBit into their wider capabilities (see here).

Not only can you see it as middleware for logging, but custom processors and adapters can also be built and shared as Ruby Gems, making it very extensible.
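As an illustration, adopting a community plugin is typically just a case of installing its gem (for example, fluent-gem install fluent-plugin-slack) and then referencing its @type in the configuration. The following is only a sketch; the webhook URL, channel and tag pattern are placeholders:

    <match alerts.**>
      # send matching events to a Slack channel via the community plugin
      @type slack
      webhook_url https://hooks.slack.com/services/XXXX/XXXX/XXXX
      channel ops-alerts
      username fluentd
    </match>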

FluentD

Remember these points

and mastering the config should be a lot easier…

Continue reading →


Ansible Book Review Part 4

14 Saturday Mar 2015

Posted by mp3monster in Book Reviews, Books, Packt, Technology

≈ 1 Comment

Tags

Ansible, automation, AWS, book, Chef, configuration, deployment, DigitalOcean, Docker, Hadoop, Packt, Packt Publishing, Puppet, Puppet Labs, review

This is the final part of the detailed look at the Packt book, Learning Ansible. As the book says in the opening to chapter 6, we’re into the back straight, the final mile. The first of the two final chapters looks at provisioning platforms on Amazon AWS and DigitalOcean, the use of the very hip and cool Docker, plus updating your inventory of systems given that new ones have been introduced dynamically. The approach is illustrated by not only instantiating servers but delivering a configured Hadoop cluster. As with everything else we’ve seen in Ansible, there isn’t a standardised approach across all IaaS platforms, as that would restrict you to the lowest common denominator, which is contrary to the Ansible goals described early on. Deploying the Hadoop elements on the two cloud IaaS providers is, however, handled in a common way.

Although the chapter is pretty short, I did have to read through it more carefully, as the book leverages a lot of features demonstrated in previous chapters (configuration arrays etc.), which made it harder to see the key elements of the interaction with AWS. It does mean that diving into this chapter straight away, although not impossible, requires a bit more investment from the reader to see all the value points. That said, it is great to see, through the use of the various features, how easy it is to set up provisioning in the cloud along with the inventory update. Perhaps the win would have been to show the simple provisioning first and then the cleverer approach.

Chapter 7 focuses on Deployment. When I read this, I was a little nonplussed; hadn’t we been reading about this in the previous six chapters? But when you look at the definition provided:

“To position (troops) in readiness for combat, as along a front or line.”
Excerpt From: “Learning Ansible.” Packt Publishing. 

You can start to see the true target of what we’re really thinking about, which is the process of going from software build to production readiness. So, having gone through the software packaging activities, you need to orchestrate the deployment across potentially many servers in a server farm. This orchestration piece is really just pulling together everything that has been explained before, but it also shares some Ansible best practice. Then, finally, there is an examination of the Ansible approach for nodes to pull deployments and updates.

The final piece of the book is an Appendix, which looks at the work to bring Ansible to the Windows platform, Ansible Galaxy and Ansible Tower. Ansible Galaxy is a repository of roles built by the Ansible community. Ansible Tower provides a web front end to the Ansible server. The Tower product is the commercial side of the Ansible company, and sales of it effectively fund the full-time Ansible development effort.

So to summarise …

The Learning Ansible book takes you from first principles through to its very rich capabilities: building and packaging software, instantiating cloud servers or containers, configuring systems and deploying applications into new environments, and then capturing the details of the instantiated systems in the Ansible inventory. How Ansible compares with the more established solutions in this space, Puppet and Chef, is also discussed, along with the pros and cons of the different tools. All the way through, the book is written in an easy, engaging manner; you might even say wonderfully written. The examples are very good, with the possible exception of two cases (merely good, in my opinion), and they are supported with very clear explanations that demonstrate the power of the Ansible product. Even if you choose not to use Ansible, this book does an excellent job of showing the value of not resorting to the ‘black art’ of system build and configuration, and of suggesting good ways of realising automation for this kind of activity; in many places it is undoubtedly thought-provoking.

Prior Review Parts:

  • Part 1
  • Part 2
  • Part 3

