One of the more advanced features of Fluent Bit’s Lua scripting support is the ability to split a single log event into multiple events for downstream processing. In the Logging and Telemetry book, we didn’t have the space to explore this possibility. Here, we’ll build upon our understanding of how to use Lua in a filter. Before we look at how it can be done, let’s consider why it might be done.
Why split Fluent Bit events
This case primarily focuses on the handling of log events. There are several reasons that could drive us to perform such a split, for example:
- Log events contain metrics data (particularly application or business metrics). Older systems can emit some metrics through logging, such as the time to complete a particular process within the code. When data like this is generated, ideally we expose it to the tools best suited to measuring and reporting on metrics, such as Prometheus and Grafana. But doing this has several factors to consider:
- A log record containing metrics data is unlikely to present it in a format that can be sent straight to Prometheus.
- We could simply transform the log into a metrics structure, but it is a good principle to retain a copy of the logs as they were generated so we don’t lose any additional meaning. That points to creating a second event with the metrics structure; we may wish, for example, to monitor for the absence of such metrics being generated.
- When transactional errors occur, the logs can sometimes contain sensitive details such as PII (Personally Identifiable Information). We really don’t want PII propagated unnecessarily, as it creates additional security risks, so we mask the PII in the event that goes downstream. At the same time, we want to retain the identifying value to make it easier to find records that may need to be checked for accuracy and integrity. We can solve this by:
- Copying the event and masking the PII with a one-way hash
- Creating a second event containing the PII data, which is limited in its propagation and written to a data store sufficiently secured for PII, such as a dedicated database
In both scenarios, the underlying theme is creating an additional version of the event that is easier for downstream consumers to handle.
Implementing the solution
The key to this is understanding how the record construct is processed as it is passed back and forth. When the Lua script receives an event, it arrives in our script as a table (Java developers: this approximates a HashMap), with the root elements of the table representing the event payload.
Typically, we’d manipulate the record and return it with a flag saying the structure has changed, but it is still a single table. However, we can instead return an array of tables, in which case each element (array entry) is processed as its own log event.
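For instance, the PII scenario described earlier could be sketched like this. The function name, field names, and the mask() helper are all illustrative (Fluent Bit’s embedded Lua has no built-in hashing, so mask() stands in for a real one-way hash):

```lua
-- Illustrative sketch: split one event into a masked copy and a PII copy.
-- mask() is a placeholder for a real one-way hash function.
local function mask(value)
  return "masked-" .. string.format("%x", #tostring(value))
end

function cb_split_pii(tag, timestamp, record)
  local masked = {}
  local pii = {}
  for k, v in pairs(record) do
    if k == "customer_id" or k == "email" then
      masked[k] = mask(v)  -- downstream copy carries only the hashed value
      pii[k] = v           -- restricted copy keeps the original value
    else
      masked[k] = v
    end
  end
  masked.record_type = "masked"  -- markers to help downstream routing
  pii.record_type = "pii"
  -- Returning a table of tables makes each entry its own log event.
  return 1, timestamp, { masked, pii }
end
```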
A note on how Lua copies data
When splitting up the record, we need to understand how Lua handles its data. If we tried to create the array with code like:
record1 = record
record2 = record
newRecord = {record1, record2}
and then manipulated newRecord[1], we would still impact both records. This is because Lua, like the C that underpins it, assigns tables by reference rather than copying them. So we need to perform a deep copy before manipulating the records. You can see this in our example configuration (here on GitHub), or look at the following Lua code fragment:
function copy(obj)
  if type(obj) ~= 'table' then return obj end
  local res = {}
  for k, v in pairs(obj) do res[copy(k)] = copy(v) end
  return res
end
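As a quick sanity check, the difference between plain assignment and the copy function above can be demonstrated with a small self-contained fragment (the record contents here are illustrative):

```lua
-- Demonstrates reference semantics vs a deep copy.
local function copy(obj)
  if type(obj) ~= 'table' then return obj end
  local res = {}
  for k, v in pairs(obj) do res[copy(k)] = copy(v) end
  return res
end

local record = { response = 304, request = { verb = "GET" } }

local shallow = record      -- both names refer to the same table
shallow.response = 500      -- record.response is now 500 too

local deep = copy(record)   -- an independent table, nested tables included
deep.response = 200         -- record.response remains 500
```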
The proof
To illustrate the behavior, we have created a configuration with a single dummy input plugin that emits just one event. That event is then picked up by a filter running our Lua script, followed by a simple output plugin. As a result of creating two records, we should see two output entries. To make the comparison easy, the Lua script has a flag called deepCopy: when set to true, we clone the records before modifying their payload values; when set to false, the modifications are applied through shared references. In both cases, we then perform the split.
[SERVICE]
    flush 1

[INPUT]
    name    dummy
    dummy   { "time": "12/May/2023:08:05:52 +0000", "remote_ip": "10.4.72.163", "remoteuser": "-", "request": { "verb": "GET", "path": " /downloads/product_2", "protocol": "HTTP", "version": "1.1" }, "response": 304}
    samples 1
    tag     dummy1

[FILTER]
    name           lua
    match          *
    script         ./advanced.lua
    call           cb_advanced
    protected_mode true

[OUTPUT]
    name  stdout
    match *
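The full advanced.lua is in the GitHub repository; a minimal version consistent with the description might look like the following sketch (the deepCopy flag comes from the description above; the copy field added to each record is illustrative):

```lua
-- Hypothetical sketch of advanced.lua; the real script may differ.
local deepCopy = true  -- set false to observe the shared-reference behavior

local function copy(obj)
  if type(obj) ~= 'table' then return obj end
  local res = {}
  for k, v in pairs(obj) do res[copy(k)] = copy(v) end
  return res
end

function cb_advanced(tag, timestamp, record)
  local record1, record2
  if deepCopy then
    record1 = copy(record)   -- two independent tables
    record2 = copy(record)
  else
    record1 = record         -- two references to the same table
    record2 = record
  end
  record1.copy = "first"     -- with deepCopy=false, this value is
  record2.copy = "second"    -- overwritten by the next line in both records
  return 1, timestamp, { record1, record2 }
end
```

With deepCopy set to false, both output entries show copy = "second", because the two array entries point at the same table.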
Limitations and solutions
While we can easily split events and return multiple records, we can’t give them different tags or timestamps. Using the same timestamp is pretty sensible, but different tags would be helpful if we want to route the records in different ways.
As long as the record contains the value we want to use as a tag, we can add a rewrite_tag filter to the pipeline and point it at the attribute to parse with a regex. To keep things efficient, if we add an element holding just the tag value when creating the new record, the regex becomes a very simple expression to match.
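For example, assuming the Lua script adds a record_type element as suggested earlier, a rewrite_tag filter could re-tag the PII records (the key, regex, and tag names here are illustrative; the final false discards the original record under the old tag):

[FILTER]
    name  rewrite_tag
    match dummy1
    rule  $record_type ^(pii)$ secure.$1 false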
Conclusion
We’ve seen a couple of practical examples of why we might want to spin out new observability events based on what we get from our systems. An important aspect of the process is understanding how Lua handles memory.