
Getting to grips with FluentD configuration, which describes how to handle the log events it has to process, can be a little odd (at least in my opinion) until you appreciate a couple of foundational points. Once you do, things start to click and you'll find the configuration pretty easy to understand.

It would be hugely helpful if the online documentation made some of the points I'll highlight upfront, rather than throwing you into a simple example that shows you the configuration but doesn't explain it as deeply as might be worthwhile. Of course, that viewpoint may be born of the fact that I have reviewed so many books that I've come to expect things a certain way.

But before I highlight what I think are the key points of understanding, let me make the case for getting to grips with FluentD.

Why master FluentD?

FluentD's purpose is to let you take log events from many sources and filter, transform and route them to the necessary endpoints. Whilst it forms part of a standard Kubernetes deployment (such as those provided by Oracle and Azure, for example), it can support monolithic environments just as easily, with connections that work with common log formats and frameworks. You could view it as lightweight middleware for logging, particularly if you use the Fluent Bit variant, which is effectively a pared-back implementation.

If this isn't sufficient to convince you, and if Google searches are a reflection of adoption, then my previous post reflecting upon Observability - London Oracle Developer Meetup shows a plot of the steady growth. This is before taking into account that a number of cloud vendors, such as Google (see here), have wrapped Fluentd/Fluent Bit into their wider capabilities.

Not only can you see it as middleware for logging; you can also build custom processes and adapters for it as Ruby Gems, making it very extensible.


Remember these points

and mastering the config should be a lot easier…

YAML- or XML-like elements

The configuration looks like a blend of XML-like and YAML-like notation, with the @ character appearing in particular places. So…

  • The XML-style angle brackets give us the bounds/scope of each section of the configuration,
  • The YAML-like name-value pairs define the values that dictate the behavior of the different components,
  • Anything preceded by an @ is a directive to FluentD, i.e. what it should execute.
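
To make those three points concrete, here is a minimal annotated sketch. The port and tag are the ones used later in this post; the comments and the indentation are mine and are there purely for readability.

```text
# Angle-bracket tags open and close a section of the configuration
<source>
  # @type is a directive telling FluentD which plugin to run
  @type http
  # A plain name-value pair that configures that plugin
  port 8888
</source>

<match test.cycle>
  @type stdout
</match>
```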

Config is translated into pipelines

Each source represents the start of a pipeline, which in turn becomes a thread. Configuration elements such as filters and outputs are copied into that thread. This doesn't stop you from defining a filter that operates on multiple sources; the result is that the filter will effectively be copied into each thread.
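
As a rough sketch of one filter serving several sources, assume two inputs: the http source used in this post and a forward source on port 24224. The app.** pattern and the added hostname field are illustrative choices of mine, not part of the intro example.

```text
<source>
  @type http
  port 8888
</source>

<source>
  @type forward
  port 24224
</source>

# Applies to any event tagged app or app.<anything>, whichever source produced it
<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match app.**>
  @type stdout
</match>
```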

The order in which steps are executed can also be controlled through the use of labels.
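
A minimal sketch of a label, assuming an illustrative label name of @AUDIT and a hypothetical output path: events from this source are handed straight to the label section of the same name and skip any filter or match defined outside it.

```text
<source>
  @type http
  port 8888
  @label @AUDIT
</source>

# Not reached by events from the source above, because they carry a label
<match test.cycle>
  @type stdout
</match>

<label @AUDIT>
  <match test.cycle>
    @type file
    # Hypothetical path, for illustration only
    path /var/log/fluent/audit
  </match>
</label>
```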

Tags are king

Tags are used in the match statements. In this one, the tag is test.cycle …

<match test.cycle>
  @type stdout
</match>

In the example provided in the intro (here), the tag doesn't jump out at you. This is because the tag is derived from the path part of the URL of the call, so the source

<source>
  @type http
  port 8888
</source>

doesn't, in this case, need to set any tag, but you can still see the match clauses on the tag. The match element acts upon events carrying that tag; the exception is *, which is a wildcard.

The tag is the final part of the URL here (test.cycle), which helps show the connection…

$ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0
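
The wildcard rules are worth a quick sketch. As I understand FluentD's matching, * matches exactly one part of a tag, while ** matches zero or more parts. Only test.cycle comes from the example above; the other tags in the comments are made up for illustration.

```text
# Matches test.cycle and test.other, but not test or test.cycle.extra
<match test.*>
  @type stdout
</match>

# Matches test, test.cycle and test.cycle.extra
<match test.**>
  @type stdout
</match>
```

FluentD tries match sections in the order they appear, and an event is consumed by the first pattern that fits, so narrower patterns belong ahead of broader ones.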

Everything is a JSON object

The log events are represented as JSON objects, which means you can reference values by their element name. As the inbound event here is already JSON, its content can be referenced directly, just by using the element name.

In the intro, we see the example of using a filter like this, where action is the element name and logout is the value to exclude …

<filter test.cycle>
  @type grep
  exclude1 action logout
</filter>

If you're used to more conventional grep and regex expressions, this isn't entirely self-evident, until you look at the calls that create the log events and compare the element names and values …

$ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0
$ curl -i -X POST -d 'json={"action":"logout","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0

Seen side by side, it is, I hope, clearer that the filter expression directly links the element name (action) to the value (logout).
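
Newer FluentD releases (v1.x) express the same rule with an exclude section and treat the numbered exclude1 form as deprecated. Here is a sketch of the equivalent filter; anchoring the pattern so that only the exact value logout is dropped is my own choice, not something taken from the intro example.

```text
<filter test.cycle>
  @type grep
  <exclude>
    # The element name in the JSON event
    key action
    # The value to drop, written as a regular expression
    pattern /^logout$/
  </exclude>
</filter>
```

With this in place, the first curl call (action set to login) reaches the output and the second (action set to logout) is discarded.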

I have provided some example configurations in my GitHub repo, which may help illustrate the points made here.