The new book focuses on Fluent Bit, given its significant advances, reflected by the fact it is now at Version 2 and is deserving of its own title. The new book is a free-standing book but is complimentary to the Logging In Action book. Logging in Action focuses on Fluentd but compliments by addressing more deeply deployment strategies for Fluent Bit and Fluentd. The new book engages a lot more with OpenTelemetry now it has matured, along with technologies such as Prometheus.
It is the fact that we’ve seen increasing focus in the cloud native space on startup speeds and efficiency in footprints that have helped drive Fluent Bit, as it operates with native binaries rather than using a Just-In-Time compilation like Ruby (used for Fluentd). The other significant development is the support for OpenTelemetry.
The book has entered the MEAP (Manning Early Access Program). The first three chapters have been peer-reviewed, and changes have been applied; another three are with the development editor. If you’ve not read a MEAP title before, you’ll find the critical content is in the chapters, but from my experience, as we work through the process, the chapters improve as feedback is received. In addition, as an author, when we have had time away from a chapter and then revisit it – it is easier to spot things that aren’t as clear as they could be. So, it is always worth returning as a reader and looking at chapters. Then, as we move to the production phases, any linguistic or readability issues that still exist are addressed as a copy editor goes through the manuscript.
I’d like to thank those involved with the peer review. Their suggestions and insights have been really helpful. Plus, the team at Calyptia is sponsoring the book (and happens to be employing a number of the Fluent Bit contributors).
We also have a discount code on the book, valid until 20th November – mlwilkins2
Development trends have shown a shift towards precompiled languages like Go and Rust away from interpreted or Just-In-Time (JIT) compiled languages like Java and Ruby as it removes the startup time of the language virtual machine and the JIT compiler as well as a smaller memory footprint. All desirable features when you’re scaling containerized solutions and percentage point savings can really add up.
Oracle has been leading the way with its work on GraalVM for some years now, and as a result, not only can GraalVM be used to produce native binary images from Java code, GraalVM also supports TuffleRuby and GraalPy, among others. As TruffleRuby is an open-source project, Oracle isn’t the only vendor contributing to it, work effort has also come from Shopify.
Helping Ruby move forward isn’t new for the Shopify engineering team, and part of that investment is that they have just announced the open-sourcing of a toolchain called Ruvy. Ruvy takes Ruby and creates a WebAssembly (WASM) from it the code. This builds on the existing project ruby.wasm. In doing so they’ve addressed the Ruby startup overhead of the language VM we mentioned. They have also simplified the process of deployment, eliminating the need for Web Assembly System Interface (WASI) arguments, and overcome constraints of class loading by reading files by having the code bundled within the assembly and then accessing the content using WASI-VFS, a simple virtual file system.
The published benchmarks show a massive performance boost in the process of executing where the Ruby code needs to be executed by the packaged JIT. For me, this is interesting as one of the related cloud-native trends is the shift from Fluentd to Fluent Bit. Fluentd was built with Ruby and has a huge portfolio of third-party extensions. But Fluent Bit is built using C to get those performance gains previously described. But it does support plugins through WASM. This raises an interesting question can we take existing Ruby plugins and wrap them so the required interfacing works – which should be minimal and more likely to be impacted by the fact Fluent Bit v2 has refined the internal data structure that was common to both Fluentd and Fluent Bit to allow Fluent Bit to more easily engaged with OpenTelemetry.
If the extra bit of wrapping code isn’t complex, then applying Ruvy should mean the core plugin can then work with Fluent Bit. If this can be templated, then Fluent Bit is going to make a big leap forward with the number of available plugins.
We’ve just had a new article published for Software Engineering Daily which looks at monitoring in multi-cloud and hybrid use cases and highlights some strategies that can help support the single pane of glass by exploiting features in tools such as Fluentd and Fluentbit that perhaps aren’t fully appreciated. Check it out …
The 12 Factor App definition is now ten years old. In the world of software that is a long time. So perhaps it’s time to revisit and review what it says. As I have spent a lot of time around Logging – I’ve focussed on Factor 11 – Logging.
I have been fortunate enough to present at the hybrid JAX London conference on this subject. It was great to get out and see people at a conference rather than just with a screen and a chat console of online-only events.
One of the areas I present publicly is the use of Fluentd. including the use of distributed and multiple nodes. As many events have been virtual it has been easy to demo everything from my desktop – everything is set up so I can demo things very easily. While doing this all on one machine does point to how compact and efficient Fluentd is as I can run multiple instances concurrently it does undermine distributed capabilities somewhat.
Add to that I now work for Oracle it makes sense to use OCI resources. With that, I have been developing the scripts to configure Ubuntu VMs to set up the demo environments installing Ruby, Fluentd, and various gems needed and pulling the relevant configurations in. All the assets can be found in the GitHub repository https://github.com/mp3monster/logging-demos. The repository readme includes plenty of information as well.
While I’ve been putting this together using OCI, the fact that everything is based on Ubuntu should mean it can be run locally on VMs, WSL2, and adaptable for MacOS as well. The environment has been configured means you can still run on Ubuntu with a single node if desired.
Additional Log Destinations
As the demo will typically be run on OCI we can not only run the demo with a multinode setup, we have extended the setup with several inclusion files so we can utilize OCI services OpenSearch and OCI Log Analytics. If you don’t want to use these services simply replace the contents of several inclusion files including files with the contents of the dummy_inclusion.conf file provided.
Representation of the Demo setup
The configuration works by each destination having one or two inclusion files. The files with the postfix of label-inclusion.conf contains the configuration to direct traffic to the respective service with a configuration that will push log events at a very high frequency to the destination. The second inclusion file injects the duplication of log events to each service. The inclusion declarations in the main node Fluentd config file references an environment variable that should provide the path to the inclusion file to use. As a result, by changing the environment variable to point to a dummy file it becomes possible o configure out the use of one of the services. The two inclusions mean we can keep the store declarations compact and show multiple labels being used. With the OpenSearch setup, we have a variant of the inclusion file model where the route inclusion can reference the logic that we would use in the label directly within the sore declaration.
The best way to see the use of the inclusions is to experiment with setting the different environment variables to reference the different files and then using the Fluentd dry-run feature (more on this in the book).
Setup script
The setup script performs a number of tasks including:
Pulling from Git all the resources needed in terms of configuration files and folders
Retrieving the necessary plugins against the possibility of their use.
Setting up the various environment variables for:
Slack token
environment variables to reference inclusion files
shortcut environment variables and aliases
network (IP) address for external services such as OpenSearch
Setting up a folder for OCI tokens needed.
Setting up temp folders to be used by OCI Plugins as a file-based cache.
Feeding the log analytics service is a more complex process to set up as the feeds need to have metadata about the events being ingested. The downside is the configuration effort is greater, but the payback is that it becomes easier to extract meaningful information quickly because the service has a greater understanding of the content. For example, attributing the logs to a type of source means the predefined or default log formats are immediately understood, and maximum meaning can be retrieved from the log event.
Going to OCI Log Analytics does cut out the need for the Connections hub, which would allow rules and routing to be defined to different OCI services which functionally can help such as directing log events to PagerDuty.
When configuring Fluentd we often need to provide credentials to access event sources, targets, and associated services such as notification tools like Slack and PagerDuty. The challenge is that we don’t want the credentials to be in clear text in the Fluentd configuration.
Using Env Vars
In the Logging In Action with Fluentd book, we illustrated how we can take the sensitive values from environment variables so the values don’t show up in the configuration file. But, we’ve seen regularly the question of how secure is this, can’t the environment variable be seen by everyone on that machine?
The answer to this question comes down to having a deeper understanding of how environment variables work. There is a really good explanation here. The long and short of it is that environment variables can only be seen by the process that creates the variable and any child process will receive a copy of the parent’s variables.
This means that if we create the variable in a shell, only that shell and any processes launched by that shell can see the environment variable. So as long as we don’t set variables up as part of a system-level configuration then we already have a level of security. So we could wrap the start of Fluentd with a script that sets the environment variables needed. Then everything launches that script.
I’ve been fortunate enough to appear on a podcast with the excellent Coding Over Cocktails team from Toro Cloud. we got to talk about some of the ideas discussed in my Logging In Action book. You can check the podcast out via their website which includes all the episode details and links to all the platforms that host the podcast. There have been some great previous guests such as Luis Weir (my old boss), Chris Richardson of Microservices.io, Matthew Reinbold from Postman, Sam Newman to name just a few.
One aspect of logging I didn’t directly address with my Fluentd book was consuming multiline logs, such as those you’ll often see when a stack trace is included in the log output. Implementing the feature with Fluentd isn’t hugely complex as it leverages the use of regular expressions (addressed in the book in more depth) to recognize the 1st line in a multiline log entry and for subsequent lines.
I didn’t address it for a couple of reasons:
Using parsers is fairly inefficient, particularly when you’re using a parser to just decide how to then transform a line (this is why I’m not a huge fan of some of the 12 Factor App‘s recommendations when it comes to logging).
Incorporating into your Fluentd parser configurations for specific app log setups is arguably increasing the level of coupling.
Many logging frameworks can talk directly to Fluentd as we saw in the book. This is can be more efficient, which means that the log event is more likely to be passed over in a structured format (therefore less work to do).
But let’s also be realistic, many applications will be configured to simply log to a file and aren’t likely to be changed. At which point we do need to process such situations, so is it done? The process remains largely the same as the tail plugin we illustrated. Except we introduce a different parser called multiline. The documentation provided by Fluentd includes several examples of multiline configurations that will work for default log formats (such as Log4J and Rails). If we took our most basic source setup:
Then assuming our log Simulator played back multiline logs (the provided configuration doesn’t do that) extended to consume standard Log4J2 logs we would have a configuration as follows:
As you can see we’ve set the parser type to multiline. Then there are two regular expressions, format_firstline is used to help recognize the start of a log event. Every line of the log is tested with this expression as we now assume unless this produces a valid result that the line will be part of a multiline event. If you look at the expression you’ll realize it is looking for a DateTime stamp in the form YYYY-MM-DD. This does mean if you generate a log that starts with the date even if it is part of a multiline output then you’ll trip up the parser. You could extend the expression – but the longer it is the slower the processing.
Following format_firstline we have in our example format1 which describes how to process the first line. This can be extended to define how to handle subsequent lines but this could be multiple format definitions. They do need to be presented in numerical order eg. format1, format2, format, and so on.
LogSimulator – Playing back multi-line logs
The Log simulator uses a very similar mechanism to Fluentd to understand how to playback multiple line logs. When it is reading the log lines in for replay it uses a regular expression to recognize the start of a new log entry (FIRSTOFMULTILINEREGEX) defined in the properties file. The simulator will concatenate lines together until it either hits the end of the file or has a new line that complies with the REGEX. It stores the line with an encoded /n (newline character). It will then print the log using the format specified and will allow the /n to create a newline (or not) based on another config parameter (ALLOWNL).
Fluentd has an incredible catalogue of plugins including notification and collaboration channels from good old-fashioned email through to slack, teams, and others.
The thing to remember if you use these channels is that if you’re sending errors, from application logs it isn’t unusual for there to be multiple error events as a root event can cause a cascade of related issues. For example, if your code is writing transactions to a database and the database goes down with no failover mechanism, then your code will most likely experience an error, roll back the transaction perhaps to some sort of queue, and then try to process the next event. Which will again fail. This is the classic situation where multiple errors will get reported for the same issue. This problem is often referred to as a mail storm given that there was a time when we didn’t have social collaboration tools and everyone used email.
There are several ways to overcome this problem. But the most simple and elegant of these is using the suppress plugin in its filter mode.
<filter **>
@type suppress
interval 60 # period in seconds when the condition to supress is triggered
num 2 # number of occurences of a value before suppressing
attr_keys source # the element of the event to consider.
</filter>
In this example if we encounter an event with an attribute called source containing the same value twice then the suppression will kick in for 60 seconds. If you want the key to the valuebeing checked to be the tag then simply omit the attr_keys parameter.
Of course, we don’t want the suppression to kick in if the same value in the attribute keys occured once every few hours. To address this the occurence count is applied over not a time period, but a number of events received by the configuration of max_slot_num which defaults to 10k, but resets
In the filter mode, this plugin is best positionbed immediately before the match block. This means we don’t accidentally suppress messages before they are routed anywhere else.
For the purposes of a demo this is less of an issue. But for a realworld use case would probably benefit from some tuning. All the documentation for this plugin is at https://github.com/fujiwara/fluent-plugin-suppress
The final leg of getting the book published has taken a lot longer than had been expected. But we’ve just been told that the book has been sent to the printers. This means:
final eBook will be available from Manning in about 1 week
Preordered print copies of the book will be dispatched in about 2 weeks
The alternative ebook formats for mobile readers e.g. kindle etc available in about 3 weeks
The book will become available to purchase from other book stores such as Amazon in 3-4 weeks
Safari Books Online and Apple stores will have the ebook in 4-5 weeks
It also means the project from a writing perspective is complete. But we’re starting to look at the additional examples we’ll add to the GitHub repo. These will be dependent upon the book.
You must be logged in to post a comment.