We've just had a new article published for Software Engineering Daily which looks at monitoring in multi-cloud and hybrid use cases, and highlights some strategies that can help support a single pane of glass by exploiting features of tools such as Fluentd and Fluent Bit that perhaps aren't fully appreciated. Check it out …
The 12 Factor App definition is now ten years old, which is a long time in the world of software, so perhaps it's time to revisit and review what it says. As I have spent a lot of time around logging, I've focussed on Factor 11 – Logs.
I have been fortunate enough to present at the hybrid JAX London conference on this subject. It was great to get out and see people at a conference rather than just through a screen and chat console, as with online-only events.
One of the areas I present on publicly is the use of Fluentd, including distributed and multi-node deployments. As many events have been virtual, it has been easy to demo everything from my desktop – everything is set up so I can demo things very easily. While doing it all on one machine does point to how compact and efficient Fluentd is, as I can run multiple instances concurrently, it does somewhat undersell the distributed capabilities.
Add to that the fact that I now work for Oracle, and it makes sense to use OCI resources. With that in mind, I have been developing scripts to configure Ubuntu VMs for the demo environments, installing Ruby, Fluentd, and the various gems needed, and pulling in the relevant configurations. All the assets can be found in the GitHub repository https://github.com/mp3monster/logging-demos. The repository readme includes plenty of information as well.
While I've been putting this together using OCI, the fact that everything is based on Ubuntu should mean it can also be run locally on VMs or WSL2, and adapted for macOS as well. The environment has been configured so that you can still run everything on a single Ubuntu node if desired.
Additional Log Destinations
As the demo will typically be run on OCI, we can go beyond a multinode setup: the configuration has been extended with several inclusion files so we can utilize the OCI OpenSearch and OCI Log Analytics services. If you don't want to use these services, simply replace the contents of the relevant inclusion files with the contents of the dummy_inclusion.conf file provided.
Representation of the Demo setup
The configuration works by each destination having one or two inclusion files. The files with the postfix label-inclusion.conf contain the configuration that directs traffic to the respective service, pushing log events to the destination at a very high frequency. The second inclusion file injects the duplication of log events to each service. The inclusion declarations in the main node's Fluentd config file reference environment variables that provide the paths to the inclusion files to use. As a result, by changing an environment variable to point to a dummy file, it becomes possible to configure out the use of one of the services. The two inclusions mean we can keep the store declarations compact and show multiple labels being used. With the OpenSearch setup, we have a variant of the inclusion file model where the route inclusion can reference the logic that we would use in the label directly within the store declaration.
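To make this a little more concrete, here is a rough, hypothetical sketch of how the two-part inclusion idea can look; the label names and plugin choices are illustrative rather than the repository's actual files:

# Illustrative "route" inclusion: a copy match whose stores relay each event to a label per destination
<match **>
  @type copy
  <store>
    @type relabel
    @label @OPENSEARCH
  </store>
  <store>
    @type relabel
    @label @LOGANALYTICS
  </store>
</match>

# Illustrative "label" inclusion: holds the destination-specific output.
# A dummy version of this file can simply discard the events, which is how
# pointing the environment variable at dummy_inclusion.conf configures a service out.
<label @OPENSEARCH>
  <match **>
    @type null
  </match>
</label>

Keeping the destination-specific output inside a label like this is what keeps the store declarations in the main file compact.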
The best way to see the use of the inclusions is to experiment with setting the different environment variables to reference the different files and then using the Fluentd dry-run feature (more on this in the book).
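For example, something along these lines (the variable name and file paths are purely illustrative) lets you check that the configuration still parses after swapping an inclusion for the dummy file:

# point the (hypothetical) inclusion variable at the dummy file, then validate the configuration
export OPENSEARCH_INCLUSION=./configs/dummy_inclusion.conf
fluentd --dry-run -c ./fluentd.conf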
Setup script
The setup script performs a number of tasks (a simplified, purely illustrative sketch of such a script follows the list), including:
Pulling from Git all the resources needed in terms of configuration files and folders
Retrieving the plugins that might be needed.
Setting up the various environment variables for:
Slack token
environment variables to reference inclusion files
shortcut environment variables and aliases
network (IP) address for external services such as OpenSearch
Setting up a folder for OCI tokens needed.
Setting up temp folders to be used by OCI Plugins as a file-based cache.
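As a rough illustration only (this is not the repository's actual script; the package names, paths, and placeholder values are assumptions), the kind of steps involved look like this:

#!/bin/bash
# install Ruby and build tooling, then Fluentd itself
sudo apt-get update && sudo apt-get install -y git ruby ruby-dev build-essential
sudo gem install fluentd

# pull the demo configuration files and folders from Git
git clone https://github.com/mp3monster/logging-demos.git

# retrieve the plugins that might be needed (plus the OCI output plugins used by the demo)
sudo fluent-gem install fluent-plugin-opensearch fluent-plugin-slack fluent-plugin-suppress

# environment variables for tokens and inclusion files (placeholder values only)
export SLACK_TOKEN="replace-with-a-real-token"
export OPENSEARCH_INCLUSION=./logging-demos/configs/dummy_inclusion.conf

# folders for the OCI tokens and for the OCI plugins' file-based cache
mkdir -p ~/.oci /tmp/fluentd-oci-cache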
Feeding the log analytics service is a more complex process to set up as the feeds need to have metadata about the events being ingested. The downside is the configuration effort is greater, but the payback is that it becomes easier to extract meaningful information quickly because the service has a greater understanding of the content. For example, attributing the logs to a type of source means the predefined or default log formats are immediately understood, and maximum meaning can be retrieved from the log event.
Going directly to OCI Log Analytics does cut out the need for the Service Connector Hub, which allows rules and routing to be defined to different OCI services and can be functionally useful, for example directing log events to PagerDuty.
When configuring Fluentd we often need to provide credentials to access event sources, targets, and associated services such as notification tools like Slack and PagerDuty. The challenge is that we don’t want the credentials to be in clear text in the Fluentd configuration.
Using Env Vars
In the Logging In Action with Fluentd book, we illustrated how we can take the sensitive values from environment variables so the values don't show up in the configuration file. But we've regularly seen the question: how secure is this, can't the environment variable be seen by everyone on that machine?
The answer to this question comes down to having a deeper understanding of how environment variables work. There is a really good explanation here. The long and short of it is that environment variables can only be seen by the process that creates the variable and any child process will receive a copy of the parent’s variables.
This means that if we create the variable in a shell, only that shell and any processes launched by that shell can see the environment variable. So as long as we don't set variables up as part of a system-level configuration, we already have a level of security. We could, for example, wrap the start of Fluentd with a script that sets the environment variables needed, and everything then launches Fluentd via that script.
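As a reminder of the technique, the configuration picks the value up with embedded Ruby, so the secret never appears in the file itself. The fragment below is only a sketch: the variable name is invented, and the parameters shown are those of the fluent-plugin-slack output, so check that plugin's documentation for your version before relying on them.

<match alerts.**>
  @type slack
  # resolved from the environment when Fluentd starts, so the token never sits in the config file
  token "#{ENV['SLACK_TOKEN']}"
  channel general
  username fluentd
</match>

A small wrapper script that exports SLACK_TOKEN and then launches Fluentd keeps the value scoped to that shell and its children, as described above.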
I've been fortunate enough to appear on a podcast with the excellent Coding Over Cocktails team from Toro Cloud, where we got to talk about some of the ideas discussed in my Logging In Action book. You can check the podcast out via their website, which includes all the episode details and links to all the platforms that host the podcast. There have been some great previous guests, such as Luis Weir (my old boss), Chris Richardson of Microservices.io, Matthew Reinbold from Postman, and Sam Newman, to name just a few.
One aspect of logging I didn't directly address in my Fluentd book was consuming multiline logs, such as those you'll often see when a stack trace is included in the log output. Implementing the feature with Fluentd isn't hugely complex; it leverages regular expressions (addressed in more depth in the book) to recognize the first line of a multiline log entry and then the subsequent lines.
I didn’t address it for a couple of reasons:
Using parsers is fairly inefficient, particularly when you're using a parser just to decide how to then transform a line (this is why I'm not a huge fan of some of the 12 Factor App's recommendations when it comes to logging).
Incorporating parser configurations for specific application log formats into your Fluentd setup arguably increases the level of coupling.
Many logging frameworks can talk directly to Fluentd, as we saw in the book. This can be more efficient and means that the log event is more likely to be passed over in a structured format (and therefore there is less work to do).
But let's also be realistic: many applications will be configured to simply log to a file and aren't likely to be changed. At that point we do need to handle such situations, so how is it done? The process remains largely the same as the tail plugin setup we illustrated, except we introduce a different parser called multiline. The documentation provided by Fluentd includes several examples of multiline configurations that will work for default log formats (such as Log4J and Rails). If we took our most basic source setup:
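Something along these lines captures the idea, with the path and tag simply being placeholders:

<source>
  @type tail
  path ./logs/application.log
  read_from_head true
  tag applicationLog
  <parse>
    @type none
  </parse>
</source>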
Then, assuming our Log Simulator played back multiline logs (the provided configuration doesn't do that), extending the source to consume standard Log4J2 logs would give us a configuration as follows:
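A hedged sketch of that configuration, based on the multiline parser examples in the Fluentd documentation rather than the book's exact listing, looks like this:

<source>
  @type tail
  path ./logs/application.log
  read_from_head true
  tag applicationLog
  <parse>
    @type multiline
    # recognize the start of a new event by its leading date (YYYY-MM-DD)
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    # describe the layout of that first line; anything up to the next matching
    # first line (e.g. a stack trace) is swept into the message attribute
    format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
    time_format %Y-%m-%d %H:%M:%S
  </parse>
</source>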
As you can see, we've set the parser type to multiline. Then there are two regular expressions: format_firstline is used to recognize the start of a log event. Every line of the log is tested with this expression, and unless it produces a match we assume the line is part of a multiline event. If you look at the expression, you'll realize it is looking for a date-time stamp in the form YYYY-MM-DD. This does mean that if you generate a log line starting with such a date, even when it is part of a multiline output, you'll trip up the parser. You could extend the expression – but the longer it is, the slower the processing.
Following format_firstline we have, in our example, format1, which describes how to process the first line. This can be extended with further definitions for how to handle subsequent lines, using multiple format declarations. They do need to be presented in numerical order, e.g. format1, format2, format3, and so on.
LogSimulator – Playing back multi-line logs
The Log Simulator uses a very similar mechanism to Fluentd to understand how to play back multiline logs. When it is reading the log lines in for replay, it uses a regular expression (FIRSTOFMULTILINEREGEX, defined in the properties file) to recognize the start of a new log entry. The simulator will concatenate lines together until it either hits the end of the file or reads a new line that matches the regex. It stores the combined entry with an encoded \n (newline character). It will then print the log using the format specified, and will turn the \n into a real newline (or not) based on another config parameter (ALLOWNL).
Fluentd has an incredible catalogue of plugins, including notification and collaboration channels from good old-fashioned email through to Slack, Teams, and others.
The thing to remember if you use these channels is that if you're sending errors from application logs, it isn't unusual for there to be multiple error events, as a root event can cause a cascade of related issues. For example, if your code is writing transactions to a database and the database goes down with no failover mechanism, then your code will most likely experience an error, roll back the transaction (perhaps to some sort of queue), and then try to process the next event, which will again fail. This is the classic situation where multiple errors get reported for the same issue. The problem is often referred to as a mail storm, given that there was a time when we didn't have social collaboration tools and everyone used email.
There are several ways to overcome this problem, but the simplest and most elegant of these is using the suppress plugin in its filter mode.
<filter **>
  @type suppress
  interval 60       # period in seconds over which the condition to suppress is judged
  num 2             # number of occurrences of a value before suppressing
  attr_keys source  # the element of the event to consider
</filter>
In this example, if we encounter events with an attribute called source containing the same value twice, then the suppression will kick in for 60 seconds. If you want the value being checked to be keyed on the tag, then simply omit the attr_keys parameter.
Of course, we don't want the suppression to kick in if the same value in the attribute keys only occurred once every few hours. To address this, the occurrence count is applied not over a time period but over a number of received events, controlled by max_slot_num, which defaults to 10k, after which the count effectively resets.
In the filter mode, this plugin is best positioned immediately before the match block. This means we don't accidentally suppress messages before they are routed anywhere else.
For the purposes of a demo this is less of an issue, but a real-world use case would probably benefit from some tuning. All the documentation for this plugin is at https://github.com/fujiwara/fluent-plugin-suppress
The final leg of getting the book published has taken a lot longer than had been expected. But we’ve just been told that the book has been sent to the printers. This means:
The final eBook will be available from Manning in about 1 week
Preordered print copies of the book will be dispatched in about 2 weeks
The alternative ebook formats for mobile readers, e.g. Kindle, will be available in about 3 weeks
The book will become available to purchase from other book stores such as Amazon in 3-4 weeks
Safari Books Online and Apple stores will have the ebook in 4-5 weeks
It also means the project from a writing perspective is complete. But we’re starting to look at the additional examples we’ll add to the GitHub repo. These will be dependent upon the book.
The Log Simulator we've built and written about in the past has had a release (v0.1) that lines up with the Logging In Action book. I am now continuing to add improvements on the main line. That's not best Git branching practice, but as I'm working on this solo it doesn't represent a problem.
If you expect multiline events, all you need to do is add a name-value pair to the properties file, with the name FIRSTOFMULTILINEREGEX and, as the value, a Java/Groovy regular expression which can be used to determine whether a line in the log is the first line of a new log entry. All subsequent log lines are then appended to the previous line until a line is identified as a new log entry. The log entry will be written with newline characters in the same places as read.
In addition to this, if the synthetic log events need real line breaks, setting the ALLOWNL property to true will result in any newline escape sequences (\n) being turned into proper new lines in the output.
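Purely as an illustration (the regular expression is just one way of matching a leading YYYY-MM-DD date, not a value taken from the repository), the relevant properties file entries would look something like:

# recognize the first line of a new (multiline) log entry by its leading date
FIRSTOFMULTILINEREGEX=^[0-9]{4}-[0-9]{2}-[0-9]{2}.*
# turn the encoded \n sequences back into real new lines when replaying
ALLOWNL=true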
The details are all included in the documentation in GitHub.
Last night saw the final chapter of Logging in Action with Fluentd go back to my editor. The next step is that the chapter (and others, I hope) will go to MEAP, so early readers not only get the final chapter but also the raft of improvements we've made. Along with that, the manuscript goes for a full peer review. Once that's back, it's time for a round of edits as I address the feedback, then into copy editing and the Manning sign-off review.
As you might have guessed, we've kept busy with an article in the 25th edition of OraWorld. This follows Part 1, which talked about GraphQL, with a look at considerations for API security.
In addition to that, we're working on a piece around automation of OCI management activities, such as setting up developers and allowing them a level of freedom to experiment without accidentally burning through all your credits by spinning up Exadata servers or 500-node Kubernetes clusters.
We might even have some time to write more about APIs and integration.