We've just had a new article published for Software Engineering Daily which looks at monitoring in multi-cloud and hybrid use cases, and highlights some strategies that can help support a single pane of glass by exploiting features of tools such as Fluentd and Fluent Bit that perhaps aren't fully appreciated. Check it out …
The 12 Factor App definition is now ten years old, which is a long time in the world of software, so perhaps it's time to revisit and review what it says. As I have spent a lot of time around logging, I've focussed on Factor 11 – Logs.
I have been fortunate enough to present at the hybrid JAX London conference on this subject. It was great to get out and see people at a conference rather than just through a screen and chat console, as with online-only events.
One of the areas I present on publicly is the use of Fluentd, including distributed and multi-node deployments. As many events have been virtual, it has been easy to demo everything from my desktop – everything is set up so I can demo things very easily. While doing it all on one machine does point to how compact and efficient Fluentd is, as I can run multiple instances concurrently, it does somewhat undersell the distributed capabilities.
Add to that the fact that I now work for Oracle, and it makes sense to use OCI resources. With that in mind, I have been developing scripts to configure Ubuntu VMs for the demo environments, installing Ruby, Fluentd, and the various gems needed, and pulling in the relevant configurations. All the assets can be found in the GitHub repository https://github.com/mp3monster/logging-demos. The repository readme includes plenty of information as well.
While I've been putting this together using OCI, the fact that everything is based on Ubuntu should mean it can also be run locally on VMs or WSL2, and adapted for macOS as well. The environment has been configured so that you can still run everything on a single Ubuntu node if desired.
Additional Log Destinations
As the demo will typically be run on OCI, we can go beyond a multinode setup: the configuration has been extended with several inclusion files so we can utilize the OCI OpenSearch and OCI Log Analytics services. If you don't want to use these services, simply replace the contents of the relevant inclusion files with the contents of the dummy_inclusion.conf file provided.
Representation of the Demo setup
The configuration works by each destination having one or two inclusion files. The files with the postfix label-inclusion.conf contain the configuration that directs traffic to the respective service, pushing log events to the destination at a very high frequency. The second inclusion file injects the duplication of log events to each service. The inclusion declarations in the main node's Fluentd config file reference environment variables that provide the paths to the inclusion files to use. As a result, by changing an environment variable to point to a dummy file, it becomes possible to configure out the use of one of the services. The two inclusions mean we can keep the store declarations compact and show multiple labels being used. With the OpenSearch setup, we have a variant of the inclusion file model where the route inclusion can reference the logic that we would use in the label directly within the store declaration.
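To make this a little more concrete, here is a rough, hypothetical sketch of how the two-part inclusion idea can look; the label names and plugin choices are illustrative rather than the repository's actual files:

# Illustrative "route" inclusion: a copy match whose stores relay each event to a label per destination
<match **>
  @type copy
  <store>
    @type relabel
    @label @OPENSEARCH
  </store>
  <store>
    @type relabel
    @label @LOGANALYTICS
  </store>
</match>

# Illustrative "label" inclusion: holds the destination-specific output.
# A dummy version of this file can simply discard the events, which is how
# pointing the environment variable at dummy_inclusion.conf configures a service out.
<label @OPENSEARCH>
  <match **>
    @type null
  </match>
</label>

Keeping the destination-specific output inside a label like this is what keeps the store declarations in the main file compact.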
The best way to see the use of the inclusions is to experiment with setting the different environment variables to reference the different files and then using the Fluentd dry-run feature (more on this in the book).
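For example, something along these lines (the variable name and file paths are purely illustrative) lets you check that the configuration still parses after swapping an inclusion for the dummy file:

# point the (hypothetical) inclusion variable at the dummy file, then validate the configuration
export OPENSEARCH_INCLUSION=./configs/dummy_inclusion.conf
fluentd --dry-run -c ./fluentd.conf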
Setup script
The setup script performs a number of tasks (a simplified, purely illustrative sketch of such a script follows the list), including:
Pulling from Git all the resources needed in terms of configuration files and folders
Retrieving the plugins that might be needed.
Setting up the various environment variables for:
Slack token
environment variables to reference inclusion files
shortcut environment variables and aliases
network (IP) address for external services such as OpenSearch
Setting up a folder for OCI tokens needed.
Setting up temp folders to be used by OCI Plugins as a file-based cache.
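As a rough illustration only (this is not the repository's actual script; the package names, paths, and placeholder values are assumptions), the kind of steps involved look like this:

#!/bin/bash
# install Ruby and build tooling, then Fluentd itself
sudo apt-get update && sudo apt-get install -y git ruby ruby-dev build-essential
sudo gem install fluentd

# pull the demo configuration files and folders from Git
git clone https://github.com/mp3monster/logging-demos.git

# retrieve the plugins that might be needed (plus the OCI output plugins used by the demo)
sudo fluent-gem install fluent-plugin-opensearch fluent-plugin-slack fluent-plugin-suppress

# environment variables for tokens and inclusion files (placeholder values only)
export SLACK_TOKEN="replace-with-a-real-token"
export OPENSEARCH_INCLUSION=./logging-demos/configs/dummy_inclusion.conf

# folders for the OCI tokens and for the OCI plugins' file-based cache
mkdir -p ~/.oci /tmp/fluentd-oci-cache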
Feeding the log analytics service is a more complex process to set up as the feeds need to have metadata about the events being ingested. The downside is the configuration effort is greater, but the payback is that it becomes easier to extract meaningful information quickly because the service has a greater understanding of the content. For example, attributing the logs to a type of source means the predefined or default log formats are immediately understood, and maximum meaning can be retrieved from the log event.
Going directly to OCI Log Analytics does cut out the need for the Service Connector Hub, which allows rules and routing to be defined to different OCI services and can be functionally useful, for example directing log events to PagerDuty.
When configuring Fluentd we often need to provide credentials to access event sources, targets, and associated services such as notification tools like Slack and PagerDuty. The challenge is that we don’t want the credentials to be in clear text in the Fluentd configuration.
Using Env Vars
In the Logging In Action with Fluentd book, we illustrated how we can take the sensitive values from environment variables so the values don't show up in the configuration file. But we've regularly seen the question: how secure is this, can't the environment variable be seen by everyone on that machine?
The answer to this question comes down to having a deeper understanding of how environment variables work. There is a really good explanation here. The long and short of it is that environment variables can only be seen by the process that creates the variable and any child process will receive a copy of the parent’s variables.
This means that if we create the variable in a shell, only that shell and any processes launched by that shell can see the environment variable. So as long as we don't set variables up as part of a system-level configuration, we already have a level of security. We could, for example, wrap the start of Fluentd with a script that sets the environment variables needed, and everything then launches Fluentd via that script.
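As a reminder of the technique, the configuration picks the value up with embedded Ruby, so the secret never appears in the file itself. The fragment below is only a sketch: the variable name is invented, and the parameters shown are those of the fluent-plugin-slack output, so check that plugin's documentation for your version before relying on them.

<match alerts.**>
  @type slack
  # resolved from the environment when Fluentd starts, so the token never sits in the config file
  token "#{ENV['SLACK_TOKEN']}"
  channel general
  username fluentd
</match>

A small wrapper script that exports SLACK_TOKEN and then launches Fluentd keeps the value scoped to that shell and its children, as described above.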
I've been fortunate enough to appear on a podcast with the excellent Coding Over Cocktails team from Toro Cloud, where we got to talk about some of the ideas discussed in my Logging In Action book. You can check the podcast out via their website, which includes all the episode details and links to all the platforms that host the podcast. There have been some great previous guests, such as Luis Weir (my old boss), Chris Richardson of Microservices.io, Matthew Reinbold from Postman, and Sam Newman, to name just a few.
One aspect of logging I didn't directly address in my Fluentd book was consuming multiline logs, such as those you'll often see when a stack trace is included in the log output. Implementing the feature with Fluentd isn't hugely complex; it leverages regular expressions (addressed in more depth in the book) to recognize the first line of a multiline log entry and then the subsequent lines.
I didn’t address it for a couple of reasons:
Using parsers is fairly inefficient, particularly when you're using a parser just to decide how to then transform a line (this is why I'm not a huge fan of some of the 12 Factor App's recommendations when it comes to logging).
Incorporating parser configurations for specific application log formats into your Fluentd setup arguably increases the level of coupling.
Many logging frameworks can talk directly to Fluentd, as we saw in the book. This can be more efficient and means that the log event is more likely to be passed over in a structured format (and therefore there is less work to do).
But let's also be realistic: many applications will be configured to simply log to a file and aren't likely to be changed. At that point we do need to handle such situations, so how is it done? The process remains largely the same as the tail plugin setup we illustrated, except we introduce a different parser called multiline. The documentation provided by Fluentd includes several examples of multiline configurations that will work for default log formats (such as Log4J and Rails). If we took our most basic source setup:
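Something along these lines captures the idea, with the path and tag simply being placeholders:

<source>
  @type tail
  path ./logs/application.log
  read_from_head true
  tag applicationLog
  <parse>
    @type none
  </parse>
</source>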
Then, assuming our Log Simulator played back multiline logs (the provided configuration doesn't do that), extending the source to consume standard Log4J2 logs would give us a configuration as follows:
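A hedged sketch of that configuration, based on the multiline parser examples in the Fluentd documentation rather than the book's exact listing, looks like this:

<source>
  @type tail
  path ./logs/application.log
  read_from_head true
  tag applicationLog
  <parse>
    @type multiline
    # recognize the start of a new event by its leading date (YYYY-MM-DD)
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    # describe the layout of that first line; anything up to the next matching
    # first line (e.g. a stack trace) is swept into the message attribute
    format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
    time_format %Y-%m-%d %H:%M:%S
  </parse>
</source>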
As you can see, we've set the parser type to multiline. Then there are two regular expressions: format_firstline is used to recognize the start of a log event. Every line of the log is tested with this expression, and unless it produces a match we assume the line is part of a multiline event. If you look at the expression, you'll realize it is looking for a date-time stamp in the form YYYY-MM-DD. This does mean that if you generate a log line starting with such a date, even when it is part of a multiline output, you'll trip up the parser. You could extend the expression – but the longer it is, the slower the processing.
Following format_firstline we have, in our example, format1, which describes how to process the first line. This can be extended with further definitions for how to handle subsequent lines, using multiple format declarations. They do need to be presented in numerical order, e.g. format1, format2, format3, and so on.
LogSimulator – Playing back multi-line logs
The Log Simulator uses a very similar mechanism to Fluentd to understand how to play back multiline logs. When it is reading the log lines in for replay, it uses a regular expression (FIRSTOFMULTILINEREGEX, defined in the properties file) to recognize the start of a new log entry. The simulator will concatenate lines together until it either hits the end of the file or reads a new line that matches the regex. It stores the combined entry with an encoded \n (newline character). It will then print the log using the format specified, and will turn the \n into a real newline (or not) based on another config parameter (ALLOWNL).
Fluentd has an incredible catalogue of plugins, including notification and collaboration channels from good old-fashioned email through to Slack, Teams, and others.
The thing to remember if you use these channels is that if you're sending errors from application logs, it isn't unusual for there to be multiple error events, as a root event can cause a cascade of related issues. For example, if your code is writing transactions to a database and the database goes down with no failover mechanism, then your code will most likely experience an error, roll back the transaction (perhaps to some sort of queue), and then try to process the next event, which will again fail. This is the classic situation where multiple errors get reported for the same issue. The problem is often referred to as a mail storm, given that there was a time when we didn't have social collaboration tools and everyone used email.
There are several ways to overcome this problem, but the simplest and most elegant of these is using the suppress plugin in its filter mode.
<filter **>
  @type suppress
  interval 60       # period in seconds over which the condition to suppress is judged
  num 2             # number of occurrences of a value before suppressing
  attr_keys source  # the element of the event to consider
</filter>
In this example, if we encounter events with an attribute called source containing the same value twice, then the suppression will kick in for 60 seconds. If you want the value being checked to be keyed on the tag, then simply omit the attr_keys parameter.
Of course, we don't want the suppression to kick in if the same value in the attribute keys only occurred once every few hours. To address this, the occurrence count is applied not over a time period but over a number of received events, controlled by max_slot_num, which defaults to 10k, after which the count effectively resets.
In the filter mode, this plugin is best positioned immediately before the match block. This means we don't accidentally suppress messages before they are routed anywhere else.
For the purposes of a demo this is less of an issue, but a real-world use case would probably benefit from some tuning. All the documentation for this plugin is at https://github.com/fujiwara/fluent-plugin-suppress
The final leg of getting the book published has taken a lot longer than had been expected. But we’ve just been told that the book has been sent to the printers. This means:
The final eBook will be available from Manning in about 1 week
Preordered print copies of the book will be dispatched in about 2 weeks
The alternative ebook formats for mobile readers, e.g. Kindle, will be available in about 3 weeks
The book will become available to purchase from other book stores such as Amazon in 3-4 weeks
Safari Books Online and Apple stores will have the ebook in 4-5 weeks
It also means the project from a writing perspective is complete. But we’re starting to look at the additional examples we’ll add to the GitHub repo. These will be dependent upon the book.
The Log Simulator we've built and written about in the past has had a release (v0.1) that lines up with the Logging In Action book. I am now continuing to add improvements on the main line. That's not best Git branching practice, but as I'm working on this solo it doesn't represent a problem.
If you expect multiline events, all you need to do is add a name-value pair to the properties file, with the name FIRSTOFMULTILINEREGEX and, as the value, a Java/Groovy regular expression which can be used to determine whether a line in the log is the first line of a new log entry. All subsequent log lines are then appended to the previous line until a line is identified as a new log entry. The log entry will be written with newline characters in the same places as read.
In addition to this, if the synthetic log events need real line breaks, setting the ALLOWNL property to true will result in any newline escape sequences (\n) being turned into proper new lines in the output.
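Purely as an illustration (the regular expression is just one way of matching a leading YYYY-MM-DD date, not a value taken from the repository), the relevant properties file entries would look something like:

# recognize the first line of a new (multiline) log entry by its leading date
FIRSTOFMULTILINEREGEX=^[0-9]{4}-[0-9]{2}-[0-9]{2}.*
# turn the encoded \n sequences back into real new lines when replaying
ALLOWNL=true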
The details are all included in the documentation in GitHub.
Last night saw the final chapter of Logging in Action with Fluentd go back to my editor. The next step is that the chapter (and others, I hope) will go to MEAP, so early readers not only get the final chapter but also the raft of improvements we've made. Along with that, the manuscript goes for a full peer review. Once that's back, it's time for a round of edits as I address the feedback, then into copy editing and the Manning sign-off review.
As you might have guessed, we've kept busy with an article in the 25th edition of OraWorld. This follows Part 1, which talked about GraphQL, with a look at considerations for API security.
In addition to that, we're working on a piece around automation of OCI management activities, such as setting up developers and allowing them a level of freedom to experiment without accidentally burning through all your credits by spinning up Exadata servers or 500-node Kubernetes clusters.
We might even have some time to write more about APIs and integration.