The latest release of Fluent Bit is only a patch release under SemVer naming, but given the enhancements it includes, it would have been reasonable to call it a minor release. There are some really good enhancements here.
Character Encoding
As all mainstream programming languages have syntaxes that lend themselves to English or other Western languages, it is easy to forget that a lot of the global population uses languages that don’t have this heritage, and whose text is often stored in legacy, language-specific encodings rather than UTF-8. For example, according to the World Factbook, 13.8% of the world's population speaks Mandarin Chinese. While this doesn’t immediately translate into written communication or language use with computers, it is a clear indicator that when logging, we need to support log files written in encodings designed for such languages, for example GBK for Simplified Chinese and Big5 for Traditional Chinese. However, internally, Fluent Bit transmits the payload as JSON, which expects UTF-8, so the source encoding needs to be converted. This means log file ingestion with the Tail plugin ideally needs to support such encodings. To achieve this, the plugin features a native character encoding engine that can be directed using a new attribute called generic.encoding, which specifies the encoding the file is using.
service:
  flush: 1
pipeline:
  inputs:
    - name: tail
      path: ./basic-file.gsk
      read_from_head: true
      exit_on_eof: true
      tag: basic-file
      generic.encoding: GBK
  outputs:
    - name: stdout
      match: "*"
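To try a configuration like the one above, you need a log file that really is stored in a non-UTF-8 encoding. The following is a minimal Python sketch, not part of Fluent Bit, that writes a small GBK-encoded file and shows the kind of conversion to UTF-8 that has to happen before the text can travel as JSON. The sample lines and the helper names `write_gbk_log` and `to_utf8` are made up for illustration; the file name simply mirrors the path used in the configuration.

```python
from pathlib import Path

# Illustrative log lines in Simplified Chinese (assumed sample content).
LINES = ["服务已启动", "连接数据库失败"]


def write_gbk_log(path: Path, lines: list[str]) -> bytes:
    """Write the lines as GBK-encoded bytes and return what was written."""
    raw = "\n".join(lines).encode("gbk") + b"\n"
    path.write_bytes(raw)
    return raw


def to_utf8(raw: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical file name, chosen to match the path in the tail input.
    raw = write_gbk_log(Path("basic-file.gsk"), LINES)
    print(to_utf8(raw).decode("utf-8"))
```

Reading the same bytes as if they were UTF-8 would fail or produce garbage, which is why the input needs to be told the source encoding rather than guessing it.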
The encodings supported out of the box, listed by their recognized names, are:
- GB18030 (the current Simplified Chinese standard from the Chinese government, titled Information Technology – Chinese coded character set; it extends GBK)
- GBK (an earlier Simplified Chinese encoding that extends GB2312 and is in turn extended by GB18030)
- UHC (Unified Hangul Code, also known as Extended Wansung, for Korean)
- ShiftJIS (Japanese characters)
- Big5 (for Chinese as used in Taiwan, Hong Kong, Macau)
- Win866 (Cyrillic Russian)
- Win874 (Thai)
- Win1250 (Latin 2 & Central European languages)
- Win1251 (Cyrillic)
- Win1252 (Latin 1 & Western Europe)
- Win1254 (Turkish)
- Win1255 (also known as cp1255 and supports Hebrew)
- Win1256 (Arabic)
- Win2513 (I suspect this should be Win1253, which covers Greek)
These encodings are governed by the WHATWG (Web Hypertext Application Technology Working Group) specification. WHATWG is not a well-known name, but it has an agreement with the better-known W3C covering HTML and related standards.
The Win**** encodings are Windows code pages that predate the adoption of UTF-8 by Microsoft (Win866 originated earlier still, as a DOS code page).
Log Rotation handling
The Tail plugin has also seen another improvement. Working with remote file mounts has been challenging, as it is necessary to ensure that file rotation is properly recognized. To improve rotation recognition, Fluent Bit has been modified to take full advantage of fstat. From a configuration perspective, we won’t see any changes, but when it comes to handling edge cases, the plugin is far more robust.
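To make the idea concrete, here is a minimal Python sketch of the general technique of spotting rotation with fstat. It is not Fluent Bit's implementation, which is written in C and handles many more cases. It assumes a POSIX-style filesystem where a file is identified by its device and inode numbers, and the function name `is_rotated` is hypothetical.

```python
import os


def is_rotated(fd: int, path: str) -> bool:
    """Return True if `path` no longer refers to the file open on `fd`."""
    opened = os.fstat(fd)  # identity of the file we are still reading
    try:
        current = os.stat(path)  # identity of whatever now lives at the path
    except FileNotFoundError:
        # The file was moved away and has not been recreated yet.
        return True
    return (opened.st_dev, opened.st_ino) != (current.st_dev, current.st_ino)


if __name__ == "__main__":
    import tempfile

    log_path = os.path.join(tempfile.mkdtemp(), "app.log")
    with open(log_path, "w") as f:
        f.write("first line\n")

    fd = os.open(log_path, os.O_RDONLY)
    print(is_rotated(fd, log_path))

    # Simulate a rename-style rotation followed by a fresh log file.
    os.rename(log_path, log_path + ".1")
    with open(log_path, "w") as f:
        f.write("new file\n")
    print(is_rotated(fd, log_path))
    os.close(fd)
```

Because the check is made against the open descriptor rather than the file name, the reader can finish draining the old file before switching to the new one.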
Lua scripting for OpenTelemetry
In my opinion, the Lua plugin has been an underappreciated filter. It provides the means to create customized filters and transformers with minimal overhead and effort. Until now, Lua has been limited in its ability to interact with OpenTelemetry payloads. This has been rectified by introducing a new callback signature with an additional parameter, which allows access to the OTLP attributes, enabling examination and, if necessary, the return of a modified set. The new signature does not invalidate existing Lua scripts that use the older three- or four-parameter signatures, so backward compatibility is retained.
The most challenging aspect of using Lua scripts with OpenTelemetry is understanding the attribute values. Given this, let’s just see an example of the updated Lua callback. We’ll explore this feature further in future blogs.
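function cb(tag, ts, group, metadata, record)
    -- Copy the resource-level service name into the log record, if present
    if group['resource']['attributes']['service.name'] then
        record['service_name'] = group['resource']['attributes']['service.name']
    end
    -- Raise severity 9 (INFO) to 13 (WARN)
    if metadata['otlp']['severity_number'] == 9 then
        metadata['otlp']['severity_number'] = 13
        metadata['otlp']['severity_text'] = 'WARN'
    end
    -- Hand the timestamp, metadata, and record back to Fluent Bit
    return 1, ts, metadata, record
end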
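To see what the callback does to a payload, it can help to trace it outside Fluent Bit. The following Python sketch mirrors the logic of the Lua callback above on hand-built sample data. The shapes of `group` and `metadata` are inferred only from the keys the Lua code reads, the sample values are invented, and the `.get` guards are an addition of this sketch, so treat it as an illustration rather than a description of what Fluent Bit actually passes in.

```python
def cb(tag, ts, group, metadata, record):
    """Python mirror of the Lua callback, with guards for missing keys."""
    attributes = group.get("resource", {}).get("attributes", {})
    service = attributes.get("service.name")
    if service:
        record["service_name"] = service

    otlp = metadata.get("otlp", {})
    if otlp.get("severity_number") == 9:  # 9 is INFO in the OTel log model
        otlp["severity_number"] = 13      # 13 is WARN
        otlp["severity_text"] = "WARN"

    return 1, ts, metadata, record


if __name__ == "__main__":
    # Assumed sample payload, for illustration only.
    group = {"resource": {"attributes": {"service.name": "checkout"}}}
    metadata = {"otlp": {"severity_number": 9, "severity_text": "INFO"}}
    record = {"message": "payment retried"}

    code, ts, metadata, record = cb("otel.logs", 1700000000.0, group, metadata, record)
    print(code, metadata, record)
```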
Other enhancements
With nearly every release of Fluent Bit, you can find plugin enhancements to improve performance (e.g., OpenTelemetry) or leverage the latest platform enhancements, such as AWS services.