Tags
Fluentd, logging, mail storm, monitoring, Plugin, social, supress
Fluentd has an incredible catalogue of plugins including notification and collaboration channels from good old-fashioned email through to slack, teams, and others.
The thing to remember if you use these channels is that if you’re sending errors, from application logs it isn’t unusual for there to be multiple error events as a root event can cause a cascade of related issues. For example, if your code is writing transactions to a database and the database goes down with no failover mechanism, then your code will most likely experience an error, roll back the transaction perhaps to some sort of queue, and then try to process the next event. Which will again fail. This is the classic situation where multiple errors will get reported for the same issue. This problem is often referred to as a mail storm given that there was a time when we didn’t have social collaboration tools and everyone used email.

There are several ways to overcome this problem. But the most simple and elegant of these is using the suppress plugin in its filter mode.
<filter **>
@type suppress
interval 60 # period in seconds when the condition to supress is triggered
num 2 # number of occurences of a value before suppressing
attr_keys source # the element of the event to consider.
</filter>
In this example if we encounter an event with an attribute called source containing the same value twice then the suppression will kick in for 60 seconds. If you want the key to the valuebeing checked to be the tag then simply omit the attr_keys parameter.
Of course, we don’t want the suppression to kick in if the same value in the attribute keys occured once every few hours. To address this the occurence count is applied over not a time period, but a number of events received by the configuration of max_slot_num which defaults to 10k, but resets
In the filter mode, this plugin is best positionbed immediately before the match block. This means we don’t accidentally suppress messages before they are routed anywhere else.
For the purposes of a demo this is less of an issue. But for a realworld use case would probably benefit from some tuning. All the documentation for this plugin is at https://github.com/fujiwara/fluent-plugin-suppress