I’ve been peer-reviewing a book in development for Manning called ML Workloads with Kubernetes. The book is in its first review cycle, so it is not yet available in the MEAP programme. I mention this because the book’s first few chapters cover the application of Apache Airflow and Jupyter Notebooks on a Kubernetes platform. It highlights some very flexible capabilities that, while pretty cool, could be seen by some organizations as potential attack vectors (I should say, the authors have engaged with security considerations from the outset). My point is that while we talk about various non-functional considerations, including security, there isn’t a section dedicated to security. So, we’re going to talk directly about some security considerations here.
It would be very easy to consider security as unimportant when it comes to observability – but that would be a mistake, for a few reasons:
Logging Payloads
It is easy to incorporate all of an application’s data payloads into observability signals such as traces and logs. It’s an easy mistake to make during initial development – you just want to see that everything is being handled as intended, so you include the payload. While we can go back and clean this up, or even remove such output as we tidy up the code, these things can slip through the net. Just about any application today will want login credentials. Those credentials are about identifying who we are and determining whether, or what, we can see. The fact that they can uniquely identify us is where we usually run into Data Protection law.
It isn’t unusual for systems to be expected to record who does what and when – all part of common auditing activities. That means our identity is often going to be attached to the data flowing through our application.
This makes anywhere that records this data a potential gold mine, and a lack of diligence will mean that our operational support tools and processes become soft targets.
Code Paths
Our applications will carry details of execution paths – from trace-related activities to exception stacks. We need this information to diagnose issues. It is even possible that the code will handle the issue itself, but it is typical to record the stack trace so we can see that the application has had to perform remediation (even if that is simply because we decided to catch an exception rather than write defensive code). So what? Well, that information tells us as developers what the application is doing – but in the wrong hands, it tells the reader how they can induce errors and which third-party libraries we’re using, which means they can deduce what vulnerabilities we have (see what OWASP says on the matter here).
Sometimes, our answer to a vulnerability might not be to fix it but to introduce mitigation strategies – e.g., we’ll block direct access to a system. The issue with such mitigations is that people will forget why they’re there, or subvert them for the best of reasons, leaving the system accidentally vulnerable again. So, minimizing exposure should be the second line of defense.
How does this relate to Fluent Bit?
Well, the first thing is to assume that Fluent Bit is handling sensitive data, remind ourselves of this from time to time, and even test it. This alone immediately puts us in a healthier place, and we at least know what risks are being taken.
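One way to make that testing concrete is to exercise the pipeline itself. The following is a minimal sketch using Fluent Bit’s dummy input and expect filter: it injects a fake record containing a sensitive-looking field, removes it, and asserts the removal happened before output. The field name and value are purely illustrative.

```
# Sketch: inject a fake record with a hypothetical sensitive field,
# strip it, and assert the pipeline really did remove it.
[INPUT]
    Name   dummy
    Tag    test.sensitive
    Dummy  {"msg":"payment accepted","card_number":"4111111111111111"}

[FILTER]
    Name        record_modifier
    Match       test.*
    Remove_key  card_number

[FILTER]
    Name            expect
    Match           test.*
    key_not_exists  card_number
    action          exit

[OUTPUT]
    Name   stdout
    Match  test.*
```

Running Fluent Bit with a configuration like this as part of a CI check gives an early warning if a later change reintroduces the field.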
Fluent Bit supports SSL/TLS for network traffic
SSL/TLS traffic involves certificates; setting up and maintaining such things can be a real pain, particularly if the processes around managing certificates haven’t been carefully thought through and automated. Imposing the management of certificates with manual processes is the fastest way to kill off their adoption and use. Within an organization, certificates don’t have to be expensive ones backed by big payouts if compromised, such as those provided by companies like Thawte and Symantec. The Linux Foundation with Let’s Encrypt and protocols like ACME (Automated Certificate Management Environment) make certificates cost-free and provide automation for regular certificate rotation.
Don’t get suckered by the idea that terminating SSL/TLS at the perimeter is acceptable today. It used to be an acceptable thing to do because, among other reasons, processing certificates carried a measurable overhead. Moore’s law has seen to it that such computational overhead is now tolerable, if not a fraction of a percent of cost. If you’re not convinced, consider that there is sufficient drive behind this that Kubernetes platforms support mutual TLS between containers that are more than likely running on the same physical server.
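As a minimal sketch of what this looks like in practice, here is a forward output with TLS enabled, including client certificates for mutual TLS. The host name and certificate paths are placeholders for your own environment.

```
# Sketch: forwarding telemetry to an aggregator over TLS.
# Host and certificate paths are illustrative placeholders.
[OUTPUT]
    Name          forward
    Match         *
    Host          aggregator.example.internal
    Port          24224
    tls           On
    tls.verify    On
    tls.ca_file   /etc/fluent-bit/certs/ca.crt
    tls.crt_file  /etc/fluent-bit/certs/client.crt
    tls.key_file  /etc/fluent-bit/certs/client.key
```

The same tls settings apply to network inputs, so a node-level Fluent Bit can also insist that anything connecting to it does so over an encrypted, verified channel.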
Start by considering file system permissions for logs
If you’re working with applications or frameworks that direct logs to local files, you can do a couple of things. First, control the permissions on the files.
Second, many frameworks that support logging configuration don’t do anything further with the log files themselves (although some, like Airflow, do). For those cases where the log location doesn’t have a behavioral impact, we can control where the logs are being written. Structuring logs into a common part of the file system can make things easier to manage, certainly from a file system permissions viewpoint.
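To illustrate the benefit, here is a hedged sketch of a tail input collecting from a single, access-controlled directory rather than paths scattered across the file system. The path is illustrative; the permissions themselves are still set at the OS level (e.g., owned by the service account and not world-readable), with Fluent Bit’s user granted read access.

```
# Sketch: collect application logs from one controlled location.
# /var/log/myapps is a hypothetical directory for this example.
[INPUT]
    Name              tail
    Tag               app.logs
    Path              /var/log/myapps/*.log
    Path_Key          source_file
    Refresh_Interval  10
```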
Watching for sensitive data bleed
If you’re using Fluent Bit to consolidate telemetry into systems like Loki, then we should be running regular scans to ensure that no unplanned sensitive data is being captured. We can use tools like Telemetrygen to inject known values into the event stream to test this process and confirm the injected values are detected.
If or when such a situation occurs, the ideal solution is to fix the root cause. But this isn’t always possible – the issue may come from a 3rd-party library, an organization may be reluctant to make changes, or production changes may be slow. In these scenarios, as discussed in the book, we can use Fluent Bit configurations to mitigate the propagation of such data. But as we said earlier, if you use mitigations, it warrants verifying that they aren’t accidentally undone, which takes us back to the start of this point.
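For illustration, a mitigation of this kind can be as simple as a modify filter stripping the offending fields before records leave the node. The key names below are hypothetical; in practice they would be whatever your scans or canary tests actually surface.

```
# Sketch: strip hypothetical sensitive fields before forwarding.
[FILTER]
    Name             modify
    Match            app.*
    Remove           password
    Remove           authorization
    Remove_wildcard  credit_card
```

Because this filter is itself the mitigation, it is exactly the sort of configuration the earlier canary-style test should be checking is still in place.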
Classifying and Tagging data
Telemetry, particularly traces and logs, can be classified and tagged to reflect its origin and the nature of the event. This is best done nearest the source, as understanding the origin helps the classification process. This is something Fluent Bit can do easily, routing events accordingly, as we can see in the book.
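As a sketch of how that classification-and-routing can look, the rewrite_tag filter can re-tag records that carry a particular field – here a hypothetical user identifier – so they can be sent to a more tightly controlled destination. Field names, tags, and the host are illustrative.

```
# Sketch: re-tag records containing a user identifier so they route
# to a restricted destination; names here are hypothetical.
[FILTER]
    Name   rewrite_tag
    Match  app.*
    Rule   $user_id  ^.+$  pii.app  false

[OUTPUT]
    Name   forward
    Match  pii.*
    Host   restricted-aggregator.example.internal
    Port   24224
    tls    On
```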
Don’t run Fluent Bit as root
Not running Fluent Bit with root credentials is security 101. But it is tempting when you want to use Fluent Bit to tap in and listen to the OS and platform logs and metrics, particularly if you aren’t a Linux specialist. It is worth investing in an OS base configuration that is secure while not preventing your observability. This doesn’t automatically mean you must use containers. Bare metal, etc., can be secured by not installing from a vendor base image but from an image you’ve built, or, even simpler, by taking the base image and then using tools like Chef, Ansible, etc., to impose a configuration over the top.
Bottom Line
The bottom line is that our observability processes and data should be subject to the same care and consideration as our business data, and that security should never be an afterthought – not something we bolt on just before go-live, and pervasive rather than applied only at the boundary.
When I learnt to drive (in the dark ages), one of the things I was told was: if you assume that everyone else on the road is a clueless idiot, then you’ll be ok. We should treat systems development and the adoption of security the same way – if we assume someone is likely to make a mistake and take defensive steps, then we’ll be ok. This will give us security in depth.
