Fluent Bit and AI: Unlocking Machine Learning Potential

These days, everywhere you look there are references to Generative AI, so the question naturally arises: what have Fluent Bit and GenAI got to do with each other? GenAI has the potential to help with observability, but it also needs observing itself – to measure its performance, detect whether it is being abused, and so on. You may recall that a few years back, Microsoft was trialing new AI features for Bing, and after only a couple of days in use, it had been recorded generating abusive comments (Microsoft’s earlier Tay chatbot is another such example).

But this isn’t the aspect of GenAI (or the foundations of AI with Machine Learning (ML)) I was thinking about. Fluent Bit can be linked to GenAI through its TensorFlow plugin. Is this genuinely of value or just a bit of ‘me too’?

There are plenty of backend use cases once the telemetry has been incorporated into an analytics platform, for example:

  • Making it easy to query and mine the observability data, such as natural language searching – to simplify expressing what is being looked for.
  • Outlier / Anomaly detection – when signals, particularly metrics, diverge from the normal patterns of behavior, we have the first signs of a problem. This is more Machine Learning than generative AI.
  • Using AI agents to tune monitoring thresholds and alerting scenarios

But these are all backend, big data style use cases and do not center on Fluent Bit’s core value of getting data sources to appropriate destination systems for such analysis or visualization.

To incorporate AI into Fluent Bit pipelines, we need to overcome a key issue – AI tends to be computationally heavy – making it potentially too slow for streams of signals being generated by our applications and too expensive given that most logs reflecting ‘business as usual’ are, in effect, low value.

There are some genuine use cases where lightweight AI can deliver value. First, we should be a little more precise: the TensorFlow plugin uses the TensorFlow Lite version, also known as LiteRT. The name comes from the fact that it is a lightweight solution intended to be deployable on small devices (by AI standards). This fits the Fluent Bit model of having a small footprint.
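
For orientation, wiring the TensorFlow Lite filter into a pipeline looks something like the following. This is a minimal sketch: the match tag, model file, and input field are illustrative, and the normalization value depends on how your model was trained.

[FILTER]
    name tensorflow
    match app.images
    input_field image
    model_file /models/classifier.tflite
    include_input_fields false
    normalization_value 255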

So, where might such use cases fit?

  • Translating stack traces into actionable information can be challenging. A trained ML or AI model can help classify and characterize the cause of a stack trace. As a result, we can move from the log to triggering appropriate actions.
  • Targeted use cases where we’ve filtered out most signal data to help analyze specific events – for example, preventing the propagation of PII data downstream. Some PII data can be easily isolated through regex patterns; credit card numbers, for instance, follow a pattern of four groups of four digits, and phone numbers and email addresses can also be easily identified. However, postal addresses aren’t easy, particularly when handling multinational addresses, where the postal code/zip code can’t be used as an indicative pattern. Using AI to help with such checks means we must first filter the signals so we only examine messages that could accidentally carry such information.

When adopting AI in such scenarios, we have to be aware of the problems that can impact the use of ML and AI. These issues are less high profile than hallucinations but just as important. The software we’re observing will change over time; as a result, payloads shift (technically referred to as data drift) and the detection rate can drop. So, we need to measure the efficacy of the model and take data drift into account, as the scenario being detected may change in volume, reflecting changes in software usage and/or changes in how the solution works.

There are ways to help address such considerations, such as tracking false-positive outcomes and, if the model can provide confidence scoring, watching for trends in the score.

Conclusion

There are good use cases for using Machine Learning (and, to an extent, Artificial Intelligence) within an observability pipeline – but we have to be selective in its application as:

  • The cost of the computation can outweigh the benefits
  • The execution time for such computation can be notably slower than our pipeline, leading to risks of back pressure if applied to every event in the pipeline.
  • The effectiveness can degrade as data drift occurs (we might initially see very good results, but then things fall off).

Possibly, the most useful application is when the AI/ML engine has been trained to recognize patterns of events that preceded a serious operational issue (strictly, this is the use of ML).

Forward-looking

The true potential for Gen AI is when we move beyond isolating potential faults based on pattern recognition to using AI to help recommend or even trigger remediation processes.

Fluent Bit 3.2: YAML Configuration Support Explained

Among the exciting announcements for Fluent Bit 3.2 is that support for YAML configuration is now complete. Until now, there had been some outliers, such as parser and stream processor configurations, which hadn’t been made YAML compliant.

As a result, the definitions for parsers and streams had to remain separate files. That is no longer the case, and it is possible to incorporate parser definitions within the same configuration file. While separate configuration files for parsers make for easier re-use, it is more troublesome when incorporating the configuration into a Kubernetes deployment configuration, particularly when using a side-car deployment.

Parsers

With this advancement, we can define parsers like this:

Classic Fluent Bit

[PARSER]
    name myNginxOctet1
    format regex
    regex (?<octet1>\d{1,3})

YAML Configuration

parsers:
  - name: myNginxOctet1
    format: regex
    regex: '/(?<octet1>\d{1,3})/'

As the examples show, we swap [PARSER] for a parsers object. Each parser is then an entry in an array, with its attributes starting with the parser name. The names follow a one-to-one mapping in most cases. This does break down for parsers where we can define a series of values, which in classic format would just be read in order.

Multiline Parsers

When using multiline parsers, we must provide different regular expressions for different lines. In this situation, we see each set of attributes become a list entry, as we can see here:

Classic Fluent Bit

[MULTILINE_PARSER]
  name multiline_Demo
  type regex
  key_content log
  flush_timeout 1000
  #
  # rule|<state name>|<regex>|<next state>
  rule "start_state" "^[{].*" "cont"
  rule "cont" "^[-].*" "cont"

YAML Configuration

multiline_parsers:
  - name: multiline_Demo
    type: regex
    rules:
    - state: start_state
      regex: '^[{].*'
      next_state: cont
    - state: cont
      regex: "^[-].*"
      next_state: cont

In addition to how the rules are nested, we have moved from several parameters packed into a single attribute (rule) to each rule having several discrete elements (state, regex, next_state). The example also mixes single and double quote marks, as YAML accepts either.

If you want to keep the configurations for parsers and streams separate, we can continue to do so, referencing the file and name from the main configuration file. While converting the existing conf file to YAML format is the bulk of the work, in all likelihood you’ll also change the file extension to .yaml, which means you must update the parsers_file reference in the service section of the main configuration file.
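
For example, in the main YAML configuration, the service section reference might look like this minimal sketch (the file name is illustrative):

service:
  flush: 1
  parsers_file: parsers.yaml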

Streams

Streams follow very much the same path as parsers. However, we do have to be a lot more aware of the query syntax to remain within the YAML syntax rules.

Classic Fluent Bit

[STREAM_TASK]
  name selectTaskWithTag
  exec SELECT record_tag(), rand_value FROM STREAM:random.0;

[STREAM_TASK]
  name selectSumTask
  exec SELECT now(), sum(rand_value)   FROM STREAM:random.0;

[STREAM_TASK]
  name selectWhereTask
  exec SELECT unix_timestamp(), count(rand_value) FROM STREAM:random.0 where rand_value > 0;

YAML Configuration

stream_processor:
  - name: selectTaskWithTag
    exec: "SELECT record_tag(), rand_value FROM STREAM:random.0;"
  - name: selectSumTask
    exec: "SELECT now(), sum(rand_value) FROM STREAM:random.0;"
  - name: selectWhereTask
    exec: "SELECT unix_timestamp(), count(rand_value) FROM STREAM:random.0 where rand_value > 0;"

Note that it is pretty common for Fluent Bit YAML to use the plural form for each of the main blocks, although the stream definition is an exception to this. Additionally, both stream_processor and stream_task are accepted (although stream_task is not recognized in the main configuration file).

Incorporating Configuration directly into the core configuration file

To support directly incorporating these definitions into a single file, we can lift the YAML file contents and apply them as root elements (i.e., at the same level as pipeline and service).
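
For example, here is a minimal single-file sketch, reusing the parser from earlier (the dummy input and stdout output are just placeholders):

service:
  flush: 1

parsers:
  - name: myNginxOctet1
    format: regex
    regex: '/(?<octet1>\d{1,3})/'

pipeline:
  inputs:
    - name: dummy
      tag: demo
  outputs:
    - name: stdout
      match: '*'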

Fluent Bit book examples

Our Fluent Bit book (Manning, Amazon UK, Amazon US, and everywhere else) has several examples of using parsers and streams in its GitHub repo. We’ve added the YAML versions of the configurations illustrating parsers and stream processing to its repository in the Extras folder.

Binary Large Objects with Fluent Bit

When I first heard about Fluent Bit introducing support for binary large objects (BLOBs) in release 3.2, I was a bit surprised; handling such data structures in a telemetry pipeline is atypical, and some might see it as an anti-pattern. Certainly, trying to pass such large objects through the buffers could very quickly blow things up unless the buffers are suitably sized.

But rather than rush to judgment, I gave it a little thought, and some genuine use cases for handling blobs became clear. The scenarios where I’d look to blobs to help are:

  • Microsoft applications can create dump files (.dmp). These bundle not just the stack traces but also the state, which can include a memory dump and contextual data. The file is binary in nature, and guess what? It can be rather large.
  • While logs, traces, and metrics can tell us a lot about why a component or application failed, sometimes we have to see the payload being processed – is there something in the data we never anticipated? We are increasingly handling such payloads, even with remote and distributed devices – notably images and audio. While we can compress these kinds of payloads, sometimes that isn’t possible, as we lose fidelity through compression, and the act of compression can remove the very artifact we need.

Real-world use cases

I’d encountered this latter scenario previously. We worked with a system designed to send small images as part of product data through a messaging system, so the data could be distributed to many endpoints. The master data authoring system didn’t impose any restrictions on image size. As a result, when setting up some new products in the supply chain system, a new user uploaded the ultra-high-resolution marketing images before they’d been prepared for general use. As you can imagine, these were multi-gigabyte images, not the tens or hundreds of kilobytes expected. The messaging layer’s allocated storage structures couldn’t cope with the payload.

We had to remotely access the failure points at the time to see what was happening and identify the issue. While the environment was distributed, it wasn’t as distributed as systems can be today, so remote access wasn’t so problematic. But in a more distributed use case, or where the data could have been submitted to the enterprise more widely, we’d probably have had more problems. Here is a case where being able to move a blob would have helped.

A similar use case was described in the recent release webinar presented by Eduardo Silva Pereira. With modern cars, particularly self-driving vehicles, being able to transfer imagery back when the navigation software experiences a problem is essential.

Avoiding blowing up the buffers

To move the blob without blowing up the buffering, the input plugin tells the blob-consuming output plugin about the blob rather than trying to shunt gigabytes through the buffer. The output plugin (e.g., Azure Blob) takes the signal and then copies the file piece by piece. By consuming the blob in parts, we reduce the possible impact of network disruption (ever tried to FTP a very large file over a network, only for the connection to briefly drop, forcing you to start again from scratch?). The sender and receiver use a database table to track the communication and progress of the pieces and reassemble the blob. Unlike other plugins, there is a reverse flow from the output plugin back to the input plugin so the process can be monitored. Once complete, the input plugin can execute post-transfer activities.

This does mean that the output plugin must have a network ‘line of sight’ to the blob. That isn’t an issue when everything is handled within a single Fluent Bit node, but it is something to consider if you want to operate in a more distributed model.

A word to the wise

Binary objects are known to be a means by which malicious code can easily be transported within an organization. This means that while observability tooling can benefit from being able to centralize problematic data for us to examine further, we could unwittingly help a malicious actor.

We can protect ourselves in several ways. Firstly, we must understand and ensure that the source location for the blob can only contain content that we know and understand. Secondly, wherever the blob is put, make sure it is ring-fenced and that the content is subject to processes such as malware detection.

Limitations

As the blob is handled as a new payload type, the details transmitted aren’t accessible to any other plugins; given how the mechanism works, trying to do such things wouldn’t be very desirable anyway.

Input plugin configuration

At the time of writing, the plugin configuration details haven’t been published, but with the combination of the CLI and looking at the code, we do know the input plugin has these parameters:

Attribute Name – Description

path – Location to watch for blob files, just like the path for the tail plugin.
exclude_pattern – Patterns that exclude files other than our blob files. The pattern logic is the same as all other Fluent Bit patterns.
database_file – The database file used to track the communication and progress of each blob transfer.
scan_refresh_interval – How often the watched path is rescanned for new blob files.
upload_success_action – Tells the plugin what to do when an upload succeeds. The options are:
  0. Do nothing – the default action if no option is provided.
  1. delete – delete the blob file.
  2. add_suffix – add a suffix to the file, as defined by upload_success_suffix.
  3. emit_log – emit a Fluent Bit log record.
upload_success_suffix – If upload_success_action is set to use a suffix, the value provided here will be used as the suffix.
upload_success_message – Text that will be incorporated into the Fluent Bit logs.
upload_failure_action – The same options as upload_success_action, but applied if the upload fails.
upload_failure_suffix – The failure version of upload_success_suffix.
upload_failure_message – The failure version of upload_success_message.
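
Putting those attributes together, a sketch of an input definition might look like the following. Treat this as an assumption-laden example: the plugin name blob and the exact value formats are inferred from the code rather than published documentation, and the paths are illustrative.

[INPUT]
    name blob
    path /var/app/dumps/*.dmp
    database_file blob-progress.db
    upload_success_action delete
    upload_failure_action add_suffix
    upload_failure_suffix .failed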

Output Options

Currently, the only blob output option is the Azure Blob output plugin, which works with the Azure Blob service, but support for the Amazon S3 API is being worked on. Once that is available, the feature will be far more widely usable, as the S3 API is widely supported, including by all the hyperscalers.

Note

The configuration information has been figured out by looking at the code. We’ll return to this subject when the S3 endpoint is provided and use something like Minio to create a local S3 storage capability.

Securing Fluent Bit operations

I’ve been peer-reviewing a book in development called ML Workloads with Kubernetes for Manning. The book is in its first review cycle, so it is not yet available in the MEAP programme. I mention this because the book’s first few chapters cover the application of Apache Airflow and Jupyter Notebooks on a Kubernetes platform. It highlights some very flexible things that, while pretty cool, could be seen by some organizations as potential attack vectors (I should say, the authors have engaged with security considerations from the outset). My point is that while we talk about various non-functional considerations, including security, there isn’t a section dedicated to it. So, we’re going to talk directly about some security considerations here.

It would be very easy to consider security as not being important when it comes to observability – but that would be a mistake, for a few reasons:

Logging Payloads

It is easy to incorporate all of an application’s data payloads into observability signals such as traces and logs. It’s an easy mistake to make during initial development – you just want to see that everything is being handled as intended, so you include the payload. While we can go back and clean this up or even remove such output as we tidy up the code, these things can slip through the net. Just about any application today will want login credentials. Credentials are about identifying who we are and determining whether and what we can see. The fact that they can uniquely identify us is where we usually run into data protection law.

It isn’t unusual for systems to be expected to record who does what and when – all part of common auditing activities. That means our identity is going to often be attached to data flowing through our application.

This makes anywhere that records this data a potential gold mine, and a lack of diligence will mean that our operational support tools and processes become soft targets.

Code Paths

Our applications will carry details of execution paths – from trace-related activities to exception stacks. We need this information to diagnose issues – it is even possible that the code will handle the issues, but it is typical to record the stack trace so we can see that the application has had to perform remediation (even if that is simply because we decided to catch an exception rather than write defensive code). So what? Well, that information tells us as developers what the application is doing – but in the wrong hands, it tells the reader how to induce errors and which third-party libraries we’re using, which means they can deduce what vulnerabilities we have (see what OWASP says on the matter here).

Sometimes, our answer to a vulnerability might not be to fix it but to introduce mitigation strategies – e.g., we’ll block direct access to a system. The issue with such mitigations is that people will forget why they’re there or subvert them for the best of reasons, leaving us accidentally vulnerable again. So, minimizing exposure should be the second line of defense.

How does this relate to Fluent Bit?

Well, the first thing is to assume that Fluent Bit is handling sensitive data, remind ourselves of this from time to time, and even test it. This alone immediately puts us in a healthier place, and we at least know what risks are being taken.

Fluent Bit supports SSL/TLS for network traffic

SSL/TLS traffic involves certificates; setting up and maintaining such things can be a real pain, particularly if the processes around managing certificates haven’t been carefully thought through and automated. Imposing the management of certificates with manual processes is the fastest way to kill off their adoption and use. Within an organization, certificates don’t have to be expensive ones that offer big payouts if compromised, such as those provided by companies like Thawte and Symantec. The Linux Foundation with Let’s Encrypt and protocols like ACME (Automated Certificate Management Environment) make it cost-free and provide automation for regular certificate rotation.
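
Enabling TLS on Fluent Bit’s network plugins is mostly a matter of a few extra attributes. Here is a minimal sketch for a forward output (the host, port, and certificate path are illustrative):

[OUTPUT]
    name forward
    match *
    host aggregator.example.net
    port 24224
    tls on
    tls.verify on
    tls.ca_file /etc/fluent-bit/certs/ca.pem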

Don’t get suckered by the idea that SSL termination (stripping) at the perimeter is acceptable today. It used to be an acceptable thing to do because, among other reasons, the processing of certificates carried a measurable overhead. Moore’s law has seen to it that such computational overhead is now tolerable, if not a fraction of a percent. If you’re not convinced, consider that there is sufficient demand that Kubernetes supports mutual TLS between containers that are more than likely running on the same physical server.

Start by considering file system permissions on logs

If you’re working with applications or frameworks that direct logs to local files, you can do a couple of things. First, control the permissions on the files.

Many frameworks that support logging configuration attach no significance to where the logs are written (although some, like Airflow, do). For those cases where log location doesn’t have a behavioral impact, we can control where the logs are written. Structuring logs into a common part of the file system can make things easier to manage, certainly from a file-system permissions viewpoint.

Watching for sensitive data bleed

If you’re using Fluent Bit to consolidate telemetry into systems like Loki, etc., then we should be running regular scans to ensure that no unplanned sensitive data is being caught. We can use tools like Telemetrygen to inject values into the event stream to test this process and see if the injected values are detected.

If or when such a situation occurs, the ideal solution is to fix the root cause. But this isn’t always possible: the issue may come through a 3rd-party library, an organization may be reluctant to make changes, or production changes may be slow. In these scenarios, as discussed in the book, we can use Fluent Bit configurations to mitigate the propagation of such data. But as we said earlier, if you use mitigations, it warrants verifying they haven’t been accidentally undone, which takes us back to the start of this point.

Classifying and Tagging data

Telemetry, particularly traces and logs, can be classified and tagged to reflect its origin and the nature of the event. This is best done nearest the source, as understanding the origin helps the classification process. This is something Fluent Bit can easily do, routing events accordingly, as we show in the book.

Don’t run Fluent Bit as root

Not running Fluent Bit with root credentials is security 101. But it is tempting when you want to use Fluent Bit to tap in and listen to the OS and platform logs and metrics, particularly if you aren’t a Linux specialist. It is worth investing in getting an OS base configuration that is secure while not preventing your observability. This doesn’t automatically mean you must use containers. Bare metal, etc., can be secured by not installing from a vendor base image but an image you’ve built, or even simpler, taking the base image and then using tools like Chef, Ansible, etc., to impose a configuration over the top.

Bottom Line

The bottom line: our observability processes and data should be subject to the same care and consideration as our business data. Security should never be an afterthought that we bolt on just before go-live, and it should be pervasive rather than applied only at the boundary.

When I learnt to drive (in the dark ages), one of the things I was told was: if you assume that everyone on the road is a clueless idiot, then you’ll be ok. We should treat systems development and the adoption of security the same way – if you assume someone is likely to make a mistake and take defensive steps, then we’ll be ok. This gives us security in depth.

Books Books Books

Today we got the official notification that our book has been published …

Logs and Telemetry book - order option

As you can see, the eBook is now available. The print edition can be purchased from Thursday (24th Oct). If you’ve been a MEAP subscriber, you should be able to download the complete book. The book will start showing up on other platforms in the coming weeks (Amazon UK has set an availability date, and on Amazon.com you can preorder).

There are some lovely review quotes as well:

A detailed dive into building observability and monitoring.
– Jamie Riedesel, author of Software Telemetry

Extensive real-life examples and comprehensive coverage! It’s a great resource for architects, developers, and SREs.
– Sambasiva Andaluri, IBM

A must read for anyone managing a critical IT system. You will truly understand what’s going on in your applications and infrastructure.
– Hassan Ajan, Gain Momentum

And there is more …

I hadn’t noticed until today, but the partner book Logging in Action, which covers Fluentd, is available in ebook and print as well as audio and video editions. As you can see, these are available on Manning and platforms like O’Reilly/Safari…

In Logging in Action you will learn how to:

  • Deploy Fluentd and Fluent Bit into traditional on-premises, IoT, hybrid, cloud, and multi-cloud environments, both small and hyperscaled
  • Configure Fluentd and Fluent Bit to solve common log management problems
  • Use Fluentd within Kubernetes and Docker services
  • Connect a custom log source or destination with Fluentd’s extensible plugin framework
  • Apply logging best practices and avoid common pitfalls

Logging in Action is a guide to optimizing and organizing logging using the CNCF Fluentd and Fluent Bit projects. You’ll use the powerful log management tool Fluentd to solve common log management problems, and learn how proper log management can improve performance and make managing software and infrastructure solutions easier. Through useful examples like sending log-driven events to Slack, you’ll get hands-on experience applying structure to your unstructured data.

I have to say that my digital twin, who narrated the book, sounds pretty intelligent.

Update

Amazon UK is now correct and has an availability date.

shhh – Fluent Bit book has gone to the printers, and …

I thought you might like to know that last week, the production process on the book (Logs and Telemetry with Fluent Bit, written with the working title of Fluent Bit with Kubernetes) was completed, and the book should be on its way to the printers. In the coming weeks, you’ll see the MEAP branding disappear, and the book will appear in the usual places.

If you’ve been brilliant and already purchased the book – the finished version will be available to download soon, and for those who have ordered the ‘tree’ media version – a few more weeks and ink and paper will be on their way.

As part of the promotion, we will be doing a webinar with the book’s sponsor, Chronosphere. To register for the webinar, go to https://go.chronosphere.io/fluent-bit-with-kubernetes-meet-the-author.html

Migrating from Fluentd to Fluent Bit

Earlier in the year, I made a utility available that supported the migration from Fluent Bit classic configuration format to YAML. I also mentioned I would explore the migration of Fluentd to Fluent Bit. I say explore because while both tools have a common conceptual foundation, there are many differences in the structure of the configuration.

We discussed the bigger ones in the Logs and Telemetry book. But as we’ve been experimenting with creating a Fluentd migration tool, it is worth exploring the fine details and discussing how we’ve approached it as part of a utility to help the transformation.

Routing

Many of the challenges come from a key difference in routing and the consumption of events from the buffer. Fluentd assumes that an event is consumed by a single output; if you want to direct the event to more than one output, you need to copy it. Fluent Bit looks at things very differently, with every output plugin having the potential to output every event – the determination of output is controlled by the match attribute. These two approaches put a different emphasis on the ordering of declarations: Fluentd depends on it, while Fluent Bit focuses on the use of tags and match declarations to control the routing of output.

  <match *>
    @type copy
    <store>
      @type file
      path ./Chapter5/label-pipeline-file-output
      <buffer>
        delayed_commit_timeout 10
        flush_at_shutdown true
        chunk_limit_records 50
        flush_interval 15
        flush_mode interval
      </buffer>
      <format>
        @type out_file
        delimiter comma
        output_tag true
      </format> 
    </store>
    <store>
      @type relabel
      @label common
    </store>
  </match>
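
For comparison, here is a sketch of how the same fan-out intent might be expressed in Fluent Bit YAML – both outputs simply match the same events rather than one copying to the other (the tag, paths, and the stdout stand-in for the relabel store are illustrative):

pipeline:
  outputs:
    - name: file
      match: 'label-pipeline.*'
      path: ./Chapter5
      file: label-pipeline-file-output
    - name: stdout
      match: 'label-pipeline.*'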

Hierarchical

We can also see that Fluentd’s directives are more hierarchical (e.g., buffer and format sit within the store) than the structures used by Fluent Bit, so we need to be able to ‘flatten’ the hierarchy. As a result, it makes sense that where a copy occurs, we define each of the copy declaration’s stores as its own output plugin.

Buffering

There is a notable difference between the outputs’ buffer configurations: in Fluent Bit, an output can only control how much filesystem storage can be used, whereas in Fluentd, as the preceding example shows, we can set the flushing frequency and control the number of chunks involved (regardless of storage type).

Pipelines

Fluentd allows us to implicitly define multiple pipelines of sources and destinations, as the ordering of declarations and event consumption is key. In addition to this, we can group plugin behavior with the Fluentd label attribute. But the YAML representation of a Fluent Bit configuration doesn’t support this idea.

<source>
  @type dummy
  tag dummy
  auto_increment_key counter
  dummy {"hello":"me"}
  rate 1
</source>
<filter dummy>
  @type stdout
</filter>
<match dummy>
  @id redisTarget
  @type redislist
  port 6379
</match>
<source>
  @id redisSource
  @type redislist
  tag redisSource
  run_interval 1
</source>
<match *>
  @type stdout
</match>

Secondary outputs

Fluentd also supports the idea of a secondary output, as the following fragment illustrates: if the primary output fails, the event can be written to an alternate location. Fluent Bit doesn’t have an equivalent mechanism, so for the mapping tool, we’ve taken the view that we should create a separate output.

<match *>
    @type roundrobin
    <store> 
      @type forward
      buffer_type memory
      flush_interval 1s  
      weight 50
      <server>
        host 127.0.0.1
        port 28080
      </server>  
    </store>
    <store>
      @type forward
      buffer_type memory
      flush_interval 1s        
      weight 50
      <server>
        host 127.0.0.1
        port 38080
      </server> 
    </store>
  <secondary>
    @type stdout
  </secondary>
</match>
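
For instance, the tool might render the primary stores and the <secondary> block as sibling outputs, with comments tying each back to the original. This is a sketch: the match tag is illustrative, and the round-robin weighting has no direct Fluent Bit equivalent.

pipeline:
  outputs:
    # mapped from <store> @type forward (host 127.0.0.1, port 28080)
    - name: forward
      match: '*'
      host: 127.0.0.1
      port: 28080
    # mapped from <store> @type forward (host 127.0.0.1, port 38080)
    - name: forward
      match: '*'
      host: 127.0.0.1
      port: 38080
    # mapped from <secondary> @type stdout
    - name: stdout
      match: '*'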

The reworked structure requires consideration for the matching configuration, which isn’t so easily automated and can require manual intervention. To help with this, we’ve included an option to add comments to link the new output to the original configuration.

Configuration differences

While the plugins have a degree of consistency, a closer look shows that there are attributes and, as a result, plugin features that don’t translate. To address this, we comment out such attributes so that they remain visible in the translated configuration, allowing manual modification.

Conclusion

While the tool we’re slowly piecing together will do a lot of the work in converting Fluentd to Fluent Bit, there aren’t exact correlations for all attributes and plugins. So the utility will only be able to perform the simplest of mappings without developer involvement. But we can at least help show where the input is needed.

Speaker Upgrade – how I decided what was good

With some recent good news from work, I decided to treat myself to a speaker upgrade – Acoustic Energy 500s sat on some IsoAcoustics Aperta stands. While these would be considered audiophile, they’re still at the lower end – we’re not talking audio exotica like the B&W Nautilus at nearly a hundred thousand pounds or the Cosmotron 130 at around the million-pound mark.

Bowers & Wilkins – Nautilus Speaker – a snip at £90,000

So how can I decide and justify the expenditure, even if it’s a fraction of the loose change from the back of the sofa needed to buy these monsters? As friends have said to me in the past, surely the Samsung speakers on my stereo are just as good. Well, there is a raft of things that will prevent speakers from performing well, from positioning to the quality of their source.

Million Pound Cosmotron speaker
Cosmotron priced at £1M

The source material is often one of the biggest issues, particularly for rock and pop pushing the envelope on CD. We saw what became known as the loudness wars, where the dynamic range of the music was reduced. But music with a wide dynamic range on good speakers is great. One characteristic of good speakers is the containment of distortion – if a song is mostly quiet with occasional moments of loudness, the speaker drivers (cones) will react properly when a sudden spike in the signal occurs: the sudden movement of the magnet driving the cone is handled, rather than the speaker surface straining against its mounts.

Better speakers will result in better control of the cone (the visible bit of the speaker), making the cone’s movements more precise and revealing more detail in the music. You’ll go from hearing a cymbal to being able to tell how the cymbal was struck; a drum is no longer just a thump, but you’ll start to hear it resonate.

The cone moves backward and forwards to move the air, which affects air inside the speaker, not just outside. We don’t want the speaker casing to behave as a suction cup, preventing air movement and inhibiting the cone’s movement.

Improvements in speaker performance can help you recognize little details. For example, with a vocal performance, you’ll start to hear fine details, such as air drawn over the microphone as the singer inhales. You can also hear changes as a singer moves close to or away from the microphone, even if they alter their vocal volume.

I was experimenting with some loaned hi-fi kit once, listening to a Jamie Cullum live performance, and a detail that leapt out as I swapped a piece of equipment in and out was what sounded like background ambient noise, such as air conditioning. But suddenly, it became clear I wasn’t picking up ambient noise but the fan that was positioned behind Jamie.

It is always useful to have some good go-to pieces of music for trying out hi-fi. Being familiar with the music and knowing the production values applied means that if there are improvements, you’ll pick them up. So, what are my go-to pieces at the moment?

  • Tori Amos – Me and a Gun — although any part of Little Earthquakes is good. This song is an acapella performance, recounting a rape. With just a voice, the miking of the vocal is very close, and you can hear the inhalation and the rawness of the performance.
  • Beth Orton – Weather Alive — probably Beth’s best album to date. Here is another incredible voice, but also more delicate than Tori Amos, so the better the HiFi, the purer the performance will sound.
  • GoGo Penguin – Branches Break from Man Made Object – although just about any of their work will be good. This is a trio of piano, bass, and drums in a jazz/minimalist classical/chill beat crossover. The recording should feel like it’s being performed in a big, live-sounding room, but you’ll hear each instrument clearly, right down to recognizing the loudness, varying attack, and decay of each note played.
  • Rush – Red Sector A from Grace Under Pressure – perhaps not the best-produced album in the world, but made before the loudness wars really took hold. Rush were a real bunch of prog rock musos, with the late Neil Peart considered by many to be one of the best drummers ever. This track will test the hi-fi in terms of control – the drumming has a huge range of very fine cymbal work, some really deep bass drums, and tom-tom runs that make Phil Collins’ In The Air Tonight sound like child’s play.
  • Elbow – One Day Like This from The Seldom Seen Kid (Live At Abbey Road Studios) – with a high-quality recording (Abbey Road’s special Half Speed Mastered edition), you’ll get a sense of staging and, as the song grows, scale with the choir. The strings will be natural and nuanced, and in the early parts of the performance you’ll hear how dry Guy’s voice is – not a hint of vibrato or sibilance.
  • Peter Gabriel – The Book Of Love from Scratch My Back – another performance that should give a sense of staging and breadth, with great dynamics as the strings swell and subside. It’s fronted by Peter’s voice, which should sound weathered and world-worn.

The list of music could go on. But, ultimately, it’s a very individual choice.

Final anecdote

Buying hi-fi is subject to the law of diminishing returns. As systems get better and better, the parts needed become more expensive and are produced in smaller numbers, making the R&D more expensive, with costs to be covered by a small number of sales. But still, these esoteric, bank-crushing systems are amazing.

Some years back, I went to a hi-fi show; if you’ve never been to such a show, then picture this. A corridor of rooms is stripped of the beds and furnishings other than some chairs. Each company has a room and typically sets up its demo kit where the head of the bed would usually be. Everything would be positioned and mounted on professional hi-fi tables, etc., for the absolute best performance. The classic hotel room layout means that as you walk into the room, you won’t see what is set up, so the few seconds it takes to walk past what is normally the bathroom are almost a blind test: you can’t see the hi-fi, but you can hear it.

So here we are, walking into a room that was pretty busy, meaning we didn’t see the main space for a minute or so, and we hear a performance of a beautifully played unaccompanied double bass. I could have sworn there was a musician in the room performing – the performance had that warmth, depth, and volume you’d expect, with no hint of any recording artifacts. When we got to the main part of the room, we were stunned to see two speakers, big and rather boxy – no audio-exotica beauty like the Nautilus or Cosmotron – definitely all function and little thought to form. With them, three large pieces of silver hi-fi sat on big chunky slabs of marble on the floor – what I assume to be a pre-amp and a power amp for each speaker. Plus a source – which might have been a turntable – but honestly, I can’t remember. Whatever it was, the sound was breathtakingly natural.

Chord Ultima Monoblock Power Amplifier – £35,000 per unit; you’d need two, plus a pre-amp, for a basic arrangement.

I do remember the price tags, and at the time, prices were around £50k a component – so little change out of a quarter of a million. It left me wishing I’d won the national lottery.

Fluent Bit – using Lua script to split up events into multiple records

One of the really advanced features of Fluent Bit’s use of Lua scripts is the ability to split a single log event so that downstream processing sees multiple log events. In the Logs and Telemetry book, we didn’t have the space to explore this possibility, so here we’ll build upon our understanding of how to use Lua in a filter. Before we look at how it can be done, let’s consider why it might be done.

Why Split Fluent Bit events

This primarily focuses on the handling of log events. There are several reasons that could drive us to perform a split, such as:

  • Log events contain metrics data (particularly application or business metrics). Older systems can emit some metrics through logging, such as the time taken to complete a particular process within the code. When data like this is generated, ideally we expose it to the tools most suited to measuring and reporting on metrics, such as Prometheus and Grafana. But doing this has several factors to consider:
    • A log record with metrics data is unlikely to generate the data in a format that can be directed straight to Prometheus.
    • We could simply transform the log to use a metrics structure, but it is a good principle to retain a copy of the logs as they’re generated so we don’t lose any additional meaning, which points to creating a second event with a metrics structure. We may wish to monitor for the absence of such metrics being generated, for example.
  • When transactional errors occur, the logs can sometimes contain sensitive details such as PII (Personally Identifiable Information). We really don’t want PII data being unnecessarily propagated as it creates additional security risks – so we mask the PII data for the event to go downstream. But, at the same time, we want to know the PII ID to make it easier to identify records that may need to be checked for accuracy and integrity. We can solve this by:
    • Copying the event and performing the masking with a one-way hash
    • Creating a second event with the PII data, which is limited in its propagation and is written to a data store that is sufficiently secured for PII data, such as a dedicated database

In both scenarios, the underlying theme is creating a version of the event that makes things easier to handle downstream.

Implementing the solution

The key to this is understanding how the record construct is processed as it gets passed back and forth. When the Lua script receives an event, it arrives in our script as a table construct (Java developers, this approximates a HashMap), with the root elements of the record representing the event payload.

Typically, we’d manipulate the record and return it with a flag saying the structure has changed, still as a single table. But we can instead return an array of tables; each element (array entry) will then be processed as its own log event.

A Note on how Lua executes copying

When splitting up the record, we need to understand how Lua handles its data. If we tried to create the array with the code:

record1 = record
record2 = record
newRecord = {record1, record2}

then manipulating newRecord[1] would still impact both records. This is because Lua, like its C underpinnings, assigns tables by reference rather than making deep copies of objects. So we need to ensure we perform a deep copy before manipulating the records. You can see this in our example configuration (here on GitHub), or look at the following Lua code fragment:

-- recursively deep-copies a table; non-table values are returned as-is
function copy(obj)
  if type(obj) ~= 'table' then return obj end
  local res = {}
  -- copy each key and value, recursing into any nested tables
  for k, v in pairs(obj) do res[copy(k)] = copy(v) end
  return res
end
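
Building on that helper, a minimal callback that splits one event into two might look like the following sketch (the function name cb_split and the masking of remote_ip are illustrative; the full version lives in the GitHub example):

function cb_split(tag, timestamp, record)
  -- deep-copy so changes to one record don't leak into the other
  local original = copy(record)
  local masked = copy(record)
  masked['remote_ip'] = 'x.x.x.x' -- illustrative masking of one field
  -- returning an array of tables emits one event per entry;
  -- return code 2 keeps the original timestamp
  return 2, timestamp, { original, masked }
end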

The proof

To illustrate the behavior, we have created a configuration with a single dummy plugin that emits a single event. That event is then picked up by a filter running our Lua script, followed by a simple output plugin. As we create two records from the one event, we should see two output entries. To make it easy to compare, the Lua script has a flag called deepCopy; when set to true, we clone the records and modify the payload values before performing the split.

[SERVICE]
  flush 1

[INPUT]
    name dummy
    dummy {   "time": "12/May/2023:08:05:52 +0000",   "remote_ip": "10.4.72.163",   "remoteuser": "-",   "request": {     "verb": "GET",     "path": " /downloads/product_2",     "protocol": "HTTP",     "version": "1.1"   },   "response": 304}
    samples 1
    tag dummy1

[FILTER]
    name lua
    match *
    script ./advanced.lua
    call cb_advanced
    protected_mode true

[OUTPUT]
    name stdout
    match *

Limitations and solutions

While we can easily split events up and return multiple records, we can’t use different tags or timestamps. Using the same timestamp is pretty sensible, but different tags may be more helpful if we want to route the different records in other ways.

As long as the record contains the value we want to use as a tag, we can add a rewrite_tag filter to the pipeline and point it at the attribute to parse with a regex. To keep things efficient, if we create an element that holds just the tag when building the new record, the regex becomes a very simple expression to match the value.
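
A sketch of such a filter in classic format (the newTag field and the resulting tag are illustrative; the final false tells the filter not to keep the original record):

[FILTER]
    name rewrite_tag
    match dummy1
    rule $newTag ^(metrics)$ split.metrics false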

Conclusion

We’ve seen a couple of practical examples of why we might want to spin out new observability events based on what we get from our system. An important aspect of the process is understanding how Lua handles references when copying data.

Moby at the O2 London

I don’t blog about gigs very often, usually because I can never remember the set list by the end of the evening, and I’m on a euphoric buzz (no chemicals involved).

This evening wasn’t that much different. There was a euphoric buzz, and I loved the music. But as the tour is celebrating Play’s 25th anniversary, we’ve at least had 25 years to put titles to the songs.

Moby had what looked a lot like a fifty-something audience (some with their teenage and twenty-something children with them) immediately on their feet. The vibe was as if everyone had shed 20+ years and was clubbing again, with DJ smoothness as songs transitioned into each other.

The slower tracks performed have been spiced up a bit to keep things moving, and tracks like Bodyrock went all out on the rock.

When Moby originally toured Play, he worked pretty hard behind the keyboards and occasionally thrashed at his guitar. This time out, he was willing to lean on a very talented band, two singers, and guest appearances from Lady Blackbird (who initially performed with Moby for tracks like Dark Days). This meant Moby could dash around the stage and play his guitar and take the occasional turn with a keyboard and congas.

Visually, the lighting, etc., hadn’t really moved on in 25 years. While it would be naive to think he would compete with the likes of Peter Gabriel, the lighting did look dated against the likes of Elbow, who aren’t known for visual spectacle. This didn’t diminish the live energy, though – and chances are he was controlling costs so the charities receiving the profits from the shows saw more money.

The set finished in the traditional Moby way, acknowledging his rave roots with Feel So Real and Thousand. For Thousand, the imp of a man would once have climbed on top of his keyboards and launched himself off at the climax of the song. Today, it is a bit more sedate, with the stage crew rolling on a flight case for him to climb onto and no spectacular leaping.

Overall, it was great to see him live again, but I suspect we’ll not see him tour again. By his own confession, he loves simply performing in his garden with friends in LA.