All things general

Selling a music collection

19 Thursday Feb 2026

Tags

album, cd, Music, records, vinyl, collecting, vinyl-records, rock

This may come across as maudlin or possibly depressing, but as the popular financial advisor Martin Lewis says, the grim reaper gets us all, and leaving your partner and family in the dark about how to pick up and manage the finances, etc., is pretty hard. But any half-committed music collector will want their collection treated with the consideration that personal effects like jewellery would have.

When I thought about this, initially I thought it just needed to be a document alongside my will, it occurred to me that the guidance would be specific to me, I’d largely apply it to many. Which is why this became a blog post.

My son is starting to make his way in the world of music and might find the courage to take on the collection. Failing that, let friends and family have a rummage before disposal is started.

My better half is not a die-hard music collector; she enjoys music, but it is more a transient pleasure. When in record stores and record fairs, it really seems like a couple of things, so the temptation to sell a collection wholesale will be there. There are people out there who will buy up entire collections or simply let a house clearing company take it away – you will get as the expression goes ‘only get pennies n the pound’ for the value. I don’t begrudge these people that act, after all, a living needs to be made, and the fun of crate digging comes from these people selling on, and sometimes they don’t recognize the value of the music they have acquired. If you want to really make someone turn in their grave, well then, the collection goes to the skip, but think of the environmental harm you’re inflicting.

Discogs is your friend

Discogs is a website that tracks the details of releases to great detail, distinguishing the releases down to specific pressing form a particular record plant. As Discogs pays for itself by also operating as an online market place, It tracks the highest and lowest prices people have paid for a release – this is the first clue as to the true value of any item in the collection. That said, rare items, which don’t change hands very often will have prices that might not be representative.

If the collection isn’t on Discogs, it might be worth adding. The process is easy enough with any device that has a camera and a browser. You start by simply scanning the barcode (best to use an app integrated with Discogs). Most of the time, the app will find one ore matching results. You might get more than 1 as the barcode can sometimes represent multiple different pressings (typically because a standard printed sleeve may be used, but the vinyl may be from different plants or issuing cycles). When this happens, you’ll need to choose the correct version. This can best be addressed by looking at any information included that relates to run out groove details on vinyl, or its equivalent for CDs. The Discogs guides will help you better understand this.

My Discog possible errors

For my collection, there are a few details worth keeping mind, firstly when I first cataloged the collection with Discogs we already had a lot, so when there lots of versions I selected the one, that had the ‘headline details’ that matched – sleeve type, colour, release date. But, during this phase there is a chance I choose the wrong one, so it’s worth checking before selling. Why check? Well, like books, first pressings usually fetch more value. In some cases certain pressing plants have been noted to produce higher quality pressings.

Everything released since the late 80s onwards in my collection will be likely be first issues/pressings. This can be verified since the addition date will be within days of the release date.

Grading

All media when sold, is sold with a condition score from mint, near mint down, and this is applied to both vinyl, CD etc. and a separate assessment for the sleeve. Here, Discogs can help as the scoring system I well described in their guide. For my personal collection, very nearly everything will score highly; there are a couple of exceptions where quality was compromised as an acceptance of lower quality when I’ve sourced through crate digging (virtual or real), which is fairly small.

So, how can I claim this, well …

Media stored properly, never left out when not being played. vinyl is never stacked (a cause of warping) or even leaning.
We’ve stored vinyl with antistatic sleeves, very nearly exclusively using Nagaoka Discfile 102s. These are considered by many as the Rolls-Royce of anti-static sleeves.
Media has been well stored – record cases, replaced with custom flight cases, and now professional-grade outer sleeves in an IKEA Kallax setup (considered good for vinyl as it can handle the weight).
With the advent of the Digipak (folding card sleeves) for CDs we’ve protected them with sleeves so they don’t scuff etc.
Vinyl was never played to death. I used to copy everything to cassette for freedom and casual listening. CDs never got played in cars (a classic source of scratching and tarnishing) – they were copied to CDR or minidisc and later hard disks, to copy onto USB sticks for portability.

My music collection has been cared for in part as I’ve had to work to pay for nearly everything, from paper rounds to Saturday jobs and so on.

Understanding Valuation

Valuation isn’t just driven purely by the quality of the media or rarity (which can be from deliberately limiting numbers produced, to production errors).

The value of any album or single doesn’t often make sense, this because some artists seem to attract collectors. Depeche Mode for example is a mainstream artist that has this kind of community. But others, maybe pretty obscure but do well, these are often what can be described as an artist’s artist. In other words an artist that has been admired or influential for other artists, as a result you get a ‘cognoscenti’ culture.

There are also factors such as the record label involved, an original Chess records release will be highly prized, because of the import of the label.

Trying to identify what influences value without getting into the head of the collector community isn’t easy. But I’ve tried to distill some easy to spot influencers. These are certainly true for my collection.

The bottomline is the closer to the best possible price for any artefact, the more you’re going to need to understand that collector community. If you’re honoring wishes, of not letting a collection go for rock bottom prices, then we’d recommend checking prices on several web sites such as Discogs, EBay and others.

Of course if you’re dealing with a large collection, you need to filter down what is run of the mill vs potentially valuable. The following are general quick clues:

Singles (they rarely get repressed, have tracks that don’t show up on other releases).
Numbering on the sleeve, the smaller the batch the greater the possible value
Signed by the artist
Die cut, lenticular covers
Box sets often have extra content not available elsewhere and are produced in smaller numbers.
Anything produced before the mid 1950s
coloured vinyl (picture discs can fall into this category)
In North America and Europe, there is value in Japanese releases (usually with a mobile strip – paper strip wrapping the recording.
Bootleg recordings – usually live recordings
Vinyl releases with gatefold sleeves for albums with only one piece of vinyl or booklets (this costs money for no real gain other than possibly driving up early sales).
Labelled as a Record Store Day release (indicates limited issue).

CD valuation

The pricing of CDs generally have cratered, this is a combination of vinyl gaining popularity again. Easier to create fakes, and volumes, and unlike Vinyl, generally don’t go out of print, as we’re able to produce pretty much on demand now.

But this isn’t true for everything. CD singles have definitely retained and even gained value, this can be attributed to:

They have versions of tracks or even extra songs that haven’t made it to the streaming platforms.
The singles have definitely gone out of print.
Unusual sleeves, die cut, lenticular, different artwork.
Singles by the 90s saw smaller production numbers.

Some record labels were prepared to do things, particularly with singles to drive up sales, which meant chart positions, which helped propel album sales.

For me, we collected a lot of CD singles because of all the extra tracks that didn’t make it to albums. It became common to also release multiple versions of singles, and yes with my favourite artists, or those with a reputation to invest in remixes or B sides I’d eat all the versions, which together could boost the value.

Some CDs experienced limited runs, with sleeves sometimes having numbering printed on them, or hand signed by the artist. A benefit of buying directly from artist websites, as soon as the album was announced.

Provenance

Provenance particularly for signed albums can be tricky at times. Sometimes the delivery note, may record that the release may be a special edition,, or signed etc. sometimes, the unique characteristic may be acknowledged on a delivery note, but usually it would mention the value on the website and web order, which I usually saved as a PDF in among the record of purchases kept electronically.

Pricing guides

Aside from Discogs, there are other places to try and ascertain value. There is the Rare Record Price Guide book, which picks up on the better-known collectables, but I’ve found its prices are often below potential. Then there is Record Collector magazine, which provides a way to list sales, and you can also see what people are selling for.

Selling

While Discogs is one option for selling, eBay is another, and there appears to be greater tolerance for price ranges (and lower fees than on Discogs). When it comes to editions with distinct uniqueness/value, another option to maximise value is to sell via fan websites (and Facebook pages), where people are more likely to recognise the release’s value, such as a complete set of CD singles, a first pressing, etc.

The watchword here is be patient, don’t skimp on the postal packing, we’ve hung onto some packaging, but selling online will need a lot more. Sourcing these from mainstream channels will make this expensive. Go to a specialist like Covers33 and buy in quantity.

Useful resources

Discogs
Record Collector magazine
Covers33
Popslike website – we’ve not used, but has been collecting auction price data.
Rare Record Price Guide

Intersection of API documentation and Search optimisation

18 Wednesday Feb 2026

Posted by mp3monster in APIs & microservices, General, Technology

≈ Leave a comment

Tags

API, API Commons, APIs.io, Async API, IETF, OAS, RFC, specifications, Technology, WELL-KNOWN

I have long said that APIs are more than just payload definitions. For public APIs or those being made available within large organisations, where discoverability and making it easy for APIs to be adopted are just as important to the definition of the API. API specification standards such as Open API Specification and Asynchronous API Specification have addressed some of the challenges, the specifications don’t address everything.

Diagram I was using in 2021, to convey the point that an API involves more than simply a specification – https://www.slideshare.net/slideshow/api-more-than-payload/247135283?utm_source=clipboard_share_button&utm_campaign=slideshare_make_sharing_viral_v2&utm_variation=control&utm_medium=share

I’ve recently been working on the public API and integration strategy for a product, and revisiting this very point to ensure there is time for the wider needs.

There is good news in this area. The IANA-governed WELL-KNOWN path has been added to as a result of the IETF RFC 9727, which defines a way of structuring a list of defined APIs for the api-catalog path (e.g., https://www.example.com/.well-known/api-catalog).

The URI then returns a JSON-based payload using the relatively new media type of linkset (application/linkset+json), which is consists of an anchor to the actual API. Additional attributes (per IETF recommendations) can be supplied to reference metadata, such as the API specification. and other attributes made up of an href and a type. The RFC includes a set of suggested additional metadata references, which could cover details such as:

Attribute Name	Description
service-doc	Link to the API specification document, such as the Open API Specification
status	Link to the endpoint providing status information the API, for example whether the service is currently available.
service-meta	Additional machine readable metadata that maybe needed.
availability	Details about the service availability, e.g. are there any SLAs/SLOs, when maintenance windows may occur etc
performance	Details of any rate limits imposed upon the API
usage	The provider of the api-catalog may wish to correlate requests to the /.well-known/api-catalog URI with subsequent requests to the API URIs listed in the catalog
current	Provide information about which services may be deprecated or no longer available.
	Suggested additional attributes.

Given this, the response data structure could look something like this:

			
{"linkset": [
     {
      "anchor": "https://developer.example.com/apis/foo_api",
      "service-desc": [
       {
        "href": "https://developer.example.com/apis/foo_api/spec",
        "type": "application/yaml"
       }
      ],
      "status": [
       {
        "href": "https://developer.example.com/apis/foo_api/status",
        "type": "application/json"
       }
      ],
      "service-doc": [
       {
        "href": "https://developer.example.com/apis/foo_api/doc",
        "type": "text/html"
       }
       ]
       },
       [#... next API service]
}

		

This opens the door to referencing additional content that may not be available in the current API specifications (and for wrapping structural context around GraphQL APIs, which don’t have as good documentation semantics as OAS, for example). But, as we’ve mentioned, even OAS doesn’t provide an easy way to publish references to SDKs for example, which using the catalog could not be easily referenced. The question is: how do we identify these different building blocks? Here, apicommons.org can come to our rescue. Rather than provide a highly prescriptive specification like the OpenAPI Specification and others. It has reviewed the different standards and practices and identified the commonly needed resources, ranging from authentication details to versioning strategies. Each entry provides recommended tags or attributes to use, which map perfectly into the additional entries for the IETF 9727 linkset.

If you review all the aspects described in apicommons.org, you’ll note that the potential list of resources that you may wish to offer is extensive, and may not be best suited to all being in each catalog entry. This isn’t an issue. Alongside the API Commons site, there is another partner site, APIS.json. This offers a machine-readable API definition. It does not seek to compete with OAS, etc., or to disrupt or displace standards such as OAS; doing so would be a real uphill struggle, as it is too well adopted (to the point we’ve seen good alternatives such as API Blueprint, and RAML fall away to minimal or no advancement). APIs.json is a wrapper definition to the likes of OAS, providing a standardization of the elements identified by APICommons. So we could simplify our catalog to the mandatory API endpoint and a supporting reference to the API.json, which will bring all the resources mentioned together.

While these sights place a clear emphasis on JSON-centric APIs, nothing prevents much of this from being applied to non-JSON APIs (in fact, if you look at the spec for JSON.APIs, you’ll see references to technologies such as WADL and others). The only JSON restriction is that files must be defined in JSON or YAML.

Human-searchable API catalog

Until a few years ago, the Programmable Web website managed a curated catalog of APIs – and it was a fantastic resource, both for discovering potential third-party resources, but also ideas on how to best model your own API specifications, as the API Evangelist blogged that site has since closed its doors. With all the above efforts, there is clearly an opportunity to automate curation of public API services, which is much more viable. This is the direction APIs.io is taking: it is not crawling the web to collect API documents. However, given the support of a search engine and the WELL-KNOWN specification, it is certainly achievable and more a question of funding and, probably, investment in functionality to filter out poor or incomplete API specifications. Currently, to have an API integrated into the catalog, you must manually submit your APIs.json specification.

When using APIs.io, APIS.json, and API Commons, we should treat these resources with respect; they’re beautifully clean sites, knowledge-rich, with no advertising to fund them, and being paid for by the likes of Kin Lane (API Evangelist), Nicolas Grenier (Typeform Inc.), and Steven Willmott (Timewarp Labs).

Useful links

Fluent Bit classic to YAML

16 Friday Jan 2026

Posted by mp3monster in development, Fluentbit, Fluentd, General, python, Technology

≈ Leave a comment

Tags

Fluent Bit, Fluentbit, tool, utility, YAML

Many of the most potent features being added to Fluent Bit are only available in the newer YAML format. At the same time Fluent Bit is celebrating its 10th anniversary, so we can be sure that are some established deployments that are being maintained with the classic format.

The transition between the two Fluent Bit formats can be a bit fiddly. We built a tool as described in my Fluent Bit config from classic to YAML post using Java over a year ago. While it addressed many of the needs, I wasn’t entirely happy with the solution, for several reasons:

Written in Java, which, as a language, is the one I’m most at home with, as an implementation, it does increase the workload for using and implementing – needing a container, etc., ideally. Not unreasonable, but as a small tool project, we’re not working with a nice enterprise build pipeline, so from a code tweak to test is just fiddlier. I’m sure any Java devs reading this will be shouting, ” Setting up Maven, Jenkins, JUnit isn’t difficult – and they’re right, but it’s a time-consuming distraction from working on the real problem.
While it works, the code didn’t feel as clean as it could or should.

Over the last couple of years, we’ve become more comfortable with Python, and messing around with LLMs and code generation has helped accelerate our development skills. Python offers a potentially better solution: many environments that use Fluentd or Fluent Bit are Linux-based and include Python by default, so you can use the tool without messing with containers if you don’t want to. For a tool intended to support development and maintenance of configurations rather than in production being able to use or modify/extend the code easily is attractive.

We’ve started afresh and built a new implementation that is more extensible and maintainable. The new Python implementation is under the same Apache 2 license, but in a different repo so that existing use and forks of the Java implementation aren’t disrupted.

We can wrap Python with all the production-class build processes, but it’s not necessary to get a tool up and running or to enable people to tinker as needed. So we have focused on that. Over time, we’ll play with Poetry or other packaging mechanisms to make it even easier.

The new implementation can be found at https://github.com/mp3monster/fluent-bit-classic-to-yaml-converter

Continue reading →

Fluent Bit Processors and Processor Conditions

18 Tuesday Nov 2025

Posted by mp3monster in Fluentbit, General, Technology

≈ Leave a comment

Tags

conditional, configuration, examples, FLB, Fluent Bit, Logs and Telemetry, processors

Fluent Bit’s processors were introduced in version 3. But we have seen the capability continue to advance, with the introduction of the ability to conditionally execute processors. Until the ability to define conditionality directly with the processor, the control has had to be fudged around (putting processors further downstream and controlling it with filters and routing, etc).

In the Logs and Telemetry book, we go through the basic Processor capabilities and constraints, such as:

Configuration file constraints
Performance differences compared to using filter plugins

So we’re not going to revisit those points here.

We can chain processors to run in a sequence by directly defining the processors in the order in which they need to be executed. You can see the chaining in the example below.

Scenario

In the following configuration, we have created several input plugins using both the Dummy plugin and the HTTP plugin. Using the HTTP plugin makes it easy for us to control execution speed and change test data values, which helps us see different filter behaviours. To make life easy we’ve provided several payloads, and a simple caller.[bat|sh] script which takes a single parameter [1, 2, 3, …] which identifies the payload file to send.

All of these resources are available in the Book’s GitHub repository as part of the extras folder, which can be found here. This saves us from embedding everything into this blog.

Filters as Processors and Chaining

Filters can be used as processors as well as the dedicated processor types such as SQL asnd content modifier. The Filters just need to be referenced using the name attribute e.g. name: regex would use the REGEX filter.

Processor Conditions

If you’re familiar with Kubernetes selectors syntax, then the definition conditions for a processor will feel familiar. The condition is made up of a condition (#–A–) and the condition will contain one or more rules (#–B–). The rule defines an expression which will yield a Boolean outcome by identifying:

An element of the payload (for example, a field/element in the log structure, or a metric, etc).
The value to evaluate the field against
The evaluation operator, which can be one of the typical operators e.g eq (equals), (see below for the full list).

Since the rules are a list, you can include as many as needed. The condition, in addition to the rule , has its own operator (#–C–), which tells the condition how to combine the results of each of the rules together. As we need a Boolean value, we can only use a logical and or a logical or. When we have a single rule then the operation is tested with itself.

In the following example, we have two inputs with processors to help demonstrate the different behavior. In the dummy source, we can see how a nested element can be accessed (i.e. $<element name>[‘<child element name>‘] ), performing a string comparison. Here we’re using a normal filter plugin as a processor.

With our HTTP source, we’re demonstrating that we can have two processors with their own conditions. The first processor is interesting, as it illustrates an exception to the convention; we can express conditionality within the Lua code (#–D–), but it ignores the condition construct. It is obviously debatable as to the value of a condition for the Lua processor, but it is worth considering, as there is an overhead when calling the LuaJIT if the condition can be quickly resolved internally.

service:
  flush: 1
  log_level: debug
  
pipeline:
  inputs:
    - name: dummy
      dummy: '{"request": {"method": "GET", "path": "/api/v1/resource"}}'
      tag: request.log
      Interval_sec: 60
      processors:
        logs:
          - name: content_modifier
            action: insert
            key: content_modifier_processor
            value: true
            condition:     #--A--
              op: and      #--C--
              rules:       #--B--
                - field: $request['method']"
                  op: eq
                  value: "GET"    
    - name: http
      port: 9881
      listen: 0.0.0.0
      successful_response_code: 201
      success_header: x-fluent-bit received
      tag: http
      tag_key: token
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                new_record["conditional"] = "condition-triggered"
                return 1, timestamp, new_record
              end
            condition:   #--D--
              op: and
              rules:
                - field: "$classifier"
                  op: eq
                  value:  "1"
          - name: content_modifier
            action: insert
            key: content_modifier_processor2
            value: true
            condition:
              op: and
              rules:
                - field: "$classifier"
                  op: eq
                  value: "2"
          - name: sql
            query: "SELECT token, classifier FROM STREAM;"
            condition:
              op: and
              rules:
                - field: "$classifier"
                  op: eq
                  value: "3"
  outputs:
    - name: stdout
      match: "*"

To run the demonstration, we’ve provided several test payloads and a simple script that will call the Fluent Bit HTTP input plugin with the correct file. We just need to pass the number associated with the log file e.g. <Log>1<.json> is caller.[bat|sh] 1, and so on. The script is a variation of:

set fn=log%1%.json
echo %fn
curl -X POST --location 127.0.0.1:9881 --header Content-Type:application/json --data @%fn%

An example of one of the test payloads:

{"msg" : "dynamic tag", "helloTo" : "the World", "classifier" : 1, "token": "token1"}

Conclusion

Once you’ve got a measure of the condition structure, making the processors conditional is very easy.

Operators available


Operator	Greater than ( > )
eq	Equals
neq	Not Equals
gt	Greater than, or equal to ( >= )
gte	Grather than, or equal to ( >= )
lt	Less than ( < )
lte	Less than or equal to ( =< )
in	Is a value in a defined set (array) e.g. op: in value: [“a”, “b”, “c”]
not_in	Is a value not in a defined set (array) e.g. op: not_in value: [“a”, “b”, “c”]
regex	Matches the regular expression defined e.g op: regex value: ^a*z
not_regex	Does not match the regular expression provided e.g op: not_regex value: ^a*z

KubeCon Fluent Bit Release 4.2

16 Sunday Nov 2025

Posted by mp3monster in Fluentbit, General, Technology

≈ Leave a comment

Tags

4.2.0, Dead Letter Queue, DLQ, Fluent Bit, Kube Con, storage

With KubeCon North America just wrapped up, the team behind Fluent Bit has announced a new major release -version 4.2. According to the release notes, there are some interesting new features for routing strategies in Fluent Bit. We also have a Dead Letter Queue (DLQ) mechanism to handle some operational edge scenarios. As with all releases, there are continued improvements in the form of optimizations and feature additions to exploit the capabilities of the connected technology. The final aspect is ensuring that any dependencies are up to date to exploit any improvements and ensure that there are no unnecessary vulnerabilities.

At the time of writing, the Fluent Bit document website is currently only showing 4.1, but I think at least some of the 4.2 details have been included. So we’re working out what exactly is or isn’t part of 4.2.

Dead Letter Queue

Ideally, we should never need to use the DLQ feature as it is very much focused on ensuring that in the event of errors processing the chunks of events in the buffer, the chunks are not lost, so the logs, metrics, and traces can be recovered. The sort of scenarios that could trigger the use of the DLQ are, for example:

The buffer is being held in storage, and the storage is corrupted for some reason (from a mount failure, to failing to write a full chunk because storage has been exhausted).
A (custom) plugin has a memory pointer error that corrupts chunks in the buffer. I mention customer plugins, anecdotally, they will not have been exercised as much as the standard plugins; therefore, a higher probability of an unexpected scenario occurring.

The control of the DLQ is achieved through the configuration of several service configuration attributes …

storage.keep.rejected – this enables the DLQ behavior and needs to be set to On (by default, it is Off).
storage.path – is used to define where the storage of streams and chunks is handled. This is not specific to the DLQ. But by default a DLQ is created as a folder called rejected as a child of this location.
storage.rejected.path allows us to define an alternate subfolder for the DLQ to use based on storage.path.

Trying to induce behavior that needs this feature to trigger is not going to be trivial, for example if we dynamically set a tag, and there is no corresponding match this won’t get trapped in the DLQ. Although arguably this is just as good a scenario for using the DLQ (if you receive events with tags that you don’t wish to retain, it would be best to be matched to a null output plugin, so it is obvious from the configuration that dropping events is a deliberate action).

Other 4.2 (and 4.x) features

As we work out exactly what and how the new features work we’ll add additional posts. For example, we have a well-developed post to illustrate 4.x features for conditional control of processors (with demo configurations etc).

Kube Con Content

While there isn’t a huge amount of technical content from sessions at Kube Con available yet, particularly in the Telemetry and Fluent space, one blog article resulting from the conference did catch my eye: New Stack covering OpenAI’s use of Fluent Bit. The figures involved are staggering, such as the 10 petabytes of logs being handled daily.

18th Nov Update

More content showing up – will update the list below as we encounter more:

OpenAI on Fluent Bit savings – keynote presentation
Fluent Bit: Smarter Telemetry Routing, Faster Pipelines – Eduardo Silva, Chronosphere

Observations on vibe coding Fluent Bit configurations

13 Thursday Nov 2025

Posted by mp3monster in AI, development, Fluentbit, General, Technology

≈ Leave a comment

Tags

AI, Cline, code generation, configuration, Fluent Bit, Grok, LLM, LLMs, observations, plugins, prompt engineering, prompts, vibe coding

With vibe coding (an expression now part of the Collins Dictionary) and tools like Cline, which are penetrating the development process, I thought it would be worthwhile to see how much can be done with an LLM for building configurations. Since our preference for a free LLM has been with Grok (xAI) for development tasks, we’ve used that. We found that beyond the simplest of tasks, we needed to use the Expert level to get reliable results.

This hasn’t been a scientific study, as you might find in natural language-to-SQL research, such as the Bird, Archer, and Spider benchmarks. But I have explained below what I’ve tried and the results. Given that LLMs are non-deterministic, we tried the same question from different browser sessions to see how consistent the results were.

Ask the LLM to list all the plugins it knows about

To start simply, it is worth seeing how many plugins the LLM might recognize. To evaluate this, we asked the LLM to answer the question:

list ALL fluent bit input plugins and a total count

Grok did a pretty good job of saying it counted around 40. But the results did show some variation: on one occasion, it identified some plugins as special cases, noting the influence of compilation flags; on another attempt, no references were made to this. In the first attempts, it missed the less commonly discussed plugins such as eBPF. Grok has the option to ask it to ‘Think Harder’ (which switches to Exper mode) when we applied this, it managed to get the correct count, but in another session, it also dramatically missed the mark, reporting only 28 plugins.

Ask the LLM about all the attributes for a Plugin

To see if the LLM could identify all the attributes of a plugin, we used the prompt:

list all the known configuration attributes for the Fluent Bit Tail plugin

Again, here we saw some variability, with the different sessions providing between 30 and 38 attributes (again better results when in Expert mode). In all cases, the main attributes have been identified, but what is interesting is that there were variations in how they have been presented, some in all lowercase, others with leading capitalization.

Ask the LLM to create a regular expression

Regular expressions are a key way of teasing out values that may appear in semi-structured or unstructured logs. To determine if the LLM could do this, we used the following prompt:

Create a regular expression that Fluent Bit could execute that will determine if the word fox appears in a single sentence, such as 'the quick brown fox jumped over the fence'

The result provided offered variations on the expression \bfox\b (sometimes allowing for the possibility that Fox might be capitalized). In each of the cases where we went back to the LLM and added to the prompt that Note that the expression should be case sensitive – It was consistently correct.

The obvious takeaway here is that to avoid any errors, we do need to be very explicit.

Ask the LLM to create a simple filter and output

Given that we can see the LLM understands the most commonly used Fluent Bit plugins, let’s provide a simple scenario with a source log that will have some rotation, and see how it processes it. The prompt provided is:

create a classic configuration for fluent bit that will read an input from a file called myApp which has a post fix with the date. The input needs to take into account that the file extension will change at midnight. Any log entries that show an error and reference an IP need to be filtered to go to a CSV file with columns in the order tag, IP, log, timestamp

The handling of this question differed significantly across sessions: one response implemented filtering and output structuring via a Lua function, while another maximized the use of multiple filters. None of the solutions recognized that the output could reference standard Fluent Bit attributes for the tag and timestamp.

What was interesting was that the LLM also picked up on the possibility of externalizing the parser expressions, such as finding the IP. Interestingly, Fluent Bit includes a predefined set of filters, including an IPv4 filter.

Ask the LLM to use a stream processor

Stream Processors are a less commonly used feature of Fluent Bit, so it would be interesting to see how well the LLM can handle such use cases.

create a Fluent Bit YAML configuration which takes an NGINX log file input and uses a SQL stream processor to count the number of calls identified and the average time between these calls as coming from IP addresses starting 10. every minute. The NGINX logs need to be forwarded to a fluent bit node on the server named ops.myCorp.com and the output of the stream processing needs to go to a local log file called counter.txt and written to a Prometheus end point called ops_prometheus.myCorp.com

The result of this prompt comes in two parts. A main configuration:

service:
  log_level: info
  flush: 1

pipeline:
  inputs:
    - name: forward
      host: 0.0.0.0
      port: 24224
      tag: nginx

  filters:
    - name: parser
      match: nginx
      key_name: log
      parser: nginx
      reserve_data: true

  outputs:
    - name: file
      match: ip10.stats
      path: /var/log/counter.txt  # Adjust path as needed
      format: template
      template: '{ "timestamp": "{time}", "count": {count}, "avg_interval": {avg_interval} }'

    - name: prometheus_remote_write
      match: ip10.stats
      host: ops_prometheus.myCorp.com
      port: 9091  # Common port for Prometheus remote write; adjust if needed
      add_label:
        - job fluent-bit
        - instance ${HOSTNAME}

parsers:
  - name: nginx
    format: regex
    regex: '^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?: "(?<xforwardfor>[^\"]*)")?)?$'
    time_key: time
    time_format: '%d/%b/%Y:%H:%M:%S %z'

stream_processor:
  - name: ip10_stats
    exec: "CREATE STREAM ip10_stats WITH (tag='ip10.stats') AS SELECT COUNT(*) AS count, (MAX(RECORD_TIME()) - MIN(RECORD_TIME())) / (COUNT(*) - 1) AS avg_interval FROM TAG:'nginx' WHERE remote LIKE '10.%' WINDOW TUMBLING (1 MINUTE);"

The configuration for the second node provided is:

service:
  log_level: info

pipeline:
  inputs:
    - name: tail
      path: /path/to/nginx/access.log  # Adjust to your NGINX access log path
      tag: nginx
      db: /var/log/fluent-bit-tail.db  # For position tracking

  outputs:
    - name: forward
      match: nginx
      host: ops.myCorp.com
      port: 24224

As you can see, while the configuration is well-formed, the LLM has confused the input and output requirements. This is because we have not been explicit enough about the input. The thing to note is that if we tweak the prompt to be more explicit and resubmit it, we get the same mistake, but submitting the same revised prompt produces the correct source. We can only assume that this is down to the conversational memory yielding a result that takes precedence over anything older. For transparency, here is the revised prompt with the change underlined.

create a Fluent Bit YAML configuration which takes an NGINX log file as an input. The NGINX input uses a SQL stream processor to count the number of calls identified and the average time between these calls as coming from IP addresses starting 10. every minute. The input NGINX logs need to be output by forwarding them to a fluent bit node on the server named ops.myCorp.com and the output of the stream processing needs to go to a local log file called counter.txt and written to a Prometheus end point called ops_prometheus.myCorp.com

The other point of note in the generated configuration provided is that there doesn’t appear to be a step that matches the nginx tag, but results in events with the tag ip10.stats. This is a far more subtle problem to spot.

What proved interesting is that, with a simpler calculation for the stream, the LLM in an initial session ignored the directive to use the stream processor and instead used the logs_to_metrics filter, which is a simpler, more maintainable approach.

Creating Visualizations of Configurations

One of the nice things about using an LLM is that, given a configuration, it can create a visual representation (directly) or in formats such as Graphviz, Mermaid, or D2. This makes it easier to follow diagrams immediately. Here is an example based on a corrected Fluent Bit configuration:

The above diagram was created by the LLM as a mermaid file as follows

graph TD
A[NGINX Log File
/var/log/nginx/access.log] -->|"Tail Input & Parse (NGINX regex)"| B[Fluent Bit Pipeline
Tag: nginx.logs]
B -->|Forward Output| C[Remote Fluent Bit Node
ops.myCorp.com:24224]
B -->|"SQL Stream Processor
Filter IPs starting '10.'
Aggregate every 60s: count & avg_time"| D[Aggregated Results
Tag: agg.results]
D -->|"File Output
Append template lines"| E[Local File
/var/log/counter.txt]
D -->|"Prometheus Remote Write Output
With custom label"| F[Prometheus Endpoint
ops_prometheus.myCorp.com:443]

Lua and other advanced operations

When it comes to other advanced features, the LLM appears to handle this well, but given that there is plenty of content available on the use of Lua, the LLM could have been trained on. The only possible source of error would be the passing of attributes in and out of the Lua script, which it handled correctly.

We experimented with handling open telemetry traces, which were handled without any apparent issues. Although advanced options like compression control weren’t included (and neither were they requested).

We didn’t see any hallucinations when it came to plugin attributes. When we tried to ‘bait’ the LLM into hallucinating an attribute by asking it to use an attribute that doesn’t exist, it responded by adding the attribute along with a configuration comment that the value was not a valid Fluent Bit attribute.

General observations

One thing that struck me about this is that describing what is wanted with sufficient detail for the LLM requires a higher level of focus (albeit for a shorter period) than manually creating the configuration. The upside is that we are, in effect, generating part of a detailed design document. Today, I’d go so far as to suggest including the prompt(s) used in the configuration header(s) to help people understand what is needed from the configuration.

When the LLM generates the configuration, there will be no escaping the need to understand what the plugins do and the options they offer to determine whether the LLM has got it right. If the LLM gets it wrong, we would recommend starting a fresh session.

This does raise a bit of an interesting challenge. If you need to have a fair handle on how Fluent Bit works and common ways plugins work, then actually writing your own configuration files may be better anyway, as that approach will provide more reinforcement to learning. We, of course, can use the LLM to help ‘peer review’ our configurations in the sense of checking syntax, and generating visual representations can help confirm the configuration is as we expect.

This may be a personal thing, as I find myself working with Fluent Bit in small bursts rather than continually. As a result, I remember what I can do with a plugin, but I often forget the exact names of the less commonly used attributes to effect the desired behavior. Looking up, and keying in the attribute names can take time. One thing that we have tried was to include in the prompt statements, such as:

In the configuration include all the attributes even if they are only populated with their default or example values. Add a comment for each attribute to show its surpose and whether it is mandatory or not.

What you get then is all the possible options, with the correct names (something that is easy for the LLM, as it is essentially asking it to reformat the documentation for you). Then it becomes a case of removing the attributes that aren’t needed line by line—a lot quicker and easier than manually adding them.

LLM generated plugin with all attributes described (slightly modified output to make the point clearer).

[INPUT]
   Name tail  
     # Purpose: Specifies the input plugin to use. Mandatory: yes
   buffer_chunk_size 32k   
     # Purpose: Set the initial buffer size to read file data. The value must be according to the Unit Size specification. Mandatory: no
   buffer_max_size 32k  
    # Purpose: Set the limit of the buffer size per monitored file. If reading a file exceeds this limit, the file is removed from the monitored file list. The value must be according to the Unit Size specification. Mandatory: no
   path /var/log/blah.txt  
    # Purpose: Pattern specifying a specific log file or multiple ones through the use of common wildcards. Allows multiple patterns separated by commas. Mandatory: yes
    path_key filename  
     # Purpose: If enabled, it appends the name of the monitored file as part of the record. The value assigned becomes the key in the map. Mandatory: no
  exclude_path *.gz,*.zip  
     # Purpose: Set one or multiple shell patterns separated by commas to exclude files matching certain criteria, e.g., *.gz,*.zip. Mandatory: no
  offset_key offset  
     # Purpose: If enabled, Fluent Bit appends the offset of the current monitored file as part of the record. The value assigned becomes the key in the map. Mandatory: no
  read_from_head false  
    # Purpose: For new discovered files on start (without a database offset/position), read the content from the head of the file, not tail. Mandatory: no
  refresh_interval 60  
    # Purpose: The interval of refreshing the list of watched files in seconds. Mandatory: no
  rotate_wait 5  
    # Purpose: Specify the number of extra time in seconds to monitor a file once it is rotated in case some pending data is flushed. Mandatory: no
  ignore_older 1d  
    # Purpose: Ignores files older than ignore_older. Supports m, h, d (minutes, hours, days) syntax. Mandatory: no
  ignore_active_older_files  false  
    # Purpose: Ignore files that are older than the value set in ignore_older even if the file is being ingested. Mandatory: no
  skip_long_lines off  
    # Purpose: When a monitored file reaches its buffer capacity due to a very long line (buffer_max_size), the default behavior is to stop monitoring that file. skip_long_lines alters that behavior and instructs Fluent Bit to skip long lines and continue processing other lines that fit into the buffer size. Mandatory: no
  skip_empty_lines off  
    # Purpose: Skips empty lines in the log file from any further processing or output. Mandatory: no
  db /fluent-bit/tail/blah.db  
    # Purpose: Specify the database file to keep track of monitored files and offsets. Recommended to be unique per plugin. Mandatory: no
  db.sync normal  
    # Purpose: Set a default synchronization (I/O) method. Affects how the internal SQLite engine synchronizes to disk. Values: extra, full, normal, off. Mandatory: no
  db.locking false  
    # Purpose: Specify that the database will be accessed only by Fluent Bit. Enabling this increases performance but restricts external tools from querying the content. Mandatory: no
  db.journal_mode wal  
    # Purpose: Sets the journal mode for databases (wal). Enabling wal provides higher performance. Not compatible with shared network file systems. Mandatory: no
  db.compare_filename false  
    # Purpose: Determines whether to review both inode and filename when retrieving stored file information from the database. true verifies both; false checks only the inode. Mandatory: no
  mem_buf_limit 10MB  
    # Purpose: Set a memory limit that the Tail plugin can use when appending data to the engine. If the limit is reached, it will be paused. When the data is flushed, it resumes. Mandatory: no
  exit_on_eof false  
    # Purpose: When reading a file, exit as soon as it reaches the end of the file. Used for bulk load and tests. Mandatory: no
  parser none  
    # Purpose: Specify the name of a parser to interpret the entry as a structured message. Mandatory: no
  key log  
   # Purpose: When a message is unstructured (no parser applied), it's appended as a string under the key name log. This option lets you define an alternative name for that key. Mandatory: no
  inotify_watcher true  
   # Purpose: Set to false to use file stat watcher instead of inotify. Mandatory: no
  tag blah.logs  
   # Purpose: Set a tag with regexextract fields that will be placed on lines read. Supports tag expansion: if the tag includes an asterisk (*), that asterisk will be replaced with the absolute path of the monitored file, with slashes replaced by dots. Mandatory: no
  tag_regex (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$  
    # Purpose: Set a regular expression to extract fields from the filename. Mandatory: no
  static_batch_size 50M  
    # Purpose: Set the maximum number of bytes to process per iteration for the monitored static files (files that already exist upon Fluent Bit start). Mandatory: no
  file_cache_advise on  
    # Purpose: Set the posix_fadvise in POSIX_FADV_DONTNEED mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. Mandatory: no
  threaded false  
    # Purpose: Indicates whether to run this input in its own thread. Mandatory: no
  Unicode.Encoding none  
    # Purpose: Set the Unicode character encoding of the file data. Requests two-byte aligned chunk and buffer sizes. Supported values: UTF-16LE, UTF-16BE, and auto. Mandatory: no
  Generic.Encoding none  
    # Purpose: Set the non-Unicode encoding of the file data. Supported values: ShiftJIS, UHC, GBK, GB18030, Big5, Win866, Win874, Win1250, Win1251, Win1252, Win1254, Win1255, Win1256. Mandatory: no
  multiline.parser docker,cri  
    # Purpose: Specify one or multiple Multiline Parser definitions to apply to the content. When using a new multiline.parser definition, you must disable the old configuration from your tail section (e.g., parser, parser_firstline, parser_N, multiline, multiline_flush, docker_mode). Mandatory: no
  multiline off  
    # Purpose: If enabled, the plugin will try to discover multiline messages and use the proper parsers to compose the outgoing messages. When this option is enabled, the parser option isn't used. Mandatory: no
  multiline_flush 4  
    # Purpose: Wait period time in seconds to process queued multiline messages. Mandatory: no
  parser_firstline none  
    # Purpose: Name of the parser that matches the beginning of a multiline message. The regular expression defined in the parser must include a group name (named capture), and the value of the last match group must be a string. Mandatory: no
  parser_1 example_parser  
    # Purpose: Optional. Extra parser to interpret and structure multiline entries. This option can be used to define multiple parsers. For example, parser_1 ab1, parser_2 ab2, parser_N abN. Mandatory: no
  docker_mode off  
    # Purpose: If enabled, the plugin will recombine split Docker log lines before passing them to any parser. This mode can't be used at the same time as Multiline. Mandatory: no
  docker_mode_flush 4  
   # Purpose: Wait period time in seconds to flush queued unfinished split lines. Mandatory: no
  docker_mode_parser none  
    # Purpose: Specify an optional parser for the first line of the Docker multiline mode. The parser name must be registered in the parsers.conf file. Mandatory: no

[OUTPUT]
  Name   stdout
  Match  *
  Format json_lines

[OUTPUT] Name stdout Match * Format json_lines

I would also suggest it is worth trying the same prompt is separate conversations/sessions to see how much variability in approach to using Fluent Bit is thrown up. If the different sessions yield very similar solutions, then great. If the approaches adopted are different, then it is worth evaluating the options, but also considering what in your prompt might need clarifying (and therefore the problem being addressed).

Generating a diagram to show the configuration is also a handy way to validate whether the flow of events through it will be as expected.

The critical thing to remember is that the LLM won’t guarantee that what it provides is good practice. The LLM is generating a configuration based on what it has seen, which can be both good and bad. Not to mention applying good practice demands that we understand why a particular practice is good, and when it is to our benefit. It is so easy to forget that LLMs are primarily exceptional probability engines with feedback loops, and their reasoning is largely probabilities, feedback loops, and some deterministic tooling.

Podding the Tunes 2

02 Sunday Nov 2025

Posted by mp3monster in General, Music, Music Resources

≈ Leave a comment

Tags

Alan Cross, bone music, Broken Record, Eras, Gary Kemp, Guy Pratt, History, John Wilson, Lengend, Malcolm Gladwell, Moby, Music, Norah Jones, podcast, reviews, Rockenteurs, Stephen Coates

Podcasts come out more frequently than we’d like sometimes, and in 2018, I blogged about some of the more interesting sources (here). Since then, we’ve discovered some new ones that we like and think are worth sharing. Most of the links are to the Podtails tracker website or Podbean, no hunting for the RSS feed. So, here we go …

Rockenteurs with Gary Kemp and Guy Pratt – Rockenteurs has built a tremendous following, its success comes from the fact they personally know have worked with (or revolve in similar circles) many of the guests. This produces an immediate familiarity and a sense you’re part of a casual group conversation and everyone is relaxed and unhurried. For those less in the know Gary Kemp was part of Spandau Ballet, but since those days has performed as guitarist for higher. Guy Pratt, while not such as house hold name, is a highly respected bassist, session musician and essentially Pink Floyd’s bassist since Roger Water’s departure.
Alan Cross gives two for the money with the Ongoing History of New Music and Uncharted crime and mayhem in the music industry – Alan Cross is a Canadian music journalist, radio presenter and pod and videocaster. His primary output is the Ongoing History of New Music, which focusses on the Indie / Rock scene. But he has a second fortnightly podcast with a wider perspective. What really works here, is the depth of his knowledge, and the love of his subject (desire to see musicians do well and share the stories behind and around the music).
Bureau of Lost Culture – I came to Stephen Coates’ podcast as a result of hearing about Bone Music and reading his book, by that title. Stephen’s podcasts tend to gravitate to all aspects of music, but his focus is ‘count culture’. The subjects can look a little academic, but the way the stories are told is very human centered and explorers the impacts his subjects have had.
BBC provides a vast library of podcasts, some are regular, some are more episodic, but all are worth checking out …
- This Cultural Life – Best described as a successor to Mastertapes, as the presenter, John Wilson, has moved on to this show. Although the podcast goes beyond music to a broader cultural portfolio of guests/subjects.
- Eras – A more mainstream look at big-name artists such as Sting, Abba, Kylie with 4-6 episodes per artist in an episodic release.
- Legend – A bit like Eras, but covers artists like Joni Mitchell and Springsteen.
Artist own podcasts can be a bit hit a miss, but these have some great episodes …
- Moby Pod – Moby’s self deprecation, and history has resulted in some fascinating podcasts, both looking at music broadly (and personally) as he is as much an interviewee as interviewer on these podcasts. His name and reputation has meant he has also had some more influential names on the podcast, but these tend to be aligned with his animal rights and vegan passions. But these aren’t presented in a preaching manner, as is Moby’s way he recognizes these are his beliefs and not everyone may agree.
- Norah Jones is Playing Along – This has been an interesting podcast as Norah talks with a musician and records material with them. After the 1st season (we saw some of those recordings released as an album). There has been a cvouple of years gap between the the first series and the second once started recently (Octob ’25).
- James Lavelle (Living In My Headphones) aka Unkle – A monthly slot on Soho Radio, this is very much a DJ mix session, but the diversity of music used is fascinating.
Broken Record – Part of Malcolm Gladwell’s growing Pushkin empire of podcasts. These can be a bit hit and miss, but when they hit – the insightful, interesting and enjoyable to listen to, and among the best there is.

So when you’re not dialed in with your latest vinyl/CD/download I’d recommend checking these out.

MCP Security

30 Thursday Oct 2025

Posted by mp3monster in AI, development, General, Technology

≈ Leave a comment

Tags

AI, artificial-intelligence, attack, attacks, cybersecurity, MCP, model context protocol, Paper, Security, Technology, vectors

MCP (Model Context Protocol) has really taken off as a way to amplify the power of AI, providing tools for utilising data to supplement what a foundation model has already been trained on, and so on.

With the rapid uptake of a standard and technology that has been development/implementation led aspects of governance and security can take time to catch up. While the use of credentials with tools and how they propagate is well covered, there are other attack vectors to consider. On the surface, it may seem superficial until you start looking more closely. A recent paper Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions highlights this well, and I thought (even if for my own benefit) to explain some of the vectors.

I’ve also created a visual representation based on the paper of the vectors described.

The inner ring represents each threat, with its color denoting the likely origin of the threat. The outer ring groups threats into four categories, reflecting where in the lifecycle of an MCP solution the threat could originate.

I won’t go through all the vectors in detail, though I’ve summarized them below (the paper provides much more detail on each vector). But let’s take a look at one or two to highlight the unusual nature of some of the issues, where the threat in some respects is a hybrid of potential attack vectors we’ve seen elsewhere. It will be easy to view some of the vectors as fairly superficial until you start walking through the consequences of the attack, at which point things look a lot more insidious.

Several of the vectors can be characterised as forms of spoofing, such as namespace typosquatting, where a malicious tool is registered on a portal of MCP services, appearing to be a genuine service — for example, banking.com and bankin.com. Part of the problem here is that there are a number of MCP registries/markets, but the governance they have and use to mitigate abuse varies, and as this report points out, those with stronger governance tend to have smaller numbers of services registered. This isn’t a new problem; we have seen it before with other types of repositories (PyPI, npm, etc.). The difference here is that the attacker could install malicious logic, but also implement identity theft, where a spoofed service mimics the real service’s need for credentials. As the UI is likely to be primarily textual, it is far easier to deceive (compared to, say, a website, where the layout is adrift or we inspect URIs for graphics that might give clues to something being wrong). A similar vector is Tool Name Conflict, where the tool metadata provided makes it difficult for the LLM to distinguish the correct tool from a spoofed one, leading the LLM to trust the spoof rather than the user.

Another vector, which looks a little like search engine gaming (additional text is hidden in web pages to help websites improve their search rankings), is Preference Manipulation Attacks, where the tool description can include additional details to prompt the LLM to select one solution over another.

The last aspect of MCP attacks I wanted to touch upon is that, as an MCP tool can provide prompts or LLM workflows, it is possible for the tool to co-opt other utilities or tools to action the malicious operations. For example, an MCP-provided prompt or tool could ask the LLM to use an approved FTP tool to transfer a file, such as a secure token, to a legitimate service, such as Microsoft OneDrive, but rather than an approved account, it is using a different one for that task. While the MCP spec says that such external connectivity actions should have the tool request approval, if we see a request coming from something we trust, it is very typical for people to just say okay without looking too closely.

Even with these few illustrations, tooling interaction with an LLM comes with deceptive risks, partially because we are asking the LLM to work on our behalf, but we have not yet trained LLMs to reason about whether an action’s intent is in the user’s best interests. Furthermore, we need to educate users on the risks and telltale signs of malicious use.

Attack Vector Summary

The following list provides a brief summary of the attack vectors. The original paper examines each in greater depth, illustrating many of the vectors and describing possible mitigation strategies. While many technical things can be done. One of the most valuable things is to help potential users understand the risks, use that to guide which MCP solutions are used, and watch for signs that things aren’t as they should be.

Continue reading →

AI to Agriculture

17 Friday Oct 2025

Posted by mp3monster in General, Oracle, Technology

≈ Leave a comment

Tags

AI, artificial-intelligence, Cloud, development, Oracle, Technology

Now that details of the product I’ve been involved with for the last 18 months or so are starting to reach the public domain (such as the recent announcement at the UN General Assembly on September 25), I can talk to a bit about what we’ve been doing. Oracle’s Digital Government Global Industry Unit has been working on a solution that can help governments address the questions of food security.

So what is food security? The World Food Programme describes it as:

Food security exists when people have access to enough safe and nutritious food for normal growth and development, and an active and healthy life. By contrast, food insecurity refers to when the aforementioned conditions don’t exist. Chronic food insecurity is when a person is unable to consume enough food over an extended period to maintain a normal, active and healthy life. Acute food insecurity is any type that threatens people’s lives or livelihoods.
World Food Programme

By referencing the World Food Programme, it would be easy to interpret this as a 3^rd world problem. But in reality, it applies to just about every nation. We can see this, with the effect the war in Ukraine has had on crops like Wheat, as reported by organizations such as CGIAR, European Council, and World Development journal. But global commodities aren’t the only driver for every nation to consider food security. Other factors such as Food Miles (an issue that perhaps has been less attention over the last few years) and national farming economics (a subject that comes up if you want to it through a humour filter with Clarkson’s Farm to dry UK government reports and US Department of Agriculture.

Looking at it from another perspective, some countries will have a notable segment of their export revenue coming from the production of certain crops. We know this from simple anecdotes like ‘for all the tea in China’, coffee variants are often referred to by their country of origin (Kenyan, Columbian etc.). For example, Palm Oil is the fourth-largest economic contributor in Malaysia (here).

So, how is Oracle helping countries?

One of the key means of managing food security is understanding food production and measuring the factors that can impact it (both positively and negatively), which range from the obvious—like weather (and its relationship to soil, water management, etc.) —to what crop is being planted and when. All of which can then be overlayed with government policies for land management and farming subsidies (paying farmers to help them diversify crops, periodically allowing fields to go fallow, or subsidizing the cost of fertilizer).

Oracle is a technology company capable of delivering systems that can operate at scale. Technology and the recent progress in using AI to help solve problems are not new to agriculture; in fact, several trailblazing organizations in this space run on Oracle’s Cloud (OCI), such as Agriscout. Before people start assuming that this is another story of a large cloud provider eating their customers’ lunch, far from it, many of these companies operate at the farm or farm cooperative level, often collecting data through aerial imagery from drones and aircraft, along with ground-based sensors. Some companies will also leverage satellite imagery for localized areas to complement these other sources. This is where Oracle starts to differentiate itself – by taking high-resolution imagery (think about the resolution level needed to differentiate Wheat and Maize, or spot rice and carrots, differentiate an orchard from a natural copse of trees). To get an idea, look at Google Earth and try to identify which crops are growing.

We take the satellite multi-spectral images from each ‘satellite over flight’ and break it down, working out what the land is being used for (ruling out roads, tracks, buildings, and other land usage). To put the effort to do this into context, the UK is 24,437,600,000 square meters and is only 78^th in the list of countries by area (here). It’s this level of scale that makes it impractical to use more localized data sources (imagine how many people and the number of drones needed to fly over every possible field in a country, even at a monthly frequency).

This only solves the 1^st step of the problem, which is to tell us the total crop growing area. It doesn’t tell us whether the crop will actually grow well and produce a good yield. For this, you’re going to need to know about weather (current, forecast, and historic trends), soil chemical composition and structure, and information such as elevation, angle, etc. Combined with an understanding of optimal crop growing needs (water levels, sun light duration, atmospheric moisture, soil types and health) – good crops can be ruined by it simply being too wet to harvest them, or store them dryly. All these factors need to be taken into account for each ‘cell’ we’re detecting, so we can calculate with any degree of confidence what can be produced.

If this isn’t hard enough, we need to account for the fact that some crops may have several growing seasons per year, or succession planting is used, where Carrots may be grown between March and June, followed by Cucumbers through to August, and so on.

Using technology

Hopefully, you can see there are tens of millions of data points being processed every day, and Oracle’s data products can handle that. As a cloud vendor, we’re able to provide the computing scale and, importantly, elasticity, so we can crunch the numbers quickly enough that users benefit from the revised numbers and can work out mitigation actions to communicate to farmers. As mentioned, this could be planning where to best use fertilizer or publishing advice on when to plant which crops for optimal growing conditions. In the worst cases recognizing there is going to be a national shortage of a staple crop and start purchasing crops from elsewhere and ensure when the crops arrive in ports they get moved out to the markets (like all large operations – as we saw with the Covid crises – if you need to react quickly, more mistakes can be made, costs grow massively driven by demand).

I mentioned AI, if you have more than the most superficial awareness of AI, you will probably be wondering how we use it, and the problems of AI hallucination – the last thing you want is a being asked to evaluate something and hallucinating (injecting data/facts that are not based on the data you have collected) to create a projection. At worst, this would mean providing an indication that everything is going well, when things are about to really go wrong. So, first, most of the AI discussed today is generative, and that is where we see issues like hallucinations. We’re have and are adopting this aspect of AI where it fits best, such as explainability and informing visualization, but Oracle is making heavy use of the more traditional ideas of AI in the form of Machine Learning and Deep Learning which are best suited to heavy numerical computational uses, that is not to say there aren’t challenges to be ddressed with training the AI.

Conclusion

When it comes to Oracle’s expertise in the specialized domains of agriculture and government, Oracle has a strong record of working with governments and government agencies from its inception. But we’ve also worked closely with the Tony Blair Institute for Global Change, which works with many national government agencies, including the agriculture sector.

My role in this has been as an architect, focused primarily on applying integration techniques (enabling scaling and operational resilience, data ingestion, and how our architecture can work as we work with more and more data sources) and on applying AI (in the generative domain). We’re fortunate to be working alongside two other architects who cover other aspects of the product, such as infrastructure needs and the presentation tier. In addition, there is a specialist data science team with more PhDs and related awards than I can count.

Oracle’s Digital Government business is more than just this agriculture use case; we’ve identified other use cases that can benefit from the data and its volume being handled here. This is in addition to bringing versions of its better-known products, such as ERP, Healthcare (digital health records management, vaccine programmes, etc.), national Energy and Water (metering, infrastructure management, etc).

For more on the agricultural product:

Fluent Bit 4.1 what to be excited about in this major release?

10 Friday Oct 2025

Posted by mp3monster in Fluentbit, General, Technology

≈ Leave a comment

Tags

4.1, Fluent Bit, Fluentbit, release

Fluent Bit version 4’s first major release dropped a couple of weeks ago (September 24, 2025). The release’s feature list isn’t as large as 4.0.4 (see my blog here). It would therefore be easy to interpret the change as less profound. But that isn’t the case. There are some new features, which we’ll come to shortly, but there are some significant improvements under the hood.

Under the hood

Fluent Bit v4.1 has introduced significant enhancements to its processing capabilities, particularly in the handling of JSON. This is really important as handling JSON is central to Fluent Bit. As many systems are handling JSON, there is at the lowest levels, continuous work happening in different libraries to make it as fast as possible. Fluent Bit has amended its processing to use a mechanism based on a yyjson, without going into the deep technical details. If you examine the benchmarking of yyjson and other comparable libraries, you’ll see that its throughput is astonishing. So accelerating processing by even a couple of percentage points can have a profound performance enhancement.

The next improvement in this area is the use of SIMD (Single Instruction, Multiple Data). This is all about how our code can exploit microprocessor architecture to achieve faster processing. As logs often need characters like new line, quotes, and other special characters encoding, and logs often carry these characters (and we tend to overlook this – consider a stack trace, where each step of the stack chain is provided on a new line, or embedding a stringified XML or JSON block in the log, which will hve multiple uses of quotes etc. As a result, any optimization of string encoding will quickly yield performance benefits. As SIMD takes advantage of CPU characteristics, not every build will have the feature. You can check to see if SIMD is being used by checking the output during startup, like tthis:

This build for Windows x86-64 as shown in the 3rd info statement doesn’t have SIMD enabled.

Other backend improvements have been made with TLS handling. The ability to collect GPU metrics with AMD hardware (significant as GPUs are becoming ever more critical not only in large AI arrays, but also for AI at the edge).

So, how do we as app developers benefit?

If Fluent Bit’s performance improves, we benefit from reduced risk of backpressure-related issues. We can either complete more work with the same compute (assuming we’re at maximum demand all the time), or we can reduce our footprint – saving money either as capital expenditure (capex) or operational expenditure (opex) (not something that most developers are typically seeking until systems are operating at hyperscale). Alternatively, we can further utilize Fluent Bit to make our operational life easier, for example, by reducing sampling slightly, implementing additional filtering/payload load logic, and streaming to detect scenarios as they occur rather than bulk processing on a downstream platform.

Functional Features

In terms of functionality, which as a developer we’re very interested in as it can make our day job easier, we have a few features.

AWS

In terms of features, there are a number of enhancements to support AWS services. This isn’t unusual; as the AWS team appears to have a very active and team of contributors for their plugins. Here the improvement is for supporting Parquet files in S3.

Supervisory Process

As our systems go ever faster, and become more complex, particularly when distributed it becomes harder to observe and intervene if a process appears to fail or worse becomes unresponsive. As a result, we need tools to have some ‘self awareness’. Fluen bit introduces the idea of an optional supervisor process. This is a small, relatively simple process that spawns the core of Fluent Bit and has the means to check the process and act as necessary. To enable the supervisor , we can add to the command line –supervisor. This feature is not available on all platforms, and the logic should report back to you during startup if you can’t use the feature. Unfortunately, the build I’m trying doesn’t appear to have the supervisor correctly wired in (it returns an error message saying it doesn’t recognize the command-line parameter).

If you want to see in detail what the supervisor is doing – you can find its core in /src/flb_supervisor.c with the supervisor_supervise_loop function, specifically.