Phil (aka MP3Monster)'s Blog

~ from Technology to Music

Tag Archives: AI

OpAMP server with MCP – aka conversational Fluent Bit control

14 Tuesday Apr 2026

Posted by mp3monster in AI, chatbots, development, Fluent Observability, General, OpAMP, Technology

Tags

AI, chatops, Fluent Bit, Fluentd, LangGraph, LLM, MCP, OpAMP, OpenTelemetry, OTel, OTLP

I’ve written a few times about OpAMP (Open Agent Management Protocol), which has emerged from the OpenTelemetry CNCF project. Like OTLP (OpenTelemetry Protocol), it applies to just about any observability agent, not just the OTel Collector. This side project gives me a real-world use case for working on my Python skills, as well as an excuse to work with FastMCP (and, shortly, LangGraph). It is also a chance to bring the evolved idea of ChatOps to life (see here and here).

One of the goals of ChatOps was to free us from having to log into specific tools to mine for information once metrics, traces, and logs reach the aggregating back ends. We can move towards that by leveraging a decent LLM with Model Context Protocol (MCP) tools through an app such as Claude Desktop or ChatGPT (or their mobile variants). Ideally, though, we want the freedom to use social collaboration tools, rather than being tied to a specific LLM toolkit.

The result is a UI and the ability to communicate with Fluentd and Fluent Bit without imposing changes on the agent code base (we use a supervisor model), so we can issue commands, track what is going on, and optionally use authentication (more improvements in this space to come).

New ChatOps – Phase 1

With the first level of the new ChatOps dynamism coming through LLM desktop tooling and MCP, the following screenshots show how we’ve exposed part of our OpAMP server via APIs. As the first screenshot shows, our OpAMP server has the concept of commands. We have taken some of the commands described in the OpAMP spec, called them standard commands, and then defined a construct for Custom Commands (which can be dynamically added to the server and client).

Interaction through Claude Desktop, configured with our MCP server (part of our OpAMP server), showing what can be done
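To make the split between standard and custom commands a little more concrete, here is a minimal Python sketch of a command registry. It is not the project's code: `CommandRegistry`, its methods, and the `org.example.shutdown` capability name are hypothetical. It simply shows a fixed set of standard commands alongside custom commands that can be registered at runtime, plus the kind of listing an MCP tool could return when asked what can be done.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical handler signature: takes a dict of arguments, returns a status message.
Handler = Callable[[dict], str]


@dataclass
class CommandRegistry:
    """Fixed 'standard' commands plus 'custom' commands that can be added at runtime."""

    standard: Dict[str, Handler] = field(default_factory=dict)
    custom: Dict[str, Handler] = field(default_factory=dict)

    def register_standard(self, name: str, handler: Handler) -> None:
        self.standard[name] = handler

    def register_custom(self, capability: str, handler: Handler) -> None:
        # Custom commands use a reverse-DNS style name so they can't be
        # confused with the standard set.
        if capability in self.standard:
            raise ValueError(f"{capability!r} clashes with a standard command")
        self.custom[capability] = handler

    def describe(self) -> dict:
        # The sort of listing an MCP tool could return for "what can be done?"
        return {"standard": sorted(self.standard), "custom": sorted(self.custom)}

    def execute(self, name: str, args: Optional[dict] = None) -> str:
        handler = self.standard.get(name) or self.custom.get(name)
        if handler is None:
            raise KeyError(f"unknown command {name!r}")
        return handler(args or {})


registry = CommandRegistry()
registry.register_standard("restart", lambda args: "restart queued")
registry.register_custom(
    "org.example.shutdown", lambda args: f"shutdown queued for {args.get('node')}"
)
print(registry.describe())
print(registry.execute("org.example.shutdown", {"node": "fluentd-1"}))
```

Keeping the two dictionaries separate means a newly registered custom command can never silently replace a standard one.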

The following screenshot illustrates using plain language, rather than carefully structured English, to get the OpAMP server to shut down a Fluentd node (in this case, as we only had one Fluentd node, the LLM worked out which node to stop).

Claude Desktop showing a conversation to shut down a Fluentd node

Interesting considerations

What will be interesting to see is how LLM token consumption changes as the portfolio of managed agents grows, given that, to achieve the shutdown, the LLM had to retrieve all the Fluent Bit & Fluentd instances being managed. If we provide an endpoint to find a specific agent instance, would the LLM reason its way to using that rather than trawling through all the information?
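One way to frame the token question is to compare what a broad "list everything" tool and a narrow "find" tool hand back to the LLM. The sketch below keeps both as plain Python functions rather than real MCP tools (with FastMCP they would be registered as tools on the server). The agent records and the `list_agents`/`find_agents` names are hypothetical, and payload length is only a rough proxy for tokens. Whether an LLM actually picks the narrower tool is exactly the open question.

```python
import json
from typing import List, Optional

# Hypothetical in-memory view of managed agents; field names are illustrative only.
AGENTS: List[dict] = [
    {"instance_uid": f"fb-{i:03d}", "agent_type": "fluent-bit", "host": f"edge-{i}", "healthy": True}
    for i in range(50)
] + [{"instance_uid": "fd-001", "agent_type": "fluentd", "host": "agg-1", "healthy": True}]


def list_agents() -> str:
    """Broad tool: returns every agent, so the LLM has to scan the lot."""
    return json.dumps(AGENTS)


def find_agents(agent_type: Optional[str] = None, host: Optional[str] = None) -> str:
    """Narrow tool: filters server-side so only matching agents reach the LLM."""
    matches = [
        a for a in AGENTS
        if (agent_type is None or a["agent_type"] == agent_type)
        and (host is None or a["host"] == host)
    ]
    return json.dumps(matches)


# Payload size (characters) as a rough proxy for the tokens the LLM must read.
print(len(list_agents()), len(find_agents(agent_type="fluentd")))
```

The gap between the two payloads grows linearly with the fleet, which is why a good tool description steering the LLM towards the narrow lookup could matter more as the portfolio grows.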

Next phase

ChatGPT, Claude Desktop, and others already incorporate some level of collaboration capabilities if the users involved are on a suitable premium account (Team/Enterprise). It would be good to enable greater freedom and potentially lower costs by enabling the capability to operate through collaboration platforms such as Teams and Slack. This means the next steps need to look something along the lines of:

Anthropic Mythos – an LLM with potent security sting

13 Monday Apr 2026

Posted by mp3monster in AI, General, Technology

Tags

AI, Anthropic, Mythos, Security

A rapidly growing number of articles have been written about the limited launch of Mythos, Anthropic’s new LLM. The evolution of models has quickly advanced AI-assisted software development, but it is the capabilities of Mythos and Project Glasswing that have really grabbed attention and raised concern.

Glasswing is an initiative that gives major partner software and service vendors access to the Mythos model, because Mythos has made significant advances in identifying software vulnerabilities and generating exploits for them. This has been illustrated by Anthropic’s Red Team, which found bugs in the OpenBSD operating system that had evaded detection for as long as 27 years. While the BSD family of operating systems isn’t as pervasive as Linux, the two share a similar open ethos, and each has a community large enough to keep it active and maintained. The underlying message is that if such vulnerabilities can be found and exploited there, there are certainly opportunities to do the same elsewhere, in software that affects a great many more users, such as Firefox.

Having key software vendors, such as OS and browser vendors, get access is certainly a positive step, but it doesn’t address a key consideration. Applying code fixes and releasing updates does not, by itself, make anyone safer. The true challenge is for end users and organisations to recognise the need to roll out updates quickly, and this is where the real concern lies. The concerns are:

  • Organisations don’t always roll out patches as soon as they’re available. There is an element of testing to ensure no adverse impact on each organisation’s setup; even a simple browser update can change how an app behaves.
  • Change represents risk, and organisations that experience issues during rollouts become increasingly risk-averse. Ironically, this is counterproductive, but it is a very human reaction.
  • Vendors’ patching tends to prioritise the latest versions of products, which can create dependency challenges. Bringing software up to date can grow the infrastructure footprint (more storage, memory and CPU are needed, as vendors add capabilities and features to compete and meet customer needs, driving continuous growth). That can really add costs, particularly in highly distributed use cases such as user desktops and IoT devices. As patches accumulate, devices may no longer have the capacity to properly support the new footprint. Consider this: why do people replace smartphones? Sometimes it’s hardware features like a better camera, but often it’s simply running out of storage for all the apps, photos, etc., or being unable to run them.
  • Digging into some of the details from the Red Team shows that the LLM usage costs to uncover the vulnerabilities ranged from $50 to $20,000. This could have ramifications for smaller, more specialised software solutions, where the cost of regularly rerunning the analysis outstrips potential revenue. As a result, we could suddenly see software product prices climb, or companies simply stop producing products we depend on. Bad actors may also want to recoup their costs more quickly by accelerating the use of new exploits; in other words, more attacks, coming more quickly. Such considerations will put more pressure on the speed of patch cycles.
  • This level of capability suggests that we really do need to shift from boundary-style security to security at every layer of our solutions. That’s not simply authentication, but code being defensive, validating the data values it is given, and so on.

All of this means we have to change mindsets from doing just enough, or simply putting a front-line security layer in place, to embedding security throughout. As end users, we must start to adopt several behaviours:

  • Being security conscious with our own devices, keeping software up to date and patched. I would consider my family to be above average when it comes to tech savvy, but even so, I find myself having to go in and run Windows updates on laptops, for example.
  • Voting with our feet: many of the services we use are largely or entirely software-powered (banks, energy providers). If those providers show signs of not taking security seriously enough, it’s time to go elsewhere before we become victims.

Keeping up

One observation from the Mythos and Project Glasswing reporting is that the advancements are significant step changes, not incremental ones (for example, Anthropic’s Sonnet 4.6 was released only a couple of months ago and didn’t score highly for creating exploits, although it was better at detection). This suggests a few things …

  • IT law has always played catch-up, but if the advancements are going to be this large and this frequent, we have to start legislating against hypotheticals and allowing legal precedent to produce the fine-detail interpretations.
  • We may have to consider big-brother observation of AI use, mitigated by strong transparency rules governing the handling of findings.
  • Does the idea of incorporating something like Asimov’s Three Laws of Robotics into LLMs still look far-fetched?
  • Do we need to start mitigating the risk of deep exploits by bringing back air-gapping as a requirement for some systems?

Hyperbole?

It would be easy to dismiss this as hyperbole or click-bait, but it is gaining a lot of high-profile attention; just consider these examples:

  • What Is Claude Mythos—And Why Anthropic Won’t Let Anyone Use It (Forbes)
  • Anthropic’s new AI tool has implications for us all – whether we can use it or not (The Guardian UK)
  • Mythos autonomously exploited vulnerabilities that survived 27 years of human review. Security teams need a new detection playbook (VentureBeat)
  • Anthropic’s Mythos Will Force a Cybersecurity Reckoning—Just Not the One You Think (Wired)

Open Source development – growing AI challenges

10 Friday Apr 2026

Posted by mp3monster in AI, General, Technology

Tags

AI, artificial-intelligence, development, open-source, Technology

The software industry’s current upheavals due to AI are claiming unexpected and unintended victims, one of which is open-source software. Open-source foundations run very deep, from Linux to web and app servers, and even to key cryptography technologies.

While there are commercially funded open-source efforts (chunks of Kubernetes, for example), depending on which reports you look at, 10-30% of the effort comes from individuals giving their personal time for free. But we’re seeing a number of growing threats to this…

  • The number of maintainers is small on some projects. A really good example of this is the Nginx Ingress controller for Kubernetes, which is now no longer being maintained, not because it isn’t needed, but because no one was willing to step up to the plate with their own time or provide salaried engineers. This has triggered something of an outcry (see here, for example).
  • As the article Microsoft execs warn agentic AI is hollowing out the junior developer pipeline shows, AI-assisted development risks harming the flow of development skills. If junior engineers primarily rely on AI to code and test functionality, they will not gain the hard-earned experience that teaches you what is good, what is bad, and where the pitfalls are. As a result, the skills needed to understand and maintain very large codebases won’t be as strong.
  • GitHub has argued (here) that AI in development has made it easier for people to get involved and contribute to open-source initiatives, and I’d agree it makes it easier. The challenge, I think, is less an issue of ease than of mindset. I would argue that it is the motivation to contribute, and the satisfaction of having contributed, that drives open-source contributions, and this is at risk of being undermined. Papers such as On Developers’ Personality in Large-scale Distributed Projects indicate that open-source contributors tend towards a particular personality profile, which may be more susceptible to the issues that can lead people to disengage (see Connection Between Burnout and Personality Types in Software Developers). While not conclusive, this suggests that overloaded OSS contributors are more likely to disengage from contributing.
  • Adding to the weaker skills pipeline, there are signs that AI often doesn’t deliver on expectations; several articles have cited the paper Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, for example. We’re going to see even more pressure on those who maintain the software everyone depends upon.
  • AI slop, the result of a poor coding model or poor prompting, means that, whether unwittingly or through deliberate malice, buggy or junk pull requests are being created at ever faster rates. This puts more work on the core developers just to manage PRs, as described in AI is burning out the people who keep open source alive (another such article at CNCF is Sustaining open source in the age of generative AI). Not to mention even worse actions, such as those described in An AI Agent Published a Hit Piece on Me. This sort of thing will affect people’s willingness to be involved, even when their time is being paid for by a company. The concern is such that InfoWorld reported GitHub is considering the ability to restrict PR velocity (see here).
  • Another side effect of the ‘AI arms race’ is increased pressure within organisations to adapt or accelerate to meet AI expectations. Those donating personal time are less likely to find time to support open-source initiatives, as their focus will be very much on staying secure in their day jobs.

There is no single or simple solution. But that doesn’t mean there aren’t things we can do to help. Some immediate possibilities include:

  • Better messaging about what makes up and propels open-source initiatives beyond commercial contributions. This can help counter the perception that organisations like the CNCF lean towards large commercial organisations and open-source business models. That isn’t the case, and even in commercial setups, the teams aren’t necessarily that large.
  • I’m not an advocate of the dual licensing model, as it can create uncertainty in user communities and among potential adopters of technologies. This uncertainty can drive disruptive changes; we’ve seen this with OpenSearch and Elasticsearch, and OpenELA in the Enterprise Linux space, among others. It can also hamper early-stage startups. But there are things we can do: a low-cost entry into the CNCF could help finance the not-for-profit development setups. The PR process could be used to collect metrics and recognise organisations that contribute even a little through PRs, biasing that recognition toward projects with limited support. Not to mention recognising contributors and committers individually (just as the CNCF and Linux Foundation recognise conference speakers).
  • Companies employing early-years engineers should implement initiatives that require some development work to be performed without AI assistance and to use performance tooling. Yes, this means a short-term drop in productivity, but one thing my years in the industry and in training have taught me is that understanding how things work under the hood makes it easier to address problems and recognise ‘bad smells’. That understanding also helps in seeing how solutions can scale.
  • Perhaps university courses could award credits to students who support important open-source projects, or allow a level of contribution to count toward coursework. This sort of thing would also open up the open-source world a lot more. I, for one, would give credit to a graduate who has contributed to a reputable open-source initiative.

There is one thing I am certain of, though: it is the leadership and sponsors of organisations such as the CNCF, Linux Foundation, Apache and the Open Source Initiative that can influence the situation the most. It is in everyone’s interest that, when open-source components have to be folded, there is at least an easier off-ramp than the 6 months given to switch away from something like the NGINX Ingress Controller.

Agentic AI, SaaS and APIs

31 Tuesday Mar 2026

Posted by mp3monster in AI, General, Technology

Tags

Agentic AI, AI, APIs, artificial-intelligence, chatgpt, LLM, Oracle, SaaS, Technology

There’s a growing narrative that Agentic AI and “vibe coding” (AI-assisted development is probably a better term) signal the end of SaaS, what some are calling the ‘SaaS-pocalypse’, as reflected in share price drops at some SaaS vendors.

The reality is more nuanced. SaaS vendors are being pulled in multiple directions:

  • Pressure to invest heavily in AI to accelerate productivity and efficiency
  • Fear of disruption from AI-native startups
  • Uncertainty over whether AI is a bubble
  • Broader economic caution from customers, given the wider economic disruption

Net result: share prices have been dropping rapidly. Importantly, though, this doesn’t necessarily reflect a collapse in demand, particularly among larger vendors. As Jakob Nielsen has suggested, what we’re more likely to see is commodification (see here), not collapse.

Jakob also pointed out that AI is really disrupting approaches to UX, both in how users might approach apps and in how user experience is designed.

So what happens to SaaS?

I believe a few things are emerging …

  • Vendors incorporating AI into their products to deliver clearer value than vibe coding or home-brewing your own solution. This is a route Oracle has been taking with its Fusion SaaS products.
  • Emphasis on mechanisms to make it easier for customers to add their differentiators to the core product.
  • Some vendors are likely to retrench into pure data-platform thinking. But a lot of businesses don’t buy platforms (a platform buy is an act of faith that it will enable you to address a problem); many want to buy a solution to a problem, not a platform followed by another 6 months of not knowing whether there will be a fix.

So what does this mean for APIs?

Well, APIs are becoming ever more important, but in one of several ways:

Classic API value

Having good APIs with all the supporting resources will make it easier to bolt on customer differentiators. A good API (not just well coded, but good from design through to documentation, SDKs, etc.) makes it easier for AI to vibe code against, or to use agentically through MCP.

You’ll need the APIs even more, since they are the means by which you protect data, IP, and/or your data moat, as some have described it.

The other route, if vendors retrench SaaS into more of a platform approach, risks simply exposing the underlying database. If you’ve worked with an organisation running an old-school ERP (for example, E-Business Suite) where you’re allowed legitimate access to the schema, you will probably have seen one or more of the following problems:

  • Unable to upgrade because the upgrade changes the underlying schema, which might break an extension
  • There are so many extensions that trying to prove nothing will be harmed by an upgrade is a monumental testing job, not only at a functional level but also for performance, etc. We have also seen that once people are on this slippery slope, stopping to change tack is too daunting, often too politically challenging, and the ROI case too hard to make.
  • Feature velocity on the solution slows down because the vendor has to be very careful to ensure changes are unlikely to break a deployment, completely undermining the SaaS value proposition.

Bottom line: these issues all stem from someone using an application schema directly, which creates an impediment to change (a few examples are here). As an aside, vendors like Oracle have long provided guidance on tailoring products, such as CEMLIs.

Some may argue that making your extensions agentic will solve this, but there are flaws in that argument, which we’ll come back to.

APIs to ensure data replication

The alternative approach is to provide data replication, in batches if you’re old school, or streamed for those who want data states matched almost immediately. In doing so, the SaaS solution gains the freedom (within certain limits) to change its data model; we just have to ensure we continue to meet the replication contract. This is what Fusion Data Intelligence does, and internally, there are documents that Oracle Fusion applications must adhere to. While this documentation is not a conventional API, it has all the relevant characteristics.
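To make the replication-contract idea concrete, here is a toy Python sketch. It is not how Fusion Data Intelligence or GoldenGate work; the `InternalOrder` class, the contract field names, and the `to_contract`/`validate` helpers are all hypothetical. The point is that the internal model can change shape between releases while the mapping layer keeps publishing the same fields and types, and fails loudly if a change would break that contract.

```python
from dataclasses import dataclass


@dataclass
class InternalOrder:
    """Hypothetical internal model: free to change between releases."""
    order_ref: str
    amount_pence: int
    customer: dict  # e.g. {"id": "C1", "name": "Acme"}


# The published replication contract: the field names and types consumers rely on.
CONTRACT_FIELDS = {"order_id": str, "total": float, "customer_id": str}


def validate(record: dict) -> None:
    """Fail fast if a mapping change would break the contract."""
    if set(record) != set(CONTRACT_FIELDS):
        raise ValueError(f"contract fields mismatch: {sorted(record)}")
    for name, expected in CONTRACT_FIELDS.items():
        if not isinstance(record[name], expected):
            raise TypeError(f"{name} should be {expected.__name__}")


def to_contract(order: InternalOrder) -> dict:
    """Map the internal shape onto the stable contract shape before replicating it."""
    record = {
        "order_id": order.order_ref,
        "total": order.amount_pence / 100,
        "customer_id": order.customer["id"],
    }
    validate(record)
    return record


print(to_contract(InternalOrder("SO-42", 1999, {"id": "C1", "name": "Acme"})))
```

If the vendor later renames `order_ref` or restructures `customer`, only `to_contract` has to change; consumers of the replicated data never see the internal schema.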

Thinking of data replication as an API doesn’t always register with people, which is probably why, despite the popularity of technologies like Kafka, asynchronous API specifications (such as AsyncAPI) haven’t had the impact of the OpenAPI Specification. But transforming data from one structure into a structure that clients can access, and depend upon not to change, is still a contract.

In the world of Oracle, we would do this using a tool such as GoldenGate (Debezium is an open-source example). Not only are we sharing the data, but we’re also avoiding exposing data that might reveal how unique IP is achieved, or that is very volatile as a result of ongoing feature development.

There be dragons

Let’s step back for a moment and look at the big picture driving things. We want to use AI and LLMs because they give us speed, through a greater level of inherent flexibility. That speed essentially comes from entrusting the LLM with the execution details, which means accepting non-determinism, as the LLM may not apply the same sequence of steps every time the request is made. At the same time, any system (and particularly software) is only helpful if it yields predictable outcomes. We expect (and have been conditioned) to see consistency: if I give this input, I get this outcome. Black box determinism, if you like.

So, how can we achieve that deterministic black box? Let’s take a simplistic view of a real-world scenario. A hospital is our system, and our deterministic expectation is that sick and hurt people go in, and healed, well people come out. Do we want to know how things work inside the black box? Beyond knowing the process is affordable, painless, caring and quick, not really.

So how does a hospital do this? It invests heavily in training its ‘tools’ (medical staff, etc.). It equips them with clearly understood, purposeful services (a theatre, patient monitors, and data on medications with clearly defined characteristics). The better the hospital understands how to use the services and data, the better the output. We can change how a hospital works through its processes, training and equipment, but execute that poorly, and we’ll see an uptick in problems.

There is no escaping the fact that providing any API requires thought. Letting your code dictate the API can leave you boxed into a corner with a solution that can’t evolve, and even small changes to the API specification can break your API contract and harm people’s ability to consume it.

It is true that an LLM prompt can be tolerant of certain changes. But it cuts both ways: poor API changes (e.g. attributes and descriptions that don’t match, or attribute names too obscure to extract meaning from) can result in the LLM failing to interpret the provider’s intent. Worse, an LLM that has been producing the expected results, but for unexpected reasons, may start getting things wrong after small changes.

This leads to the question of what this means for application APIs. It’s an interesting question, and it’s easy to jump to the assumption that APIs aren’t needed. But in that direction lie dragons, as the expression goes.

If we approach things with an API-first strategy, the API and its definition are less susceptible to change. Whether the API is implemented using an agent, vibe coded or traditionally developed, the contract will give us some of that determinism.

APIs further benefits

Given the challenges and uncertainties in the world of SaaS, good APIs can offer value beyond typical integration. With a good API Gateway setup, if customers are vibe coding their own UIs from your APIs, you’ll be able to analyse usage patterns, which still give clues about customer use cases and which parts of the product are most valuable, just as good embedded UI analytics and trace data can.

Final thought

If there is an existential threat to SaaS, it won’t be solved by abandoning structure. It will be addressed by:

  • making data accessible
  • enabling extension
  • and doubling down on well-designed APIs

In an agentic world, APIs aren’t obsolete. They’re the thing that stops everything from falling apart.

OpAMP with Fluent Bit – Observability and ChatOps

23 Monday Mar 2026

Posted by mp3monster in Fluent Observability, Fluentbit, General, OpAMP

Tags

AI, artificial-intelligence, Cloud, Fluentbit, Fluentd, LLM, observability, OpAMP, Technology

With KubeCon Europe happening this week, it felt like a good moment to break cover on this pet project.

If you are working with Fluent Bit at any scale, one question keeps coming up: how do we consistently control and observe all those edge agents, especially outside a Kubernetes-only world?

This is exactly the problem the OpAMP specification is trying to solve. At its core, OpAMP defines a standard contract between a central server and distributed agents/supervisors, so status, health, commands, and config-related interactions follow one protocol instead of ad-hoc integration per tool.

That is where this project sits. We’re implementing the OpAMP specification to support Fluent Bit (and later Fluentd).

In this implementation, we have:

  • a provider (the OpAMP server), and
  • a consumer acting as a supervisor to manage Fluent Bit deployments.

Right now, we are focused on Fluent Bit first. That is deliberate: it keeps scope practical while we validate the framework. The same framework is being shaped so it can evolve to support Fluentd as well.

The repository for the implementation can be found at https://github.com/mp3monster/fluent-opamp

Quick summary

The provider/server is the control plane endpoint. It tracks clients, accepts status, queues commands, and returns instructions using OpAMP payloads over HTTP or WebSocket.

The consumer/supervisor handles the local execution and reporting. It launches Fluent Bit, polls local health/status endpoints, sends heartbeat and metadata to the provider, and handles inbound commands (including custom ones). The server and supervisor can be deployed independently, which is important for real-world rollout patterns.

Because they follow the OpAMP protocol model, clients and servers can be interchanged with other OpAMP-compliant implementations (although we’ve not yet tested this aspect of the development).

Together, they give us a manageable, spec-aligned path to coordinating distributed Fluent Bit nodes without hard-coding one-off control logic into every environment.
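To illustrate the kind of exchange the supervisor and server have, here is a heavily simplified sketch of OpAMP’s sequence-number handling. As I understand the spec, the agent increments `sequence_num` on every `AgentToServer` message, and a server that spots a gap can ask the agent to report its full state (the `ReportFullState` flag). The dataclasses below are plain-Python stand-ins for the protobuf messages, omit most fields, and are not the project’s code; `Supervisor` and `Server` are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class AgentToServer:
    # Heavily simplified stand-in for the OpAMP protobuf message.
    instance_uid: str
    sequence_num: int
    healthy: bool


@dataclass
class ServerToAgent:
    # report_full_state mirrors the spec's ReportFullState flag.
    instance_uid: str
    report_full_state: bool = False


class Supervisor:
    def __init__(self) -> None:
        self.instance_uid = str(uuid.uuid4())
        self.sequence_num = 0

    def next_message(self, healthy: bool) -> AgentToServer:
        msg = AgentToServer(self.instance_uid, self.sequence_num, healthy)
        self.sequence_num += 1  # every message sent gets the next number
        return msg


class Server:
    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}

    def handle(self, msg: AgentToServer) -> ServerToAgent:
        previous = self.last_seen.get(msg.instance_uid)
        self.last_seen[msg.instance_uid] = msg.sequence_num
        # An unknown client mid-sequence (e.g. after a server restart) or a gap
        # means state may have been missed, so ask the agent to resend it all.
        if previous is None:
            missed = msg.sequence_num != 0
        else:
            missed = msg.sequence_num != previous + 1
        return ServerToAgent(msg.instance_uid, report_full_state=missed)


supervisor, server = Supervisor(), Server()
first = server.handle(supervisor.next_message(healthy=True))  # seq 0
supervisor.next_message(healthy=True)  # seq 1 is "lost" in transit
third = server.handle(supervisor.next_message(healthy=True))  # seq 2
print(first.report_full_state, third.report_full_state)  # False True
```

Because either side can recover from a missed message this way, the server and supervisor can be restarted or deployed independently without a shared session store.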

Deployment options and scripts

There are a few practical ways to get started quickly:

  • Deploy just the server/provider using scripts/run_opamp_server.sh (or scripts/run_opamp_server.cmd on Windows).
  • Deploy just the client/supervisor using scripts/run_supervisor.sh (or scripts/run_supervisor.cmd on Windows).
  • Run both components either together in a single environment or independently across different hosts.

The scripts will set up a virtual environment and retrieve the necessary dependencies.

If you want an initial MCP client setup as part of your workflow, there are helper scripts for that too:

  • mcp/configure-codex-fastmcp.sh and mcp/configure-codex-fastmcp.ps1
  • mcp/configure-claude-desktop-fastmcp.sh and mcp/configure-claude-desktop-fastmcp.ps1

Server screenshots

Here is a first view of the server:

The Server Console with a single Agent
Basic agent summary view

The UI is still evolving, but this gives a concrete picture of the provider side control plane we are discussing.

What the OpAMP server (provider) does

The provider is responsible for the shared view of fleet state and intent.

Today it provides:

  • OpAMP transport endpoints (/v1/opamp) over HTTP and WebSocket.
  • API and UI endpoints to inspect clients and queue actions.
  • In-memory command queueing per client.
  • Emission of standard command payloads (for example, restart).
  • Emission of custom message payloads for custom capabilities.
  • Discovery and publication of custom capabilities supported by the server side command framework.

Operationally, this means we can queue intent once at the server and let the next client poll/connection cycle deliver that action in protocol-native form.
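The "queue intent once, deliver on the next cycle" behaviour can be sketched as a per-client in-memory queue. This is a hypothetical illustration, not the provider’s implementation: `CommandQueue` and its methods are invented for the example, and it hands over one item per poll on the assumption that each OpAMP `ServerToAgent` response carries a single command. Being in-memory, anything queued is lost if the server restarts.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class CommandQueue:
    """Per-client, in-memory command queue (contents are lost on restart)."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[dict]] = defaultdict(deque)

    def queue(self, instance_uid: str, command: dict) -> None:
        # Intent is recorded once, whether it came from the UI, the API or MCP.
        self._pending[instance_uid].append(command)

    def take_next(self, instance_uid: str) -> Optional[dict]:
        # Called when the client next polls (HTTP) or sends a message (WebSocket).
        pending = self._pending.get(instance_uid)
        if not pending:
            return None
        command = pending.popleft()
        if not pending:
            del self._pending[instance_uid]  # tidy up empty queues
        return command


q = CommandQueue()
q.queue("fb-001", {"type": "restart"})
q.queue("fb-001", {"type": "custom", "capability": "org.example.chatops"})
print(q.take_next("fb-001"))  # {'type': 'restart'}
print(q.take_next("fb-001"))  # {'type': 'custom', 'capability': 'org.example.chatops'}
print(q.take_next("fb-001"))  # None
```

A durable store would be the obvious next step if queued intent needs to survive a server restart.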

What the supervisor (consumer) does for Fluent Bit

The supervisor is the practical glue between OpAMP and Fluent Bit:

  • Starts Fluent Bit as a local child process.
  • Parses Fluent Bit config details needed for status polling.
  • Polls Fluent Bit local endpoints on a heartbeat loop.
  • Builds and sends AgentToServer messages (identity, capabilities, health/status context).
  • Receives ServerToAgent responses and dispatches commands.
  • Handles custom capabilities and custom messages through a handler registry.
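A handler registry is a simple way to route custom messages. This sketch assumes each OpAMP `CustomMessage` is reduced to its `capability`, `type`, and `data` fields; the registry class, the handler signature, and the convention of returning an error string (or `None` on success) are hypothetical choices for illustration.

```python
from typing import Callable, Dict, List, Optional

# A handler gets the custom message type and raw data, and returns an error
# description or None on success (hypothetical convention).
Handler = Callable[[str, bytes], Optional[str]]


class CustomHandlerRegistry:
    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, capability: str, handler: Handler) -> None:
        self._handlers[capability] = handler

    def capabilities(self) -> List[str]:
        # The list a supervisor would advertise to the server.
        return sorted(self._handlers)

    def dispatch(self, capability: str, msg_type: str, data: bytes) -> Optional[str]:
        handler = self._handlers.get(capability)
        if handler is None:
            return f"unsupported capability: {capability}"
        return handler(msg_type, data)
```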

So for Fluent Bit specifically, the supervisor gives us a way to participate in OpAMP now, even before native in-agent OpAMP support is universal.

And to be explicit: this is the current target. Fluentd support is a planned evolution of this same model, not a separate rewrite.

Where ChatOps fits

ChatOps is where this gets interesting for day-2 operations.

In this implementation, ChatOps commands are carried as OpAMP custom messages (custom capability org.mp3monster.opamp_provider.chatopcommand). The provider queues the custom command, and the supervisor’s ChatOps handler executes it by calling a local HTTP endpoint on the configured chat_ops_port.

That gives us a cleaner control path:

  • Chat/user intent can go to the central server/API.
  • The server routes to the right node through OpAMP.
  • The supervisor performs the local action and can return failure context when local execution fails.

This is a stronger pattern than directly letting chat tooling call every node individually, and it opens the door to better auditability and policy controls around who can trigger what.
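On the supervisor side, the ChatOps handler turns a custom message into a call to a local HTTP endpoint on `chat_ops_port`. This is a hedged, standard-library-only sketch: the `/chatops` path, the JSON content type, the 5-second timeout, and the function names are assumptions, not the project's implementation. The opener is injectable so the logic can be exercised without a running Fluent Bit.

```python
import urllib.request
from typing import Callable, Optional

# Custom capability name from the project; everything else is illustrative.
CHATOPS_CAPABILITY = "org.mp3monster.opamp_provider.chatopcommand"


def build_chatops_request(chat_ops_port: int, data: bytes,
                          path: str = "/chatops") -> urllib.request.Request:
    # Hypothetical path; the call stays on the loopback interface.
    return urllib.request.Request(
        url=f"http://127.0.0.1:{chat_ops_port}{path}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_chatops_command(chat_ops_port: int, data: bytes,
                        opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Return None on success, or failure context to report back to the server."""
    request = build_chatops_request(chat_ops_port, data)
    try:
        # urlopen raises HTTPError for 4xx/5xx responses.
        with opener(request, timeout=5):
            pass
    except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
        return f"local ChatOps call failed: {exc}"
    return None
```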

Reality check: we are still testing

This is important: we are still actively testing functionality.

Current status is intentionally mixed:

  • Core identity, sequencing, capabilities, disconnect handling, and heartbeat/status pathways are in place.
  • Some protocol fields are partial, todo, or long-term backlog.
  • Custom capabilities/message pathways are implemented as a practical extension point and are still being hardened with test coverage and real-world runs.

So treat this as a working framework with proven pieces, not a finished all-capabilities implementation.

What is coming next (based on docs/features.md)

Near-term priorities include:

  • stricter header/channel validation,
  • heartbeat validation hardening,
  • payload validation against declared capabilities,
  • server-side duplicate WebSocket connection control behaviour.

Broader roadmap themes include:

  • authentication/security model for APIs and UI,
  • persistence in the provider,
  • richer UI controls for node/global polling and multi-node config push,
  • certificate and signing workflows,
  • packaging improvements.

And yes, a key strategic direction is evolving the framework abstraction so it can support Fluentd in due course, not only Fluent Bit. Some feature areas (like package/status richness) make even more sense in that broader collector ecosystem.

Why this matters

OpAMP gives us a standard envelope for control-plane interactions; the server/supervisor split gives us pragmatic deployment flexibility; and ChatOps provides a human-friendly control surface.

Put together, this becomes a useful pattern for managing telemetry agents in real environments where fleets are mixed, rollout velocity matters, and “just redeploy everything” is not always an option.

If you are evaluating this right now, the right mindset is: useful today, promising for tomorrow, and still under active verification as we close feature gaps.


Returning to Chat Ops

03 Tuesday Mar 2026

Posted by mp3monster in chatbots, Fluent Observability, Fluentbit, Fluentd, General, OpAMP, Technology


Tags

AI, chatops, development, Fluent Bit, IncidentFox, innovation, LLM, OpAMP, OTel, runbooks, SRE

A couple of years ago, we wrote about Chat Ops and why the idea is valuable and interesting (see Fluent Bit – Powering Chat Ops, Fluent Bit with Chat Ops, for example). The essence of the idea was:

  • Using a collaboration or chat platform like Slack could ease and even accelerate the response to operational issues (as systems process more data, faster).
  • Conversational collaborative platforms are pervasive and usable across many devices, while still being able to enforce security. They take away the need to log in to a laptop and sign in to portals before you can even start considering what the problem is.
  • The quicker we can understand and resolve things, the better, and the less potential damage there is to address. Collaboration platforms help through collaborative working, and because the push model of tools like Slack lets us see content quickly and easily.
  • Any observability agent that can detect issues as data is moved to a backend aggregation and analytics tool gives you a head start. This is something Fluent Bit can do easily.

We illustrated it this way:

The ChatOps deployment looked like:

ChatOps deployed using Slack to power collaboration, with Fluent Bit for ops execution

How do we advance this?

The demo was very light-touch to illustrate an idea, but we’ve since heard of people pursuing the concept. So the question becomes: what has happened in the observability space that could make it easier to industrialize (e.g., scale, resilience, security, manageability)? One of the key advances being driven by the OpenTelemetry working groups within the CNCF is OpAMP. OpAMP provides a protocol for standardizing the management and control of agents and collectors such as Fluent Bit. I won’t go into the details of OpAMP here, as we covered that in a separate blog (here). What it means is that we can either tap into the OpAMP protocol more directly, as it provides a means to deliver custom commands to agents, or, better still, simplify things by extending the management service so that we only talk to it. The service then becomes responsible for sending the command on to the relevant agents, and for telling us when its knowledge of agent capabilities shows the command isn’t possible.

This improves security (our bot only needs to talk to a single point in our infrastructure). Our agents, like Fluent Bit, can have a trust relationship with a central point. Effectively, we have introduced a stronger, multifaceted trust layer into the solution. Not only that, because the OpAMP server has visibility into more of the agent deployment, we can also leverage its information and ask it to deploy our custom Fluent Bit configurations that help us observe and execute remediation processes.
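The "tell us when it isn’t possible" behaviour can be sketched as a capability check on the server before anything is queued. In this minimal Python sketch, `AgentRecord`, `route_custom_command`, and the outcome strings are hypothetical; the only protocol detail assumed is that agents report their custom capabilities (OpAMP carries these in a `CustomCapabilities` message).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AgentRecord:
    # Custom capabilities the agent (or its supervisor) has reported.
    custom_capabilities: Set[str] = field(default_factory=set)


def route_custom_command(agents: Dict[str, AgentRecord], targets: List[str],
                         capability: str) -> Dict[str, str]:
    """Check each target before queuing and report back per agent."""
    outcome: Dict[str, str] = {}
    for uid in targets:
        record = agents.get(uid)
        if record is None:
            outcome[uid] = "unknown agent"
        elif capability not in record.custom_capabilities:
            outcome[uid] = "capability not supported"
        else:
            # A real server would enqueue the custom message for this agent here.
            outcome[uid] = "queued"
    return outcome
```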

Let’s look deeper

While industrialization is good, it occurs to me that there is more to collaboration and conversational (or chat) interfaces than we first envisaged.

Collaboration, to a large extent, is not about bringing multiple ideas to the table and choosing the best one; it is actually about knowledge mining. Our ideas and insights are built from individual knowledge banks, or as we might put it, experience. Conversational interfaces also allow us to see what has gone before (in effect, harvesting information to improve our knowledge). So perhaps we should be asking: what can we do with a platform that allows us to easily access, share, and interpret knowledge for a specific context (i.e., our current problem)?

Knowledge is unstructured or semistructured information, and information is a semistructured composition of context and data. So how do we find and leverage information, if not knowledge, particularly at 2am when we’re the only person on call? The answer is to improve information retrieval, for example, with more Slack bots that accept structured commands to retrieve relevant data from our metrics tooling, traces, etc. Organizations that follow more ITIL-guided processes will most likely have runbooks, a knowledge base with error-code information, and previous incident logs documenting resolutions, all of which bring together a wealth of information. Doing this in a collaborative conversational tool like Slack or MS Teams saves time and effort, as you don’t have to sign in to different portals to track down the details.

But we can do better. With the rise of LLMs (large language models, Gen AI if you prefer), combined with techniques such as RAG (Retrieval Augmented Generation) and MCP (Model Context Protocol), we can accelerate things further, as our Slack bot requests no longer need to use structured commands. We use natural language and can easily paste in parts of the information we’ve been given. So while an LLM may take longer to execute a single request, it is more likely to surface the details faster: we don’t have to worry about getting the bot request notation correct, and we don’t have to repeat requests to narrow down the data. With agentic techniques, the LLM can pull data from multiple systems in a single go.

All of which is very achievable; many vendors (and open-source teams) have been seeking a competitive edge and exposing their APIs through MCP tools. There is the possibility that some of this is ‘AI washing’, but the frameworks to support MCP development are well progressed, so you can always refine or create your own MCP server. Here are just a few examples:

  • Prometheus
  • Loki
  • Grafana
  • OpenTelemetry extension OpenLLMetry
  • OTEL Collector
  • OpenSearch

In addition to these, which align with widely used open-source solutions, there are MCP servers for vendor-specific offerings, such as Honeycomb and Chronosphere.

The use of MCP raises an interesting possibility: a universal MCP client that can easily be interfaced with any conversational tool. After all, we just need to pass on the text addressed to the client, so the adaptor to Slack, MS Teams, etc., is pretty simple. This means the same tooling could also serve tools like Claude Desktop and its mobile equivalent.

If we bring AI into the equation, and consider ML and small models as well, we can start exploiting pattern recognition in the occurrence of issues. This would mean the value diagram becomes more like:

We should also keep in mind the development of the new OpAMP spec from the CNCF OpenTelemetry project that provides a mechanism to centrally track and interact with all our agents, such as Fluent Bit (not to mention Fluentd, OTel Collectors, and others).

Bringing all of this together isn’t a small task, so which of these ideas already exist? Are there building blocks we can lean on to show that the ideas expressed here are achievable? This brought us to IncidentFox.ai (GitHub link – IncidentFox). IncidentFox already exploits MCP servers from several products, such as Grafana, and supports a knowledge base geared towards intelligent searching that can also tap into common sources of operational knowledge, such as Confluence.

But IncidentFox has taken the use of LLMs further with its agentic approach: it recommends actions, and once it has access to source code, it can use LLMs to start looking for root causes. To do this, IncidentFox harnesses the improvements in Gen AI reasoning, along with the selection and use of the MCP-enabled tools we mentioned, which enable the extraction of data and information from mainstream observability tools.

IncidentFox isn’t the only organization heading in this direction (Resolve.ai, Randoli, and Kloudfuse are also building AI SRE tools, albeit with fully proprietary business models), but it does support an open-source core. All of these solutions come under the banner of AI SRE. Interestingly, a new LLM benchmark called SRESkillsBench has emerged that specifically evaluates LLMs against SRE tasks, joining benchmarks such as Bird and Spider. This kind of information is certainly going to create some interesting debates: do you use the best LLM for specific tasks, or the LLM the vendor has worked with and optimized their prompting for?

A lot of these tools focus on detection and diagnostics and steer you to the relevant runbooks, but there seems to be less said about how runbooks can be translated into remediation actions that can be executed. Translating runbooks into executable steps will require either tooling via MCP or custom models with suitable embedded functions. Whichever approach is adopted, having a well-understood control plane, such as Kubernetes, will make it easier. But not everything is native K8s. Lots of organizations still use just virtualization, or even bare metal. We also see this with cloud vendors that offer bare-metal services to meet the need for maximum compute horsepower. To address these scenarios from a central point of control, we either need remote access (e.g., SSH tunnels) to all the nodes or a distributed tooling mechanism, which is what we had with our original ChatOps idea and the possibilities offered through OpAMP.

Of course, introducing LLMs opens up another set of challenges in the observability space, which OpenLLMetry (driven by TraceLoop) and LangFuse are working to address. But that’s for another blog.

What does this mean to our ChatOps proof of concept?

Sticking with an open-source/standards-based approach means there is a clear direction for enabling AI agents to be part of the OTLP (OpenTelemetry Protocol) pipeline and to leverage the MCP tooling of backend observability platforms. IncidentFox doesn’t yet provide an inline OTLP capability (although some of the proprietary options have gone this way), but that is less of a concern, as Fluent Bit provides a lot of capability to help identify issues (such as time-series/frequency measurement, error identification, etc.), which can be combined with IncidentFox’s ability to be instructed to initiate an incident. We could do this with direct API calls, but it would be more interesting and flexible if, when our Fluent Bit process notifies the Ops team in Slack of an issue, it also ensures that the IncidentFox Slack agent picks up the message, allowing it to start its own analysis. In effect, a collaboration platform can be as impactful as an integration platform.
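As an illustration of the frequency-measurement idea, here is the underlying logic as a sliding-window counter in Python. This is not Fluent Bit configuration (inside Fluent Bit the equivalent would be built from its own pipeline components); the class name, threshold, and window size are hypothetical.

```python
from collections import deque
from typing import Deque


class ErrorRateAlarm:
    """Fires when more than `threshold` errors arrive within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one error event; return True if the rate is now too high."""
        self._times.append(timestamp)
        # Drop events that have slid out of the window.
        while self._times and timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) > self.threshold
```

A True result is the point at which the pipeline would post to the Ops Slack channel, where the IncidentFox Slack agent could pick it up.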

Furthermore, now that we have early warning of an issue via Fluent Bit, we can interact with IncidentFox through its tooling to start interrogating the diverse range of tools well before the new problem’s details have been fully ingested into the backend tooling and been detected as an issue. So we can now get a jump start by prompting IncidentFox, which can mine for information and recommendations. This is where, at 2am, we can still work collaboratively, albeit with a collaborator that is not another person but an LLM.

As I’ve mentioned, the challenge is making the runbook executable in environments that lack a strong control plane such as K8s. Here, we can further develop our ChatOps concept. By providing a set of MCP tools for Fluent Bit (IncidentFox supports providing your own tools), the runbook can describe remediation steps, and the LLM ensures the Fluent Bit tool is invoked appropriately.

This would still mean that IncidentFox, as the central point, would need to talk to each Fluent Bit deployment, although that is better than just opening up SSH directly. But we could be smarter. OpAMP provides a central coordination point, and through its Supervisor or embedded logic in an observability client, the protocol’s support for custom commands could be exploited. So we expose the way we interact with Fluent Bit via the custom command mechanism, and we have a robust, controlled mechanism. Furthermore, by extending the concept to exploit the aggregated knowledge of agents and their configuration (or to push out new configurations to solve operational problems, as OpAMP has been designed to enable), we can automate the determination of whether the remediation action needs to be executed across other operational nodes.

Let’s walk through a hypothetical (but plausible) scenario. We have manually deployed systems across a load-balanced set of servers in a pre-production environment used for load testing, each with TLS certificates. As it is a pre-production environment, we use self-signed certificates. We are starting to see errors because the certificates have expired, and someone forgot to recycle them. Fluent Bit has been forwarding the various error log messages to Loki, but the high frequency of the errors causes it to send details to Slack for the Ops team. This, in turn, nudges IncidentFox to act: it determines the root issue and finds a remediation in a runbook that points to running a script on the host to generate a fresh self-signed certificate. As we have a Fluent Bit ops pipeline that can trigger that script, and it is deemed low risk, we allow IncidentFox to execute the runbook unsupervised. To execute the runbook, it uses the OpAMP server to send the custom command to Fluent Bit. But the runbook says other nodes should be checked. As a result, IncidentFox works with the OpAMP server via its MCP tooling to identify which other Fluent Bit deployments are monitoring other nodes and to direct the custom command to those nodes as well.

To achieve this, we need an architecture along the following lines. In addition to IncidentFox and the associated MCP tools, we would need the OpAMP server and either an OpAMP client embedded into Fluent Bit or an OpAMP supervisor that understands how to pass on custom commands, something it could perhaps do by using another MCP interface to Fluent Bit directly.
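In the certificate scenario, the step where IncidentFox asks the OpAMP server which other Fluent Bit deployments should receive the same custom command could be as simple as filtering the fleet on the attributes agents report about themselves (OpAMP agents describe themselves with identifying and non-identifying attributes). The attribute keys used here (`host.name`, `deployment.environment`, and a made-up `lb.pool`) and the `select_agents` function are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Attributes as an agent might report them in its OpAMP agent description.
AgentAttributes = Dict[str, str]


def select_agents(fleet: Dict[str, AgentAttributes], selector: Dict[str, str],
                  exclude: Iterable[str] = ()) -> List[str]:
    """Return agent IDs whose attributes match every key/value in the selector."""
    skip = set(exclude)
    return sorted(
        uid
        for uid, attrs in fleet.items()
        if uid not in skip and all(attrs.get(k) == v for k, v in selector.items())
    )
```

Excluding the node that has already been remediated means the follow-up command only fans out to the remaining members of the pool.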

Conclusion

As you can see, there has been significant advancement in how LLMs and MCP tooling can be used to enable operational support activities. To turn this into reality, the core Fluent Bit committers ideally need to press on with supporting OpAMP. A standardized MCP tooling arrangement for an open-source OpAMP server would also be a significant advance, although we could get things working with MCP directly against Fluent Bit.

IncidentFox being able to function as an OpAMP server itself would really enable it to shift the narrative, particularly if it provided a means to plug into a supervisor to handle custom actions.

Root cause analysis – understanding deployment changes

Looking at all of this, one dimension of operational issues is the ability to correlate an issue with environmental changes. That means understanding not only software versions, but also the live configuration. Again, not too difficult when your control plane is K8s, but trickier outside of that ecosystem. You need to identify when the change was defined and when it was applied. Is it time for functionality that detects when details such as the timestamps of configuration files or certain binary files change? Perhaps a proxy service that can spot API calls to configuration endpoints and file system changes to configuration files? Then have that information logged centrally so that problems can be correlated to a system configuration or deployment problem.
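Detecting when a configuration file has actually changed is easy to sketch. This Python example hashes each watched file and records its modification time as an approximation of when the change landed; the function names and the idea of scanning on a schedule are assumptions, and a real service would also need to ship the result to a central log.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileFingerprint:
    mtime_ns: int
    sha256: str


def fingerprint(path: str) -> FileFingerprint:
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return FileFingerprint(os.stat(path).st_mtime_ns, digest)


def detect_changes(paths: List[str],
                   previous: Dict[str, FileFingerprint]) -> List[Tuple[str, int]]:
    """Return (path, mtime_ns) for files whose content changed since the last scan.

    `previous` is updated in place; a file seen for the first time is recorded
    as a baseline rather than reported as a change.
    """
    changed: List[Tuple[str, int]] = []
    for path in paths:
        current = fingerprint(path)
        before = previous.get(path)
        # Compare content, not timestamps, so a bare `touch` is not a change.
        if before is not None and before.sha256 != current.sha256:
            # The mtime approximates when the change was applied on disk.
            changed.append((path, current.mtime_ns))
        previous[path] = current
    return changed
```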

Updates

Since writing this, we have come across Mezmo, which, like Resolve.ai, is developing a proprietary solution but is looking to make an open-source offering. As part of that journey, they have sponsored an O’Reilly Report on Context Engineering for Observability.


Fluent Bit and Otel Collectors at scale

26 Thursday Feb 2026

Posted by mp3monster in Fluentbit, General, Technology


Tags

AI, artificial-intelligence, Cloud, LLM, OpAMP, Open Telemetry, OTel, Protobuf, Spec, Technology

Fluent Bit and OpenTelemetry’s Collector (as well as many other observability tools) are designed to use a distributed/agent model for deployment. This model can pose challenges, including ensuring that all agents are operating healthily and correctly configured. This is particularly true outside of a Kubernetes ecosystem. But even within a Kubernetes ecosystem, more than basic insights are required (for example, is it running flat out or over-resourced?). Fluent Bit exposes its own metrics and logs, so you can either configure Fluent Bit to forward them to an endpoint or allow Prometheus to scrape the metrics.

As we’re usually using Fluent Bit to collect data and route it to tools like Prometheus, Grafana, etc., or perhaps a more commercial product, it makes sense that Fluent Bit also shares its own status and health.

When it comes to managing the configuration of Fluent Bit, we have lots of options for Kubernetes deployments (from forcing pod replacement, to sharing configurations via persistent volume claims and having Fluent Bit reload its configs). But observability also needs to work in simple virtualized and bare-metal scenarios, where not everything can be treated as dynamically replaceable. There, more general strategies such as GitOps and potentially Istio (yes, there really are good use cases for Istio outside of Kubernetes) are available, as is more advanced tooling such as Puppet, Chef, and Ansible.

The challenge is that none of these tools provides out-of-the-box capabilities to fully exploit the control surface that OTel Collectors and Fluent Bit offer. So the OpenTelemetry community has elected to develop a new standard called OpAMP, which fits snugly into the OTel ecosystem.

OpAMP defines an agent/client and server model in which the central server provides control, measurement, etc., to all Collectors. The agent/client side can be deployed in two ways: wired directly into a collector or via a separate supervisor tool. Integrating the client side directly into the collector is great, as it avoids introducing a new local process. The heart of OpAMP is the message exchange, which we’ll look at more in a moment.

Today, without modification, Fluent Bit would need to use the Supervisor model (GitHub shows a feature request to support the protocol, but we haven’t heard whether or when it will be implemented). It is early days, and some aspects of the protocol are still classified as ‘development’. That said, it is worth looking more closely at OpAMP as it offers some interesting opportunities, particularly around how we could easily evolve ideas such as ChatOps.

While I’m not a fan of having a peer process for Fluent Bit, since we then have two distinct processes to support observability, we could experiment with the supervisor spawning Fluent Bit as a child process, which would let the supervisor easily spot Fluent Bit failing. At the same time, the supervisor can communicate with Fluent Bit using the localhost loopback adapter and the usual APIs.
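Here is a minimal sketch of that child-process arrangement, using Python’s `subprocess` module. `ChildSupervisor` is a hypothetical name; the command you would pass is something like `["fluent-bit", "-c", "/etc/fluent-bit/fluent-bit.yaml"]`, where the config path is an assumption about your installation.

```python
import subprocess
from typing import List, Optional


class ChildSupervisor:
    """Runs an agent as a child process and notices when it exits."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.process: Optional[subprocess.Popen] = None

    def start(self) -> None:
        self.process = subprocess.Popen(self.command)

    def exit_code(self) -> Optional[int]:
        # poll() returns None while the child is still running
        # (this also returns None if the child was never started).
        if self.process is None:
            return None
        return self.process.poll()

    def stop(self, timeout: float = 10.0) -> None:
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            try:
                self.process.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.process.kill()
                self.process.wait()
```

Checking `exit_code()` on each heartbeat gives the supervisor a cheap failure signal to report upstream, alongside whatever the local Fluent Bit HTTP endpoints say.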

Understanding the OpAMP Protocol and what it offers

Let’s start with what the OpAMP protocol offers. Firstly, very little of it is mandatory. This is both good and bad: it means we can build compatibility incrementally, so if certain behaviours are provided elsewhere, the agent doesn’t have to offer that capability, and the protocol is tolerant of what an agent or collector can and can’t do. What it offers is:

  • Heartbeats that announce the agent’s existence to the server, along with what the agent is capable of, or allowed to do, regarding OpAMP features.
  • Status information, including:
    • environmental information
    • configuration being run
    • modules being used
  • Perform updates to resources, including:
    • Agent configuration
    • TLS certificate rotation
    • Credentials management
    • module or even an entire agent installation
  • Issuing of custom commands.
  • Directing the agent’s own telemetry to specific services/endpoints.
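Agents advertise what they can do as a bit field. The sketch below decodes a subset of the `AgentCapabilities` values as I understand them from the OpAMP `opamp.proto` definition; check the exact values against the spec version you target. As I read the spec, `ReportsStatus` is the one bit every agent must set, which fits the "very little is mandatory" point. The `describe` helper is hypothetical.

```python
from enum import IntFlag
from typing import List


class AgentCapabilities(IntFlag):
    # Subset of the AgentCapabilities enum in opamp.proto (verify against the spec).
    ReportsStatus = 0x1
    AcceptsRemoteConfig = 0x2
    ReportsEffectiveConfig = 0x4
    AcceptsPackages = 0x8
    ReportsPackageStatuses = 0x10
    ReportsOwnTraces = 0x20
    ReportsOwnMetrics = 0x40
    ReportsOwnLogs = 0x80
    AcceptsOpAMPConnectionSettings = 0x100
    AcceptsOtherConnectionSettings = 0x200
    AcceptsRestartCommand = 0x400
    ReportsHealth = 0x800
    ReportsRemoteConfig = 0x1000


def describe(bits: int) -> List[str]:
    """List the known capability names set in an agent's capabilities field."""
    caps = AgentCapabilities(bits)
    return [cap.name for cap in AgentCapabilities if cap in caps]
```

A Fluent Bit supervisor that can report status and accept a restart would send `0x1 | 0x400`, and the server can use the decoded list to decide which commands are worth queuing.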

The heart of this protocol is the contract between the client (agent/collector) and the server, defined using Protobuf 3. This means you can easily generate the code skeleton to handle the payload objects, which are transmitted in binary form. That gives network traffic efficiency, at the price of not being human-readable or dynamically processable, insofar as you need to know the Protobuf definition to extract any meaning.

In addition to the Protobuf definitions, there are rules for handling messages, specifying when a message or response is required, message sequence numbering, and the default heartbeat frequency. But there aren’t any complex exchanges involved.
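Sequence numbering is what lets the server notice missed messages. In this sketch, the server remembers the last `sequence_num` per agent and, on a gap, replies with the `ReportFullState` flag (value `0x1` in `ServerToAgentFlags`, per my reading of the spec). The `SequenceTracker` class is hypothetical, and treating an agent’s first message as needing no follow-up is a simplifying assumption.

```python
from typing import Dict

# ServerToAgentFlags value asking the agent to resend its full state.
REPORT_FULL_STATE = 0x1


class SequenceTracker:
    """Detects gaps in each agent's AgentToServer sequence_num."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def check(self, instance_uid: str, sequence_num: int) -> int:
        """Return the ServerToAgent flags to send back for this message."""
        last = self._last.get(instance_uid)
        self._last[instance_uid] = sequence_num
        if last is not None and sequence_num != last + 1:
            # A message was missed (or the agent restarted): ask for full state.
            return REPORT_FULL_STATE
        return 0
```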

The binary payloads are exchanged over WebSockets (allowing full-duplex exchanges, where the server can send requests at any time) or over HTTP (providing half-duplex exchanges, aka polling/client check-in, at which point the server can respond with an instruction), a strategy that is becoming increasingly common today.
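On the WebSocket transport, each binary frame carries a small header before the Protobuf message. As I understand the spec, this is currently a single varint with the value 0, whereas the HTTP transport sends the Protobuf message as the request/response body without it. The sketch below shows that framing with a hand-written varint codec; the function names are hypothetical, and a real implementation would use generated Protobuf classes for the payload itself.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Protobuf-style base-128 varint for a non-negative integer."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed)."""
    result = 0
    for i, byte in enumerate(buf):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, i + 1
    raise ValueError("truncated varint")


def frame_ws_message(protobuf_payload: bytes, header: int = 0) -> bytes:
    # Header varint (currently 0) followed by the serialized message.
    return encode_varint(header) + protobuf_payload


def unframe_ws_message(frame: bytes) -> Tuple[int, bytes]:
    header, used = decode_varint(frame)
    return header, frame[used:]
```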

The benefits we see

Regardless of whether you’re in a Kubernetes environment, the ability to ask agents/collectors to quickly tweak their configuration is attractive. If you start suspecting that a service or application is not behaving as expected, and Fluent Bit is filtering your logs or sampling traces, you can quickly push out a config change to allow more through, providing further insight into what is going on. You don’t need to wait for a Kubernetes scheduler to roll through replacing pods.
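OpAMP’s remote config offer carries a config hash, so an agent (or supervisor) can skip configs it is already running. This sketch assumes a length-prefixed SHA-256 over the named config files as the hashing scheme (as far as I know, the spec leaves the algorithm to the server), and `RemoteConfigApplier` is a hypothetical name. The actual file write and Fluent Bit reload are left as a comment.

```python
import hashlib
from typing import Dict, Optional


def config_hash(config_map: Dict[str, bytes]) -> bytes:
    """Deterministic SHA-256 over named config files (assumed scheme)."""
    digest = hashlib.sha256()
    for name in sorted(config_map):
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        for part in (name.encode("utf-8"), config_map[name]):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.digest()


class RemoteConfigApplier:
    """Tracks the hash of the config currently running on this agent."""

    def __init__(self) -> None:
        self.last_applied_hash: Optional[bytes] = None

    def offer(self, config_map: Dict[str, bytes], offered_hash: bytes) -> bool:
        """Return True if the offered config was applied, False if already running."""
        if offered_hash == self.last_applied_hash:
            return False
        # Hypothetical step: write config_map to disk and ask Fluent Bit to reload.
        self.last_applied_hash = offered_hash
        return True
```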

With the rise of AI, having a central point of contact makes it easier to wrap a central server as an MCP tool and have a natural language command interface. It also opens up the possibility of sending custom commands to the agent (or supervisor) to initiate a task. This was part of the functionality we effectively implemented in our original ChatOps showcase. The problem was that in the original solution, we deployed a small Slack bot that directed HTTP calls to Fluent Bit. With the OpAMP framework, we can direct the request to the server, which will route the command to the correct Fluent Bit node through a more trustworthy channel.

Implementation

The OpAMP protocol is likely to be widely adopted by commercial service providers (Bindplane and OneUpTime, for example), as it allows them to start working with additional agents/collectors that are already deployed (for example, a supervisor could be used to manage a Fluentd fleet where there isn’t the appetite to refactor all the configuration to Fluent Bit or an OTel Collector). Furthermore, it has the potential to simplify things by standardizing functionality.

In terms of resource richness in the OpenTelemetry GitHub repo, I suspect (and hope) there will be more to come (a UI and richer details on how a base server can be extended, for example). At the time of writing, within the OpenTelemetry GitHub repository, we can see:

  • The published spec
  • A Go implementation of the protocol: the server accepts and responds to messages, and the client includes some test functionality to populate messages.
  • A supervisor implementation, also written in Go, that can be configured to point at the agent it observes, with configuration that makes the supervisor uniquely identifiable to the server.
  • An OpenTelemetry Collector extension


MCP Security

30 Thursday Oct 2025

Posted by mp3monster in AI, development, General, Technology


Tags

AI, artificial-intelligence, attack, attacks, cybersecurity, MCP, model context protocol, Paper, Security, Technology, vectors

MCP (Model Context Protocol) has really taken off as a way to amplify the power of AI, providing tools for utilising data to supplement what a foundation model has already been trained on, and so on.

With the rapid uptake of a standard and technology that has been development- and implementation-led, governance and security can take time to catch up. While the use of credentials with tools, and how they propagate, is well covered, there are other attack vectors to consider. On the surface, some of these may seem superficial until you start looking more closely. A recent paper, Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions, highlights this well, and I thought I would explain some of the vectors (even if only for my own benefit).

I’ve also created a visual representation based on the paper of the vectors described.

The inner ring represents each threat, with its color denoting the likely origin of the threat. The outer ring groups threats into four categories, reflecting where in the lifecycle of an MCP solution the threat could originate.

I won’t go through all the vectors in detail, though I’ve summarized them below (the paper provides much more detail on each vector). But let’s take a look at one or two to highlight the unusual nature of some of the issues, where the threat in some respects is a hybrid of potential attack vectors we’ve seen elsewhere. It will be easy to view some of the vectors as fairly superficial until you start walking through the consequences of the attack, at which point things look a lot more insidious.

Several of the vectors can be characterised as forms of spoofing, such as namespace typosquatting, where a malicious tool is registered on a portal of MCP services, appearing to be a genuine service (for example, banking.com and bankin.com). Part of the problem here is that there are a number of MCP registries/markets, but the governance they have and use to mitigate abuse varies, and as this report points out, those with stronger governance tend to have fewer services registered. This isn’t a new problem; we have seen it before with other types of repositories (PyPI, npm, etc.). The difference here is that the attacker could not only install malicious logic but also implement identity theft, where a spoofed service mimics the real service’s need for credentials. As the UI is likely to be primarily textual, it is far easier to deceive users (compared to, say, a website, where the layout may look slightly off, or we can inspect the URIs of graphics for clues that something is wrong). A similar vector is Tool Name Conflict, where the tool metadata makes it difficult for the LLM to distinguish the correct tool from a spoofed one, leading the LLM to trust the spoof rather than the genuine tool.
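A registry or client could flag lookalike names before a tool is trusted. This is a heuristic sketch using Python’s `difflib.SequenceMatcher`; the 0.85 threshold and the `lookalike_names` function are assumptions, and string similarity alone will neither catch every typosquat nor avoid every false positive.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def lookalike_names(candidate: str, trusted: Iterable[str],
                    threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Trusted names the candidate closely resembles without matching exactly."""
    hits: List[Tuple[str, float]] = []
    for name in trusted:
        if name == candidate:
            continue  # an exact match is the genuine entry, not a squat
        ratio = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
        if ratio >= threshold:
            hits.append((name, round(ratio, 3)))
    # Most similar (most suspicious) first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```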

Another vector, which looks a little like search engine gaming (additional text is hidden in web pages to help websites improve their search rankings), is Preference Manipulation Attacks, where the tool description can include additional details to prompt the LLM to select one solution over another.

The last aspect of MCP attacks I want to touch on is that, because an MCP server can provide prompts or LLM workflows as well as tools, it can co-opt other utilities or tools to carry out malicious operations. For example, an MCP-provided prompt or tool could ask the LLM to use an approved FTP tool to transfer a file, such as a secure token, to a legitimate service such as Microsoft OneDrive, but to an account other than the approved one. While the MCP spec says that such external connectivity actions should be subject to an approval request, when a request comes from something we trust, it is very common for people to just click OK without looking too closely.
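The FTP-to-OneDrive example suggests that trust needs to attach to where data goes, not just to which tool moves it. Below is a minimal sketch of that idea, assuming a hypothetical `ToolCall` record and a hypothetical per-tool allowlist of destinations; anything not on the list is blocked pending an explicit human decision rather than waved through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of a tool invocation the LLM wants to make."""
    tool: str
    destination: str  # the account or host the data will be sent to
    payload: str      # what is being sent


# Hypothetical policy: the destinations each data-moving tool may send to.
APPROVED_DESTINATIONS = {
    "ftp_transfer": {"onedrive:team-reports@example.com"},
}


def review(call: ToolCall) -> str:
    """Auto-approve only known tool/destination pairs; block everything else for a human decision."""
    allowed = APPROVED_DESTINATIONS.get(call.tool, set())
    if call.destination in allowed:
        return "approve"
    return f"block: {call.tool} -> {call.destination} is not an approved destination"


print(review(ToolCall("ftp_transfer", "onedrive:someone-else@example.net", "api-token.txt")))
```

The approved tool on its own is not enough to pass the check; the destination account has to match as well, which is exactly the detail a hurried "OK" tends to miss.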

Even these few illustrations show that tool interaction with an LLM carries risks of deception, partly because we are asking the LLM to act on our behalf, yet we have not yet trained LLMs to reason about whether an action’s intent is in the user’s best interests. We also need to educate users on the risks and the telltale signs of malicious use.

Attack Vector Summary

The following list provides a brief summary of the attack vectors. The original paper examines each in greater depth, illustrating many of the vectors and describing possible mitigation strategies. While many technical measures can be taken, one of the most valuable things we can do is help potential users understand the risks, use that understanding to guide which MCP solutions they adopt, and watch for signs that things aren’t as they should be.

Continue reading →


AI to Agriculture

17 Friday Oct 2025

Posted by mp3monster in General, Oracle, Technology


Tags

AI, artificial-intelligence, Cloud, development, Oracle, Technology

Now that details of the product I’ve been involved with for the last 18 months or so are starting to reach the public domain (such as the recent announcement at the UN General Assembly on September 25), I can talk a bit about what we’ve been doing. Oracle’s Digital Government Global Industry Unit has been working on a solution to help governments address the question of food security.

So what is food security? The World Food Programme describes it as:

Food security exists when people have access to enough safe and nutritious food for normal growth and development, and an active and healthy life. By contrast, food insecurity refers to when the aforementioned conditions don’t exist. Chronic food insecurity is when a person is unable to consume enough food over an extended period to maintain a normal, active and healthy life. Acute food insecurity is any type that threatens people’s lives or livelihoods.

World Food Programme

Given the reference to the World Food Programme, it would be easy to interpret this as a developing-world problem, but in reality it applies to just about every nation. We can see this in the effect the war in Ukraine has had on crops like wheat, as reported by organizations such as CGIAR, the European Council, and the World Development journal. But global commodities aren’t the only reason for every nation to consider food security. Other factors include food miles (an issue that has perhaps received less attention over the last few years) and national farming economics (a subject covered by everything from Clarkson’s Farm, if you want it through a humour filter, to dry UK government reports and the US Department of Agriculture).

Looking at it from another perspective, some countries derive a notable share of their export revenue from certain crops. We know this from simple anecdotes like ‘for all the tea in China’, and from the way coffee varieties are often referred to by their country of origin (Kenyan, Colombian, etc.). For example, palm oil is the fourth-largest economic contributor in Malaysia (here).

So, how is Oracle helping countries?

One of the key means of managing food security is understanding food production and measuring the factors that can affect it (both positively and negatively). These range from the obvious, like weather (and its relationship to soil, water management, etc.), to what crop is being planted and when. All of this can then be overlaid with government policies for land management and farming subsidies (paying farmers to diversify crops, periodically allowing fields to go fallow, or subsidizing the cost of fertilizer).

Oracle is a technology company capable of delivering systems that operate at scale. Technology, and the recent progress in using AI to help solve problems, are not new to agriculture; in fact, several trailblazing organizations in this space, such as Agriscout, run on Oracle’s Cloud (OCI). Before anyone assumes this is another story of a large cloud provider eating its customers’ lunch, it isn’t; far from it. Many of these companies operate at the farm or farm-cooperative level, often collecting data through aerial imagery from drones and aircraft, along with ground-based sensors. Some also use satellite imagery of localized areas to complement these other sources. This is where Oracle starts to differentiate itself: by working with high-resolution satellite imagery (think about the resolution needed to differentiate wheat from maize, spot rice and carrots, or tell an orchard from a natural copse of trees). To get an idea, look at Google Earth and try to identify which crops are growing.

We take the multi-spectral satellite images from each satellite overflight and break them down, working out what the land is being used for (ruling out roads, tracks, buildings, and other land uses). To put the effort involved into context, the UK covers roughly 244,376,000,000 square meters (about 244,000 km²) and is only 78th in the list of countries by area (here). It’s this scale that makes it impractical to use more localized data sources (imagine the number of people and drones needed to fly over every field in a country, even at a monthly frequency).
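To give a flavour of how multi-spectral data can separate vegetation from roads and buildings, here is a sketch of the Normalised Difference Vegetation Index (NDVI), a long-established index computed as (NIR - Red) / (NIR + Red). I’m not suggesting this is how the Oracle product classifies land; the reflectance values and the 0.3 threshold are hypothetical, and real pipelines rely on trained models over many bands and many overflights.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI for one pixel; for non-negative reflectances the result lies in [-1, 1]."""
    total = nir + red
    if total == 0:
        return 0.0  # no reflectance in either band: treat as non-vegetated
    return (nir - red) / total


def vegetation_mask(nir_band, red_band, threshold=0.3):
    """Mark pixels whose NDVI meets the threshold (0.3 is an illustrative assumption)."""
    return [
        [ndvi(n, r) >= threshold for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2x2 reflectance grid: left column is a crop, right column a road or roof.
nir = [[0.50, 0.20], [0.45, 0.25]]
red = [[0.08, 0.18], [0.10, 0.22]]
print(vegetation_mask(nir, red))  # [[True, False], [True, False]]
```

Healthy vegetation reflects strongly in near-infrared and absorbs red light, which is why tarmac and roofs score close to zero while crops score well above it.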

This only solves the first step of the problem, which is to tell us the total crop-growing area. It doesn’t tell us whether the crop will actually grow well and produce a good yield. For that, you need to know about the weather (current, forecast, and historic trends), soil chemical composition and structure, and information such as elevation and the angle of slope, combined with an understanding of each crop’s optimal growing needs (water levels, sunlight duration, atmospheric moisture, soil types and health). Good crops can be ruined simply because it is too wet to harvest them or to store them dry. All these factors need to be taken into account for each ‘cell’ we detect, so we can calculate with some degree of confidence what can be produced.
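As a toy illustration of the per-cell calculation (not the product’s model), production could be estimated as area × baseline yield × a condition factor that folds together weather, soil and terrain suitability. The `Cell` fields, the baseline yields and the condition factors are all hypothetical; in practice, estimating that factor is the hard part.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One detected land cell (hypothetical fields for illustration)."""
    crop: str
    area_ha: float           # area under the crop, in hectares
    condition_factor: float  # 0..1: combined weather, soil and terrain suitability


# Hypothetical baseline yields (tonnes per hectare) under optimal conditions.
BASELINE_YIELD_T_PER_HA = {"wheat": 8.0, "maize": 10.0}


def estimate_production(cells):
    """Return expected tonnes per crop: sum of area x baseline yield x condition factor."""
    totals = {}
    for cell in cells:
        tonnes = cell.area_ha * BASELINE_YIELD_T_PER_HA[cell.crop] * cell.condition_factor
        totals[cell.crop] = totals.get(cell.crop, 0.0) + tonnes
    return totals


cells = [
    Cell("wheat", 10.0, 0.9),  # good conditions
    Cell("wheat", 5.0, 0.5),   # waterlogged field
    Cell("maize", 4.0, 0.75),
]
print(estimate_production(cells))
```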

If this isn’t hard enough, we also need to account for the fact that some crops have several growing seasons per year, or that succession planting is used, where carrots may be grown between March and June, followed by cucumbers through to August, and so on.
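Succession planting means a cell can’t simply be labelled with one crop; it needs a schedule. A minimal sketch, assuming a hypothetical in-memory plan keyed by cell id with whole-month windows, including windows that wrap over the year end (as with a crop sown in autumn and harvested the following summer):

```python
# Hypothetical planting plan: cell id -> list of (crop, first_month, last_month), months 1-12.
plan = {}


def add_planting(cell_id, crop, first_month, last_month):
    """Record that a crop occupies a cell from first_month to last_month inclusive."""
    plan.setdefault(cell_id, []).append((crop, first_month, last_month))


def crop_in_month(cell_id, month):
    """Return the crop occupying a cell in a given month, or None if unplanted."""
    for crop, first, last in plan.get(cell_id, []):
        if first <= last:
            in_window = first <= month <= last
        else:  # window wraps over the year end, e.g. October to July
            in_window = month >= first or month <= last
        if in_window:
            return crop
    return None


# Succession planting from the example above: carrots March-June, then cucumbers to August.
add_planting("cell-42", "carrots", 3, 6)
add_planting("cell-42", "cucumbers", 7, 8)
print(crop_in_month("cell-42", 4), crop_in_month("cell-42", 8), crop_in_month("cell-42", 11))
```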

Using technology

Hopefully, you can see that tens of millions of data points are being processed every day, and Oracle’s data products can handle that. As a cloud vendor, we can provide the computing scale and, importantly, the elasticity to crunch the numbers quickly enough that users benefit from the revised figures and can work out mitigation actions to communicate to farmers. As mentioned, this could mean planning where best to use fertilizer, or publishing advice on when to plant which crops for optimal growing conditions. In the worst cases, it means recognizing that there is going to be a national shortage of a staple crop, starting to purchase crops from elsewhere, and ensuring that when those crops arrive in port they are moved out to the markets. As with all large operations (as we saw during the Covid crisis), if you need to react quickly, more mistakes are made and costs rise sharply, driven by demand.

I mentioned AI. If you have more than a superficial awareness of AI, you will probably be wondering how we use it, and about the problem of AI hallucination. The last thing you want is a model that, when asked to evaluate something, hallucinates (injecting data or facts that are not based on the data you have collected) to create a projection. At worst, this could suggest that everything is going well when things are about to go badly wrong. So, first, most of the AI discussed today is generative, and that is where we see issues like hallucinations. We have adopted, and continue to adopt, this kind of AI where it fits best, such as explainability and informing visualization. But Oracle makes heavy use of the more traditional forms of AI, Machine Learning and Deep Learning, which are best suited to heavy numerical computation. That is not to say there aren’t challenges to be addressed in training these models.
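To illustrate why traditional ML behaves differently from a generative model, here is a nearest-centroid classifier, one of the simplest ML techniques. The centroids stand in for spectral signatures learned from labelled pixels and are invented for the example; this is not the product’s model. Whatever pixel you give it, it returns one of the classes it knows, and the same input always gives the same answer. It can still be wrong, but it can’t make up a new crop or a story around one.

```python
import math

# Hypothetical class centroids: mean [green, red, near-infrared] reflectance
# learned from labelled training pixels. The numbers are invented for illustration.
CENTROIDS = {
    "wheat": [0.08, 0.06, 0.40],
    "maize": [0.10, 0.05, 0.55],
    "bare soil": [0.15, 0.20, 0.25],
}


def classify(pixel):
    """Return the label whose centroid is nearest to the pixel (Euclidean distance)."""
    return min(CENTROIDS, key=lambda label: math.dist(pixel, CENTROIDS[label]))


print(classify([0.09, 0.06, 0.42]))  # wheat
```

`math.dist` needs Python 3.8 or later.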

Conclusion

When it comes to expertise in the specialized domains of agriculture and government, Oracle has a strong record of working with governments and government agencies since its inception. We’ve also worked closely with the Tony Blair Institute for Global Change, which works with many national government agencies, including in the agriculture sector.

My role in this has been as an architect, focused primarily on applying integration techniques (enabling scaling and operational resilience, data ingestion, and ensuring our architecture keeps working as we bring in more and more data sources) and on applying AI (in the generative domain). We’re fortunate to be working alongside two other architects who cover other aspects of the product, such as infrastructure needs and the presentation tier. In addition, there is a specialist data science team with more PhDs and related awards than I can count.

Oracle’s Digital Government business is more than just this agriculture use case; we’ve identified other use cases that can benefit from the data, and the data volumes, being handled here. This is in addition to offering versions of Oracle’s better-known products, such as ERP, Healthcare (digital health records management, vaccine programmes, etc.), and national Energy and Water (metering, infrastructure management, etc.).

For more on the agricultural product:

  • Government Data Intelligence for Agriculture
  • Agriculture on the Digital Government page
  • TBI on Food Security
  • iGrow News
  • Oracle Launches AI Platform to Strengthen Government Led Agricultural Resilience – AgroTechSpace


Challenges of UX for AI

21 Sunday Sep 2025

Posted by mp3monster in General, Technology


Tags

AI, UI, UX

AI, specifically Generative AI, has been dominating IT for a couple of years now. If you’re a software vendor with services that interact with users, you’ll probably have been considering how to ensure you’re not left behind, or perhaps even how to use AI to differentiate yourself. The answer can be light-touch AI that makes the existing application a little easier to use (smarter help documentation, auto-formatting, and spell-checking for large text fields). At the other end of the spectrum is the question of how to make AI central to the application, which can be pretty radical. Both ends of the spectrum carry risks. Light-touch use can be seen as ‘AI whitewashing’: adding something cosmetic so you can claim AI enablement in the product marketing. At the other end, discarding chunks of traditional menu- and form-based UI, which let users access or create content in a couple of quick clicks or keystrokes, can increase the product’s cost (AI consumes more compute cycles, incurring a cost along the way) for, at best, a limited gain.

While AI whitewashing is harmful and can damage a brand’s image, at least users can ignore the features. The latter, however, requires significant investment and can easily lead to the perception that the product isn’t as capable as it could or should be.

At the heart of this are a couple of basic considerations that UX design has identified for a long time:

  • For a user to get the most out of a solution, they need a mental model of the capabilities your product can provide and the data it has. These mental models come from visual hints – those hints come from menus, right-click operations, and other visual clues. UI specialists don’t do eye tracking studies just for the research grant money.
  • UI best practices provide simple guidance stating that there should be at least three ways to use an application, supporting novice users, the average user, and the power user. We can see this in straightforward things, such as multiple locations for everyday tasks (right-click menus, main menu, ribbon with buttons), not to mention keyboard shortcuts. Think I’m over-stating things? I see very knowledgeable, technically adept users still type and then navigate to the menu ribbon to embolden text (rather than simply use the near-universal Ctrl+B). Next time you’re on a Zoom/Teams call, working with someone on a document, just watch how people are using the tools. On the other end of the spectrum, some tools allow us to configure accelerator key combinations to specific tasks, so power users can complete actions very quickly.
  • Users are impatient. The technology industry has prided itself on making things quicker, faster, and more responsive (we see this everywhere from Moore’s law for computer chips to mobile networks: EDGE, 3G … 5G, with 6G in development). So if things fall outside those norms, the chance of the user abandoning an action (or worse, retrying it and multiplying the workload) rises sharply. AI is computationally expensive, so by its nature it is slower.
  • Switching input devices, such as moving between keyboard and mouse, incurs a time cost. Over the years, numerous studies have examined this and identified ways to reduce or eliminate that cost, so we should minimize such switching. Simple measures, such as being able to tab through UI widgets, help achieve this.
  • User tolerance to latency has been an ongoing challenge – we’re impatient creatures. There are well-researched guidelines on this topic, and if you take a moment to examine some of the techniques available in UI, particularly web UIs, you will see that they reflect this. For example, prefetching content, particularly images, rendering content as it is received, and infinite scrolling.
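To make the latency points concrete, here is a sketch of one pattern: render an AI response chunk by chunk as it streams in, but if the first chunk doesn’t arrive within a time budget, fall back to something the user can act on straight away (such as pointing them at the traditional UI path). `fake_ai_stream` is a hypothetical stand-in for a streaming model call, and the budgets are arbitrary.

```python
import asyncio


async def fake_ai_stream():
    """Hypothetical stand-in for a model call that streams its answer in chunks."""
    for chunk in ["Here ", "is ", "your ", "summary."]:
        await asyncio.sleep(0.05)  # simulated generation time per chunk
        yield chunk


async def answer_with_budget(stream, first_chunk_budget_s, fallback):
    """Collect streamed chunks, falling back if the first one misses its time budget."""
    chunks = stream.__aiter__()
    try:
        # Only the wait for the first chunk is time-boxed; once output starts
        # flowing, the user can see progress.
        first = await asyncio.wait_for(chunks.__anext__(), timeout=first_chunk_budget_s)
    except (asyncio.TimeoutError, StopAsyncIteration):
        return [fallback]
    rendered = [first]
    async for chunk in chunks:
        rendered.append(chunk)  # a real UI would paint each chunk as it arrives
    return rendered


if __name__ == "__main__":
    print("".join(asyncio.run(answer_with_budget(fake_ai_stream(), 1.0, "fallback"))))
    print(asyncio.run(answer_with_budget(fake_ai_stream(), 0.01, "Use the Reports menu instead.")))
```

The budget governs time to first chunk rather than total time, because showing progress early is what stops users abandoning or retrying an action.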

All of this could be interpreted as anti-AI, or even as coming from someone wanting to protect jobs by advocating that we continue the old way. Far from it. AI can really help, and I have long advocated that AI could significantly simplify tasks such as report generation in products that rely heavily on structured data capture. Why? Because structured, form-based capture helps users build a mental model of the data held, its relationships, and the system’s terminology, which in turn helps them formulate queries more effectively.

The point is that we should empower users to use different modes to achieve their goals. In the early days of web search, search engines supported navigating a catalogue of websites. Only once search truly became a social norm, and the mental models of using search engines had established themselves, did those catalogue-style means of searching disappear from Yahoo and Google. But even now, if you look, those older models of searching and navigating still exist. Look at Amazon, particularly for books, which still offers navigation to find books by classification. This isn’t because Amazon’s site is aging; anything but. It is a recognition that to maximize sales, you need to support as many ways of achieving a goal as are practical.

Navigation categories for historical books, demonstrating various time periods and regions – Amazon.

If there is a call to arms here, it is this: we should complement traditional UX with AI, not try to replace it. When we look at AI-driven interaction, we should use it to enable users to solve problems faster, and to solve problems that can’t easily be expressed with existing interactions and paradigms. For example, replacing traditional reporting tools that require an understanding of relational databases, or reducing or removing the need to understand how data is distributed across systems.

Some of the better uses of AI as part of UX are subtle, for example Grammarly, or Google’s introduction of AI to search, which looks a lot like an oversized search result. But we can, and should, consider AI not just as a different way to drive change into traditional UX, but as a way to open up other interaction models and enable them to be used in new ways. For example, rather than watching or reading how to do something, we can use AI to translate instructions into audio and talk us through a task as we complete it, such as a mechanical engineering task that requires both hands to work the necessary tools. But it also means using different interaction models to help overcome disabilities.

Don’t take my word for it; here are some useful resources:

  • Nielsen Norman Group – article about the adverse impact AI can have
  • AI is reshaping UI
  • Designing with AI
  • AI for disabilities – UN Report
  • AI won’t kill UX – we will
  • NeuroNav blog


    I work for Oracle, all opinions here are my own & do not necessarily reflect the views of Oracle


    Oracle Ace Director Alumni

    TOGAF 9

    Logs and Telemetry using Fluent Bit


    Logging in Action — Fluentd

    Logging in Action with Fluentd


    Oracle Cloud Integration Book


    API Platform Book


    Oracle Dev Meetup London



    History

    Speaker Recognition

    Open Source Summit Speaker


    Social

    • View @mp3monster’s profile on Twitter
    • View philwilkins’s profile on LinkedIn
    • View mp3monster’s profile on GitHub
    • View mp3monster’s profile on Flickr
    • View mp3muncher’s profile on WordPress.org
    • View philmp3monster’s profile on Twitch