AI-supported development bringing us back to requirements-led development?

27 Monday Apr 2026

Posted by mp3monster in AI, General, Technology

Tags

AI, artificial-intelligence, development, LLM, programming, SDD, spec driven development, Technology, TOGAF

Let me start by clarifying some terminology.

an informal noun referring to the mood, atmosphere, or aura produced by a particular person, thing, or place that is sensed or felt

This is deeply at odds with the idea of software engineering, where the OED describes engineering as:

the activity of applying scientific and mathematical knowledge to the design, building, and control of structures, machines, systems, and processes

While there is a place for vibing – to explore and help test ideas, when it comes to enterprise solutions with icy, typically have large footprints, or will grow to have large footprints and high data volumes, therefore need a more disciplined approach to ensure all those non-functional considerations can be addressed, and sustained. Put it another way, would you take an artesian approach to building and maintaining a petrochemical refinery?

This is why I try to separate the idea of vibe coding from a more disciplined AI-assisted development. A name that doesn’t roll off the tongue well, but conveys the idea that the engineer is in control and can impose discipline to drive the NFRs.

Hopefully, this also helps address nuance, which is often missing in discussions about the use of AI in software engineering, which is definitely polarising viewpoints (like many things today).

Spec-driven development

Spec Driven Development (SDD) is a growing topic in the A.I. assisted development space, and growing as a reflection of the fact that LLMs are improving rapidly, best illustrated at the moment with Mythos. The basis of SDD is to help drive consistency, structure, sustainability and rigour into the AI dev process (back to vibe coding). Consistency and structure allow us to start to easily agentify or tool aspects of development.

Getting a consistent, clear explanation of what constitutes SDD isn’t necessarily straightforward, but the best definition is in an article by Birgitta Böckeler on Martin Fowler’s website. The article dives into not just a basic explanation, but also characterises the differing approaches. The article teased out three versions of the idea, which paraphrasing are:

Spec First – very much like the old-fashioned, here are the requirements that are used to generate a first iteration of the code base. Then subsequent refinements, improvements and general evolution are introduced through successive direct code changes, and/or direct prompting of the LLM to modify different pieces, and add functionality.
Spec Anchored – the Spec is retained for ongoing reference and maintained.
Spec as Source – we don’t really care bout the code, we want a change, we only edit the spec. Code is almost a form of conversation memory, which prevents the LLM from recreating from scratch and producing an answer that looks a bit different, potentially resulting in API names that differ, etc.

This evolution, particularly as people move or are pushed by leadership fearing losing a competitive edge through perceived lower development velocity, increasingly towards a spec-only approach, left me thinking about the agile manifesto and its declaration:

‘we value working code over documentation‘.

While this still has to be true, as ultimately, working code delivers the value. But the heading for the documentation has to be clear, concise, and sized for LLMs’ working documentation, as that is how we get to working code. This isn’t just to bash out some instructions and unleash the LLM; it does need to be refined and iterated on (in many ways, just like a book). We should prompt the LLM to seek clarification rather than let it make assumptions. Furthermore, we need the documentation to be accurate because an LLM will exhibit childlike trust, and if it is working with misaligned content, you’re in a 50/50 position. Unleashing an LLM on your codebase may lead to the wrong outcome. Perhaps, we need to extend the Agile manifesto, with a statement like:

we value correct, accurate, clear and concise documentation over any documentation

In other words, when using an LLM in your development context, it is better to get the LLM to reverse engineer the code to create documentation of your current state (even if that is at the price of losing the original context, design ideals, requirements, etc.) than to allow the LLM to see inaccurate and poor documentation. If this new principle is true, then we need to move away from Spec first to atleast Spec anchored approach.

Given this, we should see the heart of an engineering process looking something like:

This is what we should expect with a Spec Anchored or Spec as Source. Whereas with Spec First, the return flow will never happen.

With Spec First, our process is more like this: once the code for the first iteration is generated, we just iterate on it.

I think one of the challenges with the view of everything is that, as the Spec lead, there is an expectation that, to do it, we go from a very high-level definition straight to code. The reality is that we need the process to be more human-like. We use the LLM to take requirements and drive a high-level design. We then use the LLM to break the HLD into multiple LLDs. Importantly, we iterate on the process, until the decomposition of detail is right. The LLM cycle focuses on just one output at a time. We can certainly then use the LLM to determine consistency and integrity across all the LLDs.

From Requirements to Architectural Views

There is a natural extension to this. If we are to swing back to a document-led approach (albeit with a very different journey from document to working code), could we see increased adoption of TOGAF and other architectural frameworks? Many in the past have used such frameworks as part of the argument as to why things should be code first, as often the framework artefacts are seen as the end, rather than the process and techniques as a means to an end (i.e. we do architecture, therefore I must create a large document set, rather than we do architecture to ensure we get the details we need from code correct).

Certainly, using an LLM to help with the creation and maintenance of architectural views, including making it easier to search for and address inconsistencies across different viewpoints, without necessarily needing very prescriptive, complex, and expensive toolsets.

The document flow if we start with architectural frameworks, from Zachman, TOGAF, C4 etc. Note the return flow needed for Spec Anchored or Spec as Source is rarely happens.

A step in this direction may well be projects such as Common Architecture Language Model (CALM), which is supported by the Fintech Open Source Foundation (FINOS), a child organisation of the Linux Foundation. While I haven’t investigated CALM very deeply, the essence is to define the architectural building blocks in a structured manner, which means that, from the definitions, more detailed diagrams can be generated and AI can be used to analyse the artefacts, etc. This sounds like a potential stepping stone between the organisation/enterprise models of Zachman and TOGAF, which aim to describe how both businesses operate and the underlying technology.

Could we see a time when docs and code stay aligned?

My experience has shown that when a spec has been involved in the process, it has exhibited the characteristics of the Spec First approach, and that the most consistently accurate documents are the user manuals, purely because they have to be created from what the code does. But such documents aren’t meant to tell you about the inner workings of a solution. This is true to the point that organisations have abandoned their architectural models, as they can’t be trusted as an as-is reflection and must start from scratch.

But to achieve the value of Spec Anchored or Spec as Source, we have to ensure that the feedback loop is working: the LLM feeds a backup stream with any changes, and downstream inputs, such as the impact of tool selection, can shift the solution. While the feedback loop should be a lot easier, it still requires commitment and effort to ensure that flow happens (certainly, since it is typically not a regularly practised behaviour).

Flies in the ointment

Trying to drive even a Spec Anchored philosophy is going to be difficult if the LLMs aren’t so great at generating quality code, or quality low-level designs that lead to the code generation. These factors are going to be dependent on choice of LLM being used, how the LLM is prompted, and most crucially the target programming languages (A.I. Codex does well with Python and Java, but I doubt it would make a good job of something like Erlang or Lisp).

The second problem is that there is a common error of people wanting to jump in and cut code (or documents), which often comes from:

Rather than stopping to ask the question, has this problem been solved before, and in a way I can leverage? We plough on creating new unproven code.
The view that the only place where a solution can come from is within the engineering team.

While it will be easy to blame the LLM for problems coming from these actions, are very much human.

Conclusion

As we’ve worked through much of this picture, the irony is that, in many respects, we’re no further forward. We can still make the same mistakes (failing to work through the NFRs properly, failing to define what should happen when something is wrong – aka ‘unhappy paths’, which make recovery simpler). We just have coding and document writing speed shift from 30-40Hz (the speed of a keyboard warrior) to GHz. The same problems can occur because influential decisions are still human (and remember, LLMs are, at their heart, just a computational representation of common thinking (wisdom of crowds, you might say) and therefore still vulnerable).

Going faster means mistakes happen more quickly, and uncorrected mistakes create more mess. To use an analogy, if you crash a car into a wall at 10mph, you’ll damage the bodywork, but it won’t be catastrophic. For many men, the biggest damage will be to the ego. You have the same crash at 100mph, and the outcome will be fatal. While the ability (or lack of) to absorb the energy is what will be the killer, it is actually the fact that you no longer have the time to think and change direction that is the true cause.

Perhaps what we should be seeking from AI is not to get to the end faster, but to use the acceleration to create time to consider what it is we want to achieve and how we continue building on our long-term, more sustainable achievements. This isn’t anti-agile. But it is anti ‘fail fast, fail frequently’ which has been a conflation of ideas without full understanding, and becoming more regularly challenged (like this Forbes Article)

References

OpAMP server with MCP – aka conversational Fluent Bit control

14 Tuesday Apr 2026

Posted by mp3monster in AI, chatbots, development, Fluent Observability, General, OpAMP, Technology

≈ Leave a comment

Tags

AI, chatops, Fluent Bit, Fluentd, LangGraph, LLM, MCP, OpAMP, OpenTelemetry, OTel, OTLP

I’ve written a few times about how OpAMP (Open Agent Management Protocol) may emerge from the OpenTelemetry CNCF project, but like OTLP (OpenTelemetry Protocol), it applies to just about any observability agent, not just the OTel Collector. As a side project, giving a real-world use case work on my Python skills, as well as an excuse to work with FastMCP (and LangGraph shortly). But also to bring the evolved idea of ChatOps (see here and here).

One of the goals of ChatOps was to free us from having to actively log into specific tools to mine for information once metrics, traces, and logs reach the aggregating back ends, but being able to. If we leverage a decent LLM with Model Context Protocol tools through an app such as Claude Desktop or ChatGPT (or their mobile variants). Ideally, we have a means to free ourselves to use social collaboration tools, rather than being tied to a specific LLM toolkit.

With a UI and the ability to communicate with Fluentd and Fluent Bit without imposing changes on the agent code base (we use a supervisor model), issue commands, track what is going on, and have the option of authentication. (more improvements in this space to come).

New ChatOps – Phase 1

With the first level of the new ChatOps dynamism being through LLM desktop tooling and MCP, the following are screenshots showing how we’ve exposed part of our OpAMP server via APIs. As you can see in the screenshot within our OpAMP server, we have the concept of commands. What we have done is take some of the commands described in the OpAMP spec, call them standard commands, and then define a construct for Custom Commands (which can be dynamically added to the server and client).

interaction through Claude Desktop which has been configured with our MCP server (part of our OpAMP server) showing us what can be done

The following screenshot illustrates using plain text rather than trying to come up with structured English to get the OpAMP server to shut down a Fluentd node (in this case, as we only had 1 Fluentd node, it worked out which node to stop).

Claude Desktop showing conversation to shutdown a FLuentd node

Interesting considerations

What will be interesting to see is the LLM token consumption changes as the portfolio of managed agents changes, given that, to achieve the shutdown, the LLM will have had to obtain all the Fluent Bit & Fluentd instances being managed. If we provide an endpoint to find an agent instance, would the LLM reason to use that rather than trawl all the information?

Next phase

ChatGPT, Claude Desktop, and others already incorporate some level of collaboration capabilities if the users involved are on a suitable premium account (Team/Enterprise). It would be good to enable greater freedom and potentially lower costs by enabling the capability to operate through collaboration platforms such as Teams and Slack. This means the next steps need to look something along the lines of:

Anthropic Mythos – an LLM with potent security sting

13 Monday Apr 2026

Posted by mp3monster in AI, General, Technology

≈ 1 Comment

Tags

AI, Anthropic, Mythos, Security

There has been a rapidly growing series of articles being written about the limited launch of Mythos, a new LLM. The evolution of models has helped quickly advance AI-assisted software development. But the capabilities of Mythos and Project Glasswing that really grabbed attention and concern.

Glasswing is an initiative that allows major partner software and service vendors to access the Mythos model. This is because Mythos has made significant advances in identifying software vulnerabilities and generating exploits for them. This has been illustrated by Anthropic’s Red team – which found bugs in OpenBSD (OS) that have evaded detection for as much as 27 years. While the BSD family of operating systems isn’t as pervasive as Linux, they both share a similar open ethos and a sufficient community to keep them active and maintained. The underlying message here is that we can find and exploit such vulnerabilities, and there are certainly opportunities to do so elsewhere, in software that can affect a great many more users, such as Firefox.

Having key software vendors, such as OS and browser vendors, get access is certainly a positive step, but it doesn’t address a key consideration. Applying code fixes and releasing updates does not, by itself, equate to being safer. The true challenge is for end users and organisations to recognise the need to roll out updates quickly. This is where the source of true concern should be. The concerns …

Organisations don’t always release patches as soon as they’re available. There is an element of testing to ensure no adverse impact on each organisation’s setup. Even with simple browser changes, something affects the app’s behaviour.
Change represents risk, and organisations that experience issues during rollouts become increasingly risk-averse. Ironically, this is counterintuitive, but a very human reaction.
Vendors’ patching tends to prioritise the latest versions of products, which can create dependency challenges. Bringing software up to date can result in a growing infrastructure footprint (more storage, memory and CPU needed – vendors add capabilities and features to compete and meet customer feature needs, driving continuous growth). That can really add costs, particularly in highly distributed use cases, such as user desktops and IoT devices. Addressing the accumulation of patches means devices no longer have the capability to properly service the new footprint. Consider this: why do people replace smartphones? Sometimes it’s hardware features like a better camera, but often it’s simply not enough storage or not being able to run all the apps, photos, etc.
.Digging into some of the details from the Red Team shows that the LLM usage costs to uncover the vulnerabilities run from $50 – $20,000. This could have ramifications for smaller, more specialised software solutions where the cost of regularly rerunning the analysis outstrips potential revenue. As a result, we could suddenly see software product prices climb, or companies simply stop producing products we depend on. This may also see bad actors wanting to more quickly recoup the cost by accelerating the use of new exploits, in other words, more attacks, coming more quickly. Such considerations will create more pressure on the speed of patch cycles.
This level of capability suggests that we really do need to ensure people shift from boundary-style security to security at every layer of our solutions. That’s not just simply authentication, but code being defensive, validating data values it gets given and os on.

All of this means we have to change mindsets from just enough, or simply putting a front-line security layer in place, to embedding. As end users, we must start to adopt several behaviours:

Security conscious with our own devices – keeping software up to date and patched. I would consider my family to be above average when it comes to tech savvy, but even I am having to go in and run Windows updates on laptops, for example.
Start voting with our feet – many of the services we use are largely or entirely software-powered (banks, energy providers), if those providers show signs of not taking security seriously enough, time to go elsewhere before we become victims.

Keeping up

One observation that the Mythos and Project Glasswing reporting is that the advancements are significant step changes, not incremental advancements (for example, Antghropic’s Sonnet 4.6 was only released a couple of months ago, and didn’t score highly for creating exploits – although better at detection). This suggests a couple of things …

IT law has always played a game of catch-up, but if the advancements are going to be this large and this frequent, we have to start legislating against hypotheticals and allowing legal precedents to produce fine detail interpretations.
We may have to consider big-brother observation of AI use, mitigated by strong transparency rules governing the handling of findings.
Is the idea that we need to start looking at incorporating something like Asimov’s 3 Laws of Robotics into LLMs now looking far-fetched?
Do we need to start thinking about mitigating the risk of deep exploits by bringing back the possibility that systems must be air-gapped?

Hyperbole?

It would be easy to put this down to hyperbole, or wanting to be a click-baity, but this is gaining a lot of high-profile attention, just consider these examples:

Open Source development – growing AI challenges

10 Friday Apr 2026

Posted by mp3monster in AI, General, Technology

≈ Leave a comment

Tags

AI, artificial-intelligence, development, open-source, Technology

The software industry’s current upheavals due to AI are showing signs of unexpected and unintended victims, one of which is open-source software. Open-source foundations run very deep, from Linux to web and app servers, and even to key cryptography technologies.

While there are commercially funded open source efforts, such as chunks of Kubernetes, depending upon which reports you look at, 10-30% of the effort comes from individuals providing their own personal time for free. But we’re seeing a number of threats growing on this…

The number of maintainers is small on some projects. A really good example of this is the Nginx Ingress controller for Kubernetes, which is now no longer being maintained, not because it isn’t needed, but because no one was willing to step up to the plate with their own time or provide salaried engineers. This has triggered something of an outcry (see here, for example).
As this article Microsoft execs warn agentic AI is hollowing out the junior developer pipeline shows, AI-assisted development risks harming the flow of development skills. The issue is that if all junior engineers primarily rely on AI to code and test functionality, the hard-earned experience that teaches you what is good, bad, and where the pitfalls are, they will not gain. Meaning, the skills needed to understand and maintain very large codebases won’t be as strong.
GitHub has argued (here) that AI in development has made it easier for people to get involved and contribute to open-source initiatives, and I’d agree it makers it easier. The challenge, I think, is less an issue of ease than of mindset. I would argue that it is the motivation to contribute and the satisfaction of having contributed that drives open-source contributions, but this is at risk of being undermined (papers such as On Developers’ Personality in Large-scale Distributed Projects indicate open-source contributors tend towards a personality profile, which maybe more suspectable to issues that can lead them to disengage (Connection Between Burnout and Personality Types in Software Developers). While not concrete, overload OSS contributors, and they’re more likely to disengage from contributing.
Adding to the weaker pipeline of skills, it is shown that AI often doesn’t deliver on expectations. Several articles have cited this paper for example Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, we going to see even more pressure on those who are maintaining software that everyone depends upon.
AI slop, as a result of using a poor coding model, or poor prompting, is showing us that, unwittingly (or through deliberate maliciousness we are seeing pull requests that are buggy, or junk being created at ever faster rates. This puts more work on the core developers to just manage PRs, as described in AI is burning out the people who keep open source alive (another such article at CNCF – Sustaining open source in the age of generative AI). Not to mention even worse actions, such as that described in An AI Agent Published a Hit Piece on Me. This sort of thing will affect people’s willingness to be involved, even when their time is being paid for by a company. This concern is such that InfoWorld reported GitHub are considering the ability to restrict PR velocity (see here).
Another side effect, with the ‘AI arms war’, increases pressure within organisations to adapt, or accelerate as a result of AI expectations. Those donating personal time are less likely to find time to support open-source initiatives, as their focus will be very much on staying secure in their day jobs.

There is no single or simple solution. But that doesn’t mean there aren’t things we can do to help. Some immediate possibilities include:

better messaging about what makes up and propels open-source initiatives beyond commercial contributions. This can help counter the perception that organisations like the CNCF appear to be leaning into large commercial organisations and following open-source business models. But that isn’t the case, and even in the commercial setup, the teams aren’t necessarily that large.
I’m not an advocate of the dual licensing model, as it can create uncertainty in user communities and potential adopters of technologies. This uncertainty can drive disruptive changes; we’ve seen this with OpenSearch and ElasticSearch, OpenELA fork of Linux, among others. It can also hamper early-stage startups. But we can do something: a low-cost entry into CNCF that can help finance the not-for-profit development setups. Use the PR process to help collect metrics and recognise organisations that contribute even a little through PRs, biasing that recognition toward projects with limited support. Not to mention recognising contributors and committers individually (just as CNCF and Linux Foundation provide recognition to conference speakers).
Companies employing early years engineers should implement initiatives that require some development work to be performed without AI assistance and use performance tooling. Yes, this means a short-term drop in productivity, but one thing my years in the industry and training have taught me is that understanding how things work under the hood makes it easier to address problems and recognise ‘bad smells’. Understanding this, helps understand how solutions can scale.
Perhaps University courses could consider awarding credits to students who support important open-source projects, or allow a level of contribution to count toward coursework. This sort of thing would also open up the open-source world a lot more. I, for one, would give credit to a graduate who has contributed to a reputable open-source initiative.

There is on thing I am certain of, though, it is the leadership and sponsors of organisations such as CNCF, Linux Foundation, Apache, Open Source Initiative that can influence the situation the most, and it is in everyone’s interest that when open-source components have to be folded that there is atleast an easier off-ramp, than the 6 months given to switch from using something like NGINX Ingress Controller.

OpAMP Implementation to support Fluent Bit and Fluentd

02 Thursday Apr 2026

Posted by mp3monster in Fluent Observability, General, Technology

≈ Leave a comment

Tags

API, Cloud, development, observability, OpAMP, OTel, Technology

Time to share a short update on our OpAMP project to support Fluent Bit and Fluentd in a supervisor model. We’ve just put a V0.3 label on the GitHub repo (https://github.com/mp3monster/fluent-opamp). The trigger for this has been the refactoring so that the framework on the client side is as reusable as possible for both Fluentd and Fluent Bit (the benefit of implementing Opamp using a supervisor model)

As OpAMP defines what happens between the Client and Server rather than how the client, server and agent must behave as well as the protocol we’ve introduced some features not mandated by the standard but can be delivered using the OpAMP framework. Such as shutting the agent down completely.

The following sections summarize what has recently been incorporated.

What’s New in Our OpAMP Supervisor Stack

Over the last set of releases, we focused on three areas that matter in day-to-day operations:

Better multi-agent support through a cleaner client architecture for both Fluent Bit and Fluentd.
Optional bearer-token authentication that can be enabled in production and disabled for fast local development and tests.
Clear, predictable rules for when a client sends a full state refresh back to the server.

This post is a walkthrough of what changed and why.

Client Architecture Refactor: Fluent Bit and Fluentd as First-Class Implementations

We restructured the consumer so Fluent Bit and Fluentd are now explicit concrete implementations built on a shared abstract client.

Why this matters

Before this work, behavior could drift toward Fluent Bit defaults in places where Fluentd needed different handling. The new structure makes those differences deliberate and visible.

What we changed

Shared logic is centralised in a typical abstract client and reusable mixins.
Fluent Bit remains the default implementation with shared runtime behaviour.
Fluentd overriding functionality for Fluentd-specific behaviour, including:
- monitor agent config parsing from fluentd.conf/YAML
- monitor agent endpoint usage for version and health
- Fluentd-specific health parsing and service type handling

Operational improvements

We’ve created scripts to make it easy to get things started quickly. The startup scripts were standardized:

scripts/run_fluentbit_supervisor.sh|cmd
scripts/run_fluentd_supervisor.sh|cmd
scripts/run_all_supervisors.sh|cmd

Optional Bearer-Token Authentication (With a Fast Disable Switch)

We added optional bearer-token auth for the UI and MCP end points in the server. The OPAMP spec points to different authentication strategies that need to be addressed. For bearer-token-managed endpoints (where you can direct the server to do things that are potentially much more harmful), the design goal is to keep development and unit testing simple, so we have some controllable modes..

Modes

Authentication is controlled by environment variables:

disabled (default): no auth checks.
static: bearer token checked against a configured shared token.
jwt: JWT bearer validation via JWKS (for example with Keycloak).

Why this model works

Production can enforce auth with static or JWT validation.
Local development and endpoint unit tests can run with auth disabled, avoiding unnecessary test harness complexity.
The same app can move between dev/staging/prod by environment configuration, without code changes.

Protected endpoints

Protection is prefix-based (for example /tool, /sse, /messages, /mcp) and configurable.

This means teams can gradually expand the scope of protection over time by updating path prefixes, rather than doing an all-or-nothing rollout.

Auth observability

Authorisation rejections are logged with mode, method, path, source, and reason, making failed requests easier to troubleshoot.

Full-State Refresh Rules: More Predictable and Easier to Reason About

A major part of OpAMP behavior is deciding when to send compact updates versus a fuller state snapshot. We now fully observe the approach defined by OpAMP, but also have explicit controller-driven rules to provide robustness to the solution.

Core mechanism

The client tracks reporting flags for optional outbound sections, such as:

agent_description
capabilities
custom_capabilities
health

If a flag is set, that section is included on send. After inclusion, the flag resets. Controllers determine when those flags are re-enabled for a future full refresh.

Controller strategies

We support three controller types:

AlwaysSendre-enable all report flags after each successful send.
SentCount: re-enable all report flags after N successful sends (fullResendAfter).
TimeSend: re-enable all report flags after a configured elapsed interval.

Important behaviour detail

Controller updates happen after a successful send. This means a controller schedules what the next message should include; it does not mutate the already-transmitted message.

Server-driven override

If the server sets ReportFullState in ServerToAgent.flags, the client immediately re-enables all reporting flags so the next outbound message contains full reportable state.

This gives operators a direct way to request state re-synchronization when needed.

Security + Developer Experience Balance

A recurring theme in this work was avoiding “security vs usability” tradeoffs:

Auth can be strict in production.
Auth can be disabled in local/test workflows.
Endpoint protection scope is configurable and incremental.
Rejection logging is explicit for troubleshooting.

That same principle guided client behavior:

Shared behavior is centralized.
Agent-specific behavior is explicit where required.
Full refresh rules are deterministic and configurable.

What This Enables Next

This foundation makes the next iterations easier:

extending JWT/IdP deployment patterns (for example, broader Keycloak automation)
adding more agent variants with fewer regressions
improving configuration and rollout safety for mixed Fluent Bit + Fluentd estates

If you’re running both Fluent Bit and Fluentd, this release should make the platform easier to operate, easier to secure, and easier to reason about under change.