<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Agentic AI on Alfero Chingono</title><link>https://www.chingono.com/categories/agentic-ai/</link><description>Recent content in Agentic AI on Alfero Chingono</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 03 Apr 2026 20:02:39 -0400</lastBuildDate><atom:link href="https://www.chingono.com/categories/agentic-ai/index.xml" rel="self" type="application/rss+xml"/><item><title>How I Run SonarQube in My Own CI Pipeline (And Let AI Fix What It Finds)</title><link>https://www.chingono.com/blog/2026/03/05/how-i-run-sonarqube-in-my-own-ci-pipeline-and-let-ai-fix-what-it-finds/</link><pubDate>Thu, 05 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chingono.com/blog/2026/03/05/how-i-run-sonarqube-in-my-own-ci-pipeline-and-let-ai-fix-what-it-finds/</guid><description>&lt;p&gt;I wrote in 2024 about &lt;a class="link" href="https://www.chingono.com/blog/2024/09/05/automating-owasp-scan-reports-in-azure-devops/" &gt;automating OWASP scan reports in Azure DevOps&lt;/a&gt; because I wanted security scanning to become part of the delivery flow instead of an afterthought.&lt;/p&gt;
&lt;p&gt;This post is the next step in that same direction.&lt;/p&gt;
&lt;p&gt;The thing I wanted from SonarQube was not another dashboard full of guilt. I wanted a loop that could actually create work, route it, fix it, and come back cleaner on the next scan.&lt;/p&gt;
&lt;p&gt;That changed the design completely.&lt;/p&gt;
&lt;h2 id="the-real-goal-was-not-run-sonarqube"&gt;The real goal was not &amp;ldquo;run SonarQube&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;Running SonarQube is easy.&lt;/p&gt;
&lt;p&gt;Turning findings into a useful engineering loop is the hard part.&lt;/p&gt;
&lt;p&gt;The pattern I have found most practical looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;run the scan on a schedule&lt;/li&gt;
&lt;li&gt;translate findings into issues with enough structure to act on&lt;/li&gt;
&lt;li&gt;let AI or agents handle the obvious remediation work&lt;/li&gt;
&lt;li&gt;keep human review as the merge gate&lt;/li&gt;
&lt;li&gt;rescan and repeat&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That is what I have been doing across FireFly and CueMarshal.&lt;/p&gt;
&lt;h2 id="the-firefly-version-temporary-sonarqube-durable-issues"&gt;The FireFly version: temporary SonarQube, durable issues
&lt;/h2&gt;&lt;p&gt;In &lt;a class="link" href="https://github.com/achingono/firefly" target="_blank" rel="noopener"
&gt;FireFly&lt;/a&gt;, the workflow is intentionally self-contained.&lt;/p&gt;
&lt;p&gt;The scheduled GitHub Action spins up a SonarQube Community service container, sets the admin password, creates the project, generates an analysis token, runs the scanner in Docker, and then uses the SonarQube API to fetch open issues.&lt;/p&gt;
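&lt;p&gt;A minimal sketch of what that scheduled workflow can look like. This is illustrative rather than FireFly&amp;rsquo;s actual file: the cron expression, the bootstrap script name, and the token variable are assumptions, and the real workflow is linked in the references below.&lt;/p&gt;

```yaml
# Illustrative sketch; step names, script paths, and secrets are hypothetical.
name: scheduled-sonar-scan
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly scan
jobs:
  scan:
    runs-on: ubuntu-latest
    services:
      sonarqube:
        image: sonarqube:community
        ports:
          - 9000:9000
    steps:
      - uses: actions/checkout@v4
      - name: Bootstrap SonarQube (password, project, analysis token)
        run: ./scripts/bootstrap-sonar.sh   # hypothetical helper script
      - name: Run the scanner in Docker
        run: |
          docker run --rm --network host \
            -e SONAR_HOST_URL=http://localhost:9000 \
            -e SONAR_TOKEN=$ANALYSIS_TOKEN \
            -v $PWD:/usr/src sonarsource/sonar-scanner-cli
      - name: Fetch open issues from the API
        run: |
          curl -s -u $ANALYSIS_TOKEN: \
            "http://localhost:9000/api/issues/search?resolved=false" > issues.json
```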
&lt;p&gt;From there, the workflow does something I think is more useful than just failing the pipeline: it turns findings into &lt;strong&gt;GitHub issues with meaningful labels&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The labels encode both issue type and severity:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sonar&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sonar: bug&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sonar: vulnerability&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sonar: security hotspot&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sonar: blocker&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sonar: critical&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sonar: major&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That small step matters a lot. Once the findings live as first-class issues in the repo, they stop being hidden inside a scan report and start participating in the normal engineering workflow.&lt;/p&gt;
&lt;p&gt;The FireFly workflow also keeps the body format clean: key, severity, type, rule, file, line, and the actual message. That makes the issue understandable without forcing someone to click back into SonarQube every time.&lt;/p&gt;
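&lt;p&gt;The finding-to-issue mapping can be sketched as a pure function. This is an illustrative TypeScript sketch, not FireFly&amp;rsquo;s actual workflow code; the input fields follow the shape of SonarQube&amp;rsquo;s &lt;code&gt;/api/issues/search&lt;/code&gt; response, and the output payload shape is an assumption.&lt;/p&gt;

```typescript
// Illustrative sketch: map one SonarQube finding to a GitHub issue payload
// using the label scheme and body format described above.
interface SonarIssue {
  key: string;
  severity: string;      // e.g. "CRITICAL"
  type: string;          // e.g. "VULNERABILITY" or "SECURITY_HOTSPOT"
  rule: string;
  component: string;     // "projectKey:path/to/file.ts"
  line?: number;
  message: string;
}

function toGithubIssue(issue: SonarIssue) {
  // Drop the project-key prefix so the issue shows a plain file path.
  const file = issue.component.split(":").slice(1).join(":");
  const typeLabel = "sonar: " + issue.type.toLowerCase().replace("_", " ");
  const severityLabel = "sonar: " + issue.severity.toLowerCase();
  return {
    title: "[sonar] " + issue.rule + ": " + issue.message,
    labels: ["sonar", typeLabel, severityLabel],
    body: [
      "Key: " + issue.key,
      "Severity: " + issue.severity,
      "Type: " + issue.type,
      "Rule: " + issue.rule,
      "File: " + file,
      "Line: " + (issue.line ?? "n/a"),
      "",
      issue.message,
    ].join("\n"),
  };
}
```

&lt;p&gt;Keeping this step pure makes it trivial to test, and the workflow side only has to post the resulting payload to the issues API.&lt;/p&gt;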
&lt;h2 id="the-cuemarshal-version-findings-re-enter-the-agent-loop"&gt;The CueMarshal version: findings re-enter the agent loop
&lt;/h2&gt;&lt;p&gt;CueMarshal takes the pattern further.&lt;/p&gt;
&lt;p&gt;There, SonarQube is not just a quality gate. It is a signal source for the self-improvement system.&lt;/p&gt;
&lt;p&gt;The scan runs on a schedule, the quality gate is checked, and when issues remain, they are picked up by the self-improvement workflow. That workflow runs deterministic scanners, produces a findings JSON file, and lets AI select the high-value, automation-friendly items to turn into actual repository work.&lt;/p&gt;
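&lt;p&gt;The findings file is just structured data the selection step can reason over. A hypothetical shape (field names are illustrative, not CueMarshal&amp;rsquo;s actual schema):&lt;/p&gt;

```json
{
  "source": "sonarqube",
  "generated": "2026-03-01T06:00:00Z",
  "findings": [
    {
      "id": "finding-001",
      "rule": "typescript:S3776",
      "severity": "CRITICAL",
      "file": "src/conductor/dispatch.ts",
      "line": 42,
      "message": "Refactor this function to reduce its Cognitive Complexity.",
      "automationFriendly": true
    }
  ]
}
```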
&lt;p&gt;At that point the flow becomes very CueMarshal-like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;finding becomes issue&lt;/li&gt;
&lt;li&gt;issue gets labels such as &lt;code&gt;self-improvement&lt;/code&gt; and &lt;code&gt;source:sonar&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;developer agent works the task&lt;/li&gt;
&lt;li&gt;reviewer agent reviews it&lt;/li&gt;
&lt;li&gt;human still controls the merge&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is the part I care about most. Static analysis becomes part of an operational loop instead of a reporting loop.&lt;/p&gt;
&lt;h2 id="what-ai-actually-fixed"&gt;What AI actually fixed
&lt;/h2&gt;&lt;p&gt;This pattern became more convincing to me once I could see it in the commit history instead of just in a diagram.&lt;/p&gt;
&lt;p&gt;In FireFly, the SonarQube-driven fixes moved through recognizable stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;critical auth and data exposure issues&lt;/li&gt;
&lt;li&gt;medium-severity issues in the LLM, tracer, and execution paths&lt;/li&gt;
&lt;li&gt;blocker and critical tracer problems&lt;/li&gt;
&lt;li&gt;remaining major issues in non-UI files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In CueMarshal, the same loop showed up in a different form:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;bug-class findings resolved&lt;/li&gt;
&lt;li&gt;cognitive-complexity hotspots refactored&lt;/li&gt;
&lt;li&gt;scan-flow issues fixed so the SonarQube pipeline itself became more reliable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is the detail that made the whole approach feel real to me. The AI was not &amp;ldquo;doing security&amp;rdquo; in some theatrical sense. It was participating in a bounded remediation loop with concrete input, reviewable output, and a cleaner next scan.&lt;/p&gt;
&lt;h2 id="what-i-still-keep-human"&gt;What I still keep human
&lt;/h2&gt;&lt;p&gt;I do not think static analysis findings should all be auto-fixed blindly.&lt;/p&gt;
&lt;p&gt;Some changes affect security-sensitive behavior. Some touch core orchestration logic. Some need architectural judgment more than mechanical cleanup.&lt;/p&gt;
&lt;p&gt;That is why I still care so much about review gates, protected areas, and explicit pull requests. AI can do triage. AI can do a surprising amount of repair work. But the system becomes trustworthy only when people retain approval authority over the consequential parts.&lt;/p&gt;
&lt;p&gt;This is the same design instinct behind CueMarshal more broadly: automate aggressively, but make the control points obvious.&lt;/p&gt;
&lt;h2 id="why-i-like-this-pattern"&gt;Why I like this pattern
&lt;/h2&gt;&lt;p&gt;The more repositories I maintain, the less patience I have for passive quality tooling.&lt;/p&gt;
&lt;p&gt;If a scan only tells me what is wrong, it is somewhat useful.
If a scan creates the next actionable task, it is much more useful.
If that task can be routed through an AI-assisted workflow and still land in a human-reviewed PR, then the tool has become part of delivery rather than commentary on delivery.&lt;/p&gt;
&lt;p&gt;That is the threshold I care about now.&lt;/p&gt;
&lt;p&gt;I still think DAST and pipeline security automation matter deeply; that earlier OWASP post still reflects that. But SonarQube plus an AI remediation loop feels like the next generation of the same idea: make quality signals operational, not ornamental.&lt;/p&gt;
&lt;p&gt;If you want the broader architecture around this, &lt;a class="link" href="https://www.chingono.com/blog/2025/08/28/designing-multi-agent-systems-lessons-from-building-an-8-agent-engineering-orchestra/" &gt;Designing Multi-Agent Systems: Lessons from Building an 8-Agent Engineering Orchestra&lt;/a&gt; covers the orchestration side, and &lt;a class="link" href="https://www.chingono.com/blog/2025/02/15/why-i-started-building-my-own-devops-platform-and-what-i-learned/" &gt;Why I Started Building My Own DevOps Platform&lt;/a&gt; covers the bigger motivation.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/achingono/firefly/blob/main/.github/workflows/scheduled-scan.yml" target="_blank" rel="noopener"
&gt;FireFly scheduled SonarQube workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/achingono/firefly/blob/main/sonar-project.properties" target="_blank" rel="noopener"
&gt;FireFly SonarQube project configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/operations/self-improvement.md" target="_blank" rel="noopener"
&gt;CueMarshal self-improvement design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/architecture/overview.md" target="_blank" rel="noopener"
&gt;CueMarshal architecture overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Designing Multi-Agent Systems: Lessons from Building an 8-Agent Engineering Orchestra</title><link>https://www.chingono.com/blog/2025/08/28/designing-multi-agent-systems-lessons-from-building-an-8-agent-engineering-orchestra/</link><pubDate>Thu, 28 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chingono.com/blog/2025/08/28/designing-multi-agent-systems-lessons-from-building-an-8-agent-engineering-orchestra/</guid><description>&lt;p&gt;A lot of &amp;ldquo;multi-agent&amp;rdquo; demos are really one agent wearing different hats.&lt;/p&gt;
&lt;p&gt;The names change. The prompts change. Sometimes the avatars change. But the authority model, the memory model, and the execution model are all still basically the same. That is fine for a demo. It is much less convincing when you are trying to build a system that can do real engineering work.&lt;/p&gt;
&lt;p&gt;Building &lt;a class="link" href="https://github.com/cuemarshal/cuemarshal" target="_blank" rel="noopener"
&gt;CueMarshal&lt;/a&gt; made that distinction impossible for me to ignore.&lt;/p&gt;
&lt;p&gt;What I wanted was not eight personalities for marketing. I wanted a working system where planning, coding, review, testing, DevOps, documentation, and quality control could be separated cleanly enough to be trustworthy.&lt;/p&gt;
&lt;p&gt;That is how the &amp;ldquo;engineering orchestra&amp;rdquo; idea emerged.&lt;/p&gt;
&lt;h2 id="the-roles-mattered-because-the-boundaries-mattered"&gt;The roles mattered because the boundaries mattered
&lt;/h2&gt;&lt;p&gt;CueMarshal&amp;rsquo;s cast eventually became:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Marshal&lt;/strong&gt; for orchestration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ava&lt;/strong&gt; for architecture&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dave&lt;/strong&gt; for implementation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reese&lt;/strong&gt; for review&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tess&lt;/strong&gt; for testing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Devin&lt;/strong&gt; for DevOps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dot&lt;/strong&gt; for documentation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Linton&lt;/strong&gt; for linting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What made that useful was not the naming. It was the fact that the roles had &lt;strong&gt;different responsibilities, different tool access, and different default model tiers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;That is the first lesson I would pass on to anyone designing a multi-agent system:&lt;/p&gt;
&lt;h2 id="1-different-roles-need-different-authority"&gt;1. Different roles need different authority
&lt;/h2&gt;&lt;p&gt;If your reviewer can rewrite production code, your reviewer is not really a reviewer.&lt;/p&gt;
&lt;p&gt;In CueMarshal, least privilege is deliberate. The reviewer is configured without write/edit permissions. The docs agent is restricted from shell access. The linter acts like a gate, not a developer with nicer manners.&lt;/p&gt;
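&lt;p&gt;A hypothetical sketch of what per-agent permission profiles can look like. The file shape and flags are illustrative, not CueMarshal&amp;rsquo;s actual configuration:&lt;/p&gt;

```yaml
# Hypothetical permission profiles; the shape is illustrative.
agents:
  reese:
    role: reviewer
    tools:
      read: true
      write: false   # a reviewer that can rewrite code is not a reviewer
      shell: true
  dot:
    role: docs
    tools:
      read: true
      write: true
      shell: false   # documentation work gets no shell access
  linton:
    role: linter
    tools:
      read: true
      write: false   # a gate, not a developer
      shell: true
```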
&lt;p&gt;That kind of restriction sounds limiting until you realize it is what gives each role meaning. Boundaries are not friction here. They are the mechanism that creates trust.&lt;/p&gt;
&lt;p&gt;A good multi-agent system is not just a cluster of competencies. It is a set of constrained responsibilities.&lt;/p&gt;
&lt;h2 id="2-coordination-needs-durable-state-outside-the-model"&gt;2. Coordination needs durable state outside the model
&lt;/h2&gt;&lt;p&gt;One of the reasons I anchored CueMarshal in Git is that I did not want coordination to depend on hidden model memory.&lt;/p&gt;
&lt;p&gt;Tasks become issues.
Work becomes branches.
Proposals become pull requests.
Reviews become durable comments and approvals.&lt;/p&gt;
&lt;p&gt;The Conductor receives webhooks, uses Redis and BullMQ to manage asynchronous flow, and dispatches work through Gitea Actions. The runners themselves stay stateless; they rebuild context from the repository, the issue, and the tool layer every time.&lt;/p&gt;
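&lt;p&gt;The dispatch idea can be sketched as a pure routing step. This is illustrative TypeScript, not the Conductor&amp;rsquo;s actual code; the event kinds, labels, and agent names are assumptions drawn from the roles above, and in the real system the resulting job would be enqueued with BullMQ and picked up by a stateless runner.&lt;/p&gt;

```typescript
// Illustrative sketch: turn an incoming webhook event into a job description.
interface WebhookEvent {
  kind: "issue.labeled" | "pull_request.opened" | "pull_request.approved";
  labels: string[];
}

function routeEvent(event: WebhookEvent): { agent: string; queue: string } | null {
  if (event.kind === "issue.labeled") {
    if (event.labels.includes("self-improvement")) {
      return { agent: "dave", queue: "implementation" };
    }
    return null; // unlabeled work is not dispatched automatically
  }
  if (event.kind === "pull_request.opened") {
    return { agent: "reese", queue: "review" };
  }
  // Approvals go back to orchestration; the merge itself stays human.
  return { agent: "marshal", queue: "merge-coordination" };
}
```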
&lt;p&gt;That has been a much better trade than magical continuity.&lt;/p&gt;
&lt;p&gt;Models forget.
Git does not.&lt;/p&gt;
&lt;h2 id="3-identity-is-part-of-the-architecture"&gt;3. Identity is part of the architecture
&lt;/h2&gt;&lt;p&gt;Another thing I underestimated early on was how important identity separation would be.&lt;/p&gt;
&lt;p&gt;Each CueMarshal agent has its own account, token, and audit trail. That means the Git history shows who planned, who implemented, who reviewed, and who approved. Even when the &amp;ldquo;who&amp;rdquo; is an AI agent, the distinction still matters.&lt;/p&gt;
&lt;p&gt;This has two benefits.&lt;/p&gt;
&lt;p&gt;First, it improves explainability. The system becomes easier to inspect when actions are attributable.&lt;/p&gt;
&lt;p&gt;Second, it changes how you think about safety. Once every agent has a clear identity and permission scope, you stop designing from a vague &amp;ldquo;assistant&amp;rdquo; mindset and start designing from explicit operational roles.&lt;/p&gt;
&lt;p&gt;That shift is subtle, but it is foundational.&lt;/p&gt;
&lt;h2 id="4-the-tool-layer-is-what-makes-the-orchestra-playable"&gt;4. The tool layer is what makes the orchestra playable
&lt;/h2&gt;&lt;p&gt;This is where MCP became important for CueMarshal.&lt;/p&gt;
&lt;p&gt;All of the agents connect to a structured tool layer instead of improvising raw integrations on the fly. The same Gitea, Conductor, and System capabilities can be used by the runner agents over stdio and by the orchestration layer over HTTP/SSE.&lt;/p&gt;
&lt;p&gt;That matters because multi-agent systems are not only about reasoning. They are about coordination through reliable interfaces.&lt;/p&gt;
&lt;p&gt;If the tools are vague, agents collide.
If the permissions are sloppy, trust collapses.
If the transports are inconsistent, reuse gets expensive.&lt;/p&gt;
&lt;p&gt;The protocol is not the whole story, but it is the difference between a collection of prompts and a real system surface.&lt;/p&gt;
&lt;p&gt;I wrote more about that in &lt;a class="link" href="https://www.chingono.com/blog/2025/03/20/mcp-in-practice-what-anthropics-model-context-protocol-actually-means-for-developers/" &gt;MCP in Practice&lt;/a&gt;, because it deserves its own treatment.&lt;/p&gt;
&lt;h2 id="5-model-routing-is-architecture-not-optimization"&gt;5. Model routing is architecture, not optimization
&lt;/h2&gt;&lt;p&gt;Another lesson I came away with: not every role deserves the same model.&lt;/p&gt;
&lt;p&gt;Architecture work is more expensive and more consequential than documentation cleanup. Review often needs stronger reasoning than linting. Mechanical work should not burn premium tokens if a cheaper tier can do it reliably.&lt;/p&gt;
&lt;p&gt;CueMarshal&amp;rsquo;s tiered routing reflects that reality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;heavy reasoning for architecture&lt;/li&gt;
&lt;li&gt;balanced capability for implementation, review, testing, and DevOps&lt;/li&gt;
&lt;li&gt;lighter-weight models for docs and linting&lt;/li&gt;
&lt;/ul&gt;
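&lt;p&gt;The tiering above reduces to a very small piece of configuration. A sketch, with tier names and the mapping itself being illustrative rather than CueMarshal&amp;rsquo;s actual routing table:&lt;/p&gt;

```typescript
// Illustrative tiered model routing.
type Tier = "heavy" | "balanced" | "light";

const roleTiers: { [role: string]: Tier } = {
  architecture: "heavy",
  implementation: "balanced",
  review: "balanced",
  testing: "balanced",
  devops: "balanced",
  docs: "light",
  linting: "light",
};

function modelFor(role: string): Tier {
  // Default to the balanced tier, not the premium one, so an unknown
  // role never silently burns heavy-reasoning tokens.
  return roleTiers[role] ?? "balanced";
}
```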
&lt;p&gt;That is not just a cost decision. It is part of how the system stays sustainable.&lt;/p&gt;
&lt;p&gt;Too many agent systems treat model choice as an afterthought. I think it belongs in the design doc.&lt;/p&gt;
&lt;h2 id="6-closed-loops-beat-hero-agents"&gt;6. Closed loops beat hero agents
&lt;/h2&gt;&lt;p&gt;The more I build these systems, the less I believe in the &amp;ldquo;super-agent&amp;rdquo; story.&lt;/p&gt;
&lt;p&gt;What works better is a closed loop:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;detect work&lt;/li&gt;
&lt;li&gt;route it clearly&lt;/li&gt;
&lt;li&gt;execute with constrained roles&lt;/li&gt;
&lt;li&gt;review it&lt;/li&gt;
&lt;li&gt;merge it with human control&lt;/li&gt;
&lt;li&gt;feed the next signal back into the system&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;CueMarshal&amp;rsquo;s self-improvement workflow made this even clearer to me. Once SonarQube findings, scanners, issues, PRs, and agent roles all started participating in the same loop, the system became more useful than any single agent inside it.&lt;/p&gt;
&lt;p&gt;That is why I think orchestration matters more than agent count.&lt;/p&gt;
&lt;h2 id="my-current-takeaway"&gt;My current takeaway
&lt;/h2&gt;&lt;p&gt;If you are building a multi-agent system, start with these questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What roles genuinely need to be different?&lt;/li&gt;
&lt;li&gt;What permissions should each role have?&lt;/li&gt;
&lt;li&gt;Where does coordination state live?&lt;/li&gt;
&lt;li&gt;How are actions attributed?&lt;/li&gt;
&lt;li&gt;What is the closed loop that turns outputs into the next inputs?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you cannot answer those, adding more agents will mostly add more noise.&lt;/p&gt;
&lt;p&gt;If you can answer them, the number of agents becomes much less important than the quality of the structure around them.&lt;/p&gt;
&lt;p&gt;That has been the real lesson for me. The point of the orchestra is not to have more instruments. The point is to make the handoffs musical instead of chaotic.&lt;/p&gt;
&lt;p&gt;If you want the adjacent pieces, &lt;a class="link" href="https://www.chingono.com/blog/2025/02/15/why-i-started-building-my-own-devops-platform-and-what-i-learned/" &gt;Why I Started Building My Own DevOps Platform&lt;/a&gt; covers the motivation, and &lt;a class="link" href="https://www.chingono.com/blog/2026/03/05/how-i-run-sonarqube-in-my-own-ci-pipeline-and-let-ai-fix-what-it-finds/" &gt;How I Run SonarQube in My Own CI Pipeline (And Let AI Fix What It Finds)&lt;/a&gt; shows what this architecture looks like when the feedback loop closes on itself.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal" target="_blank" rel="noopener"
&gt;CueMarshal repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/architecture/overview.md" target="_blank" rel="noopener"
&gt;CueMarshal architecture overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/features/agents/overview.md" target="_blank" rel="noopener"
&gt;CueMarshal agent profiles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/features/conductor/overview.md" target="_blank" rel="noopener"
&gt;CueMarshal conductor overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>MCP in Practice: What Anthropic's Model Context Protocol Actually Means for Developers</title><link>https://www.chingono.com/blog/2025/03/20/mcp-in-practice-what-anthropics-model-context-protocol-actually-means-for-developers/</link><pubDate>Thu, 20 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chingono.com/blog/2025/03/20/mcp-in-practice-what-anthropics-model-context-protocol-actually-means-for-developers/</guid><description>&lt;p&gt;When Anthropic announced the &lt;a class="link" href="https://www.anthropic.com/news/model-context-protocol" target="_blank" rel="noopener"
&gt;Model Context Protocol&lt;/a&gt;, the most interesting part to me was not &amp;ldquo;LLMs can call tools.&amp;rdquo; We already knew that. The interesting part was that someone was finally trying to standardize the connection.&lt;/p&gt;
&lt;p&gt;That may sound like a small distinction, but it is the difference between a clever demo and an architecture you can actually build on.&lt;/p&gt;
&lt;p&gt;For developers, MCP matters because it turns tool access into something more portable, more inspectable, and less bespoke. Instead of wiring every model to every internal system in a slightly different way, you get a shared protocol for secure, two-way connections between AI clients and the systems where work actually lives.&lt;/p&gt;
&lt;p&gt;In other words: fewer one-off connectors, fewer weird wrappers, and less glue code pretending to be strategy.&lt;/p&gt;
&lt;h2 id="the-real-problem-mcp-solves"&gt;The real problem MCP solves
&lt;/h2&gt;&lt;p&gt;Without a protocol, most AI integrations end up with the same shape:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;custom JSON formats&lt;/li&gt;
&lt;li&gt;hand-rolled function schemas&lt;/li&gt;
&lt;li&gt;transport logic mixed into business logic&lt;/li&gt;
&lt;li&gt;a different adapter for every new client&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can absolutely ship systems that way. Many people already have. But you pay for it later in duplication, debugging, and lock-in.&lt;/p&gt;
&lt;p&gt;Anthropic&amp;rsquo;s framing resonated with me because it describes a problem I had already been running into while building CueMarshal. I did not need agents that could merely &amp;ldquo;use tools.&amp;rdquo; I needed a stable way for different parts of the system to use the &lt;strong&gt;same tools&lt;/strong&gt; in different contexts.&lt;/p&gt;
&lt;p&gt;That is where MCP becomes practical.&lt;/p&gt;
&lt;h2 id="what-it-changed-in-my-own-thinking"&gt;What it changed in my own thinking
&lt;/h2&gt;&lt;p&gt;In CueMarshal, I ended up with three MCP servers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;Gitea MCP server&lt;/strong&gt; for issues, pull requests, repositories, workflows, and search&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Conductor MCP server&lt;/strong&gt; for task coordination and agent state&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;System MCP server&lt;/strong&gt; for costs, runners, and health&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That split was not arbitrary. It reflected a design choice: organize tool access around bounded responsibilities instead of dumping everything into one giant catch-all toolbox.&lt;/p&gt;
&lt;p&gt;Even more important, the same MCP server code supports &lt;strong&gt;two transports&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;stdio&lt;/strong&gt; for agent runners&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTTP/SSE&lt;/strong&gt; for the long-running chat/orchestration layer&lt;/li&gt;
&lt;/ul&gt;
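&lt;p&gt;The reason one tool layer can serve both transports is that the tools know nothing about how results travel. Here is a toy illustration of that separation; this is not the MCP SDK, and the tool and framing details are invented for the example:&lt;/p&gt;

```typescript
// Toy illustration: one tool layer, two transport framings.
interface ToolLayer {
  call(tool: string, args: object): string;
}

const giteaTools: ToolLayer = {
  call(tool, args) {
    if (tool === "search_issues") return JSON.stringify({ tool, args, hits: 2 });
    return JSON.stringify({ error: "unknown tool " + tool });
  },
};

// Transport 1: stdio-style, one JSON line per response (agent runners).
function stdioRespond(layer: ToolLayer, tool: string, args: object): string {
  return layer.call(tool, args) + "\n";
}

// Transport 2: SSE-style framing (long-running orchestration layer).
function sseRespond(layer: ToolLayer, tool: string, args: object): string {
  return "event: tool_result\ndata: " + layer.call(tool, args) + "\n\n";
}
```

&lt;p&gt;The payload is identical in both cases; only the framing differs. That is the property that keeps the tool logic from being duplicated per runtime.&lt;/p&gt;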
&lt;p&gt;This is the part I think many developers will underestimate. The value is not just that the model can invoke a tool. The value is that your tool layer stops being trapped inside one execution model.&lt;/p&gt;
&lt;p&gt;The CueMarshal runners can spawn MCP servers directly as child processes. The Conductor can hold long-lived connections to those same tool surfaces over the network. Same capability, different runtime, no duplicated tool logic.&lt;/p&gt;
&lt;p&gt;That is not just elegant. It is operationally useful.&lt;/p&gt;
&lt;h2 id="mcp-is-really-about-interface-discipline"&gt;MCP is really about interface discipline
&lt;/h2&gt;&lt;p&gt;One thing building AI systems teaches very quickly is that &amp;ldquo;prompting&amp;rdquo; gets too much credit for problems that are really interface problems.&lt;/p&gt;
&lt;p&gt;If the tool schema is vague, the model will behave vaguely.&lt;/p&gt;
&lt;p&gt;If the permissions are broad, the behavior will feel risky.&lt;/p&gt;
&lt;p&gt;If the transport is brittle, the whole system looks flaky even when the reasoning is fine.&lt;/p&gt;
&lt;p&gt;What I like about MCP is that it nudges teams toward better engineering habits:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Typed tools instead of implied behavior&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Separation between protocol and implementation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reusable tool layers across multiple clients&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clearer permission boundaries&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
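&lt;p&gt;The first habit, typed tools, is worth making concrete. A minimal sketch of declaring a parameter schema and validating calls before execution; the validator is hand-rolled for illustration, where real MCP servers typically lean on a schema library, and the tool itself is hypothetical:&lt;/p&gt;

```typescript
// Minimal sketch of "typed tools instead of implied behavior".
interface ParamSpec {
  [name: string]: { type: "string" | "number"; required: boolean };
}

interface Tool {
  name: string;
  params: ParamSpec;
  run(args: { [k: string]: unknown }): string;
}

function invoke(tool: Tool, args: { [k: string]: unknown }): string {
  // Validate against the declared schema before the tool ever runs.
  for (const [name, spec] of Object.entries(tool.params)) {
    const value = args[name];
    if (value === undefined) {
      if (spec.required) throw new Error("missing param: " + name);
      continue;
    }
    if (typeof value !== spec.type) throw new Error("bad type for " + name);
  }
  return tool.run(args);
}

const createIssue: Tool = {
  name: "create_issue",
  params: {
    title: { type: "string", required: true },
    assignee: { type: "string", required: false },
  },
  run: (args) => "created: " + String(args.title),
};
```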
&lt;p&gt;That discipline matters even if you never use Anthropic&amp;rsquo;s stack directly.&lt;/p&gt;
&lt;h2 id="what-developers-should-actually-do-with-it"&gt;What developers should actually do with it
&lt;/h2&gt;&lt;p&gt;My advice is to treat MCP less like a product feature and more like a systems design decision.&lt;/p&gt;
&lt;p&gt;If you are building AI-assisted software delivery, internal automation, or even just richer developer tools, start by asking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What are the real systems my assistant needs to access?&lt;/li&gt;
&lt;li&gt;Which of those interactions deserve typed, validated interfaces?&lt;/li&gt;
&lt;li&gt;Which capabilities should be shared across chat, automation, and background agents?&lt;/li&gt;
&lt;li&gt;Where do I want auditability and permission scoping to live?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That line of thinking will produce a better architecture whether you adopt MCP tomorrow or not.&lt;/p&gt;
&lt;p&gt;In my own work, it pushed me away from raw &lt;code&gt;curl&lt;/code&gt;-driven integration and toward a universal tool layer. Once I made that shift, a lot of downstream problems became easier: orchestration, reuse, security boundaries, and even explanation. It is easier to trust a system when you can say, very plainly, &amp;ldquo;here are the tools it has, here is what they do, and here is how they are invoked.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="what-mcp-does-not-solve"&gt;What MCP does &lt;strong&gt;not&lt;/strong&gt; solve
&lt;/h2&gt;&lt;p&gt;MCP does not magically make an agent reliable.&lt;/p&gt;
&lt;p&gt;It does not fix poor workflow design.&lt;/p&gt;
&lt;p&gt;It does not remove the need for human review.&lt;/p&gt;
&lt;p&gt;And it definitely does not turn vague prompts into good engineering.&lt;/p&gt;
&lt;p&gt;What it does is give you a cleaner control plane for connecting models to real systems. That is already a meaningful improvement.&lt;/p&gt;
&lt;p&gt;For me, that is why MCP feels important. Not because it adds more AI theater, but because it reduces architectural friction in a place where friction compounds very fast.&lt;/p&gt;
&lt;p&gt;If you are curious how that idea plays out in a larger system, I wrote more about the broader coordination problem in &lt;a class="link" href="https://www.chingono.com/blog/2025/02/15/why-i-started-building-my-own-devops-platform-and-what-i-learned/" &gt;Why I Started Building My Own DevOps Platform&lt;/a&gt; and the orchestration lessons in &lt;a class="link" href="https://www.chingono.com/blog/2025/08/28/designing-multi-agent-systems-lessons-from-building-an-8-agent-engineering-orchestra/" &gt;Designing Multi-Agent Systems: Lessons from Building an 8-Agent Engineering Orchestra&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/news/model-context-protocol" target="_blank" rel="noopener"
&gt;Introducing the Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://modelcontextprotocol.io/quickstart" target="_blank" rel="noopener"
&gt;MCP quickstart and specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/features/mcp-servers/overview.md" target="_blank" rel="noopener"
&gt;CueMarshal MCP server overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/cuemarshal/cuemarshal/blob/main/docs/architecture/overview.md" target="_blank" rel="noopener"
&gt;CueMarshal architecture overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>