For a while I was running Blackreach, a Claude Code session, and a couple other tools at the same time. None of them knew the others existed. Blackreach would finish a research task and everything it found died when the session closed. 847 inscriptions downloaded, all the metadata extracted, gone. The next coding session started completely blind, no memory of what the research agent had found. I had to manually copy results from one context to another, like passing notes between people who aren't allowed in the same room.
I looked at the multi-agent frameworks. LangGraph, CrewAI, AutoGen. They all want to solve this but they all want to own the infrastructure too. You rewrite your agents around their SDK, their abstractions, their way of thinking about the problem. If you want to use Claude Code alongside a custom Python agent alongside an Ollama session, you're mostly out of luck since none of them are designed for that combination. And none of them run fully local.
I didn't want to rewrite everything around someone else's abstractions. I wanted a layer that existing tools just pass through without knowing it's there.
How the proxy works
Velqua sits between any app and its LLM provider. It intercepts every API call, injects persistent memory into the context, and forwards the request. The app doesn't know Velqua is there. You change one port number and that's it.
# Before: agent calls provider directly
curl http://localhost:11434/api/chat # Ollama
# After: change one port number
curl http://localhost:11435/api/chat # Velqua proxy > Ollama
# memory injected automatically
The Mesh layer extends this. Instead of injecting one agent's personal memory, it injects shared knowledge from a pool that all agents read and write to. Blackreach finishes a task and writes its findings to the pool. The next agent to make a request gets those findings in its context automatically. It doesn't know Blackreach exists. It just knows what Blackreach found.
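The pass-through idea is simple enough to sketch in a few lines. This is a toy illustration, not Velqua's actual code: the pool contents, field names, and `inject_shared_context` helper are all hypothetical. The point is that the agent's request dict is never restructured, only its system prompt is extended.

```python
# Toy sketch of mesh-style context injection (names are hypothetical,
# not Velqua's real API). The proxy takes an unmodified chat request,
# prepends shared-pool findings to the system message, and forwards
# everything else untouched.

SHARED_POOL = [
    "blackreach: 847 Linear A inscriptions saved to /data/linear_a/",
]

def inject_shared_context(request: dict) -> dict:
    """Prepend pool findings to the system message; agent code never changes."""
    context = "\n".join(f"[mesh] {fact}" for fact in SHARED_POOL)
    messages = list(request.get("messages", []))
    if messages and messages[0].get("role") == "system":
        # Extend the existing system prompt rather than replacing it.
        messages[0] = {
            "role": "system",
            "content": context + "\n\n" + messages[0]["content"],
        }
    else:
        messages.insert(0, {"role": "system", "content": context})
    return {**request, "messages": messages}

original = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Summarize findings"}],
}
forwarded = inject_shared_context(original)
```

The agent sent `original`; the provider receives `forwarded`, with Blackreach's findings already in the system prompt.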
The memory model
Every conversation gets summarized and stored. Before each new request, Velqua retrieves the most relevant memories and injects them into the system prompt. Recent context, relevant findings, active task state. The agent picks up where it left off without being told to.
The tricky part is relevance. Injecting everything would just pollute the context with noise. Velqua scores memories by recency, explicit tags, and semantic similarity to the current message. A research agent asking about Linear A sign frequency gets the Linear A research history injected. A coding agent working on a Python module doesn't get that context because it isn't relevant.
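To make the scoring concrete, here's a deliberately simplified sketch of the three signals combined. The real scorer presumably uses embeddings for semantic similarity; this stand-in uses word overlap just to show the shape, and every name here is an assumption for illustration.

```python
import math
import time

def score_memory(memory: dict, query: str, now: float) -> float:
    """Toy relevance score: recency decay + tag bonus + word overlap.
    (A real implementation would use embeddings for similarity;
    word overlap is a simplification for this sketch.)"""
    age_hours = (now - memory["timestamp"]) / 3600
    recency = math.exp(-age_hours / 24)  # older memories decay smoothly
    query_words = set(query.lower().split())
    tag_bonus = 0.5 if query_words & set(memory.get("tags", [])) else 0.0
    mem_words = set(memory["summary"].lower().split())
    overlap = len(query_words & mem_words) / max(len(query_words), 1)
    return recency + tag_bonus + overlap

now = time.time()
memories = [
    {"summary": "Linear A sign frequency analysis in progress",
     "tags": ["research"], "timestamp": now - 3600},
    {"summary": "Refactored the Python import graph module",
     "tags": ["coding"], "timestamp": now - 86400},
]
query = "what do we know about linear a sign frequency"
best = max(memories, key=lambda m: score_memory(m, query, now))
```

The research query scores the Linear A memory highest on both recency and overlap, so that's what gets injected; the coding memory stays out of the context.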
The summaries are intentionally compressed. Raw conversation logs grow unbounded and get expensive fast. Compressed summaries stay small. You lose some detail but you gain persistence, and persistent memory that's slightly lossy is more useful than perfect memory that disappears when the session closes.
Agent identity without config
The first problem was figuring out which agent is making each request without requiring any changes in the agent itself. If you require agents to identify themselves, you've already broken the "zero code changes" goal.
Velqua Mesh uses a detection chain: first it checks for an X-Velqua-Agent header, which agents can optionally set if they want explicit identity. Then the user-agent string. Then the port number. If none of those work it assigns an anonymous ID based on connection fingerprint. Most of the time the port number alone is enough. One port per agent is a convention, not a requirement, but it's the easiest way to do it.
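The chain itself is just ordered fallbacks. A minimal sketch, with the user-agent patterns and port map invented for illustration:

```python
def identify_agent(headers: dict, user_agent: str, port: int,
                   port_map: dict, fingerprint: str) -> str:
    """Sketch of the detection chain (all names and patterns assumed)."""
    # 1. Explicit header wins if the agent chose to set one.
    if headers.get("X-Velqua-Agent"):
        return headers["X-Velqua-Agent"]
    # 2. A recognizable user-agent string (example patterns only).
    known_agents = {"claude": "claude-code", "ollama": "ollama"}
    for needle, name in known_agents.items():
        if needle in user_agent.lower():
            return name
    # 3. The listening port, one-per-agent by convention.
    if port in port_map:
        return port_map[port]
    # 4. Fall back to an anonymous ID from the connection fingerprint.
    return f"anon-{fingerprint[:8]}"

port_map = {11435: "blackreach", 11436: "coder"}
agent = identify_agent({}, "python-requests/2.31", 11435, port_map, "ab12cd34ef")
```

Here the generic user-agent matches nothing, so identification falls through to the port map and resolves to `blackreach`, the common case described above.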
The noteboard
Shared memory is passive by design. Agents write to it and others pick it up when they make their next request. The noteboard is more intentional. An agent can leave a structured note for a specific agent, or broadcast to whoever picks up the next task.
POST /mesh/notes
{
  "from": "blackreach",
  "to": "any",
  "content": "847 Linear A inscriptions downloaded. Saved to /data/linear_a/. HT 31 shows unusual sign clustering worth looking at.",
  "tags": ["research", "complete"]
}
The receiving agent doesn't need to poll for notes. The next time it makes any LLM request, Velqua checks for pending notes addressed to it and injects them alongside the memory context. It just shows up with the information already there.
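Delivery on next request can be sketched the same way as memory injection. This is a toy version with in-memory storage and hypothetical names; it shows the key behavior: notes addressed to the agent (or broadcast to `"any"`) are injected once, then cleared.

```python
# Hypothetical in-memory note store; the real system would persist these.
PENDING_NOTES = [
    {"from": "blackreach", "to": "any",
     "content": "847 Linear A inscriptions downloaded.",
     "tags": ["research", "complete"]},
    {"from": "coder", "to": "blackreach",
     "content": "Parser module ready.", "tags": []},
]

def notes_for(agent_id: str) -> list:
    """Deliver notes addressed to this agent or broadcast to 'any', then clear them."""
    delivered = [n for n in PENDING_NOTES if n["to"] in (agent_id, "any")]
    for note in delivered:
        PENDING_NOTES.remove(note)
    return delivered

def inject_notes(request: dict, agent_id: str) -> dict:
    """Piggyback pending notes on the agent's next LLM request."""
    notes = notes_for(agent_id)
    if not notes:
        return request
    block = "\n".join(f"[note from {n['from']}] {n['content']}" for n in notes)
    messages = [{"role": "system", "content": block}] + list(request.get("messages", []))
    return {**request, "messages": messages}

next_request = {"messages": [{"role": "user", "content": "Next task?"}]}
ready = inject_notes(next_request, "coder")
```

The coding agent's next request picks up Blackreach's broadcast without polling; the note addressed specifically to `blackreach` stays queued for that agent.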
Where it's at
The proxy and memory engine are built and working. I've been running them locally for a while and the basic memory injection is solid. The Mesh coordination layer is what I'm actively building now: the shared pool, the relevance scoring, the noteboard.
When there's a real demo worth showing I'll update this with the actual implementation and some recorded sessions. The goal is simple: one port number, no SDK, no cloud dependency, all your agents sharing context without any of them needing to know the others exist.
If this is useful for something you're building, reach out.