Velqua

The transparent memory proxy for local LLMs. Point any Ollama app at :11435 instead of :11434. That is the whole integration. Your AI remembers you now.

645tests passing
85%coverage
localnothing phones home
PyPIpip install velqua
You: What do you know about me?

Without Velqua:   "I don't have any information about you."
With Velqua:      "You're building AI infrastructure. You prefer Python.
                   You run everything on local hardware."

Why a proxy

Every other memory layer wants integration work. Mem0 wants API calls. Zep wants an SDK. OpenMemory wants Docker, Postgres, and Qdrant before it says hello. I did not want to rewrite my tools around somebody's abstractions. I wanted memory that existing apps pass through without knowing it is there.

any ollama app (open webui, continue.dev, a script...) │ zero code changes. you change a port number. ▼ :11435 velqua │ 1. identify the agent ── header? user-agent? port? fingerprint │ 2. retrieve memories ── scored by recency + tags + similarity │ 3. inject into the system prompt │ 4. forward the request, untouched otherwise ▼ :11434 ollama ── your models, unchanged │ └─ response flows back, conversation gets summarized + stored for next time. compressed on purpose. lossy memory that persists beats perfect memory that dies with the session

The hard part is relevance

Injecting everything would just pollute the context. Velqua scores memories by recency, explicit tags, and semantic similarity to the current message. A research agent asking about Linear A sign frequency gets the Linear A history. A coding agent working on a Python module does not, because it is not relevant. That scoring is where most of the engineering lives.

Seed it from your history

Velqua imports your existing chat history. Export from Claude or ChatGPT, drag the JSON in, and it extracts personal facts so the memory is useful from day one instead of starting cold. API keys are encrypted at rest. Binds to 127.0.0.1 by default. Nothing phones home.

The mesh layer

The Mesh extension turns one agent's memory into a shared pool. Blackreach finishes a research task and writes what it found. The next agent to make any LLM request gets those findings injected automatically. It does not know Blackreach exists. It just knows what Blackreach found. Full architecture in the write-up: multi-agent coordination without an SDK →

Velqua is the first link in my memory and persistence line. The long game is agents that actually persist and think past a single LLM call. This is the infrastructure end of that thread.

pip install velqua, run velqua-server, open localhost:8765. The setup wizard does the rest. Open source on GitHub. Built for people who run their own AI.