I build agent infrastructure that ships.
Solo contractor specializing in AI agent systems, neuro-symbolic tooling, browser automation, and LLM infrastructure. 20 public repos, 41 stars on GitHub. Self-taught in Toronto. Available now for remote engagements.
What I build
Autonomous browser agents that survive the real web.
Blackreach
is the production web agent. Playwright + stealth, Cloudflare bypass, JS rendering,
rate-limit awareness, and resumable sessions. A DOM walker compresses 200k-token pages into
2k-token observations. MCP-wrapped as blackreach-mcp and exposed as a tool
backend to my own agent runtime. Tried to add a Rust sibling for adversarial web work but
shelved it. Blackreach + Playwright covers what I need.
AI agent infrastructure from scratch. Nüwa is a private clean-room agent CLI: a personal AI harness with chat, 23 tools, memory, skills, checkpoints, MCP, and cron. ~414k LoC plus ~543k LoC of tests, ~25.5k tests green across 1,450+ test files. Replaces the Hermes fork and proves the thesis: AI-augmented solo dev ships what teams used to. Hermes Agent is the runtime: FTS5 session memory over ~30000 messages, skills, plugins, subagent delegation, and the cron pipeline running this very page.
Neuro-symbolic systems.
Neuro-Symbolic Worker Classification
is the #1 build candidate: a legal auditor for employee-vs-contractor classification.
Symbolic layer applies jurisdiction-specific legal factors (Canada/Ontario + CA/USA);
neural layer planned for NL intake. CLI runner shipped with both interactive and
--answers JSON modes plus a stable canada_ontario_worker_status_v0
schema output (classification, confidence, score, factor audit, missing facts, sources).
15 tests green, $0 MVP, 4-week timeline. No NeSy product exists specifically for
employment-law worker classification. The broader policy-compliance-review lane now
has a peer (PolicyGuard, June 2026), so the claim narrows to employment law and the
still-open first credible public benchmark for employee/contractor classification.
Recent 2026 legal-AI papers add tailwinds: an evidentiary-adequacy criterion for
factor-level audit trails, small local LLMs paired with symbolic verifiers beating
multi-call self-consistency on constrained reasoning tasks, and abstention as a
first-class output.
Personal AI lab.
rynvyr
is the publishing arm of Josii's home AI systems: a dual-track model that pairs a
private rynvyr-wild (unhinged capability research for the home stack)
with a public rynvyr-public (open weights, methodology papers).
Pre-launch as of June 2026. First model (rynvyr-mythos-0, 7B, mythology-trained)
in cycle 0 within the month. Method is documented at
rynvyr.ai;
$300–500 every two weeks on H200s funds 4–6 cycles per month.
Agent evaluation and drift research. Rigr is the regression toolkit that freezes baselines and catches drift before production. DriftBench is an open benchmark for how AI agents degrade over long runs. Four metrics fold into a single Drift Severity score, verified by real shell commands, never the agent's own report. Build the tool and publish the paper.
Native terminal tooling. Mimir is a Linux-native terminal multiplexer built for AI coding agents. Rust + GTK4 + libadwaita, Wayland-first, agentic browser pane backed by Blackreach. Custom VT500 parser, grid, renderer, and input stack built from scratch, all pinned by golden fixtures and VTE-as-oracle differential tests. 437 tests (across 8 files), Phase 8 (rich media + standalone hardening) actively underway.
Memory, scraping, and local-first infra.
Velqua is
a transparent memory proxy. Point any Ollama app at :11435 instead of
:11434 and your AI remembers who you are across sessions, models, and tools.
Zero code changes.
Huginn is
the self-hosted scraping API. Firecrawl alternative with no per-page tax. Seven custom MCP
servers (Blackreach browser, BlackCrawl scraping, email, YouTube, LinkedIn, Goldmine crypto,
Grayson job pipeline) built from scratch as tool backends for the agent runtime.
Local-first voice and audio. claude-voice (⭐23, top public repo) is a local TTS voice mode for Claude Code with word highlighting. Zero cloud APIs, Python + Kokoro. claud-ear is the local Whisper audio-understanding MCP for the other half of the loop.
ML research that shipped. AMP Discovery is antimicrobial-peptide discovery via fine-tuning Meta's ESM-2 protein language model with LoRA. 88.3% F1, leakage-free, screened 1,980 NCBI sequences. RLVR / GRPO Lab is an inspectable post-training harness with verifiable math rewards and strict answer-contract evals. 3B and 7B runs on full GSM8K splits with paired-bootstrap evidence.
Recent work
| Project | Status | Stack |
|---|---|---|
Nüwa: clean-room agent CLI (nuwa-redo v0.1.0, personal project) |
Active build, ~414k LoC, ~25.5k tests, 23 tools, MCP, cron, checkpoints | Python, asyncio, MCP |
| Mimir: native terminal multiplexer for AI agents | Phase 8 active (rich media + standalone hardening), 437 tests (across 8 files), VT500 from scratch | Rust, GTK4, libadwaita, Wayland |
| Neuro-Symbolic Worker Classification: legal auditor | CLI shipped (interactive + --answers JSON), stable v0 schema, 15 tests green |
Python, rule engine + planned embeddings |
| rynvyr: personal AI lab, dual-track (public + private) | Pre-launch, rynvyr-mythos-0 cycle 0 within the month | PyTorch + H200 runpod, open weights |
| Blackreach: autonomous browser agent | Production on phnixbox (port 7432), MCP-wrapped, stealth Playwright | Python, Playwright, TypeScript wrapper |
| Hermes Agent: runtime / OS for AI agents | Production daily driver, ~30000 FTS5-indexed messages | Python, SQLite FTS5, MCP, cron |
| Rigr: agent regression / evaluation toolkit | Shipped, public, ⭐2 | Python |
| DriftBench: agent drift benchmark | Harness complete, paper in progress | Python, Docker, shell-verification |
| Cold pitch campaign | 14-pitch HN campaign Jun 19; Grayson pipeline at 144 leads, follow-ups ongoing | HN-targeted, solo-contractor framing |
How I work
Solo shipper, AI-augmented. I build like a team of three because I've built the tools that make that true. Claude Code and Codex for the main work, Gemini as a second opinion, my own MCP servers for tool backends, a local GPU (RTX 4060), and a workflow refined across 20 shipped public repos. Multi-agent coordination is real: long-running sessions for the daily-driver loop, parallel builds for independent components, separate sessions for review and coding. I don't need a PM to scope or a QA to catch regressions; my eval harnesses do both.
Local-first by default. Everything runs on my own hardware in Toronto. Cloud is for training runs that earn it. No vendor lock-in, no per-token anxiety, no "why is the API down" postmortems. Tools I own end to end. The homelab (phnixbox: AMD 5700X + RTX 4060 + 31GB RAM, CachyOS + Hyprland) runs Docker services for Open WebUI, Vaultwarden, Pi-hole, FlareSolverr, the Sonarr → Jellyfin media stack, Ollama, and the Goldmine paper-trade crypto agent, all behind NFTables killswitch + Tailscale mesh.
Engagement model. Fixed-bid by scope, typically 4–8 weeks. I scope alongside you, deliver milestone by milestone, and hand off with docs, tests, and an eval harness. The deliverable isn't code. It's a system you can verify keeps working after I'm gone. 16 frontier AI research specs in my back pocket (continual learning, test-time compute, neuro-symbolic reasoning, mamba SSMs, mechanistic interpretability, agent drift), so I can read a paper and ship against it.
What I look for. Projects where agent infrastructure, neuro-symbolic systems, browser automation, LLM ops, or evaluation tooling is the bottleneck. If you're coordinating agents, building classification systems with teeth, surviving the adversarial web, running inference without a cloud dependency, or measuring whether your agent still behaves after 100 steps, that's the area. I'm not your CRUD app contractor.
Rates
Pricing is anchored to the current HN contractor market (June 2026). The closest comparable profile is solo full-stack, AI-augmented, Python/FastAPI/Postgres/Docker, posted at $85/hr (davidolverson, June 12). Senior AI engineers with FAANG/YC pedigree post at $100–$130/hr (davedx $110/hr; ReadiFinancial CAD $130–180/hr ≈ USD $95–$130). Aggregate median of explicit AI/Python rates: ~$97.5/hr. I price by engagement, not by hour. The number below is the effective hourly at the listed scope.
| Engagement | Duration | Effective rate | Market position |
|---|---|---|---|
| $25K | 4–8 weeks | $78–$156/hr | Senior mid-market (25th–50th percentile). Matches davidolverson ($85/hr). Conservative for a scoped AI workflow build. |
| $35K | 6–8 weeks | $109–$146/hr | Modal close. Matches davedx ($110/hr) senior AI engineer. Right price for multi-component build with integration, evals, and handoff. |
| $50K | 8+ weeks | $156/hr | Aspirational but defensible (top decile). Full system with eval harness, docs, handoff. Reachable with the private agent-CLI work, Blackreach, DriftBench, and the MCP suite behind it. |
What each tier includes: scoping doc, milestone deliverables, eval harness, documentation, and a handoff session. Fixed-bid means you know the total before we start. I don't bill for Slack time or "alignment meetings."
Smaller starts. For teams that want to validate fit before committing to a full engagement: a $5K–$10K diagnostic or 1–2 week contained build (scoped agent workflow, eval harness for an existing system, or a targeted research spike). Same rigor, smaller surface area.
Contact
Available now for remote contract engagements. Toronto-based, open to US timezone alignment. USD/CAD.
contact@phnix.dev · résumé ↓ · github ↗
If you're working on agent infrastructure, neuro-symbolic systems, browser automation, LLM evaluation, or local-first AI tooling, say hi. Cold emails welcome. I respond within 24 hours.