Hire me | phnix.dev

What I build

Autonomous browser agents that survive the real web. Blackreach is the production web agent. Playwright + stealth, Cloudflare bypass, JS rendering, rate-limit awareness, and resumable sessions. A DOM walker compresses 200k-token pages into 2k-token observations. MCP-wrapped as blackreach-mcp and exposed as a tool backend to my own agent runtime. Tried to add a Rust sibling for adversarial web work but shelved it. Blackreach + Playwright covers what I need.

AI agent infrastructure from scratch. Nüwa is a private clean-room agent CLI: a personal AI harness with chat, 23 tools, memory, skills, checkpoints, MCP, and cron. ~414k LoC plus ~543k LoC of tests, ~25.5k tests green across 1,450+ test files. Replaces the Hermes fork and proves the thesis: AI-augmented solo dev ships what teams used to. Hermes Agent is the runtime: FTS5 session memory over ~30000 messages, skills, plugins, subagent delegation, and the cron pipeline running this very page.

Neuro-symbolic systems. Neuro-Symbolic Worker Classification is the #1 build candidate: a legal auditor for employee-vs-contractor classification. Symbolic layer applies jurisdiction-specific legal factors (Canada/Ontario + CA/USA); neural layer planned for NL intake. CLI runner shipped with both interactive and --answers JSON modes plus a stable canada_ontario_worker_status_v0 schema output (classification, confidence, score, factor audit, missing facts, sources). 15 tests green, $0 MVP, 4-week timeline. No NeSy product exists specifically for employment-law worker classification. The broader policy-compliance-review lane now has a peer (PolicyGuard, June 2026), so the claim narrows to employment law and the still-open first credible public benchmark for employee/contractor classification. Recent 2026 legal-AI papers add tailwinds: an evidentiary-adequacy criterion for factor-level audit trails, small local LLMs paired with symbolic verifiers beating multi-call self-consistency on constrained reasoning tasks, and abstention as a first-class output.

Personal AI lab. rynvyr is the publishing arm of Josii's home AI systems: a dual-track model that pairs a private rynvyr-wild (unhinged capability research for the home stack) with a public rynvyr-public (open weights, methodology papers). Pre-launch as of June 2026. First model (rynvyr-mythos-0, 7B, mythology-trained) in cycle 0 within the month. Method is documented at rynvyr.ai; $300–500 every two weeks on H200s funds 4–6 cycles per month.

Agent evaluation and drift research. Rigr is the regression toolkit that freezes baselines and catches drift before production. DriftBench is an open benchmark for how AI agents degrade over long runs. Four metrics fold into a single Drift Severity score, verified by real shell commands, never the agent's own report. Build the tool and publish the paper.

Native terminal tooling. Mimir is a Linux-native terminal multiplexer built for AI coding agents. Rust + GTK4 + libadwaita, Wayland-first, agentic browser pane backed by Blackreach. Custom VT500 parser, grid, renderer, and input stack built from scratch, all pinned by golden fixtures and VTE-as-oracle differential tests. 437 tests (across 8 files), Phase 8 (rich media + standalone hardening) actively underway.

Memory, scraping, and local-first infra. Velqua is a transparent memory proxy. Point any Ollama app at :11435 instead of :11434 and your AI remembers who you are across sessions, models, and tools. Zero code changes. Huginn is the self-hosted scraping API. Firecrawl alternative with no per-page tax. Seven custom MCP servers (Blackreach browser, BlackCrawl scraping, email, YouTube, LinkedIn, Goldmine crypto, Grayson job pipeline) built from scratch as tool backends for the agent runtime.

Local-first voice and audio. claude-voice (⭐23, top public repo) is a local TTS voice mode for Claude Code with word highlighting. Zero cloud APIs, Python + Kokoro. claud-ear is the local Whisper audio-understanding MCP for the other half of the loop.

ML research that shipped. AMP Discovery is antimicrobial-peptide discovery via fine-tuning Meta's ESM-2 protein language model with LoRA. 88.3% F1, leakage-free, screened 1,980 NCBI sequences. RLVR / GRPO Lab is an inspectable post-training harness with verifiable math rewards and strict answer-contract evals. 3B and 7B runs on full GSM8K splits with paired-bootstrap evidence.

Recent work

Project	Status	Stack
Nüwa: clean-room agent CLI (`nuwa-redo` v0.1.0, personal project)	Active build, ~414k LoC, ~25.5k tests, 23 tools, MCP, cron, checkpoints	Python, asyncio, MCP
Mimir: native terminal multiplexer for AI agents	Phase 8 active (rich media + standalone hardening), 437 tests (across 8 files), VT500 from scratch	Rust, GTK4, libadwaita, Wayland
Neuro-Symbolic Worker Classification: legal auditor	CLI shipped (interactive + `--answers JSON`), stable v0 schema, 15 tests green	Python, rule engine + planned embeddings
rynvyr: personal AI lab, dual-track (public + private)	Pre-launch, rynvyr-mythos-0 cycle 0 within the month	PyTorch + H200 runpod, open weights
Blackreach: autonomous browser agent	Production on phnixbox (port 7432), MCP-wrapped, stealth Playwright	Python, Playwright, TypeScript wrapper
Hermes Agent: runtime / OS for AI agents	Production daily driver, ~30000 FTS5-indexed messages	Python, SQLite FTS5, MCP, cron
Rigr: agent regression / evaluation toolkit	Shipped, public, ⭐2	Python
DriftBench: agent drift benchmark	Harness complete, paper in progress	Python, Docker, shell-verification
Cold pitch campaign	14-pitch HN campaign Jun 19; Grayson pipeline at 144 leads, follow-ups ongoing	HN-targeted, solo-contractor framing

How I work

Solo shipper, AI-augmented. I build like a team of three because I've built the tools that make that true. Claude Code and Codex for the main work, Gemini as a second opinion, my own MCP servers for tool backends, a local GPU (RTX 4060), and a workflow refined across 20 shipped public repos. Multi-agent coordination is real: long-running sessions for the daily-driver loop, parallel builds for independent components, separate sessions for review and coding. I don't need a PM to scope or a QA to catch regressions; my eval harnesses do both.

Local-first by default. Everything runs on my own hardware in Toronto. Cloud is for training runs that earn it. No vendor lock-in, no per-token anxiety, no "why is the API down" postmortems. Tools I own end to end. The homelab (phnixbox: AMD 5700X + RTX 4060 + 31GB RAM, CachyOS + Hyprland) runs Docker services for Open WebUI, Vaultwarden, Pi-hole, FlareSolverr, the Sonarr → Jellyfin media stack, Ollama, and the Goldmine paper-trade crypto agent, all behind NFTables killswitch + Tailscale mesh.

Engagement model. Fixed-bid by scope, typically 4–8 weeks. I scope alongside you, deliver milestone by milestone, and hand off with docs, tests, and an eval harness. The deliverable isn't code. It's a system you can verify keeps working after I'm gone. 16 frontier AI research specs in my back pocket (continual learning, test-time compute, neuro-symbolic reasoning, mamba SSMs, mechanistic interpretability, agent drift), so I can read a paper and ship against it.

What I look for. Projects where agent infrastructure, neuro-symbolic systems, browser automation, LLM ops, or evaluation tooling is the bottleneck. If you're coordinating agents, building classification systems with teeth, surviving the adversarial web, running inference without a cloud dependency, or measuring whether your agent still behaves after 100 steps, that's the area. I'm not your CRUD app contractor.

Rates

Pricing is anchored to the current HN contractor market (June 2026). The closest comparable profile is solo full-stack, AI-augmented, Python/FastAPI/Postgres/Docker, posted at $85/hr (davidolverson, June 12). Senior AI engineers with FAANG/YC pedigree post at $100–$130/hr (davedx $110/hr; ReadiFinancial CAD $130–180/hr ≈ USD $95–$130). Aggregate median of explicit AI/Python rates: ~$97.5/hr. I price by engagement, not by hour. The number below is the effective hourly at the listed scope.

Engagement	Duration	Effective rate	Market position
$25K	4–8 weeks	$78–$156/hr	Senior mid-market (25th–50th percentile). Matches davidolverson ($85/hr). Conservative for a scoped AI workflow build.
$35K	6–8 weeks	$109–$146/hr	Modal close. Matches davedx ($110/hr) senior AI engineer. Right price for multi-component build with integration, evals, and handoff.
$50K	8+ weeks	$156/hr	Aspirational but defensible (top decile). Full system with eval harness, docs, handoff. Reachable with the private agent-CLI work, Blackreach, DriftBench, and the MCP suite behind it.

What each tier includes: scoping doc, milestone deliverables, eval harness, documentation, and a handoff session. Fixed-bid means you know the total before we start. I don't bill for Slack time or "alignment meetings."

Smaller starts. For teams that want to validate fit before committing to a full engagement: a $5K–$10K diagnostic or 1–2 week contained build (scoped agent workflow, eval harness for an existing system, or a targeted research spike). Same rigor, smaller surface area.

Contact

Available now for remote contract engagements. Toronto-based, open to US timezone alignment. USD/CAD.

contact@phnix.dev · résumé ↓ · github ↗

If you're working on agent infrastructure, neuro-symbolic systems, browser automation, LLM evaluation, or local-first AI tooling, say hi. Cold emails welcome. I respond within 24 hours.