Active · v5.0.0-beta.1

Blackreach

Autonomous web research agent. Runs overnight. Handles Cloudflare, dynamic JS, login walls, and rate limits. Built because every other option failed the moment it touched anything real.

2,904 test assertions · v5.0 current version · 27 stars · MIT license

The problem

Every autonomous web agent I tried had the same issue. It worked on the demo site and fell apart on anything real. Cloudflare caught it immediately. JavaScript that rendered table content after a two-second delay was invisible to it. Rate-limit responses came back as 200 OK with an error page body, and the agent reported success, saved garbage, and moved on.

I needed something that could run actual research tasks overnight. Full academic database downloads, multi-step navigation, sites that actively resist automation. Nothing I found was built for that. So I built Blackreach.

How it works

At the core is a ReAct loop. The agent receives a task, thinks through what action to take, executes it, observes the result, and repeats. The hard part is the observation space. Most agents dump raw HTML into the context. A typical page is 50k to 500k tokens of noise. Blackreach uses a DOM walker instead.

Thought: I need the inscription table on this page
Action: navigate("https://sigla.phis.me/")
Observation: Page loaded. Nav: [About, Database, Signs].
  Main: table, 847 rows, columns [ID, Site, Text, Image].
  Interactive: pagination controls, export button.
Thought: extract all rows and handle pagination
Action: extract_table(selector=".inscription-table", paginate=True)
...

The DOM walker extracts semantic structure instead of raw markup: visible text, interactive elements, navigation landmarks, ARIA roles. A 200k-token page becomes a 2k-token observation the model can actually reason about.
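To make the idea concrete, here is a minimal sketch of a semantic walker. It is not Blackreach's actual implementation: it assumes static HTML and uses only Python's standard-library `html.parser`, whereas a real walker would most likely run against the live DOM after JavaScript has rendered. The names `SemanticWalker` and `observe` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed tag sets for illustration; a real walker would cover more cases.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
LANDMARKS = {"nav", "main", "header", "footer", "aside"}
SKIPPED = {"script", "style", "noscript", "template"}


class SemanticWalker(HTMLParser):
    """Collects a compact summary of a page instead of its raw markup."""

    def __init__(self):
        super().__init__()
        self.landmarks = []     # landmark tags or ARIA roles, in document order
        self.interactive = []   # (tag, label) pairs the agent could act on
        self.table_rows = 0     # number of <tr> elements seen
        self.text = []          # visible text fragments
        self._skip_depth = 0    # > 0 while inside <script>, <style>, ...

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SKIPPED:
            self._skip_depth += 1
            return
        role = attrs.get("role")
        if tag in LANDMARKS or role:
            self.landmarks.append(role or tag)
        if tag in INTERACTIVE:
            label = attrs.get("aria-label") or attrs.get("name") or attrs.get("href") or ""
            self.interactive.append((tag, label))
        if tag == "tr":
            self.table_rows += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def observe(html, max_text_chars=500):
    """Return a small, structured observation for the model."""
    walker = SemanticWalker()
    walker.feed(html)
    walker.close()
    return {
        "landmarks": walker.landmarks,
        "interactive": walker.interactive,
        "table_rows": walker.table_rows,
        "text": " ".join(walker.text)[:max_text_chars],
    }
```

The observation keeps counts and labels rather than content, which is where the size reduction comes from: a table with hundreds of rows costs the model one number until it decides to extract it.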

Stealth Playwright

Standard Playwright gets caught immediately. The tells are well documented: navigator.webdriver = true, missing browser extensions, CDP artifacts, headless viewport signatures, unnatural input timing.

Blackreach patches these before any page loads. Human-like timing on mouse moves and keypresses. Viewport sizes pulled from real device distributions. User-agent rotation with matching headers. JS injection to clear the automation fingerprint.
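The sketch below shows roughly what this kind of patching can look like with Playwright's Python sync API. It is an illustration under assumptions, not Blackreach's code: the viewport pool, the delay range, and the helper names are all hypothetical, and it covers only the `navigator.webdriver` flag, viewport choice, and typing cadence. User-agent rotation with matching headers is left out.

```python
import random

# Hypothetical viewport pool. The source says sizes come from real device
# distributions; these four common desktop sizes are placeholders.
VIEWPORTS = [(1920, 1080), (1536, 864), (1440, 900), (1366, 768)]

# Injected into every new document before the page's own scripts run.
HIDE_WEBDRIVER = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


def pick_context_options(rng=random):
    """Choose a viewport for a new browser context."""
    width, height = rng.choice(VIEWPORTS)
    return {"viewport": {"width": width, "height": height}}


def keystroke_delay_ms(rng=random, low=60, high=180):
    """Random pause between keypresses, in milliseconds (assumed range)."""
    return rng.uniform(low, high)


def open_stealth_page(playwright, url):
    """Launch Chromium, apply the patches, then load the page."""
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context(**pick_context_options())
    context.add_init_script(HIDE_WEBDRIVER)  # must be set before navigation
    page = context.new_page()
    page.goto(url)
    return browser, page


def human_type(page, selector, text):
    """Type one character at a time with irregular pauses."""
    page.click(selector)
    for ch in text:
        page.keyboard.type(ch)
        page.wait_for_timeout(keystroke_delay_ms())


# Usage (requires `pip install playwright` and `playwright install chromium`):
#
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       browser, page = open_stealth_page(p, "https://example.com")
#       browser.close()
```

The ordering matters: the init script is registered on the context before the first page is created, so the patch is in place before any detection script on the site can read the flag.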

Not undetectable. Nothing is. But it passes Cloudflare's basic bot detection and gets past most IP-based rate limiters on the sites I needed. That's good enough.

Architecture

Task input
    |
    v
[ ReAct Loop                              ]
[   Think (LLM) <---> DOM Walker         ]
[         |                              ]
[         v                              ]
[   Act ------> Stealth Playwright       ]
    |
    v
Output + task log
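The diagram can be sketched as a small loop. This is a minimal illustration, not the project's engine: `think` stands in for the LLM call, `tools` stands in for the Stealth Playwright actions, and every name here (`Step`, `TaskLog`, `run_task`, the `"finish"` action) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str


@dataclass
class TaskLog:
    steps: List[Step] = field(default_factory=list)
    output: str = ""


# A policy maps (task, history) to (thought, action, argument).
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]


def run_task(task: str, think: Policy, tools: Dict[str, Callable[[str], str]],
             max_steps: int = 10) -> TaskLog:
    """Think, act, observe, repeat until the policy finishes or steps run out."""
    log = TaskLog()
    for _ in range(max_steps):
        thought, action, argument = think(task, log.steps)
        if action == "finish":
            log.output = argument
            return log
        if action in tools:
            observation = tools[action](argument)
        else:
            # Feed the mistake back as an observation so the policy can recover.
            observation = f"error: unknown action {action!r}"
        log.steps.append(Step(thought, action, argument, observation))
    # Running out of steps is a failure, not a quiet empty result.
    raise RuntimeError(f"no result after {max_steps} steps")


# Usage with a scripted policy in place of a model:
def scripted(task, steps):
    if not steps:
        return ("I need the page first", "navigate", "https://example.com")
    return ("I have what I need", "finish", steps[-1].observation)


log = run_task("fetch the page", scripted,
               {"navigate": lambda url: f"Page loaded: {url}"})
```

Keeping every step in the log is what makes an overnight run auditable afterwards: the output and the trace that produced it come back together.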

What the test suite is for

Autonomous agents fail silently. The agent says it succeeded and the file is there. You don't find out until hours later, when you open it and it's a 403 error page saved as HTML.

Every one of the 2,904 test assertions came from a real failure. Rate limits returning 200 OK. JavaScript rendering table content after a two-second delay. Session tokens expiring mid-task. CAPTCHAs showing up on page 3 but not pages 1 or 2. Login walls that only trigger from non-residential IPs.

When it runs at 3am collecting data, I need to know it fails loud. Not silent. That's what the test suite is for.
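One way to picture failing loud is a check that refuses to trust the status code alone. The sketch below is an assumption about how such a guard could look, not the project's own code: `SoftFailure`, `check_response`, and the marker list are hypothetical, and plain substring matching like this would need tuning to avoid flagging legitimate pages that merely mention these phrases.

```python
import re


class SoftFailure(Exception):
    """Raised when a response looks like an error page, whatever its status."""


# Assumed phrases that commonly appear on block and rate-limit pages.
ERROR_MARKERS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"too many requests",
        r"rate limit",
        r"access denied",
        r"403 forbidden",
        r"captcha",
    )
]


def check_response(status: int, body: str) -> str:
    """Return the body if it looks genuine, otherwise raise SoftFailure."""
    if status >= 400:
        raise SoftFailure(f"HTTP {status}")
    for pattern in ERROR_MARKERS:
        if pattern.search(body):
            # A 200 OK carrying an error page is still a failure.
            raise SoftFailure(
                f"status {status} but body matches {pattern.pattern!r}"
            )
    return body
```

Calling this before anything is written to disk means a rate-limited run stops with an exception instead of saving an error page under a data filename.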

Key capabilities

DOM Semantic Walker
Reduces 200k-token pages to 2k-token structured observations. The model sees what matters.
Stealth Playwright
Patches automation fingerprints before page load. Passes basic bot detection on most sites.
ReAct Loop Engine
Think, act, observe, repeat. Handles multi-step tasks with branching logic and error recovery.
Loud Failure Mode
2,904 test assertions covering real failure modes. Silent success is not acceptable.

Open source at github.com/Null-Phnix/Blackreach. Production-grade for research tasks. Issues and PRs welcome.