Every autonomous web agent I tried had the same problem. It worked on the demo site and fell apart on anything real. Cloudflare caught it in seconds. JavaScript rendered after the initial load was invisible to it. Rate-limit responses came back as 200 OK with an error page in the body, and the agent reported success, saved garbage, and moved on.
I needed something that could actually run research tasks overnight. Full academic database downloads, multi-step navigation, sites that actively resist automation. Nothing I found was built for that. So I built Blackreach.
The ReAct loop
At the core is a ReAct loop. Reasoning and acting, alternating. The agent gets a task, thinks through what action to take, executes it, observes the result, repeats. Simple in theory. The hard part is the observation space.
Thought: I need the inscription table on this page
Action: navigate("https://sigla.phis.me/")
Observation: Page loaded. Nav: [About, Database, Signs].
             Main: table, 847 rows, columns [ID, Site, Text, Image].
             Interactive: pagination controls, export button.
Thought: extract all rows and handle pagination
Action: extract_table(selector=".inscription-table", paginate=True)
...
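To make the think/act/observe cycle concrete, here is a minimal sketch of such a loop. It is not Blackreach's actual code: `Step`, `react_loop`, `scripted_think`, and `fake_tools` are hypothetical names, and the "LLM" is a scripted stand-in so the example runs without a model or a browser.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    thought: str                      # the model's reasoning for this turn
    action: str                       # a tool name, or "finish" to stop
    args: dict = field(default_factory=dict)


def react_loop(task: str,
               think: Callable[[str, list], Step],
               tools: dict[str, Callable[..., Any]],
               max_steps: int = 10):
    """Alternate think -> act -> observe until the policy finishes."""
    history = []                      # (Step, observation) pairs fed back to think()
    for _ in range(max_steps):
        step = think(task, history)
        if step.action == "finish":
            return step.args.get("result"), history
        tool = tools.get(step.action)
        if tool is None:
            observation = f"error: unknown tool {step.action!r}"
        else:
            try:
                observation = tool(**step.args)
            except Exception as exc:  # surface tool failures as observations
                observation = f"error: {exc}"
        history.append((step, observation))
    # Running out of steps is an error, not a quiet success.
    raise RuntimeError(f"no result after {max_steps} steps")


# Scripted stand-in for the LLM (assumption: a real policy would call a model).
def scripted_think(task: str, history: list) -> Step:
    if not history:
        return Step("I need the inscription table", "navigate",
                    {"url": "https://sigla.phis.me/"})
    if len(history) == 1:
        return Step("extract all rows", "extract_table",
                    {"selector": ".inscription-table", "paginate": True})
    return Step("done", "finish", {"result": history[-1][1]})


# Stub tools that return strings instead of driving a browser.
fake_tools = {
    "navigate": lambda url: f"Page loaded: {url}",
    "extract_table": lambda selector, paginate=False: f"rows from {selector}",
}

result, history = react_loop("collect inscriptions", scripted_think, fake_tools)
```

Tool errors come back as observations so the model can react to them, while exhausting the step budget raises instead of returning a partial result.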
Most agents dump raw HTML into the context. A typical page is 50k to 500k tokens of noise. The model gets lost. Blackreach uses a DOM walker that extracts semantic structure instead. Visible text, interactive elements, navigation landmarks, ARIA roles. A 200k token page becomes a 2k token observation the LLM can actually reason about.
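The idea can be sketched with nothing but the standard library. This is an illustration of the technique, not the Blackreach DOM walker: `SemanticWalker` and `observe` are hypothetical names, and it parses static HTML rather than a live, rendered DOM.

```python
from html.parser import HTMLParser

SKIP = {"script", "style", "noscript", "template"}        # never visible text
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
LANDMARKS = {"nav", "main", "header", "footer", "aside"}


class SemanticWalker(HTMLParser):
    """Collect visible text, interactive elements, and landmarks/roles."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0       # > 0 while inside a SKIP element
        self.text = []
        self.interactive = []
        self.landmarks = []       # landmark tags plus any explicit ARIA role

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in SKIP:
            self.skip_depth += 1
            return
        role = attr.get("role")
        if tag in LANDMARKS or role:
            self.landmarks.append(role or tag)
        if tag in INTERACTIVE or role == "button":
            label = attr.get("aria-label") or attr.get("name") or attr.get("href") or ""
            self.interactive.append(f"{tag}:{label}" if label else tag)

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            chunk = " ".join(data.split())   # collapse whitespace
            if chunk:
                self.text.append(chunk)


def observe(html: str, max_text_chars: int = 400) -> dict:
    """Reduce a page to a compact observation for the model."""
    walker = SemanticWalker()
    walker.feed(html)
    walker.close()
    return {
        "landmarks": walker.landmarks,
        "interactive": walker.interactive,
        "text": " ".join(walker.text)[:max_text_chars],
    }


page = (
    "<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
    '<body><nav><a href="/about">About</a></nav>'
    '<main><p>Hello   world</p><button aria-label="Export">X</button></main>'
    "</body></html>"
)
obs = observe(page)
```

Scripts and styles are dropped, whitespace is collapsed, and the text is capped, so the observation size is bounded no matter how large the page is.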
Stealth Playwright
Standard Playwright gets caught immediately. The tells are well documented:
navigator.webdriver = true, missing browser extensions, CDP artifacts,
headless viewport signatures, unnatural input timing.
Blackreach patches these before any page loads. Human-like timing on mouse moves and keypresses. Viewport sizes pulled from real device distributions. User-agent rotation with matching headers. JS injection to clear the automation fingerprint.
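A rough sketch of two of those pieces, timing and viewport selection, plus the simplest fingerprint patch. None of this is taken from the Blackreach source: the viewport weights are made-up illustration values rather than measured device data, and `pick_viewport`, `keypress_delays`, and `WEBDRIVER_PATCH` are hypothetical names. The code is plain Python so it runs without a browser.

```python
import random

# Hypothetical weights for illustration only; not a real device distribution.
VIEWPORTS = [
    ((1920, 1080), 0.4),
    ((1366, 768), 0.3),
    ((1536, 864), 0.2),
    ((1440, 900), 0.1),
]

# JavaScript meant to run before any page script. With Playwright this
# string could be registered through the context's add_init_script().
WEBDRIVER_PATCH = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


def pick_viewport(rng: random.Random) -> dict:
    """Sample a viewport size according to the weights above."""
    sizes, weights = zip(*VIEWPORTS)
    width, height = rng.choices(sizes, weights=weights, k=1)[0]
    return {"width": width, "height": height}


def keypress_delays(text: str, rng: random.Random,
                    mean_ms: float = 120.0, jitter_ms: float = 40.0,
                    floor_ms: float = 30.0) -> list:
    """One delay per character: gaussian around mean_ms, never below floor_ms."""
    return [max(floor_ms, rng.gauss(mean_ms, jitter_ms)) for _ in text]


rng = random.Random(0)            # seeded here only to make the sketch repeatable
viewport = pick_viewport(rng)
delays = keypress_delays("hello", rng)
```

In a Playwright session the viewport dict would go to `browser.new_context(viewport=...)`, and each delay could be applied between `page.keyboard.press()` calls with `page.wait_for_timeout()`.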
Why 2,904 tests
Autonomous agents fail silently. The agent says it succeeded, the file is there, you don't find out until hours later when you open it and it's a 403 error page saved as HTML.
Every test came from a real failure. Rate limits returning 200 OK. JavaScript rendering table content after a 2 second delay. Session tokens expiring mid-task. CAPTCHAs showing up on page 3 but not pages 1 or 2. Login walls that only trigger from non-residential IPs.
2,904 tests means 2,904 things the world tried and I caught. When it runs at 3am collecting data I need to know it fails loud, not silent. That's what the test suite is for.
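A check of that kind might look like the sketch below. It is an assumption about shape, not the project's real validator: `SoftFailure`, `validate_download`, and the marker list are hypothetical, and a real marker list would have to be tuned per site to avoid false positives.

```python
class SoftFailure(Exception):
    """A response that looks like success but carries an error page."""


# Hypothetical markers; tune per target site.
ERROR_MARKERS = (
    "rate limit",
    "too many requests",
    "access denied",
    "403 forbidden",
    "captcha",
)


def validate_download(status: int, content_type: str, body: bytes,
                      expect_type: str, min_bytes: int = 64) -> bytes:
    """Return body unchanged, or raise SoftFailure if it looks wrong."""
    if status != 200:
        raise SoftFailure(f"HTTP {status}")
    if not content_type.lower().startswith(expect_type):
        # e.g. asked for a PDF, received an HTML error page
        raise SoftFailure(f"expected {expect_type}, got {content_type}")
    if len(body) < min_bytes:
        raise SoftFailure(f"body too small: {len(body)} bytes")
    head = body[:4096].decode("utf-8", errors="ignore").lower()
    for marker in ERROR_MARKERS:
        if marker in head:
            raise SoftFailure(f"error marker {marker!r} in body")
    return body


# The 200-OK-with-error-page case: the status is fine, the body is not.
error_page = b"<html><body><h1>403 Forbidden</h1>" + b" " * 100 + b"</body></html>"
try:
    validate_download(200, "text/html; charset=utf-8", error_page, "text/html")
    caught = None
except SoftFailure as exc:
    caught = str(exc)
```

Raising instead of returning a flag means a caller that forgets to check still cannot save the error page as data.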
Architecture
Task input
        |
        v
[            ReAct Loop            ]
[  Think (LLM) <---> DOM Walker    ]
[       |                          ]
[       v                          ]
[  Act ------> Stealth Playwright  ]
        |
        v
Output + task log
Blackreach is open source at github.com/Null-Phnix/Blackreach. The current release is v5.0.0-beta.1. Production-grade for research tasks. Issues and PRs welcome.