Huginn · self-hosted web collection API

Why I built it

I needed a predictable collection service that I could run locally, inspect, and call from other tools without paying for every page. Huginn separates deterministic collection work from open-ended browser-agent reasoning.

What the project demonstrates

explicit jobs

Longer work has visible state and progress instead of disappearing behind a request timeout.

structured results

Pages can return useful text, metadata, links, and extraction results through a consistent interface.

recovery

Retries, caching, and durable state are designed around real interruption and failure.

local operation

The service can be deployed and inspected without depending on a metered hosted product.

How I verify it

Tests cover request contracts, job lifecycle, extraction behavior, persistence, error reporting, and the difference between a successful network request and a useful result. Live checks are kept separate from deterministic test fixtures.

The public boundary

The repository documents the public API and implementation. This page does not publish private deployment topology, browsing environments, network routes, source-specific workarounds, or operational weaknesses.

Huginn is the deterministic half of my browser work: collect the page, preserve the state, and return evidence another system can verify.

Open source as Huginn on GitHub.