Bifrost

A research project exploring whether small, purpose-trained models can match large general models on structured prediction tasks. The test domain is historical texts, analyzed to understand how political systems change over time.

234K+ corpus files

100+ curated gold examples

civilization regions

PyTorch research stack

What it is

Bifrost is a research project asking a specific question: can a small, purpose-trained model match a large general model on structured prediction? The domain is analyzing historical texts to study how political systems transition between states. What makes a regime stable? What happens after a coup? How do institutions respond to crisis?

The core bet is that a model trained specifically for this task, on carefully curated data, can compete with models orders of magnitude larger. The data comes from primary historical sources across multiple civilization regions.

Why this matters

Most AI research assumes bigger is better. More parameters, more data, more compute. Bifrost explores the other direction. What if the quality of the data and the specificity of the training objective matter more than scale?

If a small model can match a large one on a well-defined structured prediction task, that changes the economics of AI deployment. Not every problem needs a frontier model. Some problems need the right model.

How it works

Curated data

Gold-labeled examples extracted from primary historical texts. Every label backed by exact source quotes. No synthetic data. No heuristics.
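One way to enforce "every label backed by exact source quotes" is a mechanical check that each quote appears verbatim in the source text. The sketch below is illustrative only: `GoldLabel`, `unsupported_labels`, the dimension names, and the sample passage are all hypothetical and are not taken from the Bifrost codebase or corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldLabel:
    dimension: str  # hypothetical dimension name
    value: str      # hypothetical label value
    quote: str      # supporting quote, expected to appear verbatim in the source


def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not break exact matching."""
    return " ".join(text.split())


def unsupported_labels(source_text: str, labels: list[GoldLabel]) -> list[GoldLabel]:
    """Return labels whose quote is empty or absent from the source text."""
    haystack = normalize(source_text)
    return [
        label for label in labels
        if not normalize(label.quote) or normalize(label.quote) not in haystack
    ]


# Invented passage and labels, for illustration only.
source = """The council deposed the king in the third year,
and the governors refused to send tribute."""

labels = [
    GoldLabel("executive_power", "weakened", "The council deposed the king"),
    GoldLabel("fiscal_control", "weakened", "governors refused to send tribute"),
    GoldLabel("military_loyalty", "stable", "the army remained loyal"),
]

rejected = unsupported_labels(source, labels)
print([label.dimension for label in rejected])  # ['military_loyalty']
```

A check like this only rejects labels with no textual support. Whether a quote actually justifies its label still needs human review.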

Structured prediction

The model predicts how multiple political dimensions change simultaneously. Not classification. Not generation. Structured state transitions.
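To make the distinction from plain classification concrete, a structured state transition can be pictured as one change label per political dimension, scored both per dimension and jointly. This is a minimal sketch under assumed names: the dimensions, the change labels, and the `score` function are hypothetical, and the project's actual schema and metrics may differ.

```python
# Hypothetical dimensions and change labels, chosen only for illustration.
DIMENSIONS = ("executive_power", "fiscal_control", "military_loyalty")
CHANGES = ("decrease", "stable", "increase")

Transition = dict[str, str]  # dimension name -> change label


def validate(transition: Transition) -> None:
    """Raise if a transition is missing a dimension or uses an unknown label."""
    if set(transition) != set(DIMENSIONS):
        raise ValueError(f"expected dimensions {DIMENSIONS}, got {sorted(transition)}")
    for dim, change in transition.items():
        if change not in CHANGES:
            raise ValueError(f"unknown change {change!r} for {dim}")


def score(predicted: list[Transition], gold: list[Transition]) -> dict[str, float]:
    """Per-dimension accuracy plus joint exact match over all dimensions."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    for transition in predicted + gold:
        validate(transition)
    n = len(gold)
    result = {
        dim: sum(p[dim] == g[dim] for p, g in zip(predicted, gold)) / n
        for dim in DIMENSIONS
    }
    # Exact match counts an example only when every dimension is right.
    result["exact_match"] = sum(p == g for p, g in zip(predicted, gold)) / n
    return result


gold = [
    {"executive_power": "decrease", "fiscal_control": "decrease", "military_loyalty": "stable"},
    {"executive_power": "increase", "fiscal_control": "stable", "military_loyalty": "increase"},
]
predicted = [
    dict(gold[0]),                            # fully correct
    {**gold[1], "military_loyalty": "stable"},  # one dimension wrong
]

metrics = score(predicted, gold)
print(metrics)
```

Here two dimensions score 1.0 and one scores 0.5, while exact match is 0.5. The joint metric is the stricter one, because a single wrong dimension fails the whole transition.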

Rigorous evaluation

Frozen baselines. Leakage prevention. Audit packs. The same evaluation protocol that powers Rigr was born here.
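Leakage prevention for a corpus like this usually means keeping every example drawn from one source document on the same side of the train/validation split. The sketch below shows one common way to do that, by hashing a source identifier. The function names, record fields, and 20% validation fraction are assumptions for illustration and do not describe the actual Bifrost or Rigr protocol.

```python
import hashlib


def assign_split(source_id: str, val_fraction: float = 0.2) -> str:
    """Deterministically map a source document to 'train' or 'val'.

    Hashing the source ID (not the example ID) keeps every example from
    one document on the same side of the split.
    """
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def leaked_sources(examples: list[dict]) -> set[str]:
    """Return source IDs that appear in both train and val."""
    train = {e["source_id"] for e in examples if e["split"] == "train"}
    val = {e["source_id"] for e in examples if e["split"] == "val"}
    return train & val


# Hypothetical records: three examples share each source document.
examples = [
    {"example_id": f"ex-{i}", "source_id": f"doc-{i // 3}"} for i in range(30)
]
for example in examples:
    example["split"] = assign_split(example["source_id"])

# The audit passes only if no source document straddles the split.
assert not leaked_sources(examples)
```

Because the assignment depends only on the source ID, adding new documents later never moves an existing document across the split, which helps keep earlier baselines comparable.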

Multiple civilizations

Near Eastern, Japanese, Indian, and expanding. Training on diverse political systems teaches the model patterns that generalize across cultures.

Current status

The research model is currently within striking distance of the large-model comparator on the validation set. The primary bottleneck is data volume, not model architecture: so far, every expansion of the training set has improved performance. Active development continues on broadening civilization coverage and hardening the evaluation protocol.

What it is not

Not a chatbot. Not a general-purpose language model. Not a history trivia database. It is a focused research instrument for studying how systems change under pressure. The corpus and tools are private, but the methodology is visible through the evaluation framework.

Related: Rigr evaluation framework