How the Bureau scores tools and classifies news
This page covers the complete methodology in two parts: StackScore Tools™, which explains how tools are scored, and StackScore News™, which explains how news articles are classified.
StackScore Tools™
Every AI tool is scored across 4 intelligence layers by a team of specialist agents. No single agent decides. Rank applies a fixed formula and Pulse audits every run before a score goes live.
Rank is the only agent that writes the final stackscore. The weights below are fixed — no agent can override them.
stackscore = ROUND(
    operational_score × 0.40        // 40%: Can it improve real workflows?
  + trust_score × 0.25              // 25%: Can it be trusted operationally?
  + market_score × 0.20             // 20%: Does it matter in the ecosystem?
  + infrastructure_score × 0.15     // 15%: Can it anchor a durable AI stack?
)
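As an illustration, the weighted formula above can be sketched in Python. This is a minimal sketch, not the Bureau's implementation: the function names, the 0–100 input range for each dimension, and the half-up behaviour of ROUND are assumptions, since the page only gives the weights and says ROUND.

```python
import math

# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "operational": 0.40,
    "trust": 0.25,
    "market": 0.20,
    "infrastructure": 0.15,
}


def round_half_up(x: float) -> int:
    """Round to the nearest integer, halves going up (assumed ROUND rule)."""
    return math.floor(x + 0.5)


def stackscore(operational: float, trust: float,
               market: float, infrastructure: float) -> int:
    """Combine the four dimension scores into a base StackScore."""
    scores = {
        "operational": operational,
        "trust": trust,
        "market": market,
        "infrastructure": infrastructure,
    }
    for name, value in scores.items():
        # Assumption: each dimension is scored on a 0-100 scale.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score out of range: {value}")
    weighted = sum(scores[name] * WEIGHTS[name] for name in WEIGHTS)
    return round_half_up(weighted)
```

For example, stackscore(80, 70, 60, 90) works out to 32 + 17.5 + 12 + 13.5 = 75.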
Operational (40%): Does it work in real workflows?
Trust (25%): Can you trust it with real data?
Market (20%): Does it matter in the ecosystem?
Infrastructure (15%): Can it anchor a durable stack?
The four dimensions above form the base StackScore — a stable rating that only changes when Rank re-evaluates a tool. On top of that, Sage applies a small, bounded momentum layer so the published score reflects what just happened in the news.
published_stackscore = CLAMP(base_stackscore + momentum, 0, 100)
momentum = CLAMP( ROUND(prev_momentum × 0.6 + impulse), −5, +5 )
impulse = Σ signed_impact(story → tool) // over the last ~26h of news
signed_impact ∈ [−2 … +2] // −2 clearly bad … +2 major win
Each AI news story is classified for its direction on every tool it genuinely affects: a lawsuit, outage, or pricing backlash pushes a tool down; a major launch, funding round, or marquee customer pushes it up. Momentum is capped at ±5 points and decays ~40% per day (the 0.6 carry-over factor), so a single story fades within a few days and can never override the underlying rating. On a quiet news day, momentum drifts back toward zero, so movement is shown only when recent news supports it. This is what powers the homepage Biggest Movers / Under Pressure board and the live ticker.
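The momentum update can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and ROUND is assumed to mean rounding to the nearest integer with halves going up, which the page does not specify.

```python
import math
from typing import Iterable


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def round_half_up(x: float) -> int:
    """Assumed ROUND rule: nearest integer, halves going up."""
    return math.floor(x + 0.5)


def next_momentum(prev_momentum: float, impacts: Iterable[int]) -> int:
    """Carry over 60% of yesterday's momentum and add today's impulse."""
    impacts = list(impacts)
    for impact in impacts:
        # Each story-to-tool impact is limited to -2 .. +2.
        if not -2 <= impact <= 2:
            raise ValueError(f"signed_impact out of range: {impact}")
    impulse = sum(impacts)
    return int(clamp(round_half_up(prev_momentum * 0.6 + impulse), -5, 5))


def published_stackscore(base_stackscore: int, momentum: int) -> int:
    """Apply the bounded momentum layer to the base score."""
    return int(clamp(base_stackscore + momentum, 0, 100))
```

With no prior momentum, two positive stories rated +2 and +1 give next_momentum(0, [2, 1]) = 3; a quiet day after that gives next_momentum(3, []) = 2. Note that under this integer-rounding assumption a momentum of 1 stays at 1 on a quiet day, because 0.6 rounds back up to 1, so the precision ROUND actually uses affects how momentum settles.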
Every score has a confidence value (0–1). Low confidence means fewer sources, high variance, or missing data. Rank cannot inflate confidence; it can only cap it.
base = average(operational_conf, trust_conf, market_conf, infra_conf)
penalties:
  −0.10 if ANY dimension evidence_count < 3
  −0.15 if score spread (max_dim − min_dim) > 35
  −0.05 if ANY dimension confidence < 0.60
  −0.08 if total evidence_count < 10
bonuses:
  +0.05 if ALL dimension evidence_count >= 8
  +0.03 if ALL dimension confidence >= 0.75
floor: 0.40
ceiling: 0.97 (cannot exceed 0.90 unless evidence_count ≥ 12)
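The confidence rules above could be applied along these lines. This is a sketch under stated assumptions: the Dimension type and function name are hypothetical, penalties and bonuses are assumed to be additive, and the "evidence_count ≥ 12" condition on the ceiling is read as the total across all four dimensions, which the page leaves ambiguous.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dimension:
    score: float          # dimension score used for the spread check
    confidence: float     # per-dimension confidence, 0-1
    evidence_count: int   # number of evidence items for this dimension


def overall_confidence(dims: List[Dimension]) -> float:
    """Average the dimension confidences, then apply penalties, bonuses, and caps."""
    base = sum(d.confidence for d in dims) / len(dims)
    scores = [d.score for d in dims]
    total_evidence = sum(d.evidence_count for d in dims)

    adjustment = 0.0
    # Penalties
    if any(d.evidence_count < 3 for d in dims):
        adjustment -= 0.10
    if max(scores) - min(scores) > 35:
        adjustment -= 0.15
    if any(d.confidence < 0.60 for d in dims):
        adjustment -= 0.05
    if total_evidence < 10:
        adjustment -= 0.08
    # Bonuses
    if all(d.evidence_count >= 8 for d in dims):
        adjustment += 0.05
    if all(d.confidence >= 0.75 for d in dims):
        adjustment += 0.03

    # Assumption: the 12-item threshold refers to total evidence.
    ceiling = 0.97 if total_evidence >= 12 else 0.90
    return max(0.40, min(ceiling, base + adjustment))
```

For four dimensions that each have confidence 0.80 and 10 evidence items, with scores within 35 points of each other, no penalty applies and both bonuses do, giving roughly 0.80 + 0.05 + 0.03 = 0.88.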
Rank sets these badges on each tool after every evaluation. They appear on tool pages.
9 agents run in order for every tool evaluation — Insta directs, 8 specialists execute. Each step writes to agent_runs.
StackScore News™
Every AI news article is classified by Flash before it appears on the site. Flash must fetch the full article, find corroborating sources, and assign a credibility label. Flash cannot score from a headline alone.
Flash assigns exactly one label to every article. The label appears as a badge on the news feed and on each article page.
Flash scores every article across 7 dimensions. Credibility is the primary driver of the narrative label; Signal-to-Hype Ratio captures the ratio of operational evidence to promotional language. All 7 scores are stored and displayed on article pages.