Full Methodology

How the Bureau scores tools and classifies news

This page covers the complete methodology in two parts: Part 1, StackScore Tools™, and Part 2, StackScore News™.
Part 1 of 2

StackScore Tools™

Every AI tool is scored across 4 intelligence layers by a team of specialist agents. No single agent decides. Rank applies a fixed formula and Pulse audits every run before a score goes live.

The Formula

Rank is the only agent that writes the final stackscore. The weights below are fixed — no agent can override them.

stackscore = ROUND(
    operational_score    × 0.40   // 40%: Can it improve real workflows?
  + trust_score          × 0.25   // 25%: Can it be trusted operationally?
  + market_score         × 0.20   // 20%: Does it matter in the ecosystem?
  + infrastructure_score × 0.15   // 15%: Can it anchor a durable AI stack?
)

Operational (40%): Does it work in real workflows?
Trust (25%): Can you trust it with real data?
Market (20%): Does it matter in the ecosystem?
Infrastructure (15%): Can it anchor a durable stack?
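
As a minimal sketch, the weighted sum above can be written in Python as follows. The function name `stackscore` and the dictionary keys are illustrative choices, not names taken from the Bureau's code. The sketch also assumes that ROUND means round-half-up to the nearest integer, which the formula does not specify.

```python
import math

# Fixed weights from the formula above.
WEIGHTS = {
    "operational": 0.40,
    "trust": 0.25,
    "market": 0.20,
    "infrastructure": 0.15,
}


def stackscore(scores: dict[str, float]) -> int:
    """Weighted sum of the four dimension scores, rounded to an integer."""
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    # Assumption: ROUND rounds halves up. Python's built-in round() rounds
    # halves to even, so floor(x + 0.5) is used instead.
    return math.floor(total + 0.5)


example = {"operational": 80, "trust": 70, "market": 60, "infrastructure": 50}
print(stackscore(example))  # 32 + 17.5 + 12 + 7.5 = 69
```

Because Operational carries 40% of the weight, a 10-point change there moves the composite by 4 points, while the same change in Infrastructure moves it by 1.5.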

Confidence Formula

Every score carries a confidence value between 0 and 1. Low confidence means few sources, high variance, or missing data. Rank cannot inflate confidence; it can only cap it.

base = average(operational_conf, trust_conf, market_conf, infra_conf)

penalties:
  −0.10  if ANY dimension evidence_count < 3
  −0.15  if score spread (max_dim − min_dim) > 35
  −0.05  if ANY dimension confidence < 0.60
  −0.08  if total evidence_count < 10

bonuses:
  +0.05  if ALL dimension evidence_count >= 8
  +0.03  if ALL dimension confidence >= 0.75

floor: 0.40   ceiling: 0.97   (cannot exceed 0.90 unless evidence_count ≥ 12)
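
The rules above leave a few details open, so the following Python sketch rests on stated assumptions: penalties and bonuses stack additively, the floor and ceiling are applied last, and the `evidence_count ≥ 12` condition refers to the total across all four dimensions. The function name `rank_confidence` and its parameters are hypothetical.

```python
def rank_confidence(
    dim_conf: dict[str, float],
    dim_evidence: dict[str, int],
    dim_scores: dict[str, float],
) -> float:
    """Apply the confidence penalties, bonuses, floor, and ceiling."""
    conf = sum(dim_conf.values()) / len(dim_conf)  # base = average
    total_evidence = sum(dim_evidence.values())

    # Penalties (assumed to stack).
    if any(n < 3 for n in dim_evidence.values()):
        conf -= 0.10
    if max(dim_scores.values()) - min(dim_scores.values()) > 35:
        conf -= 0.15
    if any(c < 0.60 for c in dim_conf.values()):
        conf -= 0.05
    if total_evidence < 10:
        conf -= 0.08

    # Bonuses.
    if all(n >= 8 for n in dim_evidence.values()):
        conf += 0.05
    if all(c >= 0.75 for c in dim_conf.values()):
        conf += 0.03

    # Assumption: the 0.90 cap uses the total evidence count.
    ceiling = 0.97 if total_evidence >= 12 else 0.90
    return round(min(max(conf, 0.40), ceiling), 2)


dims = ["operational", "trust", "market", "infrastructure"]
print(rank_confidence(
    dict.fromkeys(dims, 0.80),
    dict.fromkeys(dims, 8),
    dict(zip(dims, [80, 75, 70, 65])),
))  # 0.80 + 0.05 + 0.03 = 0.88
```
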
Dynamic Indicators

Rank sets these badges on each tool after every evaluation. They appear on tool pages.

rising_momentum: Score ≥ 5 points higher than previous entry
reliability_declining: Score ≥ 5 points lower than previous entry
hype_risk: Trust score is 20+ points below operational score
enterprise_breakout: Trust ≥ 85 AND Infrastructure ≥ 85
new_entry: No prior row in stackscore_history for this tool
verified: Confidence ≥ 0.85 AND total evidence_count ≥ 10
evidence_gap: Total evidence_count < 6 across all dimensions
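
The badge conditions above can be sketched as one Python function. The sketch assumes that "Score" in the momentum rules means the composite stackscore, and that a missing previous score stands for "no prior row in stackscore_history". The function name `dynamic_indicators` and its parameters are hypothetical.

```python
from typing import Optional


def dynamic_indicators(
    score: int,
    previous_score: Optional[int],
    operational: float,
    trust: float,
    infrastructure: float,
    confidence: float,
    total_evidence: int,
) -> list[str]:
    """Return the badge names whose conditions hold for one evaluation."""
    badges = []
    if previous_score is None:
        # Assumption: None stands for "no prior row in stackscore_history".
        badges.append("new_entry")
    elif score - previous_score >= 5:
        badges.append("rising_momentum")
    elif previous_score - score >= 5:
        badges.append("reliability_declining")
    if operational - trust >= 20:
        badges.append("hype_risk")
    if trust >= 85 and infrastructure >= 85:
        badges.append("enterprise_breakout")
    if confidence >= 0.85 and total_evidence >= 10:
        badges.append("verified")
    if total_evidence < 6:
        badges.append("evidence_gap")
    return badges


print(dynamic_indicators(82, 76, operational=80, trust=88,
                         infrastructure=90, confidence=0.9, total_evidence=14))
# ['rising_momentum', 'enterprise_breakout', 'verified']
```
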

DIM 1: Operational Intelligence
40% · QUILL

Core question: Can this tool improve real workflows?

1.1 Core Task Utility (30% of dimension)
85–100: Core capability confirmed in 8+ independent reviews matching primary use case. No major caveats.
65–84: Core capability present and working. Consistent but non-blocking caveats in reviews.
40–64: Mixed signal. Capability exists but frequently falls short of product page claims.
0–39: Core capability absent, broken, or failing in majority of reviews.
Sources: Product homepage, G2 reviews (min 10)

1.2 Workflow Integration Depth (25% of dimension)
85–100: 10+ native integrations AND listed in Zapier or Make AND public API documented
65–84: 5–9 native integrations OR API-accessible with clear documentation
40–64: 2–4 integrations OR API in beta or minimally documented
0–39: Standalone only. No integrations. No API.
Sources: Integrations page, Zapier/Make directory, G2 integrations tab

1.3 Output Reliability (25% of dimension)
85–100: Zero or near-zero reliability complaints across 10+ independent sources.
65–84: Occasional issues documented but not dominant theme.
40–64: Reliability issues in 20–40% of sources discussing output quality.
0–39: Hallucination or serious inaccuracy failures are primary complaint.
Sources: G2, Reddit r/artificial, Hacker News, X (last 90 days)

1.4 ROI Accessibility (12% of dimension)
85–100: Meaningful free tier OR G2 value score ≥ 4.5/5
65–84: Paid only, G2 value 4.0–4.4, value defended in majority of reviews
40–64: Expensive relative to alternatives per reviews, G2 value 3.5–3.9
0–39: No pricing transparency OR G2 value < 3.5 OR price cited as dealbreaker

1.5 Learning Curve (8% of dimension)
85–100: G2 ease ≥ 4.5 AND "up in minutes" confirmed in multiple reviews.
65–84: G2 ease 4.0–4.4. Some setup complexity but manageable.
40–64: Significant learning curve per reviews.
0–39: G2 ease < 3.5 OR no onboarding docs OR complexity is a primary barrier.

AUTOMATIC PENALTIES
G2 overall rating < 3.0: −15 pts
Fewer than 5 G2 reviews exist: −10 pts
No documented feature list on product page: −8 pts
No demo, tour, or product video findable: −5 pts

Evidence minimum: ≥5 sources including ≥3 independent user reviews. If not met → confidence capped at 0.60.
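
The page gives sub-criterion weights and penalty points but does not spell out how they combine. The Python sketch below assumes the dimension score is the weighted sum of the five sub-criterion scores, minus any penalty points, clamped to 0–100. All names here are hypothetical.

```python
# Sub-criterion weights for Dimension 1, as listed above.
OPERATIONAL_WEIGHTS = {
    "core_task_utility": 0.30,
    "workflow_integration_depth": 0.25,
    "output_reliability": 0.25,
    "roi_accessibility": 0.12,
    "learning_curve": 0.08,
}


def operational_score(sub_scores: dict[str, float], penalty_points: float = 0) -> float:
    """Weighted sum of sub-criterion scores minus penalties (assumed combination)."""
    weighted = sum(sub_scores[name] * w for name, w in OPERATIONAL_WEIGHTS.items())
    return max(0.0, min(100.0, weighted - penalty_points))


def cap_confidence(confidence: float, sources: int, independent_reviews: int) -> float:
    """Cap confidence at 0.60 when the evidence minimum is not met."""
    if sources < 5 or independent_reviews < 3:
        return min(confidence, 0.60)
    return confidence


subs = {
    "core_task_utility": 90,
    "workflow_integration_depth": 80,
    "output_reliability": 70,
    "roi_accessibility": 60,
    "learning_curve": 50,
}
# Weighted sum is 75.7; "Fewer than 5 G2 reviews exist" removes 10 points.
print(round(operational_score(subs, penalty_points=10), 1))  # 65.7
```
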

DIM 2: Trust Intelligence
25% · FORGE

Core question: Can this tool be trusted operationally?
Owners: Forge (2.1, 2.2, 2.4, 2.5) · Quill (2.3)

2.1 Data Privacy Posture (30% of dimension)
85–100: Explicit opt-out from AI training. GDPR + DPA available. No data selling.
65–84: Privacy policy present and readable. GDPR mentioned. Ambiguity on training data.
40–64: Policy exists but vague on training data. No opt-out mentioned.
0–39: No privacy policy OR policy explicitly allows training with no opt-out.
Hard rule: If privacy policy cannot be fetched → Trust score cannot exceed 50.

2.2 Security Certification (25% of dimension)
85–100: SOC 2 Type II confirmed AND one additional cert (ISO 27001, HIPAA, FedRAMP)
65–84: SOC 2 Type II confirmed OR SOC 2 Type I with detailed security page
40–64: Security page with claims but no third-party certification confirmed
0–39: No security documentation found

2.3 Output Accuracy / Hallucination Rate (25% of dimension)
85–100: No hallucination or accuracy complaints across 10+ reviews. Accuracy specifically praised.
65–84: Occasional accuracy issues, not dominant theme.
40–64: Accuracy issues in 20–40% of output-quality reviews.
0–39: Hallucination failures are primary complaint OR benchmarks contradict company claims.

2.4 Company / Operational Stability (12% of dimension)
85–100: Series B+ from recognizable VCs in last 18 months OR profitable public company
65–84: Series A OR tier-1 seed in last 24 months with active hiring
40–64: Bootstrapped with revenue signals OR small raise >24 months ago
0–39: Recent layoffs OR pivot signals OR funding runway appears exhausted

2.5 Incident Transparency (8% of dimension)
85–100: Public status page with 12+ months of history. Transparent postmortems.
65–84: Status page exists with limited history. No known major incidents.
40–64: No status page but no known incidents in search.
0–39: Confirmed security breach in last 24 months with poor public response.

AUTOMATIC PENALTIES
Confirmed data breach in last 24 months: −20 pts
User data used for training, no opt-out: −15 pts
No privacy policy found: −10 pts
Privacy policy last updated > 3 years ago: −8 pts
No security page of any kind: −8 pts
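
The hard rule in 2.1 and the penalty table interact, and the page does not state the order. The Python sketch below assumes penalties are subtracted first and stack, the cap of 50 is applied afterwards, and the result never drops below 0. The flag names and the function `apply_trust_rules` are hypothetical.

```python
# Penalty points for Dimension 2, keyed by hypothetical flag names.
TRUST_PENALTIES = {
    "data_breach_last_24_months": 20,
    "training_on_user_data_no_opt_out": 15,
    "no_privacy_policy": 10,
    "privacy_policy_older_than_3_years": 8,
    "no_security_page": 8,
}


def apply_trust_rules(
    weighted_score: float,
    flags: set[str],
    privacy_policy_fetched: bool,
) -> float:
    """Subtract automatic penalties, then apply the privacy-policy hard cap."""
    score = weighted_score - sum(TRUST_PENALTIES[flag] for flag in flags)
    if not privacy_policy_fetched:
        # Hard rule: Trust score cannot exceed 50 without a fetchable policy.
        score = min(score, 50.0)
    return max(0.0, score)


print(apply_trust_rules(82.0, {"no_security_page"}, privacy_policy_fetched=True))  # 74.0
print(apply_trust_rules(82.0, set(), privacy_policy_fetched=False))                # 50.0
```
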

DIM 3: Market Intelligence
20% · SCOUT

Core question: Does this tool matter in the AI ecosystem right now?
Owners: Scout (3.1, 3.2, 3.3) · Flash (3.4)

3.1 Adoption Velocity (35% of dimension)
85–100: G2 reviews growing >20% QoQ OR 500+ total reviews with active recent posting
65–84: Steady growth 5–20% QoQ OR 100–499 G2 reviews with recent activity
40–64: Flat or slow growth. 20–99 G2 reviews. Limited community presence.
0–39: Declining review activity OR fewer than 20 G2 reviews total

3.2 Funding and Investment Signal (30% of dimension)
85–100: $10M+ from recognizable VCs (a16z, Sequoia, YC) in last 18 months OR profitable public company
65–84: $1M–$10M raised OR tier-1 seed in last 24 months
40–64: Bootstrapped with revenue signals OR small raise >24 months ago
0–39: No funding data, no revenue signals, or last raise >36 months ago

3.3 Ecosystem Integration Signals (20% of dimension)
85–100: Listed in 2+ major platform marketplaces AND named recognizable enterprise customers
65–84: 1 major marketplace listing OR 3+ recognizable partner logos
40–64: Partners page exists, companies unrecognizable or small
0–39: No marketplace, no partners, no enterprise customer signals

3.4 Narrative Quality / Signal vs Hype (15% of dimension)
85–100: Tier-1 tech press with analytical substance. Active technical blog. Social is informative.
65–84: Consistent press coverage, some analytical pieces. Blog active.
40–64: Primarily promotional. Press = company-issued only.
0–39: Hype-dominant with unverifiable superlatives. No independent coverage.

FLASH HYPE TRIGGERS
Unverifiable superlatives · Benchmark claims without methodology links · Multiple 5-star G2 reviews posted same day · Press that reads as paid placement
→ hype_score > 70 = automatic −10 pts on market score

AUTOMATIC PENALTIES
Last funding round >36 months ago, no revenue signals: −15 pts
Flash hype_score > 70: −10 pts
G2 review count declined QoQ: −8 pts
Company blog not updated in >6 months: −5 pts

DIM 4: Infrastructure Intelligence
15% · FORGE

Core question: Can this tool become part of a durable AI operating stack?

4.1 API Maturity (30% of dimension)
85–100: Versioned API (v1+), complete auth docs, rate limits with numbers, OpenAPI spec downloadable
65–84: API versioned and documented with good auth docs, some gaps in reference
40–64: API exists but unversioned, rate limits absent, or auth docs incomplete
0–39: No public API OR API exists with no documentation

4.2 Development Activity (25% of dimension)
85–100: GitHub commits in last 30 days AND changelog updated in last 60 days
65–84: Commits in last 90 days. Changelog updated in last 90 days.
40–64: Last activity 91–180 days ago.
0–39: No commits or changelog updates in 180+ days OR no public repository
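
The 4.2 bands can be read as a small decision procedure. The Python sketch below makes two assumptions the rubric leaves open: "last activity" means the more recent of the last commit and the last changelog update, and mixed cases that miss the top two bands fall through to the 40–64 band if either signal is within 180 days. The function name and parameters are hypothetical, and the function returns a band label rather than an exact score.

```python
import math
from typing import Optional


def development_activity_band(
    days_since_commit: Optional[int],
    days_since_changelog: Optional[int],
) -> str:
    """Map commit and changelog recency to a 4.2 score band.

    None for days_since_commit stands for "no public repository";
    None for days_since_changelog stands for "no changelog found".
    """
    if days_since_commit is None:
        return "0-39"
    changelog = math.inf if days_since_changelog is None else days_since_changelog
    if days_since_commit <= 30 and changelog <= 60:
        return "85-100"
    if days_since_commit <= 90 and changelog <= 90:
        return "65-84"
    # Assumption: "last activity" is the more recent of the two signals.
    if min(days_since_commit, changelog) <= 180:
        return "40-64"
    return "0-39"


print(development_activity_band(10, 45))    # 85-100
print(development_activity_band(120, 150))  # 40-64
```
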

4.3 SDK and Developer Experience (20% of dimension)
85–100: Official SDKs for Python AND JavaScript. Code examples on every major docs page. Quickstart < 10 minutes.
65–84: SDK for 1 language + REST API with clear code examples
40–64: REST API only. Minimal code examples. No official SDK.
0–39: No SDK. No code examples. Docs describe endpoints in prose only.

4.4 Orchestration Readiness (15% of dimension)
85–100: Webhooks fully documented WITH streaming/async API AND ≥1 AI framework integration (LangChain, LlamaIndex, or MCP)
65–84: Webhooks documented OR streaming API supported.
40–64: Polling-only. No webhooks. No streaming.
0–39: No async support. No orchestration path viable.

4.5 Platform Durability (10% of dimension)
85–100: 99.9%+ SLA documented. Status page with 12+ months clean history. Deprecation policy stated.
65–84: SLA mentioned OR status page with 6+ months clean history.
40–64: No SLA but no known outage history.
0–39: Known significant outages in last 12 months OR breaking API changes with no advance notice.

AUTOMATIC PENALTIES
No public API exists at all: −20 pts
GitHub shows 0 commits in last 180 days: −15 pts
No changelog ever published: −10 pts
Rate limits completely undocumented: −8 pts
API docs not updated in 12+ months: −8 pts
Agent Execution Sequence

Nine agents take part in every tool evaluation: Insta directs and 8 specialists execute. The 8 steps below run in order, and each step writes to agent_runs.

1. scout: Discovers tool, fetches metadata, seeds evidence_sources
2. forge: Fetches API docs + GitHub, scores infrastructure + trust
3. quill: Fetches reviews + pricing, scores operational + trust accuracy
4. flash: Scores market narrative quality, runs hype detection
5. scout: Scores market adoption + funding + ecosystem signals
6. rank: Reads all dimension scores, computes composite, sets confidence, writes history
7. pulse: Validates run integrity, checks for anomalies, writes governance report
8. insta: Synthesises bureau notes for tool page display (max 3, 1 sentence each)
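
The fixed ordering above could be driven by a simple loop, sketched here in Python. Everything apart from the agent names and the step order is an assumption: `run_evaluation` and the `handlers` mapping are hypothetical, and the in-memory list only stands in for whatever store backs agent_runs.

```python
from typing import Callable

# Step order as listed above; scout appears twice.
SEQUENCE = [
    ("scout", "discover tool, fetch metadata, seed evidence_sources"),
    ("forge", "score infrastructure + trust"),
    ("quill", "score operational + trust accuracy"),
    ("flash", "score narrative quality, run hype detection"),
    ("scout", "score adoption + funding + ecosystem signals"),
    ("rank", "compute composite, set confidence, write history"),
    ("pulse", "validate run integrity, write governance report"),
    ("insta", "synthesise bureau notes"),
]


def run_evaluation(
    tool_id: str,
    handlers: dict[str, Callable[[str, int], str]],
) -> list[dict]:
    """Run each step in order and record one agent_runs entry per step."""
    agent_runs = []
    for step, (agent, task) in enumerate(SEQUENCE, start=1):
        result = handlers[agent](tool_id, step)
        agent_runs.append(
            {"step": step, "agent": agent, "task": task, "result": result}
        )
    return agent_runs


# Stub handlers that only report which step ran.
stubs = {agent: (lambda tool_id, step: f"step {step} done") for agent, _ in SEQUENCE}
for row in run_evaluation("example-tool", stubs):
    print(row["step"], row["agent"], row["result"])
```
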
Part 2 of 2

StackScore News™

Every AI news article is classified by Flash before it appears on the site. Flash must fetch the full article, find corroborating sources, and assign a credibility label. Flash cannot score from a headline alone.

What Flash Must Do Before Scoring
01. Fetch the full article text at the source URL, not just the title or snippet
02. Find 1–2 corroborating sources via web search to confirm the claim
03. Check prior coverage: was this claim previously reported as speculation or as confirmed?
The 5 Narrative Labels

Flash assigns exactly one label to every article. The label appears as a badge on the news feed and on each article page.

VERIFIED
credibility_score ≥ 85 AND secondary source confirmed
Multiple independent primary sources corroborate the claim. High author and publication credibility.
LIKELY
credibility_score 65–84, limited secondary confirmation
Credible source with strong track record. Secondary confirmation exists but is limited.
SPECULATIVE
credibility_score 40–64, primary source unconfirmed
Claim is plausible but unverified. Primary source lacks independent corroboration.
PROMOTIONAL
Source is company-originated, no independent verification
Content originated from the company itself. No independent third-party verification found.
HYPE ALERT
credibility_score < 40 OR hype patterns detected
Signal-to-hype ratio is critically low. Unverifiable claims, coordinated amplification, or fabricated benchmarks detected. Do not amplify.
7 Dimensions Scored Per Article (each 0–100)

Flash scores every article across 7 dimensions. Credibility is the primary driver of the narrative label; Signal-to-Hype Ratio captures the ratio of operational evidence to promotional language. All 7 scores are stored and displayed on article pages.

Credibility (credibility_score)
Primary source quality, author track record, publication tier. This is the main driver of the narrative label.
Signal-to-Hype Ratio (signal_to_hype_ratio)
Ratio of operational evidence to promotional language. High score = evidence-dominant, reproducible, sourced. Low score = unverifiable superlatives or coordinated amplification.
Enterprise Relevance (enterprise_relevance)
How meaningful is this to enterprise or professional AI adoption? Not just developer curiosity.
Infrastructure Impact (infrastructure_impact)
Does this change how AI systems are built or integrated? High score = architects need to know this.
Workflow Disruption (workflow_disruption)
Does this materially change how workflows are executed today? Practical impact, not theoretical.
Narrative Longevity (narrative_longevity)
Will this matter in 3 months? Or just 3 days? Low score = ephemeral news cycle story.
Ecosystem Velocity (ecosystem_velocity)
Does this accelerate or decelerate a category trend? High score = it changes the trajectory of a space.
Credibility Score → Label Mapping
85–100: VERIFIED (AND secondary source confirmed)
65–84: LIKELY (limited secondary confirmation)
40–64: SPECULATIVE (primary source unconfirmed)
< 40: HYPE ALERT (OR hype patterns detected)
any: PROMOTIONAL (company-originated content, score overridden)
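
The mapping above can be sketched as a single Python function. The page does not state which rule wins when several apply, so the sketch assumes PROMOTIONAL is checked first (since it overrides the score), then HYPE ALERT, and that a score of 85 or more without secondary confirmation falls back to LIKELY. The function name `narrative_label` and its parameters are hypothetical.

```python
def narrative_label(
    credibility_score: float,
    secondary_confirmed: bool,
    company_originated: bool,
    hype_patterns_detected: bool,
) -> str:
    """Return exactly one narrative label for an article."""
    if company_originated:
        # Company-originated content overrides the score.
        return "PROMOTIONAL"
    if credibility_score < 40 or hype_patterns_detected:
        return "HYPE ALERT"
    if credibility_score >= 85 and secondary_confirmed:
        return "VERIFIED"
    if credibility_score >= 65:
        # Assumption: a high score without confirmation falls back to LIKELY.
        return "LIKELY"
    return "SPECULATIVE"


print(narrative_label(90, True, False, False))   # VERIFIED
print(narrative_label(90, False, False, False))  # LIKELY
print(narrative_label(70, False, False, True))   # HYPE ALERT
```
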