Scoring Methodology

How Verdict evaluates prediction market resolution quality across six weighted dimensions. Deterministic, documented, and applied equally to every market.

Overview

The score is built from six weighted criteria. Each criterion captures a distinct dimension of resolution quality that affects a participant's ability to predict how a dispute would be resolved.

The model is deterministic: it applies the same rules to every market without external data lookup. It operates on the question text, description, resolution source field, outcomes list, and end date.
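As an illustration only, the five inputs listed above could be held in a small immutable record. The class name `MarketInput` and its field names are hypothetical and are not taken from Verdict's code; the example question is the one quoted under Known Limitations.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class MarketInput:
    """Hypothetical container for the five fields the model reads."""

    question: str
    description: str = ""
    resolution_source: str = ""
    outcomes: Tuple[str, ...] = ()
    end_date: Optional[datetime] = None


# A market with a short question and no description or end date.
market = MarketInput(
    question="Will BTC be above $100k on Jan 1?",
    outcomes=("Yes", "No"),
)
```

Freezing the record is a design choice for this sketch: a scorer that never mutates its input is easier to keep deterministic.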

Markets with low scores are not necessarily fraudulent. A low score indicates that the written resolution criteria are underspecified relative to the complexity of the event being measured.

Note: This is a heuristic v1 model intended to be improved over time with human review, community feedback, and AI-assisted analysis. Scores are not legal or financial advice.

Scoring Criteria

01. Time Clarity (max 20 pts)

Evaluates whether the market specifies when resolution occurs and in which timezone. Markets that use time-sensitive language without timezone references are penalized.

  • +6 — End date is specified
  • +5 — Question or description references temporal terms (before, by, after, between, until, or an explicit date)
  • +5 — Description includes a timezone (ET, UTC, GMT, EST, EDT)
  • +4 — Description distinguishes between event time and announcement / report / disclosure time
  • −5 — Time-sensitive language is present but no timezone is mentioned
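
The Time Clarity rules above can be sketched roughly as follows. This is a minimal approximation, not Verdict's implementation: the regular expressions, the treatment of "time-sensitive language" as the same temporal-term match, and the clamp to the 0 to 20 range are all assumptions.

```python
import re

# Assumed keyword patterns; the real matching rules are not published here.
TEMPORAL = re.compile(
    r"\b(?:before|by|after|between|until)\b|\b\d{4}-\d{2}-\d{2}\b", re.IGNORECASE
)
TIMEZONE = re.compile(r"\b(?:ET|UTC|GMT|EST|EDT)\b")  # case-sensitive on purpose
TIMING_DISTINCTION = re.compile(r"\b(?:announce|report|disclos)\w*", re.IGNORECASE)


def score_time_clarity(question: str, description: str, has_end_date: bool) -> int:
    """Approximate the Time Clarity dimension (max 20 points)."""
    has_temporal = bool(TEMPORAL.search(f"{question} {description}"))
    has_timezone = bool(TIMEZONE.search(description))

    score = 0
    if has_end_date:
        score += 6
    if has_temporal:
        score += 5
    if has_timezone:
        score += 5
    if TIMING_DISTINCTION.search(description):
        score += 4  # crude proxy for "event time vs. announcement time"
    if has_temporal and not has_timezone:
        score -= 5
    return max(0, min(20, score))  # clamping is an assumption


full = score_time_clarity(
    "Will X happen by June 30?",
    "Resolves at 11:59 PM ET on the end date. "
    "The event time, not the announcement time, is used.",
    True,
)
bare = score_time_clarity("Will X happen by June 30?", "", True)
print(full, bare)  # 20 6
```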

02. Resolution Source (max 20 pts)

Evaluates whether the resolution source is named, authoritative, and hierarchically structured. Vague sourcing language ("credible reporting", "consensus") is penalized.

  • +10 — A resolution source URL or named authority is provided
  • +5 — Description references official or institutional sources (government, SEC, FIFA, Federal Reserve, on-chain, etc.)
  • +5 — Description defines a source hierarchy for conflict resolution
  • −5 — Resolution relies on undefined "credible reporting", "reliable sources", or "substantial evidence"
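
A hedged sketch of the Resolution Source rules above. The keyword lists, the phrases used to detect a source hierarchy, and the clamp to 0 to 20 are assumptions made for illustration; the actual engine may match differently.

```python
import re

# Assumed patterns for illustration only.
OFFICIAL = re.compile(
    r"\b(?:official|government|SEC|FIFA|Federal Reserve|on-chain)\b", re.IGNORECASE
)
HIERARCHY = re.compile(
    r"takes precedence|in case of conflict|primary source|fallback source",
    re.IGNORECASE,
)
VAGUE = re.compile(
    r"credible reporting|reliable sources|substantial evidence", re.IGNORECASE
)


def score_resolution_source(resolution_source: str, description: str) -> int:
    """Approximate the Resolution Source dimension (max 20 points)."""
    score = 0
    if resolution_source.strip():
        score += 10  # a URL or named authority is present
    if OFFICIAL.search(description):
        score += 5
    if HIERARCHY.search(description):
        score += 5
    if VAGUE.search(description):
        score -= 5
    return max(0, min(20, score))  # clamping is an assumption


strong = score_resolution_source(
    "https://www.sec.gov",
    "Resolves per the official SEC filing. "
    "In case of conflict, the SEC filing takes precedence.",
)
weak = score_resolution_source("", "Resolves based on credible reporting.")
print(strong, weak)  # 20 0
```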

03. Outcome Definition (max 20 pts)

Evaluates whether the question is concise and unambiguous, whether outcomes are binary YES/NO, and whether the description defines what constitutes a YES resolution.

  • +8 — Question is under 180 characters
  • +6 — Outcomes are binary YES/NO
  • +6 — Description explicitly defines what must happen for YES resolution
  • −5 — Question or description uses ambiguous qualifiers (significant, major, reportedly, effectively, confirmed)
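
The Outcome Definition rules could be approximated like this. The "resolves to Yes if" pattern is only one assumed way to detect an explicit YES definition, and the clamp to 0 to 20 is likewise an assumption.

```python
import re
from typing import Sequence

AMBIGUOUS = re.compile(
    r"\b(?:significant|major|reportedly|effectively|confirmed)\b", re.IGNORECASE
)
# Assumed phrasing for an explicit YES condition, e.g. "resolves to Yes if ...".
DEFINES_YES = re.compile(
    r"\bresolves?\s+(?:to\s+)?[\"']?yes[\"']?\s+if\b", re.IGNORECASE
)


def score_outcome_definition(
    question: str, description: str, outcomes: Sequence[str]
) -> int:
    """Approximate the Outcome Definition dimension (max 20 points)."""
    score = 0
    if len(question) < 180:
        score += 8
    if sorted(o.strip().lower() for o in outcomes) == ["no", "yes"]:
        score += 6
    if DEFINES_YES.search(description):
        score += 6
    if AMBIGUOUS.search(f"{question} {description}"):
        score -= 5
    return max(0, min(20, score))  # clamping is an assumption


clear = score_outcome_definition(
    "Will the Fed cut rates in July?",
    "This market resolves to Yes if the Federal Reserve lowers the target range.",
    ["Yes", "No"],
)
vague = score_outcome_definition(
    "Will there be a major announcement?", "", ["A", "B", "C"]
)
print(clear, vague)  # 20 3
```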

04. Evidence Standard (max 15 pts)

Evaluates whether the market defines what types of evidence are acceptable, what is excluded, and whether specific evidence formats are named.

  • +5 — Description specifies what evidence counts toward resolution
  • +5 — Description specifies what evidence does not count
  • +5 — Named evidence types are referenced (filings, on-chain data, official statements, published reports)
  • −5 — Resolution depends on undefined "confirmation" or "credible sources"
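
A rough sketch of the Evidence Standard rules. The phrases used to detect inclusion and exclusion statements are assumptions, and a keyword match cannot tell whether "confirmation" is actually defined elsewhere in the text, so the penalty check here is cruder than the rule it imitates.

```python
import re

# Assumed phrasing; illustrative only.
COUNTS = re.compile(r"\b(?:will count|counts toward|qualif(?:y|ies) as)\b", re.IGNORECASE)
EXCLUDED = re.compile(
    r"\b(?:will not count|does not count|do not count|not qualify)\b", re.IGNORECASE
)
NAMED = re.compile(
    r"\b(?:filings?|on-chain data|official statements?|published reports?)\b",
    re.IGNORECASE,
)
UNDEFINED = re.compile(r"\b(?:confirmation|credible sources)\b", re.IGNORECASE)


def score_evidence_standard(description: str) -> int:
    """Approximate the Evidence Standard dimension (max 15 points)."""
    score = 0
    if COUNTS.search(description):
        score += 5
    if EXCLUDED.search(description):
        score += 5
    if NAMED.search(description):
        score += 5
    if UNDEFINED.search(description):
        score -= 5
    return max(0, min(15, score))  # clamping is an assumption


specific = score_evidence_standard(
    "Only official statements and SEC filings will count. "
    "Social media posts do not count."
)
loose = score_evidence_standard("Resolves upon confirmation from credible sources.")
print(specific, loose)  # 15 0
```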

05. Edge Case Handling (max 15 pts)

Evaluates whether the market addresses common resolution edge cases: delays, revisions, cancellations, postponements, and events reported after the deadline.

  • +5 — Description addresses delays
  • +5 — Description addresses revisions, corrections, cancellations, postponements, or disputes
  • +5 — Description explains what happens if events are reported after the deadline
  • −5 — Market is time-sensitive but does not address late reporting or delayed disclosure
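
One way to picture the Edge Case Handling rules is shown below. How the engine decides that a market is time-sensitive is not specified, so this sketch takes it as a caller-supplied flag; the keyword stems and the clamp to 0 to 15 are also assumptions.

```python
import re

# Assumed keyword stems; illustrative only.
DELAY = re.compile(r"\bdelay\w*", re.IGNORECASE)
REVISION = re.compile(r"\b(?:revis|correct|cancel|postpon|disput)\w*", re.IGNORECASE)
LATE = re.compile(r"\bafter the (?:deadline|end date)\b", re.IGNORECASE)


def score_edge_case_handling(description: str, time_sensitive: bool) -> int:
    """Approximate the Edge Case Handling dimension (max 15 points)."""
    has_delay = bool(DELAY.search(description))
    has_late = bool(LATE.search(description))

    score = 0
    if has_delay:
        score += 5
    if REVISION.search(description):
        score += 5
    if has_late:
        score += 5
    # Penalty only when a time-sensitive market is silent on late or delayed reporting.
    if time_sensitive and not (has_delay or has_late):
        score -= 5
    return max(0, min(15, score))  # clamping is an assumption


thorough = score_edge_case_handling(
    "If the event is delayed or postponed, the market resolves No. "
    "Reports published after the deadline are ignored.",
    True,
)
partial = "If the match is cancelled, the market resolves No."
print(thorough)                                  # 15
print(score_edge_case_handling(partial, False))  # 5
print(score_edge_case_handling(partial, True))   # 0
```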

06. Post-Trade Risk (max 10 pts)

Starts at full score and is reduced based on structural weaknesses: short descriptions, missing resolution sources, and timing-ambiguous confirmation language.

  • Starts at 10
  • −4 — Description is under 250 characters
  • −3 — No resolution source is specified
  • −3 — Resolution relies on confirmed/announced/reported without a defined timing constraint
  • Minimum: 0
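
Unlike the other dimensions, this one subtracts from a full score. A minimal sketch, assuming that a "defined timing constraint" can be approximated by a handful of deadline phrases (that list is hypothetical):

```python
import re

TIMING_AMBIGUOUS = re.compile(r"\b(?:confirmed|announced|reported)\b", re.IGNORECASE)
# Assumed stand-in for "a defined timing constraint".
TIMING_CONSTRAINT = re.compile(
    r"\b(?:before|by|no later than|prior to)\b", re.IGNORECASE
)


def score_post_trade_risk(description: str, resolution_source: str) -> int:
    """Approximate the Post-Trade Risk dimension (starts at 10, floor of 0)."""
    score = 10
    if len(description) < 250:
        score -= 4
    if not resolution_source.strip():
        score -= 3
    if TIMING_AMBIGUOUS.search(description) and not TIMING_CONSTRAINT.search(description):
        score -= 3
    return max(0, score)


thin = score_post_trade_risk("Resolves Yes if confirmed.", "")
long_text = "This market resolves according to the published figure. " * 6
solid = score_post_trade_risk(long_text, "https://example.com")
print(thin, solid)  # 0 10
```

The three deductions sum to exactly 10, so the floor of 0 is reached only when all three weaknesses are present.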

Risk Levels

  • Low (80–100 pts): Well-specified resolution criteria with low post-trade dispute risk.
  • Medium (60–79 pts): Some ambiguity present. Human review recommended before large positions.
  • High (40–59 pts): Significant rule clarity concerns. Material dispute potential exists.
  • Critical (0–39 pts): Substantially underspecified. High dispute risk for all participants.
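
The six dimension maxima add up to 100, and the total maps onto the bands above. A small sketch of that arithmetic follows; the function and dictionary names are illustrative, not Verdict's API.

```python
from typing import Dict

# Maximum points per dimension, as listed under Scoring Criteria.
DIMENSION_MAX: Dict[str, int] = {
    "Time Clarity": 20,
    "Resolution Source": 20,
    "Outcome Definition": 20,
    "Evidence Standard": 15,
    "Edge Case Handling": 15,
    "Post-Trade Risk": 10,
}


def total_score(scores: Dict[str, int]) -> int:
    """Sum per-dimension scores after checking each against its maximum."""
    for name, maximum in DIMENSION_MAX.items():
        if not 0 <= scores[name] <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
    return sum(scores[name] for name in DIMENSION_MAX)


def risk_level(total: int) -> str:
    """Map a 0-100 total onto the four risk bands."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    if total >= 80:
        return "Low"
    if total >= 60:
        return "Medium"
    if total >= 40:
        return "High"
    return "Critical"


example = {
    "Time Clarity": 11,
    "Resolution Source": 15,
    "Outcome Definition": 14,
    "Evidence Standard": 10,
    "Edge Case Handling": 5,
    "Post-Trade Risk": 7,
}
print(total_score(example), risk_level(total_score(example)))  # 62 Medium
```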

Version History

v1.2 (Jun 2025)

  • Raised all baseline scores — most markets now receive 40–70 (previously 20–50).
  • Decoupled timezone bonus from time-anchor requirement in scoreTimeClarity.
  • Fixed operator-precedence bug in scoreEvidenceStandard that caused near-universal penalty.
  • scoreEdgeCaseHandling no longer penalizes markets simply for containing deadline language.

v1.1 (May 2025)

  • Added dimensionDetails field: per-dimension explanation strings exposed in UI and API.
  • MAX_EVENTS increased from 200 to 500 for broader market coverage.
  • Resolved markets separated into dedicated /markets/resolved endpoint.

v1.0 (Apr 2025)

  • Initial public release of the Verdict scoring model.
  • Six-dimension heuristic framework: timeClarity, resolutionSource, outcomeDefinition, evidenceStandard, edgeCaseHandling, postHocRisk.
  • Risk thresholds: Low (75+), Medium (55–74), High (38–54), Critical (<38).

Known Limitations & Confidence

Verdict is a transparent heuristic, not a precise science. We believe stating its limits openly makes the score more useful, not less. Here is exactly where the model can be wrong.

Treat each score as accurate to roughly ±8 points. A market scoring 62 and one scoring 68 are effectively equivalent. The risk bands (Low / Medium / High / Critical) are more meaningful than the exact number.
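
As a worked illustration of that tolerance, the helper below treats two scores as equivalent when they differ by no more than 8 points. Reading "±8" as a bound on the difference between two scores is an assumption; the text does not define the comparison precisely.

```python
def effectively_equivalent(score_a: int, score_b: int, tolerance: int = 8) -> bool:
    """Return True when two scores differ by no more than the stated tolerance."""
    return abs(score_a - score_b) <= tolerance


print(effectively_equivalent(62, 68))  # True: a 6-point gap is within tolerance
print(effectively_equivalent(62, 75))  # False: a 13-point gap is not
```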

False positives (scores too high)

  • Verbose padding: a description stuffed with boilerplate legal language scores well on length-based signals without actually being clearer.
  • Keyword gaming: naming an "official source" earns points even if that source does not actually adjudicate the specific question.
  • Template reuse: well-structured templates inherit a high baseline even when the specific event is genuinely ambiguous.

False negatives (scores too low)

  • Self-evident questions: "Will BTC be above $100k on Jan 1?" needs little prose, yet is penalized for a short description.
  • External rulebooks: markets that link to a comprehensive off-site rulebook are under-credited because the text itself looks thin.
  • Non-English or symbol-heavy phrasing can suppress keyword matches that would otherwise add points.

The model does not evaluate the quality of the underlying event, the financial integrity of the platform, the on-chain dispute history, or the track record of similar markets. It evaluates only the written clarity of the resolution criteria as published by the market creator.

Because the engine is rule-based, every score is fully reproducible and auditable — the per-market scoring trace on each market page shows exactly which terms triggered each adjustment. There is no black box.
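
To make the idea of an auditable trace concrete, here is a hypothetical sketch of a single trace entry for one rule (the vague-sourcing penalty under Resolution Source). The `TraceEntry` shape and its field names are assumptions and do not describe the format shown on market pages.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEntry:
    """Hypothetical record of one adjustment and the term that triggered it."""

    dimension: str
    points: int
    matched_term: str


VAGUE_SOURCING = re.compile(
    r"credible reporting|reliable sources|substantial evidence", re.IGNORECASE
)


def trace_vague_sourcing(description: str) -> List[TraceEntry]:
    """Return a trace entry if the vague-sourcing penalty would apply."""
    match = VAGUE_SOURCING.search(description)
    if match is None:
        return []
    return [TraceEntry("Resolution Source", -5, match.group(0).lower())]


text = "Resolves based on Credible Reporting."
first = trace_vague_sourcing(text)
second = trace_vague_sourcing(text)
print(first == second)  # True: same input, same trace
```

Because the function depends only on its input text, repeated runs produce identical traces, which is the property the paragraph above describes.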

Verdict is operated independently and has no affiliation with Polymarket or any other prediction market platform. Scores are not legal or financial advice.