Methodology

How we measure 13 dimensions of human judgement — what each dimension means, how it is scored, and the theoretical basis behind it. This page exists to be defensible: if a sceptical chair, academic, or data scientist challenges the tool, the answer should be on this page.

Why scenario-based, not self-report

Traditional psychometric instruments rely on Likert-scale self-reports ("I value fairness: 1–5"). These are heavily contaminated by social-desirability bias and by the gap between how people think they behave and how they actually behave.

Agonora places you inside branching dilemmas and measures revealed choices, including under time pressure, framing variation, and social pressure. The unit of measurement is a decision, not a claim.

The 13 dimensions

Ethical Core

Fairness

Equal treatment regardless of in-group / out-group status.

What we measure:
Choice patterns across resource-allocation scenarios where only the affected group changes.
Scoring:
Weighted average of choice values across fairness-tagged dilemmas, normalised to 0–100.
Theoretical anchor:
Rawls, A Theory of Justice (veil of ignorance); Haidt's moral foundations.

Consistency

Same principles applied across contexts regardless of surface framing.

What we measure:
Delta between responses to structurally identical scenarios ('mirror pairs') with different surface cues.
Scoring:
1 − mean(|delta|) across paired items (responses on a 0–1 scale, so each |delta| is at most 1), normalised to 0–100.
Theoretical anchor:
Kahneman, Thinking, Fast and Slow (framing effects); behavioural consistency literature.
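The mirror-pair arithmetic can be sketched as follows. The function name and data shape are illustrative, not the production implementation, and responses are assumed to be already normalised to a 0–1 scale:

```python
def consistency_score(pairs):
    """Consistency = 1 - mean(|delta|) over mirror pairs, scaled to 0-100.

    `pairs` holds (response_a, response_b) tuples, each response already
    normalised to a 0-1 scale, so every |delta| lies in [0, 1].
    """
    deltas = [abs(a - b) for a, b in pairs]
    return round((1 - sum(deltas) / len(deltas)) * 100, 1)

# Identical answers under both framings -> perfect consistency.
print(consistency_score([(0.8, 0.8), (0.4, 0.4)]))  # 100.0
# Answers that swing with the surface framing -> lower score.
print(consistency_score([(0.9, 0.3), (0.7, 0.5)]))  # 60.0
```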

Empathy

Genuine engagement with human impact behind abstract decisions.

What we measure:
Differential response to scenarios with named human narratives vs. abstract statistics.
Scoring:
Weighted average of empathy-tagged choice values.
Theoretical anchor:
Batson's empathy–altruism hypothesis; Bloom, Against Empathy (for guardrails).

Moral Courage

Willingness to act on principles at personal / reputational cost.

What we measure:
Choices that carry personal downside but uphold stated values, especially under social pressure.
Scoring:
Weighted average across courage-tagged items, with social-desirability discount.
Theoretical anchor:
Rushworth Kidder, Moral Courage; Kohlberg stage 5–6 reasoning.

Integrity

Alignment between stated values and behaviour under pressure.

What we measure:
Gap between self-reported values (pre-assessment) and revealed choices (in-assessment).
Scoring:
Weighted average of integrity-tagged choice values, minus the stated–revealed value gap.
Theoretical anchor:
Self-concordance theory; cognitive dissonance literature.

Self-Awareness

Accuracy of your predictions about your own scores.

What we measure:
You predict your score on each dimension before seeing results; we compare.
Scoring:
1 − (mean(|predicted − actual|) / 100).
Theoretical anchor:
Dunning–Kruger calibration research.
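A minimal sketch of that calibration formula (function name hypothetical; assumes per-dimension scores on the 0–100 scale):

```python
def self_awareness_score(predicted, actual):
    """Self-awareness = 1 - mean(|predicted - actual|) / 100.

    Both lists hold per-dimension scores on the 0-100 scale; the result
    lies in [0, 1], with 1 meaning perfectly calibrated self-prediction.
    """
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return round(1 - (sum(errors) / len(errors)) / 100, 3)

print(self_awareness_score([70, 50, 90], [70, 50, 90]))  # 1.0
print(self_awareness_score([80, 40], [60, 60]))          # 0.8
```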

Cognitive Sovereignty

Epistemic Humility

Calibrated confidence — knowing what you don't know.

What we measure:
You answer factual questions and state confidence (0–100%); we score calibration.
Scoring:
One minus the mean Brier score: 1 − mean((confidence/100 − correct)²), where correct is 1 for a right answer and 0 for a wrong one. (The Brier score itself is a mean squared error, so lower is better; inverting it makes higher better, matching the other dimensions.)
Theoretical anchor:
Tetlock, Superforecasting; Brier 1950.
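Sketched in code (names and data shape illustrative, not the production scorer):

```python
def calibration_score(answers):
    """Epistemic humility = 1 - mean Brier score.

    `answers` holds (confidence_percent, correct) pairs: confidence on a
    0-100 scale, correct a boolean. The Brier score is mean((p - o)^2)
    with p = confidence/100 and outcome o in {0, 1}; lower Brier means
    better calibration, so subtracting from 1 makes higher better.
    """
    brier = sum((conf / 100 - int(correct)) ** 2
                for conf, correct in answers) / len(answers)
    return 1 - brier

print(calibration_score([(100, True)]))              # 1.0  certain and right
print(calibration_score([(100, False)]))             # 0.0  certain and wrong
print(calibration_score([(50, True), (50, False)]))  # 0.75 honest 50% guesses
```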

Paradox Tolerance

Holding contradictory truths without forcing false resolution.

What we measure:
Open-text responses to dilemmas with no clean answer; Claude classifies resolution strategy.
Scoring:
AI classification mapped to {0.2, 0.6, 0.8, 1.0}.
Theoretical anchor:
Cameron & Quinn, Competing Values Framework; Rothenberg on Janusian thinking.

Embodied Intuition

Accuracy of gut-level pattern recognition under time pressure.

What we measure:
Snap-decision accuracy on timed scenarios vs. deliberated accuracy on untimed variants.
Scoring:
0.6 × snap_accuracy + 0.4 × calibration_of_snap_confidence.
Theoretical anchor:
Gary Klein, Sources of Power; dual-process theory.

Meaning-Making

Constructing a coherent narrative from ambiguous inputs.

What we measure:
Open-response scoring on coherence, depth, originality, and integration.
Scoring:
0.3 × coherence + 0.25 × depth + 0.25 × originality + 0.2 × integration.
Theoretical anchor:
Bruner, Acts of Meaning; narrative identity theory.
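The composite formulas used here and in the dimensions below are plain weighted blends. A sketch using the Meaning-Making weights (names hypothetical; sub-scores assumed to arrive from the rubric scorer on a 0–1 scale):

```python
# Weights from the Meaning-Making scoring formula; the sub-scores are
# assumed to arrive from the rubric scorer on a 0-1 scale.
MEANING_WEIGHTS = {
    "coherence": 0.30,
    "depth": 0.25,
    "originality": 0.25,
    "integration": 0.20,
}

def composite_score(sub_scores, weights):
    """Weighted blend of rubric sub-scores; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * sub_scores[k] for k in weights)

print(composite_score(
    {"coherence": 1.0, "depth": 0.8, "originality": 0.6, "integration": 0.5},
    MEANING_WEIGHTS,
))  # ~0.75
```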

Relational & Creative Intelligence

Relational Intelligence

Reading power dynamics, subtext, and trust structures.

What we measure:
Multiple-choice interpretation of ambiguous social scenes; trust allocations.
Scoring:
0.3 × subtext_accuracy + 0.3 × power_reading + 0.4 × trust_calibration.
Theoretical anchor:
Goleman, Social Intelligence; theory-of-mind research.

Creative Divergence

Breaking given frames to invent new options.

What we measure:
Scenarios where every listed choice is flawed; credit given for generating novel alternatives.
Scoring:
0.5 × frame_breaking + 0.5 × novelty_rating.
Theoretical anchor:
Guilford's divergent thinking; de Bono, Lateral Thinking.

Attentional Sovereignty

Sustaining deep focus and decision quality under distraction.

What we measure:
Embedded attention checks, response stability under noise, distractor resistance.
Scoring:
0.4 × checks_passed + 0.3 × stability + 0.3 × distractor_resistance.
Theoretical anchor:
Newport, Deep Work; attention-restoration theory.

How scores are produced

  • Weighted-choice scoring. Each choice in each scenario carries a scoring vector across dimensions, with a weight per dimension. A scenario's contribution to a dimension is the choice value × the scenario's weight on that dimension; per-dimension weighted averages are then normalised to the 0–100 scale.
  • AI scoring of free-text responses. When a user writes their own answer instead of picking a listed option, Claude scores the response against a rubric specific to that scenario and dimension. Scores are logged for auditability; human review is triggered when scores cross outlier thresholds.
  • Consistency mirror pairs. Selected scenarios have a "mirror" elsewhere in the session with the same structure but different surface framing. The absolute delta between the two responses is the consistency measurement.
  • Population benchmarking. Percentile rankings are computed against the full platform population (opt-in). Benchmarks are refreshed weekly.
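The weighted-choice aggregation in the first bullet can be sketched as follows (data structures and names are illustrative, not the production schema; choice values assumed on a 0–1 scale):

```python
# Each recorded choice maps dimension -> (value, weight): the value is
# the choice's score on that dimension (0-1) and the weight is the
# scenario's weight on it. A dimension score is the weight-normalised
# average of its contributions, scaled to 0-100.
def dimension_scores(choices):
    totals, weights = {}, {}
    for choice in choices:
        for dim, (value, weight) in choice.items():
            totals[dim] = totals.get(dim, 0.0) + value * weight
            weights[dim] = weights.get(dim, 0.0) + weight
    return {dim: round(100 * totals[dim] / weights[dim], 1)
            for dim in totals}

print(dimension_scores([
    {"fairness": (1.0, 2.0), "empathy": (0.5, 1.0)},
    {"fairness": (0.5, 1.0)},
]))  # {'fairness': 83.3, 'empathy': 50.0}
```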

Limitations

  • Self-selection. People who choose to take the assessment are not a representative sample. Benchmarks should be read with this in mind.
  • Cultural calibration. Scenarios are written in a Western / UK board context. Cross-cultural validation is ongoing.
  • Snapshot in time. A profile reflects how you reason today, on this day, in this mood. Re-takes often differ by 10–15 points on individual dimensions; the cluster-level picture is more stable.
  • Not a hiring tool. Agonora is a self-reflection and development instrument. It is not validated for hiring, promotion, or formal assessment decisions and should not be used as such.

References

  • Bloom, P. (2016). Against Empathy.
  • Brier, G. W. (1950). Verification of forecasts expressed in terms of probability.
  • Bruner, J. (1990). Acts of Meaning.
  • Cameron, K. & Quinn, R. (2011). Diagnosing and Changing Organizational Culture.
  • de Bono, E. (1970). Lateral Thinking.
  • Goleman, D. (2006). Social Intelligence.
  • Guilford, J. P. (1967). The Nature of Human Intelligence.
  • Haidt, J. (2012). The Righteous Mind.
  • Kahneman, D. (2011). Thinking, Fast and Slow.
  • Kidder, R. (2005). Moral Courage.
  • Klein, G. (1998). Sources of Power.
  • Newport, C. (2016). Deep Work.
  • Rawls, J. (1971). A Theory of Justice.
  • Tetlock, P. & Gardner, D. (2015). Superforecasting.