What is Calibration?

Calibration is the practice of aligning interviewers on what each score level means and what the bar for hire looks like, so different interviewers reach consistent assessments of the same evidence.

By Lee Flanagan

27th Apr. 2026  |  Last Updated: 27th Apr. 2026

Extended definition

Without calibration, structured interviewing fails quietly. Two interviewers can use the same scorecard, the same rubric, and the same questions, and still score the same candidate two points apart — because their internal interpretation of the rubric differs.

Calibration is the deliberate, ongoing work of closing that gap. It happens through shared training, joint scoring exercises, post-debrief discussion of disagreements, and analytics that surface drift across interviewers.

Calibration isn’t a one-time event; it’s a continuous discipline. Teams that calibrate well make consistent hire decisions; teams that don’t calibrate end up with hire decisions that depend more on which interviewers happened to be in the loop than on the candidate.

How calibration works

Calibration operates at three levels:

  • Initial calibration through training — New interviewers go through training that includes joint scoring exercises — watching the same recorded interview, scoring independently, then comparing. The exercise reveals where rubric interpretation differs and gives the team a baseline.
  • Ongoing calibration through debrief — Every debrief is a calibration moment. When two panellists score a competency differently, the discussion reveals whether the difference is about evidence, rubric interpretation, or bar. Closing those gaps in real time keeps the panel calibrated.
  • Analytical calibration through interview intelligence — Modern platforms track each interviewer’s score distribution over time — average scores per competency, hire rate of their scorecards, agreement with peers. The analytics surface drift before it shows up in hire quality. An interviewer who consistently scores half a point above the panel needs targeted recalibration.
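
The third level can be sketched as a simple outlier check over per-interviewer averages. A minimal illustration in Python; the interviewer names, scores, `flag_drift` helper, and 0.5-point threshold are all hypothetical, not a real platform's API:

```python
from statistics import mean

# Hypothetical scorecard data: interviewer -> competency scores (1-5 scale)
scores = {
    "alice": [4, 4, 3, 4, 4, 3],
    "bob":   [3, 3, 3, 2, 3, 3],
    "carol": [4, 5, 4, 4, 5, 4],
}

def flag_drift(scores, threshold=0.5):
    """Flag interviewers whose average score deviates from the
    panel-wide average by more than `threshold` points."""
    panel_avg = mean(s for vals in scores.values() for s in vals)
    flagged = {}
    for interviewer, vals in scores.items():
        delta = mean(vals) - panel_avg
        if abs(delta) > threshold:
            flagged[interviewer] = round(delta, 2)
    return flagged

print(flag_drift(scores))
```

On this toy data, bob and carol would be flagged as scoring well below and above the panel average respectively; in practice the check would also control for role and candidate pool before recommending recalibration.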

Calibration is bidirectional — the rubric anchors interviewers, and interviewer experience refines the rubric. Score levels that consistently get interpreted differently across the team usually indicate a rubric gap, not just an interviewer gap. Mature teams iterate the rubric and the calibration together.

The hardest calibration problem is bar drift over time. Hiring managers under pressure to fill roles unconsciously lower the bar; interviewers who haven’t seen exceptional candidates in a while reset their reference point. Periodic recalibration sessions — re-scoring a known strong hire’s interview, comparing notes against original scores — surface this drift before it compounds.
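
One way to make the re-scoring exercise concrete is to compare the panel's fresh scores for the benchmark interview against the original scores, competency by competency. A minimal Python sketch, with hypothetical competency names and scores (`bar_drift` is an assumed helper, not an established tool):

```python
# Hypothetical recalibration check: the panel re-scores a known strong
# hire's recorded interview and we compare against the original scores.
original = {"problem_solving": 4, "communication": 5, "ownership": 4}
rescored = {"problem_solving": 3, "communication": 4, "ownership": 4}

def bar_drift(original, rescored):
    """Per-competency deltas and the mean shift between the original
    scores and a fresh re-scoring of the same interview."""
    deltas = {c: rescored[c] - original[c] for c in original}
    mean_shift = sum(deltas.values()) / len(deltas)
    return deltas, mean_shift

deltas, mean_shift = bar_drift(original, rescored)
# A positive mean shift suggests the bar has drifted down (the same
# evidence now earns higher scores); a negative shift suggests the
# panel has become stricter since the original interview.
```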

Why calibration matters

Calibration is the difference between a structured interview process that produces consistent decisions and one that produces theatre. Without calibration, scorecards get filled in but mean different things to different interviewers, debriefs become arguments about interpretation rather than evidence, and hire quality varies more by panel composition than by candidate. For VPs of TA, calibration is the most consequential discipline to invest in after the structure itself — getting four interviewers to actually mean the same thing when they write a “4” on the scorecard is what makes structured interviewing work in practice.

Common mistakes and misconceptions about calibration

  • Treating calibration as one-time training — Initial training matters but isn’t enough. Calibration is continuous — every debrief is a calibration moment, and quarterly recalibration sessions catch drift.
  • Ignoring score distribution data — If one interviewer averages 3.8 and another averages 2.9 on the same role, calibration is broken. Interview intelligence platforms surface this; without the data, the drift goes unnoticed.
  • Calibrating only on hire/no-hire — Hire/no-hire alignment is downstream of competency-level calibration. Two interviewers can agree on hire/no-hire while scoring competencies very differently, which produces inconsistency on the next candidate.
  • Letting tenure substitute for calibration — Senior interviewers often score idiosyncratically because they’ve internalised their own bar. Tenure doesn’t equal calibration; recalibration applies to senior interviewers as much as new ones.
  • Skipping calibration after process changes — New rubrics, new competencies, or new interview formats reset calibration. Teams that change the process without recalibrating produce months of inconsistent hiring.

Frequently asked questions

What is calibration in interviewing?

Calibration is the practice of aligning interviewers on what each score level means and what the bar for hire looks like, so different interviewers reach consistent assessments of the same evidence. Two interviewers can use the same scorecard, the same rubric, and the same questions, and still score the same candidate two points apart — because their internal interpretation of the rubric differs.

What does calibration mean in interviewing?

Calibration is the alignment of interviewers on what scores mean and where the bar for hire sits. Without it, two interviewers using the same rubric can reach different scores for the same candidate. Calibration is achieved through training, joint scoring exercises, debrief discussion, and ongoing score-distribution analytics.

How do you calibrate interviewers?

Interviewers are calibrated through joint scoring exercises (watching the same interview, scoring independently, then comparing), through debrief discussion of disagreements, and through analytical review of each interviewer's score patterns over time. Calibration is continuous, not a one-time training event.

How do you know if your interviewers are well calibrated?

Score distributions across interviewers should be similar for similar candidate populations. Major outliers — one interviewer averaging much higher or lower than peers — indicate calibration drift. Interview intelligence platforms surface this data; without analytics, calibration drift stays invisible until it shows up as inconsistent hire quality.

What causes calibration to drift?

Time without joint scoring exercises, hiring pressure that lowers the bar implicitly, new interviewers joining without proper onboarding, and process changes (new rubrics, new competencies) that reset the baseline. Periodic recalibration sessions catch drift before it affects hire quality at scale.