Episode 65

Understanding A.I. in candidate assessment | Nick Johnston

AI assessment tools promise speed and scale, but most vendors can’t prove validity or manage bias. Nick Johnston breaks down what actually works, what’s pseudoscience, and when to keep humans in the loop.

Episode Key Takeaways

Speech-to-text processing is the foundation of most video assessment AI, yet it fails on accents and non-native speakers at alarming rates. One pilot test across Eastern European, Middle Eastern, American, English, and South African accents produced glaring transcription errors, a critical flaw when the algorithm then pattern-matches on word frequency rather than context (a toy illustration of that weakness follows these takeaways).
Facial recognition for personality profiling remains unproven and ethically fraught. Unless candidates are reacting to realistic scenarios (not just answering competency questions), assessing emotion as a proxy for personality doesn’t hold up. Even more troubling: most vendors haven’t built separate facial models for different ethnicities, despite lighting, face shape, and skin tone all affecting recognition accuracy.
Nick argues that AI vendors don’t hold themselves to the validation standards traditional psychometric tools are held to. When challenged to run head-to-head correlation studies against established personality models, most refuse, a red flag that should disqualify them from consideration (a minimal version of such a study is sketched after these takeaways).
Front-end filtering (CVs, applications, basic qualifications) is where AI adds genuine value without sacrificing candidate experience. Later-stage assessments—especially for high-value roles—require human judgment and inference that current AI cannot reliably replicate.
Candidate experience cuts both ways: asynchronous video assessments let people apply on their schedule, but they systematically disadvantage candidates who perform better in live conversation. Offering an alternative path isn’t just legally required in Europe; it’s ethically sound and protects against hidden discrimination.
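To make the word-frequency weakness concrete, here is a toy sketch (not any vendor’s actual model) of a bag-of-words scorer. The rubric keywords, transcripts, and the specific mis-transcription are all invented for illustration:

```python
# Toy bag-of-words scorer: it rewards whatever tokens the speech-to-text
# engine emitted, so a single accent-driven transcription error changes
# the score even though the candidate's answer was identical.
KEYWORDS = {"led", "team", "delivered", "stakeholders"}  # hypothetical rubric

def keyword_score(transcript: str) -> int:
    """Count distinct rubric keywords present in the transcript."""
    return len(KEYWORDS & set(transcript.lower().split()))

spoken      = "I led the team and delivered the project to stakeholders"
transcribed = "I let the team and delivered the project to stakeholders"  # "led" -> "let"

print(keyword_score(spoken))       # 4
print(keyword_score(transcribed))  # 3: same answer, lower score, zero context used
```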
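And a minimal sketch of the head-to-head validation study referenced above: correlate the AI tool’s trait scores with an established, validated inventory on the same candidates. All scores below are invented placeholders, and a real study would need a far larger sample:

```python
from scipy.stats import pearsonr

# Per-candidate conscientiousness scores from the AI tool and from an
# established, validated personality inventory (invented numbers).
ai_tool   = [3.2, 4.1, 2.8, 3.9, 4.5, 2.5, 3.7, 4.0]
inventory = [3.0, 4.3, 2.6, 3.5, 4.6, 2.9, 3.4, 4.1]

r, p = pearsonr(ai_tool, inventory)
print(f"r = {r:.2f} (p = {p:.3f})")
# A vendor unwilling to run or share this kind of study on real data is
# the red flag described above.
```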

Frequently Asked Questions

How do AI vendors build their assessment models?
Most vendors request 500+ candidate samples and use one of two approaches: either they monitor recruiter decisions in your system to infer patterns, or they run the AI in parallel with your team to measure correlation. The risk is obvious: if your team is biased, the model inherits that bias. The best vendors actively monitor for adverse impact and correct for it during implementation.
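The episode doesn’t prescribe a monitoring method, but one standard adverse-impact check in US hiring practice is the EEOC four-fifths rule: flag any group whose selection rate falls below 80% of the highest group’s rate. A minimal Python sketch, with invented counts:

```python
# Four-fifths (80%) rule with invented selection counts.
selected = {"group_a": 48, "group_b": 27}
screened = {"group_a": 100, "group_b": 90}

rates = {g: selected[g] / screened[g] for g in screened}
top_rate = max(rates.values())

for group, rate in rates.items():
    ratio = rate / top_rate
    status = "ADVERSE IMPACT" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.2f}, ratio {ratio:.2f} -> {status}")
```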

What’s the difference between data-driven and data-informed hiring?
Data-driven means the algorithm makes the decision; data-informed means the data informs a human decision. For AI assessment, the distinction matters: use AI to surface candidates and patterns, but keep humans accountable for final hiring calls, especially in later stages where inference and judgment are required.

Can AI assess a candidate’s values?
Traditional online values assessments with established validity work fine. But neither speech-to-text analysis of competency answers nor facial emotion recognition during standard interviews reliably predicts values fit. The proxy is too weak unless candidates are placed in realistic scenarios where their emotional response reveals something meaningful about their values.

Will candidates try to game AI assessments?
It’s already happening. People are hiding keyword-stuffed text boxes in CVs and researching how to pass AI filters, just as they’ve gamed ATS systems for years. Vendors will need to evolve their detection methods, but the arms race is inevitable. This is another reason to use AI for filtering, not final decisions.
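One rough way a vendor might flag keyword stuffing is token-repetition density; the helper and threshold below are hypothetical, and real detection would also inspect formatting (e.g., white-on-white text boxes):

```python
from collections import Counter

def stuffing_ratio(text: str, top_n: int = 5) -> float:
    """Share of the document taken up by its top_n most repeated tokens;
    values near 1.0 suggest keyword stuffing. Threshold is arbitrary."""
    words = [w.lower() for w in text.split() if len(w) > 3]
    if not words:
        return 0.0
    top = Counter(words).most_common(top_n)
    return sum(count for _, count in top) / len(words)

cv_text = "python python aws kubernetes terraform " * 30 + "Experienced engineer."
print(f"{stuffing_ratio(cv_text):.2f}")  # near 1.0 -> flag for human review
```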

Can AI handle the entire screening process for some roles?
Yes, if the assessment criteria are simple (availability, location, basic qualifications). For entry-level retail or hourly roles, a chatbot handling screening and offer generation may actually improve the candidate experience compared with a lengthy manual process. But always offer an alternative for candidates who prefer human interaction.
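On such simple criteria, a chatbot screen reduces to a few deterministic rules; the field names and thresholds in this sketch are hypothetical:

```python
def passes_screen(candidate: dict) -> bool:
    """Screen on the simple criteria above: availability, location,
    and a basic qualification. Fields and thresholds are hypothetical."""
    return (
        candidate.get("available_hours_per_week", 0) >= 16
        and candidate.get("commute_km", float("inf")) <= 25
        and candidate.get("has_required_certification", False)
    )

applicant = {"available_hours_per_week": 20, "commute_km": 8,
             "has_required_certification": True}
print(passes_screen(applicant))  # True -> route to scheduling / offer step
```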