An international team of researchers evaluated more than 3,000 real applications in cooperation with one of the “world’s leading recruitment platforms”. The results indicate that human recruiters exhibit several cognitive biases, while these patterns are absent or weaker in AI. Overall, the AI was better than experienced recruiters at predicting who would go on to succeed in the advertised positions.
The study evaluated a total of 3,296 real applications in the US for three tech job categories: programmer, web designer and content creator. The applicants completed asynchronous interviews, which were then evaluated either by an AI system or by a professional recruiter. Twelve months after the evaluation, the researchers analyzed the applicants’ LinkedIn data. They looked at which jobs the former candidates now held and whether they had already been promoted to higher positions.
AI scores better predict future career success
According to the quartet of authors led by Andreas Leibbrandt, the results of the study clearly show that in these cases the AI provided better forecasts than humans.
The AI score is considerably more informative, particularly when predicting whether applicants would later hold senior positions: for each one-standard-deviation increase in the AI score, the probability of a senior position rises by 6.1 percentage points. The recruiters’ scores, by contrast, had no significant predictive power, with an estimate of -1.2 percentage points. The AI also performed at least twice as well as its human counterparts in predicting whether candidates had changed positions over the course of the twelve months or had found employment after previously being unemployed.
As possible reasons for this finding, the researchers point out, among other things, that human evaluations are influenced by several factors. One is the so-called time-of-day effect, in which the predictive power of recruiters fluctuates over the course of the day. Another is an anchoring effect: recruiters fixate too much on the answer to the first interview question and hardly take the information from the subsequent questions into account. In addition, many recruiters apparently preferred to stay in the middle range when assigning scores. Scores clustered in the middle, however, make it difficult to identify clear winners or losers.
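To make the "percentage points per standard deviation" figures concrete, here is a minimal sketch of how such an effect can be read. It assumes a simple linear relationship between the score and the probability, and the baseline probability of 20 percent is purely hypothetical and not taken from the study. Only the two effect sizes (6.1 and -1.2 percentage points) come from the article, and the function name is illustrative.

```python
def predicted_probability(baseline, effect_pp_per_sd, z_score):
    """Probability of a senior position under an assumed linear reading.

    baseline          -- probability for a candidate with an average score
    effect_pp_per_sd  -- change in percentage points per standard deviation
    z_score           -- candidate's score in standard deviations from the mean
    """
    p = baseline + (effect_pp_per_sd / 100.0) * z_score
    # Clamp to the valid probability range [0, 1].
    return min(1.0, max(0.0, p))


BASELINE = 0.20              # hypothetical baseline, NOT from the study
AI_EFFECT_PP = 6.1           # reported effect for the AI score
RECRUITER_EFFECT_PP = -1.2   # reported estimate for recruiters (not significant)

for z in (-1, 0, 1, 2):
    ai = predicted_probability(BASELINE, AI_EFFECT_PP, z)
    human = predicted_probability(BASELINE, RECRUITER_EFFECT_PP, z)
    print(f"z = {z:+d}: AI score {ai:.1%}, recruiter score {human:.1%}")
```

Under these assumptions, a candidate one standard deviation above the average AI score would be 6.1 percentage points more likely to hold a senior position than an average candidate, whereas the recruiter estimate should be read as statistically indistinguishable from zero.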
Algorithm rates underrepresented groups higher
According to the study, these three effects do not affect the AI. Stereotypical preference or disadvantage also occurred much less often with the algorithm, at least in this study. The data show that the AI awarded higher scores to women and underrepresented minorities than the recruiting specialists did. HR professionals, on the other hand, tended to assign higher scores to men and to white and Asian applicants.
As a result, the AI’s list of top candidates contained 10 percent more women and 7 percent more members of underrepresented minorities, which in this case primarily meant US citizens of African or Latino-Hispanic origin.
German experts also see potential here
Dirk Sliwka, head of the seminar for general business administration and human resources management at the University of Cologne, has looked at how AI can be integrated into performance evaluations.
He conducted a study himself, the results of which he explained to the human resources industry in an interview. Although he still sees weaknesses in performance assessment using large language models (LLMs) because the tasks involved are often too complex, he is very positive about the US study mentioned above: “I believe that well-trained algorithms can quickly be better than humans, especially when it comes to assessing potential for open positions,” he says.
A possible reason, according to Sliwka, is that LLMs are “much better at processing information about nuances than people who have cognitive limitations and are subject to stereotypes.” The US study shares this finding with Sliwka’s own: his results from Germany also showed that LLMs were comparatively better at predicting real performance than human assessors.
Sliwka sees the new study in particular as evidence of the weak points and limitations of human raters – for example, when they (unconsciously) discriminate against minorities or rate people better in the morning than in the afternoon. This does not happen to the algorithm used by the researchers.
“We humans are not as good at assessing the potential of other people as we believe,” is Sliwka’s conclusion from the two studies. People often carry many stereotypes in their heads without being aware of them and overlook some details. Algorithms that are based on good training data could very quickly be much better. This also applies to application processes in Germany.
Cathrin Christ, Director of Technology and Transformation at Deloitte Consulting, and Tim Verhoeven, Senior Manager Talent Intelligence at Indeed, both report that many recruiting decisions today are still based on gut feeling, unstructured interviews and personal impressions. According to Verhoeven, using AI with standardized data can make recruiting more professional, because it relies on statistical pattern recognition rather than intuition. “The future belongs to HR that uses AI precisely and efficiently, where it is better than humans and can relieve them,” says the expert.
AI as structured preparation for decisions
However, according to Sliwka, the prerequisite for a promising use of AI is an algorithm that works with good training data. “With good training data, there is a lot to suggest that at some point it will even be an ethical requirement that we should at least use algorithms when pre-selecting applicants,” says the expert. In this way, disadvantages could be reduced in the future, and it could be predicted more accurately who will actually do a good job later on.
Christ also sees an opportunity here to reduce subjective influences. Because AI evaluates candidates against defined criteria, assessments can be made more structured and comparable. This provides a basis on which human decisions can be made.
According to the expert, the focus so far has been primarily on comparing CVs with the job requirements. In the future, however, it will be more important to include interactive formats such as interviews in the evaluation by AI, as significantly more signals about the actual suitability of applicants will be visible here.
According to Christ, larger organizations have already increasingly defined clear criteria that structure assessments and make them comprehensible. These criteria could be prepared so that an AI can analyze them. However, the AI should primarily serve as a preparatory step so that the final decision remains with people.
Asynchronous interviews halve the applicant pool
The US study used asynchronous interview formats: applicants had to upload video or audio recordings of themselves answering the interview questions. The results show, however, that of 2,535 candidates, only 40 percent actually completed their applications.
In the control group, by contrast, 85 percent of the candidates stayed in the process. Here, 667 applicants were informed by email that they had passed the first round of applications. The email contained a link that they could use to indicate whether they wanted to continue taking part in the application process.
As a possible reason, the researchers cite the fact that applicants expect a large number of competitors in asynchronous formats. They also fear that evaluations will be less fair. Verhoeven sees it similarly and points out that many applicants view AI in the selection process critically or perceive it as impersonal. AI solutions that sit earlier in the process and serve, for example, as orientation aids are more interesting for applicants.
Anyone using such formats should note that although the quality of recruiting work can be improved, the pool of applicants may shrink.
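The completion figures above can be turned into approximate head counts with a few lines of arithmetic. This is only a rough sketch: the percentages in the article are rounded, so the resulting counts are estimates rather than figures from the paper, and the variable and function names are illustrative.

```python
def approx_count(total, share_percent):
    """Approximate number of people behind a rounded percentage."""
    return round(total * share_percent / 100)


# Figures as reported in the article.
interview_invited = 2535   # candidates asked to complete an asynchronous interview
interview_rate = 40        # percent who completed their application
control_invited = 667      # candidates in the control group (email with link)
control_rate = 85          # percent who stayed in the process

interview_completed = approx_count(interview_invited, interview_rate)
control_continued = approx_count(control_invited, control_rate)

# Retention in the interview group relative to the control group.
relative_retention = interview_rate / control_rate

print(f"Asynchronous interview: about {interview_completed} of {interview_invited} completed")
print(f"Control group: about {control_continued} of {control_invited} continued")
print(f"Relative retention: {relative_retention:.0%}")
```

A relative retention of a little under one half is what the heading of this section summarizes as a halving of the applicant pool.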
Info
The study “A Brave New World of Hiring: A Natural Field Experiment on How Asynchronous Interviews and AI Assessment Reshape Recruitment” was written by
- Andreas Leibbrandt (Monash University)
- Joseph Vecci (University of Gothenburg)
- Mallory Avery (Monash University) and
- Edwin Ip (University of Exeter).
The paper is available on the Internet at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6180838

Tonia Schöler is a volunteer at Human Resources.


