Human resources management: How did the initial question for this study come about?
Prof. Dr. Dirk Sliwka: My co-author Rainer Rilke observed that when ChatGPT is asked to assess job applications, for example, it surprisingly often gives very positive evaluations with little differentiation. Since I research subjective performance assessments myself, the idea arose to examine this more closely together.

How exactly did you proceed?
For example, we gave the large language model the CEOs of the top 500 publicly traded companies in the US, the S&P 500, and asked it to rate their performance on a standardized scale of one to five. Companies often use the same scale for performance reviews. It almost never gave the bottom two values. It thus shows a typical pattern that we often observe with human raters: a “reluctance” to give bad ratings. I had thought the LLM would do a better job of that.
