
Will Algorithms Remove Gender Bias in Hiring?

New research offers reason for optimism.

Key points

  • Gender bias in hiring and leadership remains an issue in today's workplace.
  • Most popular diversity practices have been shown to be ineffective or lack empirical support.
  • New studies show that well-constructed algorithms could reduce gender bias in hiring.

Despite some progress, gender discrimination in hiring remains a challenge. Women are judged more harshly than men and are broadly assumed to be less competent. Only 15 percent of CEOs at Fortune 500 companies are female. Women receive lower ratings of their potential and, as a result, are 14 percent less likely to be promoted, a bias that ultimately explains as much as half of the gender promotion gap.

Are popular practices effective?

Current gender-balancing techniques have weak scientific support:

  1. Training that claims to limit unconscious biases is ineffective and, ironically, contributes to activating stereotypes.
  2. Lean-in approaches lack empirical support, mainly because, contrary to popular belief, men and women have almost identical levels of self-confidence; it is simply perceived differently by others.
  3. Paradoxically, in organizations that advocate meritocracy, decision-makers tend to favor men.
  4. Quotas, while further-reaching, treat the symptom rather than the cause, and fuel the mistaken belief that women are less competent. Nor do they guarantee that the women promoted will be effective.

While these actions are laudable, they often serve more as a shield against potential litigation than as an effective way of fighting discrimination.

How algorithms can reduce bias

Achieving gender parity, while also improving the quality of the people promoted, requires more standardized methods that go beyond our intuitions and cognitive biases. For example, Danielle Li, Lindsey Raymond, and Peter Bergman, from MIT and Columbia, showed that some algorithms (supervised learning and Upper Confidence Bounds [UCB]) could increase the share of women selected to a balance of 50 percent, compared with 35 percent for hiring decisions made by humans.
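To make the mechanism concrete, here is a minimal sketch of the UCB idea in Python. It is illustrative only, not the authors' implementation; the function name and constants are invented. The key property is that UCB adds an exploration bonus to candidates from groups the model has less data on, which is what lets it consider applicants a purely supervised model would pass over.

```python
import math

def ucb_score(predicted_quality, n_selected_from_group, n_total_selected, c=1.0):
    """Upper Confidence Bound score: predicted quality plus an exploration
    bonus that grows when a group is underexplored. Illustrative only,
    not the exact specification used in the study."""
    bonus = c * math.sqrt(math.log(n_total_selected + 1) / (n_selected_from_group + 1))
    return predicted_quality + bonus

# Two hypothetical candidates with equal predicted quality: the one from
# the less-explored group receives the higher UCB score.
print(ucb_score(0.70, n_selected_from_group=5, n_total_selected=100))   # ~1.58
print(ucb_score(0.70, n_selected_from_group=60, n_total_selected=100))  # ~0.98
```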

Florian Pethig and Julia Kroenung, from Mannheim University, show that women prefer to be judged by a hiring algorithm, in particular because of its perceived objectivity.

Yochanan Bigman, Desman Wilson, Mads Arnestad, Adam Waytz, and Kurt Gray, of Yale, Northwestern, and North Carolina Universities, explain that people are less morally offended by algorithm-driven discrimination than by human discrimination.

These examples should not obscure other widely publicized and criticized cases in which the use of algorithms has exacerbated discrimination. Instead, they should open the way to the development and use of more ethical algorithms, whose beneficial effects prevail. Training an algorithm to predict which candidates a recruiter will favor, imitating human intuition, will necessarily surface and amplify the very biases we are trying to fight; training it to predict real success, based on data that are gender-blind and truly predictive of leaders' performance, can instead improve and de-gender hiring decisions.

Choosing the right data…

The CV/resume data most widely used in current hiring algorithms do not meet this requirement and reproduce existing biases. A CV, indeed, carries a great deal of gendered information: even after removing the most gendered data (e.g., names, hobbies, and gendered words), simple models can still tell genders apart with 82 percent accuracy. Moreover, CV data are poor predictors of leaders' effectiveness.
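How much gender signal survives de-identification can be checked empirically by probing a resume dataset with a simple classifier. The sketch below is hypothetical: the eight texts are synthetic stand-ins for name-stripped resumes, and a real probe of the kind behind the 82 percent figure would use thousands of documents and subtler cues than these.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-ins for name-stripped resume texts with self-reported
# gender labels. Real leakage is subtler (phrasing, word frequencies);
# these toy texts exaggerate it for demonstration.
resumes = [
    "coordinated a volunteer program and taught elementary school",
    "studied communication sciences and interned in HR",
    "organized community outreach events for a nonprofit",
    "worked as a nursing assistant while completing a psychology degree",
    "built trading models and played club rugby",
    "studied mechanical engineering and repaired motorcycles",
    "managed a construction crew over three summers",
    "led a robotics team and interned at a hedge fund",
]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]

# A "simple model" in the sense used above: bag-of-words features feeding
# a logistic regression. High cross-validated accuracy means gendered
# signal still leaks through wording alone, even with names removed.
probe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(probe, resumes, labels, cv=2)
print(f"leakage probe accuracy: {scores.mean():.2f}")
```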

By contrast, data relating to psychological capital (personality, motives, reasoning) are better suited to the task. They are less affected by gender: contrary to popular opinion, and despite some differences (e.g., narcissism), men and women tend to display similar behaviors, and their psychological and cognitive attributes are, for the most part, alike.

Janet Shibley Hyde, of the University of Wisconsin, proposed the gender similarities hypothesis in a large and pioneering meta-analysis showing that gender differences are overinflated: 78 percent of the differences she examined were small or close to zero, particularly among psychological factors.

Ethan Zell, Zlatan Krizan, and Sabrina Teeter reinforced these conclusions, finding an overall overlap of 84 percent in the distributions of personality scores between men and women, with small or very small differences in 85 percent of cases. A better understanding of gender differences in personality therefore requires seeing both the forest and the trees: larger and smaller gender differences are different ways of organizing the same data.
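For readers who want the arithmetic: under the standard overlapping-coefficient definition, two equal-variance normal distributions whose means differ by Cohen's d overlap by OVL = 2Φ(-|d|/2), where Φ is the standard normal CDF. The snippet below (an illustration, assuming scipy) shows that an overlap of roughly 84 percent corresponds to d ≈ 0.40, a small-to-medium effect by conventional standards.

```python
from scipy.stats import norm

def overlap_from_d(d):
    """Overlapping coefficient of two equal-variance normal
    distributions whose means differ by Cohen's d."""
    return 2 * norm.cdf(-abs(d) / 2)

print(f"{overlap_from_d(0.40):.2f}")  # ~0.84: a small-to-medium effect
print(f"{overlap_from_d(0.80):.2f}")  # ~0.69: a conventionally 'large' effect
```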

Similarly, differences in leadership potential are very small and tend to slightly favor women, largely due to emotional intelligence. Psychological data are also better predictors: about 50 percent of a leader's performance and emergence is explained by personality, in particular intellectual openness, emotional stability, and agreeableness.

…to reach natural gender parity

Training an algorithm on psychological data, and teaching it to identify relevant, non-gendered cues of leadership potential (e.g., curiosity, intellectual humility, empathy), can address two of modern HR's main issues: (1) improving the quality and effectiveness of our leaders, and (2) achieving natural gender balance in leadership positions. Our studies highlight the capacity of personality-based algorithms to recommend men and women in similar proportions for different roles (mean weighted impact ratio = 0.99 and mean Cohen's d = 0.04).
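For context, the two statistics quoted above can be computed from an audit of an algorithm's recommendations. This is a generic sketch, not the authors' code, and the sample numbers are invented: the impact ratio compares selection rates between groups (values below 0.8 fail the common four-fifths rule), while Cohen's d measures the standardized gap in scores.

```python
import math

def impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of selection rates (lower over higher). Values near 1.0
    indicate parity; below 0.8 fails the common four-fifths rule."""
    rate_a, rate_b = selected_a / total_a, selected_b / total_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

def cohens_d(scores_a, scores_b):
    """Standardized mean difference between two groups' algorithm scores."""
    m_a, m_b = sum(scores_a) / len(scores_a), sum(scores_b) / len(scores_b)
    var_a = sum((x - m_a) ** 2 for x in scores_a) / (len(scores_a) - 1)
    var_b = sum((x - m_b) ** 2 for x in scores_b) / (len(scores_b) - 1)
    pooled = math.sqrt(((len(scores_a) - 1) * var_a + (len(scores_b) - 1) * var_b)
                       / (len(scores_a) + len(scores_b) - 2))
    return (m_a - m_b) / pooled

# Hypothetical audit: 48 of 100 women and 50 of 100 men recommended.
print(impact_ratio(48, 100, 50, 100))                     # 0.96 -> near parity
print(cohens_d([0.62, 0.71, 0.58], [0.63, 0.70, 0.59]))   # tiny score gap
```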

Moreover, even when such an algorithm is trained on a male-dominated sample, as many organizations may be forced to do given current disparities in leadership positions, its recommendations remain fair when applied to a balanced sample. Whatever psychological variables emerge as predictors in the male sample, they will be found in almost identical proportions among women.
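A toy simulation can make this argument concrete. Everything below is synthetic and illustrative, not data from the studies cited: a single "curiosity" trait is drawn from nearly identical distributions for men and women (d = 0.04), success depends only on the trait, a model is fit on an 80/20 male-dominated sample, and recommendation rates are then checked on a balanced pool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def draw(n_men, n_women):
    """Synthetic 'curiosity' scores: group means differ by 0.04 SD."""
    men = rng.normal(0.02, 1.0, n_men)
    women = rng.normal(-0.02, 1.0, n_women)
    x = np.concatenate([men, women]).reshape(-1, 1)
    gender = np.array(["M"] * n_men + ["F"] * n_women)
    return x, gender

def label(x):
    """'Success' depends only on the trait plus noise, never on gender."""
    return (x.ravel() + rng.normal(0, 1.0, len(x)) > 1.0).astype(int)

# Fit on a male-dominated training sample (80 percent men).
x_train, _ = draw(800, 200)
model = LogisticRegression().fit(x_train, label(x_train))

# Audit recommendation rates on a balanced applicant pool.
x_test, gender = draw(1000, 1000)
recommended = model.predict_proba(x_test)[:, 1] > 0.3
for g in ("M", "F"):
    print(f"{g} recommendation rate: {recommended[gender == g].mean():.2f}")
# The rates come out nearly equal: the model learned trait -> success,
# and the trait is distributed almost identically across genders.
```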

In the end, the question is not whether these algorithms achieve perfect, fair prediction; it is whether they do better than current methods and better than the human status quo. The use of algorithms raises legitimate and essential questions, but when they are trained on the right data, they make it possible to build decision-making processes that are both more efficient and fairer to women.

Future scandals may well emerge and spark debate, as some vendors of algorithmic pre-employment assessments remain too opaque about the fairness of their solutions. People should be proactive in understanding algorithms and their added value in decision-making, rather than giving in to deceptive, lobby-driven sensationalism.
