Discrepancies in AI Resume Evaluations: Claude vs. GPT and Fairness Issues

A study reveals inconsistencies in AI evaluations of resumes, raising concerns for companies adopting AI in hiring.

By Byte-Pulse Newsroom · AI-augmented editorial system · Jun 04, 2026 · 8 min read

Edited by Serhat Er · Founder & Editor-in-Chief

Updated Jun 13, 2026

Reported from Heise


AI Chatbots in Recruitment: Claude vs. GPT and Fairness Concerns

Alarming Discrepancies in AI Evaluations

The role of artificial intelligence in recruitment is under close scrutiny. A recent analysis from i10x.ai examined how AI chatbots evaluate resumes and found disparities that could significantly affect hiring practices. In the study, resumes for fictitious candidates across diverse industries were generated and then rated by four AI models: GPT-5.4, Claude Sonnet 4.6, Gemini 3 Pro, and xAI Grok 4.3. The models rated the same resumes very differently, which raises serious questions about AI’s reliability in hiring decisions. For companies, understanding these risks is crucial, especially when considering reliance on a single AI model.

The inconsistency becomes particularly alarming when considering the high stakes involved in recruitment. According to a report by Deloitte, a wrong hire can cost a company up to five times the candidate's annual salary. Given the financial implications, the accuracy and fairness of AI evaluations are not just academic concerns but practical necessities for business.

Claude's Claims of Superiority

Among the AI tools evaluated, Claude emerged as the strictest evaluator. It recommended only 42% of the resumes generated by GPT, while recommending 84% of its own, twice the rate. A gap that large suggests Claude favors resumes written in its own style rather than simply applying a higher bar across the board. Gemini, by contrast, recommended 90% of Claude's resumes, indicating a more lenient evaluation approach.

These differences suggest that candidates might be evaluated based on arbitrary factors rather than their actual merit. This has significant implications for fairness and equity in hiring. If AI models can be swayed by factors unrelated to a candidate's qualifications, businesses might inadvertently bypass the best candidates, affecting their competitive edge in the market.
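The size of these gaps is easier to judge when worked out explicitly. The following is a minimal sketch using only the three rates quoted above; the dictionary layout and the function names `self_preference_gap` and `evaluator_spread` are illustrative assumptions, not part of the i10x study.

```python
# Recommendation rates in percent, taken from the figures quoted in this section.
# Keys are (evaluator, resume_author). The layout is an assumption for illustration.
RATES = {
    ("Claude", "GPT"): 42,
    ("Claude", "Claude"): 84,
    ("Gemini", "Claude"): 90,
}


def self_preference_gap(rates, evaluator, other_author):
    """Percentage-point gap between an evaluator's rate for its own
    resumes and its rate for another model's resumes."""
    return rates[(evaluator, evaluator)] - rates[(evaluator, other_author)]


def evaluator_spread(rates, author):
    """Difference between the highest and lowest rate that the listed
    evaluators give to resumes written by the same author."""
    values = [rate for (_, a), rate in rates.items() if a == author]
    return max(values) - min(values)


claude_gap = self_preference_gap(RATES, "Claude", "GPT")
claude_ratio = RATES[("Claude", "Claude")] / RATES[("Claude", "GPT")]
claude_spread = evaluator_spread(RATES, "Claude")

print(f"Claude own-vs-GPT gap: {claude_gap} percentage points")
print(f"Claude own/GPT ratio: {claude_ratio:.1f}x")
print(f"Evaluator spread on Claude-written resumes: {claude_spread} percentage points")
```

On these figures, Claude's gap between its own resumes and GPT's is 42 percentage points, while Claude and Gemini differ by only 6 points on Claude-written resumes. That contrast is consistent with the resume's origin mattering more than which of these two models does the judging, though three data points are too few to establish it.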

The Star of the Show: Gemini

Gemini stands out with a 97% recommendation rate for its own resumes. In comparison, GPT lagged at 82%. Gaps like these show why companies should look beyond a single model when choosing AI tools for recruitment: the same resume can pass or fail depending on which model reads it.

For businesses, the takeaway is clear: a multi-model approach in AI recruitment might enhance the evaluation process. By incorporating multiple AI perspectives, companies can mitigate biases and improve the accuracy of candidate assessments. This approach can lead to a more diverse and inclusive workforce, ultimately benefiting the organization as a whole.

Compared to: Predecessors and Competitors

To better understand these results, it's helpful to compare these AI tools with their predecessors. GPT-5.4 is an evolution of the GPT-3.5 model, which was known for its robust language capabilities but also criticized for its lack of contextual understanding in nuanced tasks like resume evaluation. Claude Sonnet 4.6, on the other hand, represents an iteration over its earlier version that focused on better contextual comprehension and reduced biases in decision-making processes.

In terms of cost, GPT-5.4 and Claude Sonnet 4.6 are priced similarly, with enterprise solutions ranging from €2,000 to €5,000 per month depending on usage levels. Gemini 3 Pro, often marketed as a premium option, commands a slightly higher price, reflecting its high recommendation rates and perceived accuracy.

Bias in AI: A Call for Rigorous Testing

The i10x analysis underscores the critical need for companies to rigorously test their AI models for bias. The study's conclusion was clear: “We did not test whether AI evaluates fairly. We tested whether AI evaluates consistently. The answer is: no.” Such inconsistency is alarming, especially when considering potential biases against specific resume styles or qualifications.

One plausible form of bias lies in how AI models interpret different educational backgrounds or work experiences. Inconsistent evaluations might disadvantage candidates from non-traditional educational paths or those with career breaks. Organizations should test their AI tools by submitting identical synthetic resumes to each model and comparing the verdicts, which exposes systematic preferences that can then be addressed.
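A consistency check of this kind can be sketched in a few lines. The example below assumes each model has already returned a recommend/reject verdict for the same ordered list of synthetic resumes; the model names, the verdicts, and the function `pairwise_agreement` are hypothetical and only illustrate the comparison step, not how the verdicts are obtained.

```python
from itertools import combinations


def pairwise_agreement(verdicts):
    """Return, for each pair of models, the share of resumes on which
    both models gave the same verdict."""
    lengths = {len(v) for v in verdicts.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every model must rate the same non-empty set of resumes")
    agreement = {}
    for a, b in combinations(sorted(verdicts), 2):
        same = sum(x == y for x, y in zip(verdicts[a], verdicts[b]))
        agreement[(a, b)] = same / len(verdicts[a])
    return agreement


# Hypothetical verdicts (True = recommend) for five identical synthetic resumes.
toy_verdicts = {
    "model_a": [True, True, False, True, False],
    "model_b": [True, False, False, True, True],
    "model_c": [True, True, True, True, False],
}

agreement = pairwise_agreement(toy_verdicts)
for pair, share in agreement.items():
    print(pair, f"{share:.0%}")
```

A low agreement share between two models on identical input is a signal of inconsistency, not proof of unfairness. It indicates where a closer look at the disputed resumes is worthwhile.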

A Multi-Model Approach is Essential

The findings strongly advocate for a multi-model strategy in which a panel of models each rates a resume and the ratings are averaged. This could enhance accuracy and reduce the risk that a single model's bias decides the outcome. Moreover, firms should be transparent with applicants about which models are used in screening. The EU AI Act classifies AI systems used in recruitment as high-risk and attaches transparency obligations to them, so disclosure is partly a regulatory matter, but it is also a means to build trust with candidates.

Transparency helps candidates understand the evaluation process and aligns with the EU's emphasis on ethical AI. It's about ensuring that AI systems do not perpetuate discrimination and operate with a clear, understandable rationale. Businesses that embrace this transparency can enhance their reputation and attract top-tier talent.
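The panel idea described in this section could look roughly like the sketch below. It assumes each model returns a score between 0 and 1 for a resume; the thresholds `recommend_at` and `max_spread`, the model names, and the function `panel_decision` are hypothetical choices for illustration, not values from the study.

```python
from statistics import mean


def panel_decision(scores, recommend_at=0.6, max_spread=0.3):
    """Average the scores from a panel of models and flag the resume for
    human review when the models disagree by more than max_spread."""
    if not scores:
        raise ValueError("at least one model score is required")
    values = list(scores.values())
    average = mean(values)
    spread = max(values) - min(values)
    return {
        "average": average,
        "spread": spread,
        "recommend": average >= recommend_at,
        "needs_human_review": spread > max_spread,
    }


# Hypothetical scores for one resume from three models.
result = panel_decision({"model_a": 0.9, "model_b": 0.4, "model_c": 0.8})
print(result)
```

Averaging alone would hide the fact that one model scored this resume far lower than the others. Reporting the spread alongside the average keeps that disagreement visible and routes contested cases to a person.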

Real Daily-Use Scenario

Consider a mid-sized European tech company looking to fill a software engineering position. This company uses AI to screen the initial batch of resumes. By employing a multi-model approach, the HR team inputs the resumes into Claude, GPT, and Gemini. Each model provides its recommendation score and feedback.

The HR manager, aware of each model's strengths and weaknesses, then reviews the aggregated results. Suppose Claude identifies candidates who excel in technical skills, while Gemini highlights those with strong project management abilities. The HR team can then make a more informed decision by combining these insights with human intuition and knowledge about the company's specific needs.

This scenario illustrates how AI can complement human decision-making rather than replace it. By leveraging AI's ability to process vast amounts of data quickly, companies can focus on interviewing candidates who are not just qualified on paper but also a good fit for the company's culture and objectives.

What This Means for You

For job seekers, the i10x findings suggest a tool-agnostic approach to AI in resume crafting. Testing different AI tools can help uncover styles and formats that fare better with AI screeners as well as with recruiters. Relying solely on ChatGPT for resume building might no longer suffice; with Gemini appearing to take the lead, applicants should consider experimenting with various AI models to enhance their applications.

This approach allows candidates to tailor their resumes to reflect diverse strengths that different AI models might prioritize. Whether it’s emphasizing technical skills, leadership qualities, or creative problem-solving, using AI insights can help applicants present themselves in the best possible light.

What's Still Unclear?

While the i10x study raises compelling points, it also leaves several questions unanswered. For instance, how do these models perform in real-world scenarios where human judgment interacts with AI recommendations? What specific factors drive the substantial differences in evaluations among models? Moreover, how can companies effectively implement multi-model assessments without complicating their recruitment processes?

These questions indicate areas that require further research and exploration. As AI continues to evolve, understanding these dynamics will be crucial for refining its role in recruitment.

Context: European Regulations and Ethical AI

With the EU intensifying its focus on AI applications, businesses must be vigilant about the ethical implications of using AI in hiring. New regulations aim to ensure that AI systems do not perpetuate discrimination and operate transparently. This adds a layer of complexity for companies deploying AI in recruitment, necessitating a balance between technical performance and compliance with emerging rules.

The EU's regulatory framework emphasizes the importance of ethical AI usage, focusing on non-discrimination, transparency, and accountability. For businesses, this means ensuring that their AI tools are not only effective but also comply with these principles. This compliance is not merely a legal obligation but a strategic advantage in building a fair and inclusive workplace.

How This Fits the Broader Trend

The discrepancies highlighted in the i10x study are part of a broader industry trend. As AI adoption accelerates, concerns about fairness, bias, and transparency become more pronounced. The tech world is increasingly moving towards responsible AI practices, balancing efficiency with ethical considerations.

This shift reflects a growing recognition that AI should enhance, not hinder, the hiring process. By focusing on fairness and transparency, companies can leverage AI's capabilities while ensuring equitable treatment for all candidates. This balance is essential for maintaining trust and credibility in the increasingly AI-driven recruitment landscape.

Operator Perspective: The Realities of Shipping AI Solutions

Having spent over a decade in tech, I understand the practical challenges of rolling out AI in recruitment. Companies need to match technology with human sensibilities. Relying too heavily on AI evaluations could overlook the nuanced qualities that make a candidate truly shine. A balanced approach that combines AI insights with human judgment could pay off.

Any practitioner in the field knows that while AI offers significant advantages in processing data and identifying patterns, it lacks the human touch necessary for evaluating soft skills, cultural fit, and potential. These aspects are critical in determining the success of a hire and ensuring long-term retention.

Why This Matters

The implications of this study are profound. For companies, it highlights the necessity of a multi-faceted approach to AI recruitment to ensure fairness and consistency. For applicants, it underscores the importance of engaging with various AI tools to enhance their applications. As AI continues to evolve, vigilance against bias and inconsistency remains crucial.

Hiring practices must reflect a commitment to fairness and transparency. The future of hiring shouldn't hinge solely on algorithms; it must also embrace the human touch that shapes our workplaces. By integrating AI with human insight, companies can foster a more equitable, efficient, and effective recruitment process.

Source

Heise – https://www.heise.de/news/KI-Chatbots-bewerten-KI-Lebenslaeufe-Claude-findet-sich-besser-als-GPT-11317531.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag


About the author

Byte-Pulse Newsroom

AI-augmented editorial system

The Byte-Pulse Newsroom is the editorial system that produces Byte-Pulse's daily tech news coverage. Each story is cross-referenced across 3+ independent outlets, drafted with AI assistance by the newsroom system (Drafter → Editor → Fact-Checker → Polisher), and reviewed by Serhat Er, Editor-in-Chief, before publication. We disclose AI augmentation openly. Editorial accountability stays with the named editor on every article. Tips: editorial@byte-pulse.net.


Editorially reviewed on Jun 13, 2026.
