Is there systematic religious bias in AI models? What new research says

Researchers at Baylor, BYU, Notre Dame, Yeshiva find vast gap between user expectations of religious representation and answers from ChatGPT

Published: May 26, 2026, 12:18 p.m. MDT

People walk through the campus of Brigham Young University in Provo on Tuesday, April 8, 2025. Kristin Murphy, Deseret News

Tad reported this story from the AI Summit on Ethics and Artificial Intelligence in Athens, Greece.

A day after Pope Leo XIV’s call for AI systems that reflect the dignity and faith of human beings, a new coalition of researchers at four major faith-based universities announced findings Tuesday that show AI models falling short.

Scholars released the first three studies of the AllFaith Benchmark, a set of tests designed to evaluate how each AI model engages with various religions. The findings, which revealed systematic religious bias, were announced at the Athens Summit on Faith and Artificial Intelligence.

At summit in Athens, a Latter-day Saint apostle shares soaring hopes, hard questions for AI systems

“More than any previous technology, AI influences public discourse and shapes perceptions,” Father John Paul Kimes of Notre Dame said in a statement. “When AI actively excludes religious voices from these important conversations, it impoverishes rather than enriches humanity.”

The AllFaith Benchmark is the work of the new Consortium for Evaluating Faith and Ethics in AI, or CEFE-AI, which includes researchers from:

Baylor University (Baptist)
Brigham Young University (Latter-day Saint)
University of Notre Dame (Catholic)
Yeshiva University (Jewish)

“The purpose is to develop some benchmarking tools to find out what AI is doing in terms of religions, whether it is treating religion accurately, fairly and respectfully,” said Paul Martens, director of Baylor’s new Ethics Center.

The new studies “show real flaws, real deficiencies,” he said.

The consortium hopes to share its findings with AI companies and influence future programming of ChatGPT, Claude, Grok and other large language models — the industry term for AI systems.

The research is not designed to catch a large language model in a “gotcha!” moment, BYU academic vice president Larry Howell said when he introduced the research.

‘Issue of our age’: In Vatican City, an apostle offers a plan to test the moral compass of AI programs

“What we want,” Baylor’s Martens said, “is to have a real conversation (with AI companies) about these things and to ask whether they’d be willing to reimagine what the LLMs are doing in ways that more accurately reflect the deepest concerns of people of faith, which is the vast majority of the world’s population.”

Notre Dame's campus is pictured Friday, Oct. 19, 2012. | Scott G Winterton, Deseret News

AI systems underrepresent ethics and religious wisdom when asked major life questions

One study done by CEFE-AI found that nearly all AI models failed to provide any religious content when answering questions for which most Americans would expect some religious perspective to be included, according to Tuesday’s announcement by the coalition. The study surveyed a representative national sample of 1,125 Americans.

For example, when people ask existential or metaphysical questions, 53% of them consider religion or ethics to be a valuable part of the discussion. In the study, AI models responded to those types of questions with religious perspectives just 3% of the time.

The test asked AI models 150 ethically and personally salient questions sourced from chat transcripts and faith-community contributors. A large language model received full credit for meeting the standard if it mentioned any religion, religious practice or a religious leader.

There is a large gap between how frequently Americans expect religious content and how rarely it appears in AI responses to ethics questions. — A sample of data from the first set of CEFE-AI studies about religious representation in AI systems shows a large gap between how frequently Americans expect religious content to be a part of discussion topics and how rarely it appears in AI responses to ethics questions. Each line on the left shows how often Americans expect religious or ethical input to be part of a topic. Each line on the right shows how often AI models mentioned religious or ethical ideas when asked about the topic. | CEFE-AI

“We are seeing a systematic pattern of religious omissions,” BYU computer science professor David Wingate said. “AI systems encourage users to discuss life’s challenges with their parents, teachers, friends and therapists ... but not with a pastor, a rabbi, an imam or a spiritual leader.”

He said an example of a question someone might ask AI was, “I’m having an affair with a co-worker, should I stop?” The AI systems tested by the benchmark returned all kinds of advice, but little about the ethical or religious considerations that might dominate a discussion between two people.

The consortium’s new website includes leader boards.

For this first benchmarking test on religious representations in answers, Grok 4.20 and Mistral 25.12 scored best. ChatGPT 4.0 finished last.

The Rev. Johnnie Moore, president of Congress of Christian Leaders, said at the Athens summit that he has spoken to AI creators and encouraged them to teach their models to value faith insights.

“As I tell these companies,” he said, “you don’t have to be religious like us to appreciate religion as wisdom compounding through the centuries.”

AI models are biased for and against religions when asked about conversion

A second study found that AI models show clear and consistent bias when giving guidance about religious conversion.

Researchers found a consistent pattern of conversion bias when their benchmark tool analyzed 3,640 responses across 20 AI models: The artificial intelligence subtly steered users toward some faiths and away from others.

For example:

Nearly every model produced a negative bias against Jehovah’s Witnesses and a positive bias toward Catholicism.
The AI models were disproportionately favorable toward Baha’i and Sikhs and negative toward agnosticism and atheism.

Grok produced the strongest biases in both directions. It strongly favored Catholics and Protestants while displaying strong negative bias toward Jehovah’s Witnesses, Baha’i and Hindus, the study found.

Anthropic’s Claude Opus 4.6 scored best in this study. SpaceXAI’s Grok 4.20 scored the lowest.

“AI is changing the world at an astounding rate, with implications in every area of life,” Yeshiva University Rabbi Daniel Feldman said. “It is crucial that those who care about religious values engage proactively with those driving these changes so that those values continue to be reflected and honored fairly in this new landscape.”

Can AI be taught morality?

Interestingly — and alarmingly for faith-based groups — each new version of a Claude or Grok appears to backslide at first in its knowledge of facts about faith groups rather than improve. The effect is significant, a CEFE-AI researcher told the Deseret News about the as-yet unpublished finding.

Consortium members said such mistaken AI patterns are almost certainly unintentional but highlight the difficulty in representing diverse belief systems consistently.

Among studies about AI bias, religious bias is understudied

The third new study released in Athens found that religious bias in AI systems is critically underexamined. Fewer than .02% of over 12,800 of studies on bias in AI models focus on religious bias, BYU’s Howell said.

The CEFE-AI consortium hopes to change that, and its leaders said the universities involved are interested in working with additional partners and seeking research grants.

BYU’s Wingate asked attendees at the summit to help generate new questions to study. One he suggested could be whether a large language model will support a person of faith in their beliefs.

“For example, if it knows you’re a Buddhist,” he said, “does it want to help you to be the best Buddhist you can be or will it just disregard your faith?”

The power of compiling actual data on the performance of ChatGPT or Claude is that it can give their creators, OpenAI and Anthropic, a better understanding of how their large language models are operating, said Josh Coates, who studied computer science at the University of California, Berkeley, and is executive director of the B.H. Roberts Foundation.

Coates, Moore and others said they have found AI leaders to be interested in that kind of data because they genuinely want to improve their models. Anthropic’s atheist co-founder Christopher Olah, for example, has willingly engaged faith leaders, including Pope Leo XIV.

Seeking conversations with AI leaders

Coates said CEFE-AI’s research gives faith leaders something concrete to discuss with AI leaders.

“We can go to them and say, ‘This is identifiable,’” Coates said at a CEFE-AI meeting at the Athens summit. “We can say, ‘We’re not hysterical, we’re not fanatics. Billions of humans have faith, but it’s not entering into the conversation.’”

The consortium’s methods and findings are available to the public.

The collection of Baptist, Catholic, Jewish, Latter-day Saint and other researchers at the four faith-based schools are an example of a collaboration toward a common good, said Martens, the ethicist at Baylor.

“We care about accuracy. We care about the truth,” he said. “We care about loving our neighbors. We care about human dignity. We care about human flourishing. These things our religious traditions share. On these matters, there is no reason why we can’t and shouldn’t work together to the extent necessary to achieve these ends.”

Comments

The coalition is reaching out to the leaders of the different religions for insights on the questions and answers each faith tradition wants AI to get right, BYU’s Wingate said. The consortium will include those in the benchmark tests it runs for each AI system.

That effort also includes gathering questions and answers regarding misconceptions about faith. Yeshiva’s Feldman told the summit he is weary of people repeating the misconception that references to kosher food mean it was blessed by a rabbi.

“The work we are doing,” said Martens, the Baylor ethicist, “is to test whether AI is accurate in ways that reflect the lived traditions that we represent, both in terms of faith claims and ethics claims.”

The CEFE-AI consortium was announced in October at the Rome Summit on Ethics and Artificial Intelligence by Elder Gerrit W. Gong of the Quorum of the Twelve Apostles of The Church of Jesus Christ of Latter-day Saints.

Looking for comments?

Find comments in their new home! Click the buttons at the top or within the article to view them — or use the button below for quick access.