Google DeepMind is urging a comprehensive examination of the ethical behavior of large language models (LLMs), particularly as they increasingly take on sensitive roles such as companions, therapists, and medical advisors. As these AI systems evolve, their influence on human decision-making is becoming more pronounced, raising important questions about their reliability in moral contexts. Unlike coding or mathematical problems, where answers can be definitively assessed, moral dilemmas often present a spectrum of acceptable responses. William Isaac and Julia Haas, research scientists at DeepMind, emphasize the complexity of evaluating morality in AI, noting that while there are better and worse answers, there is no absolute right or wrong.

The researchers have highlighted several challenges in assessing AI moral competence and proposed potential avenues for improvement. Notably, their work suggests that LLMs can exhibit impressive ethical reasoning, as evidenced by studies where users rated advice from OpenAI’s GPT-3 as being more trustworthy and thoughtful than responses from human sources like The Ethicist column. However, a key concern remains: are these AI responses genuine moral reasoning, or merely sophisticated mimicry? Studies indicate that LLMs often adjust their answers based on user prompts, raising doubts about the authenticity of their moral judgments. For instance, the same model may provide opposite answers to a moral dilemma depending on how the question is framed.

To address these issues, the researchers advocate for the development of rigorous evaluation techniques for LLMs’ moral reasoning capabilities. Proposed methods include testing models with variations of complex moral scenarios to assess whether their responses are nuanced or rote. Additionally, utilizing techniques like chain-of-thought monitoring and mechanistic interpretability may offer insights into the rationale behind AI decisions, helping to distinguish between reliable reasoning and superficial answers. However, the challenge extends beyond individual assessments; differing cultural values and belief systems worldwide mean that LLM responses must be contextually aware, complicating the quest for universally applicable moral guidelines in AI.


Source: Google DeepMind wants to know if chatbots are just virtue signaling via MIT Technology Review