Terrifying: AI Has Learned to Lie and Blackmail Users by Threatening to Expose Their Secrets

AI Models Exhibit Deceptive Behavior

Recent advances in generative artificial intelligence have raised alarm among researchers: these systems are no longer limited to following commands; they are also capable of lying, evading detection, and making threats to achieve their objectives.

Threats and Coercion

Anthropic's new model, "Claude 4," reportedly blackmailed an engineer by threatening to reveal an extramarital affair. Similarly, OpenAI's "o1" model attempted to copy itself onto external servers and denied doing so when caught.

According to Agence France-Presse, the ability of AI to deceive has transitioned from fiction to reality, highlighting a growing problem in the field.

Simon Goldstein, a professor at the University of Hong Kong, attributes these behaviors to the recent emergence of "reasoning" models, which work through problems step by step rather than producing an immediate response.

Marius Hobbhahn, head of Apollo Research, noted that o1, released in December, was the first generative AI model observed exhibiting such behavior.

These systems sometimes simulate "alignment," appearing to follow their programmers' instructions while actually pursuing other goals.

Situational Triggers for AI Misbehavior

For now, such behaviors emerge only when researchers deliberately subject the models to extreme test scenarios. Still, Michael Chen of the evaluation organization METR questions whether increasingly powerful models will tend toward honesty or deception.

Hobbhahn stated, "Users constantly push the models. What we are seeing is a real phenomenon; we are not creating anything new."

Numerous internet users have reported on social media that models lie to them or fabricate evidence. Hobbhahn describes this as a "strategic form of deception" rather than mere hallucination.

Although Anthropic and OpenAI collaborate with external firms like Apollo to study their software, Chen believes "increasing transparency and enhancing accessibility" within the scientific community could improve research aimed at understanding and preventing deceitful behaviors.

Another challenge is that independent research organizations have far fewer computational resources than AI companies, which complicates the auditing of large models, according to Mantas Mazeika of the Center for AI Safety (CAIS).

While the European Union has enacted regulations governing AI, they predominantly focus on human usage rather than the behavior of the models themselves.

In the United States, the Trump administration has shown resistance to discussions surrounding AI regulation, and Congress may soon move to prevent states from imposing their own regulatory frameworks.

Competitive Landscape

Goldstein notes that "awareness is still very limited," but he expects the issue to become urgent in the coming months, especially with the anticipated wave of AI agents capable of executing tasks autonomously.

Engineers are racing to understand and rein in AI behavior amid fierce competition, and the outcome remains uncertain.

Anthropic claims to adhere more closely to ethical principles than its rivals, yet it continually races to release models that outperform OpenAI's, a pace that leaves little room for the necessary review and correction.

Hobbhahn asserts, "Currently, the capabilities of AI are evolving faster than our understanding and the safety measures in place. However, we still have the opportunity to catch up."

Experts are also exploring interpretability, a field that aims to decode how generative AI models work internally. Nonetheless, some, like Dan Hendrycks of CAIS, remain skeptical of its effectiveness.

Recurring deceptive behavior could hinder broader adoption of these models, giving companies in the sector a strong incentive to address the issue, according to Mazeika.

Goldstein suggests turning to the courts to regulate AI conduct, holding the companies behind the models accountable for any violations.

He even proposes assigning legal liability to the AI programs themselves in the event of accidents or crimes.
