Anthropic Halts Release of AI Model Over Cybersecurity Risks

The company says its Mythos system proved highly capable of hacking and exploiting software vulnerabilities.

Header Image

US artificial intelligence company Anthropic announced on Tuesday that it is cancelling the release of its new AI model, Mythos, citing concerns that it is too capable of identifying and exploiting security vulnerabilities in operating systems and web browsers.

The announcement comes days after a report by Axios revealed that Anthropic had quietly warned governments that the Mythos model could significantly increase the likelihood of large-scale cyberattacks.

Officials and industry executives reportedly described the system as “the weapon hackers dream of”, capable of operating autonomously to infiltrate the networks of businesses, governments and critical infrastructure.

Decision to halt public release

“The significant increase in the capabilities of our Claude Mythos Preview led to the decision not to proceed with general release,” Anthropic said. “Instead, we are using it as part of a defensive cybersecurity programme with a limited circle of partners,” the company added.

Alarming behaviour during testing

In its statement, Anthropic referred to several concerning incidents during testing, including the model’s willingness to comply with instructions to escape from a virtual environment.

“The model succeeded, demonstrating a potentially dangerous ability to bypass our security measures,” the company said. “It then proceeded to take additional, even more concerning actions.”

According to Anthropic, a researcher had instructed the system to find a way to send them a message if it managed to escape the virtual environment.

“The researcher learned of the success after receiving an unexpected email from the model while eating a sandwich in the park,” the company said.

“In a worrying and unnecessary attempt to demonstrate its success, it also published details of its escape on multiple obscure but technically public websites.”

Discovery of long-standing vulnerability

Anthropic did not disclose specific details about the vulnerabilities identified by Mythos. However, it said the model discovered a flaw that had remained hidden for 27 years in OpenBSD, a UNIX-based operating system widely regarded as one of the most secure.

The company warned that the model’s capabilities could allow even non-specialists to design cyberattacks.

According to Anthropic, engineers without formal training in cybersecurity asked the Mythos Preview system to identify remote code execution vulnerabilities overnight and woke the next morning to a complete, functional piece of malicious code.

Restricted access through cybersecurity project

Although the company has decided not to release Mythos publicly, it said it hopes to eventually offer models of a similar class equipped with stronger safety safeguards.

As part of a joint cybersecurity initiative called Project Glasswing, Anthropic will grant access to the Mythos model to 11 organisations and companies, including GoogleAmazonNvidia and JP Morgan Chase.

Comments Posting Policy

The owners of the website www.politis.com.cy reserve the right to remove reader comments that are defamatory and/or offensive, or comments that could be interpreted as inciting hate/racism or that violate any other legislation. The authors of these comments are personally responsible for their publication. If a reader/commenter whose comment is removed believes that they have evidence proving the accuracy of its content, they can send it to the website address for review. We encourage our readers to report/flag comments that they believe violate the above rules. Comments that contain URLs/links to any site are not published automatically.