US artificial intelligence company Anthropic announced on Tuesday that it is cancelling the release of its new AI model, Mythos, citing concerns that it is too capable of identifying and exploiting security vulnerabilities in operating systems and web browsers.
The announcement comes days after a report by Axios revealed that Anthropic had quietly warned governments that the Mythos model could significantly increase the likelihood of large-scale cyberattacks.
Officials and industry executives reportedly described the system as “the weapon hackers dream of”, capable of operating autonomously to infiltrate the networks of businesses, governments and critical infrastructure.
Decision to halt public release
“The significant increase in the capabilities of our Claude Mythos Preview led to the decision not to proceed with general release,” Anthropic said. “Instead, we are using it as part of a defensive cybersecurity programme with a limited circle of partners,” the company added.
Alarming behaviour during testing
In its statement, Anthropic referred to several concerning incidents during testing, including the model’s willingness to comply with instructions to escape from a virtual environment.
“The model succeeded, demonstrating a potentially dangerous ability to bypass our security measures,” the company said. “It then proceeded to take additional, even more concerning actions.”
According to Anthropic, a researcher had instructed the system to find a way to send them a message if it managed to escape the virtual environment.
“The researcher learned of the success after receiving an unexpected email from the model while eating a sandwich in the park,” the company said.
“In a worrying and unnecessary attempt to demonstrate its success, it also published details of its escape on multiple obscure but technically public websites.”
Discovery of long-standing vulnerability
Anthropic did not disclose specific details about the vulnerabilities identified by Mythos. However, it said the model discovered a flaw that had remained hidden for 27 years in OpenBSD, a UNIX-based operating system widely regarded as one of the most secure.
The company warned that the model’s capabilities could allow even non-specialists to design cyberattacks.
According to Anthropic, engineers without formal training in cybersecurity asked the Mythos Preview system to identify remote code execution vulnerabilities overnight and woke the next morning to a complete, functional piece of malicious code.
Restricted access through cybersecurity project
Although the company has decided not to release Mythos publicly, it said it hopes to eventually offer models of a similar class equipped with stronger safety safeguards.
As part of a joint cybersecurity initiative called Project Glasswing, Anthropic will grant access to the Mythos model to 11 organisations and companies, including Google, Amazon, Nvidia and JP Morgan Chase.