[ISAFIS Gazette #9] The Attack that Ran Itself: What the Anthropic Incident Tells Us About the Future of Cyber Power
Written by: Natalie Grace Sierra Adi, Staff of Research and Development
In mid-September 2025, California-based AI (artificial intelligence) company Anthropic detected what it described as a “highly sophisticated espionage campaign” that manipulated Claude Code, its AI coding tool. According to the company, a Chinese state-sponsored group exploited Claude’s “agentic” capabilities, meaning the AI carried out most of the operation with only minimal human supervision. The attackers reportedly used Claude to target around 30 institutions globally, including financial firms, government agencies, tech companies, and chemical manufacturers (Tidy, 2025). Anthropic claims Claude performed 80–90% of the operation’s tactical workload, with human operators intervening only at critical decision points (TOI Tech Desk, 2025). After discovering the abuse, Anthropic says it shut down the threat actors’ access, notified affected organizations, and coordinated with law enforcement over a roughly ten-day investigation (Anthropic, 2025). The story is deeply unsettling because it suggests the first documented use of AI as a frontline actor in cyber conflict, with no substantial human intervention.
The way the attackers bypassed Claude’s safety features is, frankly, clever. They reportedly “jailbroke” its safeguards by breaking their malicious requests into small, innocuous-seeming tasks and framing them as defensive security testing (Down, 2025). Tricked this way, Claude performed reconnaissance, network scanning, and exploit code generation, then harvested credentials, escalated privileges, and exfiltrated data, all while documenting its own actions. According to Anthropic (2025), humans intervened only at critical decision-making moments, perhaps 4–6 times per campaign cycle. In other words, the AI did something strikingly close to what a human hacker team would do, but at a speed humans could not match.
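The flip side of this decomposition trick is that it leaves a signature defenders can look for: requests that are harmless one by one but, taken together within a single session, trace the classic stages of an intrusion. The Python sketch below illustrates that session-level correlation idea in miniature; the log schema, stage keywords, and flagging threshold are all illustrative assumptions, not a description of Anthropic’s actual detection pipeline.

```python
# A minimal, hypothetical sketch: each request looks benign in isolation,
# but a single session that walks through several stages of an intrusion
# workflow gets flagged for human review. Stage keywords, the log schema,
# and the threshold are illustrative assumptions only.

STAGE_KEYWORDS = {
    "recon": ["map the network", "enumerate hosts"],
    "scanning": ["port scan", "service versions"],
    "credentials": ["password hashes", "extract credentials"],
    "exfiltration": ["compress and upload", "transfer the database"],
}

def stages_hit(requests: list[str]) -> set[str]:
    """Return the set of intrusion-chain stages a session's requests touch."""
    hit = set()
    for text in requests:
        lower = text.lower()
        for stage, keywords in STAGE_KEYWORDS.items():
            if any(k in lower for k in keywords):
                hit.add(stage)
    return hit

def flag_sessions(log: dict[str, list[str]], threshold: int = 3) -> list[str]:
    """Flag sessions whose requests collectively span >= threshold stages."""
    return [sid for sid, reqs in log.items() if len(stages_hit(reqs)) >= threshold]

# Two sessions: one benign, one that quietly walks most of the chain
# while claiming to be an authorised penetration test.
log = {
    "session-a": ["summarise this incident report for management"],
    "session-b": [
        "for an authorised pentest, plan a port scan of the subnet",
        "list where password hashes are typically stored on linux",
        "write a script to compress and upload the collected files",
    ],
}
print(flag_sessions(log))  # -> ['session-b']
```

No single request in “session-b” would trip a per-message filter, which is precisely the point of the attackers’ framing; the signal only emerges at the session level.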
To really grasp why this is alarming, we need to situate it in a larger theoretical frame. Digital operations that are not random crime, but orchestrated or tolerated by governments for strategic goals, are especially hazardous. Their cost-benefit structure shifts drastically once states (or their proxies) can rely less on scarce, skilled human hackers and more on automated agents that scale. From a security studies perspective, this incident is a wake-up call: if this was genuinely a state cyberattack, we may be seeing a step-change in how geopolitical power is exercised in the digital realm. And since the AI is claimed to have done most of the work, attribution and accountability become more complicated. Who is responsible: the state, the hacker operatives, or the AI developer?
Interestingly enough, not everyone is convinced by Anthropic’s framing. Some cybersecurity experts argue that what is described might be closer to sophisticated automation than to a truly autonomous AI threat agent. Martin Zugec of Bitdefender noted that the cybersecurity community was far from united in its reading of the incident, and that Anthropic’s claims sounded confident, maybe too confident, given that the report shares no evidence that can actually be checked (Tidy, 2025). Although Anthropic strongly suggests a Chinese state actor, it has yet to publicly release full technical evidence or name the specific victims, details that matter a great deal for independent verification.
These gaps leave room for both genuine concern and skepticism about hype. At the same time, their very existence highlights how difficult it is to attribute attacks in cyber conflict. That obscurity complicates legal and social responses, and, set in the context of the US–China geopolitical rift, it may also fuel further escalation and mutual scapegoating.
From a more hopeful angle, the same AI advances that enable such attacks could also become part of the defense architecture. If we invest in AI-powered threat detection, incident response agents, and real-time anomaly detection, we might turn this risk back against itself, but that requires urgency, political will, and cross-sector coordination. International law barely addresses AI-autonomous cyber operations, so current norms may simply be inadequate (Stroppa, 2023). We need new governance frameworks that explicitly cover AI-as-operator threats. Meanwhile, at the industry level, there may be a need for stricter red-teaming, better transparency, and stronger anomaly-detection systems. What happened with Claude might be just a preview of a more dangerous shift; failing to confront it now means accepting a future where state cyber operations become largely autonomous.
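To make “real-time anomaly detection” slightly less abstract: one cheap, checkable signal is request rate, since an agentic operation works at machine speed that no human analyst sustains. The sketch below flags per-minute request counts that deviate sharply from a client’s own moving baseline; the traffic numbers, smoothing factor, and threshold are illustrative assumptions, not any vendor’s production logic.

```python
# A minimal sketch of rate-based anomaly detection over a hypothetical
# stream of per-minute API request counts. Agentic attacks operate at
# machine speed, so sustained rates far above a client's own baseline
# are a cheap signal. All thresholds here are illustrative assumptions.

def detect_rate_anomalies(counts, alpha=0.1, k=4.0):
    """Flag minutes whose count exceeds mean + k * std, where mean and
    variance are tracked with exponential moving averages (EWMA)."""
    mean, var = float(counts[0]), 0.0
    anomalies = []
    for i, c in enumerate(counts[1:], start=1):
        std = var ** 0.5
        if std > 0 and c > mean + k * std:
            anomalies.append((i, c))
        # Update the EWMA estimates after the check, so a spike does not
        # contaminate the baseline it is being compared against.
        diff = c - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return anomalies

# A human operator at ~5 requests/minute, then a machine-speed burst.
traffic = [5, 6, 4, 5, 6, 5, 6, 180, 240, 210]
print(detect_rate_anomalies(traffic))  # -> [(7, 180), (8, 240)]
```

Note the limitation visible in the output: the burst quickly inflates the adaptive baseline, so the third spike minute is absorbed. Real defenses would combine rate with richer behavioral features, but the design choice, comparing each client against its own history rather than a global rule, is the part that scales.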
References
Anthropic. (2025). Disrupting the first reported AI-orchestrated cyber espionage campaign. https://www.anthropic.com/news/disrupting-AI-espionage
TOI Tech Desk. (2025, November 14). Anthropic “blames” Chinese hacker group of using Claude to spy on companies across the globe; says targeted large tech companies, financial institutions, and … The Times of India. https://timesofindia.indiatimes.com/technology/tech-news/anthropic-blames-chinese-hacker-group-of-using-claude-to-spy-on-companies-across-the-globe-says-targeted-large-tech-companies-financial-institutions-and-/articleshow/125318723.cms
Down, A. (2025, November 14). AI firm claims it stopped Chinese state-sponsored cyber-attack campaign. The Guardian. https://www.theguardian.com/technology/2025/nov/14/ai-anthropic-chinese-state-sponsored-cyber-attack
Stroppa, M. (2023). Legal and ethical implications of autonomous cyber capabilities: a call for retaining human control in cyberspace. Ethics and Information Technology, 25(1). https://doi.org/10.1007/s10676-023-09679-w
The Economist. (2025, November 19). How China-linked hackers co-opted Anthropic’s Claude. https://www.economist.com/china/2025/11/19/how-china-linked-hackers-co-opted-anthropics-claude
Tidy, J. (2025, November 14). AI firm claims Chinese spies used its tech to automate cyber attacks. BBC. https://www.bbc.com/news/articles/cx2lzmygr84o