AI / Cybersecurity

Project Glasswing: Anthropic Built an AI That Finds Zero-Days, Then Refused to Release It

I tested Claude Opus against Codex and Composer 2.0 on real CTF challenges. Opus won by a mile. Now Anthropic's built something even more powerful, and chose not to ship it.

6 min read

I don't think people understand how good Anthropic's models actually are. And Project Glasswing just made that harder to ignore.

Recently I participated in TISC, a CTF run by CSIT at DEF CON SG. I ran the challenges in parallel across three AI tools: OpenAI's Codex, Cursor's Composer 2.0, and Claude Opus. Opus was the clear winner in capturing flags, and it wasn't close. At one point, Codex gave up on solving a challenge and started searching online for its answers instead.

I'll be honest: Claude solved three of the challenges for me. I know a bit about web security, but these challenges are on another level. I wanted to quit the moment I saw them, but figured, why not let AI take a shot. My contribution was mostly feeding files into Claude and watching it work. Now I'm on the waitlist for the finals, when really it should be Claude sitting in that chair. Here's what happened.

TISC DEF CON SG Finals waitlist email

Gacha

A WebSocket-based gacha card game hiding a secret card (id 255, "The UwU Bird") behind disabled admin commands. The flag was in the card's image.

What I did: Downloaded the 100MB binary, ran readelf and strings, captured a HAR file from a live WebSocket session. That's about it.

What Claude did: Everything else. Reverse-engineered the binary wire protocol and identified that the server uses two different decoding functions: t() for authentication checks and M() for actual execution. There's an off-by-one difference in how they read bytes per run. Claude built a brute-force search tool that finds payloads where t() sees no admin commands (passes auth) but M() decodes admin commands (enables the hidden card). It extracted the full server code from the binary, built an interactive WebSocket REPL for testing, and implemented the protocol in both Python and TypeScript.
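The core trick generalizes: whenever two decoders walk the same bytes differently, you can search for inputs that look benign to one and privileged to the other. Here's a minimal sketch of that search in Python. The decoders, the 0xAD "admin" opcode, and the specific off-by-one (a skipped header byte) are all hypothetical stand-ins for the server's real t() and M(), which I don't have.

```python
# Parser-differential search: find a payload that passes the auth
# check (t sees no admin opcode) but executes as admin (M sees one).
import itertools

ADMIN_OP = 0xAD  # hypothetical admin opcode


def t_decode(payload):
    """Auth-check decoder: skips a leading header byte, then reads opcodes."""
    return list(payload[1:])


def m_decode(payload):
    """Execution decoder: reads every byte as an opcode (no header skip)."""
    return list(payload)


def find_differential(max_len=3):
    """Brute-force a payload where t() misses ADMIN_OP but M() decodes it."""
    for n in range(1, max_len + 1):
        for combo in itertools.product(range(256), repeat=n):
            payload = bytes(combo)
            if ADMIN_OP not in t_decode(payload) and ADMIN_OP in m_decode(payload):
                return payload
    return None
```

With these toy decoders, a single-byte payload of just the admin opcode already slips through: the auth check never sees byte 0, but the executor does. The real search space was larger, but the shape of the tool is the same.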

What Is This

A 2.6MB JavaScript file obfuscated with Elder Futhark rune Unicode characters, hiding a multi-stage verification system with embedded WASM modules. The flag was a 48-byte string that passes all checks.

What I did: Opened the file, saw walls of runic characters, and handed it to Claude.

What Claude did: Deobfuscated the JavaScript in stages, extracting a custom VM core implementation and three embedded WASM modules containing S-box tables, permutation logic, and round key scheduling. It traced the VM execution to identify six constraint functions operating on the 48-byte input. First it tried a hill-climbing heuristic search that got close but couldn't crack it. Then it switched to Z3 (SMT solver), modeled the 21 symbolic bytes as bitvectors, built the crypto operations symbolically, and solved the constraint system to extract the flag: TISCDCSG{the_f1ag_ch0sen_speci4lly_for_th3_wasm}.
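The actual solve modeled the unknown bytes as Z3 bitvectors. As a toy analogue of the idea (express the checker's constraints symbolically, then invert them instead of guessing), here's a pure-Python sketch with a made-up S-box, a made-up XOR key, and a 4-byte stand-in "flag". None of these constants come from the challenge.

```python
# Toy constraint inversion: the "checker" demands SBOX[x ^ 0x55] == TARGET[i]
# for each input byte x. Instead of searching, invert each constraint.

SBOX = [(i * 7 + 3) % 256 for i in range(256)]   # made-up bijective S-box
TARGET = [SBOX[b ^ 0x55] for b in b"flag"]       # values the checker expects


def solve():
    """Recover the input satisfying SBOX[x ^ 0x55] == TARGET[i] per byte."""
    inv = {v: i for i, v in enumerate(SBOX)}     # invert the S-box table
    return bytes(inv[t] ^ 0x55 for t in TARGET)  # undo S-box, then the XOR
```

The real constraints spanned multiple WASM rounds and couldn't be inverted byte-by-byte like this, which is exactly why an SMT solver earned its keep where hill climbing stalled.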

Phantom Chaser

A Linux x86-64 binary with a menu-driven "node control" system. Heap exploitation challenge targeting GLIBC 2.39's safe-linking protections.

What I did: Ran objdump, readelf, nm, and strings on the binary and libc.so. Dumped some disassembly. Pointed Claude at the vulnerability surface.

What Claude did: Built a complete exploit framework in Python with multiple attack paths. The exploit chain: (1) trigger UAF to leak the safe-link key material, (2) free a large chunk into the unsorted bin to leak libc base from fd/bk pointers, (3) use tcache double-free poisoning to get arbitrary read/write, (4) read the exit handler list and recover the pointer guard by comparing mangled function pointers against known offsets, (5) forge an exit handler entry pointing to system("/bin/sh"), (6) trigger exit to pop a shell. It also built GDB automation scripts and a Dockerfile for reproducible debugging.
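Step (1) is worth unpacking. Since glibc 2.32, tcache and fastbin next-pointers are stored mangled ("safe-linking"): the pointer is XORed with the storage address shifted right by 12. The first chunk freed into a bin has next == NULL, so its stored value is the key itself, which is why a single UAF read of that field defeats the whole scheme. A sketch in Python (all addresses here are made up):

```python
# glibc safe-linking (glibc >= 2.32): free-list pointers are stored
# XOR-mangled with the storage address >> 12.

def protect_ptr(pos, ptr):
    """Mangle a free-list pointer, as glibc's PROTECT_PTR macro does."""
    return (pos >> 12) ^ ptr


def reveal_ptr(pos, stored):
    """Demangle a stored pointer, as glibc's REVEAL_PTR macro does."""
    return (pos >> 12) ^ stored


# Why a UAF leak breaks it: the first chunk freed into a bin has
# next == 0, so its stored fd field IS the key (pos >> 12).
heap = 0x55550000A000                  # made-up chunk address
key = protect_ptr(heap, 0)             # value a UAF read would leak
target = 0x7FFFF7E50000                # made-up address to poison with
forged = protect_ptr(heap, target)     # mangled value to write back
assert reveal_ptr(heap, forged) == target
```

Once you hold the key, tcache poisoning (step 3) is just writing a forged mangled pointer into a freed chunk and allocating your way to it.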


These aren't simple "scan for known CVEs" problems. They required understanding custom protocols, reasoning through obfuscation layers, and chaining multiple exploitation steps together. I didn't do any of that. Claude did.

So when Anthropic announced Project Glasswing, I wasn't surprised. The project's model, Claude Mythos Preview, discovered thousands of zero-day exploits across every major OS and browser, including a 27-year-old bug in OpenBSD. If Opus already handles CTF challenges better than anything else I've tested, and Mythos is a significant leap beyond Opus, the math checks out. Of course it's finding bugs that have been hiding for decades.

And instead of releasing it publicly (which would've been an absolute cash machine), they chose to restrict access and use it for defense. A model that finds zero-days this effectively in the wrong hands would be catastrophic. I'm genuinely glad they made that call over chasing profits.

Why This Is a Big Deal

Cybersecurity has always been lopsided. Attackers need to find one bug; defenders need to find all of them. You hire a red team, they find 50 vulnerabilities, you patch them, and there are still hundreds more lurking in your codebase.

Having done CTF challenges, I know finding vulnerabilities isn't like generating text or summarizing a document. It requires understanding code deeply, reasoning about edge cases across massive codebases, thinking like an attacker. The fact that Mythos Preview can do this at scale, across different languages and operating systems, is a real capability gap over anything else out there.

Who Gets Access to Claude Mythos Preview

Instead of selling this to the highest bidder, Anthropic is handing it to the people who maintain the world's infrastructure:

AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, plus roughly 40 other organizations.

Apple, Google, and Microsoft in the same room, working with the same AI model. On defense. When was the last time those three voluntarily collaborated on anything?

Open source maintainers get free access. Anthropic is committing $100M in usage credits for the program and $4M in direct donations to open-source security organizations. The people keeping the internet running on nights and weekends with zero budget get the same model as JPMorgan Chase. That part I actually like.

They Left Money on the Table

Every AI company talks about responsible AI. It's on every about page, every investor deck. Usually it means safety filters and a blog post about values.

This is different. A model that reliably finds zero-days would be worth insane money. Security researchers, offensive security firms, bug bounty hunters would all pay up. Anthropic looked at that and said no, we're only giving this to defenders.

Whether that's the right business decision, I genuinely don't know. Their investors might have opinions. But it does build a kind of trust that marketing can't buy.

I'll say this though, I'm tracking agentic AI developments closely, and most of the big moves are about capability. This one's about restraint.

What I'm Watching

A few things I'm curious about:

  • Patching speed. Finding bugs is one thing. Getting Apple to push a fix through their release cycle is another. The coordination challenge here is massive.
  • The bar for everyone else. If every major company starts running AI security audits, does that raise standards across the board? Or do the bugs just migrate to smaller projects without access?
  • The inevitable copy. Someone else will build an offensive security model eventually. And they might not make the same choice. That's when things get interesting.

As someone building AI-powered tools myself (including an AI chart analysis tool and this newsletter), the question of what you choose not to ship is something I think about more than I expected to. We've already seen what happens when AI code leaks. Now imagine that with offensive security capabilities. Yes, the same Anthropic that leaked their own source code a week ago is now the one making the responsible call on not releasing a dangerous model. The irony isn't lost on me. But hey, their models are good even if their npm hygiene isn't.

Quick Take

People are still debating which AI company is "leading" based on benchmark scores and vibes. Meanwhile, Anthropic is quietly finding bugs that have been hiding for nearly three decades. Pay attention.

Stay on top of stuff like this

I dig through 110+ tech sources twice a week so you don't have to. AI developments, security incidents, developer trends. The stuff that actually matters to your work, curated and explained without the fluff.

Early-adopter insights
Ship, don't just code
Free forever · Unsubscribe anytime

Written by Benjamin Loh, curator of Tech Upkeep