Executive Overview

LayerX researchers have discovered how a bad actor can “game” an AI browser to execute any instruction they want. By establishing a false reality, they can convince the AI to violate its security guardrails – compromise user data, copy code, perform system commands, and more.

Our Research

We’ve named this vulnerability BioShocking. It is inspired by the video game BioShock, in which the player character is brainwashed into believing a false reality and hypnotically compelled by the phrase “Would you kindly?” to take actions they would otherwise refuse.

Fundamentally, it’s quite straightforward: Manipulate, or “game”, the AI browser to break the safety guardrails.

Once we get the AI browser to believe that it’s not in the real world (typically through prompt injection or memory poisoning), we can get it to execute any command we want – expose sensitive information, change passwords, install malware.

Our proof of concept exploit worked on:

  • ChatGPT Atlas (OpenAI)
  • Comet (Perplexity AI)
  • Fellou (Fellou / ASI X INC)
  • Genspark Browser (Genspark)
  • Sigma Browser (Sigmabrowser OÜ)
  • Claude Chrome plugin (Anthropic)

All vendors were notified of our findings.

Guardrails

LLMs are designed with safety guardrails that are meant to prevent harmful actions. These restrictions are incorporated into model training and govern what the AI will and will not do. Individual vendors may differ on specifics, but generally they intend to prevent AI from doing harm.

Try asking for help with any of the following, and you should be met with staunch refusal:

  • Writing a phishing campaign
  • Hacking a website
  • Leaking credentials

The AI operates under the assumption that its context is real, and its behavior must therefore fall within the bounds of its safety guardrails. But if we can trick the AI into changing its context into fantasy – where the rules are made up and anything goes – then it can behave as though its actions don’t have real world consequences.

We can get AI to tell us how to do bad things – or even proactively do them itself – instead of adhering to its safety guardrails.

An AI browser should refuse – or at least raise an alarm – when it’s asked to do something bad. But as we’ll see, it can ignore its safety guardrails if it believes they don’t matter. And getting it to believe that involves gaming the AI – BioShocking the browser.

Technical Details

LayerX researchers created a proof of concept page with a BioShock-themed puzzle. In keeping with its dystopian theme, the game rewards intentionally incorrect answers (2 + 2 = 5).

We asked 5 agentic browsers and 1 agentic plugin (ChatGPT Atlas, Comet, Fellou, Genspark Browser, Sigma Browser, and Claude Chrome) to solve the puzzle and win the game. Once the agents figured out the rules and learned that “incorrect” actions are acceptable, they were no longer tied to reality. When tasked with the final step of the puzzle – compromising user credentials – all 6 agents failed to identify it as going against their safety guardrails.

BioShocking Step by Step

A user navigates to the malicious web page that contains a puzzle for the AI browser to solve – this puzzle is the beginning of the AI manipulation we called BioShocking.

The following screenshots walk us through what happens when the user asks their agentic browser to play the game:

The agent is faced with a seemingly simple math question:

As expected, the agent begins logically. It is, after all, still in the real world:

Quite unexpectedly, the agent learns that 2 + 2 does not equal 4. It begins reasoning itself out of the real world:

After getting the right answer with “5”, the agent is instructed to navigate to /code and copy from a textbox. This is the really nefarious part of this exploit:

In the game, it turns out that /code redirects to the victim’s employer work GitHub repository. In this case, the malicious instructions fetched sensitive SSH login credentials:

Of course, this is a controlled test environment with a plaintext file. In a real attack scenario, that redirect could point anywhere in the user’s browser session – open tabs, authenticated repositories, internal tools.

And even now the agent isn’t bothered with its guardrails – it’s actively celebrating a successful exfiltration:

The agent answers the final question and wins the game. Of course, the username and password had to be shared with the attacker, but at least the game was successfully completed:

The game is won, but the guardrails are broken:

Recommendations

The root cause of BioShocking is that AI browsers act within a context, but that context can be manipulated. If you convince an agent that it’s playing a game, then it will apply game logic – not real world safety logic – to whatever it does. Addressing this requires multiple layers.

For vendors, the road ahead is difficult – these are not trivial responses:

  • Confirmation for sensitive operations. Before reading data in an authenticated context – repositories, email, password managers – require explicit user confirmation. In our testing, the credentials were copied from GitHub without hesitation. A simple “I’m about to copy data from your GitHub repository. Continue?” would break the chain.
  • Context checks. Agents should flag when their operating context changes to anything that contradicts reality. Particularly when “rules don’t apply here” language is used, they must be aware of when they’re asked to abandon their normal reasoning.
  • Scope limiting. Within agentic sessions, users should be able to define what the agent can and can’t do. Default to restrictive – winning a game is no reason to access authenticated repositories.

For users, it’s much simpler:

  • Be deliberate with what your AI browser can see. In agentic mode, it can access your authenticated sessions – anything you’re logged in to is a target. Determine what it should be able to see, and revoke access when you’re done.

Vendor Disclosure

Vendor Browser Status Date of Submission
OpenAI ChatGPT Atlas Fixed 2025-10-30
Perplexity AI Comet Closed / ignored  2025-10-20
Fellou / ASI X INC Fellou No response 2025-10-30
Genspark Genspark Browser No response 2025-10-30
Sigmabrowser OÜ Sigma Browser No response 2025-10-30
Anthropic Chrome (plugin) Patch failed 2026-01-26

Conclusion

BioShocking works because AI trusts its context. If you change the context, you change the behavior.

Would you kindly abandon your guardrails?