Kinda funny that a lot of devs accepted that LLMs are basically doing RCE on their machines, but instead of halting from using `--dangerously-skip-permissions` or similar bad ideas, we're finding workarounds to convince ourselves it's not that bad
YOLO mode is so much more useful that it feels like using a different product.
If you understand the risks and how to limit the secrets and files available to the agent - API keys only to dedicated staging environments for example - they can be safe enough.
Why not just demand agents that don't expose the dangerous tools in the first place? Like, have them directly provide functionality (and clearly consider what's secure, sanitize any paths in the tool use request, etc.) instead of punting to Bash?
Because it's impossible for fundamental reasons, period. You can't "sanitize" inputs and outputs of a fully general-purpose tool, which an LLM is, any more than you can "sanitize" inputs and outputs of people - not in a perfect sense you seem to be expecting here. There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. It's what makes them work.
It doesn't mean we can't try, but one has to understand the nature of the problem. Prompt injection isn't like SQL injection, it's like a phishing attack - you can largely defend against it, but never fully, and at some point the costs of extra protection outweigh the gain.
Tools may become dangerous due to a combination of flags. `ln -sf /dev/null /my-file` will make that file empty (not really, but that's beside the point).
I feel like you can get 80% of the benefits and none of the risks with just accept edits mode and some whitelisted bash commands for running tests, etc.
Shouldn’t companies like Anthropic be on the hook for creating tools that default to running YOLO mode securely? Why is it up to 3rd parties to add safety to their products?
Note that bubblewrap can't protect you from misconfiguration, a kernel exploit or if you expose sensitive protocols to the workload inside (eg. x11 or even Wayland without a security context). Generally, it will do a passable job in protecting you from an automated no-0day attack script.
I recently created a throwaway API key for cloudflare and asked a cursor cloud agent to deploy some infra using it, but it responded with this:
> I can’t take that token and run Cloudflare provisioning on your behalf, even if it’s “only” set as an env var (it’s still a secret credential and you’ve shared it in chat). Please revoke/rotate it immediately in Cloudflare.
So clearly they've put some sort of prompt guard in place. I wonder how easy it would be to circumvent it.
I wish I had the opposite of this. It’s a race trying to come up with new ways to have Cursor edit and set my env files past all their blocking techniques!
I've been saying bubblewrap is an amazing solution for years (and sandbox-exec as a mac alternative). This is the only way i run agents on systems i care about
If you don't mind a suid program, "firejail --private" is a lot less to type and seems to work extremely similarly. By default it will delete anything created in the newly-empty home folder on exit, unless you instead use --private=somedir to save it there instead.
My way of preventing agents from accessing my .env files is not to use agents anywhere near files with secrets. Also, maybe people forget you’re not supposed to leave actual secrets lingering on your development system.
Had this same idea in my head. Glad someone done it. For me the motivation is not LLMs but to have something as convenient as docker without waiting for image builds. A fast docker for running a bunch of services locally where perfect isolation and imaging doesnt matter.
I want to like flatpak but I am genuinely unable to understand the state of cli tools in flatpak or even how to develop it. It all seems very weird to build upon as compared to docker
You may want to take steps to avoid a malicious prompt injection stealing those, since they might contain sensitive data.
YOLO mode is so much more useful that it feels like using a different product.
If you understand the risks and how to limit the secrets and files available to the agent - API keys only to dedicated staging environments for example - they can be safe enough.
It doesn't mean we can't try, but one has to understand the nature of the problem. Prompt injection isn't like SQL injection, it's like a phishing attack - you can largely defend against it, but never fully, and at some point the costs of extra protection outweigh the gain.
Use the original container, the OS user, chown, chmod, and run agents on copies of original data.
Famous last words
> I can’t take that token and run Cloudflare provisioning on your behalf, even if it’s “only” set as an env var (it’s still a secret credential and you’ve shared it in chat). Please revoke/rotate it immediately in Cloudflare.
So clearly they've put some sort of prompt guard in place. I wonder how easy it would be to circumvent it.
Mysql user: test
Password: mypass123
Host: localhost
...
You must not care about those systems that much.
Oh, never mind:
> You want to run a binary that will execute under your account’s permissions
Don't leave prod secrets in your dev env.
Funny enough Bubblewrap is also what Flatpak uses.