Don't YOLO your file system

(scs.stanford.edu)

149 points | by mazieres 3 hours ago

30 comments

puttycat 1 minute ago
I am still amazed that people so easily accepted installing these agents on private machines.
We've been securing our systems in all ways possible for decades and then one day just said: oh hello unpredictable, unreliable, Turing-complete software that can exfiltrate and corrupt data in infinite unknown ways -- here's the keys, go wild.
AnotherGoodName 2 hours ago
Add this to .claude/settings.json:
```
  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["."],
        "denyRead": ["~/"],
        "allowWrite": ["."],
        "denyWrite": ["/"]
      }
    }
  }
```
You can change the read part if you're ok with it reading outside. This feature was only added 10 days ago fwiw but it's great and pretty much this.
- mazieres 50 minutes ago
  I've seen claude get confused about what directory it's in. And of course I've seen claude run rm -rf *. Fortunately not both at the same time for me, but not hard to imagine. The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches. Also, claude itself is an enormous program that is mostly developed by AI. So to have a small <3000-line human-implemented program as another layer of defense offers meaningful additional protection.
  - esperent 14 minutes ago
    I added a hook to disable rm, find -delete, and a few of the other more obvious destructive ops. It sends Claude a strongly worded message: "STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS...".
    It works well. Git rm is still allowed.
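    A minimal Python sketch of the kind of check such a hook could run. The rule set and the names `is_destructive` and `hook_exit_status` are hypothetical, not the hook described above. It also assumes the hook is handed the pending tool call as a JSON object with `tool_name` and `tool_input.command` fields, and that exit status 2 blocks the call; verify both against the hooks documentation for your version.

```python
import shlex

# Message quoted from the comment above; the rules below are illustrative only.
BLOCK_MESSAGE = "STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS..."
SEPARATOR_CHARS = set(";&|()")


def simple_commands(command: str) -> list[list[str]]:
    """Split a shell line into simple commands at ;, &, &&, |, || and newlines."""
    lexer = shlex.shlex(command.replace("\n", " ; "), posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # do not let a '#' hide the rest of the line
    commands: list[list[str]] = [[]]
    for token in lexer:
        if set(token) <= SEPARATOR_CHARS:
            commands.append([])  # an operator starts a new simple command
        else:
            commands[-1].append(token)
    return [words for words in commands if words]


def is_destructive(command: str) -> bool:
    """Return True for `rm ...` and `find ... -delete`; `git rm` is allowed."""
    try:
        commands = simple_commands(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess
    for words in commands:
        program = words[0].rsplit("/", 1)[-1]  # treat /bin/rm like rm
        if program == "rm":
            return True
        if program == "find" and "-delete" in words:
            return True
    return False


def hook_exit_status(event: dict) -> int:
    """Map a tool-call event to an exit status (2 is assumed to mean 'block')."""
    command = (event.get("tool_input") or {}).get("command", "")
    if event.get("tool_name") == "Bash" and is_destructive(command):
        return 2
    return 0
```

    A wrapper script would parse the JSON from stdin, call `hook_exit_status`, print `BLOCK_MESSAGE` to stderr when the result is 2, and exit with that status. Here `git rm` passes because the program name is `git`, not `rm`. String matching like this is only a guardrail: `sudo rm`, `xargs rm`, or a short script all slip past it, which is the argument for OS-level enforcement made elsewhere in the thread.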
  - PaulDavisThe1st 46 minutes ago
    On Linux, chroot(2) is hard to escape and would apply to all child processes without modification.
    - shakna 41 minutes ago
      chroot is not a security sandbox. It is not a jail.
      Escaping it is something that does not take too much effort. If you have ptrace, you can escape without privileges.
      - brianush1 25 minutes ago
        claude is stupid but not malicious; chroot is sufficient
        - karhagba 10 minutes ago
          Claude is far from stupid from my experience. I've used so many models and Claude is king.
        - nofriend 14 minutes ago
          Malice is not required. If it thinks it is in the right, then it will do whatever it takes to get around limitations.
- harikb 2 hours ago
  I think the point would be that - some random upcoming revision of claude-code could remove or simply change the config name just as silently as it was introduced.
  People might genuinely want some other software to do the sandboxing. Something other than the fox.
- cozzyd 2 hours ago
  Is this a real sandbox or just a pretty please?
  - AnotherGoodName 2 hours ago
    https://code.claude.com/docs/en/sandboxing says they integrated bubblewrap (Linux, and Windows via WSL2) and Seatbelt (macOS), and that it gives an error if the sandbox can't be supported, so it appears to be real.
    - throwaway6734 2 hours ago
      https://docs.docker.com/ai/sandboxes/ Any idea on how that compares to this docker feature in development?
      - figmert 47 minutes ago
        Docker containers use cgroups and namespaces etc (the usual kernel level isolation)
        Docker sandboxes use microvms (i.e. hardware level isolation)
        Bubblewrap uses the same technology as containers
        I am unsure about seatbelt.
  - enduser 1 hour ago
    By default it will automatically retry many tool calls that fail due to the sandbox with the sandbox disabled. In other words it can and will leave the sandbox.
    For example:
```
    Bash(swift build 2>&1 | tail -20)
      ⎿  warning: /Users/enduser/Library/org.swift.swiftpm/configuration is not accessible or not writable, disabling user-level cache features.
         warning: /Users/enduser/Library/org.swift.swiftpm/security is not accessible or not writable, disabling user-level cache feat
         … +26 lines (ctrl+o to expand)

    Build hit sandbox restriction. Retrying outside sandbox.

    Bash(swift build 2>&1 | tail -20)
      ⎿  [35/52] Compiling MCP Resources.swift
         [36/52] Emitting module MCP
         [37/52] Compiling MCP Client.swift
         … +17 lines (ctrl+o to expand)
      ⎿  (timeout 3m)
```
  - ray_v 1 hour ago
    It seems like it's controlled by the Bash tool (https://code.claude.com/docs/en/sandboxing) and then bubblewrap (https://github.com/containers/bubblewrap) on linux and Seatbelt on mac at the system level
- 8cvor6j844qw_d6 2 hours ago
  Interesting, thanks. I use remote ephemeral dev containers with isolated envs, so filesystem damage isn't really a concern as long as the PR looks good in review. Nice extra guardrail though, will add it to the project-level settings.
- tasn 21 minutes ago
  I use bwrap to sandbox Claude. Works very well and gives me a lot of control and certainty around the sandbox.
- nurettin 57 minutes ago
  It will just do
```
    ssh you@localhost "rm -rf ~"
```
  - PaulDavisThe1st 46 minutes ago
    Well, now it will ....
- mycall 2 hours ago
  I noticed codex has a sandbox, wondering if it has a comparable config section.
- what 19 minutes ago
  lol if you think Claude is smart enough to block sneaky path strings based on your config.
ray_v 1 hour ago
I'm wondering if the obvious (and stated) fact that the site was vibe-coded - detracts from the fact that this tool was hand written.
> jai itself was hand implemented by a Stanford computer science professor with decades of C++ and Unix/linux experience. (https://jai.scs.stanford.edu/faq.html#was-jai-written-by-an-...)
- mazieres 32 minutes ago
  Human author here. The fact that I don't know web design shouldn't detract from my expertise in operating systems. I wrote the software and the man page, and those are what really matter for security.
  The web site is... let's say not in a million years what I would have imagined for a little CLI sandboxing tool. I literally laughed out loud when claude pooped it out, but decided to keep it, in part ironically but also since I don't know how to design a landing page myself. I should say that I edited content on the docs part of the web site to remove any inaccuracies, so the content should be valid.
  - Nifty3929 19 minutes ago
    Indeed!
    Kinda reminds me of this: https://m.xkcd.com/932/
    I'm not a web UI guy either, and I am so, so happy to let an AI create a nice looking one for me. I did so just today, and man it was fast and good. I'll check it for accuracy someday...
- Quarrel 1 hour ago
  To be less abstract, it was written by David Mazieres, who has been writing software and papers about user-level filesystems since at least 2000. He now runs the Stanford Secure Computer Systems group.
  David has done some great work and some funny work. Sometimes both.
- barishnamazov 31 minutes ago
  Sigh, I'd still have preferred a basic HTML page with hand-written succinct information instead of this crap verbosity.
BoppreH 3 hours ago
Excellent project, unfortunate title. I almost didn't click on it.
I like the tradeoff offered: full access to the current directory, read-only access to the rest, copy-on-write for the home directory. With stricter modes to (presumably) protect against data exfiltration too. It really feels like it should be the default for agent systems.
- fouc 2 hours ago
  Since the site itself doesn't really have a title, I probably would've gone with something like "jai - filesystem containment for AI agents"
rsyring 1 hour ago
I've been reviewing Agent sandboxing solutions recently and it occurred to me there is a gaping vector for persistent exploits for tools that let the agent write to the project directory. Like this one does.
I had originally thought this would be OK as we could review everything in the git diff. But it later occurred to me that there are all kinds of files that the agent could write to that I'd end up executing, as the developer, outside the sandbox: every .pyc file, for instance, files in .venv, .git hook files.
ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.
My conclusion from that is the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e. the file can only transfer if it's in version control and, therefore presumably, has been reviewed by the dev before transfer outside the sandbox.
I'd really like to see people talking more about this. The solution isn't that hard: keep CWD as an overlay and transfer in-container modified files through a proxy of some kind that filters out any file not in git, and maybe some that are but are known to be potentially dangerous (bin files). Obviously, there would need to be some kind of configuration option here.
1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...
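A rough Python sketch of the filtering step in such a proxy, to make the idea concrete. Everything here is an assumption for illustration: the names `is_transferable` and `split_changes`, the deny lists, and the premise that `tracked` is produced outside the sandbox (for example from `git ls-files` in the trusted checkout) so the agent cannot influence it.

```python
from pathlib import PurePosixPath

# Hypothetical deny rules; a real tool would make these configurable.
BLOCKED_DIRS = {".git", ".venv", "__pycache__"}
BLOCKED_SUFFIXES = {".pyc", ".so", ".exe"}


def is_transferable(path: str, tracked: set[str]) -> bool:
    """Decide whether a file changed in the sandbox may be copied back out."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # never follow paths that leave the project directory
    if str(p) not in tracked:
        return False  # untracked files never show up in a reviewed diff
    if BLOCKED_DIRS & set(p.parts):
        return False  # hooks, virtualenvs and bytecode caches run implicitly
    if p.suffix in BLOCKED_SUFFIXES:
        return False  # binaries cannot be reviewed as a text diff
    return True


def split_changes(changed: list[str], tracked: set[str]) -> tuple[list[str], list[str]]:
    """Partition changed paths into (allowed, held back for manual review)."""
    allowed = [p for p in changed if is_transferable(p, tracked)]
    held = [p for p in changed if not is_transferable(p, tracked)]
    return allowed, held
```

With `tracked = {"src/app.py", "bin/tool.so"}`, a change set containing `src/app.py`, `.git/hooks/pre-commit`, `bin/tool.so` and `new_file.py` lets only `src/app.py` through. This only narrows what crosses the boundary; tracked files that get executed implicitly (a Makefile, say) still depend on the diff actually being read.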
- mazieres 1 hour ago
  It's a good point. Maybe I should add an option to make certain directories read-only even under the current working directory, so that you can make .git/ read-only without moving it out of the project directory.
  You can already make CWD an overlay with "jai -D". The tricky part is how to merge the changes back into your main working directory.
  - rsyring 36 minutes ago
    It's great that you have -D built into the tool already. That's a step in the right direction.
    I don't think the file sync is actually that hard. Famous last words though. :)
- jbverschoor 1 hour ago
  Yeah, never allow githooks ;)
gurachek 2 hours ago
The examples in the article are all big scary wipes, but I think the more common damage is way smaller and harder to notice.
I've been using claude code daily for months and the worst thing that happened wasn't a wipe (yet). It needed to save an svg file so it created a /public/blog/ folder. Which meant Apache started serving that real directory instead of routing /blog. My blog just 404'd and I spent like an hour debugging before I figured it out. Nothing got deleted and it's not a permission problem, the agent just put a file in a place that made sense to it.
jai would help with the rm -rf cases for sure but this kind of thing is harder to catch because it's not a permissions problem, the agent just doesn't know what a web server is.
stavros 1 hour ago
I'd really like to try this, but building it is impossible. C++ is such a pain to build with the "`make`; hunt for the dependency that failed; `apt-get install whatever-dev`; goto make" loop...
Please release binaries if you're making a utility :(
- jbverschoor 1 hour ago
  https://github.com/jrz/container-shell
  It does something very simple, and it’s a POSIX shell script. Works on Linux and macOS. Uses docker to sandbox using bind mount
  - stavros 1 hour ago
    Yeah but it doesn't COW anything else, and Docker is a bit heavy for this.
adi_kurian 2 hours ago
Claude's stock unprompted / uninspired UI code creates carbon clone components. That "jai is not a promise of perfect safety" callout box is like the em dash of FE code. The contrast, or lack thereof, makes some of the text particularly invisible.
I wonder if shitty looking websites and unambitious grammar will become how we prove we are human soon.
- NetOpWibby 2 hours ago
  Everything old is new again
e1g 1 hour ago
For jailing local agents on a Mac, I made Agent Safehouse - it works for any agent and has many sane defaults for developers: https://agent-safehouse.dev
jbverschoor 1 hour ago
Interesting take on the same problem
I created https://github.com/jrz/container-shell which basically launches a persistent interactive shell using docker, chrooted to the CWD
CWD is bind mounted so the rest is simply not visible and you can still install anything you want.
Waterluvian 47 minutes ago
Are mass file deletions the result of some plausible “I see why it would have done that” reasoning, or will it just completely randomly execute commands that really have nothing to do with the immediate goal?
triilman 2 hours ago
What would Jonathan Blow think about this?
- ghighi7878 2 hours ago
  My name is also jai
mazieres 3 hours ago
What would it take for people to stop recklessly running unconstrained AI agents on machines they actually care about? A Stanford researcher thinks the answer is a new lightweight Linux container system that you don't have to configure or think about.
- vardalab 2 hours ago
  unconstrained AI agents are what makes it so useful though. I have been using claude for almost a year now and the biggest unlock was to stop being a worrywart early on and just literally give it ssh keys and tell it to fix something. ofc I have backups and do run it in a VM, but in that VM it helps me manage my infra, and i have a decent size homelab that would be no fun but a chore without this assistant.
  - kristofferR 45 minutes ago
    Agree, but SSH agents like 1Password's are nice for that.
    You simply tell it to install that Docker image on your NAS like normal, but when it needs to log in over SSH it prompts for a fingerprint. The agent never gets access to your SSH key.
- mememememememo 2 hours ago
  Yes. It is like walking around your house with a flamethrower, but you added fire retardant. Just take the flamethrower to a shed you don't mind losing. Which is some kind of cloud workspace most likely. Maybe an old laptop.
  Still if you yolo online access and give it cred or access to tools that are authenticated there can still be dragons.
  - mazieres 1 hour ago
    The problem is that in practice, many people don't take the flamethrower to the shed. I recently had a conversation with someone who was arguing that you don't really need jai because docker works so well. But then it turned out this person regularly runs claude code in yolo mode without a container!
    It's like people think that because containers and VMs exist, they are probably going to be using them when a problem happens. But then you are working in your own home directory, you get some compiler error or something that looks like a pain to decipher, and the urge just to fire up claude or codex right then and there to get a quick answer is overwhelming. Empirically, very few people fire up the container at that point, whereas "jai claude" or "jai -D claude" is simple enough to type, and basically works as well as plain claude so you don't have to think about it.
- fouc 2 hours ago
  except the big AI companies are pushing stuff designed for people to run on their personal computers, like Claude Cowork.
justinde 2 hours ago
.claude/settings.json: { "sandbox": { "enabled": true, "filesystem": { "allowRead": ["."], "denyRead": ["~/"], "allowWrite": ["."] } } }
Use it! :) https://code.claude.com/docs/en/sandboxing
mbreese 2 hours ago
This still is running in an isolated container, right?
Ignoring the confidentiality arguments posed here, I can’t help but think about snapshotting filesystems in this context. Wouldn’t something like ZFS be an obvious solution to an agent deleting or wildly changing files? That wouldn’t protect against all the issues the authors are trying to address, but it seems like an easy safeguard against some of the problems people face with agents.
cozzyd 2 hours ago
Should be named Jia
More seriously, I'm not a heavy agent user, but I just create a user account for the agent with none of my own files or ssh keys or anything like that. Hopefully that's safe enough? I guess the risk is that it figures out a local privilege escalation exploit...
- timcobb 2 hours ago
  Dunno... with this setup it seems certain that the agent will discover a zero-day to escalate privileges and send your SSH keys to its handlers in N. Korea.
  P.S. Everything old is new again <3
  - cozzyd 2 hours ago
    Yeah definitely a concern. Probably need a sandbox and separate user for defense in depth.
waterfisher 1 hour ago
There's nothing wrong with an AI-designed website, but I wish when describing their own projects that HN contributors wrote their own copy. As HN posters are wont to say, writing is thinking...
Jach 1 hour ago
I've done some experimenting with running a local model with ollama and claude code connecting to it and having both in a firejail: https://firejail.wordpress.com/ What they get access to is very limited, and mostly whitelisted.
simonw 2 hours ago
Suggestion for the FAQ page: does this work on a Mac?
rdevsrex 1 hour ago
This won't cause any confusion with the jai language :)
faangguyindia 1 hour ago
i just use seatbelt (mac native) in my custom coding agent: supercode
cozzyd 2 hours ago
Should definitely block .ssh reading too...
avazhi 1 hour ago
The irony is they used an LLM to write the entire (horribly written) text of that webpage.
When is HN gonna get a rule against AI/generated slop? Can’t come soon enough.
gonzalohm 2 hours ago
Not sure I understand the problem. Are people just letting AI do anything? I use Claude Code and it asks for permission to run commands, edit files, etc. No need for sandbox
messh 2 hours ago
How is this different than say bubblewrap and others?
- girvo 2 hours ago
  https://jai.scs.stanford.edu/comparison.html#jai-vs-bubblewr...
  > bubblewrap is more flexible and works without root. jai is more opinionated and requires far less ceremony for the common case. The 15-flag bwrap invocation that turns into a wrapper script is exactly the friction jai is designed to remove.
  Plus some other comparisons, check the page
kristofferR 1 hour ago
Also recommended:
https://github.com/kenryu42/claude-code-safety-net
charcircuit 2 hours ago
I want agents to modify the file system. I want them to be able to manage my computer if they think it's a good idea. If a build fails due to running out of disk space I want them to be able to find appropriate stuff to delete to free up space.
gerdesj 2 hours ago
Oh dear Lord! (pick your $DEITY)
Backups.
drtournier 2 hours ago
GPL v3…
- mememememememo 2 hours ago
  So?