APP.md

Posted 2026-02-14 · 6 min read

For over two years now I've been working on different AI projects — a personal wearable, an open-source coding agent, and now an experimental personal agent I'm calling Daycare. They started as completely separate ideas, but one by one they all ended up in the same place: the thing that actually matters is letting the LLM roam free. Every interesting capability — answering emails, writing code, managing infrastructure — eventually requires the agent to have real access to real systems. And the moment it does, you're holding your breath.

Artifacts, Lovable, vibecoding apps, OpenClaw — they are all essentially the same. They let LLMs loose on real systems and the results are fascinating. I just asked my own R&D version of OpenClaw to teach itself to send and receive emails. It found the GitHub token, set my email to receive notifications for the repository, and then created an issue — thereby achieving the goal of sending an email. For the receiving side it created a 10-minute email address on two different web apps. Nobody told it how. It just figured it out.

This is what I want more of — LLMs roaming wildly across production systems, finding creative paths to get things done, composing tools in ways no one anticipated. The problem is obvious: the same creativity that finds a clever workaround to send an email can also delete your database. Most LLM-written code is not malicious. I think most of humanity's problems come not from malice but from stupidity, lack of attention, or other natural human failings — and LLMs have learned from this. An LLM is not sophisticated enough to break free completely (famous last words?), but it is more than capable of breaking things by accident. So the question is: how do you unleash this without the bad stuff happening?


Thankfully we have already solved this problem — just not for LLMs. Running untrusted code on mobile platforms and in browsers is exactly what sandboxed apps are for, and we've been refining them for decades. There are three approaches. The "web way", where anyone can do anything but you are completely isolated from everyone else and must build a backchannel to connect apps. The "Android way", where you get a true Linux-sandboxed app with a permissions list baked into the package — you read the list before installing and decide if you agree. And the "Apple way", where your app is completely separated and must ask for permission to access anything, one capability at a time. Apps used to roam wildly on the system doing whatever they wanted, but we tightened them up over the decades, and now you can be comfortable knowing that some obscure software for some Chinese gadget you got from AliExpress won't exfiltrate everything from your iPhone. We need exactly the same feeling for LLM agents.

Only the Apple way survived and is flexible enough for what we need (Apple has other problems, but they are more corporate than technical). Android had to rebuild its permission system around runtime prompts and abandon the old install-time model. The web way is too limiting — each app must integrate with everyone separately (why do we need to do OAuth with everyone?). The only limitation of the Apple model is that it doesn't allow apps to communicate freely, but I think we can expand it to cover this with an additional permission prompt. So the model is clear: LLM apps should have their own sandbox, a tight permission system, and private and shared storage. An app's input is a prompt and its output is text. In between, an app can write files to shared folders and tell the caller about them. Apps should be marked as single-threaded or thread-safe (the LLM can pick for itself!). Apps have access to your inference providers. Apps are skills with extra prompts and guardrails.
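To make this concrete, here is a minimal sketch of what a manifest for such a sandboxed LLM app could look like. The names (`AppManifest`, the permission strings, the `concurrency` flag) are hypothetical, not an existing spec; the point is that the runtime contract is just a prompt in, text out, plus capabilities declared up front.

```typescript
// A hypothetical manifest for a sandboxed LLM app. The only runtime contract is
// text in, text out; everything else the app can touch is declared up front.
interface Permission {
  capability: string; // e.g. "github:issues:create"
  reason: string;     // why the app wants it, shown to the user at grant time
}

interface AppManifest {
  name: string;
  description: string;                            // read by users and by calling LLMs
  permissions: Permission[];                      // every capability must be declared and granted
  concurrency: "single-threaded" | "thread-safe"; // the calling LLM can pick accordingly
  storage: {
    private: boolean; // app-only scratch space
    shared: string[]; // shared folders the app may read and write
  };
  inference: boolean; // whether the app may call your inference providers
}

// The whole app: a prompt goes in, text comes out, files may land in shared storage.
interface LlmApp {
  manifest: AppManifest;
  run(prompt: string): Promise<string>;
}
```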


But the sandbox alone is not enough — apps must have a user-configurable permission system. The vanilla GitHub App would be able to create repositories and delete them — but do you want your LLM to be able to delete your production repo? You can see two UX styles emerging already: Codex is fire-and-forget, you don't even read what it does; Claude Code is the opposite, babysitting you through every diff. The YOLO style is clearly winning, but it only works if the permission layer underneath is solid. And that layer needs two kinds of rules. Hard rules — "do not call this specific command" — can be formalized in non-LLM code. Honest and reliable, but they won't scale. Soft rules are fuzzy: "don't do this on a non-work day", "don't do anything stupid". The incoming request can come from anywhere, and you can't expect it to know your specific rules for interacting with GitHub. The fuzzy rules are the most important ones, because they represent intent. "Don't break things" can mean many things, but keeping that intent as-is is crucial, because you can improve the reasoning and the rules in your sandbox over time. You can distill "don't break things" into hard rules to harden security on the fly — the system gets smarter and safer the longer it runs.
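As a minimal sketch of how the two kinds of rules could live in one permission layer: hard rules are plain deterministic checks, soft rules are text handed to a model at request time, and a soft rule that keeps denying the same shape of action can be distilled into a new hard rule. The helper names (`llm`, `judge`, `distill`) are assumptions for illustration, not an existing API.

```typescript
// Hypothetical hook into the sandbox's inference provider.
declare function llm(prompt: string): Promise<string>;

// Hard rules: deterministic, auditable, honest — but they don't scale.
const hardRules: Array<(action: string) => boolean> = [
  (action) => !action.startsWith("git push --force"),
  (action) => !/delete\s+repo/i.test(action),
];

// Soft rules: fuzzy intent, interpreted by a model for every incoming request.
const softRules: string[] = [
  "Don't touch production on a non-work day.",
  "Don't do anything that would be hard to undo.",
];

// Ask the model whether a proposed action violates a fuzzy rule.
async function judge(action: string, rule: string): Promise<boolean> {
  const verdict = await llm(
    `Rule: ${rule}\nProposed action: ${action}\nAnswer "allow" or "deny".`
  );
  return verdict.trim().toLowerCase().startsWith("allow");
}

async function isAllowed(action: string): Promise<boolean> {
  if (!hardRules.every((check) => check(action))) return false;
  for (const rule of softRules) {
    if (!(await judge(action, rule))) return false;
  }
  return true;
}

// Distillation: once a soft rule keeps denying the same pattern, freeze that
// pattern into a hard rule so the check becomes cheap, deterministic, and permanent.
function distill(pattern: RegExp): void {
  hardRules.push((action) => !pattern.test(action));
}
```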


Having a sandbox allows you to do wilder things — you can generate small self-hosted apps, safely deploy them as-is, and share them with people. These apps can then call other apps, or call inference directly. And everything is truly composable: you don't need to define any interfaces; everything is just a prompt. Models can generate deterministic code if they want to be more efficient.

This composability is the real unlock. You don't need API specs, you don't need SDKs, you don't need to wait for someone to build an integration. A prompt goes in, text comes out, files land in shared storage. One app calls another app the same way a user would — by asking. The LLM becomes both the glue and the runtime.
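A sketch of what that composition could look like at runtime, reusing the hypothetical `LlmApp` shape from the manifest above. The `sandbox.call` hook is an assumption: it routes a prompt to another installed app, subject to whatever permissions the caller was granted.

```typescript
interface Sandbox {
  // Route a prompt to another installed app and return its text reply.
  call(appName: string, prompt: string): Promise<string>;
}

// One app calling another: no API spec, no SDK — just a prompt and a text reply.
async function summarizeInbox(sandbox: Sandbox): Promise<string> {
  const listing = await sandbox.call(
    "email-reader",
    "List the subjects and senders of today's unread emails."
  );
  // Anything bigger than text (attachments, reports) would land in shared
  // storage, and the reply would just mention the file names.
  return sandbox.call(
    "summarizer",
    `Summarize this inbox listing in three bullet points:\n${listing}`
  );
}
```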

And because everything is sandboxed, you can download apps from strangers. Just like you install random apps from the App Store without worrying they'll wipe your phone, you should be able to grab an LLM app someone published, give it minimal permissions, and let it run. The sandbox means you don't need to trust the author — you just need to trust the container. This is what unlocks a real ecosystem. Without it, every agent tool is either something you built yourself or something from a big vendor you're praying didn't ship a bug. With it, anyone can publish, anyone can install, and the permissions system handles the rest.
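Concretely, installation could be little more than the user granting a subset of the permissions the app declared — the sketch below reuses the hypothetical `AppManifest` from earlier. The author stays untrusted; the container does the enforcing.

```typescript
// Install a third-party app by granting only some of the permissions it declared.
// Everything the manifest asked for but the user didn't grant is simply denied,
// so the author never needs to be trusted — only the container does.
function install(manifest: AppManifest, granted: string[]): Set<string> {
  const allowed = new Set<string>();
  for (const p of manifest.permissions) {
    if (granted.includes(p.capability)) allowed.add(p.capability);
  }
  return allowed; // enforced by the sandbox on every call the app makes
}

// Example: the app declared issue creation and repo deletion; grant only the first.
// install(strangerApp.manifest, ["github:issues:create"]);
```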

The dangerous path is what we have today — LLMs generating code that runs with full access to everything, where the only safety mechanism is a human skimming a diff they don't fully understand. The safe path is treating LLM-generated code the way we already treat untrusted mobile apps: sandboxed, permissioned, auditable. We solved this problem once. We just need to solve it again for agents.

@ex3ndr
Everything is in Public Domain