Hyperfile
A unified file system used jointly by agents and humans — an agent reads a file as Markdown; a human receives the file itself.
1. Purpose
Hyperfile is a single file system shared by agents and humans.
The same stored file serves both: an agent reads it as Markdown, and a human receives the original file.
2. Model
A file is:
- a logical file — a stable handle: name, owner, identity;
- an immutable, content-addressed version chain — each edit appends a new version; bytes are stored by their hash;
- a lazily-derived Markdown representation for agents — produced on first read, then cached.
A file belongs to exactly one agent, and through that agent to one org. Files are private per agent: one agent cannot reach another agent's files.
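The model above can be sketched as a few small types. This is a minimal illustration, not the Hyperfile schema: the class and field names (`LogicalFile`, `Version`, `BlobPool`, `append_version`) are hypothetical, and SHA-256 is an assumed hash, since the document does not name one.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """One immutable entry in a file's version chain."""
    number: int
    content_hash: str  # hex SHA-256 of the bytes (hash choice is an assumption)


@dataclass
class LogicalFile:
    """Stable handle: name, owner, identity."""
    file_id: str
    name: str
    owner_agent: str
    org: str
    versions: list[Version] = field(default_factory=list)


class BlobPool:
    """Bytes stored by their hash; identical bytes are stored once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # no-op if already present
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def append_version(file: LogicalFile, pool: BlobPool, data: bytes) -> Version:
    """An edit never rewrites history; it appends a new version."""
    version = Version(number=len(file.versions) + 1, content_hash=pool.put(data))
    file.versions.append(version)
    return version
```

The lazily-derived Markdown is left out here; in this sketch it would be a cache keyed by `content_hash`, filled on first read.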
3. Architecture
The system is a small, stable core surrounded by replaceable engines.
Choosing a self-hosted or a third-party implementation for any engine is a configuration change; the core stays the same.
4. Core
The core is Hyperfile itself: the part every deployment shares.
| Module | Responsibility |
|---|---|
| Catalog | The record of what exists: files, version chains, ownership, and which derivations are cached. An archive's member manifest hangs off its file. |
| Convert-on-read | On each read of a representation, returns it from cache; on the first read, calls the engine to produce it, then caches the result. Under concurrency, computes each derivation once (single-flight) and tells a slow caller when to retry (Refresh). The sole caller of engines. |
| Access control | Decides whether an agent or a human may read, write, or share a file: an agent reaches its own files, and a human reaches a file if they may reach its owning agent, all within one org. Sharing is an explicit human act. |
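The read rule in the Access control row can be written as a single predicate. The sketch below covers reads only and assumes a hypothetical `reachable_agents` set on a human principal; how a deployment records which agents a human may reach is not specified here, and the type names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    kind: str   # "agent" or "human"
    ident: str
    org: str
    # Humans only: agents this person may reach (hypothetical field).
    reachable_agents: frozenset[str] = frozenset()


@dataclass(frozen=True)
class FileRecord:
    owner_agent: str
    org: str


def may_read(principal: Principal, record: FileRecord) -> bool:
    """Apply the read rule: everything stays within one org."""
    if principal.org != record.org:
        return False
    if principal.kind == "agent":
        # An agent reaches its own files.
        return principal.ident == record.owner_agent
    if principal.kind == "human":
        # A human reaches a file if they may reach its owning agent.
        return record.owner_agent in principal.reachable_agents
    return False
```

Write and share decisions would follow the same shape; sharing is omitted because it is an explicit human act rather than a derived permission.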
5. Engines
Each engine has one implementation per deployment, selected by configuration. The same core runs whichever implementation is chosen.
| Module | Responsibility | Implementations |
|---|---|---|
| Storage | Stores and fetches bytes by content hash, dedupes globally, and issues short-lived access links. | R2, S3, self-hosted |
| Conversion | Turns a file into the agent's Markdown — bytes in, Markdown out. | PaddleOCR-VL (self-host), Reducto (managed), … |
| Unpacker | Expands an archive into its members (path, bytes). The core stores the members and records the manifest; members inherit the archive's permissions, and nested archives recurse. | libarchive, … |
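One way to picture "selected by configuration" is an interface per engine plus a registry keyed by a config value. The sketch below does this for the Conversion engine only. The two implementations are hypothetical stubs, not wrappers around PaddleOCR-VL or Reducto, and the config key `conversion` and the registry names are invented for illustration.

```python
from typing import Callable, Protocol


class ConversionEngine(Protocol):
    """Engine contract from the table above: bytes in, Markdown out."""

    def convert(self, data: bytes) -> str: ...


class LocalStubConversion:
    """Hypothetical stand-in for a self-hosted engine."""

    def convert(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace")


class ManagedStubConversion:
    """Hypothetical stand-in for a managed engine; output differs on purpose."""

    def convert(self, data: bytes) -> str:
        return "# Document\n\n" + data.decode("utf-8", errors="replace")


# Registry keyed by the name used in deployment config (names are invented).
CONVERSION_ENGINES: dict[str, Callable[[], ConversionEngine]] = {
    "local-stub": LocalStubConversion,
    "managed-stub": ManagedStubConversion,
}


def build_conversion_engine(config: dict[str, str]) -> ConversionEngine:
    """Pick the single conversion implementation this deployment uses."""
    name = config["conversion"]
    if name not in CONVERSION_ENGINES:
        raise ValueError(f"unknown conversion engine: {name!r}")
    return CONVERSION_ENGINES[name]()


def read_as_markdown(engine: ConversionEngine, data: bytes) -> str:
    """Core-side code: identical whichever engine was configured."""
    return engine.convert(data)
```

Switching a deployment from one stub to the other changes only the config value; `read_as_markdown` is untouched, which is the property the core relies on.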
6. Human Viewing
For a human, Hyperfile serves the file's raw bytes and leaves presentation to the client: the client displays what it can and downloads the rest.
7. Flows
A human uploads, an agent reads. The upload passes access control; the bytes become a content-addressed blob, and an immutable version is appended. Nothing is converted yet — conversion waits for the first read. When the agent reads the file, access control admits it; convert-on-read returns the cached Markdown, or, on a first read, produces it once and caches it, so concurrent reads share a single conversion.
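The "produces it once and caches it" behavior is a single-flight cache. A minimal threaded sketch follows, assuming an in-memory cache keyed by content hash and a `convert` callable standing in for the conversion engine; the class name `ConvertOnRead` is illustrative. Waiting callers simply block here, whereas the design tells a slow caller when to retry (Refresh), which this sketch does not model.

```python
import threading
from typing import Callable


class ConvertOnRead:
    """Return cached Markdown, or convert once even under concurrent reads."""

    def __init__(self, convert: Callable[[bytes], str]) -> None:
        self._convert = convert
        self._cache: dict[str, str] = {}
        self._in_flight: dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def read(self, content_hash: str, data: bytes) -> str:
        while True:
            with self._lock:
                if content_hash in self._cache:
                    return self._cache[content_hash]
                done = self._in_flight.get(content_hash)
                leader = done is None
                if leader:
                    # First caller for this hash performs the conversion.
                    done = threading.Event()
                    self._in_flight[content_hash] = done
            if not leader:
                # Another caller is converting; wait, then re-check the cache.
                done.wait()
                continue
            try:
                markdown = self._convert(data)
                with self._lock:
                    self._cache[content_hash] = markdown
                return markdown
            finally:
                # Runs on success and on failure, so waiters never hang.
                with self._lock:
                    del self._in_flight[content_hash]
                done.set()
```

If the leading conversion raises, the waiters wake, find nothing cached, and one of them becomes the next leader, so a failed conversion is retried rather than cached.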
An agent produces, a human downloads. The write passes access control, and a write made on behalf of an agent goes into that agent's space. The bytes become a content-addressed blob, and a version is recorded. The human sees the file in a listing filtered by access control and opens it: the client displays it or offers it as a download.
An agent opens an archive. An archive is stored like any other file, and unpacked on its first read into its members. The agent navigates the members by path and reads each as Markdown, the same way; members inherit the archive's permissions.
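Unpacking with recursion can be sketched using Python's standard `zipfile` module as a stand-in unpacker. This is an illustration only: the document lists libarchive as an implementation, zip is just one archive format, and both the path scheme for nested members (`outer.zip/inner.txt`) and the depth limit are assumptions.

```python
import io
import zipfile
from typing import Iterator


def unpack(data: bytes, prefix: str = "", max_depth: int = 8) -> Iterator[tuple[str, bytes]]:
    """Yield (path, bytes) for each leaf member, recursing into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            member = archive.read(info)
            # Sniffing by content is a simplification: zip-based formats such
            # as .docx would also be expanded. A real unpacker would decide
            # by declared type.
            if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(member)):
                yield from unpack(member, prefix=path + "/", max_depth=max_depth - 1)
            else:
                yield path, member
```

The resulting list of paths plays the role of the member manifest: the core would store each member's bytes like any other blob and read each one as Markdown through the usual convert-on-read path.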
8. Principles
- Content addressing. Bytes are stored by hash and deduped globally — one byte pool shared across orgs.
- Isolation lives in the catalog and access control. A principal reaches bytes only through a file record they may read.
- Pluggability is configuration. Each engine has one implementation, swapped by config; a new content type is a new engine. The core stays the same.
- Privacy depends on which engines are wired, decided per deployment. A deployment that keeps private bytes in-boundary wires self-hosted engines.
- Markdown is derived on first read and cached.
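The first two principles fit together as follows: bytes live in one shared pool, while every read goes through a catalog record. The sketch below is a toy in-memory version under assumed names (`Hyperstore`, `upload`, `read`), with SHA-256 as an assumed hash and a deliberately simplified access check (owning agent within its org only).

```python
import hashlib


class Hyperstore:
    """Toy store: one global blob pool, isolation enforced via the catalog."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}                  # hash -> bytes, shared by all orgs
        self._catalog: dict[str, tuple[str, str, str]] = {}  # file_id -> (org, agent, hash)

    def upload(self, file_id: str, org: str, agent: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical bytes are stored once
        self._catalog[file_id] = (org, agent, digest)

    def read(self, file_id: str, org: str, agent: str) -> bytes:
        # There is no read-by-hash entry point: bytes are reachable only
        # through a file record the caller may read.
        record = self._catalog.get(file_id)
        if record is None or record[:2] != (org, agent):
            raise PermissionError(f"no readable file {file_id!r}")
        return self._blobs[record[2]]

    def blob_count(self) -> int:
        return len(self._blobs)
```

Two orgs that upload identical bytes share one blob, yet neither can read the other's file, because the lookup starts from a file record rather than from a hash.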
9. Deferred and Open
- A viewer engine (rendering formats into a displayable form) and audio/video transcription arrive later — each a new engine the existing core accommodates.
- Data residency. Physical per-org partitioning of storage is a deployment choice, taken when compliance requires it.