Proposal: Unified Agent Architecture

I read through the entire stack: Pi’s extensions/skills/system-prompt, COT’s runner/tools/briefing/prompt/maintain-runner, the architecture analysis document, and every major extension. Here’s what I found.

The Core Problem: Two Systems, One Mechanism

Pi and COT both do the same thing at their core:

loop:
  LLM generates → tools execute → context accumulates → repeat
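
That loop, sketched in TypeScript (illustrative names and shapes, not the actual Pi SDK API):

```typescript
// Minimal sketch of the loop both systems implement.
// ToolCall/Turn/agentLoop are illustrative, not an existing API.
type ToolCall = { name: string; args: unknown };
type Turn = { text: string; toolCalls: ToolCall[] };

async function agentLoop(
  generate: (context: string[]) => Promise<Turn>,   // LLM generates
  execute: (call: ToolCall) => Promise<string>,      // tools execute
): Promise<string> {
  const context: string[] = [];
  while (true) {                                     // repeat
    const turn = await generate(context);
    context.push(turn.text);                         // context accumulates
    if (turn.toolCalls.length === 0) return turn.text;
    for (const call of turn.toolCalls) {
      context.push(await execute(call));             // results accumulate too
    }
  }
}
```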

Both use createAgentSession from the Pi SDK. Both call session.prompt(). Both have tool registries, context management, error handling. But they share zero implementation above the SDK level:

| Concept | Pi Implementation | COT Implementation |
| --- | --- | --- |
| Tool registry | Extensions (loaded at startup, permanent) | registry.ts (factory pattern, on-demand) |
| bash/read/write/edit | Pi SDK built-in tools | Separate tool modules (import side-effects) |
| WhatsApp | wa/index.ts extension (CLI wrapper) | wa-*.ts tools (DB query wrappers) |
| Email | google/index.ts (API wrapper) | email-*.ts tools (DB query wrappers) |
| Memory | agent-memory/index.ts extension | cot-memory.ts + search-agent-memory.ts |
| Search | unified-search/index.ts extension | unified-search.ts tool |
| Agent delegation | agents-runtime (child processes) | spawn_sub_agent (in-process sessions) |
| Context management | Smart-read progressive modes | Sub-agent isolation |
| Guidance | AGENTS.md + 27 skills | 242-line system prompt + briefing |

Every capability is implemented twice. When you add a new data source, you build it twice. When you fix a bug, you fix it twice. This isn’t DRY — it’s two complete agent systems that happen to share a database.

The Five Deeper Problems

1. The guidance stack is inverted.

Pi loads ~10K tokens of system prompt (AGENTS.md) on every turn. It includes environment details, Codex delegation rules, tool routing decision trees, workspace conventions — most irrelevant to most turns. Then skills add another 2-7K tokens each when loaded. The model processes all of this before doing anything.

The architecture doc proved it: 52% of sessions are under 5K chars and use an average of 1.6 tools. The system is optimized for the 0.5% complex case while every session pays the cost.

2. Tool loading is all-or-nothing.

Pi loads 51+ tools permanently. Each tool definition costs tokens in the cached prefix — and more importantly, costs model attention on every turn (the model must consider all tools before choosing). The architecture doc showed 4 tools account for 70% of usage. The other 47 tools are loaded for every session but used rarely.

COT also loads all 65+ tools, but organizes them into groups. Only COT_PROCESS_TOOLS + ORCHESTRATOR_READ_TOOLS go to the main session. Sub-agents get scoped subsets. This is better, but still manual and static.

3. Context has no economic model.

Neither system gives the model information about its own context budget. The read tool consumes 52.2% of all tool-result bytes. Smart-read added progressive disclosure modes (peek/head/grep/structure), but the model doesn’t know why it should use them. It doesn’t know how much budget remains, what it has already consumed, or what a mode: full read will cost.

4. Agent composition is three different systems.

  • Pi’s agent_spawn creates child pi processes (full extension loading, DB-backed lifecycle, message passing)
  • COT’s spawn_sub_agent creates lightweight in-process sessions (shared tool deps, no DB, synchronous)
  • Pi’s Codex extension creates OpenAI Codex sessions (external API, sandbox, context management)

These can’t interoperate. A Pi child agent can’t use COT’s database tools. A COT sub-agent can’t use Pi’s browser extension. Codex can’t do either. They’re three islands.

5. The ESA pattern is ceremony, not architecture.

COT’s system prompt mandates Explore→Synthesize→Act every run. This makes sense when there’s lots of new data to process. But when nothing’s changed? The model still goes through the motions — spawning empty explorers, synthesizing nothing, writing the same scratchpad. The pattern is burned into the prompt, not emergent from the situation.


The Proposal: Sessions All the Way Down

Core Insight

Every “agent” in the current system is the same thing: an LLM session with tools, a context budget, and a lifecycle policy. The differences are configuration, not architecture.

| Current Thing | What It Actually Is |
| --- | --- |
| Pi interactive session | Session(trigger=human, tools=universal, trust=high, lifecycle=multi-turn) |
| COT process run | Session(trigger=timer, tools=orchestrator, trust=medium, lifecycle=stateful-one-shot) |
| COT sub-agent | Session(trigger=parent, tools=scoped, trust=inherited, lifecycle=one-shot) |
| COT maintain worker | Session(trigger=queue, tools=task-specific, trust=low, lifecycle=one-shot) |
| Pi child agent | Session(trigger=parent, tools=inherited, trust=inherited, lifecycle=multi-turn) |
| Codex delegation | Session(trigger=parent, tools=sandboxed, trust=sandboxed, lifecycle=multi-turn) |

They should all be the same runtime with different configuration.

Five Primitives

The entire system reduces to five concepts:

1. Session (the core loop)

interface Session {
  id: string;
  model: Model;
  tools: Tool[];
  budget: ContextBudget;
  lifecycle: Lifecycle;
  trust: TrustLevel;
  state: SessionState;

  prompt(input: string): AsyncIterable<Event>;
  spawn(contract: SessionContract): Session;
  dispose(): void;
}

One loop. One implementation. Every “agent” type uses it.

The critical difference from what exists today: budget is a first-class concept, not an afterthought. The session knows how much context it has consumed, what percentage remains, and can make intelligent decisions (or expose this to the model).

2. Capability (tools + guidance as a unit)

interface Capability {
  name: string;                    // e.g., "email", "whatsapp", "code-editing"
  tools: ToolDefinition[];         // the tools this capability provides
  guidance: string;                // usage guidance injected with the tools
  dependencies?: string[];         // other capabilities this requires
  cost: number;                    // token cost of loading this capability
}

This replaces three separate concepts:

  • Pi extensions (tool providers)
  • Pi skills (usage guidance)
  • COT tool groups (named tool sets)

A capability is tools + the knowledge to use them correctly, loaded as one unit. When you load “email,” you get email_read, email_search, email_list_threads AND the guidance about using snippets instead of full bodies, two-pass patterns, triage integration.

This kills the current problem where tools exist in extensions but their usage guidance lives in skills. They should never be separated.
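
As a sketch, here is what the email capability could look like as one unit. The Capability shape follows the interface above; the tool descriptions and cost figure are illustrative:

```typescript
// Illustrative sketch: tools + guidance loaded as a single unit.
// The cost figure and descriptions are made up for the example.
interface ToolDefinition { name: string; description: string }

interface Capability {
  name: string;
  tools: ToolDefinition[];
  guidance: string;
  dependencies?: string[];
  cost: number; // token cost of loading this capability
}

const emailCapability: Capability = {
  name: "email",
  tools: [
    { name: "email_read", description: "Read one thread (snippets by default)" },
    { name: "email_search", description: "Search messages" },
    { name: "email_list_threads", description: "List recent threads" },
  ],
  guidance:
    "Prefer snippets over full bodies. Use the two-pass pattern: " +
    "list/search first, then read only the threads that matter. " +
    "Integrate with triage state before surfacing items.",
  cost: 1200, // rough estimate for tool definitions + guidance
};
```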

Capability tiers:

| Tier | Loaded When | Examples |
| --- | --- | --- |
| Core | Always | bash, read, write, edit, memory |
| Domain | When session contract includes them | email, whatsapp, calendar, web, search |
| Specialist | When explicitly requested | browser, codex, agents, garmin |
| Ephemeral | Created for specific contexts | database query wrappers, workflow-specific tools |

3. Profile (session configuration template)

interface Profile {
  name: string;
  capabilities: string[];          // which capabilities to load
  trust: TrustLevel;               // approval requirements
  lifecycle: LifecycleConfig;      // how the session starts, runs, ends
  budget: BudgetConfig;            // context limits, compaction strategy
  guidance: string;                // profile-specific system prompt
}

Profiles replace the current fragmented configuration:

  • Pi’s AGENTS.md → “interactive” profile guidance
  • COT’s system prompt → “orchestrator” profile guidance
  • COT’s maintain worker prompt → “worker” profile guidance

const PROFILES = {
  interactive: {
    capabilities: ["core", "memory", "search"],
    // other capabilities loaded on demand
    trust: "high",
    lifecycle: { type: "multi-turn", humanInLoop: true },
    budget: { compactAt: 0.7, defaultReadMode: "peek" },
    guidance: "..." // minimal — human is watching
  },
  
  orchestrator: {
    capabilities: ["core", "memory", "delegation", "orchestrator-tools"],
    trust: "medium",
    lifecycle: { type: "stateful-one-shot", stateStore: "scratchpad" },
    budget: { compactAt: 0.6, defaultReadMode: "structure" },
    guidance: "..." // ESA-like but advisory, not mandatory
  },
  
  worker: {
    capabilities: ["core"],
    // additional capabilities specified per-task
    trust: "low",
    lifecycle: { type: "one-shot", maxTurns: 10 },
    budget: { compactAt: 0.5 },
    guidance: "Execute the task. Return structured results. No exploration."
  },
  
  child: {
    capabilities: [], // inherited from parent + task-specific
    trust: "inherited",
    lifecycle: { type: "scoped", maxTurns: 5 },
    budget: { inherit: "parent-remaining" },
    guidance: "" // set by parent
  }
};

4. State (persistence across sessions)

interface StateStore {
  // Ephemeral (within session)
  context: Message[];              // conversation history
  
  // Persistent (across sessions)  
  memory: MemoryStore;             // long-term knowledge (current system)
  scratchpad: ScratchpadStore;     // inter-run state (COT's current approach)
  queue: QueueStore;               // background task queue (maintain)
}

The current memory system works. Keep it. The scratchpad pattern (inter-run state for autonomous sessions) works. Keep it. The maintain queue works. Keep it.

What changes: these are all accessible through the same state interface, not through separate tool implementations per system.

5. Budget (context economics)

interface ContextBudget {
  total: number;                   // model's context window
  used: number;                    // current consumption
  remaining: number;               // total - used
  percentUsed: number;             // 0-100
  
  // Exposed to the model via tool results
  report(): BudgetReport;
  
  // Automatic policies
  compactAt: number;               // percentage threshold
  readDefault: ReadMode;           // default mode for read tool
  
  // Adaptive behavior
  suggestReadMode(fileSize: number): ReadMode;
  shouldDelegate(estimatedCost: number): boolean;
}

This is the genuinely new thing. Neither system has it today.

Every tool result includes a budget footer:

[Context: 47% used | 106K remaining | Read mode: peek recommended]

The model can then make informed decisions without explicit instructions. It doesn’t need 500 tokens of AGENTS.md telling it when to use peek mode — it can see the budget and reason about it.


How Current Systems Map to the New Model

Pi Interactive → Profile(“interactive”)

Before: AGENTS.md (10K tokens) + 51 permanent tools + 27 skills
After:  Profile guidance (2K tokens) + 5 core tools + capabilities loaded on demand

AGENTS.md gets decomposed:

  • Environment info → loaded only when relevant (SSH, cross-machine tasks)
  • Codex rules → part of the “codex” capability, loaded only when delegating
  • Tool routing → eliminated (the model sees only tools it needs)
  • Workspace conventions → part of the “code-editing” capability
  • Memory instructions → part of the “memory” capability (already loaded as core)

Skills get merged into capabilities:

  • research skill + web_search/web_fetch tools → “research” capability
  • whatsapp skill + wa tools → “whatsapp” capability
  • github skill + git tools → “github” capability

The model starts with 5 tools and ~2K guidance. It asks for more when it needs them (or the system auto-loads based on the conversation).
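
A sketch of the on-demand side, assuming a registry keyed by capability name (CapabilityLoader and its shape are hypothetical):

```typescript
// Hypothetical sketch of on-demand capability loading.
// Minimal Capability shape for this example only.
interface Capability { name: string; cost: number }

class CapabilityLoader {
  private loaded = new Map<string, Capability>();
  constructor(private registry: Map<string, Capability>) {}

  load(name: string): Capability {
    const cached = this.loaded.get(name);
    if (cached) return cached; // idempotent: never pay the token cost twice
    const cap = this.registry.get(name);
    if (!cap) throw new Error(`unknown capability: ${name}`);
    this.loaded.set(name, cap);
    return cap;
  }

  // Total token cost of everything currently in the session prefix.
  totalCost(): number {
    let sum = 0;
    for (const cap of this.loaded.values()) sum += cap.cost;
    return sum;
  }
}
```

The session would expose `load` as a tool (or call it automatically when the conversation matches a domain), so the prefix only ever contains capabilities that were actually requested.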

COT Process → Profile(“orchestrator”)

Before: 242-line system prompt + 65+ tools + hardcoded ESA + briefing builder
After:  Profile guidance (advisory ESA) + orchestrator capabilities + briefing as input

The ESA pattern becomes advisory, not mandatory:

You receive a briefing with the current state of Aaron's digital life.
Your goal: surface what needs attention, maintain state, queue background work.

When data is large or complex, delegate reads to child sessions to 
preserve your context for synthesis and action. When the situation is 
simple, act directly.

Same tools, but loaded as capabilities:

  • Core: bash, read, write, edit, memory
  • Orchestration: scratchpad, priorities, telegram, queue
  • Data reading: email, whatsapp, calendar, garmin (loaded when briefing shows data)

The briefing builder stays. It’s good. But it now also includes budget information.

COT Sub-agents → Session.spawn(contract)

Before: spawn_sub_agent with manual tool list and 5 params
After:  session.spawn({ capabilities: ["email"], prompt: "...", maxTurns: 3 })

Same mechanism, cleaner contract. The sub-agent gets the email capability (tools + guidance) instead of a raw tool list. Trust and budget are inherited.

COT Maintain Workers → Profile(“worker”)

Before: maintain_runner.ts with task claiming, model resolution, pipeline execution
After:  Queue trigger → Session(profile="worker", capabilities=task.capabilities)

The maintain runner’s lifecycle management (claiming, locking, heartbeats, retries) wraps around a standard session. The session itself is identical to any other.

Pi Child Agents → Session.spawn(contract)

Before: agent_spawn with pi process launch, DB lifecycle, message passing
After:  session.spawn({ capabilities: [...], prompt: "...", lifecycle: "multi-turn" })

The agents-runtime’s DB-backed lifecycle and message passing become the implementation of session.spawn for the multi-turn case. Lightweight spawns (one-shot) don’t need DB tracking.

Codex → Session.spawn(contract) with external provider

Before: codex_start/codex_turn/codex_stop with 14 extension tools
After:  session.spawn({ provider: "openai/codex", capabilities: ["code-editing"], sandbox: true })

Codex becomes just another session with a different model provider and sandbox constraints. The context management rules (the codex skill’s hard-won empirical data) become part of the Codex capability’s guidance.


The Capability Registry (Concrete Design)

This is the most important new abstraction. It replaces extensions + skills + COT tool groups.

Structure

capabilities/
  core/                     # Always loaded
    index.ts                # bash, read, write, edit
    guidance.md             # "Use read with mode:peek for large files..."
    
  memory/                   # Always loaded
    index.ts                # memory_search, memory_read, memory_write
    guidance.md             # "Search before writing. Concrete examples..."
    
  email/                    # Loaded when needed
    index.ts                # email_read, email_search, email_list_threads
    guidance.md             # "Use snippets, not full bodies. Two-pass pattern..."
    pi-adapter.ts           # Gmail API implementation (for Pi)
    cot-adapter.ts          # DB query implementation (for COT)
    
  whatsapp/
    index.ts
    guidance.md
    pi-adapter.ts           # wa CLI wrapper
    cot-adapter.ts          # DB query wrapper
    
  orchestration/            # COT-specific
    index.ts                # scratchpad, priorities, telegram, queue
    guidance.md
    
  code-editing/             # Pi-specific
    index.ts                # enhanced read modes, search, find
    guidance.md             # workspace conventions, file patterns
    
  research/
    index.ts                # web_search, web_fetch
    guidance.md             # "Verify docs before relying..."
    
  agents/
    index.ts                # session.spawn
    guidance.md             # delegation patterns, when to parallelize

Key Design Decision: Adapter Pattern for Data Access

The biggest duplication today is data access. Pi reads email via Gmail API (google extension). COT reads email via PostgreSQL queries (tool-builder). Same data, different access patterns.

The capability system uses adapters:

interface EmailCapability extends Capability {
  tools: [
    { name: "email_read", execute: (params) => adapter.readThread(params) },
    { name: "email_search", execute: (params) => adapter.search(params) },
    { name: "email_list", execute: (params) => adapter.list(params) },
  ];
}

// Pi context: use Gmail API directly
class GmailAdapter implements EmailAdapter {
  async readThread(params) { /* Gmail API call */ }
}

// COT context: use synced DB (faster, no API quota)
class CotEmailAdapter implements EmailAdapter {
  async readThread(params) { /* SELECT FROM cot.emails */ }
}

The tools and guidance are identical. Only the data access layer changes. This eliminates the entire duplication problem.
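
Wiring this up at capability construction time might look like the following sketch (stub adapters stand in for the real Gmail API and DB query code):

```typescript
// Sketch: one tool definition, two backends. The adapter interface
// follows the proposal; the adapter bodies are stubs.
interface EmailAdapter {
  readThread(id: string): Promise<string>;
}

class GmailAdapter implements EmailAdapter {
  async readThread(id: string): Promise<string> {
    return `gmail:${id}`; // in reality: Gmail API call
  }
}

class CotEmailAdapter implements EmailAdapter {
  async readThread(id: string): Promise<string> {
    return `db:${id}`; // in reality: SELECT FROM cot.emails
  }
}

// The same tool definition is produced either way; only the
// injected adapter differs.
function makeEmailTools(adapter: EmailAdapter) {
  return [
    {
      name: "email_read",
      execute: (params: { id: string }) => adapter.readThread(params.id),
    },
  ];
}
```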


Context Budget: The Real Innovation

The architecture doc identified read as the 52.2% context killer. Smart-read was a good patch. But the real fix is making the model budget-aware.

How It Works

class ContextBudget {
  private total: number;
  private consumed: number = 0;

  constructor(total: number) {
    this.total = total; // model's context window, in tokens
  }
  
  addToolResult(result: string): string {
    const cost = estimateTokens(result);
    this.consumed += cost;
    
    // Append budget report to every tool result
    const report = this.formatReport();
    return result + "\n\n" + report;
  }
  
  formatReport(): string {
    const pct = Math.round(this.consumed / this.total * 100);
    const remaining = this.total - this.consumed;
    
    if (pct < 50) return `[Budget: ${pct}% used | ${remaining} tokens remaining]`;
    if (pct < 70) return `[⚠️ Budget: ${pct}% used | Consider using peek/grep modes]`;
    if (pct < 85) return `[🔴 Budget: ${pct}% used | Delegate large reads to child sessions]`;
    return `[🚨 Budget: ${pct}% used | Compact or finish soon]`;
  }
  
  suggestReadMode(fileSize: number): ReadMode {
    const costEstimate = fileSize / 4; // rough token estimate
    const remainingBudget = this.total - this.consumed;
    
    if (costEstimate < remainingBudget * 0.05) return "full";   // <5% of remaining
    if (costEstimate < remainingBudget * 0.15) return "head";   // <15% of remaining  
    return "peek";                                                // large relative to budget
  }
}

The model doesn’t need instructions about when to use peek mode. It sees:

[🔴 Budget: 73% used | Delegate large reads to child sessions]

And it reasons about it naturally. This is cheaper and more reliable than 500 tokens of guidance in AGENTS.md.

Budget-Aware Read Tool

const readTool = {
  name: "read",
  execute: async (params, { budget }) => {
    const stats = await fs.stat(params.path);
    const suggestedMode = budget.suggestReadMode(stats.size);
    
    // If user specified full but budget suggests otherwise, warn
    if (params.mode === "full" && suggestedMode !== "full") {
      // Still honor the request, but include warning
      const content = await readFull(params.path);
      return budget.addToolResult(
        content + `\n\n[Note: This file consumed ~${estimateTokens(content)} tokens. ` +
        `Suggested mode was '${suggestedMode}' given current budget.]`
      );
    }
    
    const mode = params.mode ?? suggestedMode;
    const content = await readWithMode(params.path, mode, params);
    return budget.addToolResult(content);
  }
};

Guidance Hierarchy (Replacing AGENTS.md + Skills)

The current guidance stack:

AGENTS.md (10K tokens, always loaded)
  → Extension tool descriptions (2K tokens, always loaded)
    → Skills (2-7K tokens, loaded on demand)
      → Memory (retrieved on demand)

The proposed stack:

Profile guidance (1-2K tokens, always loaded)
  → Core capability guidance (500 tokens, always loaded)
    → Domain capability guidance (loaded with capability)
      → Memory (retrieved on demand)
        → Budget signals (appended to tool results)
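
A minimal sketch of assembling that stack at session start (shapes and function names are illustrative, not an existing API):

```typescript
// Illustrative sketch: the system prompt is just profile guidance
// plus the guidance of whatever capabilities are loaded.
interface LoadedCapability { name: string; guidance: string }

function buildSystemPrompt(
  profileGuidance: string,
  capabilities: LoadedCapability[],
): string {
  const sections = [
    profileGuidance, // always first, always small
    ...capabilities.map((c) => `[${c.name}]\n${c.guidance}`),
  ];
  return sections.join("\n\n");
}
```

Memory retrieval and budget signals stay out of the prompt entirely; they arrive through tool results at runtime.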

What Goes Where

Profile guidance (always loaded, <2K tokens):

  • Identity (who am I, what’s my role)
  • Core behavior (be direct, verify before asserting)
  • Trust boundaries (what needs approval)
  • Budget awareness (how to read budget signals)

Core capability guidance (always loaded, <500 tokens):

  • Read: progressive disclosure modes exist, budget suggests the right one
  • Memory: search before writing, concrete examples are better than abstract rules

Domain capability guidance (loaded with capability, 500-2K each):

  • Email: use snippets not bodies, two-pass pattern, triage integration
  • WhatsApp: message format, chat lookup patterns
  • Research: verify docs, temporal markers, cross-reference
  • Code editing: workspace conventions, file naming, version management

NOT in guidance anymore (moved to memory or eliminated):

  • Environment details (detect at runtime, or query memory when needed)
  • Codex delegation rules (loaded only when using Codex capability)
  • Cross-machine coordination (loaded only when SSH is relevant)
  • Detailed tool routing trees (the model sees only relevant tools)

What About Delegation?

The current system has three delegation mechanisms. The proposal has one: session.spawn(contract).

The Spawn Contract

interface SpawnContract {
  // What it gets
  capabilities: string[];        // which capabilities to load
  prompt: string;                // the task
  
  // How it runs
  model?: string;                // default: inherited or tier-appropriate
  maxTurns?: number;             // default: 5
  trust?: TrustLevel;            // default: inherited
  budget?: BudgetAllocation;     // default: split from parent
  
  // How it returns
  maxOutput?: number;            // truncate result for parent's context
  structured?: boolean;          // expect JSON output
  
  // Lifecycle
  sync?: boolean;                // wait for result (default: true)
  persist?: boolean;             // DB-backed lifecycle (default: false)
}

This replaces:

  • spawn_sub_agent({ model, tools, prompt, max_turns, max_output })
  • agent_spawn({ task, tools, model, thinking, timeout, ... })
  • codex_start({ cwd, instructions, model, sandbox })
  • run_workflow(name, context)
  • maintain_write({ task_type, tools, prompt, model, ... })

All five become different configurations of session.spawn():

// COT sub-agent: lightweight, synchronous, scoped
session.spawn({
  capabilities: ["email"],
  prompt: "Summarize unread threads",
  model: "sonnet",
  maxTurns: 3,
  sync: true
});

// Pi child agent: multi-turn, persistent, full capabilities
session.spawn({
  capabilities: ["core", "search", "web"],
  prompt: "Research X and write a report",
  maxTurns: 50,
  persist: true,  // DB-backed lifecycle
  sync: false     // parent continues
});

// Codex delegation: sandboxed, external provider
session.spawn({
  capabilities: ["code-editing"],
  prompt: "Create these 5 files...",
  model: "openai/codex",
  trust: "sandboxed",
  maxTurns: 20
});

// Background worker: queued, one-shot, isolated
session.spawn({
  capabilities: ["research", "memory"],
  prompt: "Find papers on topic X",
  model: "opus",
  sync: false,
  persist: true,  // survives parent session
  scheduled: "next-maintain-cycle"
});

Implementation Strategy

Under the hood, spawn routes to the right implementation:

  • sync: true, persist: false → in-process session (current spawn_sub_agent approach)
  • sync: false, persist: false → child process (current agents-runtime approach)
  • sync: false, persist: true → queue-backed (current maintain approach)
  • model: "openai/codex" → Codex bridge (current codex extension approach)

The implementations exist. They just need a unified interface.
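
The routing rules above can be sketched as a small dispatch function (backend labels stand in for the existing implementations):

```typescript
// Sketch of spawn dispatch. Backends are labels for the code that
// already exists; defaults follow the SpawnContract defaults above.
type Backend = "in-process" | "child-process" | "queue" | "codex-bridge";

interface SpawnContract {
  model?: string;
  sync?: boolean;    // default: true
  persist?: boolean; // default: false
}

function routeSpawn(c: SpawnContract): Backend {
  if (c.model === "openai/codex") return "codex-bridge"; // codex extension
  const sync = c.sync ?? true;
  const persist = c.persist ?? false;
  if (sync && !persist) return "in-process";     // spawn_sub_agent path
  if (!sync && !persist) return "child-process"; // agents-runtime path
  if (!sync && persist) return "queue";          // maintain path
  throw new Error("sync + persist is not a supported combination");
}
```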


What’s Actually New (vs. What the Architecture Doc Proposed)

The architecture doc proposed 7 innovations. This proposal agrees with some, diverges on others:

| Architecture Doc | This Proposal | Difference |
| --- | --- | --- |
| Progressive read modes | ✅ Keep (smart-read exists) | + Budget-aware auto-selection |
| Token-aware tool loading | ✅ Capability system | Capabilities = tools + guidance, not just tools |
| Context % compaction | ✅ Budget-driven | + Model sees its own budget in real-time |
| Skill triggers | ❌ Replace with capabilities | Capabilities auto-load when session contract specifies domain |
| Memory auto-proposal | ✅ Keep as post-session hook | No change needed |
| Trust gradient | ✅ Profile system | Profiles are richer than just trust |
| Read tool analytics | ✅ Budget tracking provides this | Natural telemetry from budget system |

What’s Genuinely New Here

  1. Capabilities as tools + guidance (not separate). This is the biggest architectural change. Currently tools live in extensions and guidance lives in skills. They must always be loaded together, never separated.

  2. Budget as a runtime concept the model can see. Not just token counting for compaction. The model receives budget signals in every tool result and can reason about its own resource constraints.

  3. Adapter pattern for data access. Same capability interface, different backends (API vs DB). Eliminates the entire Pi-vs-COT tool duplication.

  4. Unified spawn contract. One interface for all delegation patterns. The implementation varies, but the model only sees one tool.

  5. Profile-driven session configuration. Not two separate systems with different code, but one system with different profiles.


Migration Path

This isn’t a rewrite. It’s a convergence. Each step delivers value independently.

Phase 1: Capability Extraction (Week 1-2)

Extract the first capability from the overlap zone: memory.

capabilities/memory/
  tools.ts      → memory_search, memory_read, memory_write (from agent-memory extension)
  guidance.md   → extracted from memory-architect skill + AGENTS.md memory section
  adapter.ts    → PostgreSQL (shared by Pi and COT)

Both Pi and COT load the same capability. Pi via extension wrapper, COT via tool registration. Same tools, same guidance, one implementation.

Then: email, whatsapp, search. Each extraction eliminates one duplication.

Phase 2: Budget System (Week 2-3)

Add ContextBudget to the session. Doesn’t require any architectural change — it’s a wrapper around token counting that appends budget reports to tool results.
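
A sketch of that wrapper, using the chars/4 token estimate from earlier in the proposal (BudgetWrapper is a hypothetical name):

```typescript
// Phase 2 sketch: token counting + a budget footer on tool results.
// No session changes required; this wraps existing tool execution.
class BudgetWrapper {
  private consumed = 0;
  constructor(private total: number) {}

  wrap(result: string): string {
    this.consumed += Math.ceil(result.length / 4); // rough token estimate
    const pct = Math.round((this.consumed / this.total) * 100);
    const remaining = this.total - this.consumed;
    return `${result}\n\n[Budget: ${pct}% used | ${remaining} tokens remaining]`;
  }
}
```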

Measurable impact: reduced context bloat from the read tool (the 52.2% killer).

Phase 3: Profile System (Week 3-4)

Create profiles for interactive and orchestrator. Extract guidance from AGENTS.md and COT’s system prompt into profiles + capabilities.

AGENTS.md shrinks from ~10K tokens to ~2K. COT’s prompt shrinks similarly. The rest moves into capability guidance loaded on demand.

Phase 4: Unified Spawn (Week 4-5)

Create the spawn contract interface. Implement it as a thin adapter over existing mechanisms:

  • In-process → current spawn_sub_agent code
  • Child process → current agents-runtime code
  • Queue-backed → current maintain_write code
  • Codex → current codex_start/codex_turn code

The model sees one tool. The implementation dispatches to the right backend.

Phase 5: Adapter Pattern (Week 5-6)

For capabilities with dual implementations (email, whatsapp, calendar), create the adapter interface. Pi uses API adapters, COT uses DB adapters. Same tools, same guidance, different data access.

Rollback

Every phase is independently reversible:

  • Phase 1: Keep old extension/tool alongside capability
  • Phase 2: Budget appending can be toggled off
  • Phase 3: Profiles are additive (old prompts still work)
  • Phase 4: Spawn wrapper delegates to existing code
  • Phase 5: Adapters are behind the same tool interface

What NOT to Change

Some things in the current system work well. Don’t touch them.

  1. PostgreSQL as the shared data store. Works. Don’t add a second store.
  2. Memory search/read/write pattern. 20% adoption. The API is right.
  3. COT’s briefing builder. 14 parallel queries → structured text. Good engineering.
  4. Smart-read’s progressive modes. Keep all of them. Add budget awareness on top.
  5. COT’s advisory lock + heartbeat + zombie cleanup. Production-grade lifecycle management.
  6. The maintain queue FSM. DB-enforced state machine. Don’t reinvent.
  7. Process event logging. Structured telemetry. Keep exactly as-is.

What This Enables (That’s Currently Impossible)

  1. Cross-system capability sharing. Build a new data source once, both Pi and COT use it.
  2. Dynamic capability loading in COT. Currently COT loads all tools upfront. With capabilities, it loads what the briefing indicates it needs.
  3. Budget-aware model behavior. The model adapts its read strategy to remaining context without explicit instructions.
  4. Unified delegation. “Spawn a worker” means the same thing whether called from Pi or COT.
  5. Incremental system prompt. Instead of 10K tokens always, 2K base + capabilities loaded on demand. Interactive sessions that only do file editing never pay for email/whatsapp/research guidance.
  6. Profile switching. Same runtime can serve interactive and autonomous modes. Test autonomous behavior interactively. Debug COT patterns in Pi.

The Honest Assessment

This proposal is right about:

  • Capability = tools + guidance (the separation is the root cause of duplication)
  • Budget as a first-class concept (the model should know its own resource constraints)
  • Adapter pattern for data access (eliminates the Pi/COT tool duplication)
  • Profiles over separate systems (same mechanism, different configuration)

This proposal might be wrong about:

  • Unified spawn may add complexity without value. The four delegation mechanisms serve genuinely different needs. Wrapping them in one interface could obscure important differences (sync vs async, isolated vs shared state, sandboxed vs trusted).
  • Capability auto-loading might be premature. The current explicit skill loading works. Auto-detection could be unreliable or load unnecessary capabilities.
  • Budget signals might be noise. If appended to every tool result, the model might learn to ignore them. The signal-to-noise ratio matters.
  • The migration might not converge. Incremental convergence sounds clean but could result in a system that’s neither old nor new — just two systems with a compatibility layer.

The risk:

The biggest risk is building the compatibility layer and never completing the convergence. You end up with three systems instead of two: Pi, COT, and “unified capabilities” that neither fully uses. The migration phases must each deliver standalone value, or stop after phase 2 (budget system).

Written on February 28, 2026