Personal Project
Claude Versatile

Claude Versatile is a multi-model AI orchestration framework that lets Claude Code act as the primary controller, delegating sub-tasks to external AI models (OpenAI GPT, Grok, future Gemini) through the Model Context Protocol (MCP). I built this system to solve a real problem in AI-assisted development: no single model excels at everything. By keeping Claude in control while routing specialized work to the best-fit model, the framework combines the strengths of multiple AI providers without sacrificing the unified development experience.

Codex Delegation Demo

User delegating a code analysis task to OpenAI Codex through Claude Code, where Claude assembles context, routes the request via MCP, and presents the result

Two-Layer Architecture

The system splits into two layers, each targeting a different complexity level. Layer 1 handles lightweight, single-shot API calls (code review, web search, generation). Layer 2 runs an autonomous Agent with its own reasoning loop for complex multi-step analysis that would exceed a single API call’s capacity.

flowchart TD
    subgraph Claude["Claude Code (Orchestrator)"]
        CC[Claude Code CLI]
        SK[Skills Layer]
    end

    subgraph L1["Layer 1: Direct API Calls"]
        MCP1[codex MCP Server]
        MCP2[grok MCP Server]
        MCP3[future providers...]
    end

    subgraph L2["Layer 2: Agent Delegation"]
        AMCP[agent MCP Server]
        subgraph Worker["Agent Worker Process"]
            P[Planner]
            CM[Context Manager]
            RT[Read-Only Tools]
        end
    end

    subgraph Models["External AI Models"]
        GPT[OpenAI GPT-5.4]
        GRK[Grok-4]
        GMN[Gemini...]
    end

    CC --> SK
    SK --> MCP1
    SK --> MCP2
    SK --> MCP3
    CC --> AMCP
    AMCP --> Worker
    MCP1 --> GPT
    MCP2 --> GRK
    MCP3 --> GMN
    Worker --> GPT
    Worker --> GRK
    P --> CM
    CM --> RT

All external models are strictly read-only. They cannot modify files, run shell commands, or access git. Every code suggestion returns as plain text for Claude to review and apply. This preserves Claude Code’s rewind mechanism for full rollback capability.

Declarative Provider Framework

Adding a new AI model provider to the system requires roughly 25 lines of code. I designed a defineProvider() lifecycle framework that handles configuration loading, environment variable injection, client creation, and error mapping automatically. Developers only implement the onRegisterTools hook to define their MCP tools.

// A complete MCP Server for any OpenAI-compatible API
// (schema: the tool's input schema, defined alongside the tool)
defineProvider({
    type: "openai",
    name: "claude-versatile-codex",
    version: "0.3.0",
    configFile: "codex.agent.json",
    onRegisterTools(server, ctx) {
        server.tool("codex_chat", schema, async (params) => {
            const result = await ctx.complete({
                model: params.model,
                messages: [{ role: "user", content: params.prompt }],
            });
            return { content: [{ type: "text", text: result.content }] };
        });
    },
});

The framework supports two provider types: "openai" for OpenAI-compatible APIs (automatic config and client handling) and "native" for custom SDK integrations (user implements onCreateClient). The lifecycle flows through four stages, each with sensible defaults that can be selectively overridden:

flowchart LR
    A["onLoadConfig"] --> B["onCreateClient"]
    B --> C["onRegisterTools"]
    C --> D["onServerReady"]

For OpenAI-compatible providers, the onRegisterTools hook receives a context object with ctx.complete(), a convenience method that encapsulates the full pipeline of message building, completion execution, usage formatting, and error mapping in a single call. Native SDK providers get full control over client creation and tool registration while the framework still handles config loading and server startup.
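The four-stage lifecycle can be sketched as a small dispatcher. This is an illustrative model, not the actual defineProvider() internals; the hook names follow the diagram above, while the types and defaults are simplified assumptions:

```typescript
// Minimal sketch of the four-stage provider lifecycle (hypothetical types;
// the real defineProvider() surface may differ).
type Hooks = {
    onLoadConfig?: () => Record<string, string>;
    onCreateClient?: (config: Record<string, string>) => unknown;
    onRegisterTools: (client: unknown, tools: Map<string, Function>) => void;
    onServerReady?: () => void;
};

function runLifecycle(hooks: Hooks): Map<string, Function> {
    // Stage 1: load config (default: empty config)
    const config = hooks.onLoadConfig?.() ?? {};
    // Stage 2: create client ("native" providers override this;
    // "openai" providers fall through to a default client)
    const client = hooks.onCreateClient?.(config) ?? { kind: "default-openai" };
    // Stage 3: register tools -- the only mandatory hook
    const tools = new Map<string, Function>();
    hooks.onRegisterTools(client, tools);
    // Stage 4: server ready
    hooks.onServerReady?.();
    return tools;
}

// A "native" provider supplies its own client via onCreateClient:
const tools = runLifecycle({
    onLoadConfig: () => ({ apiKey: "test" }),
    onCreateClient: (cfg) => ({ kind: "custom-sdk", key: cfg.apiKey }),
    onRegisterTools(client, registry) {
        registry.set("custom_chat", async (prompt: string) => `echo: ${prompt}`);
    },
});
```

Because every stage has a default, an "openai" provider only writes stage 3, while a "native" provider additionally overrides stage 2.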

Request Flow

When Claude delegates a task, the request flows through a well-defined pipeline. The MCP Server lazily initializes its API client on the first tool call (so the server can start without a valid API key), normalizes the response into a CompletionResult format, and maps any provider-specific errors to user-friendly MCP responses.

sequenceDiagram
    participant U as User
    participant C as Claude Code
    participant M as MCP Server
    participant P as Provider API

    U->>C: "Use codex to review this function"
    C->>C: Select tool: codex_chat
    C->>M: MCP tool call (prompt, model, params)
    M->>M: Load config from .versatile/
    M->>M: Initialize client (lazy)
    M->>P: chat.completions.create()
    P-->>M: CompletionResult (content, usage)
    M-->>C: MCP response (text + usage footer)
    C->>C: Review result, decide next action
    C-->>U: Present findings
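The lazy-initialization and error-mapping steps above can be sketched as follows. The env variable name and client shape here are hypothetical stand-ins, not the project's actual identifiers:

```typescript
// Sketch of lazy client initialization and error mapping
// (CODEX_API_KEY and the client shape are illustrative assumptions).
type McpResponse = { content: { type: "text"; text: string }[]; isError?: boolean };

let client: { complete: (p: string) => Promise<string> } | null = null;

function getClient() {
    // Created on the first tool call, so the server can start and
    // register tools before a valid API key exists.
    if (!client) {
        const key = process.env.CODEX_API_KEY;
        if (!key || key === "YOUR_API_KEY_HERE") {
            throw new Error("Missing API key: set CODEX_API_KEY or edit .versatile/codex.agent.json");
        }
        client = { complete: async (p) => `response to: ${p}` }; // stub transport
    }
    return client;
}

async function handleToolCall(prompt: string): Promise<McpResponse> {
    try {
        const text = await getClient().complete(prompt);
        return { content: [{ type: "text", text }] };
    } catch (err) {
        // Provider-specific failures become user-friendly MCP text, not crashes.
        const msg = err instanceof Error ? err.message : String(err);
        return { content: [{ type: "text", text: `Provider error: ${msg}` }], isError: true };
    }
}
```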

Data-Driven Model Routing

Adding support for a new AI provider requires exactly one line in the route table. The MODEL_ROUTES configuration maps model name prefixes to their API credentials (config file paths and environment variable names). The Agent’s collectEnv() function traverses this table to dynamically load all provider configurations, and the Worker’s createProviderFromEnv() uses resolveModelRoute() to create the correct CompletionProvider at runtime.

graph LR
    M[model name] --> R{MODEL_ROUTES}
    R -->|"grok-*"| G[grok.agent.json]
    R -->|"default"| O[codex.agent.json]
    G --> P[CompletionProvider]
    O --> P
    P --> Planner

This means users can switch between GPT-5.4, Grok-4, or any future model by simply passing a model parameter. No code changes, no server restarts. The Planner depends on the CompletionProvider interface rather than any specific SDK, so future non-OpenAI adapters (Gemini, Claude native) can plug in without changing Planner code.
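A prefix-based route table of this kind can be sketched as below. The entries and env variable names are illustrative; the real MODEL_ROUTES follows the same prefix-to-credentials idea:

```typescript
// Sketch of a prefix-based route table (illustrative entries and env
// variable names; the real MODEL_ROUTES follows the same pattern).
type Route = { configFile: string; envKey: string };

const MODEL_ROUTES: Record<string, Route> = {
    "grok-": { configFile: "grok.agent.json", envKey: "GROK_API_KEY" },
    // Adding a provider is one more line here, e.g.:
    // "gemini-": { configFile: "gemini.agent.json", envKey: "GEMINI_API_KEY" },
};
const DEFAULT_ROUTE: Route = { configFile: "codex.agent.json", envKey: "OPENAI_API_KEY" };

function resolveModelRoute(model: string): Route {
    // First matching prefix wins; unmatched models fall back to the default.
    for (const [prefix, route] of Object.entries(MODEL_ROUTES)) {
        if (model.startsWith(prefix)) return route;
    }
    return DEFAULT_ROUTE;
}
```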

Autonomous Agent with Adaptive Control

The Agent runs as an independent child process with its own LLM-driven ReAct loop (Think, Act, Observe, Decide). It reads files, searches patterns, builds understanding incrementally, and returns structured analysis. Process isolation means a crashed Agent never takes down the MCP Server.

Agent ReAct Loop

The MCP Server forks a Worker process via IPC. The Worker drives the Planner, which calls the external LLM with OpenAI function calling (the tools parameter plus tool_calls responses) rather than asking models to hand-write XML or JSON. This is critical for reasoning models (gpt-5.4, o1, o3), whose content field is often null while tool_calls is still populated reliably.

sequenceDiagram
    participant C as Claude Code
    participant M as MCP Server
    participant W as Worker Process
    participant LLM as External LLM

    C->>M: agent_execute(goal, model, autoMode)
    M->>W: fork() + IPC start

    opt autoMode enabled
        W->>LLM: Prompt with goal + plan tool
        LLM-->>W: tool_calls: plan(estimated_steps)
        W->>W: Set effectiveMax = ceil(estimated * 1.5)
    end

    loop ReAct Loop
        W->>LLM: Goal + context + tool definitions
        LLM-->>W: tool_calls: read_file / search_pattern / ...
        W->>W: Execute read-only tools
        W->>W: Update context (sliding window + summary)
        W->>W: Check: iterations, tokens, repetition
        W->>M: IPC status update

        alt LLM calls done tool
            W->>M: IPC complete(result)
        end
    end

    M-->>C: Formatted result (summary, files, tokens)
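Reading a function-calling turn looks roughly like this. The message shapes are simplified from the OpenAI chat-completions format; note that the loop keys off tool_calls, not content:

```typescript
// Sketch of parsing an OpenAI-style function-calling response (shapes
// simplified for illustration). Reasoning models often return content:
// null, so the ReAct loop reads tool_calls instead.
type ToolCall = { function: { name: string; arguments: string } };
type AssistantMessage = { content: string | null; tool_calls?: ToolCall[] };

function nextAction(msg: AssistantMessage): { tool: string; args: Record<string, unknown> } | null {
    const call = msg.tool_calls?.[0];
    if (!call) return null; // no tool call: nothing for the Worker to execute
    // Arguments arrive as a JSON string and must be parsed.
    return { tool: call.function.name, args: JSON.parse(call.function.arguments) };
}

// A typical reasoning-model turn: content is null, tool_calls is populated.
const turn: AssistantMessage = {
    content: null,
    tool_calls: [{ function: { name: "read_file", arguments: '{"path":"src/index.ts"}' } }],
};
```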

The Agent’s built-in tool set is strictly read-only: read_file, list_dir, search_pattern, plan (autoMode only), and done. All paths are validated with resolveSafePath() to prevent directory traversal. The Context Manager uses a sliding window with automatic summarization (triggered at 80% of max context, compresses to 50%) to prevent token explosion on long-running tasks.
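The traversal check behind resolveSafePath() amounts to resolving the requested path against the workspace root and rejecting anything that escapes it. A minimal sketch, assuming the real signature may differ:

```typescript
import * as path from "path";

// Sketch of directory-traversal validation in the spirit of
// resolveSafePath() (hypothetical signature).
function resolveSafePath(root: string, requested: string): string {
    const resolved = path.resolve(root, requested);
    // The resolved path must stay inside the workspace root; "../"
    // escapes are rejected before any file is read.
    if (resolved !== root && !resolved.startsWith(root + path.sep)) {
        throw new Error(`Path escapes workspace: ${requested}`);
    }
    return resolved;
}
```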

I implemented a two-layer adaptive iteration control system (autoMode) that automatically manages how long the Agent runs:

L1 Complexity Estimation

The Agent calls a plan tool on its first iteration, outputting an estimated_steps count. The Planner dynamically sets effectiveMaxIterations = ceil(estimated * 1.5). This prevents simple tasks from burning through unnecessary iterations while giving complex tasks room to breathe.

L2 Runtime Guards

L2 guards operate continuously during execution:

  • Repetition Detection tracks the last 5 tool calls in a sliding window. Two consecutive identical calls trigger a redirect message. Two redirects force termination.
  • Token Budget caps cumulative token consumption (default 100k). The Agent stops gracefully when the budget is exhausted.

stateDiagram-v2
    [*] --> Planning: agent_execute(goal)
    Planning --> Executing: plan tool sets effectiveMax
    Executing --> Thinking: ReAct Loop
    Thinking --> Acting: Select tool
    Acting --> Observing: Execute tool
    Observing --> Thinking: Continue reasoning

    Thinking --> Done: done tool called
    Thinking --> Terminated: Max iterations reached
    Observing --> Terminated: Token budget exceeded
    Acting --> Redirected: Repetition detected
    Redirected --> Thinking: Inject redirect hint
    Redirected --> Terminated: 2nd redirect
    Terminated --> [*]
    Done --> [*]

When autoMode is disabled, the system falls back to a fixed maxIterations count, and the plan tool is hidden from the LLM.
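The two control layers can be condensed into a small guard object. The class and method names here are illustrative, but the numbers follow the text: ceil(estimated * 1.5), a 5-call sliding window, two redirects to terminate, and a 100k default token budget:

```typescript
// Sketch of the two-layer adaptive iteration control (hypothetical
// names; constants match the behavior described above).
class RunGuards {
    effectiveMaxIterations: number;
    private recentCalls: string[] = [];
    private redirects = 0;
    private tokensUsed = 0;

    constructor(estimatedSteps: number, private tokenBudget = 100_000) {
        // L1: the plan tool's estimate sets the ceiling with 50% headroom.
        this.effectiveMaxIterations = Math.ceil(estimatedSteps * 1.5);
    }

    // L2: "redirect" on two identical consecutive calls,
    // "terminate" after the second redirect.
    recordToolCall(signature: string): "ok" | "redirect" | "terminate" {
        this.recentCalls.push(signature);
        if (this.recentCalls.length > 5) this.recentCalls.shift(); // window of 5
        const n = this.recentCalls.length;
        if (n >= 2 && this.recentCalls[n - 1] === this.recentCalls[n - 2]) {
            this.redirects += 1;
            return this.redirects >= 2 ? "terminate" : "redirect";
        }
        return "ok";
    }

    // Returns false once the cumulative token budget is exhausted.
    addTokens(count: number): boolean {
        this.tokensUsed += count;
        return this.tokensUsed <= this.tokenBudget;
    }
}
```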

Skill Encapsulation and Configuration

Skill Layer

Skills are optional behavior orchestration layers that sit on top of MCP tools. While the tools themselves are self-describing and work without Skills, the Skill layer adds intelligent context assembly and result presentation.

The codex-task Skill automatically collects relevant code context (file contents, project structure, dependency graphs), assembles a structured prompt with appropriate token budgets, and delegates to codex_chat. The grok-search Skill analyzes search intent (factual, news, technical, comparative, exploratory), optimizes the query, selects a matching system prompt, and delegates to grok_search.

Skills are defined as YAML/Markdown files in .claude/skills/, making them version-controllable and shareable. The separation between MCP (tool capability) and Skill (behavior orchestration) keeps each layer focused: MCP Servers can be installed independently via npm, while Skills provide the optional intelligence layer.

Unified Configuration and Timeout Strategy

All configuration lives in a .versatile/ directory (gitignored), with one JSON file per provider. Values are resolved through a three-level priority chain:

flowchart LR
    A[".versatile/*.json"] -->|not found| B["process.env"]
    B -->|not found| C["Hardcoded default"]

Missing files are auto-generated from templates on first run. Placeholder API keys (YOUR_API_KEY_HERE) are detected and treated as missing, falling back to environment variables with a warning. This means the server process can launch and register tools even before the user has configured their API key.
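The resolution chain boils down to a lookup with placeholder detection. A minimal sketch with a hypothetical helper name (the real loader reads the .versatile/*.json files from disk):

```typescript
// Sketch of the three-level resolution chain: config file -> env var ->
// hardcoded default (resolveValue is an illustrative name).
function resolveValue(
    fileConfig: Record<string, string | undefined>,
    key: string,
    envVar: string,
    fallback: string,
): string {
    const fromFile = fileConfig[key];
    // A template placeholder counts as "not configured", not as a value.
    if (fromFile && fromFile !== "YOUR_API_KEY_HERE") return fromFile;
    return process.env[envVar] ?? fallback;
}
```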

The timeout system operates at two independent levels. The per-call timeout (singleCallTimeout, default 2 minutes) governs individual LLM API requests and relies on the OpenAI SDK’s built-in retry with exponential backoff (0.5s x 2^n, capped at 8s, with 25% jitter, covering 408/429/500+ transient errors). The task-level maxTimeMs (default 5 minutes) caps total Agent execution time. On every cycle the Agent checks all termination conditions (iteration count, token budget, repetition, elapsed time, abort signal, done tool), and the first condition to trigger wins.
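The backoff schedule is simple to state in code. This sketch applies the jitter subtractively, which is one plausible reading of "25% jitter"; the SDK's exact formula may differ:

```typescript
// Sketch of the exponential backoff schedule: 0.5s * 2^n, capped at 8s,
// with up to 25% jitter (applied subtractively here as an assumption).
function backoffMs(attempt: number, jitter: () => number = Math.random): number {
    // 0.5s doubled on each retry, never exceeding 8s
    const base = Math.min(500 * 2 ** attempt, 8_000);
    // jitter() in [0,1) trims up to 25% so concurrent retries spread out
    return base * (1 - 0.25 * jitter());
}
```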

Design Philosophy

Claude Stays in Control — External models are tools, not peers. Claude maintains full context awareness, reviews all suggestions, and executes all code modifications. This ensures the rewind mechanism works and prevents unauthorized changes from external models.

Extensibility Through Data, Not Code — The provider framework, model routing table, and Skill definitions are all data-driven. Adding a new model, a new provider, or a new behavior pattern requires configuration changes, not architectural rewrites.

Isolation as a Feature — The Agent runs in a child process. MCP Servers self-read their configuration. Skills are optional overlays. Each component can fail independently without cascading. This isolation also enables future enhancements (sandboxed code execution, parallel agents) without restructuring the core.

Read-Only by Default — External models cannot write files, run commands, or touch git. This is not a limitation but a deliberate safety boundary. All modifications flow through Claude, creating a single auditable point of control with full rollback capability.

Progressive Complexity — Layer 1 handles 80% of use cases with simple API calls. Layer 2 activates only when the task genuinely requires multi-step reasoning. Skills add intelligence only when the base tools are insufficient. The system avoids unnecessary complexity at every level.