Personal Project
Claude Versatile

Claude Versatile is a multi-model AI orchestration framework that lets Claude Code act as the primary controller, delegating sub-tasks to external AI models (OpenAI GPT, Grok, and in the future Gemini) through the Model Context Protocol (MCP). I built this system to solve a real problem in AI-assisted development: no single model excels at everything. By keeping Claude in control while routing specialized work to the best-fit model, the framework combines the strengths of multiple AI providers without sacrificing a unified development experience.

Codex Delegation Demo

User delegating a code analysis task to OpenAI Codex through Claude Code, where Claude assembles context, routes the request via MCP, and presents the result

Two-Layer Architecture

The system splits into two layers, each targeting a different complexity level. Layer 1 handles lightweight, single-shot API calls (code review, web search, generation). Layer 2 runs an autonomous Agent with its own reasoning loop for complex multi-step analysis that would exceed a single API call’s capacity.

flowchart TD
    subgraph Claude["Claude Code (Orchestrator)"]
        CC[Claude Code CLI]
        SK[Skills Layer]
    end

    subgraph L1["Layer 1: Direct API Calls"]
        MCP1[codex MCP Server]
        MCP2[grok MCP Server]
        MCP3[future providers...]
    end

    subgraph L2["Layer 2: Agent Delegation"]
        AMCP[agent MCP Server]
        subgraph Worker["Agent Worker Process"]
            P[Planner]
            CM[Context Manager]
            subgraph TR["ToolRegistry (Plugin)"]
                FS[filesystem/]
                CR[core/]
                GK[grok/]
                CX[codex/]
            end
        end
    end

    subgraph Models["External AI Models"]
        GPT[OpenAI GPT-5.4]
        GRK[Grok-4]
        GMN[Gemini...]
    end

    CC --> SK
    SK --> MCP1
    SK --> MCP2
    SK --> MCP3
    CC --> AMCP
    AMCP --> Worker
    MCP1 --> GPT
    MCP2 --> GRK
    MCP3 --> GMN
    P --> CM
    CM --> TR
    GK --> GRK
    Worker --> GPT
    Worker --> GRK

All external models are strictly read-only. They cannot modify files, run shell commands, or access git. Every code suggestion returns as plain text for Claude to review and apply. Because every edit goes through Claude, Claude Code’s rewind mechanism still covers all changes, which preserves full rollback capability.

Declarative Provider Framework

Adding a new AI model provider to the system requires roughly 25 lines of code. I designed a defineProvider() lifecycle framework that handles configuration loading, environment variable injection, client creation, and error mapping automatically. Developers only implement the onRegisterTools hook to define their MCP tools.

// A complete MCP Server for any OpenAI-compatible API
defineProvider({
    type: "openai",
    name: "claude-versatile-codex",
    version: "0.3.0",
    configFile: "codex.agent.json",
    onRegisterTools(server, ctx) {
        server.tool("codex_chat", schema, async (params) => {
            const result = await ctx.complete({
                model: params.model,
                messages: [{ role: "user", content: params.prompt }],
            });
            return { content: [{ type: "text", text: result.content }] };
        });
    },
});

The framework supports two provider types: "openai" for OpenAI-compatible APIs (automatic config and client handling) and "native" for custom SDK integrations (user implements onCreateClient). The lifecycle flows through four stages, each with sensible defaults that can be selectively overridden:

flowchart LR
    A["onLoadConfig"] --> B["onCreateClient"]
    B --> C["onRegisterTools"]
    C --> D["onServerReady"]

For OpenAI-compatible providers, the onRegisterTools hook receives a context object with ctx.complete(), a convenience method that encapsulates the full pipeline of message building, completion execution, usage formatting, and error mapping in a single call. Native SDK providers get full control over client creation and tool registration while the framework still handles config loading and server startup.
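The four-stage lifecycle with overridable defaults can be pictured roughly as follows. This is a minimal sketch under my own assumptions, not the framework's actual code: the `LifecycleHooks` shape, the `runLifecycle` function, the hook signatures, and the default values are all hypothetical and exist only to show how defaults and overrides could combine.

```typescript
// Hypothetical sketch of a four-stage lifecycle with overridable defaults.
// None of these types come from the real framework; they only illustrate the idea.
type Config = { apiKey?: string; baseUrl?: string };
type Client = { kind: string };

interface LifecycleHooks {
  onLoadConfig: () => Config;
  onCreateClient: (config: Config) => Client;
  onRegisterTools: (client: Client, tools: string[]) => void;
  onServerReady: (tools: string[]) => void;
}

// Defaults an "openai"-style provider would inherit (assumed behaviour).
const defaultHooks: LifecycleHooks = {
  onLoadConfig: () => ({ baseUrl: "https://example.invalid/v1" }),
  onCreateClient: () => ({ kind: "openai-compatible" }),
  onRegisterTools: () => {},
  onServerReady: () => {},
};

// Merge user overrides over the defaults, then run the stages in order.
function runLifecycle(overrides: Partial<LifecycleHooks>): { client: Client; tools: string[] } {
  const hooks: LifecycleHooks = { ...defaultHooks, ...overrides };
  const tools: string[] = [];
  const config = hooks.onLoadConfig();
  const client = hooks.onCreateClient(config);
  hooks.onRegisterTools(client, tools);
  hooks.onServerReady(tools);
  return { client, tools };
}

// "openai"-style usage: only the tool registration hook is supplied.
const openaiStyle = runLifecycle({
  onRegisterTools: (_client, tools) => { tools.push("codex_chat"); },
});

// "native"-style usage: client creation is overridden as well.
const nativeStyle = runLifecycle({
  onCreateClient: () => ({ kind: "custom-sdk" }),
  onRegisterTools: (_client, tools) => { tools.push("custom_tool"); },
});
```

In this sketch the "openai" case inherits config loading and client creation, while the "native" case swaps in its own client but still relies on the default config loading and server-ready stages.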

Request Flow

When Claude delegates a task, the request flows through a well-defined pipeline. The MCP Server lazily initializes its API client on the first tool call (so the server can start without a valid API key), normalizes the response into a CompletionResult format, and maps any provider-specific errors to user-friendly MCP responses.

sequenceDiagram
    participant U as User
    participant C as Claude Code
    participant M as MCP Server
    participant P as Provider API

    U->>C: "Use codex to review this function"
    C->>C: Select tool: codex_chat
    C->>M: MCP tool call (prompt, model, params)
    M->>M: Load config from .versatile/
    M->>M: Initialize client (lazy)
    M->>P: chat.completions.create()
    P-->>M: CompletionResult (content, usage)
    M-->>C: MCP response (text + usage footer)
    C->>C: Review result, decide next action
    C-->>U: Present findings
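The lazy initialization step in the flow above can be sketched as a memoized getter. This is an illustration under assumptions rather than the server's real code: `createLazyClient`, `ApiClient`, and the error message are hypothetical names.

```typescript
// Hypothetical sketch of lazy client initialisation; names are illustrative only.
type ApiClient = { apiKey: string };

function createLazyClient(loadApiKey: () => string | undefined): () => ApiClient {
  let client: ApiClient | undefined;
  return () => {
    if (client === undefined) {
      const apiKey = loadApiKey();
      if (!apiKey) {
        // Surfaced on the first tool call rather than at server start-up.
        throw new Error("API key is not configured");
      }
      client = { apiKey };
    }
    return client;
  };
}

// Creating the getter does not read the key; only the first call does.
let keyLoads = 0;
const getClient = createLazyClient(() => {
  keyLoads += 1;
  return "sk-test";
});
const firstClient = getClient();
const secondClient = getClient();
```

Because the key is read only inside the getter, a server built this way can start and register its tools before a valid key exists, and later calls reuse the cached client.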

Data-Driven Model Routing

On the Agent side, adding support for a new AI provider requires a single entry in the route table. The MODEL_ROUTES configuration maps model name prefixes to their API credentials (config file paths and environment variable names) and capability flags (such as supportsFunctionCalling). The Agent’s collectEnv() function traverses this table to dynamically load all provider configurations, and the Worker’s createProviderFromEnv() uses resolveModelRoute() to create the correct CompletionProvider at runtime.

graph LR
    M[model name] --> R{MODEL_ROUTES}
    R -->|"grok-*"| G["grok.agent.json (FC: false)"]
    R -->|"default"| O["codex.agent.json (FC: true)"]
    G --> P[CompletionProvider]
    O --> P
    P --> Planner

This means users can switch between GPT-5.4, Grok-4, or any future model by simply passing a model parameter. No code changes, no server restarts. The Planner depends on the CompletionProvider interface rather than any specific SDK, and automatically adapts its tool calling strategy based on the route’s supportsFunctionCalling flag. Future non-OpenAI adapters (Gemini, Claude native) can plug in without changing Planner code.
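A prefix-based route table along these lines might look like the sketch below. The field names `prefix`, `configFile`, and `apiKeyEnv`, and the use of an empty prefix as the default route, are assumptions for illustration; only `supportsFunctionCalling`, the config file names, and the environment variable names appear in the description above.

```typescript
// Hypothetical shape of the route table; field names other than
// supportsFunctionCalling are assumptions made for illustration.
interface ModelRoute {
  prefix: string; // model-name prefix; "" acts as the default route
  configFile: string;
  apiKeyEnv: string;
  supportsFunctionCalling: boolean;
}

// Order matters: specific prefixes come first, the default route last.
const MODEL_ROUTES: ModelRoute[] = [
  { prefix: "grok-", configFile: "grok.agent.json", apiKeyEnv: "GROK_API_KEY", supportsFunctionCalling: false },
  { prefix: "", configFile: "codex.agent.json", apiKeyEnv: "OPENAI_API_KEY", supportsFunctionCalling: true },
];

// Return the first route whose prefix matches the model name.
function resolveModelRoute(model: string): ModelRoute {
  const route = MODEL_ROUTES.find((r) => model.startsWith(r.prefix));
  if (!route) {
    throw new Error(`No route for model ${model}`);
  }
  return route;
}
```

With a table like this, supporting another prefix means adding one array entry ahead of the default route; the lookup function itself does not change.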

Autonomous Agent with Adaptive Control

The Agent runs as an independent child process with its own LLM-driven ReAct loop (Think, Act, Observe, Decide). It reads files, searches patterns, builds understanding incrementally, and returns structured analysis. Process isolation means a crashed Agent never takes down the MCP Server.

Agent ReAct Loop

The MCP Server forks a Worker process and communicates with it via IPC. The Worker drives the Planner, which communicates with the external LLM through one of two modes: OpenAI function calling (tools parameter + tool_calls response) for models that support it, or a prompt-based XML format (<thought> + <action>) for models that don’t. Function calling is preferred for reasoning models (gpt-5.4, o1, o3), whose content field is often null even though tool_calls is still returned correctly. The mode is selected automatically based on the supportsFunctionCalling flag in MODEL_ROUTES.

sequenceDiagram
    participant C as Claude Code
    participant M as MCP Server
    participant W as Worker Process
    participant LLM as External LLM

    C->>M: agent_execute(goal, model, autoMode)
    M->>W: fork() + IPC start

    opt autoMode enabled
        W->>LLM: Prompt with goal + plan tool
        LLM-->>W: tool_calls: plan(estimated_steps)
        W->>W: Set effectiveMax = ceil(estimated * 1.5)
    end

    loop ReAct Loop
        W->>LLM: Goal + context + tool definitions
        LLM-->>W: tool_calls: read_file / search_pattern / ...
        W->>W: Execute read-only tools
        W->>W: Update context (sliding window + summary)
        W->>W: Check: iterations, tokens, repetition
        W->>M: IPC status update

        alt LLM calls done tool
            W->>M: IPC complete(result)
        end
    end

    M-->>C: Formatted result (summary, files, tokens)

The Agent’s built-in tool set is strictly read-only: read_file, list_dir, search_pattern, web_search (Grok-powered, conditional on API key), plan (autoMode only), and done. All file paths are validated with resolveSafePath() to prevent directory traversal. The Context Manager uses a sliding window with automatic summarization (triggered at 80% of max context, compresses to 50%) to prevent token explosion on long-running tasks.
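The directory traversal check can be illustrated with the common resolve-then-compare technique. This is a sketch, not the project's resolveSafePath(): the two-argument signature and the error message are assumptions, and it does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical sketch: the real resolveSafePath() may differ in signature and behaviour.
// `root` is expected to be an absolute path to the project directory.
function resolveSafePath(root: string, requested: string): string {
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  // A path escapes the root if reaching it requires going up ("..")
  // or if it lives on a different root/drive altogether.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes the project root: ${requested}`);
  }
  return resolved;
}
```

Comparing the relative path, rather than checking the raw input for "..", also catches absolute paths and sequences such as `src/../../etc` that only leave the root after normalization.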

I implemented a two-layer adaptive iteration control system (autoMode) that automatically manages how long the Agent runs:

L1 Complexity Estimation

The Agent calls a plan tool on its first iteration, outputting an estimated_steps count. The Planner dynamically sets effectiveMaxIterations = ceil(estimated * 1.5). This prevents simple tasks from burning through unnecessary iterations while giving complex tasks room to breathe.
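As a worked example of the formula, the sketch below computes the limit from an estimate. The clamp to a configured ceiling and the fallback for an unusable estimate are my assumptions, not behaviour stated above, and `configuredMax` is a hypothetical parameter.

```typescript
// Sketch of the L1 estimate: ceil(estimated * 1.5).
// Assumptions (not confirmed by the text): the result is clamped to a
// configured ceiling, and an invalid estimate falls back to that ceiling.
function effectiveMaxIterations(estimatedSteps: number, configuredMax: number): number {
  if (!Number.isFinite(estimatedSteps) || estimatedSteps <= 0) {
    return configuredMax;
  }
  return Math.min(Math.ceil(estimatedSteps * 1.5), configuredMax);
}
```

For instance, an estimate of 5 steps yields ceil(7.5) = 8 iterations, and an estimate of 3 yields ceil(4.5) = 5.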

L2 Runtime Guards

L2 guards operate continuously during execution:

  • Repetition Detection tracks the last 5 tool calls in a sliding window. Two consecutive identical calls trigger a redirect message. Two redirects force termination.
  • Token Budget caps cumulative token consumption (default 100k). The Agent stops gracefully when the budget is exhausted.

stateDiagram-v2
    [*] --> Planning: agent_execute(goal)
    Planning --> Executing: plan tool sets effectiveMax
    Executing --> Thinking: ReAct Loop
    Thinking --> Acting: Select tool
    Acting --> Observing: Execute tool
    Observing --> Thinking: Continue reasoning

    Thinking --> Done: done tool called
    Thinking --> Terminated: Max iterations reached
    Observing --> Terminated: Token budget exceeded
    Acting --> Redirected: Repetition detected
    Redirected --> Thinking: Inject redirect hint
    Redirected --> Terminated: 2nd redirect
    Terminated --> [*]
    Done --> [*]

When autoMode is disabled, the system falls back to a fixed maxIterations count, and the plan tool is hidden from the LLM.
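The two L2 guards described above could be combined in a small state holder like the following. This is a sketch under assumptions: the `RuntimeGuards` class, its `check` method, and the way a call is turned into a comparable signature (tool name plus JSON-serialized arguments) are hypothetical.

```typescript
type GuardVerdict = "continue" | "redirect" | "terminate";

// Hypothetical sketch of the L2 guards; not the project's actual implementation.
class RuntimeGuards {
  private recent: string[] = [];
  private redirects = 0;
  private tokensUsed = 0;
  private readonly tokenBudget: number;
  private readonly windowSize: number;

  constructor(tokenBudget = 100000, windowSize = 5) {
    this.tokenBudget = tokenBudget;
    this.windowSize = windowSize;
  }

  // Record one tool call and report what the loop should do next.
  check(toolName: string, args: unknown, tokens: number): GuardVerdict {
    // Token budget: stop once cumulative usage reaches the cap.
    this.tokensUsed += tokens;
    if (this.tokensUsed >= this.tokenBudget) return "terminate";

    // Sliding window holding the signatures of the most recent calls.
    this.recent.push(`${toolName}:${JSON.stringify(args)}`);
    if (this.recent.length > this.windowSize) this.recent.shift();

    // Repetition: the two latest calls in the window are identical.
    const last = this.recent[this.recent.length - 1];
    const previous = this.recent[this.recent.length - 2];
    if (this.recent.length < 2 || last !== previous) return "continue";

    // First repetition redirects; the second one forces termination.
    this.redirects += 1;
    return this.redirects >= 2 ? "terminate" : "redirect";
  }
}
```

A caller would inject a redirect hint into the context on "redirect" and end the loop on "terminate".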

Plugin-Based Tool Architecture

In v0.3.0-alpha.5, I refactored the Agent’s monolithic tool system into a plugin-based architecture. The original tools.ts (270 lines of tightly coupled tool definitions) was split into categorized directories (core/, filesystem/, grok/, codex/), each registering tools through a central ToolRegistry class. This makes the tool system open for extension without modifying existing code.

flowchart TD
    subgraph Registry["ToolRegistry"]
        R[register / getEnabled / has]
    end

    subgraph Core["core/"]
        P[plan]
        D[done]
    end

    subgraph FS["filesystem/"]
        RF[read_file]
        LD[list_dir]
        SP[search_pattern]
    end

    subgraph Grok["grok/"]
        WS[web_search]
    end

    subgraph Codex["codex/"]
        FT[future tools...]
    end

    Core --> Registry
    FS --> Registry
    Grok --> Registry
    Codex --> Registry
    Registry --> Worker

The createBuiltinRegistry(env) factory function conditionally registers provider-specific tools based on available API keys in the environment. If GROK_API_KEY is present, the Agent gains a web_search tool powered by Grok’s built-in search capability. OPENAI_API_KEY is reserved in the same way for future Codex-specific tools. Core and filesystem tools are always registered.
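A registry with conditional registration might be sketched as below. The method names register, has, and getEnabled and the tool names come from the description above; the `AgentTool` shape, the descriptions, and the internals are assumptions for illustration.

```typescript
// Hypothetical tool shape; the real tools carry schemas and handlers as well.
interface AgentTool {
  name: string;
  description: string;
}

class ToolRegistry {
  private tools = new Map<string, AgentTool>();

  register(tool: AgentTool): void {
    this.tools.set(tool.name, tool);
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }

  // Return only the registered tools whose names were requested.
  getEnabled(names: string[]): AgentTool[] {
    const enabled: AgentTool[] = [];
    for (const name of names) {
      const tool = this.tools.get(name);
      if (tool) enabled.push(tool);
    }
    return enabled;
  }
}

// Sketch of the factory: core and filesystem tools always, web_search only with a key.
function createBuiltinRegistry(env: Record<string, string | undefined>): ToolRegistry {
  const registry = new ToolRegistry();
  for (const name of ["plan", "done", "read_file", "list_dir", "search_pattern"]) {
    registry.register({ name, description: `${name} (built-in)` });
  }
  if (env["GROK_API_KEY"]) {
    registry.register({ name: "web_search", description: "web_search (Grok-powered)" });
  }
  return registry;
}
```

In this sketch, asking getEnabled for a tool that was never registered (for example web_search without a Grok key) simply leaves it out of the result.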

Metadata-Driven Planner

Each tool now carries an AgentToolMetadata object that drives Planner behavior declaratively:

interface AgentToolMetadata {
    category?: "core" | "filesystem" | "external" | "custom";
    tracksFileRead?: boolean;       // Planner tracks 'path' arg in filesRead
    skipRepetitionCheck?: boolean;  // Exempt from L2 repetition detection
    systemPromptHint?: string;      // Injected into system prompt when tool is available
}

Previously, the Planner contained hardcoded tool name checks (if (toolName === "read_file") ...). Now it reads metadata to decide behavior: whether to track file reads, whether to skip repetition detection for certain tools, and what hints to inject into the system prompt. This means third-party tools can declare their own Planner behavior without modifying Planner code.
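The shift from name checks to metadata checks can be sketched as follows. The AgentToolMetadata interface is repeated from above; `ToolWithMetadata`, `collectPromptHints`, and `applyToolCall` are hypothetical helpers, not the Planner's real functions.

```typescript
interface AgentToolMetadata {
  category?: "core" | "filesystem" | "external" | "custom";
  tracksFileRead?: boolean;
  skipRepetitionCheck?: boolean;
  systemPromptHint?: string;
}

// Hypothetical tool shape used only for this sketch.
interface ToolWithMetadata {
  name: string;
  metadata?: AgentToolMetadata;
}

// Gather the hints of all available tools for the system prompt.
function collectPromptHints(tools: ToolWithMetadata[]): string[] {
  const hints: string[] = [];
  for (const tool of tools) {
    const hint = tool.metadata?.systemPromptHint;
    if (hint) hints.push(hint);
  }
  return hints;
}

// Apply metadata-driven behaviour for one tool call, with no tool-name checks.
function applyToolCall(
  tool: ToolWithMetadata,
  args: Record<string, unknown>,
  filesRead: Set<string>,
): { checkRepetition: boolean } {
  const pathArg = args["path"];
  if (tool.metadata?.tracksFileRead && typeof pathArg === "string") {
    filesRead.add(pathArg);
  }
  return { checkRepetition: !tool.metadata?.skipRepetitionCheck };
}
```

A third-party tool that sets tracksFileRead or skipRepetitionCheck in its own metadata would get the same treatment without any change to these functions.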

Config-Driven Tool Enablement

The enabledTools field in agent.json controls which tools the Agent exposes to the LLM. This replaces the previous hardcoded tool list and allows users to customize the Agent’s capabilities per-project:

{
  "enabledTools": ["read_file", "list_dir", "search_pattern", "done", "plan", "web_search"]
}

Omitting a tool from this list hides it from the LLM entirely. The ToolRegistry.getEnabled(names) method filters the registered tools to only those specified, so the Planner never sees tools the user has disabled.

Multi-Model Tool Calling Adaptation

Not all models support OpenAI’s function calling protocol. Grok-4 via the realseek proxy, for example, returns plain text instead of structured tool_calls. I implemented a dual-mode system that automatically adapts to each model’s capabilities.

flowchart TD
    M[Model Request] --> Check{supportsFunctionCalling?}
    Check -->|true| FC[Function Calling Mode]
    Check -->|false| XML[XML Prompt Mode]

    FC --> Tools["tools param + tool_calls response"]
    XML --> Prompt["System prompt describes tools"]
    XML --> Format["LLM outputs <thought> + <action> XML"]
    XML --> Parse["parseLegacyContent extracts tool calls"]

    Tools --> Norm[Normalized CompletionResult]
    Parse --> Norm

The MODEL_ROUTES table now includes a supportsFunctionCalling flag per model prefix. When the Planner detects a model without function calling support, it:

  1. Omits the tools parameter from the API request
  2. Injects XML format instructions into the system prompt, describing available tools and the expected <thought> + <action> response format
  3. Parses the LLM’s plain-text response using parseLegacyContent to extract tool calls
  4. Relies on the ContextManager to convert tool_call/tool messages into plain assistant/user messages for models that don’t understand the function calling message format

This adaptation is transparent to the rest of the system. The Planner always receives a normalized CompletionResult with content, toolCalls, and usage fields regardless of which mode was used. Future non-OpenAI adapters (Gemini’s functionDeclarations, Claude’s tool_use) can be implemented in lib/adapters/ without changing Planner code.
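Parsing the XML mode's plain-text output could look roughly like this. The exact payload inside `<action>` is not specified above, so this sketch assumes a JSON object with `tool` and `args` fields; that format, the return shape, and the decision to skip malformed actions are assumptions, and the real parseLegacyContent may work differently.

```typescript
interface ParsedToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ParsedContent {
  thought: string;
  toolCalls: ParsedToolCall[];
}

// Hypothetical sketch: assumes <action> wraps JSON like {"tool": "...", "args": {...}}.
function parseLegacyContent(text: string): ParsedContent {
  const thoughtMatch = /<thought>([\s\S]*?)<\/thought>/.exec(text);
  const thought = (thoughtMatch?.[1] ?? "").trim();

  const toolCalls: ParsedToolCall[] = [];
  const actionPattern = /<action>([\s\S]*?)<\/action>/g;
  let match: RegExpExecArray | null;
  while ((match = actionPattern.exec(text)) !== null) {
    try {
      const parsed = JSON.parse(match[1] ?? "") as { tool?: unknown; args?: unknown };
      if (typeof parsed.tool === "string") {
        const args = (parsed.args ?? {}) as Record<string, unknown>;
        toolCalls.push({ name: parsed.tool, args });
      }
    } catch {
      // Skip action bodies that are not valid JSON objects.
    }
  }
  return { thought, toolCalls };
}
```

The result has the same pieces a function calling response would provide (text plus a list of tool calls), which is what lets both modes feed one normalized structure.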

Skill Encapsulation and Configuration

Skill Layer

Skills are optional behavior orchestration layers that sit on top of MCP tools. While the tools themselves are self-describing and work without Skills, the Skill layer adds intelligent context assembly and result presentation.

The codex-task Skill automatically collects relevant code context (file contents, project structure, dependency graphs), assembles a structured prompt with appropriate token budgets, and delegates to codex_chat. The grok-search Skill analyzes search intent (factual, news, technical, comparative, exploratory), optimizes the query, selects a matching system prompt, and delegates to grok_search.

Skills are defined as YAML/Markdown files in .claude/skills/, making them version-controllable and shareable. The separation between MCP (tool capability) and Skill (behavior orchestration) keeps each layer focused: MCP Servers can be installed independently via npm, while Skills provide the optional intelligence layer.

Unified Configuration and Timeout Strategy

All configuration lives in a .versatile/ directory (gitignored), with one JSON file per provider. Values are resolved through a three-level priority chain:

flowchart LR
    A[".versatile/*.json"] -->|not found| B["process.env"]
    B -->|not found| C["Hardcoded default"]

Missing files are auto-generated from templates on first run. Placeholder API keys (YOUR_API_KEY_HERE) are detected and treated as missing, falling back to environment variables with a warning. This means the server process can launch and register tools even before the user has configured their API key.
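The three-level priority chain with placeholder detection can be sketched as a single resolver. `resolveValue` and its return shape are hypothetical; the placeholder string is the one named above.

```typescript
const PLACEHOLDER_KEY = "YOUR_API_KEY_HERE";

type ValueSource = "file" | "env" | "default";

// Hypothetical sketch: config file first, then environment, then a hardcoded default.
function resolveValue(
  fileValue: string | undefined,
  envValue: string | undefined,
  fallback: string,
): { value: string; source: ValueSource } {
  // A placeholder key in the file counts as "not found".
  if (fileValue && fileValue !== PLACEHOLDER_KEY) {
    return { value: fileValue, source: "file" };
  }
  if (envValue) {
    return { value: envValue, source: "env" };
  }
  return { value: fallback, source: "default" };
}
```

Returning the source alongside the value is a design choice of this sketch: it gives the caller what it needs to log a warning when a placeholder forced the fallback to an environment variable.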

The timeout system operates at two independent levels. The per-call timeout (singleCallTimeout, default 2 minutes) controls individual LLM API requests and relies on the OpenAI SDK’s built-in retry with exponential backoff (0.5s x 2^n, max 8s, 25% jitter, covering 408/429/500+ transient errors). The task-level maxTimeMs (default 5 minutes) caps the total Agent execution time. The Agent checks all termination conditions (iteration count, token budget, repetition, time, abort signal, done tool) on every cycle, and the first condition to trigger wins.
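The backoff schedule quoted above (0.5s x 2^n, max 8s, 25% jitter) works out as in the sketch below. This is my own illustration of the arithmetic, not the SDK's source: `retryDelayMs` is a hypothetical function, and applying the jitter as a reduction of up to 25% is an assumption.

```typescript
// Sketch of the delay before retry number `retryIndex` (0-based).
// Assumption: jitter shortens the delay by up to 25%; `random` is injectable for testing.
function retryDelayMs(retryIndex: number, random: () => number = Math.random): number {
  const baseMs = Math.min(500 * Math.pow(2, retryIndex), 8000);
  const jitterFactor = 1 - random() * 0.25;
  return baseMs * jitterFactor;
}
```

Ignoring jitter, the delays run 0.5s, 1s, 2s, 4s, and then stay at the 8s cap from the fifth retry onward.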

Design Philosophy

Claude Stays in Control — External models are tools, not peers. Claude maintains full context awareness, reviews all suggestions, and executes all code modifications. This ensures the rewind mechanism works and prevents unauthorized changes from external models.

Extensibility Through Data, Not Code — The provider framework, model routing table, tool metadata, and Skill definitions are all data-driven. Adding a new model, a new provider, a new Agent tool, or a new behavior pattern requires configuration changes, not architectural rewrites. The plugin-based tool system extends this principle: third-party tools declare their Planner behavior through metadata rather than requiring Planner modifications.

Isolation as a Feature — The Agent runs in a child process. MCP Servers self-read their configuration. Skills are optional overlays. Each component can fail independently without cascading. This isolation also enables future enhancements (sandboxed code execution, parallel agents) without restructuring the core.

Read-Only by Default — External models cannot write files, run commands, or touch git. This is not a limitation but a deliberate safety boundary. All modifications flow through Claude, creating a single auditable point of control with full rollback capability.

Progressive Complexity — Layer 1 handles 80% of use cases with simple API calls. Layer 2 activates only when the task genuinely requires multi-step reasoning. Skills add intelligence only when the base tools are insufficient. The dual-mode tool calling system adapts transparently to model capabilities without exposing complexity to the user. The system avoids unnecessary complexity at every level.