Skip to main content

StateSet Computer Use Agent - Architecture Overview

Executive Summary

StateSet Computer Use Agent is a production-grade AI automation platform powered by Claude Opus 4.5. The system deploys multiple specialized AI agents that can see, understand, and interact with desktop environments to complete complex, long-running tasks autonomously. Built with Python using async/await patterns throughout, the platform implements Anthropic’s context engineering research achieving 95% cost savings compared to naive approaches. Key Metrics:
  • Average tokens/task: 7,500 (95% reduction from 150k baseline)
  • Average cost/task: 0.11(950.11 (95% savings from 2.25 baseline)
  • Average task duration: 30 seconds (33% faster with parallel execution)
  • Parallel speedup: 30-50% on multi-tool tasks

System Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                              User Interface                                  │
│           ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │
│           │  CLI/Shell   │  │  Dashboard   │  │    APIs      │             │
│           │   Scripts    │  │   (Next.js)  │  │   (REST)     │             │
│           └──────┬───────┘  └──────┬───────┘  └──────┬───────┘             │
└──────────────────┼─────────────────┼─────────────────┼───────────────────────┘
                   │                 │                 │
┌──────────────────▼─────────────────▼─────────────────▼───────────────────────┐
│                            ORCHESTRATION LAYER                               │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                           main.py                                       │ │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────┐│ │
│  │  │ Agent Selector  │  │  GlobalState    │  │  Multi-Agent Runner     ││ │
│  │  │ (keyword-based) │  │  (thread-safe)  │  │  (asyncio.gather)       ││ │
│  │  └─────────────────┘  └─────────────────┘  └─────────────────────────┘│ │
│  └────────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘

┌───────────────────────────────────▼─────────────────────────────────────────┐
│                              AGENT ENGINE                                    │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                         agent/loop.py                                   │ │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │ │
│  │  │  Sampling    │  │    API       │  │   System     │  │  Message   │ │ │
│  │  │    Loop      │  │  Providers   │  │   Prompt     │  │  Manager   │ │ │
│  │  │              │  │ (3 backends) │  │   Init       │  │  (cache)   │ │ │
│  │  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘ │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
│  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────────┐ │
│  │   SubagentManager  │  │     MCPManager     │  │  StructuredOutput      │ │
│  │  (task delegation) │  │  (external tools)  │  │    Parser              │ │
│  └────────────────────┘  └────────────────────┘  └────────────────────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘

┌───────────────────────────────────▼─────────────────────────────────────────┐
│                             TOOL LAYER                                       │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                        ToolCollection                                 │   │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐│   │
│  │  │ Computer   │ │   Bash     │ │   Edit     │ │      Memory        ││   │
│  │  │ Tool       │ │   Tool     │ │   Tool     │ │      Tool          ││   │
│  │  │ (GUI ops)  │ │ (commands) │ │ (files)    │ │ (persistence)      ││   │
│  │  └────────────┘ └────────────┘ └────────────┘ └────────────────────┘│   │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐│   │
│  │  │   AGI      │ │  Subagent  │ │  StateSet  │ │     AskUser        ││   │
│  │  │   Tool     │ │   Tool     │ │  CLI Tool  │ │      Tool          ││   │
│  │  └────────────┘ └────────────┘ └────────────┘ └────────────────────┘│   │
│  └──────────────────────────────────────────────────────────────────────┘   │
└───────────────────────────────────┬─────────────────────────────────────────┘

┌───────────────────────────────────▼─────────────────────────────────────────┐
│                          OPTIMIZATION LAYER                                  │
│  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────────────┐│
│  │ ParallelExecutor  │  │ ContextOptimizer  │  │    ToolExecutionGuard     ││
│  │ (30-50% speedup)  │  │ (5 patterns)      │  │ (safety + verification)   ││
│  └───────────────────┘  └───────────────────┘  └───────────────────────────┘│
│  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────────────┐│
│  │ StuckDetection    │  │   Verification    │  │      Checkpoint           ││
│  │ (loop prevention) │  │ (visual confirm)  │  │   (state persistence)     ││
│  └───────────────────┘  └───────────────────┘  └───────────────────────────┘│
└───────────────────────────────────┬─────────────────────────────────────────┘

┌───────────────────────────────────▼─────────────────────────────────────────┐
│                         OBSERVABILITY LAYER                                  │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                    UnifiedObservability                               │   │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐│   │
│  │  │ Structured │ │OpenTelemetry│ │ Prometheus │ │  Real-time Event  ││   │
│  │  │  Logging   │ │  Tracing   │ │  Metrics   │ │    Streaming      ││   │
│  │  └────────────┘ └────────────┘ └────────────┘ └────────────────────┘│   │
│  └──────────────────────────────────────────────────────────────────────┘   │
└───────────────────────────────────┬─────────────────────────────────────────┘

┌───────────────────────────────────▼─────────────────────────────────────────┐
│                          EXTERNAL SERVICES                                   │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────────────────┐ │
│  │ Anthropic  │  │  StateSet  │  │   Stripe   │  │     MCP Servers        │ │
│  │    API     │  │   APIs     │  │  Billing   │  │ (Slack, GitHub, etc.)  │ │
│  └────────────┘  └────────────┘  └────────────┘  └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘

Core Components

1. Main Orchestrator (main.py)

The entry point for all agent execution, responsible for: Environment Validation:
def validate_environment(*, require_display: bool = True) -> Dict[str, str]:
    """Validates ANTHROPIC_API_KEY, DISPLAY, STRIPE_API_KEY, WORKSPACE_PATH"""
Agent Selection:
def get_active_agents(instruction: str) -> List[AgentType]:
    """Keyword-based agent selection from instruction text"""
    # Matches: "auto-close" → AUTO_CLOSE, "social media" → SOCIAL_MEDIA, etc.
Global State Management:
class GlobalState:
    running: bool              # System-wide running flag
    tasks: Set[asyncio.Task]   # Active agent tasks
    shutdown_event: Event      # Graceful shutdown coordination
    _lock: threading.Lock      # Thread-safe state management
Multi-Agent Execution:
async def continuous_loop(agents: List[AgentConfig], instruction: str):
    """Spawns agents in parallel using asyncio.gather()"""
    tasks = [asyncio.create_task(run_agent(agent, instruction)) for agent in agents]
    results = await asyncio.gather(*tasks, return_exceptions=True)
Task Completion Analysis:
async def analyze_task_completion(messages, agent_type) -> TaskStatus:
    """Agent-specific completion detection with indicator patterns"""
    # AUTO_CLOSE: "ticket closed", "successfully closed", "task finished"
    # SOCIAL_MEDIA: "comment hidden", "content removed", "moderation complete"

2. Agent Loop (agent/loop.py)

The core conversation engine with Claude API: Sampling Loop:
async def sampling_loop(
    model: str,                    # claude-opus-4-5-20251101
    provider: APIProvider,         # ANTHROPIC | BEDROCK | VERTEX
    system_prompt_suffix: str,     # Agent-specific rules
    messages: List[BetaMessageParam],
    tool_collection: ToolCollection,
    # New capabilities
    enable_subagents: bool = True,
    mcp_servers: Dict = None,
    output_schema: Dict = None,
) -> SamplingLoopResult:
API Provider Support:
ProviderModel IDUse Case
ANTHROPICclaude-opus-4-5-20251101Direct API access
BEDROCKanthropic.claude-opus-4-5-20251101-v1:0AWS infrastructure
VERTEXclaude-opus-4-5-20251101Google Cloud
Beta Flags:
  • prompt-caching-2024-07-31 - 90% cost reduction on cached tokens
  • advanced-tool-use-2025-11-20 - Tool search (regex/bm25)
  • effort-2025-11-24 - Effort parameter (low/medium/high)
  • computer-use-2025-11-24 - Latest tool version with zoom action
System Prompt Initialization:
async def initialize_system_prompt(agent_config: AgentConfig) -> str:
    """Fetches rules/attributes from StateSet APIs:
       - /api/rules/get-agent-rules
       - /api/attributes/get-agent-attributes
       - /api/agents/get-agent
    """

3. Tool System (agent/tools/)

Tool Hierarchy:
BaseAnthropicTool (Abstract)
├── ComputerTool (3 versions)
│   ├── Actions: screenshot, click, type, scroll, drag, zoom
│   ├── Resolution scaling (XGA, WXGA, FWXGA)
│   └── Performance: 8ms typing delay, 100-char groups
├── BashTool
│   ├── Persistent session with sentinel pattern
│   ├── Async subprocess management
│   └── 60-second timeout
├── EditTool
│   ├── File creation/modification
│   └── Directory traversal prevention
├── MemoryTool
│   ├── Commands: view, create, str_replace, insert, delete, rename
│   ├── Prompt injection sanitization
│   └── Per-agent memory isolation
├── AGITool
│   └── Extended AGI capabilities
├── SubagentTool (lazy-loaded)
│   └── Spawn specialized subagents
└── AskUserTool
    └── Human-in-the-loop requests
Tool Versions:
VersionReleaseFeatures
computer_use_20251124CurrentZoom action, deferred tool loading
computer_use_20250124PreviousStable production version
computer_use_20241022LegacyBackward compatibility
ToolCollection API:
class ToolCollection:
    tool_map: Dict[str, BaseAnthropicTool]  # name → tool

    def to_params(self) -> List[Dict]       # Convert to API format
    async def run(self, name, input) -> ToolResult
    def set_deferred_tools(self, tools: List[str])  # For tool search

Advanced Capabilities

4. Subagent System (agent/subagent.py)

Implements Anthropic’s sub-agent compression pattern for 95% context savings: Subagent Types:
TypeModelMax TokensUse Case
EXPLOREHaiku4096Fast codebase exploration
ANALYZESonnet8192Deep analysis with thinking
EXECUTESonnet4096Task execution with verification
RESEARCHHaiku4096Web search and synthesis
CODESonnet8192Code generation/modification
Architecture:
MainAgent (Opus 4.5)

    ├── spawn_subagent("explore", "Find auth files")
    │   └── Returns: 2k summary (not 50k raw output)

    ├── spawn_subagent("analyze", "Review patterns")
    │   └── Returns: Structured insights

    └── spawn_subagent("execute", "Refactor code")
        └── Returns: Confirmation + diff
Usage:
from agent.subagent import SubagentManager, SubagentType

manager = SubagentManager(api_key)
result = await manager.spawn(
    task="Analyze the authentication flow",
    subagent_type=SubagentType.ANALYZE,
)
# result.result contains compressed summary

5. MCP Client Integration (agent/mcp_client.py)

Connect to external Model Context Protocol servers: Supported Transports:
  • STDIO (subprocess)
  • SSE (Server-Sent Events)
  • HTTP (direct HTTP)
Preset Servers:
PRESET_SERVERS = {
    "slack": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-slack"]},
    "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
    "postgres": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-postgres"]},
    "filesystem": {...},
    "memory": {...},
    "brave-search": {...},
    "puppeteer": {...},
    "sqlite": {...},
}
Usage in sampling_loop:
result = await sampling_loop(
    mcp_servers={
        "slack": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-slack"],
            "env": {"SLACK_BOT_TOKEN": os.environ["SLACK_BOT_TOKEN"]}
        }
    },
    # Agent now has access to mcp__slack__send_message, etc.
)

6. Structured Output (agent/structured_output.py)

Force Claude to return valid JSON matching specified schemas: Pre-defined Schemas:
  • TICKET_ANALYSIS_SCHEMA - Support ticket analysis
  • TASK_RESULT_SCHEMA - Task completion results
  • CODE_ANALYSIS_SCHEMA - Code review findings
  • ENTITY_EXTRACTION_SCHEMA - Entity extraction
Usage:
from agent.structured_output import OutputSchema, StructuredOutputParser

schema = OutputSchema(
    name="TicketAnalysis",
    schema={
        "type": "object",
        "properties": {
            "tickets_to_close": {"type": "array", "items": {"type": "string"}},
            "summary": {"type": "string"},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1}
        },
        "required": ["tickets_to_close", "summary"]
    }
)

result = await sampling_loop(output_schema=schema.schema, ...)
parser = StructuredOutputParser(schema)
data = parser.parse(response_text)  # Validates against schema

Optimization Systems

7. Parallel Executor (agent/parallel_executor.py)

Automatic parallel execution for independent tool calls: Dependency Analysis:
class DependencyAnalyzer:
    def analyze(self, tool_calls: List[ToolCall]) -> ExecutionPlan:
        """
        Rules:
        - Computer tool calls: Always sequential (visual state dependency)
        - Same path parameter: Sequential (file system dependency)
        - Read-only tools: Can parallelize
        - Write operations: Sequential
        """
Execution Strategy:
Tool Calls: [screenshot, bash(ls), bash(pwd), click]

Dependency Analysis:
- screenshot → click (computer tool dependency)
- bash(ls), bash(pwd) (independent, read-only)

Execution Plan:
1. [screenshot]           # Sequential
2. [bash(ls), bash(pwd)]  # Parallel
3. [click]                # Sequential

Result: 30-50% speedup

8. Context Optimizer (agent/context_optimizer.py)

Implements 5 Anthropic context engineering patterns: Pattern 1: Just-in-Time Retrieval
# Instead of: read_file("large_file.py")
# Use: grep("pattern", "large_file.py") | head -50
Pattern 2: Dynamic Compaction
class ContextBudget:
    OPTIMAL = 50_000           # EXCELLENT attention quality
    ATTENTION_DEGRADATION = 100_000  # GOOD → DEGRADED
    WARNING = 150_000          # DEGRADED → WARNING
    CRITICAL = 200_000         # WARNING → CRITICAL
Pattern 3: Structured Note-Taking
# Persistent memory outside context window
memory_tool.create("auth_findings", "OAuth2 flow uses refresh tokens...")
Pattern 4: Sub-Agent Compression
# 50k raw exploration → 2k structured summary
subagent = await manager.spawn(task="Find all API endpoints", type=EXPLORE)
Pattern 5: Attention Budget Monitoring
class AttentionQuality(Enum):
    EXCELLENT = "excellent"  # < 50k tokens
    GOOD = "good"           # < 100k tokens
    DEGRADED = "degraded"   # < 150k tokens
    WARNING = "warning"     # < 200k tokens
    CRITICAL = "critical"   # > 200k tokens

9. Tool Execution Guard (agent/tool_guard.py)

Safety and verification layer: Features:
  • Pre-execution Validation: Safety checks before tool execution
  • Visual Verification: Confirms actions took effect (optional)
  • Stuck Detection: Monitors for infinite loops
  • Result Caching: 120-second TTL for cacheable operations
Speed Modes:
# Normal mode: Verification enabled (~0.5-1.0s per action)
python main.py "task"

# Fast mode: Skip verification (2-3x faster)
AGENT_FAST_MODE=1 python main.py "task"

10. Stuck Detection (agent/stuck_detection.py)

Prevents infinite loops and stuck patterns: Detection Methods:
  • Repeating same action consecutively
  • Cycling between 2-3 actions
  • No visual progress (identical screenshots)
  • Slow progress (too few actions per time)
Recovery Strategies:
class StuckDetector:
    def check(self, action: ActionRecord) -> Optional[RecoverySuggestion]:
        """
        Returns suggestions like:
        - "Try a different approach"
        - "Scroll to see more content"
        - "Check if element exists"
        """

Observability System

11. Unified Observability (agent/observability/)

Single interface for all observability concerns: Configuration:
from agent.observability import get_observability, configure_observability

configure_observability(
    enable_metrics=True,
    enable_tracing=True,
    enable_streaming=True,
    metrics_port=9090,
    otlp_endpoint="localhost:4317",  # OpenTelemetry
)
Usage:
obs = get_observability()

async with obs.task_context("AUTO_CLOSE", "agent-123", "Close tickets"):
    obs.log_info("Starting task", tickets_count=10)

    with obs.tool_execution("computer", action="click"):
        # Automatically tracked
        pass

    obs.record_api_call(
        provider="anthropic",
        model="claude-opus-4-5-20251101",
        latency=2.5,
        input_tokens=1500,
        output_tokens=500,
    )
Components:
ComponentPurposeBackend
Structured LoggingJSON logs with contextPython logging
Distributed TracingRequest correlationOpenTelemetry
MetricsPerformance trackingPrometheus
Event StreamingReal-time updatesSSE/WebSocket
Health MonitoringSystem healthCircuit breakers
Environment Variables:
METRICS_PORT=9090           # Prometheus metrics
OTLP_ENDPOINT=localhost:4317  # OpenTelemetry collector
LOG_FORMAT=json             # json | human | compact
LOG_LEVEL=INFO              # DEBUG | INFO | WARNING | ERROR

Infrastructure

12. Configuration Management (agent/config.py)

Centralized configuration with documented rationale: Configuration Classes:
@dataclass
class ContextSettings:
    optimal_budget: int = 50_000           # From Anthropic research
    degradation_threshold: int = 100_000   # Attention starts degrading
    warning_threshold: int = 150_000       # Significant degradation
    max_context: int = 200_000             # Model limit

@dataclass
class ToolSettings:
    bash_timeout: int = 60                 # Optimized from 120s
    typing_delay_ms: int = 8               # Characters per ms
    screenshot_retention: int = 5          # Most recent screenshots

@dataclass
class BudgetSettings:
    input_price_per_million: float = 3.0   # Claude Opus 4.5
    output_price_per_million: float = 15.0
    cached_input_price: float = 0.30       # 90% savings

13. Exception Hierarchy (agent/exceptions.py)

Comprehensive error handling:
AgentError (base)
├── RetryableError
│   ├── NetworkError
│   ├── RateLimitError
│   ├── TimeoutError
│   └── ServiceUnavailableError
├── NonRetryableError
│   ├── ConfigurationError
│   ├── ValidationError
│   ├── SecurityError
│   └── AuthenticationError
├── BudgetError
│   ├── DailyBudgetExceededError
│   └── TaskBudgetExceededError
└── ToolError
    ├── ToolExecutionError
    ├── ToolTimeoutError
    └── ToolValidationError

14. Health Monitoring (agent/health.py)

Production health checks:
class HealthChecker:
    async def check_anthropic_api(test_connectivity=True) -> HealthCheck
    async def check_system_resources() -> HealthCheck
    async def check_disk_space() -> HealthCheck

    # Circuit breaker for failing services
    circuit_breaker: CircuitBreaker
Health States:
  • HEALTHY - All checks passing
  • DEGRADED - Some checks failing, system operational
  • UNHEALTHY - Critical failures

Dashboard Architecture

15. Backend (dashboard/backend/)

FastAPI REST API with async operations:
dashboard/backend/
├── app/
│   ├── main.py          # FastAPI app factory
│   ├── api/             # REST API routes
│   │   ├── jobs.py      # Job CRUD
│   │   ├── templates.py # Workflow templates
│   │   ├── artifacts.py # Screenshot/output storage
│   │   └── metrics.py   # Performance tracking
│   ├── models/          # SQLAlchemy ORM models
│   ├── schemas/         # Pydantic schemas
│   ├── services/        # Business logic
│   ├── tasks/           # Celery workers
│   │   └── worker.py    # Async agent execution
│   └── core/            # Configuration, database
└── migrations/          # Alembic schema versioning
Key Technologies:
  • FastAPI with CORS
  • SQLAlchemy async ORM
  • PostgreSQL database
  • Celery task queue
  • Server-Sent Events (SSE)
  • S3-compatible artifact storage (boto3)

16. Frontend (dashboard/frontend/)

Next.js 14 application:
dashboard/frontend/
├── app/                 # App router pages
├── components/          # React components
├── hooks/               # Custom React hooks
└── lib/                 # Utilities
Key Technologies:
  • Next.js 14 with app router
  • React Query for data fetching
  • Tailwind CSS styling
  • EventSource for real-time updates

Execution Flow

Complete Request Flow

1. User Command


2. validate_environment()
   ├── Check ANTHROPIC_API_KEY
   ├── Check DISPLAY
   └── Validate optional keys


3. get_active_agents(instruction)
   ├── Parse keywords: "auto-close" → AUTO_CLOSE
   └── Return: List[AgentConfig]


4. continuous_loop(agents, instruction)

   ├──────────────────────────────────────┐
   │                                      │
   ▼                                      ▼
5a. run_agent(AUTO_CLOSE)           5b. run_agent(SOCIAL_MEDIA)
   │                                      │
   ▼                                      ▼
6. initialize_system_prompt()        6. initialize_system_prompt()
   ├── Fetch rules from StateSet         (parallel)
   └── Build system prompt


7. sampling_loop()

   ├─── Send to Claude API ──────────────────────────┐
   │         │                                       │
   │         ▼                                       │
   │    Claude Response                              │
   │    ├── Text content                             │
   │    └── Tool calls                               │
   │         │                                       │
   │         ▼                                       │
   ├─── ToolExecutionGuard.execute()                 │
   │    ├── DependencyAnalyzer                       │
   │    ├── ParallelToolExecutor                     │
   │    ├── StuckDetection                           │
   │    └── Verification (optional)                  │
   │         │                                       │
   │         ▼                                       │
   │    Tool Results                                 │
   │         │                                       │
   └─────────┴───────────────────────────────────────┘

             ▼ (loop until done)


8. analyze_task_completion()
   ├── Check completion indicators
   └── Return TaskStatus


9. send_stripe_meter_event()
   ├── Token usage
   └── Cost calculation


10. shutdown_gracefully()
    ├── Cancel all tasks
    └── Cleanup resources

Agent Types

Supported Agents

Agent TypeKeywordsPurpose
AUTO_CLOSE”auto-close”, “ticket”Close resolved support tickets
SOCIAL_MEDIA”social media”, “moderate”Content moderation
LINKEDIN_MESSENGER”linkedin”, “outreach”LinkedIn automation
SLACK_SUPPORT”slack”, “support”Slack support automation
SHOPIFY”shopify”, “e-commerce”E-commerce management
ONBOARDING”onboarding”, “setup”User onboarding
STATESET_AGENTIC”stateset”, “custom”Custom tasks

Agent Configuration

@dataclass
class AgentConfig:
    org_id: str               # Organization identifier
    agent_id: str             # Unique agent identifier
    description: str          # Agent purpose
    capabilities: List[str]   # What the agent can do
    stripe_customer_id: str   # Billing identifier

Security Architecture

API Key Management

  • All keys via environment variables
  • Validation on startup
  • No key transmission to external services

Tool Safety

  • Directory traversal prevention in EditTool
  • Prompt injection protection in MemoryTool
  • Pre-execution validation via ToolExecutionGuard
  • Agent memory isolation (per agent_id)

Sandbox Execution

  • Tools run in controlled environment
  • File system access limited by permissions
  • Network access controlled by system

Performance Characteristics

Benchmarks

MetricValueNotes
Tokens/task7,50095% reduction from 150k
Cost/task$0.1195% savings from $2.25
Task duration30s33% faster with parallel
Parallel speedup30-50%On multi-tool tasks
Typing speed8ms/charOptimized from 50ms
Bash timeout60sOptimized from 120s

Cost Breakdown

OperationPrice
Input tokens$3.00/1M
Output tokens$15.00/1M
Cached input$0.30/1M (90% savings)

File Organization

stateset-computer-use-agent/
├── main.py                      # Entry point, orchestration
├── agent/
│   ├── loop.py                  # Core sampling loop
│   ├── parallel_executor.py     # Parallel tool execution
│   ├── context_optimizer.py     # Context engineering
│   ├── tool_guard.py            # Safety checks
│   ├── stuck_detection.py       # Loop prevention
│   ├── verification.py          # Visual verification
│   ├── subagent.py              # Subagent spawning
│   ├── mcp_client.py            # MCP integration
│   ├── structured_output.py     # JSON schema validation
│   ├── checkpoint.py            # State persistence
│   ├── metrics.py               # Performance tracking
│   ├── skill_manager.py         # Skill system
│   ├── config.py                # Configuration
│   ├── exceptions.py            # Error hierarchy
│   ├── logging_config.py        # Structured logging
│   ├── health.py                # Health monitoring
│   ├── observability/           # Unified observability
│   │   ├── unified.py
│   │   ├── tracing.py
│   │   └── metrics.py
│   └── tools/                   # Tool implementations
│       ├── base.py
│       ├── collection.py
│       ├── computer.py
│       ├── bash.py
│       ├── edit.py
│       ├── memory.py
│       ├── agi.py
│       └── groups.py
├── dashboard/
│   ├── backend/                 # FastAPI + Celery
│   └── frontend/                # Next.js 14
├── start-*.sh                   # Launch scripts
└── test_*.py                    # Test suites

Extension Points

Adding New Agents

  1. Define AgentConfig in AGENT_CONFIGS
  2. Add keyword detection in get_active_agents()
  3. Create completion indicators in analyze_task_completion()

Adding New Tools

  1. Inherit from BaseAnthropicTool
  2. Implement __call__ returning ToolResult
  3. Add to version groups in agent/tools/groups.py
  4. Update tool traits if cacheable/read-only

Adding MCP Servers

await mcp_manager.add_server("custom-server", {
    "command": "npx",
    "args": ["-y", "@my/custom-mcp-server"],
    "env": {"API_KEY": "..."}
})

Quick Reference

Environment Variables

# Required
ANTHROPIC_API_KEY=sk-ant-api03-...
DISPLAY=:1

# Optional
STRIPE_API_KEY=sk_live_...
WORKSPACE_PATH=/path/to/workspace
AGENT_FAST_MODE=1              # Skip verification
METRICS_PORT=9090              # Prometheus
OTLP_ENDPOINT=localhost:4317   # OpenTelemetry
LOG_FORMAT=json                # json | human | compact
LOG_LEVEL=INFO                 # DEBUG | INFO | WARNING | ERROR

Common Commands

# Run agents
python main.py "auto-close tickets"
python main.py "auto-close and social media"  # Parallel

# With options
python main.py --effort medium "task"
python main.py --tool-search regex --defer-tool agi_agent "task"

# Dashboard
cd dashboard && docker compose up -d

This architecture provides a scalable, maintainable foundation for computer use automation with AI agents, implementing production-grade patterns for reliability, observability, and cost optimization.