StateSet Computer Use Agents - Architecture Overview

Overview

AI-powered automation platform that leverages Claude Opus 4 to perform various tasks through computer interaction. The system uses multiple specialized agents that can control desktop environments, interact with web applications, and automate complex workflows.

🛠️ Key Features

  • Multi-Agent Architecture - Specialized agents for different tasks
  • Computer Vision & Control - Agents can see and interact with desktop environments
  • API Integration - Seamless integration with StateSet APIs
  • Metered Billing - Usage-based billing through Stripe
  • Parallel Processing - Multiple agents can run concurrently
  • Graceful Shutdown - Safe termination with Ctrl+C

📋 Requirements

  • Ubuntu Linux (kernel 5.15.0+)
  • Python 3.8+
  • Virtual display (DISPLAY=:1)
  • Anthropic API key

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User Interface                            │
│                   (Command Line / Shell Scripts)                 │
└─────────────────────┬───────────────────────────────────────────┘

┌─────────────────────▼───────────────────────────────────────────┐
│                         main.py                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │   Agent     │  │   Agent      │  │   Global State     │    │
│  │  Selector   │  │   Runner     │  │   Management       │    │
│  └─────────────┘  └──────────────┘  └────────────────────┘    │
└─────────────────────┬───────────────────────────────────────────┘

┌─────────────────────▼───────────────────────────────────────────┐
│                      agent/loop.py                               │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │  Sampling   │  │   API        │  │   System Prompt    │    │
│  │    Loop     │  │  Providers   │  │   Initialization   │    │
│  └─────────────┘  └──────────────┘  └────────────────────┘    │
└─────────────────────┬───────────────────────────────────────────┘

┌─────────────────────▼───────────────────────────────────────────┐
│                     Tool Collection                              │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐     │
│  │  Computer  │  │    Bash      │  │      Edit          │     │
│  │   Tool     │  │    Tool      │  │      Tool          │     │
│  └────────────┘  └──────────────┘  └────────────────────┘     │
└─────────────────────┬───────────────────────────────────────────┘

┌─────────────────────▼───────────────────────────────────────────┐
│                    External Services                             │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐     │
│  │ Anthropic  │  │  StateSet    │  │     Stripe         │     │
│  │    API     │  │    APIs      │  │     Billing        │     │
│  └────────────┘  └──────────────┘  └────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Main Entry Point (main.py)

The main module orchestrates the entire system:

Key Functions:

  • main(): Entry point that initializes and runs agents
  • get_active_agents(): Determines which agents to activate based on instruction keywords
  • run_agent(): Executes a single agent with error handling and billing
  • continuous_loop(): Manages multiple agents running in parallel
  • analyze_task_completion(): Determines if a task was completed successfully

Global State Management:

class GlobalState:
    - running: bool              # System-wide running flag
    - tasks: Set[asyncio.Task]   # Active agent tasks
    - shutdown_event: Event      # Coordination for graceful shutdown
    - _lock: threading.Lock      # Thread-safe state management

2. Agent Loop (agent/loop.py)

The core agent execution engine:

Key Components:

  • sampling_loop(): Main conversation loop with the AI model
  • initialize_system_prompt(): Fetches agent configuration from APIs
  • API Provider Support: Anthropic, Bedrock, Vertex
  • Message Management: Handles context window and prompt caching

Message Flow:

  1. Initialize system prompt with agent rules and attributes
  2. Send messages to AI model
  3. Process AI response and tool calls
  4. Execute tools and collect results
  5. Continue conversation until task completion

3. Tool System

Tool Collection Architecture:

ToolCollection
├── ComputerTool     # GUI interaction
│   ├── screenshot()
│   ├── click()
│   ├── type_text()
│   └── scroll()
├── BashTool         # System commands
│   ├── run()
│   └── execute_command()
└── EditTool         # File manipulation
    ├── create()
    └── modify()

Tool Versions:

  • computer_use_20241022: Legacy version
  • computer_use_20250124: Current version with enhanced capabilities

4. Agent Types

Each agent is configured with:

@dataclass
class AgentConfig:
    org_id: str               # Organization identifier
    agent_id: str             # Unique agent identifier
    description: str          # Agent purpose
    capabilities: List[str]   # What the agent can do
    stripe_customer_id: str   # Billing identifier

5. Communication Flow

1. Instruction Processing:

User Input → Keyword Analysis → Agent Selection → Task Distribution

2. Agent Execution:

Agent → API Configuration → System Prompt → AI Model → Tool Execution → Result

3. Tool Execution:

AI Request → Tool Selection → Tool Execution → Result Collection → Response

Data Flow

1. Configuration Loading

StateSet APIs
    ├── /api/rules/get-agent-rules
    ├── /api/attributes/get-agent-attributes
    └── /api/agents/get-agent

    System Prompt Generation

    Agent Initialization

2. Message Processing

User Message

Messages Array (with history)

Anthropic API (with tools)

Response with Tool Calls

Tool Execution

Tool Results

Continue or Complete

3. Billing Flow

Task Completion Detection

Token Usage Calculation

Stripe Meter Event

Usage Logged

Concurrency Model

Asyncio-based Architecture:

  • Main Event Loop: Coordinates all async operations
  • Agent Tasks: Each agent runs as an independent asyncio task
  • Parallel Execution: Multiple agents can run simultaneously
  • Graceful Shutdown: Cancellation propagates to all tasks

Task Management:

# Parallel agent execution
tasks = []
for agent in active_agents:
    task = asyncio.create_task(run_agent(...))
    tasks.append(task)

# Wait for all agents
results = await asyncio.gather(*tasks, return_exceptions=True)

Error Handling Strategy

1. Retry Mechanism:

  • API calls: 3 retries with exponential backoff
  • Network failures: Automatic retry with delay
  • Tool failures: Error reported back to AI for recovery

2. Error Propagation:

Tool Error → ToolResult(error=...) → AI Model → Alternative Approach

3. Graceful Degradation:

  • Missing API data: Use default system prompt
  • Tool failure: AI suggests alternative approach
  • API quota: Graceful shutdown with state preservation

Security Architecture

1. API Key Management:

  • Keys stored in code (should use environment variables)
  • Separate keys for different services
  • No key transmission to external services

2. Sandbox Execution:

  • Tools run in controlled environment
  • File system access limited by permissions
  • Network access controlled by system

3. Data Privacy:

  • Screenshots stored locally
  • No automatic data transmission
  • User data stays within system boundaries

Performance Optimizations

1. Prompt Caching:

  • 3 most recent conversation turns cached
  • Reduces token usage for repeated contexts
  • Ephemeral cache for session optimization

2. Image Management:

  • Configurable image retention (default: 5 most recent)
  • Automatic cleanup of older images
  • Base64 encoding for efficient transmission

3. Parallel Processing:

  • Multiple agents run concurrently
  • Independent task execution
  • Shared resource optimization

Extension Points

1. Adding New Agents:

  1. Define AgentConfig in AGENT_CONFIGS
  2. Add keyword detection in get_active_agents()
  3. Create specialized completion detection logic

2. Adding New Tools:

  1. Implement tool class inheriting from base
  2. Add to tool version groups
  3. Update tool collection initialization

3. Custom API Providers:

  1. Add to APIProvider enum
  2. Implement client initialization
  3. Update provider-specific logic

Monitoring and Observability

Logging Architecture:

Logger Configuration
    ├── Agent-specific logging with [AGENT_TYPE] prefix
    ├── Timestamp and log level
    ├── Structured error reporting
    └── API response logging

Metrics Collection:

  • Token usage per agent
  • Task completion rates
  • Error frequency and types
  • API response times

Best Practices for Development

  1. Agent Development:

    • Keep agents focused on specific domains
    • Implement clear completion detection
    • Handle errors gracefully
  2. Tool Development:

    • Return structured ToolResult objects
    • Include helpful error messages
    • Support both success and failure paths
  3. System Integration:

    • Use async/await consistently
    • Implement proper cleanup
    • Test shutdown scenarios
  4. Performance:

    • Minimize API calls
    • Cache reusable data
    • Optimize image handling

This architecture provides a scalable, maintainable foundation for computer use automation with AI agents.