StateSet Computer Use Agents - Architecture Overview

Overview

AI-powered automation platform that leverages Claude Opus 4 to perform various tasks through computer interaction. The system uses multiple specialized agents that can control desktop environments, interact with web applications, and automate complex workflows.

🛠️ Key Features

Multi-Agent Architecture - Specialized agents for different tasks
Computer Vision & Control - Agents can see and interact with desktop environments
API Integration - Seamless integration with StateSet APIs
Metered Billing - Usage-based billing through Stripe
Parallel Processing - Multiple agents can run concurrently
Graceful Shutdown - Safe termination with Ctrl+C

📋 Requirements

Ubuntu Linux (kernel 5.15.0+)
Python 3.8+
Virtual display (DISPLAY=:1)
Anthropic API key

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User Interface                            │
│                   (Command Line / Shell Scripts)                 │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                         main.py                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │   Agent     │  │   Agent      │  │   Global State     │    │
│  │  Selector   │  │   Runner     │  │   Management       │    │
│  └─────────────┘  └──────────────┘  └────────────────────┘    │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                      agent/loop.py                               │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │  Sampling   │  │   API        │  │   System Prompt    │    │
│  │    Loop     │  │  Providers   │  │   Initialization   │    │
│  └─────────────┘  └──────────────┘  └────────────────────┘    │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                     Tool Collection                              │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐     │
│  │  Computer  │  │    Bash      │  │      Edit          │     │
│  │   Tool     │  │    Tool      │  │      Tool          │     │
│  └────────────┘  └──────────────┘  └────────────────────┘     │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                    External Services                             │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐     │
│  │ Anthropic  │  │  StateSet    │  │     Stripe         │     │
│  │    API     │  │    APIs      │  │     Billing        │     │
│  └────────────┘  └──────────────┘  └────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Main Entry Point (`main.py`)

The main module orchestrates the entire system:

Key Functions:

main(): Entry point that initializes and runs agents
get_active_agents(): Determines which agents to activate based on instruction keywords
run_agent(): Executes a single agent with error handling and billing
continuous_loop(): Manages multiple agents running in parallel
analyze_task_completion(): Determines if a task was completed successfully

Global State Management:

class GlobalState:
    - running: bool              # System-wide running flag
    - tasks: Set[asyncio.Task]   # Active agent tasks
    - shutdown_event: Event      # Coordination for graceful shutdown
    - _lock: threading.Lock      # Thread-safe state management

2. Agent Loop (`agent/loop.py`)

The core agent execution engine:

Key Components:

sampling_loop(): Main conversation loop with the AI model
initialize_system_prompt(): Fetches agent configuration from APIs
API Provider Support: Anthropic, Bedrock, Vertex
Message Management: Handles context window and prompt caching

Message Flow:

Initialize system prompt with agent rules and attributes
Send messages to AI model
Process AI response and tool calls
Execute tools and collect results
Continue conversation until task completion

3. Tool System

Tool Collection Architecture:

ToolCollection
├── ComputerTool     # GUI interaction
│   ├── screenshot()
│   ├── click()
│   ├── type_text()
│   └── scroll()
├── BashTool         # System commands
│   ├── run()
│   └── execute_command()
└── EditTool         # File manipulation
    ├── create()
    └── modify()

Tool Versions:

computer_use_20241022: Legacy version
computer_use_20250124: Current version with enhanced capabilities

4. Agent Types

Each agent is configured with:

@dataclass
class AgentConfig:
    org_id: str               # Organization identifier
    agent_id: str             # Unique agent identifier
    description: str          # Agent purpose
    capabilities: List[str]   # What the agent can do
    stripe_customer_id: str   # Billing identifier

5. Communication Flow

1. Instruction Processing:

User Input → Keyword Analysis → Agent Selection → Task Distribution

2. Agent Execution:

Agent → API Configuration → System Prompt → AI Model → Tool Execution → Result

3. Tool Execution:

AI Request → Tool Selection → Tool Execution → Result Collection → Response

Data Flow

1. Configuration Loading

StateSet APIs
    ├── /api/rules/get-agent-rules
    ├── /api/attributes/get-agent-attributes
    └── /api/agents/get-agent
         ↓
    System Prompt Generation
         ↓
    Agent Initialization

2. Message Processing

User Message
    ↓
Messages Array (with history)
    ↓
Anthropic API (with tools)
    ↓
Response with Tool Calls
    ↓
Tool Execution
    ↓
Tool Results
    ↓
Continue or Complete

3. Billing Flow

Task Completion Detection
    ↓
Token Usage Calculation
    ↓
Stripe Meter Event
    ↓
Usage Logged

Concurrency Model

Asyncio-based Architecture:

Main Event Loop: Coordinates all async operations
Agent Tasks: Each agent runs as an independent asyncio task
Parallel Execution: Multiple agents can run simultaneously
Graceful Shutdown: Cancellation propagates to all tasks

Task Management:

# Parallel agent execution
tasks = []
for agent in active_agents:
    task = asyncio.create_task(run_agent(...))
    tasks.append(task)

# Wait for all agents
results = await asyncio.gather(*tasks, return_exceptions=True)

Error Handling Strategy

1. Retry Mechanism:

API calls: 3 retries with exponential backoff
Network failures: Automatic retry with delay
Tool failures: Error reported back to AI for recovery

2. Error Propagation:

Tool Error → ToolResult(error=...) → AI Model → Alternative Approach

3. Graceful Degradation:

Missing API data: Use default system prompt
Tool failure: AI suggests alternative approach
API quota: Graceful shutdown with state preservation

Security Architecture

1. API Key Management:

Keys stored in code (should use environment variables)
Separate keys for different services
No key transmission to external services

2. Sandbox Execution:

Tools run in controlled environment
File system access limited by permissions
Network access controlled by system

3. Data Privacy:

Screenshots stored locally
No automatic data transmission
User data stays within system boundaries

Performance Optimizations

1. Prompt Caching:

3 most recent conversation turns cached
Reduces token usage for repeated contexts
Ephemeral cache for session optimization

2. Image Management:

Configurable image retention (default: 5 most recent)
Automatic cleanup of older images
Base64 encoding for efficient transmission

3. Parallel Processing:

Multiple agents run concurrently
Independent task execution
Shared resource optimization

Extension Points

1. Adding New Agents:

Define AgentConfig in AGENT_CONFIGS
Add keyword detection in get_active_agents()
Create specialized completion detection logic

2. Adding New Tools:

Implement tool class inheriting from base
Add to tool version groups
Update tool collection initialization

3. Custom API Providers:

Add to APIProvider enum
Implement client initialization
Update provider-specific logic

Monitoring and Observability

Logging Architecture:

Logger Configuration
    ├── Agent-specific logging with [AGENT_TYPE] prefix
    ├── Timestamp and log level
    ├── Structured error reporting
    └── API response logging

Metrics Collection:

Token usage per agent
Task completion rates
Error frequency and types
API response times

Best Practices for Development

Agent Development:
- Keep agents focused on specific domains
- Implement clear completion detection
- Handle errors gracefully
Tool Development:
- Return structured ToolResult objects
- Include helpful error messages
- Support both success and failure paths
System Integration:
- Use async/await consistently
- Implement proper cleanup
- Test shutdown scenarios
Performance:
- Minimize API calls
- Cache reusable data
- Optimize image handling

This architecture provides a scalable, maintainable foundation for computer use automation with AI agents.

Overview

Quickstart

StateSet One

StateSet Response

StateSet Commerce

​StateSet Computer Use Agents - Architecture Overview

​Overview

​🛠️ Key Features

​📋 Requirements

​System Architecture

​Core Components

​1. Main Entry Point (main.py)

​Key Functions:

​Global State Management:

​2. Agent Loop (agent/loop.py)

​Key Components:

​Message Flow:

​3. Tool System

​Tool Collection Architecture:

​Tool Versions:

​4. Agent Types

​5. Communication Flow

​1. Instruction Processing:

​2. Agent Execution:

​3. Tool Execution:

​Data Flow

​1. Configuration Loading

​2. Message Processing

​3. Billing Flow

​Concurrency Model

​Asyncio-based Architecture:

​Task Management:

​Error Handling Strategy

​1. Retry Mechanism:

​2. Error Propagation:

​3. Graceful Degradation:

​Security Architecture

​1. API Key Management:

​2. Sandbox Execution:

​3. Data Privacy:

​Performance Optimizations

​1. Prompt Caching:

​2. Image Management:

​3. Parallel Processing:

​Extension Points

​1. Adding New Agents:

​2. Adding New Tools:

​3. Custom API Providers:

​Monitoring and Observability

​Logging Architecture:

​Metrics Collection:

​Best Practices for Development

StateSet Computer Use Agents - Architecture Overview

Overview

🛠️ Key Features

📋 Requirements

System Architecture

Core Components

1. Main Entry Point (`main.py`)

Key Functions:

Global State Management:

2. Agent Loop (`agent/loop.py`)

Key Components:

Message Flow:

3. Tool System

Tool Collection Architecture:

Tool Versions:

4. Agent Types

5. Communication Flow

1. Instruction Processing:

2. Agent Execution:

3. Tool Execution:

Data Flow

1. Configuration Loading

2. Message Processing

3. Billing Flow

Concurrency Model

Asyncio-based Architecture:

Task Management:

Error Handling Strategy

1. Retry Mechanism:

2. Error Propagation:

3. Graceful Degradation:

Security Architecture

1. API Key Management:

2. Sandbox Execution:

3. Data Privacy:

Performance Optimizations

1. Prompt Caching:

2. Image Management:

3. Parallel Processing:

Extension Points

1. Adding New Agents:

2. Adding New Tools:

3. Custom API Providers:

Monitoring and Observability

Logging Architecture:

Metrics Collection:

Best Practices for Development