> ## Documentation Index
> Fetch the complete documentation index at: https://docs.stateset.com/llms.txt
> Use this file to discover all available pages before exploring further.

# StateSet Computer Use Agents - Architecture Overview

> Architecture overview for using StateSet Computer Use Agents

# StateSet Computer Use Agents - Architecture Overview

## Overview

AI-powered automation platform that leverages Claude Opus 4 to perform various tasks through computer interaction. The system uses multiple specialized agents that can control desktop environments, interact with web applications, and automate complex workflows.

## 🛠️ Key Features

* **Multi-Agent Architecture** - Specialized agents for different tasks
* **Computer Vision & Control** - Agents can see and interact with desktop environments
* **API Integration** - Seamless integration with StateSet APIs
* **Metered Billing** - Usage-based billing through Stripe
* **Parallel Processing** - Multiple agents can run concurrently
* **Graceful Shutdown** - Safe termination with Ctrl+C

## 📋 Requirements

* Ubuntu Linux (kernel 5.15.0+)
* Python 3.8+
* Virtual display (DISPLAY=:1)
* Anthropic API key

## System Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        User Interface                            │
│                   (Command Line / Shell Scripts)                 │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                         main.py                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │   Agent     │  │   Agent      │  │   Global State     │    │
│  │  Selector   │  │   Runner     │  │   Management       │    │
│  └─────────────┘  └──────────────┘  └────────────────────┘    │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                      agent/loop.py                               │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │  Sampling   │  │   API        │  │   System Prompt    │    │
│  │    Loop     │  │  Providers   │  │   Initialization   │    │
│  └─────────────┘  └──────────────┘  └────────────────────┘    │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                     Tool Collection                              │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐     │
│  │  Computer  │  │    Bash      │  │      Edit          │     │
│  │   Tool     │  │    Tool      │  │      Tool          │     │
│  └────────────┘  └──────────────┘  └────────────────────┘     │
└─────────────────────┬───────────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────────┐
│                    External Services                             │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐     │
│  │ Anthropic  │  │  StateSet    │  │     Stripe         │     │
│  │    API     │  │    APIs      │  │     Billing        │     │
│  └────────────┘  └──────────────┘  └────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. Main Entry Point (`main.py`)

The main module orchestrates the entire system:

#### Key Functions:

* **`main()`**: Entry point that initializes and runs agents
* **`get_active_agents()`**: Determines which agents to activate based on instruction keywords
* **`run_agent()`**: Executes a single agent with error handling and billing
* **`continuous_loop()`**: Manages multiple agents running in parallel
* **`analyze_task_completion()`**: Determines if a task was completed successfully

#### Global State Management:

```python theme={null}
class GlobalState:
    - running: bool              # System-wide running flag
    - tasks: Set[asyncio.Task]   # Active agent tasks
    - shutdown_event: Event      # Coordination for graceful shutdown
    - _lock: threading.Lock      # Thread-safe state management
```

### 2. Agent Loop (`agent/loop.py`)

The core agent execution engine:

#### Key Components:

* **`sampling_loop()`**: Main conversation loop with the AI model
* **`initialize_system_prompt()`**: Fetches agent configuration from APIs
* **API Provider Support**: Anthropic, Bedrock, Vertex
* **Message Management**: Handles context window and prompt caching

#### Message Flow:

1. Initialize system prompt with agent rules and attributes
2. Send messages to AI model
3. Process AI response and tool calls
4. Execute tools and collect results
5. Continue conversation until task completion

### 3. Tool System

#### Tool Collection Architecture:

```python theme={null}
ToolCollection
├── ComputerTool     # GUI interaction
│   ├── screenshot()
│   ├── click()
│   ├── type_text()
│   └── scroll()
├── BashTool         # System commands
│   ├── run()
│   └── execute_command()
└── EditTool         # File manipulation
    ├── create()
    └── modify()
```

#### Tool Versions:

* **computer\_use\_20241022**: Legacy version
* **computer\_use\_20250124**: Current version with enhanced capabilities

### 4. Agent Types

Each agent is configured with:

```python theme={null}
@dataclass
class AgentConfig:
    org_id: str               # Organization identifier
    agent_id: str             # Unique agent identifier
    description: str          # Agent purpose
    capabilities: List[str]   # What the agent can do
    stripe_customer_id: str   # Billing identifier
```

### 5. Communication Flow

#### 1. Instruction Processing:

```
User Input → Keyword Analysis → Agent Selection → Task Distribution
```

#### 2. Agent Execution:

```
Agent → API Configuration → System Prompt → AI Model → Tool Execution → Result
```

#### 3. Tool Execution:

```
AI Request → Tool Selection → Tool Execution → Result Collection → Response
```

## Data Flow

### 1. Configuration Loading

```
StateSet APIs
    ├── /api/rules/get-agent-rules
    ├── /api/attributes/get-agent-attributes
    └── /api/agents/get-agent
         ↓
    System Prompt Generation
         ↓
    Agent Initialization
```

### 2. Message Processing

```
User Message
    ↓
Messages Array (with history)
    ↓
Anthropic API (with tools)
    ↓
Response with Tool Calls
    ↓
Tool Execution
    ↓
Tool Results
    ↓
Continue or Complete
```

### 3. Billing Flow

```
Task Completion Detection
    ↓
Token Usage Calculation
    ↓
Stripe Meter Event
    ↓
Usage Logged
```

## Concurrency Model

### Asyncio-based Architecture:

* **Main Event Loop**: Coordinates all async operations
* **Agent Tasks**: Each agent runs as an independent asyncio task
* **Parallel Execution**: Multiple agents can run simultaneously
* **Graceful Shutdown**: Cancellation propagates to all tasks

### Task Management:

```python theme={null}
# Parallel agent execution
tasks = []
for agent in active_agents:
    task = asyncio.create_task(run_agent(...))
    tasks.append(task)

# Wait for all agents
results = await asyncio.gather(*tasks, return_exceptions=True)
```

## Error Handling Strategy

### 1. Retry Mechanism:

* API calls: 3 retries with exponential backoff
* Network failures: Automatic retry with delay
* Tool failures: Error reported back to AI for recovery

### 2. Error Propagation:

```
Tool Error → ToolResult(error=...) → AI Model → Alternative Approach
```

### 3. Graceful Degradation:

* Missing API data: Use default system prompt
* Tool failure: AI suggests alternative approach
* API quota: Graceful shutdown with state preservation

## Security Architecture

### 1. API key Management:

* Keys stored in code (should use environment variables)
* Separate keys for different services
* No key transmission to external services

### 2. Sandbox Execution:

* Tools run in controlled environment
* File system access limited by permissions
* Network access controlled by system

### 3. Data Privacy:

* Screenshots stored locally
* No automatic data transmission
* User data stays within system boundaries

## Performance Optimizations

### 1. Prompt Caching:

* 3 most recent conversation turns cached
* Reduces token usage for repeated contexts
* Ephemeral cache for session optimization

### 2. Image Management:

* Configurable image retention (default: 5 most recent)
* Automatic cleanup of older images
* Base64 encoding for efficient transmission

### 3. Parallel Processing:

* Multiple agents run concurrently
* Independent task execution
* Shared resource optimization

## Extension Points

### 1. Adding New Agents:

1. Define AgentConfig in AGENT\_CONFIGS
2. Add keyword detection in get\_active\_agents()
3. Create specialized completion detection logic

### 2. Adding New Tools:

1. Implement tool class inheriting from base
2. Add to tool version groups
3. Update tool collection initialization

### 3. Custom API Providers:

1. Add to APIProvider enum
2. Implement client initialization
3. Update provider-specific logic

## Monitoring and Observability

### Logging Architecture:

```
Logger Configuration
    ├── Agent-specific logging with [AGENT_TYPE] prefix
    ├── Timestamp and log level
    ├── Structured error reporting
    └── API response logging
```

### Metrics Collection:

* Token usage per agent
* Task completion rates
* Error frequency and types
* API response times

## Best Practices for Development

1. **Agent Development**:
   * Keep agents focused on specific domains
   * Implement clear completion detection
   * Handle errors gracefully

2. **Tool Development**:
   * Return structured ToolResult objects
   * Include helpful error messages
   * Support both success and failure paths

3. **System Integration**:
   * Use async/await consistently
   * Implement proper cleanup
   * Test shutdown scenarios

4. **Performance**:
   * Minimize API calls
   * Cache reusable data
   * Optimize image handling

This architecture provides a scalable, maintainable foundation for computer use automation with AI agents.
