StateSet Computer Use Agents - Architecture Overview
Overview
AI-powered automation platform that leverages Claude Opus 4 to perform various tasks through computer interaction. The system uses multiple specialized agents that can control desktop environments, interact with web applications, and automate complex workflows.
🛠️ Key Features
- Multi-Agent Architecture - Specialized agents for different tasks
- Computer Vision & Control - Agents can see and interact with desktop environments
- API Integration - Seamless integration with StateSet APIs
- Metered Billing - Usage-based billing through Stripe
- Parallel Processing - Multiple agents can run concurrently
- Graceful Shutdown - Safe termination with Ctrl+C
📋 Requirements
- Ubuntu Linux (kernel 5.15.0+)
- Python 3.8+
- Virtual display (DISPLAY=:1)
- Anthropic API key
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ User Interface │
│ (Command Line / Shell Scripts) │
└─────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────────────┐
│ main.py │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Agent │ │ Agent │ │ Global State │ │
│ │ Selector │ │ Runner │ │ Management │ │
│ └─────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────────────┐
│ agent/loop.py │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Sampling │ │ API │ │ System Prompt │ │
│ │ Loop │ │ Providers │ │ Initialization │ │
│ └─────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────────────┐
│ Tool Collection │
│ ┌────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Computer │ │ Bash │ │ Edit │ │
│ │ Tool │ │ Tool │ │ Tool │ │
│ └────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────────────┐
│ External Services │
│ ┌────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Anthropic │ │ StateSet │ │ Stripe │ │
│ │ API │ │ APIs │ │ Billing │ │
│ └────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Core Components
1. Main Entry Point (main.py
)
The main module orchestrates the entire system:
Key Functions:
main()
: Entry point that initializes and runs agents
get_active_agents()
: Determines which agents to activate based on instruction keywords
run_agent()
: Executes a single agent with error handling and billing
continuous_loop()
: Manages multiple agents running in parallel
analyze_task_completion()
: Determines if a task was completed successfully
Global State Management:
class GlobalState:
- running: bool # System-wide running flag
- tasks: Set[asyncio.Task] # Active agent tasks
- shutdown_event: Event # Coordination for graceful shutdown
- _lock: threading.Lock # Thread-safe state management
2. Agent Loop (agent/loop.py
)
The core agent execution engine:
Key Components:
sampling_loop()
: Main conversation loop with the AI model
initialize_system_prompt()
: Fetches agent configuration from APIs
- API Provider Support: Anthropic, Bedrock, Vertex
- Message Management: Handles context window and prompt caching
Message Flow:
- Initialize system prompt with agent rules and attributes
- Send messages to AI model
- Process AI response and tool calls
- Execute tools and collect results
- Continue conversation until task completion
ToolCollection
├── ComputerTool # GUI interaction
│ ├── screenshot()
│ ├── click()
│ ├── type_text()
│ └── scroll()
├── BashTool # System commands
│ ├── run()
│ └── execute_command()
└── EditTool # File manipulation
├── create()
└── modify()
- computer_use_20241022: Legacy version
- computer_use_20250124: Current version with enhanced capabilities
4. Agent Types
Each agent is configured with:
@dataclass
class AgentConfig:
org_id: str # Organization identifier
agent_id: str # Unique agent identifier
description: str # Agent purpose
capabilities: List[str] # What the agent can do
stripe_customer_id: str # Billing identifier
5. Communication Flow
1. Instruction Processing:
User Input → Keyword Analysis → Agent Selection → Task Distribution
2. Agent Execution:
Agent → API Configuration → System Prompt → AI Model → Tool Execution → Result
AI Request → Tool Selection → Tool Execution → Result Collection → Response
Data Flow
1. Configuration Loading
StateSet APIs
├── /api/rules/get-agent-rules
├── /api/attributes/get-agent-attributes
└── /api/agents/get-agent
↓
System Prompt Generation
↓
Agent Initialization
2. Message Processing
User Message
↓
Messages Array (with history)
↓
Anthropic API (with tools)
↓
Response with Tool Calls
↓
Tool Execution
↓
Tool Results
↓
Continue or Complete
3. Billing Flow
Task Completion Detection
↓
Token Usage Calculation
↓
Stripe Meter Event
↓
Usage Logged
Concurrency Model
Asyncio-based Architecture:
- Main Event Loop: Coordinates all async operations
- Agent Tasks: Each agent runs as an independent asyncio task
- Parallel Execution: Multiple agents can run simultaneously
- Graceful Shutdown: Cancellation propagates to all tasks
Task Management:
# Parallel agent execution
tasks = []
for agent in active_agents:
task = asyncio.create_task(run_agent(...))
tasks.append(task)
# Wait for all agents
results = await asyncio.gather(*tasks, return_exceptions=True)
Error Handling Strategy
1. Retry Mechanism:
- API calls: 3 retries with exponential backoff
- Network failures: Automatic retry with delay
- Tool failures: Error reported back to AI for recovery
2. Error Propagation:
Tool Error → ToolResult(error=...) → AI Model → Alternative Approach
3. Graceful Degradation:
- Missing API data: Use default system prompt
- Tool failure: AI suggests alternative approach
- API quota: Graceful shutdown with state preservation
Security Architecture
1. API Key Management:
- Keys stored in code (should use environment variables)
- Separate keys for different services
- No key transmission to external services
2. Sandbox Execution:
- Tools run in controlled environment
- File system access limited by permissions
- Network access controlled by system
3. Data Privacy:
- Screenshots stored locally
- No automatic data transmission
- User data stays within system boundaries
1. Prompt Caching:
- 3 most recent conversation turns cached
- Reduces token usage for repeated contexts
- Ephemeral cache for session optimization
2. Image Management:
- Configurable image retention (default: 5 most recent)
- Automatic cleanup of older images
- Base64 encoding for efficient transmission
3. Parallel Processing:
- Multiple agents run concurrently
- Independent task execution
- Shared resource optimization
Extension Points
1. Adding New Agents:
- Define AgentConfig in AGENT_CONFIGS
- Add keyword detection in get_active_agents()
- Create specialized completion detection logic
- Implement tool class inheriting from base
- Add to tool version groups
- Update tool collection initialization
3. Custom API Providers:
- Add to APIProvider enum
- Implement client initialization
- Update provider-specific logic
Monitoring and Observability
Logging Architecture:
Logger Configuration
├── Agent-specific logging with [AGENT_TYPE] prefix
├── Timestamp and log level
├── Structured error reporting
└── API response logging
Metrics Collection:
- Token usage per agent
- Task completion rates
- Error frequency and types
- API response times
Best Practices for Development
-
Agent Development:
- Keep agents focused on specific domains
- Implement clear completion detection
- Handle errors gracefully
-
Tool Development:
- Return structured ToolResult objects
- Include helpful error messages
- Support both success and failure paths
-
System Integration:
- Use async/await consistently
- Implement proper cleanup
- Test shutdown scenarios
-
Performance:
- Minimize API calls
- Cache reusable data
- Optimize image handling
This architecture provides a scalable, maintainable foundation for computer use automation with AI agents.
Responses are generated using AI and may contain mistakes.