Agent Development
StateSet Computer Use Agents - Architecture Overview
Architecture overview for using StateSet Computer Use Agents
StateSet Computer Use Agents - Architecture Overview
Overview
AI-powered automation platform that leverages Claude Opus 4 to perform various tasks through computer interaction. The system uses multiple specialized agents that can control desktop environments, interact with web applications, and automate complex workflows.
🛠️ Key Features
- Multi-Agent Architecture - Specialized agents for different tasks
- Computer Vision & Control - Agents can see and interact with desktop environments
- API Integration - Seamless integration with StateSet APIs
- Metered Billing - Usage-based billing through Stripe
- Parallel Processing - Multiple agents can run concurrently
- Graceful Shutdown - Safe termination with Ctrl+C
📋 Requirements
- Ubuntu Linux (kernel 5.15.0+)
- Python 3.8+
- Virtual display (DISPLAY=:1)
- Anthropic API key
System Architecture
Core Components
1. Main Entry Point (main.py
)
The main module orchestrates the entire system:
Key Functions:
main()
: Entry point that initializes and runs agentsget_active_agents()
: Determines which agents to activate based on instruction keywordsrun_agent()
: Executes a single agent with error handling and billingcontinuous_loop()
: Manages multiple agents running in parallelanalyze_task_completion()
: Determines if a task was completed successfully
Global State Management:
2. Agent Loop (agent/loop.py
)
The core agent execution engine:
Key Components:
sampling_loop()
: Main conversation loop with the AI modelinitialize_system_prompt()
: Fetches agent configuration from APIs- API Provider Support: Anthropic, Bedrock, Vertex
- Message Management: Handles context window and prompt caching
Message Flow:
- Initialize system prompt with agent rules and attributes
- Send messages to AI model
- Process AI response and tool calls
- Execute tools and collect results
- Continue conversation until task completion
3. Tool System
Tool Collection Architecture:
Tool Versions:
- computer_use_20241022: Legacy version
- computer_use_20250124: Current version with enhanced capabilities
4. Agent Types
Each agent is configured with:
5. Communication Flow
1. Instruction Processing:
2. Agent Execution:
3. Tool Execution:
Data Flow
1. Configuration Loading
2. Message Processing
3. Billing Flow
Concurrency Model
Asyncio-based Architecture:
- Main Event Loop: Coordinates all async operations
- Agent Tasks: Each agent runs as an independent asyncio task
- Parallel Execution: Multiple agents can run simultaneously
- Graceful Shutdown: Cancellation propagates to all tasks
Task Management:
Error Handling Strategy
1. Retry Mechanism:
- API calls: 3 retries with exponential backoff
- Network failures: Automatic retry with delay
- Tool failures: Error reported back to AI for recovery
2. Error Propagation:
3. Graceful Degradation:
- Missing API data: Use default system prompt
- Tool failure: AI suggests alternative approach
- API quota: Graceful shutdown with state preservation
Security Architecture
1. API Key Management:
- Keys stored in code (should use environment variables)
- Separate keys for different services
- No key transmission to external services
2. Sandbox Execution:
- Tools run in controlled environment
- File system access limited by permissions
- Network access controlled by system
3. Data Privacy:
- Screenshots stored locally
- No automatic data transmission
- User data stays within system boundaries
Performance Optimizations
1. Prompt Caching:
- 3 most recent conversation turns cached
- Reduces token usage for repeated contexts
- Ephemeral cache for session optimization
2. Image Management:
- Configurable image retention (default: 5 most recent)
- Automatic cleanup of older images
- Base64 encoding for efficient transmission
3. Parallel Processing:
- Multiple agents run concurrently
- Independent task execution
- Shared resource optimization
Extension Points
1. Adding New Agents:
- Define AgentConfig in AGENT_CONFIGS
- Add keyword detection in get_active_agents()
- Create specialized completion detection logic
2. Adding New Tools:
- Implement tool class inheriting from base
- Add to tool version groups
- Update tool collection initialization
3. Custom API Providers:
- Add to APIProvider enum
- Implement client initialization
- Update provider-specific logic
Monitoring and Observability
Logging Architecture:
Metrics Collection:
- Token usage per agent
- Task completion rates
- Error frequency and types
- API response times
Best Practices for Development
-
Agent Development:
- Keep agents focused on specific domains
- Implement clear completion detection
- Handle errors gracefully
-
Tool Development:
- Return structured ToolResult objects
- Include helpful error messages
- Support both success and failure paths
-
System Integration:
- Use async/await consistently
- Implement proper cleanup
- Test shutdown scenarios
-
Performance:
- Minimize API calls
- Cache reusable data
- Optimize image handling
This architecture provides a scalable, maintainable foundation for computer use automation with AI agents.