Create and manage evaluations to improve AI response quality and train custom models
Search and Filter
Bulk Operations
View Modes
Human Evaluations
LLM-as-Judge
Code-based Evaluations
Clear Naming
Realistic Scenarios
Comprehensive Coverage
Consistent Standards
Regular Review
Evaluations not appearing
Export failing
Performance issues