Evaluations Guide
Create and manage evaluations to improve AI response quality and train custom models
Overview
The Evaluations system enables you to assess, track, and improve the quality of AI-generated responses. By creating evaluations, you can build datasets for fine-tuning models, monitor performance trends, and ensure consistent high-quality customer interactions.
Key Benefits
- Quality Assurance: Monitor and improve response quality
- Model Training: Export evaluations as JSONL for fine-tuning
- Performance Tracking: Analyze trends and identify areas for improvement
- Team Insights: Understand response patterns across different support types
Evaluation Status Types
Each evaluation is assigned a status that reflects the quality of the response:
Outstanding
Exceptional responses that exceed expectations. These serve as gold-standard examples for training.
Satisfactory
Good responses that meet quality standards and properly address customer needs.
Needs Further Review
Responses requiring additional assessment or minor improvements before final classification.
Unsatisfactory
Responses with significant issues that don’t meet quality standards.
Code Red
Critical failures requiring immediate attention and remediation.
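If you process evaluations outside the dashboard (for example, when filtering an export script), it can help to mirror these statuses in code. The enum below is a hypothetical sketch with assumed value strings, not the product's actual data model:

```python
from enum import Enum

class EvaluationStatus(Enum):
    """Hypothetical mirror of the dashboard's five status types."""
    OUTSTANDING = "outstanding"                    # gold-standard examples
    SATISFACTORY = "satisfactory"                  # meets quality standards
    NEEDS_FURTHER_REVIEW = "needs_further_review"  # requires additional assessment
    UNSATISFACTORY = "unsatisfactory"              # significant issues
    CODE_RED = "code_red"                          # critical failure, act immediately

# Example: keep only training-grade responses when building a dataset
TRAINING_GRADE = {EvaluationStatus.OUTSTANDING, EvaluationStatus.SATISFACTORY}
```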
Creating Evaluations
Manual Evaluation Creation
To create a new evaluation manually:
- Navigate to the Evaluations Dashboard
- Click “Create New Evaluation”
- Fill in the evaluation details, choosing one of the evaluation types described below
Evaluation Types
Choose the appropriate type for your evaluation:
- Customer Service: General customer inquiries and support
- Technical Support: Technical issues and troubleshooting
- Sales: Sales-related interactions and inquiries
- Product Support: Product-specific questions and guidance
- General: Other types of interactions
Managing Evaluations
Dashboard Features
The Evaluations Dashboard provides comprehensive tools for managing your evaluations:
Performance Statistics
Monitor your evaluation metrics through dashboard cards:
- Total Evaluations: Overall count of all evaluations
- Outstanding: Count of exceptional responses
- Needs Review: Evaluations requiring further assessment
- This Week: Recent evaluation activity
Each metric includes trend indicators showing performance changes over time.
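If you also pull evaluation data out of the dashboard, the same card metrics can be recomputed from raw records. This is a minimal sketch, assuming each record carries a `status` string and a timezone-aware ISO 8601 `created_at` timestamp; both field names are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def dashboard_metrics(evaluations: list[dict]) -> dict:
    """Recompute the dashboard cards from raw evaluation records."""
    now = datetime.now(timezone.utc)
    week_ago = now - timedelta(days=7)
    two_weeks_ago = now - timedelta(days=14)

    def created(e: dict) -> datetime:
        # Assumes timezone-aware ISO 8601 strings, e.g. "2024-05-01T12:00:00+00:00"
        return datetime.fromisoformat(e["created_at"])

    this_week = [e for e in evaluations if created(e) >= week_ago]
    last_week = [e for e in evaluations if two_weeks_ago <= created(e) < week_ago]

    return {
        "total": len(evaluations),
        "outstanding": sum(e["status"] == "outstanding" for e in evaluations),
        "needs_review": sum(e["status"] == "needs_further_review" for e in evaluations),
        "this_week": len(this_week),
        # Simple week-over-week trend indicator
        "weekly_change": len(this_week) - len(last_week),
    }
```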
Exporting for Model Training
JSONL Export Format
Evaluations can be exported in JSONL (JSON Lines) format for model fine-tuning, with one JSON record per line.
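The exact field layout depends on the fine-tuning platform you target; as an illustration (with assumed field names, not the exporter's guaranteed schema), a single exported line for a preference pair might look like this:

```json
{"input": {"messages": [{"role": "user", "content": "My order arrived damaged. What can I do?"}]}, "preferred_output": [{"role": "assistant", "content": "I'm sorry your order arrived damaged. I can send a replacement right away or issue a full refund. Which would you prefer?"}], "non_preferred_output": [{"role": "assistant", "content": "Please contact the shipping carrier."}]}
```

Each record stands alone on its own line, which is what makes the file JSONL rather than a single JSON array.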
Export Methods
Create New mode (write a training pair from scratch):
- Navigate to the Export tab in the dashboard
- Select “Create New” mode
- Enter the evaluation details:
  - User Message
  - Preferred Output
  - Non-Preferred Output
  - Tools (optional, for function calling)
- Click “Export to JSONL”
From Existing Evals mode (export evaluations you have already recorded):
- Navigate to the Export tab
- Select “From Existing Evals” mode
- Choose evaluations to export:
  - Use checkboxes to select specific evaluations
  - Click “Select All” to export all evaluations
- Click “Export X Evaluations”
Advanced Export Options
For evaluations involving tool use or function calling, the export can include the tool definitions alongside the conversation.
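As a sketch (again with assumed field names, and shown pretty-printed here even though the export writes each record on a single line), a tool-use record might add a tools array describing the functions the model may call, with the preferred output demonstrating a correct call:

```json
{
  "input": {
    "messages": [{"role": "user", "content": "Where is my order 1042?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "lookup_order",
        "description": "Look up an order by its ID",
        "parameters": {
          "type": "object",
          "properties": {"order_id": {"type": "string"}},
          "required": ["order_id"]
        }
      }
    }]
  },
  "preferred_output": [{
    "role": "assistant",
    "tool_calls": [{
      "type": "function",
      "function": {"name": "lookup_order", "arguments": "{\"order_id\": \"1042\"}"}
    }]
  }],
  "non_preferred_output": [{"role": "assistant", "content": "Sorry, I am unable to check order status."}]
}
```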
Best Practices
Creating High-Quality Evaluations
Clear Naming
Use descriptive names that indicate the scenario being evaluated
Realistic Scenarios
Base evaluations on actual customer interactions and common use cases
Comprehensive Coverage
Include both ideal responses and common mistakes to avoid
Consistent Standards
Apply evaluation criteria consistently across similar interaction types
Regular Review
Periodically review and update evaluations to maintain relevance
Evaluation Criteria
When assessing responses, consider:
Accuracy
- Correct information provided
- Proper understanding of the issue
- Appropriate solution offered
Tone & Empathy
- Professional and friendly tone
- Empathy for customer situation
- Appropriate level of formality
Completeness
- All questions answered
- Clear next steps provided
- No missing information
Efficiency
- Concise yet comprehensive
- Direct problem resolution
- Minimal back-and-forth needed
Use Cases
Model Fine-Tuning
Export evaluations to create training datasets (a short preparation sketch follows the steps below):
- Collect Examples: Build a corpus of high-quality evaluations
- Export to JSONL: Use the bulk export feature
- Prepare Dataset: Format according to your model’s requirements
- Fine-Tune: Use the dataset to improve model performance
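A minimal sketch of the Prepare Dataset step, assuming the export produces one JSON object per line with input, preferred_output, and non_preferred_output fields (these names are assumptions; adjust them to your exporter and fine-tuning provider). It validates each record and splits off a small held-out set before fine-tuning:

```python
import json
import random

def prepare_dataset(export_path: str, train_path: str, val_path: str,
                    val_fraction: float = 0.1) -> None:
    """Validate an exported JSONL file and split it into train/validation files."""
    records = []
    with open(export_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)
            # Assumed field names; change these to match your exporter's schema.
            for key in ("input", "preferred_output", "non_preferred_output"):
                if key not in record:
                    raise ValueError(f"line {line_no}: missing field {key!r}")
            records.append(record)

    random.shuffle(records)
    split = int(len(records) * (1 - val_fraction))
    for path, chunk in ((train_path, records[:split]), (val_path, records[split:])):
        with open(path, "w", encoding="utf-8") as f:
            for record in chunk:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Example usage with hypothetical file names
prepare_dataset("evaluations_export.jsonl", "train.jsonl", "val.jsonl")
```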
Quality Monitoring
Track response quality over time:
- Monitor status distribution (Outstanding vs. Unsatisfactory)
- Identify patterns in problematic responses
- Track improvement after training or process changes
Team Training
Use evaluations for human agent training:
- Share examples of outstanding responses
- Highlight common mistakes to avoid
- Create training materials from real scenarios