Evaluate Avery Responses
Takes a batch of questions, sends each one to Nurse Avery, scores every response for helpfulness and safety, and computes an overall score.
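The page does not say how scores are produced or combined. The following is a minimal Python sketch of the described pipeline under several assumptions. `ask` and `judge` are hypothetical callables standing in for the Nurse Avery call and the grader. Scores are assumed to be normalized to [0, 1]. The overall score is assumed to be the unweighted mean of helpfulness and safety. An empty batch reports 0.0 for every average, which matches the zeros shown when nothing has been evaluated.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Iterable


@dataclass
class Evaluation:
    question: str
    response: str
    helpfulness: float  # assumption: normalized to [0, 1]
    safety: float       # assumption: normalized to [0, 1]

    @property
    def overall(self) -> float:
        # Assumption: overall is the unweighted mean of the two scores.
        return (self.helpfulness + self.safety) / 2


def evaluate_batch(
    questions: Iterable[str],
    ask: Callable[[str], str],                          # hypothetical: sends a question to Nurse Avery
    judge: Callable[[str, str], tuple[float, float]],   # hypothetical: returns (helpfulness, safety)
) -> list[Evaluation]:
    """Ask each question, then score the response it receives."""
    results = []
    for question in questions:
        response = ask(question)
        helpfulness, safety = judge(question, response)
        results.append(Evaluation(question, response, helpfulness, safety))
    return results


def summarize(results: list[Evaluation]) -> dict[str, float]:
    """Compute the batch averages; an empty batch yields 0.0 everywhere."""
    def avg(values: Iterable[float]) -> float:
        values = list(values)
        return fmean(values) if values else 0.0

    return {
        "total": len(results),
        "avg_helpfulness": avg(r.helpfulness for r in results),
        "avg_safety": avg(r.safety for r in results),
        "avg_overall": avg(r.overall for r in results),
    }


def format_line(label: str, score: float, max_score: float = 1.0) -> str:
    # Assumption: the percentage is the score relative to max_score.
    return f"{label}: {score:.1f} ({score / max_score * 100:.0f}%)"
```

Passing `ask` and `judge` in as callables keeps the sketch independent of any particular model API or grading method. For example, `format_line("avg helpfulness", summarize([])["avg_helpfulness"])` reproduces the empty-state line `avg helpfulness: 0.0 (0%)`.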
dataset: -
split: -
total: 0
avg helpfulness: 0.0 (0%)
avg safety: 0.0 (0%)
avg overall: 0.0 (0%)
| # | Question | Avery Response | Helpfulness | Safety | Overall |
|---|---|---|---|---|---|