Add Custom Evaluation Criteria to a Test Case
To test an agent response for specific strings or numbers, create custom evaluations. Custom evaluations let you write test cases against any data in an agent response, extending your testing beyond the standard and out-of-the-box expectations.
Custom evaluations enable more specific testing, such as ensuring that latency is less than 10 seconds, or that agent action inputs and outputs meet certain requirements.
Custom evaluations follow a different format from standard expectations. Instead of defining an expectation with a name and an expected value, custom evaluations require individual parameters to define the evaluation requirements.
Currently, there are two different types of custom evaluations that you can add to your test metadata.
- String comparison: Tests a response for a specific string value. The API name for this evaluation is `string_comparison`.
- Numeric comparison: Tests a response for a specific numeric value. The API name for this evaluation is `numeric_comparison`.
This example custom evaluation checks if the reply email created by the DraftGenericReplyEmail action mentions the right person.
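Here's a sketch of what that evaluation could look like in the test metadata. The `parameter` children mirror the `AiEvaluationExpectationParameter` fields described below (`name`, `value`, `isReference`); the JSONPath, input field, and expected name are hypothetical stand-ins, not values from a real agent.

```xml
<expectation>
    <name>string_comparison</name>
    <parameter>
        <name>operator</name>
        <value>equals</value>
        <isReference>false</isReference>
    </parameter>
    <parameter>
        <!-- Hypothetical JSONPath; isReference=true makes it resolve against runtime data -->
        <name>actual</name>
        <value>$.generatedData.actions[?(@.name == 'DraftGenericReplyEmail')].input.recipientName</value>
        <isReference>true</isReference>
    </parameter>
    <parameter>
        <!-- Hypothetical person that the reply email must mention -->
        <name>expected</name>
        <value>Dana Yu</value>
        <isReference>false</isReference>
    </parameter>
</expectation>
```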
This example of a custom evaluation has three parameters: an actual value, an expected value, and an operator to compare them. Each parameter is defined with a `name` field, a `value` field, and an `isReference` field. The parameters are set individually using the `AiEvaluationExpectationParameter` format. For details, see the Metadata reference.
This example evaluation uses the `equals` operator, which means the test case checks whether the `actual` value from the agent matches the `expected` value. All string comparison operators are case-sensitive. String comparison and numeric comparison evaluations each have their own set of valid operators.
The valid string comparison operators are:

- `equals`: Checks if the `actual` value directly matches the `expected` value.
- `contains`: Checks if the `actual` value contains the `expected` value.
- `startswith`: Checks if the `actual` value begins with the `expected` value.
- `endswith`: Checks if the `actual` value ends with the `expected` value.
The valid numeric comparison operators are:

- `equals`: Checks for numerical equality.
- `greater_than_or_equal`: Checks if the `actual` value is greater than or equal to (`>=`) the `expected` value.
- `greater_than`: Checks if the `actual` value is greater than (`>`) the `expected` value.
- `less_than`: Checks if the `actual` value is less than (`<`) the `expected` value.
- `less_than_or_equal`: Checks if the `actual` value is less than or equal to (`<=`) the `expected` value.
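For example, the latency requirement mentioned earlier could be expressed as a `numeric_comparison` with the `less_than` operator. This is a sketch: the JSONPath assumes the Get Test Results response exposes a latency value in milliseconds under `generatedData`, so verify the real path before using it.

```xml
<expectation>
    <name>numeric_comparison</name>
    <parameter>
        <name>operator</name>
        <value>less_than</value>
        <isReference>false</isReference>
    </parameter>
    <parameter>
        <!-- Hypothetical field name; confirm the real path in the Get Test Results response -->
        <name>actual</name>
        <value>$.generatedData.latencyMilliseconds</value>
        <isReference>true</isReference>
    </parameter>
    <parameter>
        <!-- 10 seconds, expressed in milliseconds -->
        <name>expected</name>
        <value>10000</value>
        <isReference>false</isReference>
    </parameter>
</expectation>
```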
Each parameter field is limited to 100 characters.
The actual value is retrieved with a JSONPath expression that points to the data you want to test in the response from the Get Test Results resource in the Connect API. For details on how to construct a JSONPath expression, see Dynamically Reference Generated Data.

In most custom evaluations, the `value` for the `actual` result is a JSONPath expression that points to generated data: runtime data from the `generatedData` object returned by the Get Test Results resource. For this expression to dynamically reference data, `isReference` must be set to `true`.
Most JSONPath expressions for custom evaluations follow the same pattern: filter the `generatedData` object by action name, then drill down to the field you want to test. The exact property names depend on the shape of the Get Test Results response, so treat the paths in the sketches below as illustrative.
For information on JSONPath operators, see the official documentation.
Get the `query` input for the `namespace_actionName` action.
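A sketch, assuming invoked actions appear under `generatedData.actions` with `name` and `input` properties:

```
$.generatedData.actions[?(@.name == 'namespace_actionName')].input.query
```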
Get the `result` output for the `namespace_actionName` action.
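A sketch, assuming each action entry exposes an `output` object:

```
$.generatedData.actions[?(@.name == 'namespace_actionName')].output.result
```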
Get the `value` from the first `additionalContext` item for the `namespace_actionName` action.
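A sketch, assuming `additionalContext` is an array on the action entry:

```
$.generatedData.actions[?(@.name == 'namespace_actionName')].additionalContext[0].value
```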
This is a complete test case metadata file that demonstrates standard expectations, out-of-the-box metrics, custom evaluation criteria, and context variables.
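The file below is a hedged sketch rather than an exact definition: the custom evaluation follows the format described in this topic, while the wrapper elements (`AiEvaluationDefinition`, `subjectType`, `testCase`), the standard expectation names, the metric name, and the context variable shape are assumptions to verify against the Metadata reference. All topic, action, and variable values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
    <description>Tests the agent's email reply drafting behavior</description>
    <name>Email_Reply_Tests</name>
    <subjectType>AGENT</subjectType>
    <subjectName>My_Service_Agent</subjectName>
    <testCase>
        <number>1</number>
        <inputs>
            <utterance>Draft a reply to the latest email from Dana Yu.</utterance>
            <!-- Hypothetical context variable shape; check the Metadata reference -->
            <contextVariable>
                <variableName>EndUserLanguage</variableName>
                <variableValue>en_US</variableValue>
            </contextVariable>
        </inputs>
        <!-- Standard expectations: assumed names -->
        <expectation>
            <name>topic_sequence_match</name>
            <expectedValue>EmailManagement</expectedValue>
        </expectation>
        <expectation>
            <name>action_sequence_match</name>
            <expectedValue>["DraftGenericReplyEmail"]</expectedValue>
        </expectation>
        <!-- Out-of-the-box metric: assumed name -->
        <expectation>
            <name>coherence</name>
        </expectation>
        <!-- Custom evaluation from the earlier example -->
        <expectation>
            <name>string_comparison</name>
            <parameter>
                <name>operator</name>
                <value>equals</value>
                <isReference>false</isReference>
            </parameter>
            <parameter>
                <name>actual</name>
                <value>$.generatedData.actions[?(@.name == 'DraftGenericReplyEmail')].input.recipientName</value>
                <isReference>true</isReference>
            </parameter>
            <parameter>
                <name>expected</name>
                <value>Dana Yu</value>
                <isReference>false</isReference>
            </parameter>
        </expectation>
    </testCase>
</AiEvaluationDefinition>
```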