Add Custom Evaluation Criteria to a Test Case

To test an agent response for specific strings or numbers, you can create custom evaluations. Custom evaluations let you write test cases against any data in an agent response, extending your tests beyond the standard and out-of-the-box expectations.

Custom evaluations enable more specific testing, such as ensuring that latency is less than 10 seconds, or that agent action inputs and outputs meet certain requirements.

Custom evaluations follow a different format from standard expectations. Instead of defining an expectation with a name and an expected value, custom evaluations require individual parameters to define the evaluation requirements.

Currently, you can add two types of custom evaluations to your test metadata.

  • String comparison: Tests a response for a specific string value. The API name for this evaluation is string_comparison.
  • Numeric comparison: Tests a response for a specific numeric value. The API name for this evaluation is numeric_comparison.

The following example custom evaluation checks whether the reply email created by the DraftGenericReplyEmail action mentions the right person.
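A minimal sketch of this evaluation, assuming the parameter names actual, expected, and operator, and hypothetical generatedData node names (invokedActions, output, recipientName), could look like this:

```xml
<expectation>
    <!-- Custom evaluation: tests a string value in the agent response -->
    <name>string_comparison</name>
    <!-- Actual value: a JSONPath expression into generatedData (node names are illustrative) -->
    <parameter>
        <name>actual</name>
        <value>$.generatedData.invokedActions[?(@.name == 'DraftGenericReplyEmail')].output.recipientName</value>
        <isReference>true</isReference>
    </parameter>
    <!-- Expected value: a literal string to compare against -->
    <parameter>
        <name>expected</name>
        <value>Jane Doe</value>
        <isReference>false</isReference>
    </parameter>
    <!-- Operator: how the actual value is compared to the expected value -->
    <parameter>
        <name>operator</name>
        <value>equals</value>
        <isReference>false</isReference>
    </parameter>
</expectation>
```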

This custom evaluation has three parameters: an actual value, an expected value, and an operator to compare them. Each parameter is defined with a name field, a value field, and an isReference field. The parameters are set individually using the AiEvaluationExpectationParameter format. For details, see the Metadata reference.

This example evaluation uses the equals operator, which means the test case checks whether the actual value from the agent exactly matches the expected value. All string comparison operators are case-sensitive. String comparison and numeric comparison evaluations each have their own set of valid operators.

The valid string comparison operators are:

  • equals: Checks if the actual value directly matches the expected value.
  • contains: Checks if the actual value contains the expected value.
  • startswith: Checks if the actual value begins with the expected value.
  • endswith: Checks if the actual value ends with the expected value.

The valid numeric comparison operators are:

  • equals: Checks for numerical equality.
  • greater_than_or_equal: Checks if the actual value is greater than or equal to (>=) the expected value.
  • greater_than: Checks if the actual value is greater than (>) the expected value.
  • less_than: Checks if the actual value is less than (<) the expected value.
  • less_than_or_equal: Checks if the actual value is less than or equal to (<=) the expected value.
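For example, a numeric comparison can enforce the latency requirement mentioned earlier in this topic. This sketch assumes a hypothetical latencyMilliseconds node in generatedData and the same parameter names as the previous example:

```xml
<expectation>
    <!-- Custom evaluation: tests a numeric value in the agent response -->
    <name>numeric_comparison</name>
    <!-- Actual value: hypothetical JSONPath node for response latency, in milliseconds -->
    <parameter>
        <name>actual</name>
        <value>$.generatedData.latencyMilliseconds</value>
        <isReference>true</isReference>
    </parameter>
    <!-- Expected value: 10 seconds, expressed in milliseconds -->
    <parameter>
        <name>expected</name>
        <value>10000</value>
        <isReference>false</isReference>
    </parameter>
    <!-- Passes when the actual latency is less than the expected maximum -->
    <parameter>
        <name>operator</name>
        <value>less_than</value>
        <isReference>false</isReference>
    </parameter>
</expectation>
```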

Each parameter field is limited to 100 characters.

The actual value is retrieved with a JSONPath expression. This expression points to the data that you want to test in the response from the Get Test Results resource in the Connect API. For details on how to construct a JSONPath expression, see Dynamically Reference Generated Data.

In most custom evaluations, the value for the actual result is a JSONPath expression that points to generated data. This is runtime data from the generatedData object returned by the Get Test Results resource. For this expression to dynamically reference data, isReference must be set to true.

Most JSONPath expressions for custom evaluations follow the same pattern: start at the root of the generatedData object, then drill down to the action input or output that you want to test.

For information on JSONPath operators, see the official documentation.
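The expressions in the next three examples are illustrative sketches: the node names under generatedData, such as invokedActions, input, and output, are assumptions. Confirm the exact paths in Dynamically Reference Generated Data or in the Get Test Results response for your test.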

Get the query input for the namespace_actionName action:
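```
$.generatedData.invokedActions[?(@.name == 'namespace_actionName')].input.query
```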

Get the result output for the namespace_actionName action:
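```
$.generatedData.invokedActions[?(@.name == 'namespace_actionName')].output.result
```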

Get the value from the first additionalContext item for the namespace_actionName action:
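```
$.generatedData.invokedActions[?(@.name == 'namespace_actionName')].additionalContext[0].value
```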

This is a complete test case metadata file that demonstrates standard expectations, out-of-the-box metrics, custom evaluation criteria, and context variables.
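This sketch assumes the standard expectation names topic_sequence_match, action_sequence_match, and bot_response_rating, an out-of-the-box metric named completeness, and hypothetical element names for the test inputs and context variables. Check the Metadata reference for the exact schema and values for your org.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
    <name>Reply_Email_Tests</name>
    <description>Checks that the agent drafts a reply email addressed to the right person.</description>
    <!-- The agent under test (values are illustrative) -->
    <subjectType>AGENT</subjectType>
    <subjectName>My_Service_Agent</subjectName>
    <testCase>
        <number>1</number>
        <inputs>
            <utterance>Draft a reply to Jane Doe about her refund request.</utterance>
            <!-- Context variables (hypothetical element names) -->
            <contextVariable>
                <variableName>CaseId</variableName>
                <variableValue>500xx0000000001AAA</variableValue>
            </contextVariable>
        </inputs>
        <!-- Standard expectations -->
        <expectation>
            <name>topic_sequence_match</name>
            <expectedValue>["EmailManagement"]</expectedValue>
        </expectation>
        <expectation>
            <name>action_sequence_match</name>
            <expectedValue>["DraftGenericReplyEmail"]</expectedValue>
        </expectation>
        <expectation>
            <name>bot_response_rating</name>
            <expectedValue>The agent confirms that a reply email was drafted for Jane Doe.</expectedValue>
        </expectation>
        <!-- Out-of-the-box metric (name is illustrative) -->
        <expectation>
            <name>completeness</name>
        </expectation>
        <!-- Custom evaluation criteria -->
        <expectation>
            <name>string_comparison</name>
            <parameter>
                <name>actual</name>
                <value>$.generatedData.invokedActions[?(@.name == 'DraftGenericReplyEmail')].output.recipientName</value>
                <isReference>true</isReference>
            </parameter>
            <parameter>
                <name>expected</name>
                <value>Jane Doe</value>
                <isReference>false</isReference>
            </parameter>
            <parameter>
                <name>operator</name>
                <value>equals</value>
                <isReference>false</isReference>
            </parameter>
        </expectation>
    </testCase>
</AiEvaluationDefinition>
```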