Build Tests in Metadata API

To define tests, use the AiEvaluationDefinition Metadata API type. To learn how to use Metadata API, see Quick Start: Metadata API.

To use Salesforce CLI to create agent tests instead of directly using Metadata API, see Test an Agent with Agentforce DX.

The AiEvaluationDefinition metadata type contains a set of test cases. Each test case takes inputs (including an utterance) and contains a set of expectations (such as an expected action sequence) for the response.

Test cases can include various types of inputs. The primary input is an utterance, but you can also include context variables and conversation history to evaluate agent responses in complex scenarios.

In addition to an utterance, a test case input can contain context variables. These variables let you write more nuanced tests of how an agent behaves in different contexts, and help you gauge the agent's overall robustness in scenarios that more closely simulate a production environment.

For more information on context variables, see the Standard Variable Reference.

By default, context variables are immutable and are set only at the beginning of an agent session. The only context variable that is editable after a session begins is EndUserLanguage.
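As a rough illustration, a test case input that sets a context variable could look like the following fragment. The element names contextVariable, variableName, and variableValue, and the value en_US, are assumptions used here for illustration; confirm the exact field names in the AiEvaluationDefinition reference before deploying.

```xml
<!-- Sketch only: contextVariable, variableName, and variableValue are
     assumed element names; verify them against the metadata reference. -->
<inputs>
    <utterance>Summarize the Global Media account</utterance>
    <contextVariable>
        <!-- EndUserLanguage is the one context variable that can change mid-session -->
        <variableName>EndUserLanguage</variableName>
        <!-- Hypothetical value chosen for illustration -->
        <variableValue>en_US</variableValue>
    </contextVariable>
</inputs>
```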

In addition to context variables and an utterance, you can add conversation history to the test definition as input. Passing the conversation history to the testing service supplies the context needed for multi-turn testing. Instead of testing only single-shot utterance-response pairs, you can test utterances within the context of a conversation.

To add conversation history to your test definition, add each message from the conversation as a conversationHistory entry within inputs. Each conversationHistory entry includes the role of the message sender, the message text, the topic used if the role is agent, and the index of the message in the conversation.
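A minimal sketch of a two-message history followed by the utterance under test is shown here. The inputs and conversationHistory elements come from the description above; the child element names (role, message, topic, index), the zero-based index, and the message text are assumptions for illustration.

```xml
<!-- Sketch only: child element names and index numbering are assumed;
     the message text and topic assignment are hypothetical. -->
<inputs>
    <conversationHistory>
        <role>user</role>
        <message>Find the Global Media account</message>
        <index>0</index>
    </conversationHistory>
    <conversationHistory>
        <role>agent</role>
        <message>I found the Global Media account. What would you like to know?</message>
        <!-- topic is included only when the role is agent -->
        <topic>OOTBSingleRecordSummary</topic>
        <index>1</index>
    </conversationHistory>
    <!-- The utterance being tested, evaluated in the context of the turns above -->
    <utterance>Summarize it</utterance>
</inputs>
```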

For more information on the fields required for conversation history, see the Testing API Metadata Reference.

This sample XML AiEvaluationDefinition has two test cases for the Agentforce_for_Salesforce agent. The first test case provides an utterance (“Summarize the Global Media account”) and defines multiple expectations for the response.

The first expectation verifies that the OOTBSingleRecordSummary topic is used.
The second expectation verifies that the IdentifyRecordByName action is used.
The third expectation includes a string that's expected in the test response.
The fourth expectation uses the conciseness quality metric to gauge whether the generated answer is brief but comprehensive. Shorter is better.
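The four expectations listed above could be expressed along these lines for the first test case. This is a sketch, not the original sample: the element names number, name, and expectedValue, the expectation names topic_sequence_match, action_sequence_match, and bot_response_rating, and the list syntax for the action sequence are assumptions to verify against the AiEvaluationDefinition reference. The expected response string is a hypothetical placeholder.

```xml
<!-- Sketch only: element and expectation names are assumed, not confirmed. -->
<testCase>
    <number>1</number>
    <inputs>
        <utterance>Summarize the Global Media account</utterance>
    </inputs>
    <!-- 1. The expected topic is used -->
    <expectation>
        <name>topic_sequence_match</name>
        <expectedValue>OOTBSingleRecordSummary</expectedValue>
    </expectation>
    <!-- 2. The expected action is used (list syntax is an assumption) -->
    <expectation>
        <name>action_sequence_match</name>
        <expectedValue>['IdentifyRecordByName']</expectedValue>
    </expectation>
    <!-- 3. A string expected in the response (hypothetical placeholder) -->
    <expectation>
        <name>bot_response_rating</name>
        <expectedValue>Summary of the Global Media account</expectedValue>
    </expectation>
    <!-- 4. Quality metric: no expected value, the response is scored for brevity -->
    <expectation>
        <name>conciseness</name>
    </expectation>
</testCase>
```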

See AiEvaluationDefinition.

To deploy metadata components with Salesforce CLI, see Deploy and Run Tests in the Command Line.