Build Tests in Metadata API
To define tests, use the AiEvaluationDefinition Metadata API type. To learn how to use Metadata API, see Quick Start: Metadata API.
To create agent tests with Salesforce CLI instead of using Metadata API directly, see Test an Agent with Agentforce DX.
The AiEvaluationDefinition metadata type contains a set of test cases. Each test case takes inputs (including an utterance) and contains a set of expectations (such as an expected action sequence) for the response.
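To make the test case structure concrete, here is a minimal Python sketch that parses an AiEvaluationDefinition file and lists each test case's utterance and expectation names. It assumes the standard Metadata API XML namespace. The element names (testCase, inputs, utterance, expectation, name, expectedValue, number) and the expectation name topic_sequence_match in the embedded sample are assumptions for illustration, not copied from the reference, so confirm them in the Testing API Metadata Reference.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Metadata API XML files.
NS = {"md": "http://soap.sforce.com/2006/04/metadata"}

# Hypothetical sample: tag names and the expectation name are assumptions.
SAMPLE = """<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
  <testCase>
    <expectation>
      <name>topic_sequence_match</name>
      <expectedValue>OOTBSingleRecordSummary</expectedValue>
    </expectation>
    <inputs>
      <utterance>Summarize the Global Media account</utterance>
    </inputs>
    <number>1</number>
  </testCase>
</AiEvaluationDefinition>"""


def summarize(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (utterance, [expectation names]) for each test case in the file."""
    root = ET.fromstring(xml_text)
    summary = []
    for case in root.findall("md:testCase", NS):
        # Each test case takes inputs (here, just the utterance) ...
        utterance = case.findtext("md:inputs/md:utterance", default="", namespaces=NS)
        # ... and contains a set of expectations for the response.
        names = [
            exp.findtext("md:name", default="", namespaces=NS)
            for exp in case.findall("md:expectation", NS)
        ]
        summary.append((utterance, names))
    return summary


for utterance, names in summarize(SAMPLE):
    print(f"{utterance!r}: {names}")
```

Reading the file this way can help when reviewing a large definition: every test case reduces to one input and the list of checks applied to the agent's response.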
Test cases can include various types of inputs. The primary input is an utterance, but you can also include context variables and conversation history to evaluate agent responses in complex scenarios.
In addition to an utterance, a test case input can contain context variables. Context variables let you test how an agent behaves in different contexts and gauge its overall robustness in scenarios that more closely simulate a production environment.
For more information on context variables, see the Standard Variable Reference.
By default, context variables are immutable and are set only at the beginning of an agent session. The only context variable that you can change after a session begins is EndUserLanguage.
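As a rough illustration, the Python sketch below builds an inputs element that carries an utterance plus context variables. The tag names contextVariable, variableName, and variableValue, and the value "fr" for EndUserLanguage, are hypothetical; check the Standard Variable Reference and the Testing API Metadata Reference for the real element names and accepted values.

```python
import xml.etree.ElementTree as ET


def build_inputs(utterance: str, context: dict[str, str]) -> ET.Element:
    """Build an <inputs> element holding an utterance and context variables.

    Hypothetical tag names: contextVariable, variableName, variableValue.
    """
    inputs = ET.Element("inputs")
    ET.SubElement(inputs, "utterance").text = utterance
    # One entry per variable; the values apply from the start of the session.
    for name, value in context.items():
        variable = ET.SubElement(inputs, "contextVariable")
        ET.SubElement(variable, "variableName").text = name
        ET.SubElement(variable, "variableValue").text = value
    return inputs


# "fr" is a hypothetical value; confirm the accepted format for EndUserLanguage.
inputs = build_inputs("Summarize the Global Media account", {"EndUserLanguage": "fr"})
print(ET.tostring(inputs, encoding="unicode"))
```

Using a dict keeps exactly one value per variable name, which fits the rule that context variables are set once when the session begins.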
In addition to context variables and an utterance, you can add the conversation history to the test definition as input. Passing the conversation history to the testing service provides additional context and enables multi-turn testing: instead of testing single-shot utterance-response pairs, you can test utterances within the context of a conversation.
To add conversation history to your test definition, add the messages from the conversation within inputs. Each conversationHistory entry includes the role of the message sender, the message text, the topic used (if the role is agent), and the index of the message in the conversation.
For more information on the fields required for conversation history, see the Testing API Metadata Reference.
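The following Python sketch appends conversation history entries to an inputs element, one per prior message, with the role, message text, optional topic, and index described above. The tag names (conversationHistory, role, message, topic, index), the role values "user" and "agent", and zero-based indexing are assumptions; the message texts in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    role: str                    # assumed values: "user" or "agent"
    message: str
    topic: Optional[str] = None  # only meaningful when role == "agent"


def add_history(inputs: ET.Element, turns: list[Turn]) -> None:
    """Append hypothetical <conversationHistory> entries to an <inputs> element."""
    for index, turn in enumerate(turns):  # assumed zero-based message index
        if turn.topic is not None and turn.role != "agent":
            raise ValueError("topic applies only to agent messages")
        entry = ET.SubElement(inputs, "conversationHistory")
        ET.SubElement(entry, "role").text = turn.role
        ET.SubElement(entry, "message").text = turn.message
        if turn.topic is not None:
            ET.SubElement(entry, "topic").text = turn.topic
        ET.SubElement(entry, "index").text = str(index)


# Hypothetical two-turn history that precedes a follow-up utterance.
inputs = ET.Element("inputs")
ET.SubElement(inputs, "utterance").text = "What are its open opportunities?"
add_history(inputs, [
    Turn("user", "Summarize the Global Media account"),
    Turn("agent", "Here is a summary of the Global Media account.",
         topic="OOTBSingleRecordSummary"),
])
ET.indent(inputs)  # Python 3.9+: pretty-print for readability
print(ET.tostring(inputs, encoding="unicode"))
```

The follow-up utterance ("its") only makes sense with the earlier turns, which is the kind of multi-turn behavior a single utterance-response pair cannot exercise.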
This sample XML AiEvaluationDefinition has two test cases for the Agentforce_for_Salesforce agent. The first test case provides an utterance (“Summarize the Global Media account”) and defines multiple expectations for the response.
- The first expectation verifies that the OOTBSingleRecordSummary topic is used.
- The second expectation verifies that the IdentifyRecordByName action is used.
- The third expectation includes a string that's expected in the test response.
- The fourth expectation uses the conciseness quality metric to gauge whether the generated answer is brief but comprehensive. Shorter is better.
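A minimal Python sketch of a first test case like the one described, built with xml.etree.ElementTree. Only the conciseness metric name comes from the text above; the other expectation names (topic_sequence_match, action_sequence_match, bot_response_rating), the tag names, the plain-string format of the action value, and the placeholder response string are assumptions to verify against the Testing API Metadata Reference before deploying.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Expectation names other than "conciseness" are assumptions for illustration.
EXPECTATIONS: list[tuple[str, Optional[str]]] = [
    ("topic_sequence_match", "OOTBSingleRecordSummary"),  # topic used
    ("action_sequence_match", "IdentifyRecordByName"),    # action used (value format assumed)
    ("bot_response_rating", "EXPECTED_TEXT_PLACEHOLDER"), # hypothetical expected string
    ("conciseness", None),                                # quality metric, no expected value
]


def build_test_case(number: int, utterance: str,
                    expectations: list[tuple[str, Optional[str]]]) -> ET.Element:
    """Build one hypothetical <testCase> with expectations, inputs, and a number."""
    case = ET.Element("testCase")
    for name, expected in expectations:
        exp = ET.SubElement(case, "expectation")
        ET.SubElement(exp, "name").text = name
        # Quality metrics such as conciseness carry no expected value here.
        if expected is not None:
            ET.SubElement(exp, "expectedValue").text = expected
    inputs = ET.SubElement(case, "inputs")
    ET.SubElement(inputs, "utterance").text = utterance
    ET.SubElement(case, "number").text = str(number)
    return case


case = build_test_case(1, "Summarize the Global Media account", EXPECTATIONS)
ET.indent(case)  # Python 3.9+: pretty-print for readability
print(ET.tostring(case, encoding="unicode"))
```

Keeping the expectations as data makes it easy to reuse the same builder for the second test case with a different utterance and checks.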
To deploy metadata components with Salesforce CLI, see Deploy and Run Tests in the Command Line.