Refine Test Cases with Custom Evaluation Criteria
To test an agent response for specific strings or numbers, you can create custom evaluations. Custom evaluations let you write test cases for any data in an agent response, extending testing capabilities beyond standard or out-of-the-box expectations.
Custom evaluations enable more specific testing, such as ensuring that latency is less than 10 seconds, or that agent action inputs and outputs meet certain requirements.
Custom evaluations follow a different format from standard expectations. Instead of defining an expectation with a name and an expected value, custom evaluations require individual parameters to define the evaluation requirements.
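For contrast, a standard expectation pairs a name directly with an expectedValue, as in this snippet taken from the complete example later in this topic:

<expectation>
    <name>topic_sequence_match</name>
    <expectedValue>GeneralCRM</expectedValue>
</expectation>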
Currently, there are two types of custom evaluations that you can add to your test metadata:

- string_comparison
- numeric_comparison

This example custom evaluation checks if the reply email created by the DraftGenericReplyEmail action mentions the right person.
<expectation>
    <label>expected recipient match</label>
    <name>string_comparison</name>
    <parameter>
        <name>operator</name>
        <value>equals</value>
        <isReference>false</isReference>
    </parameter>
    <parameter>
        <name>actual</name>
        <value>$.generatedData.invokedActions[*][?(@.function.name == 'DraftGenericReplyEmail')].function.input.recipient</value>
        <isReference>true</isReference>
    </parameter>
    <parameter>
        <name>expected</name>
        <value>Jon</value>
        <isReference>false</isReference>
    </parameter>
</expectation>

This example custom evaluation has three parameters: an actual value, an expected value, and an operator to compare them. Each parameter is defined with a name field, a value field, and an isReference field. The parameters are set individually using the AiEvaluationExpectationParameter format. For details, see the Metadata reference.
This example evaluation uses the equals operator, which means the test case checks whether the actual value from the agent matches the expected value. All string comparison operators are case-sensitive. String comparison and numeric comparison evaluations each have their own set of valid operators.
The valid string comparison operators are:
- equals: Checks if the actual value directly matches the expected value.
- contains: Checks if the actual value contains the expected value.
- startswith: Checks if the actual value begins with the expected value.
- endswith: Checks if the actual value ends with the expected value.

The valid numeric comparison operators are:

- equals: Checks for numerical equality.
- greater_than_or_equal: Checks if the actual value is greater than or equal to (>=) the expected value.
- greater_than: Checks if the actual value is greater than (>) the expected value.
- less_than: Checks if the actual value is less than (<) the expected value.
- less_than_or_equal: Checks if the actual value is less than or equal to (<=) the expected value.

Each parameter field is limited to 100 characters.
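Numeric comparison evaluations use the same parameter structure as string comparisons. As a minimal sketch, this expectation checks that a numeric action output stays below a threshold; the GetOrderTotal action name and its output field total are hypothetical stand-ins for one of your own actions and its generated data path:

<expectation>
    <label>order total under limit</label>
    <name>numeric_comparison</name>
    <parameter>
        <name>operator</name>
        <value>less_than</value>
        <isReference>false</isReference>
    </parameter>
    <parameter>
        <name>actual</name>
        <value>$.generatedData.invokedActions[*][?(@.function.name == 'GetOrderTotal')].function.output.total</value>
        <isReference>true</isReference>
    </parameter>
    <parameter>
        <name>expected</name>
        <value>500</value>
        <isReference>false</isReference>
    </parameter>
</expectation>

The expected value is still written as text in the value field; the numeric_comparison evaluation type is what makes the comparison numeric rather than lexical.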
Note: The actual value is retrieved with a JSONPath expression. This expression enables you to automatically point to the data you want to test from the Get Test Results resource in Connect API. For details on how to construct a JSONPath expression, see Dynamically Reference Generated Data.
In most custom evaluations, the value for the actual result is a JSONPath expression that points to generated data. This is runtime data from the generatedData object returned by the Get Test Results resource. For this expression to dynamically reference data, isReference must be set to true.
Most JSONPath expressions for custom evaluations follow this pattern:

$.generatedData.invokedActions[*][?(@.function.name == '{ACTION}')].{DYNAMIC_DATA}

You can show the generated JSON data when retrieving test results via Agentforce DX. To show the generated JSON, add the --verbose flag to an agent test run command. For more information, see Customize the Agent Test Spec.
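For example, assuming you deployed a test named Agent_Sanity to your default org, a command along these lines runs it and includes the generated JSON in the output (flag names other than --verbose can vary by plugin version, so confirm them with sf agent test run --help):

sf agent test run --api-name Agent_Sanity --wait 10 --verbose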
Get the query input for the namespace_actionName action:

$.generatedData.invokedActions[*][?(@.function.name == 'namespace_actionName')].function.input.query

Get the result output for the namespace_actionName action:

$.generatedData.invokedActions[*][?(@.function.name == 'namespace_actionName')].function.output.result

Get the value from the first additionalContext item for the namespace_actionName action:

$.generatedData.invokedActions[*][?(@.function.name == 'namespace_actionName')].function.output.additionalContext[0].value

For information on JSONPath operators, see the official documentation.
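To see how these expressions resolve, here's an illustrative sketch of the shape of a generatedData entry; only function.name, function.input, and function.output mirror the expressions above, and the sample values are invented, so inspect your own verbose test output for the exact shape:

{
  "generatedData": {
    "invokedActions": [
      {
        "function": {
          "name": "namespace_actionName",
          "input": { "query": "SELECT Name FROM Contact" },
          "output": {
            "result": "3 contacts found",
            "additionalContext": [ { "value": "Global Media" } ]
          }
        }
      }
    ]
  }
}

Against this shape, the first expression resolves to the query string, the second to the result string, and the third to "Global Media".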
This is a complete test case metadata file that demonstrates standard expectations, out-of-the-box metrics, custom evaluation criteria, and context variables.
<?xml version="1.0" encoding="UTF-8"?>
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
    <description>My first Salesforce Agent test</description>
    <name>Agent_Sanity</name>
    <subjectType>AGENT</subjectType>
    <subjectName>Sales_Agent</subjectName>
    <subjectVersion>v1</subjectVersion>
    <testCase>
        <number>1</number>
        <inputs>
            <utterance>Summarize the Global Media account</utterance>
            <contextVariable>
                <variableName>OrchestrationStage</variableName>
                <variableValue>001SB00000MC0yrYAD_test</variableValue>
            </contextVariable>
            <contextVariable>
                <variableName>EndUserLanguage</variableName>
                <variableValue>Spanish</variableValue>
            </contextVariable>
        </inputs>
        <expectation>
            <name>topic_sequence_match</name>
            <expectedValue>OOTBSingleRecordSummary</expectedValue>
        </expectation>
        <expectation>
            <name>action_sequence_match</name>
            <expectedValue>['IdentifyRecordByName', 'SummarizeRecord']</expectedValue>
        </expectation>
        <expectation>
            <name>bot_response_rating</name>
            <expectedValue>Summarization of the Global Media account including important points</expectedValue>
        </expectation>
        <expectation>
            <name>coherence</name>
        </expectation>
        <expectation>
            <name>output_latency_milliseconds</name>
        </expectation>
    </testCase>
    <testCase>
        <number>2</number>
        <inputs>
            <utterance>List contact names associated with Global Media account</utterance>
        </inputs>
        <expectation>
            <name>topic_sequence_match</name>
            <expectedValue>GeneralCRM</expectedValue>
        </expectation>
        <expectation>
            <name>action_sequence_match</name>
            <expectedValue>['IdentifyRecordByName', 'QueryRecords']</expectedValue>
        </expectation>
        <expectation>
            <name>bot_response_rating</name>
            <expectedValue>should respond with list of contacts</expectedValue>
        </expectation>
        <expectation>
            <label>expected recipient match</label>
            <name>string_comparison</name>
            <parameter>
                <name>operator</name>
                <value>equals</value>
                <isReference>false</isReference>
            </parameter>
            <parameter>
                <name>actual</name>
                <value>$.generatedData.invokedActions[*][?(@.function.name == 'DraftGenericReplyEmail')].function.input.recipient</value>
                <isReference>true</isReference>
            </parameter>
            <parameter>
                <name>expected</name>
                <value>Jon</value>
                <isReference>false</isReference>
            </parameter>
        </expectation>
    </testCase>
</AiEvaluationDefinition>