Use Test Results to Improve Your Agent

If your tests all pass, congratulations! However, if one or more tests fail, you have some work to do. To diagnose a failure, look at the errorMessage field in the failed test result and the score for each test result.

Use the conversation preview panel in the Agent Builder UI to talk to the active agent and try out utterances and responses conversationally. Then use what you learn to fine-tune your agent's instructions, actions, or topics.

A topic test checks whether the agent responded with the expected topic when it received the utterance. A topic test is defined by an expectation name of topic_sequence_match in the AiEvaluationDefinition metadata component. If the test fails, compare the topic's expectedValue defined in AiEvaluationDefinition with the topic that the agent actually used.

The score field returns a value of 0 if the result is FAILED or 1 if the result is PASS.
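For reference, a topic expectation sits inside a testCase in the AiEvaluationDefinition metadata. The structure below is a sketch based on the Metadata API format; the agent name, utterance, and topic name are placeholders:

```xml
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
    <description>Checks that an order question routes to the expected topic</description>
    <name>Order_Topic_Test</name>
    <subjectType>AGENT</subjectType>
    <subjectName>My_Service_Agent</subjectName>
    <testCase>
        <number>1</number>
        <inputs>
            <utterance>Where is my order?</utterance>
        </inputs>
        <!-- Topic test: passes only if the agent selects this topic -->
        <expectation>
            <name>topic_sequence_match</name>
            <expectedValue>Order_Management</expectedValue>
        </expectation>
    </testCase>
</AiEvaluationDefinition>
```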

An action test verifies that the agent used the expected action or actions. An action test is defined by an expectation name of action_sequence_match in the AiEvaluationDefinition metadata component. If the test fails, compare the action's expectedValue defined in AiEvaluationDefinition with the actions that the agent actually used.

The score field returns a value of 0 if the result is FAILED or 1 if the result is PASS.
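An action expectation follows the same testCase pattern. In this sketch, the expectedValue is assumed to list the API names of the expected actions as a JSON-style array; the action names are placeholders:

```xml
<!-- Action test: compares the agent's invoked actions against this list -->
<expectation>
    <name>action_sequence_match</name>
    <expectedValue>["IdentifyRecordByName", "GetOrderStatus"]</expectedValue>
</expectation>
```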

An outcome test uses a semantic comparison between the expected and actual values using natural language. Even if the text of the actual outcome differs from the expected outcome, the test can still pass if the core meaning is the same. However, if the actual outcome is significantly different, the test fails. An outcome test is defined by an expectation name of bot_response_rating in the AiEvaluationDefinition metadata component. If the test fails, compare the outcome's expectedValue defined in AiEvaluationDefinition with the actual agent response.

The score field returns a value between 0 and 5. A test passes if the value is greater than or equal to 3.
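An outcome expectation states the expected result in natural language, and the semantic comparison is made against that text. The wording below is a placeholder sketch:

```xml
<!-- Outcome test: semantic comparison; passes when score >= 3 on a 0-5 scale -->
<expectation>
    <name>bot_response_rating</name>
    <expectedValue>The agent confirms the order status and offers further help.</expectedValue>
</expectation>
```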

A test is coherent if the response is easy to understand and has no grammatical errors. This test type is defined by an expectation name of coherence in the AiEvaluationDefinition metadata component. If you use this quality check, you don't need an expectedValue field value.

The score field returns a value between 0 and 1. A test passes if the value is greater than or equal to 0.6.

A test is complete if the response includes all the essential information. This test type is defined by an expectation name of completeness in the AiEvaluationDefinition metadata component. If you use this quality check, you don't need an expectedValue field value.

The score field returns a value between 0 and 1. A test passes if the value is greater than or equal to 0.6.

A test is concise if the response is brief but comprehensive. Shorter is better. This test type is defined by an expectation name of conciseness in the AiEvaluationDefinition metadata component. If you use this quality check, you don't need an expectedValue field value.

The score field returns a value of 0 if the result is FAILED or 1 if the result is PASS.
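Because the coherence, completeness, and conciseness quality checks need no expectedValue, each one can be declared by name alone. A sketch of all three in a testCase:

```xml
<!-- Quality checks: no expectedValue is required for these expectations -->
<expectation>
    <name>coherence</name>    <!-- passes when score >= 0.6 -->
</expectation>
<expectation>
    <name>completeness</name> <!-- passes when score >= 0.6 -->
</expectation>
<expectation>
    <name>conciseness</name>  <!-- score is 0 (fail) or 1 (pass) -->
</expectation>
```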

A latency test returns the latency in milliseconds from sending a request until a response is received. This test type is defined by an expectation name of output_latency_milliseconds in the AiEvaluationDefinition metadata component. If you use this quality check, you don't need an expectedValue field value. There is no pass or fail result for this type of test.

The score field returns the latency in milliseconds.
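A latency expectation is also declared by name alone; its score is informational rather than pass or fail. A sketch:

```xml
<!-- Latency check: score reports response time in milliseconds, no pass/fail -->
<expectation>
    <name>output_latency_milliseconds</name>
</expectation>
```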