agent test run-eval (Beta)

Run rich evaluation tests against an Agentforce agent.

This feature is a Beta Service. Customers may opt to try such Beta Service in its sole discretion. Any use of the Beta Service is subject to the applicable Beta Services Terms provided at Agreements and Terms.

Specify the tests you want to run with one of these inputs to the `--spec` flag:

YAML test spec generated by the `agent generate test-spec` CLI command
JSON payload

When you provide a YAML test spec, this command automatically translates the test cases into calls to the internal state-based evaluation framework and infers the agent name from the test spec’s `subjectName` field. As a result, you can use the same test spec with both the `agent test run` and `agent test run-eval` commands. YAML test specs also support context variables, which let you inject contextual data (such as CaseId or RoutableId) into agent sessions so you can test with different contexts.

When you provide a JSON payload, it’s sent directly to the evaluation framework after an optional normalization step. The normalizer auto-corrects common field name mistakes, converts shorthand references to JSONPath, and injects defaults. Use `--no-normalize` to disable this auto-normalization. JSON payloads can also include `context_variables` on `agent.create_session` steps, which gives you the same contextual testing capabilities as a YAML test spec.

This command supports more than 8 evaluator types, including subagent routing assertions, action invocation checks, string/numeric assertions, semantic similarity scoring, and LLM-based quality ratings.

Flag Name (Long)	Flag Name (Short)	Description
`--api-name`	`-n`	Type: Value. Agent API name (also called DeveloperName) used to resolve agent_id and agent_version_id. Auto-inferred from the YAML spec’s subjectName.
`--api-version`	N/A	Type: Value. Override the API version used for API requests made by this command.
`--batch-size`	N/A	Type: Value. Default value: `5`. Number of tests per API request (max 5).
`--flags-dir`	N/A	Type: Value. Import flag values from a directory.
`--json`	N/A	Type: Boolean. Format output as JSON.
`--no-normalize`	N/A	Type: Boolean. Disable auto-normalization of field names and shorthand references.
`--result-format`	N/A	Type: Value. Valid values: `json`, `human`, `junit`, `tap`. Default value: `human`. Format of the agent test run results.
`--spec`	`-s`	Type: Value. Required. Path to test spec file (YAML or JSON). Supports reading from stdin when piping content.
`--target-org`	`-o`	Type: Value. Required. Username or alias of the target org. Not required if the `target-org` configuration variable is already set.

Run tests using a YAML test spec on the org with alias “my-org”:
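
A minimal sketch of this invocation, assuming the Salesforce CLI is installed and invoked as `sf`. The wrapper name `run_eval_yaml` and the spec path in the usage line are hypothetical; only the `--spec` and `--target-org` flags come from the table above.

```shell
# Hypothetical wrapper: run the evaluation for one YAML test spec
# against the org whose alias is "my-org".
# Assumes the Salesforce CLI executable is `sf`.
run_eval_yaml() {
  # $1 = path to the YAML test spec
  sf agent test run-eval --spec "$1" --target-org my-org
}
```

Called as `run_eval_yaml specs/my-agent-testSpec.yaml`, this passes the spec path through unchanged; substitute the path to your own test spec.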

Run tests using a YAML spec with explicit agent name override; use your default org:
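
One way this could look, assuming the CLI executable is `sf`. The wrapper name `run_eval_as`, the spec path, and the agent name `My_Agent` in the usage line are hypothetical. `--target-org` is left out on purpose so the command falls back to the `target-org` configuration variable.

```shell
# Hypothetical wrapper: run a YAML test spec while overriding the agent
# name that would otherwise be inferred from the spec's subjectName.
# No --target-org is passed, so the default org is used.
run_eval_as() {
  # $1 = path to the YAML test spec
  # $2 = agent API name (DeveloperName)
  sf agent test run-eval --spec "$1" --api-name "$2"
}
```

For example, `run_eval_as specs/my-agent-testSpec.yaml My_Agent` runs the spec against the agent whose API name is `My_Agent`.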

Run tests using a JSON payload:
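
A sketch of the JSON case, again assuming the CLI executable is `sf`. The wrapper name `run_eval_json` and the payload path in the usage line are hypothetical. Extra arguments are forwarded as-is, which makes it easy to add `--no-normalize` when the payload should reach the evaluation framework untouched.

```shell
# Hypothetical wrapper: run a JSON payload against the org aliased "my-org".
# Any arguments after the payload path are forwarded to the command.
run_eval_json() {
  # $1 = path to the JSON payload file
  payload=$1
  shift
  sf agent test run-eval --spec "$payload" --target-org my-org "$@"
}
```

`run_eval_json specs/eval-payload.json` runs with normalization enabled; `run_eval_json specs/eval-payload.json --no-normalize` disables it.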

Run tests and output results in JUnit format; useful for continuous integration and deployment (CI/CD):
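
A minimal sketch for a pipeline step, assuming the CLI executable is `sf`. The wrapper name `run_eval_junit` and the spec path in the usage line are hypothetical; `junit` is one of the documented values of `--result-format`.

```shell
# Hypothetical wrapper for a CI step: run a test spec and request
# JUnit-formatted results instead of the default human-readable output.
run_eval_junit() {
  # $1 = path to the test spec (YAML or JSON)
  sf agent test run-eval --spec "$1" --target-org my-org --result-format junit
}
```

A pipeline step could call `run_eval_junit specs/my-agent-testSpec.yaml`; how the results are collected afterwards depends on your CI system.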

Run tests with contextVariables to inject contextual data into agent sessions (add contextVariables to test cases in your YAML spec):

Pipe a JSON payload from stdin (the `--spec` flag is automatically populated from stdin):