Build a Salesforce Agent Skill with Claude Code

Getting AI to write code is easy. Getting it to write code that consistently passes your security review, respects governor limits, follows your preferred frameworks, and scores well on static analysis is much harder. One way to close that gap is an agent skill: a structured prompt package that encodes your team’s definition of “production-ready” into something that an agent follows and a validator enforces. Because LLMs are probabilistic, outliers are inevitable; the architecture assumes this and catches them.

This post breaks down the structure of a Salesforce agent skill using Apex as an example. For clarity, we’ll use Claude Code to highlight specific examples, but the principles apply to any agent. We’ll cover what a skill is, the project structure, and how to get started creating your first skill.

What is an agent skill?

An agent skill is a structured prompt package that transforms Claude from a general-purpose assistant into a specialist. Rather than relying on the model’s training data alone, a skill provides explicit workflow contracts, reference material, code templates, and automated validators that load into context when the skill activates.

The large language models (LLMs) that power Claude already know the programming languages across the Salesforce ecosystem: Apex, LWC, SOQL, etc. But knowing a language and consistently producing high-quality code that adheres to best practices are two different things. A skill bridges that gap.

Each skill lives in a folder with a predictable structure:

skills/authoring-apex/
├── SKILL.md              # The execution contract
├── references/           # Decision trees, patterns, guardrails
├── assets/               # Example code
└── hooks/scripts/        # Python validators

At minimum, you need a SKILL.md, which is the entry point that Claude reads when the skill activates. Everything else is scaffolding that you add as the skill grows. Start with one SKILL.md and one reference doc, and add hooks when you want mechanical enforcement rather than relying on instructions alone.
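As a minimal sketch of that "SKILL.md first, everything else optional" rule, the helper below reports which parts of the layout a skill folder has. The function name `check_skill_layout` and the required/optional split are illustrative assumptions drawn from the folder tree above, not part of any official tooling.

```python
from pathlib import Path
import tempfile

# Layout taken from the folder tree above; only SKILL.md is treated as mandatory.
REQUIRED_FILES = ["SKILL.md"]
OPTIONAL_DIRS = ["references", "assets", "hooks/scripts"]

def check_skill_layout(skill_dir):
    """Return (missing_required, missing_optional) for a skill folder."""
    root = Path(skill_dir)
    missing_required = [p for p in REQUIRED_FILES if not (root / p).is_file()]
    missing_optional = [p for p in OPTIONAL_DIRS if not (root / p).is_dir()]
    return missing_required, missing_optional

if __name__ == "__main__":
    # Build a throwaway skill with just SKILL.md and one references folder.
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "authoring-apex"
        (skill / "references").mkdir(parents=True)
        (skill / "SKILL.md").write_text("---\nname: authoring-apex\n---\n")
        print(check_skill_layout(skill))
```

An empty first list means the skill can activate; the second list is simply a to-do list for when you want more scaffolding.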

Skills activate based on frontmatter. Frontmatter is the YAML metadata block between the ‘---’ delimiters at the top of a markdown file. It isn’t rendered as content; it serves as machine-readable configuration. In a skill, the frontmatter tells an agent when to activate: the name field identifies the skill, and the description field contains the trigger rules that the agent uses to decide whether this skill is relevant to the current task.

In the example below, the description uses a TRIGGER when / DO NOT TRIGGER when structure that gives the agent both positive and negative match criteria. This eliminates the ambiguous middle ground (like “is this SOQL query part of Apex authoring or the SOQL skill?”) that causes misfires when you only tell the model what to activate on.

---
name: authoring-apex
description: >
  TRIGGER when: user writes, edits, or reviews Salesforce Apex code —
  .cls or .trigger files including service classes, selector classes,
  trigger handlers, test classes, batch jobs, queueable classes,
  invocable methods, or Apex REST endpoints.
  DO NOT TRIGGER when: user works on LWC JavaScript, Flow XML,
  standalone SOQL queries, or running existing tests.
---

This is a simplified version. In practice, the description is more comprehensive: listing every class type (schedulable, AuraEnabled, HttpCalloutMock) and naming which skill handles each excluded case (e.g. “use authoring-lwc” for LWC JavaScript). The more specific the boundary, the fewer misfires.
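If you want to keep descriptions honest as a skill library grows, a small lint can confirm that each SKILL.md carries both halves of the boundary. The sketch below is one possible approach using only the standard library; the function names are hypothetical, and it matches the literal phrases "TRIGGER when:" and "DO NOT TRIGGER when:" from the example above rather than parsing YAML properly.

```python
import re

def split_frontmatter(text):
    """Return (frontmatter, body); frontmatter is '' when no leading '---' block exists."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)(.*)", text, re.DOTALL)
    if not match:
        return "", text
    return match.group(1), match.group(2)

def lint_triggers(frontmatter):
    """List the problems found in a skill's frontmatter (empty list means OK)."""
    problems = []
    if not re.search(r"^name:\s*\S+", frontmatter, re.MULTILINE):
        problems.append("missing name field")
    if not re.search(r"^description:", frontmatter, re.MULTILINE):
        problems.append("missing description field")
    if "DO NOT TRIGGER when:" not in frontmatter:
        problems.append("no negative match criteria (DO NOT TRIGGER when:)")
    # Remove the negative clause first, because it contains the positive phrase.
    if "TRIGGER when:" not in frontmatter.replace("DO NOT TRIGGER when:", ""):
        problems.append("no positive match criteria (TRIGGER when:)")
    return problems

if __name__ == "__main__":
    sample = (
        "---\n"
        "name: authoring-apex\n"
        "description: >\n"
        "  TRIGGER when: user writes, edits, or reviews Salesforce Apex code.\n"
        "---\n"
        "Follow this workflow in order.\n"
    )
    frontmatter, _ = split_frontmatter(sample)
    print(lint_triggers(frontmatter))
```

Run against the sample, the lint flags the missing negative criteria, which is exactly the one-sided description that tends to cause misfires.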

When a matching task is detected, Claude loads the SKILL.md and follows the workflow. You can also trigger a skill manually with /skill-name in the prompt. Either way, the full skill context — workflow, references, templates — is injected automatically. You never paste the SKILL.md content into the conversation yourself.

The skill format itself (a SKILL.md with frontmatter, references, and assets) is an open specification maintained by Anthropic under Apache 2.0 and open to community contributions. The SKILL.md, references, and templates that you write are portable. The hooks mechanism (hooks.yaml with PostToolUse lifecycle events) is Claude Code-specific — if you use another agent runtime, you’d wire validation differently, but the skill content travels with you. This makes skills headless. Use them with Agentforce Vibes, Claude Code, Cursor, VS Code, Gemini CLI, OpenAI Codex, Windsurf, Roo Code, Goose, or any of the 30+ agents that support the open specification.

The structure of an agent skill

SKILL.md: The execution contract

SKILL.md defines the core workflow: an ordered, non-reorderable sequence of phases, each with hard exit criteria, that the agent follows whenever the skill is used. Here is a simplified example showing the structure. Your real SKILL.md would have more detail in each phase, but the shape is the same: ordered phases with exit criteria.

Follow this workflow in order. Do not skip, merge, or reorder steps.
If blocked, stop and ask for missing context. If not applicable, mark `N/A`
with a one-line justification in the report.

## Required Inputs

Gather or infer before authoring:

- Class type (service, selector, batch, queueable, invocable, trigger handler)
- Target object(s) and business goal
- Sharing default (`with sharing` unless justified)
- Trigger framework already in use (or explicit choice)

The phases:

### Phase 1 — Author
1. **Discover project conventions** — scan for existing patterns, trigger
   framework, naming style
2. **Choose the smallest correct pattern**:

   | Need               | Pattern                              |
   | ------------------ | ------------------------------------ |
   | Business logic     | Service class                        |
   | Data access        | Selector class                       |
   | Trigger logic      | Trigger handler (framework required) |
   | Flow integration   | @InvocableMethod                     |
   | Bulk processing    | Batch Apex                           |
   | Async work         | Queueable                            |

3. **Read the matching template** from `assets/` before authoring
4. **Author with guardrails** — apply every rule in the Rules section below
5. **Generate test class** — delegate to testing skill

### Phase 2 — Validate

6. **Run code analyzer** — remediate all blocking violations; re-run until clean
7. **Execute tests** — capture pass/fail and coverage percentage

### Phase 3 — Report

8. **Report** — files, design decisions, analyzer output, test results, deploy note

If a phase doesn’t apply, Claude must document it as N/A with justification. This maps directly to Anthropic’s best practice of providing “instructions as sequential steps using numbered lists when order or completeness matters.”
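Because the contract depends on step order and on N/A entries being justified, both can be checked mechanically. The sketch below is illustrative only: it assumes steps are written as `N. **Title**` as in the example above, and it invents a simple report format (`7. N/A: reason`) purely for demonstration.

```python
import re

# Matches numbered, bold-titled steps such as "3. **Read the matching template**".
STEP = re.compile(r"^\s*(\d+)\.\s+\*\*(.+?)\*\*", re.MULTILINE)
# Hypothetical report line: "<step>. N/A: <justification>".
NA_LINE = re.compile(r"^\s*(\d+)\.\s*N/A\b[\s:\-]*(.*)$")

def extract_steps(skill_md):
    """Return [(number, title), ...] in document order."""
    return [(int(n), title) for n, title in STEP.findall(skill_md)]

def numbering_gaps(steps):
    """Return (expected, found) pairs wherever numbering is not 1, 2, 3, ..."""
    expected = range(1, len(steps) + 1)
    return [(want, got) for want, (got, _) in zip(expected, steps) if want != got]

def unjustified_na(report):
    """Return step numbers marked N/A without any justification text."""
    missing = []
    for line in report.splitlines():
        match = NA_LINE.match(line)
        if match and not match.group(2).strip():
            missing.append(int(match.group(1)))
    return missing

if __name__ == "__main__":
    skill_md = (
        "1. **Discover project conventions** - scan for patterns\n"
        "2. **Choose the smallest correct pattern**:\n"
        "4. **Author with guardrails** - apply every rule\n"
    )
    print(numbering_gaps(extract_steps(skill_md)))
    print(unjustified_na("5. N/A: constants-only class\n7. N/A\n"))
```

A gap in the numbering usually means a step was deleted or merged during an edit, which is the kind of silent drift the "do not skip, merge, or reorder" line is meant to prevent.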

references/: The knowledge library

Each reference document targets a specific failure mode that general training handles inconsistently. Because the temptation is to include too much information, follow the best practices from Anthropic below.

Best Practice	Reasoning	Example for Apex
File size	Claude prioritizes unevenly as documents grow. Short, focused files get more consistent attention.	Split by concern: core patterns (Factory, Strategy, Selector, Service) in one file, and advanced patterns (Unit of Work, Domain Model, Facade) in another.
Structured hierarchy	Table of contents and headers let Claude locate relevant sections without reading everything.	SKILL.md says “read best-practices.md before authoring” and Claude navigates to the heading it needs, not the whole file.
One rule per section	One point per section eliminates ambiguity about which rule applies.	“SOQL in Loops” covers SOQL in loops, not DML, sharing, or null safety.
Short sections, many of them	Self-contained sections (~20-40 lines) let Claude apply one rule without loading the full document.	One anti-pattern entry = one heading, one “Why this fails” paragraph, one BAD block, one “Fix” line, and one GOOD block.
Concrete over abstract	BAD/GOOD pairs are implicit few-shot examples. Anthropic says examples are “one of the most reliable ways to steer output.” Abstract rules (“avoid governor limits”) get interpreted loosely; concrete pairs anchor the behavior.	// BAD: 1 SOQL per iteration. Hits 100-query limit at record 101. <your demonstrable example> // GOOD: 1 SOQL total. <your demonstrable example>
Consequences over commands	“Don’t do X” is weaker than “X fails because Y.” Anthropic notes “explaining why helps Claude generalize” to novel cases. If Claude knows the reason, it applies the rule even to patterns that look different from the example.	“Salesforce allows only 150 DML statements per transaction. A trigger on 200 records with DML inside the loop exceeds that limit at record 151, and the entire batch rolls back.”
Positive framing	“Don’t do X” forces Claude to infer what to do. Stating the desired behavior directly gives Claude a single target to hit.	Declare sharing explicitly on every class: with sharing for user-facing logic, and inherited sharing for utility classes called from both contexts.
Diverse examples	Cover edge cases so Claude doesn’t over-fit to one shape. Anthropic recommends 3-5 varied examples to prevent narrow pattern matching.	`selector.cls` could show four methods: `getByIds` (bulk), `getByName` (LIKE with sanitization), `getWithContacts` (parent-child subquery), `getContactsWithAccount` (child-to-parent). Same pattern, four shapes. Claude then learns “selector” and not “single query method.”
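Several of these practices are measurable, so a reference doc can be linted before it ships. The sketch below is a rough illustration: it checks for a table of contents and flags oversized sections. The 40-line ceiling comes from the guideline in the table above; the function names and the exact heading text "Table of Contents" are assumptions.

```python
MAX_SECTION_LINES = 40  # upper end of the ~20-40 line guideline

def section_lengths(markdown):
    """Map each '## ' heading to the number of lines that follow it in its section."""
    lengths = {}
    current = None
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '## ' lines inside code fences
        if not in_fence and line.startswith("## "):
            current = line[3:].strip()
            lengths[current] = 0
        elif current is not None:
            lengths[current] += 1
    return lengths

def lint_reference(markdown):
    """Return a list of problems; an empty list means the doc passes."""
    problems = []
    lengths = section_lengths(markdown)
    if "Table of Contents" not in lengths:
        problems.append("no '## Table of Contents' section")
    for heading, count in lengths.items():
        if count > MAX_SECTION_LINES:
            problems.append(f"'{heading}' has {count} lines (max {MAX_SECTION_LINES})")
    return problems

if __name__ == "__main__":
    doc = "# Apex Anti-Patterns\n\n## SOQL in Loops\n" + "filler line\n" * 45
    for problem in lint_reference(doc):
        print(problem)
```

A check like this will not tell you whether a section is well written, but it does catch the slow growth that turns a focused reference into a document Claude reads unevenly.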

For an authoring-apex skill, let’s explore the reference files you might consider including.

Reference	What the doc contains	What goes wrong without it
`best-practices.md`	Your coding conventions (4-space indent, 120-char lines), naming rules, ApexDoc requirements, API version management, and general guidance	Drift from project conventions – Claude generates valid code that doesn’t look like your code.
`design-patterns.md`	Decision trees for patterns (Factory, Strategy, Selector, Service, Batch, Queueable, Unit of Work, Domain Model, etc.). Each pattern includes a “when to use” table so Claude doesn’t default to the most common pattern	Without this, Claude tends to produce Service classes for everything, even when a Selector or Strategy would be more appropriate.
`anti-patterns.md`	Common Apex mistakes, each with a BAD/GOOD code pair and an explanation of why it fails at scale	Claude repeats mistakes that it makes more often than human developers (e.g., calling Database.query() without an AccessLevel argument, or using the legacy System.assertEquals instead of Assert.areEqual).
`security-guide.md`	CRUD/FLS enforcement using USER_MODE (API 56+) and Security.stripInaccessible() for backward compatibility. SOQL injection prevention. XSS protection patterns	Without this reference, Claude sometimes generates code that works for someone assigned an admin profile but throws INSUFFICIENT_ACCESS for standard users
`transaction-security-policy.md`	Specialized references. This example covers TxnSecurity.EventCondition implementations (Enhanced Transaction Security). These classes are global, run in system context, and evaluate monitoring events	These are special cases. They break every normal rule about sharing and visibility. Without this reference, Claude applies with sharing to these types of classes and breaks them.

This leverages what Anthropic calls providing “context and motivation behind instructions.” The references don’t just say what to do, they explain why, enabling Claude to generalize correctly to novel situations rather than pattern-matching blindly.

As an example, anti-patterns.md could look like this. This structure is designed for mistakes that are likely to recur. Every section follows the same BAD/GOOD formula because Claude needs to recognise and avoid each mistake independently. Learn more about these anti-patterns in the documentation.

# Apex Anti-Patterns

## Table of Contents

- [SOQL in Loops](#soql-in-loops)
-

---

## SOQL in Loops
**Anti-pattern:** Executing SOQL queries inside a `for` or `while` loop.

**Why this fails:** Salesforce enforces a hard limit of 100 SOQL queries
per synchronous transaction. Triggers fire in batches of up to 200 records,
so a single SOQL inside a loop exhausts the limit after just 100 records.

```apex
// BAD: 1 SOQL per iteration — hits 100-query limit at record 101
for (Account acc : accounts) {
    List<Contact> contacts = [SELECT Id FROM Contact WHERE AccountId = :acc.Id];
}
```

**Fix:** Query once in bulk, then iterate over results.

```apex
// GOOD: 1 SOQL total
Map<Id, List<Contact>> contactsByAccount = new Map<Id, List<Contact>>();
for (Contact c : [SELECT Id, AccountId FROM Contact WHERE AccountId IN :accountIds]) {
    ...
}
```

// add other entries to align with the anti-patterns in Salesforce Well-Architected

For specialty examples, like transaction security policies, the markdown could follow the structure below. It covers one specific Apex feature that breaks all the normal rules, so it needs to teach Claude when the normal rules don’t apply, what to do instead, and why. There’s no BAD/GOOD pair because the “bad” code (e.g., global without sharing) is actually the correct code in this context. Note that the Example section points to a worked example in the assets folder.

# Transaction Security Policy (Apex condition)

## Table of Contents

- [Overview](#overview)
- [Class shape](#class-shape)
- [Sharing](#sharing)
- [Guardrails](#guardrails)
- [Example](#example)

---

## Overview

Enhanced Transaction Security policies call Apex when Condition Builder
is not enough. Implement `TxnSecurity.EventCondition` with a single
`evaluate(SObject event)` method.

## Class shape

- Declare the class **`global`** (required for Setup selection)
- Implement `TxnSecurity.EventCondition`
- Signature: `global Boolean evaluate(SObject event)`

## Sharing

These classes are **not** user-facing controllers. Use `without sharing`.
The normal `with sharing` default does NOT apply here -- applying it
will break the policy silently.

## Guardrails

- One bulk SOQL query max (no SOQL in loops -- governor limits still apply)
- Return `true` to block the transaction, `false` to allow

## Example

See assets/transaction-security-policy.cls

assets/: Few-shot by example

Anthropic highlights that one of the most reliable ways to steer Claude’s output format, tone, and structure is with implicit few-shot examples. Few-shot just means that you provide a few good examples as a template. Each template embeds the non-negotiable requirements: ApexDoc comments, bulk-safe logic, explicit sharing declarations, CRUD/FLS enforcement, and dependency injection for testability. When Claude adapts a template, it inherits these properties by construction rather than needing to recall them from training.

For Apex specifically, the world is your oyster here, but let’s explore some good use cases.

Asset	What the template contains	What goes wrong without it
`service.cls`	Business logic orchestrator. Delegates queries, collects DML, handles errors.	Claude puts query logic, DML, and orchestration in one huge method instead of separating concerns.
`selector.cls`	Centralized SOQL access. One selector per sObject.	Claude scatters ad-hoc queries throughout service and trigger code. Makes SOQL difficult to audit for security/performance in code review.
`batch.cls`	Large-volume async processing (10,000+ records). `Database.Batchable` and `Database.Stateful`.	Claude writes batch jobs that swallow errors silently, or uses non-stateful batches with no way to report failures.
`queueable.cls`	Async processing with object passing and job chaining.	Claude chains without depth limits (infinite recursion), or drops errors because partial DML isn’t used.
`transaction-security-policy.cls`	Enhanced Transaction Security `EventCondition` implementation.	Claude applies normal rules (with sharing, `USER_MODE`) to a class that must break them. Transaction Security Policy (TSP) classes are global without sharing by platform requirement.

Each template isn’t just “how to write X.” It’s a pre-loaded, few-shot example with the non-negotiable requirements baked into the structure.

hooks.yaml and validator scripts: The safety net

A hook is a shell command that Claude Code runs automatically at a specific lifecycle moment. You configure hooks in hooks.yaml. The script is whatever that command executes, typically a Python validator that inspects the tool’s input/output and returns structured feedback.

Hooks close the feedback loop without human intervention. Claude writes something, the hook evaluates it immediately, and Claude sees the result in the same conversation turn. The correction happens in context while Claude still has the full problem loaded.

The hooks file wires the validator to the right moment. For example, a preflight check runs when the user submits a prompt.

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 ${CLAUDE_SKILL_DIR}/hooks/scripts/preflight-apex-check.py",
            "timeout": 10000
          }
        ]
      }
    ]
  }
}

Or a PostToolUse hook fires after every file write or edit.

---
name: authoring-apex
description: ...
hooks:
  PostToolUse:
    - matcher: "Write|Edit"
      hooks:
        - type: command
          command: "python3 ${CLAUDE_SKILL_DIR}/hooks/scripts/preflight-apex-check.py"
          timeout: 90000
---

Claude Code delivers the hook context as JSON on stdin. The payload shape varies by lifecycle event: PostToolUse includes the tool name, input parameters, and output; UserPromptSubmit includes the user’s prompt text.
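To make the PostToolUse side concrete, here is a rough sketch of a validator that blocks when a newly written Apex class lacks an explicit sharing declaration. Treat the details as assumptions to check against the Claude Code hooks documentation: the payload field names (`tool_input`, `file_path`), the convention that exit code 2 feeds stderr back to Claude, and the choice of rule itself. The regex check is deliberately naive (it does not skip comments) and is not taken from a published skill.

```python
import json
import re
import sys

SHARING = re.compile(r"\b(with|without|inherited)\s+sharing\b", re.IGNORECASE)
CLASS_DECL = re.compile(r"\bclass\s+\w+", re.IGNORECASE)

def find_violations(file_path, source):
    """Return human-readable problems for one Apex source file."""
    if not file_path.endswith(".cls"):
        return []  # triggers and non-Apex files carry no sharing keyword
    if re.search(r"@IsTest\b", source, re.IGNORECASE):
        return []  # assumption: test classes are exempt from this rule
    if CLASS_DECL.search(source) and not SHARING.search(source):
        return [f"{file_path}: class has no explicit sharing declaration"]
    return []

def evaluate(payload):
    """Return (exit_code, message) for one PostToolUse payload."""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    try:
        with open(file_path, encoding="utf-8") as handle:
            source = handle.read()
    except OSError:
        return 0, ""  # nothing readable to check, so do not block
    problems = find_violations(file_path, source)
    return (2, "\n".join(problems)) if problems else (0, "")

def main():
    """Read the hook payload from stdin and report problems on stderr."""
    code, message = evaluate(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code

# When saved as a hook script, end the file with: sys.exit(main())
```

Keeping the rule in `find_violations` and the I/O in `evaluate` and `main` means the rule can be unit-tested without simulating a hook invocation.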

Consider a preflight check for your development team. The SKILL.md workflow says “run Code Analyzer” in the validate phase. But what if Code Analyzer isn’t installed or is the wrong version? Without a hook, Claude attempts the command mid-workflow, gets an error, and the developer has to diagnose a missing plugin. With a hook, the check runs the moment Apex work begins.

import json
import re
import subprocess
import sys

MINIMUM_VERSION = "5.5.0"
PLUGIN_NAME = "@salesforce/plugin-code-analyzer"

def parse_version(v):
    """Parse a version string into a comparable tuple, ignoring suffixes such as '-beta'."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    return tuple(parts)

def main():
    # Claude Code passes the hook context as JSON on stdin
    hook_input = json.load(sys.stdin)
    prompt = hook_input.get("prompt", "").lower()

    # Only check when Apex work is likely
    apex_keywords = ["apex", ".cls", "class", "trigger", "service", "selector", "batch"]
    if not any(kw in prompt for kw in apex_keywords):
        sys.exit(0)

    try:
        result = subprocess.run(
            ["sf", "plugins", "--json"],
            capture_output=True, text=True, timeout=15
        )
        if result.returncode == 0:
            plugins = json.loads(result.stdout)
            for plugin in plugins:
                if plugin.get("name") == PLUGIN_NAME:
                    version = plugin.get("version", "0.0.0")
                    if parse_version(version) >= parse_version(MINIMUM_VERSION):
                        sys.exit(0)  # Toolchain is ready, so stay silent
                    print("=== Preflight Check ===")
                    print(f"  [WARNING] Code Analyzer {version} is below minimum {MINIMUM_VERSION}.")
                    print(f"  Install: sf plugins install {PLUGIN_NAME}@latest")
                    sys.exit(0)
    except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
        # sf CLI missing, too slow, or returned unparseable output: fall through to the warning
        pass

    print("=== Preflight Check ===")
    print("  [WARNING] Salesforce Code Analyzer is not installed.")
    print("  The authoring-apex skill requires it for validation.")
    print(f"  Install: sf plugins install {PLUGIN_NAME}")
    sys.exit(0)  # Exit 0 so the warning informs rather than blocks the prompt

if __name__ == "__main__":
    main()

The developer is told up front that their toolchain is incomplete rather than discovering it mid-workflow when the validate phase fails.
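One detail of the preflight script is worth spelling out: versions are compared as tuples of integers, not as strings. The short sketch below shows why, using a simplified helper in the same spirit as the script's `parse_version`; the `meets_minimum` wrapper is a hypothetical name added for illustration.

```python
def parse_version(v):
    """Same idea as the preflight helper: 'x.y.z' -> (x, y, z) as integers."""
    return tuple(int(x) for x in v.split(".")[:3])

def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)

if __name__ == "__main__":
    # Plain string comparison goes character by character, so "1" < "5" decides it.
    print("5.10.0" >= "5.5.0")               # False
    print(meets_minimum("5.10.0", "5.5.0"))  # True
```

With string comparison, a developer on 5.10.0 would be told to upgrade to meet a 5.5.0 minimum, which is the kind of false alarm that teaches people to ignore preflight warnings.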

That’s what separates an instruction from a hook. The SKILL.md says “run Code Analyzer in the validate phase.” The hook proves that it can run before the workflow even starts. Instructions are aspirational; hooks are mechanical. You want both capabilities.

Why agent skills need human and mechanical validation

Skills are not a silver bullet. They dramatically narrow the probability distribution of Claude’s outputs, but they operate on a fundamentally non-deterministic system. Two things work against you:

You can’t fully specify behavior. No matter how precise your SKILL.md, real-world requirements contain novel combinations that aren’t covered by templates or references. Claude must still make judgement calls, and because newer models interpret prompts more literally, slight underspecification produces inconsistent results. If your SKILL.md says “add error handling” without specifying the pattern, you’ll get different approaches each time.

Outputs are probabilistic. Even with structured prompts, role assignment, examples, and validation hooks, you’re optimizing a hit rate. You might move from, say, 60% correct to 95% correct, which is a massive improvement, but the remaining 5% is why validators exist. This is why the architecture includes both preventive measures (structured prompts, references, templates) and detective measures (automated validators, scoring rubrics, blocking on errors). The skill assumes Claude will sometimes get it wrong and builds correction into the workflow rather than pretending perfection is achievable.
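To see why a 95% hit rate still needs validators, it helps to compound it across a change set. The sketch below assumes each generated file is correct independently and with the same probability, which is a simplification. The 60% and 95% figures are the illustrative numbers from the paragraph above, and the 20-file change set is a made-up size.

```python
def all_correct_probability(per_file_rate, files):
    """Chance that every file in a change set is correct, assuming independence."""
    return per_file_rate ** files

if __name__ == "__main__":
    for rate in (0.60, 0.95):
        chance = all_correct_probability(rate, 20)
        print(f"{rate:.0%} per file over 20 files: {chance:.1%} chance all are correct")
```

Under these assumptions, even the 95% case leaves roughly a one-in-three chance (about 36%) that a 20-file change is entirely clean, which is why the detective measures run on every write instead of being sampled.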

Getting started

Anthropic’s golden rule for prompts applies directly: “Show your prompt to a colleague with minimal context and ask them to follow it. If they’d be confused, Claude would be too.” A well-built agent skill is unambiguous to both a human reader and the model.

You don’t need to build a skill from scratch. Anthropic provides a skill-creator skill that walks you through the full process: capturing intent, writing the SKILL.md, creating test cases, running evals, and iterating until the output meets your bar. Install it and tell Claude what you want the skill to do; it handles the scaffolding, interviews you on edge cases, and generates a working draft you can refine.

If you’d prefer to work from an existing Salesforce-specific example, the Agentforce Vibes skill library includes production-ready skills for Apex, LWC, Flow, and more. Install them, use them, and look at how they’re structured; they follow patterns similar to those described in this post. The Agentforce Vibes skill library puts rules and rationale inline in the SKILL.md and uses complete Apex source files as templates rather than Markdown reference docs. This keeps the skill self-contained in fewer files, though it trades modularity: updating one rule means editing the main workflow file.

The skill grows with your needs. Start with the skill-creator or an existing skill, customize what doesn’t fit your project, and build new skills only when you have a gap not covered by existing ones.

Resources

GitHub: Salesforce’s collection of agent skills
Trailhead: Prompt Fundamentals
Trailhead: Prompt Engineering Techniques
GitHub: Anthropic’s Skill Creator
Documentation: Claude Code

About the author

Dave Norris is a Developer Advocate at Salesforce. He’s passionate about making technical subjects broadly accessible to a diverse audience. Dave has been with Salesforce for over a decade, has over 40 Salesforce and MuleSoft certifications, and became a Salesforce Certified Technical Architect in 2013.