Unlock Multi-Modal AI with File Inputs in Prompt Builder

Ever wished your AI could “see” the images you’re working with or instantly “read” the PDFs you need to analyze? Large language models (LLMs) are already masters at understanding user intent, dissecting text, and crafting natural language responses. When integrated with Agentforce platform tools and safeguards, they become powerful AI agents ready to tackle customer requests. But traditionally, their world has been text-only. That changes now: as of the Spring ’25 release, you can enable your AI workflows and agents to accept and process files to generate even smarter responses, plans, and actions.

You can now enrich your prompt templates in Prompt Builder with files such as images and PDFs when you use select large language models. These models can interpret visual information from images or extract text and structure from PDFs, all alongside your text prompt and record merge fields. Multi-modal prompting helps AI craft responses that are not just accurate, but also deeply context-aware. Prompt Builder supports file types including PNG, JPEG, and PDF for models such as OpenAI GPT-4 Omni Mini and Vertex AI Gemini 2.0 Flash 001.

Note: It’s always a good practice to check the specific model’s limitations directly within Prompt Builder to confirm all supported file types and any constraints.

There are two primary ways to add files like images and PDFs as context:

File object (Flex prompt template only): Pass files directly as File (ContentDocument) records. This method is ideal when you’re invoking prompts programmatically (from Apex, Lightning web components, REST services, or Agentforce actions) and your target image or PDF is a File object.
Notes & Attachments related list: Add the file to a record’s Notes & Attachments. This approach is well-suited for simpler integrations, such as using images to power Field Generation prompt templates where the file is directly associated with the parent record.

Let’s explore three use cases illustrating how file context enhances Prompt Builder and Agentforce: writing better product descriptions using images, providing improved troubleshooting guidance with visual context, and comparing a contract record to a contract PDF.

Writing better product descriptions with image analysis

Compelling product descriptions are vital for e-commerce. They don’t just showcase your product’s value; they also improve search engine visibility. By combining a product image with textual details, you can generate accurate, enticing product specifications at scale that go far beyond simple categorization. Images allow LLMs to describe not just what a product is, but also what it looks like. Visual information helps with many kinds of products, but it’s especially useful for those that come in a range of styles and materials, such as clothing.

With the Field Generation prompt template, you can automate product description generation using both specifications and images in one step. First, create a Field Generation prompt template for the Product object, selecting the description field as the output target.

Salesforce Prompt Builder interface showing a Field Generation template setup.

In the prompt template’s configuration window, select an LLM that supports image inputs. You can verify the supported input types for the selected model under the Model Limitations section in the configuration panel. Note the limitations, such as maximum file size (e.g., 10 MB) and image count (e.g., 10 images).
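If files are attached by an automated process, it can help to check them against the model’s limits before the prompt runs. The Python sketch below shows one way to do that outside Salesforce; it is not part of Prompt Builder. The names `ModelLimits` and `check_files` are hypothetical, the 10 MB and 10-image values are only the example figures mentioned above (with 1 MB assumed to mean 1,048,576 bytes), and the extension list is an assumption based on the PNG, JPEG, and PDF types named earlier. Take the real values from the Model Limitations section for your chosen model.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Assumed extension lists, based on the file types named in this post.
IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg"})
DOCUMENT_EXTENSIONS = frozenset({".pdf"})


@dataclass(frozen=True)
class ModelLimits:
    """Hypothetical container for the limits shown in Prompt Builder."""
    max_file_bytes: int = 10 * 1024 * 1024  # example "10 MB", assumed binary MB
    max_images: int = 10                    # example "10 images"


def check_files(files, limits=ModelLimits()):
    """Return a list of problems; an empty list means every check passed.

    `files` is an iterable of (file_name, size_in_bytes) pairs.
    """
    problems = []
    image_count = 0
    for name, size in files:
        ext = PurePath(name).suffix.lower()
        if ext not in IMAGE_EXTENSIONS | DOCUMENT_EXTENSIONS:
            problems.append(f"{name}: unsupported file type {ext or '(none)'}")
            continue
        if size > limits.max_file_bytes:
            problems.append(f"{name}: {size} bytes exceeds {limits.max_file_bytes}")
        if ext in IMAGE_EXTENSIONS:
            image_count += 1
    if image_count > limits.max_images:
        problems.append(f"{image_count} images exceeds limit of {limits.max_images}")
    return problems


if __name__ == "__main__":
    print(check_files([("sweater.png", 2_000_000), ("notes.docx", 1_000)]))
```

Returning a list of problems instead of raising on the first one lets a calling process report every issue with a batch of files at once.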

Next, craft a prompt template designed to generate a product description. The prompt template would use the product record’s name, color, gender, and material fields as merge fields to ground the model with core product information.
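To make the idea of grounding concrete, here is a rough Python illustration of how record fields end up inside the prompt text. It is only an analogy: the `{name}`-style placeholders are ordinary Python format fields, not Prompt Builder’s merge-field syntax, and the prompt wording, the `render_prompt` helper, and the sample record values are all made up for this sketch.

```python
# Fields the prompt relies on; mirrors the name, color, gender, and
# material fields mentioned above.
REQUIRED_FIELDS = ("name", "color", "gender", "material")

# Hypothetical prompt wording. Python placeholders stand in for merge fields.
PROMPT_TEMPLATE = (
    "You are writing copy for an online store.\n"
    "Write a product description using these details:\n"
    "- Name: {name}\n"
    "- Color: {color}\n"
    "- Gender: {gender}\n"
    "- Material: {material}\n"
    "Also describe the visual style shown in the attached product image."
)


def render_prompt(record):
    """Fill the template from a record dict, failing loudly on empty fields."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return PROMPT_TEMPLATE.format(**{f: record[f] for f in REQUIRED_FIELDS})


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "name": "Nordic Pullover Sweater",
        "color": "Gray",
        "gender": "Men",
        "material": "Wool",
    }
    print(render_prompt(sample))
```

The check for empty fields reflects a practical point: a prompt grounded on blank values gives the model less to work with, so it is worth deciding what should happen when a record is incomplete.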

Finally, add the object’s Notes & Attachments related list field to the prompt template. Any supported file attached via this related list will automatically be included as context when the prompt template runs. You can preview and test the template with a specific product record to see the LLM’s generated description. The screenshot below shows the text results for a men’s Nordic pullover sweater. Inputs include product details (name, color, etc.) and an image file. The generated output description incorporates the sweater’s visual style (e.g., cable knit).

Prompt Builder test results for a men's Nordic pullover sweater.

While the record fields provide the LLM with fundamental information (name, color, material, gender), the image conveys the sweater’s specific visual style, such as its cable knit design.

Now you can create compelling product descriptions at scale by invoking your Field Generation prompt template from quick actions, autolaunched flows, or a Lightning record page.

Providing improved troubleshooting guidance with visual context

A picture is often worth a thousand words, especially in technical troubleshooting. Using Prompt Builder’s Flex template, you can build automations and Agentforce actions that leverage visual context from images when handling technical support questions.

First, create a Flex prompt template accepting a File object (for the image) and a Free Text field (for the user’s question) as inputs.

Salesforce Prompt Builder interface showing a Flex template setup for troubleshooting.

In the prompt template, select a model supporting image inputs. Write a prompt that incorporates both the free text input and the File object. To test, provide both a question and an image in the test inputs section. You can use existing Salesforce image files or upload new ones. In the example below, the free text input is “What’s wrong with my computer?” and the test image is a screenshot of a stop error (the “Blue Screen of Death”).

Prompt Builder test for a troubleshooting prompt template.

Here, the image provides almost all the necessary context, allowing the AI agent to parse the screen text, identify the stop error, and recommend next steps. If the agent also had access to internal troubleshooting guides, it could use the image context to look up relevant knowledge articles via the Answer with Knowledge action. Providing an Agentforce agent with visual context significantly improves its ability to understand issues that customers might struggle to describe accurately in words alone.
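Because a Flex template can be invoked programmatically, a request for this troubleshooting template might be assembled along the following lines. This Python sketch only builds the request path and JSON body; it does not call Salesforce. Treat every specific in it as an assumption to verify against the current Connect REST API reference: the API version, the `einstein/prompt-templates/.../generations` path, the `inputParams.valueMap` body shape, the `applicationName` value, the input names `Input:image` and `Input:question`, and the idea that a File input is passed by record ID. The template name and the ID in the usage example are placeholders.

```python
import json
from urllib.parse import quote

API_VERSION = "v63.0"  # assumption; use the version your org supports


def build_generation_request(template_api_name, file_id, question):
    """Build the (path, json_body) pair for a prompt template invocation.

    The path and body shape are assumptions modeled on the Connect REST API
    prompt template generations resource; verify before relying on them.
    """
    path = (
        f"/services/data/{API_VERSION}/einstein/prompt-templates/"
        f"{quote(template_api_name, safe='')}/generations"
    )
    body = {
        "isPreview": False,
        "inputParams": {
            "valueMap": {
                # Hypothetical input names; they must match the inputs
                # defined on your own Flex template.
                "Input:image": {"value": {"id": file_id}},
                "Input:question": {"value": question},
            }
        },
        "additionalConfig": {
            "applicationName": "PromptTemplateGenerationsInvocable"
        },
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    path, body = build_generation_request(
        "Troubleshoot_With_Image",       # placeholder template API name
        "069000000000001AAA",            # placeholder File record ID
        "What's wrong with my computer?",
    )
    print(path)
    print(body)
```

Keeping the request construction separate from the HTTP call makes it easy to inspect the payload before sending it with whatever authenticated client you already use.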

Efficiently parse and analyze PDFs with File Inputs

PDFs are the backbone of business communication — contracts, presentations, research papers, and more. With Prompt Builder’s File Inputs feature, you can now create prompt templates that power automations to parse, categorize, and summarize these documents with remarkable efficiency.

The steps for building a prompt template that accepts a PDF are no different from those for building one that accepts images. For example, you could build a Flex template designed to analyze a Contract record and compare its fields to the actual contract document (saved to the Notes & Attachments related list) to identify discrepancies and inconsistencies automatically. The screenshot below shows a prompt template instructing the Gemini 2.0 Flash model to identify inconsistencies between a record snapshot and its attached contract (from the Notes & Attachments related list). The output reveals two such discrepancies: the effective date and start date in the Contract record do not match the information in the contract document.

Prompt Builder test for a contract processing template.
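Conceptually, the check this template delegates to the model is a field-by-field comparison between the record and the document. The Python sketch below illustrates that idea only; in Prompt Builder the model reads the PDF and performs the comparison itself. The `find_discrepancies` function, the field names, and the dates are all invented for illustration and are not taken from the screenshot.

```python
def find_discrepancies(record_fields, document_fields):
    """Compare record values with values found in the document.

    Both arguments are dicts keyed by field name. Fields that the document
    does not mention are skipped instead of being reported as mismatches.
    """
    discrepancies = []
    for field, record_value in record_fields.items():
        if field not in document_fields:
            continue
        document_value = document_fields[field]
        if record_value != document_value:
            discrepancies.append(
                {"field": field, "record": record_value, "document": document_value}
            )
    return discrepancies


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    record = {
        "effective_date": "2025-01-01",
        "start_date": "2025-01-15",
        "term_months": 12,
    }
    document = {
        "effective_date": "2025-02-01",
        "start_date": "2025-02-15",
        "term_months": 12,
    }
    for item in find_discrepancies(record, document):
        print(f"{item['field']}: record={item['record']} document={item['document']}")
```

The value of the file input is that the second dictionary never has to be built by hand: the model extracts those values from the PDF, which is the step that is hard to automate with conventional code.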

Processing PDFs is a powerful capability that allows you to augment your automations and Agentforce agents in new and creative ways. You can build flows that parse contract PDFs to populate records, develop research agents that summarize and analyze extensive research papers and white papers, or even create Agentforce actions that let users effectively “chat” with their documents. Imagine asking specific questions directly to an uploaded contract or a lengthy report and getting immediate answers.

Conclusion: Unlock new AI possibilities

Prompt Builder’s File Inputs is more than just a new feature; it’s a gateway to empowering your automations and Agentforce agents by allowing them to process rich context from various file types, including images and PDFs. This capability unlocks a multitude of possibilities: from troubleshooting with screenshots and generating richer product descriptions using visual details, to analyzing and summarizing PDF documents and verifying image-description alignment.

We encourage you to explore this new functionality. Think about the visual or document-based information that could supercharge your existing workflows or inspire entirely new AI-driven solutions on the Salesforce Platform. Dive into the resources below to get started.

Resources

About the Author

Charles Watkins is a Lead Developer Advocate at Salesforce. You can follow him on LinkedIn and GitHub.