Step 3: Create Search Index and Retriever

After confirming that Data 360 ingests your website content, set up Agentforce to search that content at run time so that it can give your users relevant, reliable answers. Create a customized search index with parsing and chunking tuned to your content formats and page structure. Also create a retriever that Agentforce uses to pull relevant passages from the index and pass them to the model.

The search index helps Data 360 find relevant passages in your content. Use the next procedure as a pattern, but apply parsing, file-type, and chunking choices that match the content types and page shapes of your site. Consider your site pages’ typical length and complexity (plain text versus tables, diagrams, or dense layouts). For example, long pages or pages with heavy structure often need a different chunking strategy than short, sectioned help-center HTML.

  1. In the Data 360 app, select the Search Index tab.
  2. To create a search index, click New, and then select Advanced Setup.
  3. For Search Type, select Hybrid Search.
  4. Under Select Source Object, select the relevant data space and your connector name, and click Next to proceed to Parsing.
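Hybrid search typically combines keyword matching with vector similarity. As a rough illustration only, this Python sketch merges a keyword ranking and a vector ranking with reciprocal rank fusion, one common fusion technique. It doesn't describe how Data 360 ranks results internally. The function name, the page IDs, and the constant k=60 are assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword search and a vector search.
keyword_ranking = ["page-a", "page-b", "page-c"]
vector_ranking = ["page-b", "page-c", "page-a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

In this example, page-b ranks first after fusion because it places near the top of both lists.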

Make sure that your data is broken down correctly so the LLM can process it effectively.

Data 360 splits your data into chunks, converts them into vector embeddings, and stores them in a Vector Data Model Object (VDMO). This process lets the retriever search efficiently, which improves answer quality. Because each chunk must fit within the embedding model's input limit, select an embedding model that supports your chunk size.
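To make the chunk, embed, and search flow concrete, here is a minimal Python sketch. It isn't how Data 360 works internally: the embed function is a toy word-count stand-in for a real embedding model, the in-memory list stands in for the VDMO, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Store each chunk next to its vector (the role the VDMO plays)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index, query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "reset your password from the login page",
    "export reports as csv files",
]
index = build_index(chunks)
top = search(index, "how do I reset my password")
print(top)
```

The query shares the words "reset" and "password" with the first chunk and nothing with the second, so the first chunk is returned.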

The following section-aware chunking strategy helps preserve the structure of the source page. The 1,200-token limit gives the chunks enough context without making them unnecessarily large, and the embedding model supports this chunk size. Setting overlap tokens to 0 keeps the chunk set cleaner when site pages are well structured. Prepending the title gives each chunk more context, which can improve retrieval quality.

  1. On Parsing, select the default option, Default Parser.
  2. Select the default option, No Pre-Processing.
  3. On Select Files to Chunk, remove unnecessary file extensions to keep only the ones you want to ingest.
  4. Per file extension, configure chunking:
    • Chunking Strategy: Section Aware Chunking
    • Max Tokens: 1,200
    • Overlap tokens: 0
    • Prepend fields to each chunk: Select Title
  5. Select the embedding model (Vectorization Strategy): Salesforce Embedding V2 Small (also referred to as SFR-v2-small). This embedding model supports a 1,200-token chunk size. If you choose a smaller chunk size, such as 512 tokens, you can use a model with a shorter supported sequence length.
  6. Click Next, and then save your changes.
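The chunking settings above can be pictured with a short Python sketch. It is an approximation under stated assumptions, not Data 360's chunker: sections arrive as hypothetical (heading, body) pairs, tokens are approximated by whitespace-separated words (a real tokenizer counts differently), and the prepended title isn't counted against the limit.

```python
def chunk_page(title, sections, max_tokens=1200):
    """Split a page into chunks without crossing section boundaries.

    Each section is chunked on its own, so a chunk never mixes text
    from two sections. The page title is prepended to every chunk.
    """
    chunks = []
    for _heading, body in sections:
        words = body.split()  # crude token approximation (assumption)
        # The step equals the window size, so consecutive chunks share
        # no tokens. This corresponds to an overlap of 0.
        for start in range(0, len(words), max_tokens):
            piece = " ".join(words[start:start + max_tokens])
            chunks.append(f"{title}\n{piece}")
    return chunks


# A tiny limit makes the splitting visible in this example.
page_chunks = chunk_page(
    "Help Center",
    [("Reset password", "a b c d e f"), ("Export", "x y")],
    max_tokens=4,
)
for chunk in page_chunks:
    print(repr(chunk))
```

The six-word section splits into two chunks and the two-word section stays whole, so the page yields three chunks, each starting with the title.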

After the index build finishes successfully, confirm that chunk records were created and that they are consistent with the source content you ingested. This validation confirms that the expected chunk data exists and is ready for retrieval.

  1. In Data 360, go to Data Explorer.
  2. Select a Data Space.
  3. For Object, select Data Lake Object.
  4. Locate the chunks object associated with your index.
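If you export the chunk records for a closer look, a small script can flag pages that produced no usable chunks. This Python sketch assumes a hypothetical export in which each row has source_url and chunk_text keys. The actual field names in your chunks object may differ.

```python
from collections import Counter


def pages_without_chunks(ingested_urls, chunk_rows):
    """Return the ingested URLs that have no non-empty chunk record."""
    counts = Counter(
        row["source_url"] for row in chunk_rows if row["chunk_text"].strip()
    )
    return [url for url in ingested_urls if counts[url] == 0]


# Hypothetical data: page b produced only a whitespace chunk.
ingested_urls = ["https://example.com/a", "https://example.com/b"]
chunk_rows = [
    {"source_url": "https://example.com/a", "chunk_text": "Page A\nFirst section text"},
    {"source_url": "https://example.com/b", "chunk_text": "   "},
]

missing = pages_without_chunks(ingested_urls, chunk_rows)
print(missing)
```

An empty result suggests that every ingested page is represented in the index. Any URL in the list is worth checking against the parsing and file-type settings.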

The retriever queries your indexed content and returns the relevant chunks to the LLM at run time. In this scenario, create an individual retriever and configure it to return the fields that are most useful for grounding and citations.

  1. From the Data 360 app, use the quick search bar to find and select Agentforce Studio.
  2. Go to the Agents tab.
  3. From the Build menu on the left, select Data, and then select Retrievers.
  4. Click New Retriever.
  5. For retriever type, select Individual Retriever and click Next.
  6. Select Data Cloud as the data source for the new retriever.
  7. Define Retrieval Details:
    • Data Space: Select the relevant data space.
    • Source Object: Select your connector.
    • Search Index: Select the search index you created and click Next.
  8. Define filters: Select All Documents and click Next.
  9. Under Fields to Return, select:
    • URL
    • Title
    • Chunks
  10. To let the agent show its sources, enable Citation Settings, and then click Next.
  11. Save the retriever, and then click Activate.
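To see why the returned fields matter, this Python sketch shows one way retrieved results could be turned into numbered context passages and a source list. It is an illustration, not Agentforce's actual prompt assembly: the function name, the prompt wording, and the URL, Title, and Chunk dictionary keys are assumptions that mirror the fields selected above.

```python
def build_grounded_prompt(question, results):
    """Format retrieved chunks as numbered passages with matching sources."""
    passages = []
    sources = []
    for number, result in enumerate(results, start=1):
        # The chunk text grounds the answer; the title adds context.
        passages.append(f"[{number}] {result['Title']}: {result['Chunk']}")
        # The URL and title let the agent cite where the passage came from.
        sources.append(f"[{number}] {result['Title']} ({result['URL']})")
    prompt = (
        "Answer using only the numbered passages and cite them by number.\n\n"
        + "\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    return prompt, sources


# Hypothetical retriever output for one matching chunk.
results = [
    {
        "URL": "https://example.com/reset",
        "Title": "Reset Password",
        "Chunk": "Click Forgot Password on the login page.",
    }
]
prompt, sources = build_grounded_prompt("How do I reset my password?", results)
print(prompt)
print(sources)
```

Without the URL and Title fields, the passages could still ground an answer, but the agent would have nothing to cite.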

By the end of this step, make sure that you have:

  • A search index configured for the ingested HTML content
  • Verified chunk records created from that content
  • An active retriever connected to the index
  • Citation-ready fields available for the agent