Create a Box Unstructured Data Stream (Beta)

Create an unstructured data stream and unstructured data lake object (UDLO) in Data Cloud to ingest your organization’s content from Box Drive into Data Cloud. See the Unstructured Data Reference for a list of supported file formats.

User Permissions Needed
To connect unstructured data from an external blob store: Data Cloud Architect

Before you begin:

  • Make sure that you’ve set up a Box connection, and you know the name of the Box connection.
  • Verify you have a list of Folder IDs for all the folders you want to ingest.
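If you collect folder IDs in a spreadsheet or script, a small helper can tidy them into a comma-separated value before you paste it into Data Cloud. The following Python sketch is illustrative only and isn't part of Data Cloud. The function name `to_folder_id_list` is hypothetical, and the check assumes that Box folder IDs are numeric strings, as they appear in Box folder URLs.

```python
def to_folder_id_list(raw_ids):
    """Return a comma-separated string of unique folder IDs, in input order."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        folder_id = str(raw).strip()
        if not folder_id:
            continue  # skip blank entries
        if not folder_id.isdigit():
            # Assumption: Box folder IDs are numeric strings.
            raise ValueError(f"Not a numeric folder ID: {folder_id!r}")
        if folder_id not in seen:
            seen.add(folder_id)
            cleaned.append(folder_id)
    return ",".join(cleaned)


# Duplicates, stray whitespace, and blank entries are removed.
print(to_folder_id_list(["12345", " 67890 ", "12345", ""]))  # prints 12345,67890
```
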
  1. From App Launcher, select Data Cloud.

  2. Click Data Lake Objects and then click New.

  3. Select the From External Files tile, and click Next.

  4. From the New Data Lake Object screen, choose the Box connector tile and click Next.

  5. From the Connection Details dropdown, choose the Box connection you previously created. Data Cloud auto-populates the source based on the connection that you select.

  6. Configure what content is ingested from the Box account you connected to Data Cloud. In Box Folders to Include, add a comma-separated list of the IDs of the folders that contain the content you want to ingest. If you leave this field empty, all content in all folders is ingested.

  7. Optionally, apply filters to limit ingestion to the content you want. Several filters are available. When you use more than one filter, even more than one filter of the same type, each filter that you apply adds content to or removes content from the set of files that are ingested. If no content is ingested after you apply filters, your filters may be too restrictive. If you see content that you didn’t intend to ingest, your filters may be too broad.

    Apply any or all of the following filters:

    • File Types: Provide a comma-separated list of file extensions. For example, .pdf, .docx. Any file that matches the listed file extensions is included. If left empty, all file types are ingested.
    • Included Labels: Provide a comma-separated list of labels. Any content tagged with the provided label is ingested. If multiple labels are listed, only content tagged with all of the listed labels is ingested. This field is case-sensitive. If you misspell a label, it is ignored.
    • Excluded Labels: Provide a comma-separated list of labels. Any content tagged with the provided label isn't ingested. If multiple labels are listed, only content tagged with all of the listed labels is excluded. This field is case-sensitive. If you misspell a label, it is ignored.
    • Creation Date: Select a date from the calendar widget. Any content created on or after the provided date is ingested. Only one date can be used.
    • Last Update Date: Select a date from the calendar widget. Any content updated on or after the provided date is ingested. Only one date can be used.
  8. Click Next. The connector runs by default every two hours. You can monitor sync status in Data Stream status.

  9. To set up your unstructured data lake object (UDLO) and its associated data model object, add an Object Name and an Object API Name for the UDLO. See Data Lake Object Naming Standards.

  10. Map the UDLO to a UDMO.

    1. To create a new UDMO, click New. Then, from the Data Space dropdown list, select the data space in which to create it. Add an Object Name and an Object API Name for the UDMO. See Data Lake Object Naming Standards.
    2. To use an existing UDMO, click Existing, and then select a data space and the UDMO from the list.
  11. Optionally, leave the checkbox selected to create a search index configuration for the UDMO using system defaults, which automatically select text fields and a chunking strategy for each field. If you don’t want to create it now, deselect the checkbox and create a search index configuration later.

  12. Click Next or, if you created a search index configuration, review the details and save your work. The data stream ingests data from Box Drive into an unstructured data lake object (UDLO) and maps it to an unstructured data model object (UDMO). From this UDMO, a search index is created that you can use to ground AI-generated responses.
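
To reason about which files a given combination of filters from step 7 would ingest, it can help to model the documented rules in a few lines of code. The Python sketch below is illustrative only and isn't how the connector is implemented. The names `BoxFile`, `Filters`, and `is_ingested` are hypothetical. It assumes that a file must pass every configured filter, that file extensions match case-insensitively, and that labels match exactly. It doesn't model how a misspelled label is handled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class BoxFile:
    name: str
    labels: set
    created: date
    updated: date


@dataclass
class Filters:
    file_types: list = field(default_factory=list)       # for example [".pdf", ".docx"]
    included_labels: list = field(default_factory=list)
    excluded_labels: list = field(default_factory=list)
    created_on_or_after: Optional[date] = None
    updated_on_or_after: Optional[date] = None


def is_ingested(f, flt):
    """Return True if the file passes every configured filter (assumed AND)."""
    # File Types: empty means all types; otherwise any listed extension matches.
    if flt.file_types:
        name = f.name.lower()  # assumption: extensions compare case-insensitively
        if not any(name.endswith(ext.lower()) for ext in flt.file_types):
            return False
    # Included Labels: the file must carry all of the listed labels.
    if flt.included_labels and not set(flt.included_labels) <= f.labels:
        return False
    # Excluded Labels: the file is excluded only if it carries all listed labels.
    if flt.excluded_labels and set(flt.excluded_labels) <= f.labels:
        return False
    # Creation Date and Last Update Date: on or after the chosen date.
    if flt.created_on_or_after and f.created < flt.created_on_or_after:
        return False
    if flt.updated_on_or_after and f.updated < flt.updated_on_or_after:
        return False
    return True


report = BoxFile("Report.pdf", {"Legal", "Final"}, date(2024, 3, 1), date(2024, 6, 1))

print(is_ingested(report, Filters()))                                    # True
print(is_ingested(report, Filters(file_types=[".pdf", ".docx"])))        # True
print(is_ingested(report, Filters(included_labels=["Legal", "Draft"])))  # False
print(is_ingested(report, Filters(excluded_labels=["Legal", "Final"])))  # False
```

With no filters set, every file passes, which mirrors the documented behavior of empty fields. Adding a second included label can only shrink the result, because a file needs all of the listed labels.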