Create a Box Unstructured Data Lake Object (UDLO)

Create an unstructured data stream and unstructured data lake object (UDLO) in Data 360 to ingest your organization’s content from Box Drive into Data 360. See Unstructured Data File Formats and Connectors for a list of supported content.

User Permissions Needed
To connect unstructured data from an external blob store: Data Cloud Architect

Before you begin:

  • Make sure that you’ve set up a Box connection, and you know the name of the Box connection.
  • Verify you have a list of Folder IDs for all the folders you want to ingest.
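
If your folder IDs come from a spreadsheet or script output, a small helper can normalize them into the comma-separated form that the Box Folders to Include field accepts. This is only an illustrative sketch: the function name build_folder_id_list is hypothetical, it doesn't call Box or Data Cloud, and it assumes that Box folder IDs are numeric strings, as they appear in a folder's URL in the Box web app.

```python
def build_folder_id_list(raw_ids):
    """Return a comma-separated string of unique, numeric folder IDs.

    Hypothetical helper: it only prepares text to paste into the
    Box Folders to Include field.
    """
    seen = set()
    cleaned = []
    for raw in raw_ids:
        folder_id = str(raw).strip()
        if not folder_id:
            continue  # skip blank entries
        if not folder_id.isdigit():
            # Assumption: Box folder IDs consist of digits only.
            raise ValueError(f"Not a numeric folder ID: {folder_id!r}")
        if folder_id not in seen:
            seen.add(folder_id)
            cleaned.append(folder_id)
    return ",".join(cleaned)


if __name__ == "__main__":
    # Placeholder IDs for illustration; replace them with your own.
    print(build_folder_id_list(["1234567890", " 2345678901 ", "1234567890"]))
```

The helper removes duplicates and stray whitespace and keeps the original order, so the result can be pasted directly into the field.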

  1. From App Launcher, select Data Cloud.

  2. Click Data Lake Objects and then click New.

  3. Select the From External Files tile, and click Next.

  4. From the New Data Lake Object screen, choose the Box connector tile and click Next.

  5. From the Connection Details dropdown, choose the Box connection you previously created. Data Cloud auto-populates the source based on the connection that you select.

  6. Configure what content is ingested from the Box account you connected to Data Cloud. In Box Folders to Include, add a comma-separated list of the IDs of the folders that contain the content you want to ingest. If left empty, all content in all folders is ingested.

  7. Optionally, apply filters to limit ingestion to the content you want. Several filters are available. When you use more than one filter (even more than one filter of the same type), each filter further includes or removes content from the set of files that are ingested. If you apply filters and no content is ingested, your filters may be too restrictive. If you see content you didn’t intend to ingest, your filters may be too broad.

    Apply any or all of these filters:

    • File Types: Provide a comma-separated list of file extensions. Any file that matches the listed file extensions is included. If left empty, all file types are ingested. All file extensions are supported.
    • Included Labels: Provide a comma-separated list of labels. If one label is provided, any content tagged with that label is ingested. If multiple labels are listed, only content tagged with all of the labels is ingested. This field is case-sensitive. A misspelled label is ignored.
    • Excluded Labels: Provide a comma-separated list of labels. If one label is provided, any content tagged with that label isn't ingested. If multiple labels are listed, only content tagged with all of the labels is excluded. This field is case-sensitive. A misspelled label is ignored.
    • Creation Date: Select a date from the calendar widget. Any content created on or after the selected date is ingested. You can select only one date.
    • Last Update Date: Select a date from the calendar widget. Any content updated on or after the selected date is ingested. You can select only one date.
  8. Click Next. The connector runs by default every two hours. You can monitor sync status in Data Stream status.

  9. Add an Object Name and an Object API Name for the UDLO. See Data Lake Object Naming Standards. Make sure that the object API name is unique. The API name field autopopulates based on the object name.

  10. In the Unstructured Data Model Object Mapping section, select New.

  11. From the Data Space dropdown, leave the selection as Default.

  12. For the UDMO mapping, enter an Object Name and an Object API Name. See Data Lake Object Naming Standards. Make sure that the object API name is unique. The API name field autopopulates based on the object name.

  13. Optionally, select the Enable Unstructured Content Harmonization with system defaults checkbox to turn on content harmonization for the UDMO. You can leave content harmonization turned off for now and turn it on later.
    If you use this feature, go to the Feature Manager and turn on both content harmonization and rendering.

    When you turn on content harmonization, you turn on collection of content viewer engagement data.

  14. Select Next.

  15. In the Search Index Configuration section, leave the Enable Semantic Search with System Defaults checkbox selected. The system default settings automatically select text fields and apply a chunking strategy for each field. To create a search index configuration later instead, deselect the checkbox.

  16. Leave the remaining fields as-is to use the default settings, or edit the Search Configuration details and objects as needed.

  17. Save your work. The data stream ingests data from your Box drive into an unstructured data lake object (UDLO) and maps it to an unstructured data model object (UDMO). From this UDMO, a search index is created that can now be used to ground AI-generated responses.
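
The filter rules in step 7 can be easier to reason about as code. The following sketch models them as described on this page. It isn't the connector's implementation: the BoxFile and IngestFilters types and the should_ingest function are hypothetical names used only for illustration. It also assumes that a file must pass every configured filter to be ingested and that file extensions are matched without regard to case, neither of which this page states.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class BoxFile:
    """Hypothetical stand-in for a file's metadata in Box."""
    name: str
    labels: Set[str]
    created: date
    updated: date


@dataclass
class IngestFilters:
    """Hypothetical model of the optional filters in step 7."""
    file_types: Set[str] = field(default_factory=set)
    included_labels: Set[str] = field(default_factory=set)
    excluded_labels: Set[str] = field(default_factory=set)
    created_on_or_after: Optional[date] = None
    updated_on_or_after: Optional[date] = None


def should_ingest(f: BoxFile, flt: IngestFilters) -> bool:
    # File Types: an empty list means every file type is ingested.
    if flt.file_types:
        ext = f.name.rsplit(".", 1)[-1].lower() if "." in f.name else ""
        # Assumption: extension matching ignores case and a leading dot.
        if ext not in {t.lower().lstrip(".") for t in flt.file_types}:
            return False
    # Included Labels: the file must carry every listed label (case-sensitive).
    if flt.included_labels and not flt.included_labels <= f.labels:
        return False
    # Excluded Labels: the file is skipped only if it carries every listed label.
    if flt.excluded_labels and flt.excluded_labels <= f.labels:
        return False
    # Creation Date and Last Update Date: on or after the selected date.
    if flt.created_on_or_after and f.created < flt.created_on_or_after:
        return False
    if flt.updated_on_or_after and f.updated < flt.updated_on_or_after:
        return False
    return True


if __name__ == "__main__":
    filters = IngestFilters(
        file_types={"pdf"},
        included_labels={"Finance"},
        excluded_labels={"Draft", "Internal"},
        created_on_or_after=date(2024, 1, 1),
    )
    sample = BoxFile("report.pdf", {"Finance", "Draft"}, date(2024, 1, 10), date(2024, 3, 1))
    print(should_ingest(sample, filters))
```

In this model, the sample file carries only one of the two excluded labels, so the Excluded Labels filter doesn't remove it. That mirrors the rule that, with multiple labels listed, only content tagged with all of them is excluded.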