Create a GitHub Unstructured Data Lake Object (UDLO)

Create an Unstructured Data Lake Object (UDLO) to ingest your organization’s content from GitHub into Data 360. For a list of supported file formats, see Unstructured Data File Formats and Connectors.

User Permissions Needed
To create a connection:	System Admin profile or Data Cloud Architect permission set

Before you begin:

Make sure that you’ve set up a GitHub connection and know the name of the connection.
Verify you have a list of Repository IDs.
Verify that you have a list of labels you want to filter with.

From the App Launcher, select Data Cloud.
Select Data Lake Objects, and then select New.
Select the From External Files tile, and then select Next.
From the New Data Lake Object screen, choose the GitHub connector tile and click Next.
From the Connection Details dropdown, choose the GitHub connection you created previously. Data Cloud automatically populates the source based on the connection that you select.
From the Core Content section, in the Repository ID field, provide the comma-separated list of Repo IDs you collected earlier. A Repo ID combines the GitHub user or organization name with the repository name. For example, if your repository URL is https://github.com/GHUser/RepoID, enter GHUser/RepoID. This field is case-sensitive, and entries that contain mistakes are ignored.
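The Repo ID format described in this step can be sketched as a small helper that turns repository URLs into the comma-separated value for the Repository ID field. This is a minimal sketch, not part of the connector: the function names `repo_id_from_url` and `build_repository_id_field` are hypothetical, the second URL is an invented example, and joining entries without spaces is an assumption.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Return 'owner/repo' for a https://github.com/owner/repo URL."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"Not a github.com URL: {url!r}")
    # Drop empty segments produced by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL has no owner/repository path: {url!r}")
    owner, repo = parts[0], parts[1]
    # Clone URLs often end in '.git'; that suffix is not part of the repo name.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # Preserve the original casing of both parts.
    return f"{owner}/{repo}"


def build_repository_id_field(urls: list[str]) -> str:
    """Join Repo IDs with commas (assumed separator: no surrounding spaces)."""
    return ",".join(repo_id_from_url(u) for u in urls)


if __name__ == "__main__":
    # The second URL is a made-up example for illustration only.
    print(build_repository_id_field([
        "https://github.com/GHUser/RepoID",
        "https://github.com/GHUser/OtherRepo.git",
    ]))
```

Because the helper raises an error on a malformed URL, a typo surfaces before you paste the value into the field.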
From the Content Types section, select the content type you want to ingest. Data 360 ingests repository content, including files.
Based on the supplied Repo ID, you can select (include) or deselect (exclude) any or all of the content types:
- Issues (included by default, but you can exclude)
- Pull Requests (included by default, but you can exclude)
- Commits
- Milestones
- Releases
- Branches
  
  When the connector runs, every selected content type found in the specified repository (as defined by the Repo ID) is included.
Apply filters to limit the items you want to ingest. By default, everything (all files in the repository) is ingested. Filters are optional, but we recommend setting at least one. You can’t change this setting. Several filters are available. When you use more than one filter (even more than one filter of the same type), each filter changes the set of items that are ingested. If you apply filters and no content is ingested, your filters are likely too restrictive. If you see content that you don’t want, your filters are likely too broad.

Apply any or all of the following filters:
- Issue/PR State: Provide a comma-separated list of issue or PR states by using the search. Any item with the provided state is included. If you list multiple states, items in any specified state are included.
- Labels: Provide a comma-separated list of labels by using the search. Any item tagged with the provided label is included. If you list multiple labels, items tagged with any of the labels are included.
- Assignees: Filter by issues assigned to a specific user, such as yourself. Provide a comma-separated list of assignees by using the search. Any item with the specified assignee is included. If you list multiple assignees, all items assigned to any assignee are included.
- Branches: Filter by branches to include your release branches but not your in-progress branches. Provide a comma-separated list of branches by using the search. Any item in a matching branch is included. If you list multiple branches, all items in any listed branch are included.
- Creation Date: Select a date from the calendar widget. Any item created on or after that date is ingested. You can select only one date.
- Last Update Date: Select a date from the calendar widget. Any item updated on or after that date is ingested. You can select only one date.
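The filter behavior described in this list can be sketched as a predicate. Within one filter, an item matches if it has any of the listed values, and the date filters match on or after the selected date, as stated above. How different filters combine is not spelled out here; this sketch assumes an item must pass every filter that is set, which would be consistent with the note that too many filters can leave nothing to ingest. The `Item` and `Filters` types and the `matches` function are hypothetical and are not the connector's data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for an issue or pull request."""
    state: str
    labels: set[str]
    assignees: set[str]
    branch: Optional[str]
    created: date
    updated: date


@dataclass
class Filters:
    """An empty set or None means the filter is not applied."""
    states: set[str] = field(default_factory=set)
    labels: set[str] = field(default_factory=set)
    assignees: set[str] = field(default_factory=set)
    branches: set[str] = field(default_factory=set)
    created_on_or_after: Optional[date] = None
    updated_on_or_after: Optional[date] = None


def matches(item: Item, f: Filters) -> bool:
    checks = [
        # Within a filter, any listed value is enough.
        not f.states or item.state in f.states,
        not f.labels or bool(item.labels & f.labels),
        not f.assignees or bool(item.assignees & f.assignees),
        not f.branches or item.branch in f.branches,
        # Date filters include the selected date itself.
        f.created_on_or_after is None or item.created >= f.created_on_or_after,
        f.updated_on_or_after is None or item.updated >= f.updated_on_or_after,
    ]
    # Assumption: filters of different types must all pass.
    return all(checks)


if __name__ == "__main__":
    bug = Item("open", {"bug"}, {"alice"}, "main", date(2024, 3, 1), date(2024, 3, 5))
    print(matches(bug, Filters(labels={"bug", "docs"})))
    print(matches(bug, Filters(labels={"bug"}, states={"closed"})))
```

Under this assumption, the second call shows how adding a filter narrows the result: the item has the `bug` label but is not closed, so it is excluded.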
Click Save. The connector runs by default every two hours. You can monitor sync status in Data Stream status.
To set up your Unstructured Data Lake Object (UDLO) and its associated Data Model Object (DMO), add an Object Name and an Object API Name for the UDLO. See Data Lake Object Naming Standards.
Leave the checkbox selected to create a search index configuration for the DMO by using system defaults that automatically select text fields and a chunking strategy for each field. Deselect the checkbox to create a search index configuration later.
Select the checkbox to enable Content Harmonization for the UDLO. Leave it deselected to enable content harmonization later.

If you enable content harmonization now, AI Enrichments (Summary and Q&A) are disabled by default. To enable AI Enrichments, create a Harmonization Configuration and enable Einstein.

When you enable content harmonization, you enable collection of Content Viewer engagement data.
Click Next, or if you created a search index configuration, review the details, and Save your work.

The data stream ingests data from your selected repositories into a UDLO and maps it to an unstructured DMO. From this DMO, a search index is created that can now be used to ground AI-generated responses. If you enabled Content Harmonization, you can get a consistent view of the harmonized content in Data Explorer.