Create a GitHub Unstructured Data Stream (Beta)
Create an Unstructured Data Lake Object (UDLO) to ingest your organization’s content from GitHub into Data Cloud. For a list of supported file formats, see the Unstructured Data Reference.
This feature is a Beta Service. A customer may opt to try a Beta Service in its sole discretion. Any use of the Beta Service is subject to the applicable Beta Services Terms provided at Agreements and Terms. If you have questions or feedback about this Beta Service, contact the Data Cloud Connector team at datacloud-connectors-beta@salesforce.com.
User Permissions Needed | |
---|---|
To create a connection: | System Admin profile or Data Cloud Architect permission set |
Before you begin:
- Make sure that you’ve set up a GitHub connection and know the name of the connection.
- Make sure that you have a list of the repository IDs, in user/repo format, that you want to ingest.
- If you plan to filter by label, make sure that you have a list of the labels that you want to filter by.
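The Repository ID field ignores mistakes instead of reporting them, so it can help to check your list before you paste it. This Python sketch splits a comma-separated list and flags entries that don't look like `user/repo`. The helper name `check_repository_ids` and the character rules in the pattern are assumptions that approximate GitHub naming. They don't come from Data Cloud, and they aren't an official specification.

```python
import re

# Approximate GitHub naming rules (an assumption, not an official spec):
#   owner: letters, digits, and hyphens, not starting or ending with a hyphen
#   repo:  letters, digits, hyphens, underscores, and periods
REPO_ID = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/[A-Za-z0-9._-]+")


def check_repository_ids(raw: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated list of user/repo IDs into (valid, invalid)."""
    valid: list[str] = []
    invalid: list[str] = []
    for entry in raw.split(","):
        entry = entry.strip()  # tolerate spaces after commas
        if not entry:
            continue  # skip empty entries, such as a trailing comma
        # fullmatch requires the whole entry to match the pattern
        (valid if REPO_ID.fullmatch(entry) else invalid).append(entry)
    return valid, invalid


if __name__ == "__main__":
    ok, bad = check_repository_ids("octocat/Hello-World, acme/docs.site, acme/, not a repo")
    print("valid:", ok)
    print("invalid:", bad)
```

Paste only the entries in the valid list into the Repository ID field. Correct the invalid entries first.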
- From the App Launcher, select Data Cloud.
- Select Data Lake Objects, and then select New.
- Select the From External Files tile, and then click Next.
- On the New Data Lake Object screen, select the GitHub connector tile, and then click Next.
- From the Connection Details dropdown, select the GitHub connection that you created earlier. Data Cloud populates the source automatically based on the connection that you select.
- In the Core Content section, in the Repository ID field, enter the comma-separated list of user/repo IDs that you collected earlier. This field is case-sensitive, and invalid entries are ignored, so check each ID carefully.
- In the Content Types section, select the content types that you want to ingest. The available choices depend on the repository IDs that you entered. If you enter more than one repository ID, your selections apply to all of those repositories. Select the checkbox for any or all of these content types:
- Issues
- Commits
- Milestones
- ReadMe Files
- Pull Requests
- Releases
- Branches
- Apply filters to limit the items that you ingest. By default, everything is ingested. Filters are optional, but we recommend that you use at least one.
When you use more than one filter, including more than one filter of the same type, each filter changes which items are ingested. If no content is ingested after you apply filters, your filters are likely too restrictive. If you see content that you don't want, your filters are likely too broad. Apply any or all of these filters:
- Issue/PR State: Use the search to enter a comma-separated list of issue or PR states. Items in any listed state are included.
- Labels: Use the search to enter a comma-separated list of labels. Items tagged with any listed label are included.
- Assignees: Filter by the users that issues are assigned to, such as yourself. Use the search to enter a comma-separated list of assignees. Items assigned to any listed assignee are included.
- Branches: Filter by branch, for example, to include your release branches but exclude your in-progress branches. Use the search to enter a comma-separated list of branches. Items in any listed branch are included.
- Creation Date: Select a date from the calendar widget. Any item created on or after that date is ingested. You can select only one date.
- Last Update Date: Select a date from the calendar widget. Any item updated on or after that date is ingested. You can select only one date.
- Click Save. By default, the connector runs every two hours. You can monitor the sync status in Data Stream status.
- To set up your UDLO and its associated Data Model Object (DMO), enter an Object Name and an Object API Name for the UDLO. See Data Lake Object Naming Standards.
- To create a search index configuration for the unstructured Data Model Object (UDMO) with system defaults, leave the checkbox selected. The defaults automatically select text fields and a chunking strategy for each field. To create a search index configuration later, deselect the checkbox.
- Click Next. Or, if you created a search index configuration, review its details and then click Save.
The data stream ingests data from your selected repositories into a UDLO and maps it to an unstructured Data Model Object (UDMO). A search index is created from this UDMO, and you can use that index to ground AI-generated responses.
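You can model the filters from the Apply filters step as a single predicate. This Python sketch follows the filter descriptions in two ways. Multiple values within one filter are combined with OR, and the date filters include items on or after the selected date. The sketch also assumes that different filter types are combined with AND. That assumption is consistent with the advice that filters can become too restrictive, but the steps above don't state it explicitly. The `Item` and `Filters` classes and `is_ingested` are hypothetical and don't represent a Data Cloud API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Item:
    # Hypothetical shape of an ingestible GitHub item (issue, PR, and so on).
    state: str
    labels: set[str]
    assignees: set[str]
    branch: str
    created: date
    updated: date


@dataclass
class Filters:
    # An empty set or None means the filter isn't applied, so everything passes.
    states: set[str] = field(default_factory=set)
    labels: set[str] = field(default_factory=set)
    assignees: set[str] = field(default_factory=set)
    branches: set[str] = field(default_factory=set)
    created_on_or_after: Optional[date] = None
    updated_on_or_after: Optional[date] = None


def is_ingested(item: Item, f: Filters) -> bool:
    # Within one filter, any listed value matches (OR).
    # Across filter types, every applied filter must match (AND, an assumption).
    checks = [
        not f.states or item.state in f.states,
        not f.labels or bool(item.labels & f.labels),
        not f.assignees or bool(item.assignees & f.assignees),
        not f.branches or item.branch in f.branches,
        f.created_on_or_after is None or item.created >= f.created_on_or_after,
        f.updated_on_or_after is None or item.updated >= f.updated_on_or_after,
    ]
    return all(checks)


if __name__ == "__main__":
    issue = Item("open", {"bug"}, {"octocat"}, "main", date(2024, 5, 1), date(2024, 6, 1))
    print(is_ingested(issue, Filters()))                                   # True: no filters
    print(is_ingested(issue, Filters(labels={"bug", "docs"})))             # True: one label matches
    print(is_ingested(issue, Filters(labels={"bug"}, states={"closed"})))  # False: state doesn't match
```

The last call shows why stacking filter types can leave nothing to ingest. Under the AND assumption, an item must satisfy every applied filter, so each added filter type can only narrow the results.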