Set Up a Web Content Connection (Sitemap)
Set up the Web Content connection (Sitemap) to start the flow of data into Data 360.
| User Permissions Needed | |
|---|---|
| To create a connection: | System Admin profile or Data 360 Architect permission set |
It’s your obligation to ensure that you have the rights to the data collected using this feature. Salesforce disclaims all liability with respect to such data collected.
Before you begin:
- If firewalls are enabled on the system you want Data 360 to connect to, verify that your admin has added these IP addresses to your allowlists.
- In Data 360, click Setup, and select Data 360 Setup.
- Under External Integrations, select Other Connectors.
- Click New.
- On the Source tab, select Web Content (Sitemap) and click Next.
- Enter a connection name and a connection API name.
- To allow the sitemap connector to access your website, select an authentication method from the dropdown menu. If your website requires credentials, enter a username and password. Make sure that the username you provide has the necessary access to the content you want to ingest. The connector gets the same access permissions as this user.
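If you select basic authentication, the connector sends your credentials with each request, much like the hypothetical sketch below. The URL, username, and password here are placeholders, not values from your org.

```python
import base64
import urllib.request

# Sketch: attaching HTTP Basic Auth credentials to a sitemap fetch.
# All values below are illustrative placeholders.
def build_sitemap_request(url: str, username: str, password: str) -> urllib.request.Request:
    # Basic auth is the base64 encoding of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

req = build_sitemap_request("https://help.yourdomain.com/sitemap.xml", "crawler", "s3cret")
```

Because the connector inherits this user's permissions, a service account scoped to only the public help content is a safer choice than a broadly privileged user.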
- Provide a valid URL of an XML sitemap ending with /sitemap.xml (for example, https://help.yourdomain.com/sitemap.xml). URLs for Gzip-compressed XML sitemaps are also supported.
- If your website requires a specific user agent, customize the User Agent field string. Otherwise, the connector uses the default user agent.
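For reference, a sitemap at that URL is a standard XML file in the sitemaps.org format. This minimal example uses placeholder page URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://help.yourdomain.com/articles/getting-started</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://help.yourdomain.com/articles/faq</loc>
  </url>
</urlset>
```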
- To enhance content ingestion, override any of the site’s robots.txt file directives. You can ignore the website’s robots.txt file only if you have rights to its contents and IT approval.
  - To prevent ingestion issues caused by your website’s robots.txt file, select the checkbox next to Ignore robots.txt entirely.
  - To prevent ingestion failures caused by a crawl delay longer than 1 second, select the checkbox next to Ignore only Crawl-Delay in robots.txt.
  - To prevent site pages from being excluded from ingestion, select the checkbox next to Ignore only Disallow in robots.txt.
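These checkboxes correspond to directives in the site's robots.txt file. A hypothetical file that would trigger both overrides looks like this:

```
User-agent: *
# "Ignore only Crawl-Delay in robots.txt" bypasses this directive:
Crawl-delay: 10
# "Ignore only Disallow in robots.txt" bypasses this directive:
Disallow: /internal/
```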
- To prevent ingestion timeouts on large sites, increase the value for Requests per Second (crawling rate). On small websites, prevent ingestion blocking by disabling Requests per Second or configuring a lower value. By default, the connector performs five requests per second.
- To review your configuration, click Test Connection.
- Click Save.
Data 360 creates the connection, and you can now create data streams.