Common Web Content (Crawler) Errors and Solutions
Tips for troubleshooting issues that may occur when you ingest website content using the Web Content (Crawler) connector.
When you create a new Web Content (Crawler) connector, you can enable search indexing.
Search Index or chunks weren't created.
- From App Launcher, select Data Cloud.
- Open the Search Index tab, select the Search Index you created, and click Rebuild from the dropdown menu on the right.
- Manually create a new Search Index from the Search Index tab: Click New -> Easy Setup, and select the corresponding Data Model Object.
The last run of the Data Stream failed.
The connector’s rate limit is 5 records (pages) per second. If your website can’t handle requests at that rate, the connector run can fail.
Ask your IT team to authorize the IP addresses listed in IP Addresses Used By Data Cloud Services so that rate limiting on your website doesn’t block the connector’s requests.
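To see how your website responds at the connector's pace, you can probe it yourself before involving your IT team. The following is a minimal Python sketch, not part of Data Cloud. The helper names `probe_at_crawl_rate` and `http_status` are hypothetical, and the only value taken from this article is the rate of 5 pages per second. It requests a list of URLs at roughly that pace and reports any that return a non-2xx status, such as 429 or 503.

```python
import time
import urllib.error
import urllib.request

CRAWL_RATE_PER_SECOND = 5  # rate stated in this article


def http_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses (for example 429 or 503) arrive as exceptions.
        return err.code


def probe_at_crawl_rate(urls, fetch_status=http_status,
                        rate=CRAWL_RATE_PER_SECOND,
                        sleep=time.sleep, clock=time.monotonic):
    """Request each URL at about `rate` requests per second.

    Returns a list of (url, status) pairs for every non-2xx response.
    `fetch_status`, `sleep`, and `clock` are injectable so the pacing
    logic can be exercised without network access.
    """
    interval = 1.0 / rate
    failures = []
    for url in urls:
        started = clock()
        status = fetch_status(url)
        if not 200 <= status < 300:
            failures.append((url, status))
        # Wait out the rest of the interval to hold the target rate.
        elapsed = clock() - started
        if elapsed < interval:
            sleep(interval - elapsed)
    return failures
```

Calling `probe_at_crawl_rate` with a few dozen page URLs from your own site gives a rough signal only: the requests come from your machine rather than from the Data Cloud IP addresses, so firewall or allowlist rules can treat them differently.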
When you create a Web Content (Crawler) connector, your website pages are ingested into Data Cloud.
Website pages were not ingested.
- Confirm the starting URL you entered isn't a redirect, and make sure it doesn’t change when accessed. If it's a redirect, replace the starting URL with the new target URL and re-run the connector.
- Check whether your website uses client-side rendering, where links to subpages are rendered by a script. This is a known limitation of the Sitemap connector: content created through client-side rendering isn't ingested.
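Both checks above can be approximated from your own machine. The following is a minimal Python sketch under stated assumptions: `fetch`, `is_redirected`, and `static_links` are hypothetical helper names, and the script only mimics a client that doesn't execute JavaScript. It isn't the connector's actual crawling logic. It reports the URL that the starting URL resolves to and lists the links present in the raw HTML before any script runs.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def fetch(start_url, timeout=10.0):
    """Return (resolved_url, html) for `start_url`, following redirects."""
    with urllib.request.urlopen(start_url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
        return response.geturl(), html


def is_redirected(start_url, resolved_url):
    """True if the URL changed when accessed."""
    return start_url != resolved_url


class _LinkCollector(HTMLParser):
    """Collects href values of <a> tags found in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def static_links(html, base_url):
    """Return the links present in the HTML without running any script."""
    collector = _LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

If `is_redirected` returns `True`, use the resolved URL as the starting URL. If `static_links` returns few or no subpage links for a page that shows many links in a browser, the links are probably rendered client-side.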
When you create a Web Content (Crawler) connector, Data Cloud syncs your records.
Zero records synced.
A successful run with zero records happens when the server doesn’t allow Data Cloud to crawl any page because its robots.txt file contains a “Disallow: /” rule. To allow the connector to ingest content from your website, try removing this rule from your robots.txt file, and then run the connector again.
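To confirm whether your robots.txt file blocks crawling, you can evaluate it with Python's standard `urllib.robotparser` module. This is a minimal sketch: `is_crawl_allowed` is a hypothetical helper name, and the user agent `"*"` is an assumption, because this article doesn't state which user agent the connector sends.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, page_url, user_agent="*"):
    """Return True if `robots_txt` lets `user_agent` fetch `page_url`.

    `robots_txt` is the text content of the site's robots.txt file.
    The default user agent "*" is an assumption, not the connector's
    documented user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# A file that blocks every page for every crawler.
blocked = "User-agent: *\nDisallow: /\n"
# A file that blocks only one directory.
partial = "User-agent: *\nDisallow: /private/\n"

print(is_crawl_allowed(blocked, "https://example.com/docs/page"))
print(is_crawl_allowed(partial, "https://example.com/docs/page"))
```

The first call prints `False` and the second prints `True`. If the check returns `False` for your starting URL, the robots.txt file is the likely cause of a zero-record run.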
When you create a connection, you can view your synced records in the Data Streams tab.
The data stream shows errors.
Wait for the data stream to finish its run, which can take up to 12 hours. Then open the Data Streams tab to see whether records were ingested.
- If records were ingested, the connector ran successfully. Contact your Salesforce admin to investigate the reported errors.
- If records weren't ingested, retry the connector: Click the error status, and then click Retry Now.
- If records still aren't ingested after the retry, contact your Salesforce admin to start an investigation.