Web Content (Sitemap) Connector Limitations
Learn about the functional limitations of the Web Content (Sitemap) connector that affect certain behaviors and outcomes.
- Yoast sitemaps aren't supported (see example).
- Some non-XML sitemaps are not supported.
- HTML and PDF pages are supported.
- Supported image formats: .jpg, .jpeg, .png.
- Video and audio files (.mp3, .mp4, etc.) aren’t supported.
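As a rough pre-check, the file-type rules above can be sketched as a small classifier that looks only at a URL's file extension. This is an illustrative sketch, not the connector's own logic: the function name `classify_url` is hypothetical, treating extensionless, `.html`, and `.htm` URLs as HTML pages is an assumption, and only the two media extensions named above are listed because the full set isn't specified.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Extensions named in the list above.
SUPPORTED_FILE_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
# Assumption: URLs with no extension, .html, or .htm serve HTML pages.
ASSUMED_HTML_EXTENSIONS = {"", ".html", ".htm"}
# Only the two examples named above; the full set of media types isn't specified.
UNSUPPORTED_MEDIA_EXTENSIONS = {".mp3", ".mp4"}


def classify_url(url: str) -> str:
    """Return 'supported', 'unsupported', or 'unknown' based on the URL path extension."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    if suffix in ASSUMED_HTML_EXTENSIONS or suffix in SUPPORTED_FILE_EXTENSIONS:
        return "supported"
    if suffix in UNSUPPORTED_MEDIA_EXTENSIONS:
        return "unsupported"
    return "unknown"


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs/guide.pdf",
        "https://example.com/media/intro.mp4",
        "https://example.com/downloads/archive.zip",
    ):
        print(candidate, "->", classify_url(candidate))
```

An extension check is only a heuristic, because a server can return any content type for any URL. Treat an "unknown" result as something to verify manually.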
When parsing links, the connector automatically removes the entire query string (all text following the ?) from the URL. This limitation applies to all pages except the root page.
Because the connector removes everything that follows the ?, any page that is identified only by the portion of the URL after the ? isn't ingested.
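To see which sitemap entries this affects, the stripping behavior can be approximated as follows. This is a minimal sketch that assumes the connector cuts the URL at the first ? character. The helper names `strip_query` and `find_collisions` are hypothetical, and the root-page exception isn't modeled.

```python
from collections import defaultdict


def strip_query(url: str) -> str:
    """Drop the first '?' and everything after it (assumed connector behavior)."""
    return url.split("?", 1)[0]


def find_collisions(urls):
    """Group URLs that become identical once their query strings are removed."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    # Keep only the stripped URLs that more than one original URL maps to.
    return {base: members for base, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    sitemap_urls = [
        "https://example.com/products?page=1",
        "https://example.com/products?page=2",
        "https://example.com/about",
    ]
    for base, members in find_collisions(sitemap_urls).items():
        print(base, "<-", members)
```

Any group reported by `find_collisions` is a set of URLs that would resolve to the same address after stripping, so their query-specific content would likely not be ingested as separate pages.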
The Web Content (Sitemap) connector doesn’t support content that is dynamically rendered via JavaScript at runtime. Any content rendered in this manner isn't ingested.
The connector’s rate limit is 5 records (pages) per second. If your website can't handle requests at this rate, the connector may fail.
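The arithmetic behind this limit can be sketched as follows. This is an estimate under stated assumptions, not a description of the connector's scheduling: it assumes the connector can run at the full 5 pages per second, and the function names and the `site_max_requests_per_second` parameter are hypothetical.

```python
CONNECTOR_PAGES_PER_SECOND = 5  # rate limit stated above


def min_ingest_seconds(page_count: int) -> float:
    """Lower bound on ingestion time if the connector runs at its full rate."""
    return page_count / CONNECTOR_PAGES_PER_SECOND


def site_can_keep_up(site_max_requests_per_second: float) -> bool:
    """True if the site's own request ceiling is at least the connector's rate."""
    return site_max_requests_per_second >= CONNECTOR_PAGES_PER_SECOND


if __name__ == "__main__":
    print(min_ingest_seconds(1000))  # 1000 pages / 5 per second
    print(site_can_keep_up(2))       # a site capped at 2 requests per second
```

For example, 1,000 pages at 5 pages per second take at least 200 seconds, and a site that throttles clients to 2 requests per second falls below the connector's rate.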
The site being ingested must allow requests from Apache HttpClient.
Learn about Data 360's general limitations, which may also create issues for this connector.