Common Sitemap Errors and How to Fix Them
Diagnose and fix the most common sitemap errors: couldn't fetch, 404 URLs, invalid XML, blocked by robots.txt, redirect chains, size limits, and encoding issues.
Your sitemap has an error. Maybe Google Search Console is showing a warning, maybe your URLs aren't getting indexed, or maybe a validation tool is flagging problems you don't understand. Sitemap errors range from trivial typos to structural problems that silently prevent indexing. Here's every common error, what causes it, and exactly how to fix it.
1. "Couldn't Fetch" in Google Search Console
What it looks like: You submit your sitemap in GSC and get the status "Couldn't fetch" or "General HTTP error."
What causes it:
- The sitemap URL returns a 4xx or 5xx HTTP status code
- The server is blocking Googlebot via firewall, rate limiting, or IP restrictions
- The sitemap URL has a typo (you submitted `sitemap.xml` but the file is at `sitemap_index.xml`)
- DNS isn't resolving correctly for Google's crawlers
- The server requires authentication to access the sitemap
How to fix it:
Verify the URL is accessible
Open the sitemap URL in your browser. If you get a 404, the URL is wrong. If you get a 403 or 500, the server is blocking access or has an error.
Test as Googlebot
Use GSC's URL Inspection tool to test the sitemap URL. This shows you exactly what Google sees when it tries to fetch the file.
Check server logs
Look for requests from Googlebot to your sitemap URL. If there are no requests, Google can't resolve your domain. If there are 403 responses, your firewall or CDN is blocking it.
Resubmit
After fixing the issue, resubmit the sitemap in GSC. It can take a few hours for Google to re-fetch and update the status.
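The first two steps above can be approximated with a quick script. This is a sketch using only the Python standard library; the `diagnose` mapping and `check_sitemap` helper are illustrative names, and sending a Googlebot-style User-Agent only approximates what Google's crawler sees (a CDN may still treat real Googlebot IPs differently).

```python
import urllib.request
import urllib.error

# A Googlebot-style User-Agent string; some firewalls block on this alone.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def diagnose(status: int) -> str:
    """Map an HTTP status code to the likely sitemap-fetch problem."""
    if status == 200:
        return "OK: the sitemap is reachable"
    if status == 404:
        return "Wrong URL: the sitemap is not at this path"
    if status in (401, 403):
        return "Blocked: firewall, CDN rule, or required authentication"
    if status >= 500:
        return "Server error: check your application and server logs"
    return f"Unexpected status {status}"

def check_sitemap(url: str) -> str:
    """Fetch the sitemap URL as a Googlebot-like client and report the result."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return diagnose(resp.status)
    except urllib.error.HTTPError as e:
        return diagnose(e.code)
    except urllib.error.URLError as e:
        return f"Could not connect: {e.reason}"  # DNS or network-level failure

# Example invocation (network call, so commented out here):
# print(check_sitemap("https://example.com/sitemap.xml"))
```

If this script gets a 200 but GSC still says "Couldn't fetch," the block is specific to Google's crawlers — check your server logs for Googlebot's actual requests.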
2. 404 URLs in Your Sitemap
What it looks like: Google fetches your sitemap successfully, but reports that some URLs in it return 404 Not Found. The Coverage report shows "Submitted URL not found (404)."
What causes it:
- Products, pages, or posts were deleted but the sitemap still references them
- URL structure changed (e.g., you migrated from `/blog/post-title` to `/articles/post-title`) without updating the sitemap
- A static sitemap file wasn't regenerated after content changes
- Typos in manually-created sitemaps
How to fix it:
Remove the dead URLs from your sitemap. If the content moved, either update the sitemap to use the new URLs or set up 301 redirects from the old URLs to the new ones. If you're using a static sitemap, regenerate it. If you're using a CMS that auto-generates sitemaps, the 404 URLs usually indicate a caching issue -- clear your sitemap cache and regenerate.
Redirects are a temporary fix
Setting up 301 redirects for deleted content is fine as a stopgap, but your sitemap should only contain URLs that return 200. Google prefers clean sitemaps with only canonical, live URLs.
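Finding the dead URLs is easy to automate. Here is a sketch using only the standard library; `extract_urls` and `find_dead_urls` are my own helper names, not part of any sitemap tooling.

```python
import urllib.request
import urllib.error
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_urls(sitemap_xml: str) -> list[str]:
    """Pull every <loc> value out of a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]

def find_dead_urls(urls: list[str]) -> list[tuple[str, int]]:
    """Return (url, status) for every URL that does not answer 200."""
    dead = []
    for url in urls:
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=10) as resp:
                status = resp.status
        except urllib.error.HTTPError as e:
            status = e.code
        if status != 200:
            dead.append((url, status))
    return dead

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/live-page</loc></url>
  <url><loc>https://example.com/deleted-page</loc></url>
</urlset>"""
print(extract_urls(sample))  # both <loc> values, ready to be checked
```

Run `find_dead_urls(extract_urls(...))` against your live sitemap, then remove or redirect anything it reports.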
3. URLs Blocked by robots.txt
What it looks like: GSC shows "Submitted URL blocked by robots.txt" for URLs that appear in your sitemap.
What causes it:
This is a direct contradiction: your sitemap says "please index this URL" while your robots.txt says "don't crawl this URL." Common causes include:
- A blanket `Disallow` rule that's too broad (e.g., `Disallow: /collections/` blocks all collection pages)
- Staging or development robots.txt rules accidentally deployed to production
- CMS-generated sitemaps including admin or private URLs that robots.txt correctly blocks
How to fix it:
Either remove the URLs from your sitemap or remove the robots.txt restriction. Don't send mixed signals. If the URL should be indexed, allow crawling. If it shouldn't be indexed, remove it from the sitemap and use a noindex meta tag instead.
# Bad: blocking URLs that are in your sitemap
User-agent: *
Disallow: /collections/
# Better: allow crawling, use noindex on specific pages you don't want indexed
User-agent: *
Allow: /collections/
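You can catch this contradiction automatically with the standard library's robots.txt parser. A sketch (the `blocked_urls` helper and the example rules are illustrative):

```python
from urllib import robotparser

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the sitemap URLs that the given robots.txt rules disallow."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

rules = "User-agent: *\nDisallow: /collections/\n"
urls = [
    "https://example.com/collections/shoes",  # blocked by the rule above
    "https://example.com/products/red-shoe",  # allowed
]
print(blocked_urls(rules, urls))  # ['https://example.com/collections/shoes']
```

Anything this reports is a mixed signal: either lift the robots.txt rule or drop the URL from the sitemap.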
4. Invalid XML Syntax
What it looks like: Validation tools report XML parsing errors. Google may show "Sitemap is an HTML page" or "XML parsing error."
What causes it:
- Unescaped special characters in URLs (`&` instead of `&amp;`, spaces instead of `%20`)
- Missing XML declaration or namespace
- Unclosed tags or incorrect nesting
- BOM (byte order mark) at the beginning of the file
- The server is returning HTML (a 404 page or login page) instead of XML
How to fix it:
The most common culprit is unescaped ampersands in URLs. This is wrong:
<!-- Invalid: unescaped & -->
<loc>https://example.com/products?color=red&size=large</loc>
This is correct:
<!-- Valid: & escaped as &amp; -->
<loc>https://example.com/products?color=red&amp;size=large</loc>
For other XML issues, run your sitemap through an XML validator. Fix any structural errors, ensure the file starts with <?xml version="1.0" encoding="UTF-8"?>, and verify the correct namespace is declared:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
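A basic well-formedness check is easy to script with the standard library. This is a sketch, not full schema validation; the `validate_sitemap` helper is my own name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def validate_sitemap(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the basics are in order."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        # Catches unescaped &, unclosed tags, and servers returning HTML
        return [f"XML parse error: {e}"]
    problems = []
    tag = root.tag  # ElementTree reports namespaced tags as {namespace}localname
    if not tag.startswith("{" + SITEMAP_NS + "}"):
        problems.append("Missing or wrong sitemap namespace")
    if tag.split("}")[-1] not in ("urlset", "sitemapindex"):
        problems.append(f"Unexpected root element: {tag}")
    return problems

good = ('<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<url><loc>https://example.com/</loc></url></urlset>')
bad = "<urlset><url><loc>https://example.com/?a=1&b=2</loc></url></urlset>"
print(validate_sitemap(good))  # []
print(validate_sitemap(bad))   # parse error from the unescaped &
```

Hook a check like this into your build so a malformed sitemap never reaches production.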
5. Wrong Encoding
What it looks like: Characters appear garbled in the sitemap, or validation tools report encoding errors.
What causes it:
- The file declares `UTF-8` encoding but contains characters in a different encoding (like `ISO-8859-1` or `Windows-1252`)
- URLs contain non-ASCII characters that aren't properly encoded
- The server's `Content-Type` header specifies a different encoding than the XML declaration
How to fix it:
Ensure your sitemap file is saved as UTF-8 (without BOM). Non-ASCII characters in URLs should be percent-encoded:
<!-- Correct: non-ASCII characters percent-encoded -->
<loc>https://example.com/caf%C3%A9-menu</loc>
Also verify that your server sends the correct Content-Type header: Content-Type: application/xml; charset=UTF-8.
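Percent-encoding can be done with the standard library's `quote`, which encodes non-ASCII path characters as UTF-8 bytes. A sketch (the `encode_loc` helper is my own name, and it assumes the path isn't already percent-encoded, since `quote` would double-encode an existing `%`):

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_loc(url: str) -> str:
    """Percent-encode the path of a URL, leaving scheme and host intact."""
    parts = urlsplit(url)
    # quote() keeps "/" by default and emits UTF-8 percent-escapes for the rest
    return urlunsplit(parts._replace(path=quote(parts.path)))

print(encode_loc("https://example.com/café-menu"))
# https://example.com/caf%C3%A9-menu
```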
6. URLs with Redirects
What it looks like: URLs in your sitemap return 301 or 302 redirects instead of 200 OK.
What causes it:
- HTTP URLs in the sitemap that redirect to HTTPS
- URLs without `www` redirecting to `www` (or vice versa)
- Old URLs that have been redirected to new locations
- Trailing slash inconsistencies (`/page` redirecting to `/page/`)
How to fix it:
Every URL in your sitemap should return a 200 status code directly, without any redirects. Update the sitemap to use the final destination URLs:
| In Your Sitemap | Redirects To | Fix |
|---|---|---|
| http://example.com/page | https://example.com/page | Use the https:// URL |
| https://example.com/page | https://example.com/page/ | Use the trailing-slash URL |
| https://example.com/old-slug | https://example.com/new-slug | Use the new URL |
| https://www.example.com/page | https://example.com/page | Use the non-www URL |
Google will follow redirects, but a clean sitemap with only final URLs is a signal of a well-maintained site and avoids wasting crawl budget on redirect chains.
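An audit like the table above can be scripted: resolve each sitemap URL to its final destination and flag any mismatch. This is a sketch with stdlib-only code; `final_url` and `mismatches` are my own helper names.

```python
import urllib.request

def final_url(url: str) -> str:
    """Follow any redirect chain and return the URL that finally answers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.url  # urlopen follows 301/302s; resp.url is the landing URL

def mismatches(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the (listed, final) pairs where the sitemap entry is stale."""
    return [(listed, final) for listed, final in pairs if listed != final]

# Offline example: pairs as an audit would collect them via final_url(listed)
pairs = [
    ("http://example.com/page", "https://example.com/page"),     # needs updating
    ("https://example.com/about", "https://example.com/about"),  # already final
]
print(mismatches(pairs))  # [('http://example.com/page', 'https://example.com/page')]
```

Replace each flagged sitemap entry with its final URL and the redirect warnings disappear.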
7. Exceeding Size Limits
What it looks like: Google rejects your sitemap or only partially processes it.
What causes it:
The sitemap protocol has two hard limits:
- 50,000 URLs per sitemap file
- 50 MB uncompressed file size per sitemap
Most sites hit the URL limit before the file size limit.
How to fix it:
Split your sitemap into multiple files and use a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-products-1.xml</loc>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-products-2.xml</loc>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-pages.xml</loc>
</sitemap>
</sitemapindex>
Each child sitemap follows the same 50,000 URL / 50 MB limits. You can also compress sitemaps with gzip (.xml.gz) to reduce file size, and Google handles gzipped sitemaps just fine.
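The splitting itself is a simple chunking job. A sketch (file names like `sitemap-1.xml` are illustrative; adapt them to your own naming scheme):

```python
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # the protocol's per-file URL limit

def chunk(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split a URL list into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_index(base: str, n_chunks: int) -> str:
    """Render a sitemapindex pointing at n_chunks child sitemap files."""
    entries = "\n".join(
        f"  <sitemap><loc>{base}/sitemap-{i + 1}.xml</loc></sitemap>"
        for i in range(n_chunks)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">\n{entries}\n</sitemapindex>'
    )

urls = [f"https://example.com/product/{i}" for i in range(120_000)]
parts = chunk(urls)
print(len(parts))  # 3 child sitemaps: 50,000 + 50,000 + 20,000
print(build_index("https://example.com", len(parts)))
```

Write each chunk to its own urlset file, publish the index at your usual sitemap URL, and submit only the index to GSC.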
8. Noindex Pages in Your Sitemap
What it looks like: GSC reports "Submitted URL marked 'noindex'" -- pages that are in your sitemap but have a noindex meta tag or X-Robots-Tag header.
What causes it:
- CMS auto-generates the sitemap and includes pages that have been manually set to noindex
- A developer added noindex to certain page templates without updating the sitemap
- A global noindex tag was applied during development and only partially removed
How to fix it:
This is a clear contradiction. Either the page should be indexed (remove the noindex tag) or it shouldn't (remove it from the sitemap). Decide which is correct and make them consistent.
If you're using a CMS, check whether there's a setting to exclude noindexed pages from the sitemap. Most SEO plugins (Yoast, Rank Math, etc.) handle this automatically.
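If your CMS can't do it for you, the meta-tag side of the check is scriptable. A sketch using the stdlib HTML parser; a complete audit would also inspect `X-Robots-Tag` response headers, which this fragment parser can't see.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Detects <meta name="robots"|"googlebot"> tags containing noindex."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if (a.get("name") or "").lower() in ("robots", "googlebot"):
            if "noindex" in (a.get("content") or "").lower():
                self.noindex = True

def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex

print(has_noindex('<meta name="robots" content="noindex, follow">'))  # True
print(has_noindex('<meta name="robots" content="index, follow">'))    # False
```

Fetch each sitemap URL, run its HTML through `has_noindex`, and anything that returns True is sending Google the contradiction described above.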
9. HTTP vs HTTPS Mismatches
What it looks like: Your site runs on HTTPS, but your sitemap contains http:// URLs. Or you've verified the HTTPS version of your site in GSC but your sitemap references HTTP URLs.
What causes it:
- The sitemap was generated before the HTTPS migration and never updated
- The sitemap generation script uses a hardcoded `http://` base URL
- A CMS setting still references the HTTP version of the site
How to fix it:
Every URL in your sitemap must use the same protocol as your live site. If your site runs on HTTPS (which it should), every <loc> value must start with https://. Update your sitemap generation to use the correct protocol, and verify the sitemap URL itself is also served over HTTPS.
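The protocol audit is a one-liner worth automating. A sketch (`non_https` is my own helper name):

```python
def non_https(urls: list[str]) -> list[str]:
    """Flag any sitemap URL that is not served over HTTPS."""
    return [u for u in urls if not u.startswith("https://")]

urls = [
    "https://example.com/page",
    "http://example.com/old-page",  # flagged: generated from a hardcoded http base
]
print(non_https(urls))  # ['http://example.com/old-page']
```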
Quick Diagnosis Table
| Symptom | Most Likely Error | First Step |
|---|---|---|
| GSC says 'Couldn't fetch' | Sitemap URL returns non-200 status | Check the URL in your browser |
| 'Submitted URL not found (404)' | Dead URLs in sitemap | Remove or redirect the URLs |
| 'Blocked by robots.txt' | robots.txt contradicts sitemap | Align robots.txt with sitemap |
| 'Sitemap is an HTML page' | Server returning HTML, not XML | Check Content-Type header |
| Garbled characters | Encoding mismatch | Save as UTF-8, check headers |
| 'Submitted URL has redirect' | 301/302 URLs in sitemap | Use final destination URLs |
| Partial sitemap processing | Exceeds size limits | Split into sitemap index |
| 'Marked noindex' | noindex tag contradicts sitemap | Remove from sitemap or remove noindex |
| URLs not getting indexed | HTTP/HTTPS mismatch | Ensure all URLs use HTTPS |
Preventing Sitemap Errors
Most sitemap errors come from one root cause: the sitemap is out of sync with the actual state of your site. The fix is to make sitemap generation automatic and validate regularly.
- Use dynamic sitemaps that generate from your actual content database, not static files that go stale.
- Validate after every deployment by running your sitemap through a validator as part of your CI/CD pipeline.
- Monitor GSC regularly for new coverage errors related to your sitemap.
- Audit quarterly to catch drift between your sitemap and your live site.
A clean sitemap isn't a one-time task. It's ongoing maintenance, but the payoff is reliable indexing and no wasted crawl budget.
Every sitemap error has a fix. The trick is finding the error before Google does.