Sitemap Validation and Troubleshooting Guide

A complete guide to sitemap validation and troubleshooting. Covers validation checks, common errors, Google Search Console issues, debugging XML sitemaps, and ongoing monitoring.

A sitemap is only useful if search engines can read it. A single malformed tag, a wrong URL, or an invalid XML character can cause Google to silently ignore your entire sitemap, leaving hundreds or thousands of pages undiscovered.

The frustrating part is that sitemaps fail quietly. Google does not send you an email when it cannot parse your sitemap. You find out weeks later when pages are not getting indexed, traffic drops, and you start digging through Search Console to figure out why.

This guide covers what validation actually checks, how to validate your sitemap using different tools, every common error you are likely to encounter and how to fix each one, and how to set up ongoing monitoring so problems do not slip by unnoticed.


What sitemap validation checks

Sitemap validation verifies that your sitemap file conforms to the sitemaps.org protocol and that the URLs it contains are valid and accessible. [1] A thorough validation covers several layers.

XML well-formedness

At the most basic level, your sitemap must be well-formed XML. This means:

  • Every opening tag has a corresponding closing tag
  • Tags are properly nested (not overlapping)
  • Special characters are properly escaped (& becomes &amp;, < becomes &lt;)
  • The XML declaration is present and correct
  • The document uses UTF-8 encoding

A single unescaped ampersand in a URL can make the entire XML document unparseable. See what is an XML sitemap for the format specification.
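
As a minimal sketch of this check, the Python snippet below parses two sitemap strings with the standard library's xml.etree.ElementTree. The function name well_formedness_error and the example.com URLs are illustrative assumptions, not part of any tool mentioned in this guide.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return None if the text parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical sitemap bodies: the first escapes the ampersand, the second does not.
escaped = f'<urlset xmlns="{NS}"><url><loc>https://example.com/?a=1&amp;b=2</loc></url></urlset>'
unescaped = f'<urlset xmlns="{NS}"><url><loc>https://example.com/?a=1&b=2</loc></url></urlset>'

print(well_formedness_error(escaped))    # None
print(well_formedness_error(unescaped))  # a parse error message with line and column
```

The second call fails on the raw ampersand, which is the same failure a search engine's parser would hit.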

Schema validation

Beyond well-formedness, the sitemap must conform to the sitemap XML schema. This means:

  • The root element is <urlset> (for regular sitemaps) or <sitemapindex> (for sitemap index files)
  • The correct namespace is declared: xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  • Each URL entry is wrapped in a <url> element
  • Each <url> contains a <loc> element with the full URL
  • Optional elements (<lastmod>, <changefreq>, <priority>) use the correct format if present
  • <lastmod> uses W3C Datetime format (a date such as 2026-05-05, or a full timestamp with a time zone such as 2026-05-05T14:30:00+00:00)

See XML sitemap examples for correctly formatted samples.
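
The structural rules above can be approximated without a full XSD validator. The sketch below is one possible approach using Python's standard library; structure_problems is a hypothetical helper name, and it only checks the root element, the namespace, and the presence of <loc>, not the optional elements.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def structure_problems(xml_text: str) -> List[str]:
    """List structural problems; an empty list means these basic checks passed."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        entry_name = "url"
    elif root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        entry_name = "sitemap"
    else:
        return [f"unexpected root element or namespace: {root.tag}"]

    problems = []
    for position, entry in enumerate(root, start=1):
        if entry.tag != f"{{{SITEMAP_NS}}}{entry_name}":
            problems.append(f"entry {position}: unexpected element {entry.tag}")
            continue
        loc = entry.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {position}: missing or empty <loc>")
    return problems


sample = f'<urlset xmlns="{SITEMAP_NS}"><url><lastmod>2026-05-05</lastmod></url></urlset>'
print(structure_problems(sample))  # ['entry 1: missing or empty <loc>']
```

A missing namespace declaration shows up as an unexpected root, because the parser then sees a plain urlset tag rather than the namespaced one.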

URL validation

Each URL in the sitemap should be:

  • A fully qualified URL (including protocol: https://)
  • Accessible (returns a 200 status code, not a 404, 301, or 5xx)
  • On the same domain as the sitemap (or a verified associated domain)
  • Not blocked by robots.txt
  • Not marked as noindex (pages with noindex in the sitemap send conflicting signals)
  • Using the canonical URL (not a URL that redirects to another)
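
Two of these checks (fully qualified, same host as the sitemap) need no network access. A rough Python sketch follows; url_problems is a hypothetical helper, and the status code, robots.txt, noindex, and canonical checks are deliberately left out because they require fetching each page.

```python
from typing import List
from urllib.parse import urlsplit


def url_problems(url: str, sitemap_url: str) -> List[str]:
    """Offline checks only: is the URL absolute, and on the sitemap's host?"""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ["not a fully qualified http(s) URL"]
    problems = []
    # hostname is lowercased by urlsplit, so the comparison is case-insensitive.
    if parts.hostname != urlsplit(sitemap_url).hostname:
        problems.append("host differs from the sitemap's host")
    return problems


sitemap = "https://example.com/sitemap.xml"  # assumed sitemap location
print(url_problems("https://example.com/about", sitemap))          # []
print(url_problems("/about", sitemap))                             # ['not a fully qualified http(s) URL']
print(url_problems("https://staging.example.com/about", sitemap))  # ["host differs from the sitemap's host"]
```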

Size and count limits

The sitemaps protocol imposes limits: [1]

  • Maximum 50,000 URLs per sitemap file
  • Maximum 50 MB (uncompressed) per sitemap file
  • Maximum 50,000 sitemap files per sitemap index (use a sitemap index to reference multiple sitemaps)
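
A quick way to test a file against the per-file limits is to measure the raw bytes and count the entries. This is a sketch under the assumption that you already have the uncompressed sitemap in memory; limit_problems is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import List

MAX_ENTRIES = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def limit_problems(sitemap_bytes: bytes) -> List[str]:
    """Check the uncompressed size and the number of entries in one sitemap file."""
    problems = []
    if len(sitemap_bytes) > MAX_BYTES:
        problems.append(f"file is {len(sitemap_bytes)} bytes, limit is {MAX_BYTES}")
    root = ET.fromstring(sitemap_bytes)
    entries = len(root)  # direct children of the root: <url> or <sitemap> elements
    if entries > MAX_ENTRIES:
        problems.append(f"file has {entries} entries, limit is {MAX_ENTRIES}")
    return problems


NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
small = f'<urlset xmlns="{NS}"><url><loc>https://example.com/</loc></url></urlset>'.encode()
print(limit_problems(small))  # []
```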

How to validate your sitemap

Google Search Console

Google Search Console is the most authoritative validation tool because it shows you exactly how Google sees your sitemap.

To check your sitemap in Search Console:

  1. Navigate to Sitemaps in the left sidebar
  2. Enter your sitemap URL and click Submit (if not already submitted)
  3. Check the Status column for your submitted sitemaps

Search Console shows:

  • Success: The sitemap was fetched and parsed without errors
  • Has errors: The sitemap has issues that prevent some or all URLs from being processed
  • Couldn't fetch: Google could not download the sitemap at all

Click on a sitemap to see the number of discovered URLs versus the number of indexed URLs. A large gap between these numbers indicates issues with the URLs themselves (not the sitemap format). See how to submit a sitemap to Google for the complete process.

The Pages report (formerly Coverage report) provides more detail about why specific URLs are or are not indexed. Cross-reference this with your sitemap URLs to identify problems.

Online validation tools

Several online tools validate sitemaps without requiring Search Console access:

  • XML Sitemap Validator tools check XML well-formedness and schema compliance
  • Sitemap checkers crawl the URLs in your sitemap to verify they return 200 responses
  • SEO audit tools (Screaming Frog, Sitebulb, Ahrefs) include sitemap validation as part of broader site audits

For a comparison of available tools, see sitemap checker tools compared.

Command-line validation

For developers and automation, command-line tools provide quick validation:

xmllint (part of libxml2) validates XML well-formedness and schema compliance:

# Check well-formedness
xmllint --noout sitemap.xml

# Validate against the sitemap schema (download sitemap.xsd first from
# https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd)
xmllint --noout --schema sitemap.xsd sitemap.xml

curl combined with xmllint lets you validate a remote sitemap:

curl -s https://example.com/sitemap.xml | xmllint --noout -

Custom scripts can iterate through URLs in a sitemap and check each one's status code:

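# Extract URLs and check status codes (grep -P requires GNU grep)
curl -s https://example.com/sitemap.xml | \
  grep -oP '<loc>\K[^<]+' | \
  sed 's/&amp;/\&/g' | \
  while IFS= read -r url; do
    # sed above turns the XML entity &amp; back into a literal & before requesting
    code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
    echo "$code $url"
  done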

For a more complete approach, see the sitemap validator guide.

Automated validation in CI/CD

For dynamic sitemaps generated by your CMS or application, add validation to your deployment pipeline:

  1. After building the site, generate the sitemap
  2. Validate XML well-formedness
  3. Check that the URL count is within limits
  4. Spot-check a sample of URLs for 200 responses
  5. Compare against the previous sitemap to detect unexpected changes (large drops in URL count, new domains appearing)

This catches sitemap regressions before they reach production.
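
Step 5 is the least obvious to automate, so here is a rough sketch of one way to do it in Python. The function names, the 20% drop threshold, and the example URLs are assumptions chosen for illustration; tune the threshold to how much your site normally changes between deployments.

```python
from typing import Iterable, List, Set
from urllib.parse import urlsplit


def hosts(urls: Iterable[str]) -> Set[str]:
    """Collect the distinct hostnames found in a list of absolute URLs."""
    return {urlsplit(u).hostname for u in urls} - {None}


def regression_warnings(previous_urls, current_urls, max_drop=0.2) -> List[str]:
    """Compare two sitemap URL lists and describe suspicious changes."""
    previous, current = set(previous_urls), set(current_urls)
    warnings = []
    if not previous:
        return warnings  # first run: nothing to compare against
    drop = (len(previous) - len(current)) / len(previous)
    if drop > max_drop:
        warnings.append(f"URL count dropped {drop:.0%} ({len(previous)} -> {len(current)})")
    for host in sorted(hosts(current) - hosts(previous)):
        warnings.append(f"new host in sitemap: {host}")
    return warnings


old = [f"https://example.com/p{i}" for i in range(10)]
new = old[:5] + ["https://staging.example.com/p1"]
for line in regression_warnings(old, new):
    print(line)
```

In a pipeline, a non-empty result would typically fail the build or at least require a manual approval.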

Google Search Console data is delayed by several days. If you submit a sitemap and check the next day, you may not see results yet. Give it 3 to 5 days before concluding that something is wrong.

Common errors and fixes

These are the sitemap errors that account for the vast majority of validation failures. Each one includes the symptoms, the cause, and the fix.

Could not fetch sitemap

Symptoms: Google Search Console shows "Couldn't fetch" status. The sitemap URL returns an error when accessed.

Common causes:

  • The sitemap URL is wrong (typo, wrong path)
  • The server returns a 404 for the sitemap URL
  • The sitemap is blocked by robots.txt. Check whether your robots.txt Disallow rules accidentally cover the sitemap path.
  • The server requires authentication to access the sitemap
  • A firewall or WAF is blocking Google's crawler (user-agent Googlebot)
  • The server is down or returning 5xx errors

Fixes:

  1. Verify the sitemap URL is correct by accessing it in your browser
  2. Check your robots.txt for rules that block the sitemap path
  3. Check server logs for requests from Googlebot to your sitemap URL
  4. Ensure the sitemap is accessible without authentication
  5. If using a CDN, ensure it does not block the sitemap or serve a cached error response for it

404 URLs in sitemap

Symptoms: Search Console's Pages report shows URLs from your sitemap as "Not found (404)."

Common causes:

  • Pages were deleted but the sitemap was not updated
  • The sitemap was generated with incorrect URLs (wrong domain, wrong path structure)
  • URL format mismatch (trailing slash vs no trailing slash, www vs non-www)

Fixes:

  1. Remove 404 URLs from the sitemap
  2. If the pages moved, update the sitemap with the new URLs. See how to update a sitemap.
  3. If using a CMS, regenerate the sitemap to reflect current pages
  4. Set up redirects for moved pages and update the sitemap to use the final destination URLs

Invalid XML

Symptoms: Validation tools report XML parsing errors. Google cannot read the sitemap at all.

Common causes:

  • Unescaped special characters in URLs (especially &, <, >, ", ')
  • Missing closing tags
  • BOM (Byte Order Mark) at the start of the file
  • Non-UTF-8 characters
  • HTML content mixed into the XML file
  • PHP or server-side errors outputting text before the XML declaration

Fixes:

  1. Run the sitemap through an XML validator to identify the exact error location
  2. Escape special characters in URLs: & to &amp;, < to &lt;, > to &gt;, " to &quot;, ' to &apos;
  3. Ensure the file is saved as UTF-8 without BOM
  4. If the sitemap is dynamically generated, check for PHP warnings or errors that output text before the XML
  5. Verify the Content-Type header is text/xml or application/xml
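
If you generate the sitemap yourself, fixes 2 and 3 are easiest to apply in code. The sketch below uses Python's standard library; escape_loc and strip_utf8_bom are hypothetical helper names. Note that escape_loc expects a raw URL: running it on a URL that is already escaped would escape it twice.

```python
import codecs
from xml.sax.saxutils import escape


def escape_loc(url: str) -> str:
    """Entity-escape a raw URL so it is safe inside a <loc> element."""
    # escape() handles &, < and > by default; quotes are passed as extra entities.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(escape_loc("https://example.com/search?q=a&page=2"))
# https://example.com/search?q=a&amp;page=2
print(strip_utf8_bom(b"\xef\xbb\xbf<?xml version='1.0'?>"))
# b"<?xml version='1.0'?>"
```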

Redirects in sitemap

Symptoms: URLs in the sitemap return 301 or 302 status codes instead of 200.

Why this matters: While Google can follow redirects, including redirect URLs in your sitemap sends a confusing signal. You are telling Google "index this URL" while simultaneously telling it "this URL has moved." Google may process it correctly, but it wastes crawl budget and delays indexing of the actual destination.

Fixes:

  1. Replace redirect URLs with their final destination URLs
  2. Use Redirect Tracer to find the final destination for each redirecting URL
  3. After a site migration, regenerate the sitemap entirely rather than trying to update individual URLs
  4. If redirects are temporary (302), decide whether the original URL or the redirect target belongs in the sitemap

Blocked by robots.txt

Symptoms: URLs appear in the sitemap but Search Console shows them as "Blocked by robots.txt" in the Pages report.

Why this matters: Including a URL in your sitemap tells Google "please crawl and index this." Blocking it in robots.txt tells Google "do not crawl this." These are contradictory instructions, and Google will follow robots.txt (it will not crawl the page). [2]

Fixes:

  1. If the page should be indexed, remove the robots.txt block. See how to fix blocked by robots.txt.
  2. If the page should not be indexed, remove it from the sitemap
  3. Audit your sitemap against your robots.txt to find all conflicts. See robots.txt and sitemaps for how they should work together.
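
The audit in step 3 can be approximated with Python's built-in urllib.robotparser. Treat this as a rough sketch: blocked_urls is an illustrative helper, the robots.txt content is made up, and the standard library parser does not replicate every detail of how Google matches rules (wildcard patterns in particular), so confirm any findings in Search Console.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: Iterable[str], user_agent: str = "Googlebot") -> List[str]:
    """Return the sitemap URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"  # hypothetical robots.txt
sitemap_urls = [
    "https://example.com/",
    "https://example.com/private/report",
]
print(blocked_urls(robots, sitemap_urls))  # ['https://example.com/private/report']
```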

Over size limit

Symptoms: Google reports that the sitemap exceeds the allowed size.

Causes: The sitemap contains more than 50,000 URLs or the uncompressed file is larger than 50 MB.

Fixes:

  1. Split the sitemap into multiple files, each under the limits
  2. Create a sitemap index file that references all the individual sitemaps
  3. Use gzip compression (.xml.gz) to reduce transfer size. Google supports compressed sitemaps, but compression does not lift the limits: the 50 MB limit applies to the uncompressed size, and the 50,000-URL limit still applies
  4. Exceeding 50,000 URLs is normal for large sites. Multiple sitemaps referenced from a sitemap index are the standard approach.
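
Splitting and indexing can be sketched in a few lines of Python. The helper names and the sitemap-N.xml file naming are assumptions for illustration; the index here lists only <loc> for each child sitemap.

```python
from typing import List
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def chunk_urls(urls: List[str], size: int = MAX_URLS) -> List[List[str]]:
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemap_urls: List[str]) -> str:
    """Build a sitemap index document that references each child sitemap."""
    entries = "".join(
        f"  <sitemap><loc>{escape(url)}</loc></sitemap>\n" for url in child_sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


all_urls = [f"https://example.com/p{i}" for i in range(120_001)]
chunks = chunk_urls(all_urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20001]

# Hypothetical file naming: one child sitemap per chunk.
children = [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
print(build_sitemap_index(children))
```

Chunking by count alone does not guarantee the 50 MB limit, so very long URLs may call for a smaller chunk size.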

Duplicate URLs

Symptoms: The same URL appears multiple times in the sitemap, or URLs appear that differ only in trailing slash, protocol (http vs https), or www prefix.

Why this matters: Duplicate URLs waste your sitemap's 50,000-URL budget and can confuse search engines about which version is canonical.

Fixes:

  1. Deduplicate URLs in the sitemap
  2. Choose a canonical URL format (with or without trailing slash, www or non-www, always https) and only include canonical URLs
  3. Ensure your CMS is generating consistent URL formats
  4. See sitemap best practices for canonical URL conventions
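
Exact duplicates are easy to spot, but near-duplicates need normalization first. The following sketch groups URLs that differ only in protocol, www prefix, or trailing slash; variant_key and duplicate_groups are hypothetical names, and the normalization rules are an assumption you should adapt to your own canonical format.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def variant_key(url: str) -> Tuple[str, str, str]:
    """Reduce a URL to a key that ignores protocol, www prefix, and trailing slash."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    return (host, path, parts.query)


def duplicate_groups(urls: Iterable[str]) -> List[List[str]]:
    """Return groups of URLs that collapse to the same key (size 2 or more)."""
    groups = defaultdict(list)
    for url in urls:
        groups[variant_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


urls = [
    "https://example.com/about",
    "http://www.example.com/about/",
    "https://example.com/contact",
    "https://example.com/about",
]
print(duplicate_groups(urls))
```

Each returned group should be reduced to the single canonical URL you want indexed.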

The most common sitemap issue is not a formatting error. It is including URLs that should not be there: 404 pages, redirected URLs, noindex pages, and pages blocked by robots.txt. A sitemap should only contain URLs that you want indexed and that return a clean 200 response.

Noindex pages in sitemap

Symptoms: URLs in the sitemap have a noindex meta tag or X-Robots-Tag header.

Why this matters: Like the robots.txt conflict, this sends contradictory signals. The sitemap says "index this" while the page says "do not index this." Google will respect the noindex directive, but including these pages wastes crawl budget.

Fixes:

  1. Remove noindex pages from the sitemap
  2. If the page should be indexed, remove the noindex directive
  3. Audit your sitemap against your page-level directives regularly
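
For the audit in step 3, both places a noindex can hide need checking: the HTML and the response headers. Here is a minimal sketch that works on content you have already fetched; has_noindex is an illustrative helper, and the substring match on "noindex" is a simplification.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attributes.get("content", "").lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page's meta tags or its X-Robots-Tag header value contain noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    values = parser.directives + [x_robots_tag.lower()]
    return any("noindex" in value for value in values)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                                 # True
print(has_noindex("<html></html>", x_robots_tag="noindex"))  # True
print(has_noindex('<meta name="robots" content="index, follow">'))  # False
```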

Incorrect lastmod dates

Symptoms: The <lastmod> dates in the sitemap are wrong (all set to the same date, set to the current date on every generation, or using an invalid format).

Why this matters: Google uses <lastmod> as a hint about when to recrawl a page. If every page shows today's date, Google learns that your <lastmod> is unreliable and may ignore it entirely. [3]

Fixes:

  1. Set <lastmod> to the date the page content was actually last modified
  2. Use W3C Datetime format: a date such as 2026-05-05, or a full timestamp with a time zone such as 2026-05-05T14:30:00+00:00
  3. If you cannot determine accurate modification dates, omit <lastmod> entirely. No value is better than a wrong value.
  4. See sitemap priority and changefreq for how Google treats these optional elements
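
Both symptoms (invalid format, every page sharing one date) can be detected mechanically. The sketch below is an assumption-laden starting point: the pattern accepts a full date or a full timestamp with a time zone designator, it checks that the calendar date exists but not that the time of day is in range, and the function names are illustrative.

```python
import re
from collections import Counter
from datetime import date
from typing import List

W3C_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2}))?"
)


def is_valid_lastmod(value: str) -> bool:
    """Accept YYYY-MM-DD, optionally followed by a time with a time zone designator."""
    match = W3C_DATETIME.fullmatch(value)
    if not match:
        return False
    try:
        date(int(match[1]), int(match[2]), int(match[3]))  # rejects dates like 02-30
    except ValueError:
        return False
    return True


def share_of_most_common_date(lastmods: List[str]) -> float:
    """Fraction of entries sharing the single most common calendar date."""
    days = [value[:10] for value in lastmods]
    if not days:
        return 0.0
    return Counter(days).most_common(1)[0][1] / len(days)


print(is_valid_lastmod("2026-05-05"))                 # True
print(is_valid_lastmod("2026-05-05T14:30:00+00:00"))  # True
print(is_valid_lastmod("05/05/2026"))                 # False
print(share_of_most_common_date(["2026-05-05"] * 3 + ["2026-01-01"]))  # 0.75
```

A share close to 1.0 on a large site suggests the generator is stamping every page with the build date.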

Platform-specific validation

WordPress sitemaps

WordPress has had built-in sitemap generation since version 5.5. Popular SEO plugins (Yoast, Rank Math, All in One SEO) also generate sitemaps.

Common WordPress sitemap issues:

  • Plugin conflicts: two plugins generating sitemaps at different URLs
  • Caching plugins serving stale sitemaps
  • Permalink structure changes breaking sitemap URLs
  • Post type inclusion/exclusion settings not matching your indexing strategy

See the WordPress sitemap guide for configuration details.

Shopify sitemaps

Shopify automatically generates sitemaps at /sitemap.xml. You have limited control over the content.

Common Shopify sitemap issues:

  • Including pages you do not want indexed (admin-generated pages, filtered collection pages)
  • Sitemap not updating after product changes
  • Duplicate product URLs from multiple collections

See the Shopify sitemap guide for platform-specific fixes.

Dynamic sitemaps

Sites built on frameworks like Next.js or Gatsby, or on a custom CMS, typically generate sitemaps programmatically, either at build time or on each request.

Common dynamic sitemap issues:

  • Server errors during generation causing incomplete sitemaps
  • Memory issues with very large sitemaps
  • Race conditions where the sitemap is generated before all pages are built
  • Staging or development URLs leaking into production sitemaps

See the dynamic sitemaps guide and dynamic sitemaps in Next.js for framework-specific approaches.

Ongoing sitemap monitoring

Validation is not a one-time task. Sitemaps change as your site changes, and each change can introduce errors.

What to monitor

  • Sitemap accessibility. Can Google (and your monitoring tool) still fetch the sitemap? Server changes, robots.txt updates, and CDN configuration changes can block access.
  • URL count trends. A sudden drop in the number of URLs might indicate a generation error. A sudden spike might indicate duplicate or unwanted URLs being added.
  • Error rates. Track the percentage of URLs in your sitemap that return non-200 status codes.
  • Index coverage. Monitor the gap between "Discovered" and "Indexed" URLs in Search Console.
  • Freshness. For sitemaps that should update regularly, verify that <lastmod> dates are changing.
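
Two of these metrics, URL count trend and error rate, reduce to simple arithmetic once you have the status code for each URL. A small sketch follows, assuming a hypothetical {url: status_code} mapping produced by whatever crawler or script you already run.

```python
from typing import Dict


def snapshot_report(status_by_url: Dict[str, int], previous_count: int) -> dict:
    """Summarize one monitoring run from a {url: status_code} mapping."""
    total = len(status_by_url)
    errors = sum(1 for code in status_by_url.values() if code != 200)
    return {
        "url_count": total,
        # Share of sitemap URLs that did not return a clean 200.
        "error_rate": errors / total if total else 0.0,
        # Relative change versus the previous run (negative means URLs disappeared).
        "count_change": (total - previous_count) / previous_count if previous_count else 0.0,
    }


statuses = {
    "https://example.com/": 200,
    "https://example.com/about": 200,
    "https://example.com/old-page": 404,
    "https://example.com/moved": 301,
}
print(snapshot_report(statuses, previous_count=5))
# {'url_count': 4, 'error_rate': 0.5, 'count_change': -0.2}
```

Storing one such report per run gives you the trend line that makes sudden drops or spikes visible.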

Monitoring cadence

  • Weekly: Check Search Console for sitemap errors and index coverage trends
  • After deployments: Validate the sitemap after any code deployment that might affect URL structure or sitemap generation
  • After content changes: Large content additions or removals should trigger a sitemap review
  • Monthly: Full audit comparing sitemap URLs against actual site URLs to find gaps and stale entries

Integration with other monitoring

Your sitemap health is part of a broader SEO and site health picture. Coordinate sitemap monitoring with:

  • Redirect monitoring to catch redirecting URLs that should be removed from the sitemap
  • robots.txt monitoring to catch new blocks that conflict with sitemap URLs
  • Uptime monitoring to detect server issues that prevent sitemap fetching

Sitemap validation checklist

Use this checklist when validating or auditing your sitemap:

Format:

  • [ ] Valid XML (well-formed, passes xmllint)
  • [ ] Correct namespace declaration
  • [ ] UTF-8 encoding without BOM
  • [ ] Content-Type header is text/xml or application/xml
  • [ ] Under 50,000 URLs per file
  • [ ] Under 50 MB uncompressed per file

URLs:

  • [ ] All URLs return 200 status codes
  • [ ] No redirect URLs (301, 302)
  • [ ] No 404 or 410 URLs
  • [ ] All URLs use canonical format (consistent protocol, www, trailing slash)
  • [ ] No duplicate URLs
  • [ ] No URLs blocked by robots.txt
  • [ ] No URLs with noindex directive

Metadata:

  • [ ] lastmod dates are accurate (or omitted)
  • [ ] lastmod uses W3C Datetime format
  • [ ] changefreq and priority are reasonable (or omitted)

Accessibility:

  • [ ] Sitemap URL is accessible to Googlebot
  • [ ] Sitemap is referenced in robots.txt (Sitemap: directive)
  • [ ] Sitemap is submitted in Google Search Console
  • [ ] Sitemap is submitted in Bing Webmaster Tools

For sitemap indexes:

  • [ ] Index file references all child sitemaps
  • [ ] All child sitemap URLs are accessible
  • [ ] No child sitemap exceeds size limits

References

  1. sitemaps.org, "Sitemaps XML Format," https://www.sitemaps.org/protocol.html
  2. Google Search Central, "Learn about sitemaps," https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview
  3. Google Search Central, "Build and submit a sitemap," https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap
  4. Google Search Central, "Manage your sitemaps using the Sitemaps report," https://support.google.com/webmasters/answer/7451001
  5. Bing Webmaster Tools, "Bing Webmaster Guidelines: Sitemaps," https://www.bing.com/webmasters/help/sitemaps-3b5cf6ed
  6. W3C, "Date and Time Formats," https://www.w3.org/TR/NOTE-datetime
  7. Google Search Central, "Ask Googlebot: Sitemaps," YouTube, Google Search Central channel. https://www.youtube.com/googleSearchCentral
