How to Check Your Sitemap

Step-by-step guide to checking your XML sitemap: browser inspection, validation tools, Google Search Console status, and fixing common issues like 404 URLs and encoding problems.

Your sitemap is the roadmap you hand to search engines. If it's broken, outdated, or full of errors, Google is working with bad directions. The result: pages that should be indexed aren't, and pages that shouldn't be indexed are.

Here's how to check your sitemap properly -- from a quick browser test to a full validation audit.

Start with the Basics: Can You Access It?

Before you check the contents, confirm the sitemap actually exists and loads.

1. Try the default URL

Open your browser and go to https://yoursite.com/sitemap.xml. Most sites use this standard location. If that doesn't work, try https://yoursite.com/sitemap_index.xml or https://yoursite.com/sitemap/.

2. Check your robots.txt

Open https://yoursite.com/robots.txt and look for a Sitemap: directive. This tells search engines where to find your sitemap, and it tells you too. If there's no Sitemap: line in robots.txt and you haven't submitted the sitemap directly (for example, in Google Search Console), search engines might not know it exists.

3. Confirm it loads without errors

The sitemap should display as structured XML in your browser. If you see a 404, 500, or blank page, the sitemap isn't being served correctly. If it loads but looks like a wall of unformatted text, that's usually fine: how a browser renders the file says nothing about whether the XML is valid, which is what the next sections check.
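
As a rough sketch of step 2, the Python below pulls Sitemap: lines out of a robots.txt body. The function name find_sitemap_directives and the sample text are illustrative assumptions, not part of any standard tool. Fetching the file is left out so the example runs offline.

```python
def find_sitemap_directives(robots_txt: str) -> list[str]:
    """Return the URLs named in Sitemap: lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, because the URL itself contains colons.
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for illustration.
sample = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # secondary
"""
print(find_sitemap_directives(sample))
```

An empty list means robots.txt does not advertise a sitemap at all.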

Sitemap index files

Large sites often use a sitemap index file that references multiple individual sitemaps. If you see <sitemapindex> as the root element instead of <urlset>, you're looking at an index file. Each <sitemap> entry inside it points to a separate sitemap you'll want to check individually.
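
One way to tell the two file types apart programmatically is to look at the root element. The sketch below uses Python's standard xml.etree.ElementTree. The function name describe_sitemap and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace}tag".
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def describe_sitemap(xml_bytes: bytes) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", <loc> values) for a sitemap document."""
    root = ET.fromstring(xml_bytes)
    if root.tag == NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    # Collect the <loc> of each direct <sitemap> or <url> entry.
    locs = [
        (loc.text or "").strip()
        for loc in root.iterfind(f"{NS}{child}/{NS}loc")
    ]
    return kind, locs


# Hypothetical index file for illustration.
sample_index = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = describe_sitemap(sample_index)
print(kind, locs)
```

When the result is "index", each returned URL is itself a sitemap that needs the same checks.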

Check the XML Structure

A sitemap that loads isn't necessarily valid. Malformed XML can cause search engines to reject the file entirely or to stop reading at the first error.

What to look for:

  • Proper XML declaration: The file should start with <?xml version="1.0" encoding="UTF-8"?>.
  • Correct namespace: The <urlset> element should include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9".
  • Well-formed XML: Every opening tag needs a closing tag. No unclosed elements, no unescaped special characters.
  • Valid encoding: Special characters like &, <, and > must be entity-escaped (&amp;, &lt;, &gt;). The sitemaps.org protocol also requires escaping single and double quotes (&apos;, &quot;). URLs with query parameters are a common source of encoding errors.

Example of a common encoding error:

<!-- Wrong: unescaped ampersand -->
<loc>https://example.com/page?id=1&lang=en</loc>

<!-- Correct: ampersand escaped -->
<loc>https://example.com/page?id=1&amp;lang=en</loc>
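
The same error can be caught with any XML parser. This is a minimal sketch using Python's standard library. The function name well_formed_error is an assumption for illustration, and escape comes from xml.sax.saxutils.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape


def well_formed_error(xml_text: str) -> Optional[str]:
    """Return None if the text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


raw_url = "https://example.com/page?id=1&lang=en"

bad = f"<loc>{raw_url}</loc>"           # ampersand left as-is
good = f"<loc>{escape(raw_url)}</loc>"  # escape() turns & into &amp;

print(well_formed_error(bad))   # a parser error message
print(well_formed_error(good))  # None
```

Escaping at generation time, rather than patching the file afterwards, keeps the problem from coming back.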

Validate with a Tool

Manual inspection catches obvious problems, but you need a validator for a thorough check. A good sitemap validator will test XML syntax, URL format, response codes, and protocol compliance all at once.

XML syntax validation

Catches malformed XML, missing closing tags, and encoding errors that make the entire sitemap unreadable.

URL format checking

Verifies that every <loc> entry contains a properly formatted, absolute URL with the correct protocol.
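
A format check of this kind might look like the following sketch. The function name loc_problems is hypothetical. The 2,048-character ceiling comes from the sitemaps.org protocol, and the rest is a simplification of what a full validator does.

```python
from urllib.parse import urlsplit


def loc_problems(loc: str) -> list[str]:
    """Return a list of format problems found in one <loc> value."""
    problems = []
    parts = urlsplit(loc.strip())
    if parts.scheme not in ("http", "https"):
        problems.append("scheme is not http or https")
    if not parts.netloc:
        problems.append("no host, so the URL is not absolute")
    # The protocol requires <loc> values shorter than 2,048 characters.
    if len(loc) >= 2048:
        problems.append("2,048 characters or longer")
    return problems


print(loc_problems("https://example.com/about"))  # no problems
print(loc_problems("/about"))                     # relative URL
```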

HTTP status checking

Fetches each URL and flags 404s, 301 redirects, 500 errors, and other non-200 responses that shouldn't be in your sitemap.
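
A status check can be sketched with the standard library alone. Everything named here (head_status, flag_non_200, the sample URLs) is an assumption for illustration. The sketch sends HEAD requests, which some servers handle differently from GET, and it does not catch connection failures. Redirects are deliberately not followed so that 301s show up as 301s.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None stops urllib from following the redirect,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code


def flag_non_200(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = head_status,
) -> dict[str, int]:
    """Map each URL that does not return 200 to the status it returned."""
    flagged = {}
    for url in urls:
        status = fetch_status(url)
        if status != 200:
            flagged[url] = status
    return flagged


# Offline demonstration with a stand-in for real network calls.
fake_statuses = {
    "https://example.com/": 200,
    "https://example.com/old-page": 301,
    "https://example.com/deleted": 404,
}
print(flag_non_200(fake_statuses, fake_statuses.get))
```

Passing the fetch function in as a parameter keeps the flagging logic testable without touching the network. Drop the second argument to check live URLs.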

Size and count limits

Confirms your sitemap stays under the 50,000 URL limit and the 50MB uncompressed file size limit defined in the protocol.
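
Both limits are easy to test yourself. In this sketch the function name limit_problems and its max_urls and max_bytes parameters are hypothetical. The defaults mirror the limits quoted above, and the size is measured on the uncompressed bytes.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed


def limit_problems(xml_bytes: bytes, max_urls: int = MAX_URLS,
                   max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a message for each protocol limit the sitemap exceeds."""
    problems = []
    if len(xml_bytes) > max_bytes:
        problems.append(
            f"file is {len(xml_bytes)} bytes, over the {max_bytes}-byte limit"
        )
    root = ET.fromstring(xml_bytes)
    # Count the direct <url> entries under <urlset>.
    url_count = len(root.findall(NS + "url"))
    if url_count > max_urls:
        problems.append(f"{url_count} URLs, over the {max_urls}-URL limit")
    return problems


# Hypothetical three-URL sitemap for illustration.
small = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url>"
    b"<url><loc>https://example.com/a</loc></url>"
    b"<url><loc>https://example.com/b</loc></url>"
    b"</urlset>"
)
print(limit_problems(small))
```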

Protocol compliance

Checks that your sitemap follows the sitemaps.org protocol, including valid <lastmod> date formats and proper XML namespaces.

Check Google Search Console

Google Search Console shows you how Google actually sees your sitemap -- not just whether it's valid XML, but whether Google has processed it and what it found.

1. Open the Sitemaps report

In Google Search Console, go to Indexing > Sitemaps. You'll see a list of sitemaps Google knows about for your property.

2. Check the status

Each sitemap shows a status: Success, Has errors, or Couldn't fetch. "Success" means Google was able to read the sitemap. It does not mean every URL is indexed.

3. Review discovered URLs

The "Discovered URLs" count tells you how many URLs Google found in the sitemap. Compare this to the number you expect. If Google found 50 URLs but your sitemap has 500, something is wrong -- possibly a parsing error that caused Google to stop reading partway through.

4. Cross-reference with the Pages report

Go to Indexing > Pages and filter by sitemap. This shows you which sitemap URLs are indexed, which are excluded, and why. Common exclusions include "Duplicate without user-selected canonical," "Crawled - currently not indexed," and "Page with redirect."

Common Issues and How to Spot Them

404 URLs in the Sitemap

Your sitemap should only contain URLs that return a 200 status code. If pages have been deleted or moved, their old URLs need to come out of the sitemap. A validator will flag these, but you can also spot them by checking the Pages report in Search Console for "Not found (404)" errors.

Redirect URLs

URLs in your sitemap should point to the final destination, not to a URL that redirects. If https://example.com/old-page redirects to https://example.com/new-page, the sitemap should contain the new URL. Search engines will follow the redirect, but it wastes crawl budget and signals a poorly maintained sitemap.

Non-Canonical URLs

Every URL in your sitemap should be the canonical version. If a page has a <link rel="canonical"> tag pointing to a different URL, the non-canonical URL shouldn't be in the sitemap. This is one of the most common sitemap mistakes and it confuses search engines about which version to index.
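
A canonical comparison can be sketched with Python's built-in HTML parser. The names canonical_of and is_canonical are hypothetical. The sketch assumes a page with no canonical tag counts as self-canonical, and it compares URLs as exact strings, so a trailing-slash difference is reported as a mismatch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def canonical_of(html: str) -> Optional[str]:
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_canonical(sitemap_url: str, html: str) -> bool:
    """True if the page's canonical tag is absent or matches sitemap_url."""
    canonical = canonical_of(html)
    if canonical is None:
        return True  # assumption: no tag means self-canonical
    # Resolve relative hrefs against the page URL before comparing.
    return urljoin(sitemap_url, canonical) == sitemap_url


page = '<html><head><link rel="canonical" href="https://example.com/new-page"></head></html>'
print(is_canonical("https://example.com/new-page", page))        # True
print(is_canonical("https://example.com/new-page?ref=1", page))  # False
```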

Mixed Protocols

If your site uses HTTPS, every URL in your sitemap should use HTTPS. Mixing http:// and https:// URLs is a red flag. This usually happens when the sitemap was generated before a site migrated to HTTPS and wasn't updated.
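
Spotting stray http:// entries takes only a few lines. The function name non_https_urls and the sample list below are illustrative assumptions.

```python
from urllib.parse import urlsplit


def non_https_urls(locs: list[str]) -> list[str]:
    """Return the sitemap URLs whose scheme is anything other than https."""
    return [url for url in locs if urlsplit(url).scheme != "https"]


# Hypothetical <loc> values for illustration.
locs = [
    "https://example.com/",
    "http://example.com/about",
    "https://example.com/contact",
]
print(non_https_urls(locs))
```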

Invalid Date Formats

The <lastmod> element should use W3C Datetime format. Valid formats include 2025-01-15 (date only) or 2025-01-15T10:30:00+00:00 (full datetime with timezone). Dates like January 15, 2025 or 01/15/2025 don't conform, and search engines are likely to ignore them.
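
A <lastmod> check can be approximated with a regular expression plus a calendar sanity check. This is a sketch: the function name is_valid_lastmod is hypothetical, and the pattern follows the W3C Datetime note, which requires a timezone designator (Z or an offset) whenever a time is present.

```python
import re
from datetime import date

_W3C_DATETIME = re.compile(
    r"(?P<y>\d{4})"
    r"(?:-(?P<m>\d{2})"
    r"(?:-(?P<d>\d{2})"
    r"(?:T(?P<H>\d{2}):(?P<M>\d{2})(?::(?P<S>\d{2})(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2}))?"
    r")?"
    r")?"
)


def is_valid_lastmod(value: str) -> bool:
    """True if value looks like a W3C Datetime with a real calendar date."""
    match = _W3C_DATETIME.fullmatch(value)
    if not match:
        return False
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    try:
        # Rejects impossible dates such as 2025-02-30.
        date(parts["y"], parts.get("m", 1), parts.get("d", 1))
    except ValueError:
        return False
    return (
        parts.get("H", 0) < 24
        and parts.get("M", 0) < 60
        and parts.get("S", 0) < 60
    )


for value in ("2025-01-15", "2025-01-15T10:30:00+00:00",
              "January 15, 2025", "01/15/2025"):
    print(value, is_valid_lastmod(value))
```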

Checking Sitemaps for Large Sites

If your site has thousands of pages, manual checking isn't practical. Here's a more systematic approach:

Check              | Small Sites (<100 URLs)    | Large Sites (1,000+ URLs)
XML validation     | Browser + online validator | Automated validator tool
URL status codes   | Spot-check manually        | Crawl all URLs with a tool
Canonical matching | Manual review              | Script comparison against canonical tags
Freshness          | Compare to CMS             | Monitor lastmod dates automatically
Ongoing monitoring | Monthly manual check       | Automated validation on schedule

For large sites, one-off checks aren't enough. You need a process that catches sitemap problems before search engines do. That means either scripting your own checks or using a monitoring tool that validates your sitemap regularly.

Quick Checklist

Before you move on, run through this list:

  • The sitemap loads at its expected URL and returns a 200 status
  • The Sitemap: directive in robots.txt points to the correct URL
  • The XML is well-formed with no syntax errors
  • All URLs use HTTPS (if your site is HTTPS)
  • No URLs return 404, 500, or 3xx status codes
  • All URLs are the canonical version
  • <lastmod> dates use valid W3C format
  • The sitemap has fewer than 50,000 URLs and is under 50MB
  • Google Search Console shows "Success" status
  • The discovered URL count matches your expectations

A sitemap that passes all of these checks is doing its job. One that fails any of them may be hurting your site's ability to get indexed.


A sitemap that hasn't been checked is a sitemap you're hoping works.