How to Check Your Sitemap
Step-by-step guide to checking your XML sitemap: browser inspection, validation tools, Google Search Console status, and fixing common issues like 404 URLs and encoding problems.
Your sitemap is the roadmap you hand to search engines. If it's broken, outdated, or full of errors, Google is working with bad directions. The result: pages that should be indexed aren't, and pages that shouldn't be indexed are.
Here's how to check your sitemap properly -- from a quick browser test to a full validation audit.
Start with the Basics: Can You Access It?
Before you check the contents, confirm the sitemap actually exists and loads.
Try the default URL
Open your browser and go to https://yoursite.com/sitemap.xml. Most sites use this standard location. If that doesn't work, try https://yoursite.com/sitemap_index.xml or https://yoursite.com/sitemap/.
Check your robots.txt
Open https://yoursite.com/robots.txt and look for a Sitemap: directive. This tells search engines where to find your sitemap, and it tells you too. If there's no sitemap line in robots.txt, search engines might not know it exists.
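If you would rather script this lookup than eyeball it, the following is a minimal sketch that pulls Sitemap: lines out of a robots.txt body you have already downloaded. The function name find_sitemap_urls and the sample content are illustrative assumptions, not part of any tool mentioned here.

```python
def find_sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed in Sitemap: lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content used only for illustration.
sample = """User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml
sitemap: https://yoursite.com/news-sitemap.xml
"""

print(find_sitemap_urls(sample))
```

The directive name is matched case-insensitively, and an empty result means robots.txt does not advertise a sitemap at all.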
Confirm it loads without errors
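The sitemap should display as structured XML in your browser. If you see a 404, 500, or blank page, the sitemap isn't being served correctly. If it loads but looks like a wall of unformatted text, that's fine on its own -- the browser just isn't pretty-printing it, and the structure check in the next section will tell you whether the XML is actually valid.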
Sitemap index files
Large sites often use a sitemap index file that references multiple individual sitemaps. If you see <sitemapindex> as the root element instead of <urlset>, you're looking at an index file. Each <sitemap> entry inside it points to a separate sitemap you'll want to check individually.
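As a rough sketch of how that distinction can be made in code, the snippet below uses Python's standard xml.etree.ElementTree to report whether a document is an index or a regular sitemap and to collect its <loc> values. The helper name list_locs and the sample index are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced tags as "{namespace}tag".
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def list_locs(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", list of <loc> values)."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    locs = [
        (entry.findtext(NS + "loc") or "").strip()
        for entry in root.findall(NS + child)
    ]
    return kind, locs


# Hypothetical index file used only for illustration.
sample_index = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = list_locs(sample_index)
print(kind, locs)
```

When the result is "index", each returned location is itself a sitemap that needs the same checks.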
Check the XML Structure
A sitemap that loads isn't necessarily valid. Invalid XML can cause search engines to reject the file or stop reading it partway through.
What to look for:
- Proper XML declaration: The file should start with <?xml version="1.0" encoding="UTF-8"?>.
- Correct namespace: The <urlset> element should include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9".
- Well-formed XML: Every opening tag needs a closing tag. No unclosed elements, no unescaped special characters.
- Valid encoding: Special characters like &, <, and > must be XML-encoded (&amp;, &lt;, &gt;). URLs with query parameters are a common source of encoding errors.
Example of a common encoding error:
<!-- Wrong: unescaped ampersand -->
<loc>https://example.com/page?id=1&lang=en</loc>
<!-- Correct: ampersand escaped -->
<loc>https://example.com/page?id=1&amp;lang=en</loc>
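If you generate the sitemap yourself, one way to avoid this class of error is to escape every URL before writing it. The sketch below uses xml.sax.saxutils.escape from the Python standard library and then confirms with ElementTree that the escaped form parses while the raw form does not. The helper name is_well_formed is an assumption for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_url = "https://example.com/page?id=1&lang=en"
# escape() replaces & with &amp;, < with &lt;, and > with &gt;.
escaped_url = escape(raw_url)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


bad_entry = f"<loc>{raw_url}</loc>"
good_entry = f"<loc>{escaped_url}</loc>"

print(escaped_url)
print(is_well_formed(bad_entry), is_well_formed(good_entry))
```

Parsing the escaped entry gives back the original URL, so the escaping changes only how the URL is written in the file, not which URL it refers to.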
Validate with a Tool
Manual inspection catches obvious problems, but you need a validator for a thorough check. A good sitemap validator will test XML syntax, URL format, response codes, and protocol compliance all at once.
- XML syntax validation
- URL format checking: each <loc> entry contains a properly formatted, absolute URL with the correct protocol.
- HTTP status checking
- Size and count limits
- Protocol compliance: valid <lastmod> date formats and proper XML namespaces.
Check Google Search Console
Google Search Console shows you how Google actually sees your sitemap -- not just whether it's valid XML, but whether Google has processed it and what it found.
Open the Sitemaps report
In Google Search Console, go to Indexing > Sitemaps. You'll see a list of sitemaps Google knows about for your property.
Check the status
Each sitemap shows a status: Success, Has errors, or Couldn't fetch. "Success" means Google was able to read the sitemap. It does not mean every URL is indexed.
Review discovered URLs
The "Discovered URLs" count tells you how many URLs Google found in the sitemap. Compare this to the number you expect. If Google found 50 URLs but your sitemap has 500, something is wrong -- possibly a parsing error that caused Google to stop reading partway through.
Cross-reference with the Pages report
Go to Indexing > Pages and filter by sitemap. This shows you which sitemap URLs are indexed, which are excluded, and why. Common exclusions include "Duplicate without user-selected canonical," "Crawled - currently not indexed," and "Page with redirect."
Common Issues and How to Spot Them
404 URLs in the Sitemap
Your sitemap should only contain URLs that return a 200 status code. If pages have been deleted or moved, their old URLs need to come out of the sitemap. A validator will flag these, but you can also spot them by checking the Pages report in Search Console for "Not found (404)" errors.
Redirect URLs
URLs in your sitemap should point to the final destination, not to a URL that redirects. If https://example.com/old-page redirects to https://example.com/new-page, the sitemap should contain the new URL. Search engines will follow the redirect, but it wastes crawl budget and signals a poorly maintained sitemap.
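Both the 404 check and the redirect check come down to looking at the raw status code of each URL without following redirects. The following is a minimal sketch using only Python's standard urllib. The names NoRedirect, fetch_status, classify, and audit are illustrative assumptions, and the demonstration at the bottom uses canned status codes rather than real requests.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the raw status code (needs network)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 3xx (when not followed), 4xx, and 5xx responses.
        return err.code


def classify(status: int) -> str:
    """Map a status code to a coarse label for sitemap auditing."""
    if status == 200:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status in (404, 410):
        return "not found"
    return "error"


def audit(urls, fetch=fetch_status):
    """Return {url: label} for every URL that does not answer 200."""
    report = {}
    for url in urls:
        label = classify(fetch(url))
        if label != "ok":
            report[url] = label
    return report


# Offline demonstration: canned status codes stand in for real requests.
canned = {
    "https://example.com/": 200,
    "https://example.com/old-page": 301,
    "https://example.com/deleted": 404,
}
print(audit(canned, fetch=canned.get))
```

Passing the fetch function as a parameter keeps the classification logic testable without network access. Some servers answer HEAD differently from GET, and connection failures are not handled here, so treat the results as a first pass.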
Non-Canonical URLs
Every URL in your sitemap should be the canonical version. If a page has a <link rel="canonical"> tag pointing to a different URL, the non-canonical URL shouldn't be in the sitemap. This is one of the most common sitemap mistakes and it confuses search engines about which version to index.
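A canonical comparison can be sketched with Python's built-in html.parser: extract the canonical link from a page's HTML and compare it with the URL listed in the sitemap. The names CanonicalFinder, canonical_of, and is_canonical are assumptions for illustration, and the comparison is an exact string match, so a trailing-slash difference counts as a mismatch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"].strip()


def canonical_of(page_url: str, html: str):
    """Return the page's canonical URL as an absolute URL, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative hrefs against the page's own URL.
    return urljoin(page_url, finder.canonical)


def is_canonical(sitemap_url: str, html: str) -> bool:
    """True if the page declares no canonical or declares itself."""
    canonical = canonical_of(sitemap_url, html)
    return canonical is None or canonical == sitemap_url


# Hypothetical page markup used only for illustration.
page = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'
print(is_canonical("https://example.com/shoes", page))
print(is_canonical("https://example.com/shoes?color=red", page))
```

Any sitemap URL for which is_canonical returns False is a candidate for replacement with the canonical it points to.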
Mixed Protocols
If your site uses HTTPS, every URL in your sitemap should use HTTPS. Mixing http:// and https:// URLs is a red flag. This usually happens when the sitemap was generated before a site migrated to HTTPS and wasn't updated.
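Once the <loc> values are in a list, spotting mixed protocols takes a few lines. This is a small sketch using urllib.parse.urlsplit. The function name non_https_urls and the sample list are illustrative assumptions.

```python
from urllib.parse import urlsplit


def non_https_urls(urls):
    """Return the URLs whose scheme is anything other than https."""
    return [url for url in urls if urlsplit(url).scheme.lower() != "https"]


# Hypothetical <loc> values used only for illustration.
locs = [
    "https://example.com/",
    "http://example.com/about",
    "https://example.com/contact",
]
print(non_https_urls(locs))
```

An empty list means every entry already uses HTTPS.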
Invalid Date Formats
The <lastmod> element should use W3C Datetime format. Valid formats include 2025-01-15 (date only) or 2025-01-15T10:30:00+00:00 (full datetime with timezone). If you see dates like January 15, 2025 or 01/15/2025, they're invalid and will be ignored.
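The format rule can be approximated with a regular expression. The sketch below checks only the shape of the value (year, optional month and day, optional time with a required timezone designator). It does not reject impossible dates such as month 13. The name is_valid_lastmod is an assumption for illustration.

```python
import re

# Shapes allowed by the W3C Datetime profile: YYYY, YYYY-MM, YYYY-MM-DD,
# or a timestamp that ends in a timezone designator (Z, +hh:mm, or -hh:mm).
W3C_DATETIME = re.compile(
    r"\d{4}"
    r"(?:-\d{2}"
    r"(?:-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2}))?"
    r")?)?"
)


def is_valid_lastmod(value: str) -> bool:
    """Return True if the value has a W3C Datetime shape."""
    return W3C_DATETIME.fullmatch(value.strip()) is not None


for candidate in [
    "2025-01-15",
    "2025-01-15T10:30:00+00:00",
    "January 15, 2025",
    "01/15/2025",
]:
    print(candidate, is_valid_lastmod(candidate))
```

Note that a timestamp without a timezone, such as 2025-01-15T10:30:00, fails this check because the format requires a timezone designator whenever a time is present.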
Checking Sitemaps for Large Sites
If your site has thousands of pages, manual checking isn't practical. Here's a more systematic approach:
| Check | Small Sites (<100 URLs) | Large Sites (1,000+ URLs) |
|---|---|---|
| XML validation | Browser + online validator | Automated validator tool |
| URL status codes | Spot-check manually | Crawl all URLs with a tool |
| Canonical matching | Manual review | Script comparison against canonical tags |
| Freshness | Compare to CMS | Monitor lastmod dates automatically |
| Ongoing monitoring | Monthly manual check | Automated validation on schedule |
For large sites, one-off checks aren't enough. You need a process that catches sitemap problems before search engines do. That means either scripting your own checks or using a monitoring tool that validates your sitemap regularly.
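As one example of a check worth scripting, the sketch below tests a sitemap against the protocol's limits of 50,000 URLs and 50MB uncompressed per file. The function name check_limits and the tiny sample sitemap are assumptions for illustration. The limits are parameters so the logic can be exercised on small inputs.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file


def check_limits(xml_bytes: bytes, max_urls: int = MAX_URLS,
                 max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(xml_bytes) > max_bytes:
        problems.append(f"file is {len(xml_bytes)} bytes, limit is {max_bytes}")
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(NS + "url"))
    if url_count > max_urls:
        problems.append(f"file lists {url_count} URLs, limit is {max_urls}")
    return problems


# Hypothetical three-URL sitemap used only for illustration.
entries = "".join(
    f"<url><loc>https://example.com/page-{i}</loc></url>" for i in range(3)
)
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"{entries}</urlset>"
).encode("utf-8")

print(check_limits(sample))
print(check_limits(sample, max_urls=2))
```

Run on a schedule against the live file, a check like this flags a sitemap that is approaching the point where it needs to be split into an index.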
Quick Checklist
Before you move on, run through this list:
- The sitemap loads at its expected URL and returns a 200 status
- The Sitemap: directive in robots.txt points to the correct URL
- The XML is well-formed with no syntax errors
- All URLs use HTTPS (if your site is HTTPS)
- No URLs return 404, 500, or 3xx status codes
- All URLs are the canonical version
- <lastmod> dates use valid W3C format
- The sitemap has no more than 50,000 URLs and is no larger than 50MB uncompressed
- Google Search Console shows "Success" status
- The discovered URL count matches your expectations
A sitemap that passes all of these checks is doing its job. One that fails any of them is actively hurting your site's ability to get indexed.
A sitemap that hasn't been checked is a sitemap you're hoping works.