Sitemap Best Practices
XML sitemap best practices for SEO: canonical URLs only, size limits, accurate lastmod dates, sitemap indexes, excluding noindex pages, and ongoing monitoring.
A sitemap is a simple concept -- a list of URLs you want search engines to know about. But simple doesn't mean hard to get wrong. A well-maintained sitemap helps search engines discover and index your content efficiently. A poorly maintained one wastes crawl budget, sends mixed signals, and can actively harm your SEO.
These are the practices that actually matter, based on how search engines use sitemaps in practice.
Only Include Canonical URLs
This is the single most important rule. Every URL in your sitemap should be the canonical version of that page.
If a page at https://example.com/products/shoes?ref=homepage has a canonical tag pointing to https://example.com/products/shoes, the parameterized URL should not be in the sitemap -- only the canonical version should be. When both variants declare https://example.com/products/shoes as canonical, that's the URL that belongs in the sitemap.
Why it matters: When your sitemap includes non-canonical URLs, you're telling Google "index this URL" while simultaneously telling Google (via the canonical tag) "actually, index this other URL instead." These conflicting signals confuse crawlers and can delay indexing of the correct page.
How to check: Compare your sitemap URLs against the canonical tags on each page. If they don't match, the sitemap is wrong.
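The comparison can be sketched in a few lines of Python. This is a minimal illustration, not a full crawler: the names `CanonicalParser` and `canonical_mismatch` are hypothetical, and it assumes you have already fetched each page's HTML by whatever means you prefer.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def canonical_mismatch(sitemap_url, html):
    """Return the page's canonical URL if it differs from sitemap_url, else None."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is not None and parser.canonical != sitemap_url:
        return parser.canonical
    return None
```

A non-None result means the sitemap entry should be replaced with the returned URL. Pages with no canonical tag at all return None here, so you may want to report those separately.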
URL variations to watch for
Common sources of non-canonical URLs in sitemaps: trailing slash vs. no trailing slash (/page/ vs /page), www vs. non-www, HTTP vs. HTTPS, uppercase vs. lowercase paths, and URLs with tracking parameters. Pick one version, set it as canonical, and use only that version in the sitemap.
Keep Under 50,000 URLs Per File
The sitemap protocol allows a maximum of 50,000 URLs per sitemap file and a maximum uncompressed file size of 50 MB (52,428,800 bytes). But just because you can have 50,000 URLs in one file doesn't mean you should.
Better approach: Split your sitemap into logical groups:
- sitemap-pages.xml -- Static pages
- sitemap-posts.xml -- Blog posts
- sitemap-products.xml -- Product pages
- sitemap-categories.xml -- Category and tag pages
Then reference them all from a sitemap index file. This makes your sitemaps easier to manage, easier to debug, and easier for search engines to process incrementally.
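One way to do the grouping is sketched below. The function name `split_into_sitemaps`, the `sitemap-<group>.xml` naming scheme, and the numeric suffix for overflow files are assumptions made for illustration; only the 50,000-URL limit comes from the protocol.

```python
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file


def split_into_sitemaps(urls_by_group, max_urls=MAX_URLS_PER_FILE):
    """Map hypothetical file names to URL lists, each within max_urls."""
    files = {}
    for group, urls in urls_by_group.items():
        chunks = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
        for index, chunk in enumerate(chunks, start=1):
            # Only add a numeric suffix when a group needs more than one file.
            suffix = f"-{index}" if len(chunks) > 1 else ""
            files[f"sitemap-{group}{suffix}.xml"] = chunk
    return files
```

Passing a smaller `max_urls` than the protocol limit keeps individual files small, which fits the advice above about not filling a file to the maximum.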
Use a Sitemap Index for Large Sites
If your site has more than a few hundred URLs, use a sitemap index file. A sitemap index references multiple individual sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2025-06-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2025-06-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2025-06-15</lastmod>
  </sitemap>
</sitemapindex>
Benefits of sitemap indexes:
- Faster processing
- Easier debugging
- Better organization
- Scalability
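If you generate the index yourself rather than relying on a CMS, a sketch along these lines produces a sitemap index using only the Python standard library. The function name `build_sitemap_index` is hypothetical, and the input is assumed to be a list of (loc, lastmod) pairs you already have.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Build a sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:
            # lastmod is optional; omit it rather than guess a date.
            ET.SubElement(sitemap, "lastmod").text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

ElementTree escapes special characters such as `&` in URLs, which is one reason to build the XML through a library instead of string concatenation.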
Use Accurate Lastmod Dates
The <lastmod> element is the most misused optional field in sitemaps. When accurate, it's genuinely useful -- it tells search engines which pages have changed so they can prioritize re-crawling. When inaccurate, it's actively harmful.
Rules for lastmod:
- Set it to the date the page content was meaningfully changed
- Don't set every URL to today's date (Google may then ignore all your lastmod values)
- Don't update lastmod for trivial changes (template updates, CSS changes, footer edits)
- Use W3C Datetime format: 2025-06-15 or 2025-06-15T14:30:00+00:00
- Let your CMS handle this automatically based on actual content modification dates
Google's John Mueller has confirmed that Google uses lastmod when it's reliable. If your lastmod dates consistently match real content changes, Google will trust them. If they don't, Google learns to ignore them -- not just for individual URLs, but for your entire sitemap.
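A rough self-check for the two most common lastmod problems, bad formats and "everything changed today", might look like the sketch below. The function names are hypothetical, the pattern covers only the two forms listed above (W3C Datetime also allows year-only and year-month values), and any threshold you apply to the second function is your own judgment call, not a documented rule.

```python
import re
from collections import Counter
from datetime import date

# Covers a plain date, or date + time + timezone designator.
LASTMOD_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def is_valid_lastmod(value):
    """Check the format and that the calendar date exists (time is not range-checked)."""
    match = LASTMOD_RE.fullmatch(value)
    if not match:
        return False
    try:
        date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
    except ValueError:
        return False
    return True


def share_of_most_common_date(lastmods):
    """Fraction of entries sharing the most common calendar date."""
    days = [value[:10] for value in lastmods]
    if not days:
        return 0.0
    return Counter(days).most_common(1)[0][1] / len(days)
```

A share close to 1.0 across a large sitemap suggests the dates reflect generation time rather than real content changes.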
Don't Include Noindex Pages
If a page has a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex header, it should not be in your sitemap.
Including noindex pages in your sitemap creates a direct contradiction: the sitemap says "please index this," while the page itself says "don't index this." Google will respect the noindex directive, but the conflicting signal is unnecessary noise.
Common noindex pages that end up in sitemaps:
- Thank-you/confirmation pages
- Internal search result pages
- Login and account pages
- Paginated archive pages (sometimes)
- Tag pages with thin content
- Staging or draft pages that leaked into production
Audit your sitemap against your noindex directives regularly. Many CMS plugins add new pages to the sitemap by default, even if they're set to noindex elsewhere.
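An audit like this can be partly automated. The sketch below checks one page for either noindex signal, given its HTML and response headers that you have already fetched. The names `RobotsMetaParser` and `is_noindex` are hypothetical, and the check is deliberately simple: it ignores crawler-specific variants such as a `googlebot` meta tag or a user-agent prefix in the header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",")
            )


def is_noindex(html, headers):
    """True if the meta robots tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = {p.strip().lower() for p in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives
```

Any sitemap URL for which this returns True is a candidate for removal from the sitemap.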
Match URLs to Their Canonical Versions Exactly
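URL matching is sensitive to protocol, subdomain, path case, and trailing slashes. Treat each of these as a different URL (hostnames are technically case-insensitive, so the Example.com variant resolves to the same page as its lowercase form, but it is still worth normalizing so that sitemap entries match canonical tags character for character):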
https://example.com/About-Us
https://example.com/about-us
https://Example.com/about-us
http://example.com/about-us
https://www.example.com/about-us
https://example.com/about-us/
Your sitemap should use the exact URL format that matches your canonical tags. If your site uses lowercase URLs with a trailing slash and HTTPS on the non-www domain, every URL in the sitemap should follow that pattern.
| URL Aspect | Pick One | Use Consistently |
|---|---|---|
| Protocol | HTTPS (always) | Every URL starts with https:// |
| www vs non-www | Either, but choose one | All URLs use the same |
| Trailing slash | Either, but choose one | All URLs follow the same pattern |
| Case | Lowercase (recommended) | All paths use lowercase |
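The table translates fairly directly into a consistency check. In the sketch below, the function name `format_problems` and its defaults (HTTPS, the non-www host example.com, lowercase paths, no trailing slash) are assumptions standing in for whatever convention your site has chosen.

```python
from urllib.parse import urlsplit


def format_problems(url, host="example.com", trailing_slash=False):
    """List ways a URL departs from the chosen convention (assumed defaults)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "https":
        problems.append("not https")
    # Exact comparison also flags mixed-case hostnames such as Example.com.
    if parts.netloc != host:
        problems.append("unexpected host")
    if parts.path != parts.path.lower():
        problems.append("uppercase in path")
    # The root path "/" is exempt from the trailing-slash rule.
    if parts.path not in ("", "/") and parts.path.endswith("/") != trailing_slash:
        problems.append("trailing slash mismatch")
    return problems
```

Run against the six example URLs above with these defaults, only https://example.com/about-us comes back with an empty list.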
Don't Include Redirect URLs
Every URL in your sitemap should return a 200 status code. If a URL redirects (301 or 302) to another page, replace it with the destination URL.
Redirect URLs in sitemaps are a signal of poor maintenance. Search engines will follow the redirect, but they'll also note that your sitemap contains stale URLs. Over time, this erodes the trust signals that a well-maintained sitemap provides.
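To find redirecting or broken entries, you need each URL's own status code, not the status of wherever it ends up. The sketch below uses Python's standard `urllib` with redirect-following disabled. The names `NoRedirect`, `head_status`, and `non_200_urls` are hypothetical, network failures are not handled, and some servers answer HEAD requests differently from GET, so treat the results as a first pass.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stops urllib from following redirects so the original status is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def head_status(url, timeout=10):
    """Return the status code of a HEAD request without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 3xx, 4xx, and 5xx responses end up here


def non_200_urls(urls, get_status=head_status):
    """Map each URL whose status is not 200 to the status it returned."""
    statuses = {url: get_status(url) for url in urls}
    return {url: code for url, code in statuses.items() if code != 200}
```

The `get_status` parameter exists so the check can be exercised with canned responses, or swapped for a different HTTP client, without touching the filtering logic.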
Keep the Sitemap Updated After Content Changes
A stale sitemap is almost as bad as no sitemap. If you published 50 new blog posts last month and none of them are in the sitemap, you're relying entirely on internal linking and crawling for discovery. That's slower and less reliable.
Update triggers:
- New page published
- Page deleted or unpublished
- URL structure changed
- Major content update (update lastmod)
- Site migration (all URLs change)
Best approach: Automate it. Use your CMS or framework's built-in sitemap generation so updates happen without manual intervention.
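If your platform has no built-in generator, the core of an automated one is small. The sketch below assumes a hypothetical input of (url, last_modified) pairs pulled from your content store, where last_modified is the date the content itself changed; the function name `build_urlset` is also illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a datetime.date taken from the content record, not
    the time the sitemap was generated.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Sorting keeps the output stable between runs, which makes diffs readable.
    for url, last_modified in sorted(pages):
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example call with a single made-up page record.
example = build_urlset([("https://example.com/about-us", date(2025, 6, 15))])
```

Hooking a function like this into the publish, unpublish, and URL-change events listed above is what removes the manual step.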
Monitor for Errors
Creating a correct sitemap is step one. Keeping it correct is the ongoing challenge.
Check Google Search Console monthly
Go to Indexing > Sitemaps and confirm your sitemap shows "Success" status. Check the discovered URL count -- unexpected drops or spikes indicate a problem.
Validate after deployments
Run your sitemap through a validator after any deployment that changes site structure, URL patterns, or the sitemap generation logic.
Audit URLs quarterly
Compare your sitemap URLs against your actual site. Look for pages on the site that aren't in the sitemap and URLs in the sitemap that no longer exist on the site.
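The comparison itself is a pair of set differences. The sketch below assumes you can supply the sitemap XML as text and a list of live URLs from a crawl or a CMS export; the names `sitemap_urls` and `audit` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract the set of <loc> values from a sitemap's <url> entries."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def audit(xml_text, live_urls):
    """Return (missing_from_sitemap, not_on_site) as sorted lists."""
    listed = sitemap_urls(xml_text)
    live = set(live_urls)
    return sorted(live - listed), sorted(listed - live)
```

The first list holds pages worth adding to the sitemap (if they are canonical and indexable); the second holds stale entries to remove.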
Set up automated monitoring
Use a tool that validates your sitemap on a regular schedule and alerts you when errors appear. Catching a broken sitemap the day it breaks is better than discovering it a month later.
Quick Reference
| Practice | Do | Don't |
|---|---|---|
| URL selection | Include only canonical, indexable URLs | Include every URL on the site |
| File size | Split into multiple files via sitemap index | Cram everything into one giant file |
| Lastmod dates | Set based on actual content changes | Set all dates to today |
| Noindex pages | Exclude from sitemap | Include and hope for the best |
| Redirect URLs | Replace with destination URLs | Leave old URLs in place |
| URL format | Match canonical tags exactly | Mix www/non-www, http/https |
| Updates | Automate sitemap generation | Edit the XML file manually |
| Monitoring | Validate regularly and after deploys | Set it and forget it |
A sitemap that follows these practices is a sitemap that search engines can trust. And a trusted sitemap means faster discovery, more efficient crawling, and better indexing of the pages that matter to your business.
Best practices aren't optional. They're the difference between a sitemap that helps and one that doesn't.