Sitemap Best Practices

XML sitemap best practices for SEO: canonical URLs only, size limits, accurate lastmod dates, sitemap indexes, excluding noindex pages, and ongoing monitoring.

A sitemap is a simple concept: a list of URLs you want search engines to know about. Simple does not mean foolproof, though. A well-maintained sitemap helps search engines discover and index your content efficiently. A poorly maintained one wastes crawl budget, sends conflicting signals about which URLs you want indexed, and can slow down indexing of the pages you care about.

These are the practices that actually matter, based on how search engines use sitemaps in practice.

Only Include Canonical URLs

This is the single most important rule. Every URL in your sitemap should be the canonical version of that page.

If a page at https://example.com/products/shoes?ref=homepage has a canonical tag pointing to https://example.com/products/shoes, the parameterized URL should not be in the sitemap. Only the canonical version, https://example.com/products/shoes, belongs there.

Why it matters: When your sitemap includes non-canonical URLs, you're telling Google "index this URL" while simultaneously telling Google (via the canonical tag) "actually, index this other URL instead." These conflicting signals confuse crawlers and can delay indexing of the correct page.

How to check: Compare your sitemap URLs against the canonical tags on each page. If they don't match, the sitemap is wrong.
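The comparison can be sketched in a few lines of Python. This is a minimal illustration, not a full crawler: it assumes you have already fetched the HTML for each sitemap URL, that canonical tags use absolute URLs, and the function names (`extract_canonical`, `find_canonical_mismatches`) are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def extract_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def find_canonical_mismatches(pages):
    """pages maps each sitemap URL to the HTML fetched from it.

    Returns (sitemap_url, canonical) pairs where the two differ.
    Pages with no canonical tag are skipped, not reported.
    """
    mismatches = []
    for url, html in pages.items():
        canonical = extract_canonical(html)
        if canonical is not None and canonical != url:
            mismatches.append((url, canonical))
    return mismatches
```

Any pair this returns is a sitemap entry to replace with its canonical URL.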

URL variations to watch for

Common sources of non-canonical URLs in sitemaps: trailing slash vs. no trailing slash (/page/ vs /page), www vs. non-www, HTTP vs. HTTPS, uppercase vs. lowercase paths, and URLs with tracking parameters. Pick one version, set it as canonical, and use only that version in the sitemap.
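One way to spot these variations is to reduce every sitemap URL to a key that ignores them and then look for keys shared by more than one URL. The sketch below does that; the list of tracking parameters is an assumption to adapt to your own site, and `variant_key` and `find_variant_groups` are illustrative names.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of tracking parameters; adjust for your own site.
TRACKING_PARAMS = {"ref", "utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def variant_key(url):
    """Reduce a URL to a key that ignores the common variations."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore case and trailing slash in the path; the scheme is dropped entirely.
    path = parts.path.lower().rstrip("/") or "/"
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in TRACKING_PARAMS]
    )
    return (host, path, query)


def find_variant_groups(urls):
    """Return lists of sitemap URLs that differ only by a common variation."""
    groups = defaultdict(list)
    for url in urls:
        groups[variant_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]
```

Each group returned should collapse to a single entry: the one that matches the page's canonical tag.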

Keep Under 50,000 URLs Per File

The sitemap protocol allows a maximum of 50,000 URLs per sitemap file and a maximum uncompressed file size of 50MB. But just because you can have 50,000 URLs in one file doesn't mean you should.

Better approach: Split your sitemap into logical groups:

  • sitemap-pages.xml -- Static pages
  • sitemap-posts.xml -- Blog posts
  • sitemap-products.xml -- Product pages
  • sitemap-categories.xml -- Category and tag pages

Then reference them all from a sitemap index file. This makes your sitemaps easier to manage, easier to debug, and easier for search engines to process incrementally.
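As a rough sketch of that split, the function below maps each content group to one or more sitemap filenames, breaking a group into numbered files only when it exceeds the URL limit. The filename pattern follows the examples above; the function name and the numbering scheme are assumptions, and the 50MB size limit is not checked here.

```python
MAX_URLS_PER_FILE = 50_000  # limit from the sitemap protocol


def plan_sitemap_files(urls_by_group, max_urls=MAX_URLS_PER_FILE):
    """Map sitemap filenames to URL lists, splitting oversized groups.

    urls_by_group maps a group name such as "posts" to its list of URLs.
    """
    files = {}
    for group, urls in urls_by_group.items():
        chunks = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
        for index, chunk in enumerate(chunks, start=1):
            # Only number the files when a group needs more than one.
            suffix = f"-{index}" if len(chunks) > 1 else ""
            files[f"sitemap-{group}{suffix}.xml"] = chunk
    return files
```

In practice you would likely set `max_urls` well below the protocol limit so that each file stays small and quick to regenerate.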

Use a Sitemap Index for Large Sites

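A sitemap index is required once a site outgrows the limits of a single sitemap file, and it is worth adopting much earlier. If your site has more than a few hundred URLs, use a sitemap index file. A sitemap index references multiple individual sitemaps: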

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2025-06-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2025-06-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2025-06-15</lastmod>
  </sitemap>
</sitemapindex>

Benefits of sitemap indexes:

Faster processing

Search engines can process smaller files more quickly and re-crawl only the sitemaps that have changed (based on lastmod).

Easier debugging

If there's an error in your product sitemap, it doesn't affect your blog sitemap. Problems are isolated.

Better organization

Separating by content type makes it obvious what's in each file and makes maintenance straightforward.

Scalability

Add new content types by adding new sitemap files to the index. No need to restructure existing files.
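If you generate the index yourself rather than through a CMS, something like the following would produce a file in the format shown earlier. It is a sketch using Python's standard library; `build_sitemap_index` is an illustrative name, and setting `xmlns` as a plain attribute is a shortcut that works for a single-namespace document like this one.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Render a sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

Adding a new content type then means appending one more pair to the input list.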

Use Accurate Lastmod Dates

The <lastmod> element is the most misused optional field in sitemaps. When accurate, it's genuinely useful -- it tells search engines which pages have changed so they can prioritize re-crawling. When inaccurate, it's actively harmful.

Rules for lastmod:

  • Set it to the date the page content was meaningfully changed
  • Don't set every URL to today's date (Google will ignore all your lastmod values)
  • Don't update lastmod for trivial changes (template updates, CSS changes, footer edits)
  • Use W3C Datetime format: 2025-06-15 or 2025-06-15T14:30:00+00:00
  • Let your CMS handle this automatically based on actual content modification dates

Google's John Mueller has confirmed that Google uses lastmod when it's reliable. If your lastmod dates consistently match real content changes, Google will trust them. If they don't, Google learns to ignore them -- not just for individual URLs, but for your entire sitemap.
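Two of the rules above are easy to check mechanically: the date format, and the "every URL stamped with the same day" pattern. The sketch below shows one way to do both. The 0.9 threshold is an arbitrary assumption, not a documented rule, and the heuristic will misfire on very small sites or after a genuine site-wide content update.

```python
from collections import Counter
from datetime import date, datetime, timezone


def format_lastmod(modified):
    """Format a date or timezone-aware datetime as W3C Datetime."""
    if isinstance(modified, datetime):
        if modified.tzinfo is None:
            raise ValueError("datetime must be timezone-aware")
        return modified.isoformat(timespec="seconds")
    return modified.isoformat()


def looks_auto_stamped(lastmods, threshold=0.9):
    """True when a single calendar day accounts for most lastmod values."""
    if not lastmods:
        return False
    # The first 10 characters of a W3C Datetime string are YYYY-MM-DD.
    days = Counter(value[:10] for value in lastmods)
    _, top_count = days.most_common(1)[0]
    return top_count / len(lastmods) >= threshold


print(format_lastmod(date(2025, 6, 15)))  # 2025-06-15
print(format_lastmod(datetime(2025, 6, 15, 14, 30, tzinfo=timezone.utc)))  # 2025-06-15T14:30:00+00:00
```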


Don't Include Noindex Pages

If a page has a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex header, it should not be in your sitemap.

Including noindex pages in your sitemap creates a direct contradiction: the sitemap says "please index this," while the page itself says "don't index this." Google will respect the noindex directive, but the conflicting signal is unnecessary noise.

Common noindex pages that end up in sitemaps:

  • Thank-you/confirmation pages
  • Internal search result pages
  • Login and account pages
  • Paginated archive pages (sometimes)
  • Tag pages with thin content
  • Staging or draft pages that leaked into production

Audit your sitemap against your noindex directives regularly. Many CMS plugins add new pages to the sitemap by default, even if they're set to noindex elsewhere.
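Such an audit needs a way to tell whether a fetched page is noindexed. A minimal sketch, assuming you already have the page HTML and its response headers as a plain dict, might look like this. It ignores crawler-specific forms such as `googlebot: noindex` and the `none` directive, so treat it as a starting point; `is_noindex` is an illustrative name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindex(html, headers=None):
    """True if the page opts out of indexing via meta tag or header."""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [d.strip().lower() for d in value.split(",")]:
                return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Any sitemap URL for which this returns True should be dropped from the sitemap.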

Match URLs to Their Canonical Versions Exactly

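URL matching is sensitive to path case, protocol, subdomain, and trailing slash. Search engines treat each of these as a different URL:

https://example.com/About-Us
https://example.com/about-us
http://example.com/about-us
https://www.example.com/about-us
https://example.com/about-us/

The hostname itself is not case-sensitive, so https://Example.com/about-us and https://example.com/about-us are the same URL. Write the hostname in lowercase anyway so that sitemap entries and canonical tags match character for character.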

Your sitemap should use the exact URL format that matches your canonical tags. If your site uses lowercase URLs with a trailing slash and HTTPS on the non-www domain, every URL in the sitemap should follow that pattern.

URL Aspect      | Pick One                | Use Consistently
Protocol        | HTTPS (always)          | Every URL starts with https://
www vs non-www  | Either, but choose one  | All URLs use the same
Trailing slash  | Either, but choose one  | All URLs follow the same pattern
Case            | Lowercase (recommended) | All paths use lowercase
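Once the format is chosen, each sitemap URL can be checked against it. The sketch below encodes the four aspects as simple tests. The `use_www` and `trailing_slash` parameters stand for your site's own choices, and their defaults are only an example policy. File-like paths such as /feed.xml would need special handling if you require trailing slashes.

```python
from urllib.parse import urlsplit


def format_violations(url, use_www=False, trailing_slash=False):
    """List the ways a URL departs from the chosen format."""
    parts = urlsplit(url)
    host = parts.netloc
    problems = []
    if parts.scheme != "https":
        problems.append("not https")
    if host.lower().startswith("www.") != use_www:
        problems.append("www mismatch")
    if host != host.lower() or parts.path != parts.path.lower():
        problems.append("uppercase characters")
    # The bare root path is exempt from the trailing-slash rule.
    if parts.path not in ("", "/") and parts.path.endswith("/") != trailing_slash:
        problems.append("trailing slash mismatch")
    return problems
```

An empty list means the URL follows the policy; anything else names the aspect to fix.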

Don't Include Redirect URLs

Every URL in your sitemap should return a 200 status code. If a URL redirects (301 or 302) to another page, replace it with the destination URL.

Redirect URLs in sitemaps are a signal of poor maintenance. Search engines will follow the redirect, but they'll also note that your sitemap contains stale URLs. Over time, this erodes the trust signals that a well-maintained sitemap provides.
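A status check along these lines can be scripted with Python's standard library. The sketch below sends HEAD requests without following redirects, so a 301 or 302 is reported rather than silently resolved. Some servers reject HEAD, in which case a GET would be needed. The status lookup is passed in as a parameter so the audit logic can be exercised without network access; all names here are illustrative.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stops urllib from following redirects so the 3xx status is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10):
    """Return the HTTP status of a HEAD request without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # With redirects disabled, 3xx responses arrive here, as do 4xx and 5xx.
        return error.code


def find_non_200(urls, get_status=fetch_status):
    """Return (url, status) pairs for every URL that does not return 200."""
    results = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            results.append((url, status))
    return results
```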

Keep the Sitemap Updated After Content Changes

A stale sitemap is almost as bad as no sitemap. If you published 50 new blog posts last month and none of them are in the sitemap, you're relying entirely on internal linking and crawling for discovery. That's slower and less reliable.

Update triggers:

  • New page published
  • Page deleted or unpublished
  • URL structure changed
  • Major content update (update lastmod)
  • Site migration (all URLs change)

Best approach: Automate it. Use your CMS or framework's built-in sitemap generation so updates happen without manual intervention.
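If your platform has no built-in generator, the core of an automated one is small. The sketch below renders a sitemap from page records on each run. The record keys ("url", "lastmod", "indexable") are hypothetical stand-ins for whatever your CMS or database exposes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Render a <urlset> document from page records.

    Each record is a dict with hypothetical keys: "url", "lastmod"
    (a W3C Datetime string or None), and "indexable" (bool).
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not page["indexable"]:
            continue  # unpublished, noindex, or redirected pages stay out
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(entry, "lastmod").text = page["lastmod"]
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

Running this from a publish hook or a scheduled job means new, deleted, and updated pages are reflected without anyone editing XML by hand. ElementTree also escapes characters such as & in URLs, which hand-edited files often get wrong.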

Monitor for Errors

Creating a correct sitemap is step one. Keeping it correct is the ongoing challenge.

1. Check Google Search Console monthly

Go to Indexing > Sitemaps and confirm your sitemap shows "Success" status. Check the discovered URL count -- unexpected drops or spikes indicate a problem.

2. Validate after deployments

Run your sitemap through a validator after any deployment that changes site structure, URL patterns, or the sitemap generation logic.

3. Audit URLs quarterly

Compare your sitemap URLs against your actual site. Look for pages on the site that aren't in the sitemap and URLs in the sitemap that no longer exist on the site.

4. Set up automated monitoring

Use a tool that validates your sitemap on a regular schedule and alerts you when errors appear. Catching a broken sitemap the day it breaks is better than discovering it a month later.
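The audit and the count check are both set comparisons at heart, and a scheduled script can run them. The sketch below assumes you can supply the sitemap XML as bytes and a set of live, indexable URLs from your CMS or a crawl. The 20% tolerance for count changes is an arbitrary assumption to tune for your site.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_bytes):
    """Extract every <loc> from a urlset document as a set."""
    root = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def audit(sitemap, site):
    """Compare the sitemap URL set with the set of live, indexable URLs."""
    return {
        "missing_from_sitemap": sorted(site - sitemap),
        "stale_in_sitemap": sorted(sitemap - site),
    }


def count_changed_sharply(previous, current, tolerance=0.2):
    """True if the URL count moved by more than the tolerance (assumed 20%)."""
    if previous == 0:
        return current > 0
    return abs(current - previous) / previous > tolerance
```

Storing the URL count from each run gives `count_changed_sharply` something to compare against, and a non-empty audit result is a reasonable trigger for an alert.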

Quick Reference

Practice       | Do                                          | Don't
URL selection  | Include only canonical, indexable URLs      | Include every URL on the site
File size      | Split into multiple files via sitemap index | Cram everything into one giant file
Lastmod dates  | Set based on actual content changes         | Set all dates to today
Noindex pages  | Exclude from sitemap                        | Include and hope for the best
Redirect URLs  | Replace with destination URLs               | Leave old URLs in place
URL format     | Match canonical tags exactly                | Mix www/non-www, http/https
Updates        | Automate sitemap generation                 | Edit the XML file manually
Monitoring     | Validate regularly and after deploys        | Set it and forget it

A sitemap that follows these practices is a sitemap that search engines can trust. And a trusted sitemap means faster discovery, more efficient crawling, and better indexing of the pages that matter to your business.


Best practices aren't optional. They're the difference between a sitemap that helps and one that doesn't.
