How to Validate Your XML Sitemap

Complete guide to XML sitemap validation: why it matters, what validators check, how to use online and CLI tools, interpreting results, and fixing common validation errors.

A sitemap with invalid XML can be worse than no sitemap at all. Search engines attempt to parse it, hit an error, and either stop reading or reject the file outright. Either way, every URL after the error is invisible. You could have 10,000 perfectly valid URLs, but a single malformed tag on line 12 means that, at best, everything after it gets ignored.

Validation catches these problems before search engines do.

Why Validation Matters

Your sitemap is an XML document, and XML has strict rules. Browsers are forgiving with HTML: they'll render a page even with unclosed tags and missing attributes. XML parsers are not. A single syntax error causes the entire document to fail.

Here's what happens when you serve an invalid sitemap:

  • Google stops parsing at the error. URLs listed after the broken line are never discovered through the sitemap.
  • Google Search Console reports "Has errors." You might not notice for days or weeks.
  • Crawl budget is wasted. Google spends resources fetching a file it can't fully process.
  • New pages don't get indexed. If you added new URLs below the error, they're effectively hidden.

Validation takes seconds. Debugging a mysterious indexing problem takes hours.

What Validators Check

A proper sitemap validator goes beyond basic XML syntax. Here's the full scope:

XML Syntax

The foundation. The validator checks that your document is well-formed XML:

  • Every opening tag has a matching closing tag
  • Tags are properly nested (no overlapping elements)
  • Attribute values are quoted
  • Special characters are escaped (&amp; not &, &lt; not <)
  • The XML declaration is present and correct
  • Character encoding matches what's declared
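
As a rough illustration of the well-formedness check described above, the sketch below uses Python's standard xml.etree.ElementTree parser. The function name check_well_formed and the sample documents are made up for this example; a full validator does considerably more.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if xml_bytes is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return (line, column, str(err))
    return None


# Hypothetical sample documents, used only for illustration.
good = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b'<urlset><url><loc>https://example.com/</loc></url></urlset>')
bad = b'<urlset>\n<url><loc>https://example.com/</loc>\n</urlset>'  # <url> is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

The second call reports the line where the parser stopped, which is usually the fastest route to the broken tag.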

URL Format

Each <loc> element must contain a valid, absolute URL:

  • URLs must include the protocol (https:// or http://)
  • URLs must be properly encoded (spaces as %20, special characters escaped)
  • URLs should not contain fragments (#section)
  • URLs must be shorter than 2,048 characters
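
The URL rules above could be approximated with a few lines of Python. This is a minimal sketch, not a complete URL validator: loc_problems is a hypothetical helper, and it treats 2,048 characters or more as too long, following the sitemaps.org wording that the value be less than 2,048 characters.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # sitemaps.org: the value must be less than 2,048 characters


def loc_problems(loc):
    """Return a list of human-readable problems found in one <loc> value."""
    problems = []
    parts = urlsplit(loc)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported protocol")
    if not parts.netloc:
        problems.append("missing host")
    if parts.fragment:
        problems.append("contains a fragment")
    if any(ch.isspace() for ch in loc):
        problems.append("contains unencoded whitespace")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("2,048 characters or longer")
    return problems


print(loc_problems("https://example.com/blog/my-post"))
print(loc_problems("/blog/my-post"))
print(loc_problems("https://example.com/my page#section"))
```

An empty list means none of these particular checks fired; it does not prove the URL resolves.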

Sitemap Protocol Compliance

The validator checks conformance with the sitemaps.org protocol:

  • The <urlset> element includes the correct namespace
  • Optional elements (<lastmod>, <changefreq>, <priority>) use valid values
  • <lastmod> dates follow W3C Datetime format
  • <changefreq> uses one of the allowed values (always, hourly, daily, weekly, monthly, yearly, never)
  • <priority> is a decimal between 0.0 and 1.0
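
To make the protocol checks concrete, here is a small sketch that verifies the namespace, <changefreq>, and <priority> values with the standard library. The function protocol_problems and the sample sitemap are illustrative assumptions; a schema-based validator covers more cases.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def protocol_problems(xml_bytes):
    """Return a list of protocol problems found in a (well-formed) sitemap."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "{%s}urlset" % SITEMAP_NS:
        return ["root element is not <urlset> in the sitemap namespace"]
    ns = {"sm": SITEMAP_NS}
    problems = []
    for url in root.findall("sm:url", ns):
        loc = (url.findtext("sm:loc", default="", namespaces=ns) or "").strip()
        changefreq = url.findtext("sm:changefreq", namespaces=ns)
        if changefreq is not None and changefreq.strip() not in CHANGEFREQ_VALUES:
            problems.append("%s: invalid changefreq %r" % (loc, changefreq))
        priority = url.findtext("sm:priority", namespaces=ns)
        if priority is not None:
            try:
                value = float(priority)
            except ValueError:
                value = None
            if value is None or not 0.0 <= value <= 1.0:
                problems.append("%s: invalid priority %r" % (loc, priority))
    return problems


# Hypothetical sitemap with two deliberate mistakes.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <changefreq>sometimes</changefreq>
    <priority>1.5</priority>
  </url>
</urlset>"""

for problem in protocol_problems(sample):
    print(problem)
```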

Size and Count Limits

The protocol defines hard limits:

  • Maximum 50,000 URLs per sitemap file
  • Maximum 50MB uncompressed file size
  • Sitemap index files also limited to 50,000 sitemap references

A good validator warns you when you're approaching these limits, not just when you exceed them.
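
A limit check along these lines might look like the sketch below. The 90% warning threshold (WARN_RATIO) is an arbitrary assumption for illustration, as is the function name limit_report; the protocol itself only defines the hard limits.

```python
import xml.etree.ElementTree as ET

MAX_ENTRIES = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed
WARN_RATIO = 0.9  # assumption: warn at 90% of a limit


def limit_report(xml_bytes):
    """Return (entry_count, size_in_bytes, messages) for a sitemap or sitemap index."""
    root = ET.fromstring(xml_bytes)
    # Count <url> or <sitemap> children, ignoring any namespace prefix.
    count = sum(1 for child in root if child.tag.rsplit("}", 1)[-1] in ("url", "sitemap"))
    size = len(xml_bytes)
    messages = []
    for label, value, limit in (("entries", count, MAX_ENTRIES), ("bytes", size, MAX_BYTES)):
        if value > limit:
            messages.append("ERROR: %d %s exceeds the limit of %d" % (value, label, limit))
        elif value >= WARN_RATIO * limit:
            messages.append("WARNING: %d %s is close to the limit of %d" % (value, label, limit))
    return count, size, messages


# Hypothetical small sitemap: well under both limits.
small = b"<urlset>" + b"<url><loc>https://example.com/</loc></url>" * 3 + b"</urlset>"
print(limit_report(small))
```

In practice you would pass in the uncompressed bytes of the file, since the size limit applies before gzip compression.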

HTTP Response Codes

Advanced validators go a step further and actually fetch the URLs in your sitemap:

  • Flag URLs that return 404 (not found)
  • Flag URLs that return 301/302 (redirects)
  • Flag URLs that return 500 (server errors)
  • Identify slow-responding URLs that might time out during crawling

Online Validators

Online validators are the fastest way to check a sitemap. Paste your sitemap URL or upload the file, and get results in seconds.

What to Look For in a Validator

Not all validators are equal. A good one should:

Check XML syntax thoroughly

Basic validators just confirm the file is well-formed XML. Better ones check against the sitemap XSD schema, catching protocol-specific issues that generic XML validators miss.

Validate URLs, not just format

The best validators actually fetch your URLs and report status codes. This catches 404s and redirects that look fine in the XML but fail in practice.

Handle sitemap indexes

If you submit a sitemap index, the validator should follow the references and validate each child sitemap individually.
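
Following index references could start with something like this sketch, which extracts the child sitemap URLs from a sitemap index using the standard library. The function name child_sitemaps and the sample index are assumptions; fetching and validating each child is left out.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(xml_bytes):
    """Return child sitemap URLs if xml_bytes is a sitemap index, else an empty list."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "{%s}sitemapindex" % NS["sm"]:
        return []
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]


# Hypothetical sitemap index with two children.
index = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

for child in child_sitemaps(index):
    print(child)
```

Each returned URL would then be downloaded and run through the same checks as a standalone sitemap.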

Report errors clearly

A good validator tells you the line number, the specific element, and what's wrong. A bad one just says "invalid XML."

Handle large files

Some online validators choke on sitemaps with tens of thousands of URLs. Make sure the tool can handle your sitemap's size.

CLI and Developer Tools

For developers who want validation in their workflow, command-line tools integrate with build processes and CI/CD pipelines.

xmllint

The xmllint utility (part of libxml2) checks that XML is well-formed and can also validate it against a schema:

# Validate XML syntax
xmllint --noout sitemap.xml

# Validate against the sitemap schema
xmllint --noout --schema sitemap.xsd sitemap.xml

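You'll need to download the sitemap XSD schema file (sitemap.xsd) from sitemaps.org. With --noout alone, no output and an exit status of 0 mean the file is well-formed. With --schema, xmllint prints "sitemap.xml validates" when the file conforms to the schema. Errors are printed to stderr with line numbers.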

Custom Scripts

For URL-level validation, a simple script can check every URL in your sitemap:

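# Extract URLs and check status codes (grep -P requires GNU grep)
# sed turns the XML entity &amp; back into a literal & before the request is made
grep -oP '<loc>\K[^<]+' sitemap.xml | sed 's/&amp;/\&/g' | while IFS= read -r url; do
  # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status code
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  # Anything other than 200 is reported, including 301/302 redirects
  if [ "$status" != "200" ]; then
    echo "[$status] $url"
  fi
done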

This is slow for large sitemaps but catches real-world issues that XML validation alone misses.
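
One way to speed this up is to check URLs concurrently. The sketch below does that in Python with a thread pool; the names fetch_status and check_urls, the worker count, and the timeout are all assumptions chosen for illustration. Like the curl loop, it does not follow redirects, so 301 and 302 responses are reported.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow redirects; they surface as HTTPError instead


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed outright."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses end up here
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def check_urls(urls, fetch=fetch_status, workers=8):
    """Return (status, url) pairs for every URL whose status is not 200."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, urls))
    return [(status, url) for status, url in zip(statuses, urls) if status != 200]
```

Calling check_urls with the list of <loc> values returns only the problem URLs. The fetch parameter exists so the function can be exercised with a stub instead of real network requests. Keep the worker count modest so the check does not hammer your own server.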

Build Pipeline Integration

Add sitemap validation to your CI/CD pipeline so broken sitemaps never make it to production:

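# Example GitHub Actions step
# Assumes xmllint is available on the runner (on Debian/Ubuntu it ships in libxml2-utils)
- name: Validate sitemap
  run: |
    # A non-zero exit from xmllint fails the step, so the echo runs only on success
    xmllint --noout public/sitemap.xml
    echo "Sitemap XML is well-formed"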

Interpreting Validation Results

Validators report different types of issues. Here's how to prioritize them:

Severity | Issue Type              | Impact                             | Action
---------|-------------------------|------------------------------------|-----------------------------
Critical | Malformed XML           | Entire sitemap unreadable          | Fix immediately
Critical | Wrong namespace         | Search engines may reject the file | Fix immediately
High     | 404 URLs                | Wasted crawl budget, poor signal   | Remove from sitemap
High     | Redirect URLs           | Inefficient crawling               | Replace with final URLs
Medium   | Invalid lastmod format  | Dates ignored by search engines    | Fix date format
Medium   | Over 50k URLs           | Excess URLs ignored                | Split into multiple sitemaps
Low      | Missing optional fields | Less information for crawlers      | Add if practical

Critical errors break everything. If the XML is malformed, nothing else matters. Fix these first.

High-severity issues degrade performance. 404s and redirects won't break your sitemap, but they signal to search engines that your sitemap isn't well-maintained. A sitemap full of dead links tells Google to trust it less.

Medium and low issues are improvements. Fix them when you can, but they won't cause indexing failures on their own.

Fixing Common Validation Errors

Unescaped Ampersands

The most common error. URLs with query parameters contain & characters that must be escaped in XML:

<!-- Invalid -->
<loc>https://example.com/search?q=shoes&color=red</loc>

<!-- Valid -->
<loc>https://example.com/search?q=shoes&amp;color=red</loc>
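
If you generate sitemaps in code, letting a library do the escaping avoids this class of error. A minimal sketch using Python's standard xml.sax.saxutils.escape follows; the helper name loc_element is made up for this example.

```python
from xml.sax.saxutils import escape


def loc_element(url):
    """Wrap a raw (unescaped) URL in a <loc> element, escaping &, < and >."""
    return "<loc>%s</loc>" % escape(url)


print(loc_element("https://example.com/search?q=shoes&color=red"))
# <loc>https://example.com/search?q=shoes&amp;color=red</loc>
```

Pass the raw URL exactly once: running escape over a string that already contains &amp; would produce &amp;amp;.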

Missing XML Declaration

The file should start with the XML declaration, with nothing before it (not even a blank line). Some sitemap generators omit it:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ...
</urlset>

Invalid Date Format in Lastmod

Dates must follow W3C Datetime format. These are all valid:

<lastmod>2025-06-15</lastmod>
<lastmod>2025-06-15T14:30:00+00:00</lastmod>
<lastmod>2025-06-15T14:30:00Z</lastmod>

These are not:

<lastmod>06/15/2025</lastmod>
<lastmod>June 15, 2025</lastmod>
<lastmod>2025-6-15</lastmod>
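
The valid and invalid examples above can be told apart with a pattern check plus a calendar check. The following is a sketch under the assumption that the W3C Datetime profile allows a year, year-month, full date, or full date with time and a required time zone designator; is_valid_lastmod is a hypothetical helper name.

```python
import re
from datetime import date

W3C_DATETIME = re.compile(
    r"""
    ([0-9]{4})                            # year
    (?:-([0-9]{2})                        # optional month
      (?:-([0-9]{2})                      # optional day
        (?:T([0-9]{2}):([0-9]{2})         # optional time: hours and minutes
          (?::([0-9]{2})(?:\.[0-9]+)?)?   # optional seconds and fraction
          (Z|[+-][0-9]{2}:[0-9]{2})       # time zone designator, required with a time
        )?
      )?
    )?
    """,
    re.VERBOSE,
)


def is_valid_lastmod(value):
    """Return True if value looks like a W3C Datetime and names a real date and time."""
    match = W3C_DATETIME.fullmatch(value.strip())
    if match is None:
        return False
    year, month, day, hour, minute, second, _tz = match.groups()
    try:
        date(int(year), int(month or 1), int(day or 1))  # rejects month 13, Feb 30, etc.
    except ValueError:
        return False
    return int(hour or 0) < 24 and int(minute or 0) < 60 and int(second or 0) < 60


for candidate in ("2025-06-15", "2025-06-15T14:30:00Z", "06/15/2025", "2025-6-15"):
    print(candidate, is_valid_lastmod(candidate))
```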

Relative URLs

Every URL in <loc> must be absolute, including the protocol and domain:

<!-- Invalid -->
<loc>/blog/my-post</loc>

<!-- Valid -->
<loc>https://example.com/blog/my-post</loc>
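
When a generator only knows site-relative paths, they can be resolved against the site's base URL before being written to the sitemap. A small sketch with the standard urllib.parse.urljoin is shown here; BASE_URL and absolute_loc are placeholders for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://example.com"  # assumption: your site's canonical origin


def absolute_loc(path_or_url, base=BASE_URL):
    """Resolve a relative path against base; absolute URLs pass through unchanged."""
    return urljoin(base.rstrip("/") + "/", path_or_url)


print(absolute_loc("/blog/my-post"))
# https://example.com/blog/my-post
print(absolute_loc("https://example.com/about"))
# https://example.com/about
```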

BOM (Byte Order Mark)

Some text editors add an invisible BOM character at the start of the file (in UTF-8, the bytes EF BB BF). Some XML parsers and sitemap tools reject files with a BOM before the XML declaration. If you're getting a "content before XML declaration" error, check for hidden characters at the start of the file and re-save it as UTF-8 without a BOM.
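
Detecting and removing a UTF-8 BOM takes only a few lines. The sketch below works on the raw bytes of the file; the function name strip_utf8_bom is an assumption for this example.

```python
import codecs


def strip_utf8_bom(data):
    """Return (had_bom, data_without_bom) for the raw bytes of a sitemap file."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


# Hypothetical file contents with a BOM in front of the XML declaration.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
had_bom, cleaned = strip_utf8_bom(raw)
print(had_bom, cleaned[:5])
```

To use it on a real file, read the file in binary mode, pass the bytes through the function, and write the cleaned bytes back.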

How Often to Validate

Validate your sitemap:

  • After every deployment that changes site structure
  • After CMS updates that might affect sitemap generation
  • After plugin updates (especially SEO plugins that manage sitemaps)
  • Monthly as part of routine SEO maintenance
  • Whenever Google Search Console reports sitemap errors

The cost of validation is seconds. The cost of an invalid sitemap is days of lost indexing.


Validation is the cheapest insurance your sitemap will ever have.