Sitemap XML Structure: Tags, Attributes, and Format
A detailed breakdown of XML sitemap structure: every tag, attribute, and formatting rule explained. Covers urlset, url, loc, lastmod, changefreq, priority, and namespace extensions.
An XML sitemap follows a specific structure defined by the sitemaps.org protocol. Understanding each element helps you create valid sitemaps, debug formatting errors, and make informed decisions about which optional tags to include. For a general overview of what sitemaps do and why they matter, see our XML sitemap guide.
This guide breaks down every component of the XML sitemap format, from the XML declaration to namespace extensions.
The Complete Structure
Here is a fully featured XML sitemap with all standard tags:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page/</loc>
    <lastmod>2026-06-09T14:30:00+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Let's examine each part.
XML Declaration
<?xml version="1.0" encoding="UTF-8"?>
Every XML sitemap starts with this declaration. It specifies:
- XML version: Always 1.0 for sitemaps.
- Encoding: Must be UTF-8. This ensures special characters in URLs (accented characters, non-Latin scripts) are handled correctly.
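Include this line. Strictly speaking, the XML 1.0 specification makes the declaration optional, and a file without one (and without a byte order mark) is read as UTF-8, so omitting it does not make the XML invalid. The sitemaps.org examples all include it, though, and it states the encoding explicitly.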
The <urlset> Element
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
...
</urlset>
The <urlset> element is the root container for all URL entries. It wraps the entire sitemap content.
The xmlns attribute
The xmlns (XML namespace) attribute declares that this document follows the sitemaps.org protocol version 0.9. This is required. Without it, search engines cannot identify the file as a valid sitemap.
The value must be exactly http://www.sitemaps.org/schemas/sitemap/0.9. Do not change the version number or modify the URL.
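Because the namespace becomes part of each element's name, a namespace-aware parser will not recognize a root element whose xmlns is missing or altered. A minimal sketch using Python's standard library; root_kind is a hypothetical helper name for illustration:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_kind(xml_bytes):
    """Return "urlset" or "sitemapindex" if the root element is in the
    sitemaps.org 0.9 namespace, otherwise None."""
    root = ET.fromstring(xml_bytes)
    # ElementTree reports namespaced names as "{namespace-uri}localname".
    for kind in ("urlset", "sitemapindex"):
        if root.tag == f"{{{SITEMAP_NS}}}{kind}":
            return kind
    return None


print(root_kind(b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'))  # urlset
print(root_kind(b"<urlset/>"))  # None: namespace missing
```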
Additional namespaces
If you use sitemap extensions (image, video, news, hreflang), you declare additional namespaces on the <urlset> element:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
You only need to declare the namespaces you actually use. Including unused namespaces is not an error, but it adds unnecessary clutter.
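Most XML libraries write these declarations for you. As a sketch with Python's xml.etree.ElementTree (Python 3.8 or later): registering the conventional prefixes keeps the output readable (otherwise ElementTree invents ns0, ns1, and so on), and the serializer declares only the namespaces that actually appear in the tree. Note that ET.register_namespace changes a module-wide mapping.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
EXTENSION_NS = {
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
    "news": "http://www.google.com/schemas/sitemap-news/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Map each extension URI to its conventional prefix (module-wide setting).
for prefix, uri in EXTENSION_NS.items():
    ET.register_namespace(prefix, uri)

# A tree that uses only the image extension.
urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = "https://example.com/page/"
image = ET.SubElement(url, f"{{{EXTENSION_NS['image']}}}image")
ET.SubElement(image, f"{{{EXTENSION_NS['image']}}}loc").text = (
    "https://example.com/images/photo.jpg"
)

# Only xmlns and xmlns:image are written; video, news, and xhtml are unused.
xml_text = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
print(xml_text)
```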
The <url> Element
<url>
  <loc>https://example.com/page/</loc>
  <lastmod>2026-06-09</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>
Each <url> element represents one page on your site. A sitemap can contain up to 50,000 <url> entries.
The <url> element contains one required child element (<loc>) and three optional ones (<lastmod>, <changefreq>, <priority>).
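If you generate sitemaps in code, building the XML with a library is safer than concatenating strings, because the library escapes special characters such as & for you. A minimal sketch in Python 3.8+ with the standard library's xml.etree.ElementTree; the build_sitemap function and its dict-based input are illustrative assumptions, not part of any particular tool:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return a UTF-8 encoded <urlset> document as bytes.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq", and "priority" keys (hypothetical input format).
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is always written; optional tags only when a value is given.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    # default_namespace emits xmlns="..." on <urlset> with unprefixed tags;
    # xml_declaration=True writes the <?xml ...?> line first.
    return ET.tostring(
        urlset,
        encoding="UTF-8",
        xml_declaration=True,
        default_namespace=SITEMAP_NS,
    )


xml_bytes = build_sitemap([
    {"loc": "https://example.com/search?q=test&page=2", "lastmod": "2026-06-09"},
])
print(xml_bytes.decode("utf-8"))
```

ElementTree writes the raw & in the URL as &amp; and quotes the declaration with single quotes (<?xml version='1.0' encoding='UTF-8'?>), which is equally valid XML. The output has no line breaks; on Python 3.9+ you could call ET.indent(urlset) before serializing if you want it readable.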
The <loc> Tag (Required)
<loc>https://example.com/page/</loc>
The <loc> tag specifies the URL of the page. This is the only required tag within a <url> entry.
Rules for <loc>
Must be an absolute URL. Include the full protocol, domain, and path. https://example.com/page/ is correct. /page/ is not.
Must match the canonical URL. If your page uses a canonical tag pointing to https://example.com/page/, the <loc> value should be the same URL. Do not list non-canonical URLs.
Must use the correct protocol. If your site serves over HTTPS, use https://. Do not use http:// for HTTPS sites.
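Must be escaped correctly. Percent-encode characters that are not allowed in a URL as-is, such as spaces and non-ASCII characters (for example, ü becomes %C3%BC). Then escape the XML special characters as entities:
| Character | Entity escape |
|---|---|
| Ampersand (&) | &amp; |
| Single quote (') | &apos; |
| Double quote (") | &quot; |
| Greater than (>) | &gt; |
| Less than (<) | &lt; |
For example, the URL https://example.com/search?q=test&page=2 contains an ampersand in its query string, so it is written as: <loc>https://example.com/search?q=test&amp;page=2</loc>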
Must be less than 2,048 characters. The sitemaps.org protocol sets this limit for <loc> values, and very long URLs can also cause problems for some crawlers and browsers.
Trailing slashes matter. https://example.com/page/ and https://example.com/page are different URLs. Use whichever version your site serves as canonical.
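Several of these rules can be checked mechanically before a URL goes into the sitemap. A minimal Python sketch, assuming your site is served over HTTPS; loc_problems is a hypothetical name, and the canonical and trailing-slash rules are not checked because they depend on knowledge of your own site:

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # <loc> must be shorter than this many characters


def loc_problems(loc, expected_scheme="https"):
    """Return a list of rule violations for one <loc> value (empty if none).

    expected_scheme is an assumption about your site. Canonical and
    trailing-slash checks need site knowledge and are not attempted.
    """
    problems = []
    parts = urlsplit(loc)
    if not parts.scheme or not parts.netloc:
        problems.append("not an absolute URL")
    elif parts.scheme != expected_scheme:
        problems.append(f"uses {parts.scheme}:// instead of {expected_scheme}://")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("2,048 characters or longer")
    # Spaces and non-ASCII characters should be percent-encoded first.
    if any(ch == " " or ord(ch) > 127 for ch in loc):
        problems.append("contains unencoded spaces or non-ASCII characters")
    return problems


print(loc_problems("https://example.com/page/"))  # []
print(loc_problems("/page/"))  # ['not an absolute URL']
```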
The <lastmod> Tag (Optional)
<lastmod>2026-06-09</lastmod>
The <lastmod> tag indicates when the page was last modified. It helps search engines decide whether to recrawl a page.
Date formats
<lastmod> accepts W3C Datetime format at various levels of precision:
| Format | Example |
|---|---|
| Date only | 2026-06-09 |
| Date and time with timezone | 2026-06-09T14:30:00+00:00 |
| Date and time in UTC | 2026-06-09T14:30:00Z |
The date-only format (YYYY-MM-DD) is the most common and works well for most sites. The full datetime format is useful when precision matters (news sites, frequently updated pages).
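In Python, the standard datetime module already produces these W3C Datetime forms through isoformat(). A small sketch; the helper names are illustrative:

```python
from datetime import date, datetime, timezone


def lastmod_date(d):
    """Date-only precision from a date object, e.g. 2026-06-09."""
    return d.isoformat()


def lastmod_datetime(dt):
    """Second precision with an explicit offset, e.g. 2026-06-09T14:30:00+00:00."""
    # W3C Datetime requires a timezone designator whenever a time is included,
    # so naive datetimes (no tzinfo) are rejected.
    if dt.tzinfo is None:
        raise ValueError("lastmod datetimes need a timezone")
    return dt.isoformat(timespec="seconds")


print(lastmod_date(date(2026, 6, 9)))  # 2026-06-09
print(lastmod_datetime(datetime(2026, 6, 9, 14, 30, tzinfo=timezone.utc)))  # 2026-06-09T14:30:00+00:00
```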
Best practices for lastmod
Set it to the actual modification date. If the page content was last changed on March 15, set lastmod to 2026-03-15. Do not set it to the current date on every sitemap rebuild. Google has stated that it uses lastmod only when the value is consistently and verifiably accurate, so inflated dates can lead it to ignore the tag for your site.
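Update it when content changes meaningfully. If you edit the body of a blog post, update its lastmod date. Google describes lastmod as the date of the last significant update, such as a change to the main content, structured data, or links; a trivial change such as an updated copyright date does not count.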
Do not use it for cosmetic changes. If your site template changes but the page content stays the same, do not update lastmod. The tag is about content changes, not design changes.
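One way to follow these practices on automated rebuilds is to fingerprint each page's main content and change lastmod only when the fingerprint changes. The sketch below is a hypothetical approach, not a standard: it assumes you can extract the main content without the template, and that you persist a (hash, lastmod) pair per URL between rebuilds. Because any edit changes the hash, you may want to normalize the content first or decide separately what counts as a significant change.

```python
import hashlib
from datetime import date


def next_lastmod(previous, main_content, today):
    """Return the (content_hash, lastmod) pair to store for one URL.

    previous: None the first time a URL is seen, otherwise the pair saved
    on the last rebuild (how you persist it is up to you).
    main_content: the page's main content only, without template parts
    such as navigation, headers, and footers.
    """
    digest = hashlib.sha256(main_content.encode("utf-8")).hexdigest()
    if previous is not None and previous[0] == digest:
        return previous  # content unchanged: keep the old lastmod
    return (digest, today.isoformat())


first = next_lastmod(None, "<p>Hello</p>", date(2026, 3, 15))
rebuilt = next_lastmod(first, "<p>Hello</p>", date(2026, 6, 9))
print(rebuilt[1])  # 2026-03-15: nothing changed, so lastmod stays put
```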
The <changefreq> Tag (Optional)
<changefreq>weekly</changefreq>
The <changefreq> tag suggests how frequently the page is likely to change. Valid values:
| Value | Meaning |
|---|---|
| always | Changes every time it is accessed |
| hourly | Changes every hour |
| daily | Changes every day |
| weekly | Changes every week |
| monthly | Changes every month |
| yearly | Changes every year |
| never | Archived content that will not change |
Does Google use changefreq?
Google has stated that it ignores <changefreq>. It determines crawl frequency from its own analysis of how often your pages actually change, not from what you declare in the sitemap. Bing has made similar statements.
You can include it if you want, but do not expect it to influence crawl behavior. <lastmod> is a more useful signal.
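If you still include <changefreq>, keep the value to one of the seven strings listed above. The sitemaps.org schema defines them in lowercase, so a value such as Weekly fails schema validation. A trivial Python check (the function name is illustrative):

```python
VALID_CHANGEFREQ = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


def is_valid_changefreq(value):
    """True only for the exact lowercase values the protocol defines."""
    return value in VALID_CHANGEFREQ


print(is_valid_changefreq("weekly"))  # True
print(is_valid_changefreq("Weekly"))  # False
```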
The <priority> Tag (Optional)
<priority>0.8</priority>
The <priority> tag suggests the relative importance of a page compared to other pages on your site. Values range from 0.0 (least important) to 1.0 (most important). The default is 0.5.
Does Google use priority?
No. Google ignores the <priority> tag. It determines page importance through its own algorithms (PageRank, content quality, user engagement signals). Setting priority to 1.0 on all your pages does not help.
Many sitemap generators include priority by default. It is not harmful, but it is also not useful. If you are generating sitemaps manually, you can skip it.
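If your generator does emit it, the value should stay within 0.0 to 1.0. A minimal Python sketch (the helper name is hypothetical) that rejects out-of-range values and formats with one decimal place, the style used in the sitemaps.org examples; note that it rounds finer values:

```python
def format_priority(value=0.5):
    """Return a <priority> string; the 0.5 default mirrors the protocol's."""
    # The chained comparison also rejects NaN, since NaN compares False.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {value}")
    # One decimal place, like the sitemaps.org examples (finer values are rounded).
    return f"{value:.1f}"


print(format_priority(0.8))  # 0.8
print(format_priority())  # 0.5
```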
Sitemap Index Structure
When your site has more than 50,000 URLs, when a single sitemap would exceed 50 MB uncompressed, or when you want to organize sitemaps by type, use a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-06-09</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-06-08</lastmod>
  </sitemap>
</sitemapindex>
Sitemap index elements
<sitemapindex> is the root element, replacing <urlset>. It uses the same namespace.
<sitemap> wraps each referenced sitemap, similar to how <url> wraps each page.
<loc> contains the URL of the child sitemap. Same rules as the <loc> in URL entries: absolute URL, correct protocol, URL-encoded.
<lastmod> indicates when the referenced sitemap was last updated. Optional but useful.
A sitemap index can reference up to 50,000 sitemaps. Each referenced sitemap can contain up to 50,000 URLs. That gives a theoretical maximum of 2.5 billion URLs, which is more than any site needs. For details, see sitemap index files explained.
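Splitting a large URL list across child sitemaps and listing them in an index is mechanical. A minimal Python sketch: the chunk and build_index helpers and the sitemap-N.xml naming scheme are hypothetical, and <lastmod> is left out of the index entries for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_locs):
    """Return a <sitemapindex> document (bytes) listing each child sitemap."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_locs:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True,
                       default_namespace=SITEMAP_NS)


# 120,000 hypothetical page URLs -> three child sitemaps (50k, 50k, 20k).
urls = [f"https://example.com/p/{n}/" for n in range(120_000)]
groups = chunk(urls)
child_locs = [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(groups) + 1)]
index_xml = build_index(child_locs)
```

Each group in groups would then be written as its own <urlset> file at the matching child URL.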
Extension Namespaces
Image extension
<url>
  <loc>https://example.com/page/</loc>
  <image:image>
    <image:loc>https://example.com/images/photo.jpg</image:loc>
  </image:image>
</url>
Requires xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" on the <urlset>. The <image:loc> tag is required. Up to 1,000 images per URL entry.
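A sketch of building an image-annotated entry with Python's xml.etree.ElementTree while enforcing the 1,000-image limit; url_with_images is a hypothetical helper, and the register_namespace call changes a module-wide prefix mapping:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1000

ET.register_namespace("image", IMAGE_NS)  # module-wide prefix mapping


def url_with_images(page_loc, image_locs):
    """Build a <url> element with one <image:image> child per image URL."""
    if len(image_locs) > MAX_IMAGES_PER_URL:
        raise ValueError(
            f"{len(image_locs)} images; the limit is {MAX_IMAGES_PER_URL} per <url>"
        )
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_loc
    for image_loc in image_locs:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        # <image:loc> is the one required child of <image:image>.
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_loc
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.append(url_with_images(
    "https://example.com/page/", ["https://example.com/images/photo.jpg"]
))
xml_text = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
print(xml_text)
```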
Video extension
<url>
  <loc>https://example.com/videos/tutorial/</loc>
  <video:video>
    <video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
    <video:title>Tutorial Title</video:title>
    <video:description>Tutorial description.</video:description>
    <video:content_loc>https://example.com/video/tutorial.mp4</video:content_loc>
  </video:video>
</url>
Requires xmlns:video="http://www.google.com/schemas/sitemap-video/1.1". Thumbnail, title, description, and either content_loc or player_loc are required.
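Because the required set includes an either-or condition, it is easy to get wrong by hand. A minimal Python sketch that reports which required children are missing from a <video:video> element; the function name is illustrative, and only the required fields listed above are checked:

```python
import xml.etree.ElementTree as ET

VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def missing_video_fields(video):
    """List required children absent from one <video:video> element."""
    def has(name):
        # find() with "{uri}name" looks at direct children only.
        return video.find(f"{{{VIDEO_NS}}}{name}") is not None

    missing = [n for n in ("thumbnail_loc", "title", "description") if not has(n)]
    if not (has("content_loc") or has("player_loc")):
        missing.append("content_loc or player_loc")
    return missing


url_xml = b"""<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <loc>https://example.com/videos/tutorial/</loc>
  <video:video>
    <video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
    <video:title>Tutorial Title</video:title>
  </video:video>
</url>"""

for video in ET.fromstring(url_xml).iter(f"{{{VIDEO_NS}}}video"):
    print(missing_video_fields(video))  # ['description', 'content_loc or player_loc']
```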
News extension
<url>
  <loc>https://example.com/news/article/</loc>
  <news:news>
    <news:publication>
      <news:name>Publication Name</news:name>
      <news:language>en</news:language>
    </news:publication>
    <news:publication_date>2026-06-09T10:00:00+00:00</news:publication_date>
    <news:title>Article Title</news:title>
  </news:news>
</url>
Requires xmlns:news="http://www.google.com/schemas/sitemap-news/0.9". All tags shown are required. See our news sitemap guide.
Hreflang (xhtml) extension
<url>
  <loc>https://example.com/en/page/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page/" />
  <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page/" />
</url>
Requires xmlns:xhtml="http://www.w3.org/1999/xhtml". Each URL entry needs the full set of hreflang links including self-referencing.
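Generating the links from one mapping is the easiest way to keep every entry's set complete and reciprocal. A minimal Python sketch; hreflang_entries is a hypothetical helper. It registers the sitemap namespace as the default prefix instead of passing default_namespace to tostring, because ElementTree's default_namespace option rejects unprefixed attribute names such as rel and href.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Module-wide prefix mappings: "" makes the sitemaps.org namespace the default.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def hreflang_entries(alternates):
    """Build one <url> per language version; each lists every version.

    `alternates` maps hreflang codes to URLs. Every entry receives the
    complete set of links, including the one pointing at itself.
    """
    entries = []
    for loc in alternates.values():
        url = ET.Element(f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        for lang, href in alternates.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
        entries.append(url)
    return entries


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.extend(hreflang_entries({
    "en": "https://example.com/en/page/",
    "fr": "https://example.com/fr/page/",
}))
xml_text = ET.tostring(urlset, encoding="unicode")
print(xml_text)
```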
File size and compression
A sitemap must be no larger than 50 MB (52,428,800 bytes) uncompressed. You can gzip-compress it and serve it as sitemap.xml.gz to save bandwidth; Google, Bing, and other search engines support gzip-compressed sitemaps. Compression does not raise either limit: the 50 MB limit applies to the uncompressed file, and the 50,000 URL limit still applies.
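A minimal Python sketch of writing a compressed sitemap with the standard gzip module. The point it illustrates is that both limits are checked against the uncompressed document; the function name, the caller-supplied URL count, and the temporary output path are assumptions for the example.

```python
import gzip
import os
import tempfile

MAX_UNCOMPRESSED_BYTES = 52_428_800  # 50 MB, measured before compression
MAX_URLS = 50_000


def write_gzipped_sitemap(xml_bytes, url_count, path):
    """Check both limits on the uncompressed document, then gzip it to path."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("over 50 MB uncompressed: split into several sitemaps")
    if url_count > MAX_URLS:
        raise ValueError("over 50,000 URLs: split into several sitemaps")
    with gzip.open(path, "wb") as f:
        f.write(xml_bytes)


# Hypothetical one-URL document written to a temporary directory.
doc = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
       b"<url><loc>https://example.com/page/</loc></url></urlset>")
path = os.path.join(tempfile.mkdtemp(), "sitemap.xml.gz")
write_gzipped_sitemap(doc, 1, path)
```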
Common Formatting Errors
Misplaced XML declaration. If you include the <?xml version="1.0" encoding="UTF-8"?> line, it must be the very first thing in the file. Even a blank line or a single space before the declaration makes the XML invalid.
Missing namespace. Forgetting xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" on the <urlset> element. Search engines may not recognize the file as a sitemap.
Relative URLs. Using /page/ instead of https://example.com/page/ in <loc>. All URLs must be absolute.
Unescaped special characters. Ampersands in URLs must be escaped as &amp;. A raw & character makes the XML malformed, and parsers reject the file.
Mixing sitemap index and urlset. A file must be either a <urlset> (listing URLs) or a <sitemapindex> (listing sitemaps), not both.
Non-UTF-8 encoding. Saving the file in a different encoding, or adding a BOM (byte order mark) at the start of the file. The XML specification permits a UTF-8 BOM, but some tools handle it poorly, so save as UTF-8 without BOM.
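Several of these errors can be caught with a quick script before you submit a sitemap. A minimal Python sketch using only the standard library; quick_checks is a hypothetical helper, it reports a BOM as a warning rather than an error, and it is not a substitute for schema validation or the search engines' own reports.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ROOTS = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}


def quick_checks(raw):
    """Return a list of problems found in raw sitemap bytes (empty if none)."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 BOM")
        raw = raw[3:]
    stripped = raw.lstrip()
    if stripped.startswith(b"<?xml ") and len(stripped) != len(raw):
        problems.append("whitespace before the XML declaration")
    try:
        root = ET.fromstring(raw)  # also catches raw & and other syntax errors
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    if root.tag not in ROOTS:  # wrong root or missing/altered namespace
        problems.append(f"unexpected root element {root.tag}")
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        text = (loc.text or "").strip()
        if not text.startswith(("http://", "https://")):
            problems.append(f"relative or empty <loc>: {text!r}")
    return problems
```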
For a full validation checklist, see our sitemap validation guide.
Summary
An XML sitemap has a clear, well-defined structure. The <urlset> root element contains <url> entries, each with a required <loc> and optional <lastmod>, <changefreq>, and <priority>. Of the optional tags, only <lastmod> provides meaningful value to search engines. Namespace extensions add support for images, videos, news, and hreflang annotations. Keep the XML well-formed, use absolute URLs, and escape special characters.