How to Create and Optimize XML Sitemaps
Creating a sitemap is an important part of optimizing any website.Not only do sitemaps provide search engines with a blueprint of how your website is laid out, but they can also include valuable metadata such as:
- How often each page is updated.
- When they were last changed.
- How important pages are in relation to each other.
Sitemaps are especially important for websites that:
- Include a lot of archived content that’s not linked together.
- Lack external links.
- Have hundreds or even thousands of pages.
As the name implies, these files provide bots with a map of your site that helps them discover and index the most important pages.
In this article, we will discuss the most important tips you need to know to create and optimize your sitemap for search engines and visitors alike.
1. Use Tools & Plugins to Generate Your Sitemap Automatically
Generating a sitemap is easy when you have the right tools, such as auditing software with a built-in XML Sitemap generator or popular plugins like Google XML Sitemaps.
In fact, WordPress websites that are already using Yoast SEO can enable XML Sitemaps directly in the plugin.
Alternatively, you could manually create a sitemap by following XML sitemap code structure. Technically, your sitemap doesn’t even need to be in XML format a text file with a new line separating each URL will suffice.
However, you will need to generate a complete XML sitemap if you want to implement the hreflang attribute, so it’s much easier just to let a tool do the work for you.
2. Submit Your Sitemap to Google
You can submit your sitemap to Google from your Google Search Console. From your dashboard, click Crawl > Sitemaps > Add Test Sitemap.
Test your sitemap and view the results before you click Submit Sitemap to check for errors that may prevent key landing pages from being indexed.
Ideally, you want the number of pages indexed to be the same as the number of pages submitted.
Note that submitting your sitemap tells Google which pages you consider to be high quality and worthy of indexation, but it does not guarantee that they’ll be indexed. Instead, the benefit of submitting your sitemap is to:
- Help Google understand how your website is laid out.
- Discover errors you can correct to ensure your pages are indexed properly.
3. Prioritize High-Quality Pages in Your Sitemap
When it comes to ranking, overall site quality is a key factor. If your sitemap directs bots to thousands of low-quality pages, search engines interpret these pages as a sign that your website is probably not one visitors will want to visit even if the pages are necessary for your site, such as login pages.
Instead, try to direct bots to the most important pages on your site. Ideally, these are pages that are:
- Highly optimized.
- Include images and video.
- Have lots of unique content.
- Prompt user engagement through comments and reviews.
4. Isolate Indexation Problems
Google Search Console can be a bit frustrating if it doesn’t index all of your pages because it doesn’t tell you which pages are problematic.
For example, if you submit 20,000 pages and only 15,000 of those are indexed, you won’t be told what the 5,000 “problem pages” are.
This is especially true of large e-commerce websites that have multiple pages for very similar products.
SEO Consultant Michael Cottam has written a useful guide for isolating problematic pages. He recommends splitting product pages into different XML sitemaps and testing each of them.
Create sitemaps that will affirm hypotheses, such as “pages that don’t have product images aren’t getting indexed” or “pages without unique copy aren’t getting indexed.”
When you’ve isolated the main problems, you can either work to fix the problems or set those pages to “noindex,” so they don’t diminish your overall site quality.
Update: Google Search Console has been recently updated in terms of Index Coverage. In particular, the problem pages are now listed and the reasons why Google isn’t indexing some URLs are provided.
5. Include Only Canonical Versions of URLs in Your Sitemap
When you have multiple pages that are very similar, such as product pages for different colors of the same product, you should use the “link rel=canonical” tag to tell Google which page is the “main” page they should crawl and index.
Bots have an easier time discovering key pages if you don’t include pages with canonical URLs pointing at other pages.
6. Use Robots Meta Tag over Robots.txt Whenever Possible
When you don’t want a page to be indexed, you usually want to use the meta robots “noindex,follow” tag. This prevents Google from indexing the page but it preserves your link equity, and it’s especially useful for utility pages that are important to your site but shouldn’t be showing up in search results.
The only time you want to use robots.txt to block pages is when you’re eating up your crawl budget. If you notice that Google is re-crawling and indexing relatively unimportant pages (e.g., individual product pages) at the expense of core pages, you may want to use robots.txt.
7. Don’t Include ‘noindex’ URLs in Your Sitemap
Speaking of wasted crawl budget, if search engine robots aren’t allowed to index certain pages, then they have no business being in your sitemap.
When you submit a sitemap that includes blocked and “noindex” pages, you’re simultaneously telling Google “it’s really important that you index this page” and “you’re not allowed to index this page.” Lack of consistency is a common mistake.
8. Create Dynamic XML Sitemaps for Large Sites
It’s nearly impossible to keep up with all of your meta robots on huge websites. Instead, you should set up rules logic to determine when a page will be included in your XML sitemap and/or changed from noindex to “index, follow.”
9. Use XML Sitemaps & RSS/Atom Feeds
RSS/Atom feeds notify search engines when you update a page or add fresh content to your website. Google recommends using both sitemaps and RSS/Atom feeds to help search engines understand which pages should be indexed and updated.
By including only recently updated content in your RSS/Atom feeds you’ll make finding fresh content easier for both search engines and visitors.
10. Update Modification Times Only When You Make Substantial Changes
Don’t try to trick search engines into re-indexing pages by updating your modification time without making any substantial pages to your page.
Last year, I talked at length about the potential dangers of risky SEO. I won’t reiterate all my points here, but suffice it to say that Google may start removing your date stamps if they’re constantly updated without providing new value.
11. Don’t Worry Too Much About Priority Settings
Some Sitemaps have a “Priority” column that ostensibly tells search engines which pages are most important.
Whether this feature actually works, however, has long been debated. Early last year, Google’s Gary Illyes tweeted that Googlebot ignores priority settings while crawling.
12. Keep File Size as Small as Possible
The smaller your sitemap, the less strain you’re putting on your server. Google and Bing both increased the size of accepted sitemap files from 10 MB to 50 MB in 2016, but it’s still good practice to keep your sitemap as lean as possible and prioritize your key landing pages.
13. Create Multiple Sitemaps If Site Includes >50,000 URLs
You’re limited to 50,000 URLs per sitemap. While this is more than enough for most sites, some sites will need to create more than one sitemap.
Large e-commerce sites, for example, might need to create additional sitemaps to handle extra product pages.