As a publisher, there’s nothing worse than putting a substantial amount of time (and money) into content creation, only to watch it slowly trickle into Google’s index.
After all, the sooner your content is indexed, the sooner you can climb up the ranks for your given keyword(s) and start pulling in organic traffic. Any delay in indexation only prolongs this process — and it already takes long enough to rank.
Simply put, learning how to get your content into Google’s web index quickly is step one in driving traffic to your newly published content.
In this article, we’ll look at this problem from a holistic view, covering the details of Google’s index, what indexation means, and a variety of methods to get your content indexed.
Before we get into specific strategies for getting your website indexed, we need to understand some basic concepts:
Googlebot

This is the search robot that Google uses to crawl the internet. It’s also known as a search spider because it follows links to scrape and index other pages on your site, or other sites altogether.
Crawling

This is the term used to define what Googlebot is doing when it’s on your site. Once Googlebot is done with a page, it looks for links to other pages, follows them, and repeats the same process over and over again. It reports the information it finds back to Google, where it’s either indexed or not.
Indexing

After Googlebot crawls your site and sends the information it picks up back to Google, we get to the process of indexation. Not all content that is crawled is indexed — it depends on the quality of the website, page, and content itself. Regardless, Googlebot extracts relevant information from your pages (text, image alt tags, heading structure, etc.) and submits it for potential indexation.
Google’s entire business is based on this process of crawling the web for content, analyzing it, and adding it intelligently to its search engine indexes.
Isn’t there only one search engine index? No, not anymore. In the past, Google worked from a single primary index for all of its users, but since the explosion of mobile devices, it has focused on building out a mobile-first index.
This mobile-first index is now considered the “primary” index by Google, due to the trend toward mobile search. Developed countries are shifting away from heavy desktop use, and many developing countries are skipping the desktop phase altogether, meaning mobile truly is becoming the way most people interact with the internet.
Google also keeps multiple supplementary indexes in play at all times, both to fill gaps in search results for content that’s crawled less often and to serve as a backup for desktop devices (which are slowly falling out of favor in the eyes of both Google and consumers).
We’ll never know the exact process that your content goes through when it’s getting indexed by Google, but decades of tinkering and testing have created an entire industry we now call SEO. And we’ve learned many best practices, despite not being insiders at Google learning exactly how the sausage is made.
Here’s a high-level overview of what happens when your site is crawled:

1. Googlebot discovers a page, usually by following a link or reading your sitemap.
2. It scrapes the page’s content — text, image alt tags, heading structure, and so on.
3. It follows the links it finds to discover more pages, and repeats the process.
4. It reports what it found back to Google, which decides whether each page is worth indexing.
Fortunately, there’s a simple way to figure out if Google is indexing your content.
By using the “site:” search operator, you can see exactly how many pages of your site have been indexed by Google.
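For example, searching Google for the following (substitute your own domain) returns only pages from that site that are in Google’s index:

```
site:wordagents.com
```

The number of results shown is a rough count of your indexed pages.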
Here’s a look at the WordAgents website in Google:
There are 24 pages indexed in Google for WordAgents, which matches the number of pages we want indexed. When you use the “site:” search operator, you’re looking for two different things:
The first and biggest problem is not being indexed at all. You can solve that problem by following any or all of the suggestions in the sections below.
The second problem is not having the correct pages indexed. There are many pages on a website that you’d probably prefer not to have indexed — category pages, archives, author pages, etc. Part of getting your website indexed is making sure that you prevent pages you don’t want to show up from being indexed as well.
A sitemap is an XML page on your site that lists all of the other pages on your site. You can see the WordAgents sitemap here for a better look, but it basically tells Google and other search engines what pages exist on your site and how often they’re updated.
Not all pages on your site are updated at the same frequency. You might add new blog posts to your homepage on a daily basis, but the blog posts themselves are updated once a week, month, or never again. A sitemap helps Google understand how often it should send Googlebot back to specific pages on your site to re-crawl them for updated information.
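A minimal sitemap looks something like this — the URLs, dates, and frequencies below are illustrative, not a template you should copy verbatim:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2021-01-15</lastmod>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://example.com/blog/sample-post/</loc>
    <lastmod>2021-01-10</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```

Each `<url>` entry tells search engines where a page lives, when it last changed, and roughly how often to come back and re-crawl it.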
The simplest way to add a sitemap to your site, assuming you’re using WordPress, is by installing the Yoast SEO for WordPress plugin. It’s a fantastic plugin for WordPress SEO in general, but also includes XML sitemap functionality and customization.
Here’s how to set it up. First, enable XML sitemap functionality:
Next, click on “Post Types” and enable or disable specific post types. As a general rule, it’s a good idea to enable posts and pages and disable media and other post types. If you run an ecommerce site, make sure you have products enabled.
Now that your sitemap is set up, all you need to do is submit it to Google Search Console so Google knows that your new sitemap exists:
That’s it! Now you have a Google-approved source of data on how your site is being indexed. This is a major first step in troubleshooting any indexation issues you have, as well as a great way to encourage Google to crawl and index your content quicker.
Robots.txt is a funny little file that exists (or should exist) on every website. You’ll find it at yourdomain.com/robots.txt in almost all cases.
It tells search engine spiders what to do when they reach your site. If you have a robots.txt file, it’s the first thing a spider looks at before it does any crawling, and it will follow the file’s instructions as it crawls your site. If a spider doesn’t find a robots.txt file, it assumes you want every page on your site to be crawled.
Check if You Have a Robots.txt File
The easiest way to do this is to navigate to http://www.yourdomain.com/robots.txt. If you don’t see one, you should create one in a plain text editor and upload it to your site. However, this is rare — almost everyone should have a robots.txt file already. The more common issue is a poorly set-up file.
Above you can see the WordAgents robots.txt file. The first line specifies the user agents these instructions apply to — by using the asterisk, we’re saying that every user agent should follow the rules below.
Next, there’s a list of pages that search engine spiders and other bots aren’t allowed to crawl. For example, we’ve disallowed spiders from crawling certain pages on our site that don’t matter for search engines, like our order page.
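A simple robots.txt built along those lines looks something like this (the disallowed paths and sitemap URL here are illustrative):

```
User-agent: *
Disallow: /order/
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap_index.xml
```

The `Sitemap:` line is optional but worthwhile — it points crawlers straight at your sitemap.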
Here’s what Google has to say about your robots.txt file:
Most websites don’t need to set up restrictions for crawling, indexing or serving, so their pages are eligible to appear in search results without having to do any extra work. That said, site owners have many choices about how Google crawls and indexes their sites through Webmaster Tools and a file called “robots.txt”. With the robots.txt file, site owners can choose not to be crawled by Googlebot, or they can provide more specific instructions about how to process pages on their sites. - Google
As a general rule, you want to disallow Google from crawling pages that have no search value for visitors to your site. For many webmasters, this means any pages whose content duplicates pages that you do want indexed. Category archive pages are a good example, because they’re composed entirely of existing blog posts that are already being crawled.
That being said, be careful when editing your robots.txt file. One simple mistake can prevent Google from crawling large portions of your site, which is the opposite of what you want.
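One way to sanity-check an edited robots.txt before deploying it is Python’s built-in `urllib.robotparser`. Here’s a minimal sketch, assuming a file that disallows an `/order/` path as in the example above (the domain and paths are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The rules you intend to deploy, as they would appear in robots.txt
rules = """\
User-agent: *
Disallow: /order/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Verify that pages you want indexed are still crawlable...
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))      # True
# ...and that the pages you meant to block actually are blocked
print(rp.can_fetch("Googlebot", "https://example.com/order/checkout"))  # False
```

Running a few checks like this against your most important URLs is a cheap way to catch the “one simple mistake” that blocks half your site.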
It’s not uncommon for a website to have unnoticed crawl errors. There’s so much to work on when building and growing a website that it’s easy to overlook simple issues preventing indexation.
Crawl errors come about most often when making large changes to a website — removing a forum, adding pages, moving pages to new URLs, etc. Here’s how to make sure you’re not suffering from crawling issues:
If you see any errors, download the list and get to fixing them! The most likely problem you’ll run into is old URLs that no longer resolve and aren’t redirected to newer, existing content. Fix all of these URLs, then “Mark as Fixed” in Search Console, and you’ll be off to the races.
This is an often-forgotten tip that makes a noticeable difference in how quickly your new content is indexed by Google and other search engines.
Go to your WordPress dashboard and then navigate to Settings > Writing and scroll to the Update Services section. By default, you should see this:
Ping-o-matic is the default service provided, but there are many more you can (and should) add. Copy and paste the list below into the Update Services box, then click “Save Changes”:
This is a no-brainer suggestion that you should be doing anyways for content promotion purposes, but it deserves a mention because it has a material impact on how quickly your content gets indexed.
Here’s a simple social share schedule proven to increase engagement on a single piece of content: post on every social platform the day you publish, then schedule another share for each platform after a week, and again after a month. You’ll be hitting different segments of your audience on each platform each time, so engagement to the piece will increase without you alienating your audience for sharing too much.
On top of that, your content will be picked up quicker by search engines, especially if those early shares take off with your community.
If you’re in the unenviable position of having an indexation problem, the first thing to note is how widespread the issue is. Are you having site-wide indexation issues, or is the problem limited to a specific type of page — or, even better, a single piece of content?
First identify how severe the issue is. Go to Google Search Console and check your sitemap:
Look for a discrepancy between the submitted pages and indexed pages. This is the first and most important step to figuring out what’s going on. If you see fewer indexed pages than submitted pages, you have an indexation issue.
First, check that your robots.txt file isn’t preventing Googlebot from crawling pages that you want indexed. This is a surprisingly common problem, and deserves to be first on the list. Often, you’ll have accidentally prevented Googlebot from seeing specific pages. This is a simple fix — all you need to do is delete the offending line.
Check to see if you are excluding parameters that you want included. This is a smaller issue, but if you go to Google Search Console > Crawl > URL Parameters, you might find that Google isn’t including parameters that you’d like it to include.
Check to see if the non-indexed content is duplicate or thin. Google’s pretty good at determining how “worthy” of indexation a particular piece of content is. Look at the content on your site that’s not indexed and ask yourself, “Does this deserve to be indexed?” If it’s thin content or largely duplicated from another website (or your own website, as with archive pages), it might be the case that it’s better off not being indexed.
Are there enough links to your non-indexed content? Increasing the internal linking to your non-indexed pages can give them the boost they need for Google to index them. A good strategy here is to link to a non-indexed page from a page that you know is both indexed and cached by Google. This way you guarantee that Googlebot will pick up the link to the non-indexed page the next time it crawls your site.
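When auditing internal linking, it helps to list which same-domain links a given page actually contains. Here’s a rough sketch using Python’s standard library — the HTML snippet and domain are illustrative, and in practice you’d feed in a fetched page:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class InternalLinkCollector(HTMLParser):
    """Collects href values that point to the same domain as the base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links, then keep only same-domain ones
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).netloc == urlparse(self.base_url).netloc:
            self.links.append(absolute)

# Illustrative page fragment: one internal link, one external
html = '<a href="/blog/new-post/">New post</a> <a href="https://other.com/">Out</a>'
collector = InternalLinkCollector("https://example.com/")
collector.feed(html)
print(collector.links)  # ['https://example.com/blog/new-post/']
```

Running this over your indexed pages shows which of them already link to the page you’re trying to get indexed — and where a link is missing.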
Getting your content indexed quickly might not be the flashiest or most fun part of content marketing, but it’s nevertheless crucial. We think of it as a foundational element — you don’t think about it often, but if it’s not working, you’re in a world of trouble.
We all know that organic traffic is an asset that compounds over time. This makes getting content indexed and ranking as quickly as possible paramount, because the sooner it’s up, the sooner it compounds. It’s similar to investing in your retirement accounts as early in life as possible: taking advantage of our most valuable asset — time — is key.