
10 Steps to Get Your Site Properly Indexed by Google

by International Associations | Nov 10, 2020 | Search Engine Optimization

If Google Doesn’t Index Your Site, You’re Invisible

If Google doesn’t index your website, then you’re pretty much invisible. To the rest of the world your site doesn’t even exist. You won’t show up in any search queries, and you have almost no chance of getting any organic web traffic.

Given that you’re here reading this article, I’m guessing that you understand that you need to do something to make your website visible to the world. So let’s get straight down to business.

This article will teach you how to fix these three problems:

Your entire website isn’t indexed.
Some of your pages are indexed, but others aren’t.
Your newly‐published web pages aren’t getting indexed fast enough.

But first, let’s make sure we’re on the same page and fully understand what indexing really does.

Google discovers new web pages by crawling the web, and then they add those pages to their index. They do this using a web spider called Googlebot.

Confused? Let’s define a few of the key terms you need to understand.

Crawling: The process of following hyperlinks on the web to discover new content.
Indexing: The process of storing every web page in a vast database.
Web Spider: A piece of software designed to carry out the crawling process at scale.
Googlebot: Google’s web spider.

Here’s a video from Google that explains the process in more detail:

When you search or Google something, you’re asking Google to return all the relevant pages from their index. Because there are often millions of pages that fit the bill, Google’s ranking algorithm does its best to sort the pages so that you see the best and most relevant results first.

You may have noticed that Google places their sponsored pages at the top of their list – so they might not actually be the most relevant for YOU.

The critical point I’m making here is that indexing and ranking are two different things. Indexing is showing up for the race; ranking is winning the race.

You can’t possibly win without showing up for the race in the first place.

Go to Google and search for site:yourwebsite.com

The number of results shows roughly how many of your pages Google has indexed.

If you want to check the index status of a specific URL, use the same operator with the full URL: site:yourwebsite.com/web-page-slug


No results will show up if the page isn’t indexed.

Now, it’s worth noting that if you’re a Google Search Console user, you can use the Coverage report to get a more accurate insight into the index status of your website. Just go to: Google Search Console > Index > Coverage for a full report.


Look at the number of valid pages both with and without warnings.

If these two numbers total anything but zero, then Google has at least some of the pages on your website indexed. If not, then you have a severe problem because none of your web pages are indexed.

SIDENOTE: Not a Google Search Console user? Sign up. It’s free. Everyone who runs a website and cares about getting traffic from Google should use Google Search Console. It’s that important.

You can also use Search Console to check whether a specific page is indexed. To do that, paste the URL into the URL Inspection tool.

If that page is indexed, it’ll say “URL is on Google.”


If the page isn’t indexed, you’ll see the words “URL is not on Google.”


Have you found that your website or web page isn’t indexed in Google? If so, you need to try this:

Go to Google Search Console
Navigate to the URL inspection tool
Paste the URL you’d like Google to index into the search bar.
Wait for Google to check the URL
Click the “Request indexing” button

This process is good practice when you publish a new post or page. You’re effectively telling Google that you’ve added something new to your site and that they should take a look at it.

However, requesting indexing is unlikely to solve underlying problems preventing Google from indexing your old pages. If that’s the case, follow the checklist below to diagnose and fix the problem.

1) Remove Crawl Blocks in Your robots.txt File

Is Google not indexing your entire website? It could be due to a crawl block in something called a robots.txt file. To check for this issue, go to yourdomain.com/robots.txt.

Look for either of these two snippets of code:

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /

Both of these tell Googlebot that it isn’t allowed to crawl any pages on your site. To fix the issue, simply remove them.
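If you’d rather test a robots.txt file with a script than read it by eye, the sketch below uses Python’s standard urllib.robotparser module. The function name is_crawl_allowed and the example URLs are assumptions made for illustration. Python’s parser also doesn’t match rules exactly the way Googlebot does (for example, it has no wildcard support), so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url.

    robots_txt is the file's content as a string, so it can be tested
    offline. Matching follows Python's parser, not Google's exact rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site-wide crawl block: every crawler is disallowed from every path.
SITE_WIDE_BLOCK = "User-agent: *\nDisallow: /"

# A narrower rule that only blocks one (hypothetical) section.
SECTION_BLOCK = "User-agent: *\nDisallow: /private/"

print(is_crawl_allowed(SITE_WIDE_BLOCK, "https://yourdomain.com/blog/"))
print(is_crawl_allowed(SECTION_BLOCK, "https://yourdomain.com/blog/"))
```

To check a live site, download yourdomain.com/robots.txt first and pass its text to the function.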

A crawl block in robots.txt could also be the culprit if Google isn’t indexing a single web page. To check if this is the case, paste the URL into the inspection tool in Google Search Console. Click on the Coverage block to reveal more details, then look for the “Crawl allowed? No: blocked by robots.txt” error.

This indicates that the page is blocked in robots.txt.

If that’s the case, recheck your robots.txt file for any “disallow” rules relating to the page or related subsection.

Remove them where necessary.

2) Remove Rogue Noindex Tags

Google won’t index pages if you tell them not to. This is useful for keeping some web pages private. There are two ways to accomplish this:

Method 1: meta tag

Pages with either of these meta tags in their <head> section won’t be indexed by Google:

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">

This is a meta robots tag, and it tells search engines whether they can or can’t index the page. The key part of this is the “noindex” value. If you see that, then the page is set to noindex.
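If you don’t have a crawler to hand, you can check a single page’s HTML for a noindex meta tag with a few lines of Python. This is a minimal sketch using only the standard library. The names RobotsMetaParser and has_noindex are made up for this example, and it only looks at meta tags named “robots” or “googlebot”.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content can hold several comma-separated directives.
            for directive in attr.get("content", "").split(","):
                self.directives.append(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a noindex meta robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Feed it the page source (for example, saved from your browser’s “View source”) and it returns True when the page is set to noindex.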

To find all pages with a noindex meta tag on your site, run a crawl with Ahrefs’ Site Audit. Go to the Internal pages report and look for the “Noindex page” warnings.

Click through to see all affected pages. Remove the noindex meta tag from any pages where it doesn’t belong.

Method 2: X‐Robots‐Tag

Crawlers also respect the X‐Robots‐Tag HTTP response header. You can implement this using a server‐side scripting language like PHP, or in your .htaccess file, or by changing your server configuration.

The URL inspection tool in Search Console tells you whether Google is blocked from indexing a page because of this header. Just enter your URL, then look for “Indexing allowed? No: ‘noindex’ detected in ‘X‐Robots‐Tag’ http header.”


If you want to check for this issue across your site, run a crawl in Ahrefs’ Site Audit tool, then use the “Robots information in HTTP header” filter in the Data Explorer:


Ask your developer to stop returning this header for any pages you want indexed.
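You or your developer can also inspect the header directly. The sketch below is an illustration, not an official tool: the function names are invented for this example, and the parsing is deliberately simple. It treats “noindex” and “none” as blocking, and it understands an optional crawler prefix such as “googlebot: noindex”.

```python
from urllib.request import Request, urlopen

# Directives that carry their own "name: value" pair rather than a bot prefix.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def blocks_indexing(header_value, bot="googlebot"):
    """Return True if one X-Robots-Tag value tells `bot` not to index."""
    value = header_value.strip().lower()
    prefix, sep, rest = value.partition(":")
    prefix = prefix.strip()
    if sep and "," not in prefix and prefix not in VALUED_DIRECTIVES:
        # A leading "botname:" scopes the directives to a single crawler.
        if prefix != bot:
            return False
        value = rest
    directives = {d.strip() for d in value.split(",")}
    return "noindex" in directives or "none" in directives


def x_robots_headers(url):
    """Fetch every X-Robots-Tag header a URL returns (makes a network call)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

For a live page, something like any(blocks_indexing(v) for v in x_robots_headers(url)) tells you whether the header is keeping the page out of the index.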

3) Include the Page in Your Sitemap

A sitemap tells Google which pages on your site are important, and which aren’t. It may also give some guidance on how often they should be re‐crawled.

Google should be able to find pages on your website regardless of whether they’re in your sitemap, but it’s still good practice to include them. After all, there’s no point making Google’s life difficult.

To check if a page is in your sitemap, use the URL inspection tool in Search Console. If you see the “URL is not on Google” error and “Sitemap: N/A,” then it isn’t in your sitemap or indexed.


Not using Search Console? Head to your sitemap URL, which is usually yourdomain.com/sitemap.xml, and search for the page.


These pages should be in your sitemap, so add them. Once done, let Google know that you’ve updated your sitemap by pinging this URL:

http://www.google.com/ping?sitemap=http://yourwebsite.com/sitemap_url.xml

Replace the last part of the above URL with your sitemap URL. You should then see something like this:

That should speed up Google’s indexing of the page.
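If your sitemap is large, searching it by hand gets tedious. Here’s a rough sketch that pulls every URL out of a sitemap’s XML so you can test whether a page is listed. It assumes a standard sitemap using the sitemaps.org namespace. The function names are made up for this example, and it doesn’t follow sitemap index files that point to other sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of URLs found in <loc> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def is_in_sitemap(page_url, sitemap_xml):
    """Return True if page_url appears in the sitemap exactly as written."""
    return page_url in sitemap_urls(sitemap_xml)
```

The comparison is an exact string match, so a missing trailing slash or an http/https mismatch will count as “not in the sitemap.” That’s usually what you want, since the sitemap should list the exact URL you expect Google to index.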

4) Remove Rogue Canonical Tags

A canonical tag tells Google which is the preferred version of a page. It looks something like this:

<link rel="canonical" href="/page.html/">

Most pages either have no canonical tag, or what’s called a self‐referencing canonical tag. That tells Google the page itself is the preferred and probably the only version. In other words, you want this page to be indexed.

But if your page has a rogue canonical tag, then it could be telling Google about a preferred version of this page that doesn’t exist. In which case, your page won’t get indexed.

To check for a canonical, use Google’s URL inspection tool. You’ll see an “Alternate page with canonical tag” warning if the canonical points to another page.


If this shouldn’t be there, and you want to index the page, remove the canonical tag.

IMPORTANT: Canonical tags aren’t always bad. Most pages with these tags will have them for a reason. If you see that your page has a canonical set, then check the canonical page. If this is indeed the preferred version of the page, and there’s no need to index the page in question as well, then the canonical tag should stay.
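To see where a page’s canonical tag points without opening Search Console, you can parse the page’s HTML yourself. The sketch below is one possible approach using Python’s standard library. The names CanonicalParser, canonical_target and has_non_self_canonical are invented for this example. It ignores trailing slashes when comparing URLs, which is a simplification you may want to tighten.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical_href is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical_href = attr["href"]


def canonical_target(page_url, html):
    """Return the absolute canonical URL, or None if the page has none."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical_href is None:
        return None
    # Relative hrefs such as "/page.html" are resolved against the page URL.
    return urljoin(page_url, parser.canonical_href)


def has_non_self_canonical(page_url, html):
    """Return True if the canonical tag points at a different URL."""
    target = canonical_target(page_url, html)
    return target is not None and target.rstrip("/") != page_url.rstrip("/")
```

A True result isn’t automatically a problem. As noted above, check the canonical page first and only remove the tag if it really is rogue.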

If you want a quick way to find rogue canonical tags across your entire site, run a crawl in Ahrefs’ Site Audit tool. Go to the Data Explorer. Use these settings:

This looks for pages in your sitemap with non‐self‐referencing canonical tags. Because you almost certainly want to index the pages in your sitemap, you should investigate further if this filter returns any results.

It’s highly likely that these pages either have a rogue canonical or shouldn’t be in your sitemap in the first place.

5) Check That the Page Isn’t Orphaned

Orphan pages are those without internal links pointing to them. Because Google discovers new content by crawling the web, they’re unable to discover orphan pages through that process. Website visitors won’t be able to find them either.

To check for orphan pages, crawl your site with Ahrefs’ Site Audit. Next, check the Incoming links report for “Orphan page (has no incoming internal links)” errors:

This shows all pages that are both indexable and present in your sitemap, yet have no internal links pointing to them.

IMPORTANT: This process only works when two things are true:

All the pages you want indexing are in your sitemaps
You checked the box to use the pages in your sitemaps as starting points for the crawl when setting up the project in Ahrefs’ Site Audit.

Not confident that all the pages you want to be indexed are in your sitemap? Try this:

Download a full list of pages on your site (via your CMS)
Crawl your website (using a tool like Ahrefs’ Site Audit)
Cross‐reference the two lists of URLs

Any URLs not found during the crawl are orphan pages.
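The cross-referencing step is a simple set difference. Here’s a minimal sketch, assuming you’ve exported both lists as plain URLs. The function name find_orphans is made up, and the only normalization it does is trimming whitespace and trailing slashes.

```python
def find_orphans(cms_urls, crawled_urls):
    """Return CMS URLs that the crawl never reached, sorted alphabetically."""

    def normalize(url):
        # Treat "https://site.com/page/" and "https://site.com/page" as equal.
        return url.strip().rstrip("/")

    crawled = {normalize(url) for url in crawled_urls}
    return sorted(url for url in cms_urls if normalize(url) not in crawled)


cms_export = [
    "https://yourwebsite.com/about/",
    "https://yourwebsite.com/old-landing-page/",
]
crawl_export = ["https://yourwebsite.com/about"]

print(find_orphans(cms_export, crawl_export))
```

In this made-up example, the old landing page is in the CMS export but not in the crawl, so it’s reported as an orphan.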

You can fix orphan pages in one of two ways:

If the page is unimportant, delete it and remove it from your sitemap.
If the page is important, incorporate it into the internal link structure of your website.

6) Fix Nofollow Internal Links

Nofollow links are links with a rel="nofollow" attribute. They prevent the transfer of PageRank to the destination URL. Google also doesn’t crawl nofollow links.

Here’s what Google says about the matter:

Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap.

In short, you should make sure that all internal links to indexable pages are followed.

To do this, use Ahrefs’ Site Audit tool to crawl your site. Check the Incoming links report for indexable pages with “Page has nofollow incoming internal links only” errors:

Remove the nofollow attribute from these internal links, assuming that you want Google to index the page. If not, either delete the page or noindex it.
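To spot nofollowed internal links on a single page without a full site crawl, you could scan its HTML with something like the sketch below. It’s an illustration only: the class and function names are invented, and “internal” here simply means the link resolves to the same host as the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowLinkParser(HTMLParser):
    """Collects internal links that carry rel="nofollow"."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.nofollow_internal = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        target = urljoin(self.page_url, href)
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attr.get("rel") or "").lower().split()
        if urlparse(target).netloc == self.host and "nofollow" in rel_values:
            self.nofollow_internal.append(target)


def nofollow_internal_links(page_url, html):
    """Return absolute URLs of internal links marked nofollow on the page."""
    parser = NofollowLinkParser(page_url)
    parser.feed(html)
    return parser.nofollow_internal
```

Any URL it returns is an internal link you may want to make followed, provided the destination is a page you want indexed.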


7) Add “Powerful” Internal Links

Google discovers new content by crawling your website. If you neglect to internally link to the page in question then they may not be able to find it.

One easy solution to this problem is to add some internal links to the page. You can do that from any other web page that Google can crawl and index. However, if you want Google to index the page as fast as possible, it makes sense to do so from one of your more “powerful” pages.

Why should you do this? Because Google is likely to recrawl such pages faster than less important pages.

To do this, head over to Ahrefs’ Site Explorer, enter your domain, then visit the Best by links report.

This shows all the pages on your website sorted by URL Rating (UR). In other words, it shows the most authoritative pages first. Skim this list and look for relevant pages from which to add internal links to the page in question.

For example, if we were looking to add an internal link to our guest posting guide, our link building guide would probably offer a relevant place from which to do so. And that page just so happens to be the 11th most authoritative page on our blog:

Google will then see and follow that link next time they recrawl the page.

PRO TIP: Paste the page from which you added the internal link into Google’s URL inspection tool. Hit the “Request indexing” button to let Google know that something on the page has changed and that they should recrawl it as soon as possible. This may speed up the process of them discovering the internal link and consequently, the page you want indexing.

8) Make Sure the Page is Valuable and Unique

Google is unlikely to index low‐quality pages because they hold no value for its users. Here’s what Google’s John Mueller said about indexing in 2018:

He implies that if you want Google to index your website or web page, it needs to be “awesome and inspiring.”

If you’ve ruled out technical issues for the lack of indexing, then a lack of value could be the culprit. For that reason, it’s worth reviewing the page with fresh eyes and asking yourself: Is this page genuinely valuable? Would a user find value in this page if they clicked on it from the search results?

If the answer is no to either of those questions, then you need to improve your content.

You can find more potentially low‐quality pages that aren’t indexed using Ahrefs’ Site Audit tool and URL Profiler. To do that, go to Data Explorer in Ahrefs’ Site Audit and use these settings:

9) Remove Low‐quality Pages to Optimize Your “Crawl Budget”

Having too many low‐quality pages on your website serves only to waste crawl budget.

Here’s what Google says on the matter:

Wasting server resources on low‐value‐add pages will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.

Think of it like a teacher grading essays, one of which is yours. If they have ten essays to grade, they’re going to get to yours quite quickly. If they have a hundred, it’ll take them a bit longer. If they have thousands, their workload is too high, and they may never get around to grading your essay.

Google does state that “crawl budget […] is not something most publishers have to worry about,” and that “if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.”

Still, removing low‐quality pages from your website is never a bad thing. It can only have a positive effect on your crawl budget.

You can use Ahrefs’ content audit template to find potentially low‐quality and irrelevant pages that can be deleted.

Go to Data Explorer in Ahrefs’ Site Audit and use these settings:

This will return “thin” pages that are indexable and currently get no organic traffic. In other words, there’s a decent chance they aren’t indexed.

Export the report, then paste all the URLs into URL Profiler and run a Google Indexation check.

IMPORTANT: It’s recommended to use proxies if you’re doing this for lots of pages (i.e., over 100). Otherwise, you run the risk of your IP getting banned by Google. If you can’t do that, then another alternative is to search Google for a “free bulk Google indexation checker.” There are a few of these tools around, but most of them are limited to fewer than 25 pages at a time.

Check any non‐indexed pages for quality issues. Improve where necessary, then request reindexing in Google Search Console.

You should also aim to fix issues with duplicate content. Google is unlikely to index duplicate or near‐duplicate pages. Use the Content quality report in Site Audit to check for these issues.

10) Build High‐Quality Backlinks

Backlinks tell Google that a web page is important. After all, if someone is linking to it, then it must hold some value. These are pages that Google wants to index.

For full transparency, Google doesn’t only index web pages with backlinks. There are billions of indexed pages with no backlinks. However, because Google sees pages with high‐quality links as more important, they’re likely to crawl—and re-crawl—such pages faster than those without. That leads to faster indexing.

We have resources on building high‐quality backlinks on our blog.