Why isn't Google crawling my website?

5 reasons why your pages are not being indexed by Google

You have finally put your new website online and are excited about the visitors. But nobody comes, because even after months your URLs still do not appear in Google search results. In this article you will learn about possible reasons and how to fix the errors yourself. For a basic explanation of how Google Search works and how pages are crawled and indexed, watch the video below first.

A crawler - what is it?

Even the most beautiful web presence has to be indexed so that online readers can discover it. The prerequisite for this is that the Googlebot crawls it: this algorithm-driven program finds your website and lists all links to be crawled. The Googlebot takes this information and sorts it into an index according to its relevance and possible target groups.

Your page is not indexed immediately

It repeats this process at regular intervals, so your website is not crawled just once. So don't panic if it doesn't work right away: the Googlebot takes its time, given the mass of web information that has to be processed around the world every day. Due to a limited crawl budget, it often does not search the entire website, but only selected pages. You can find a clear statement on this in the Google Search Console Forum. However, if too many of your pages are being ignored, you should track down the sources of the error.

Google does not crawl all of the pages on the web, and it does not index all of the pages it crawls.

No indexing: first quick measures

Google supports you in the search for clues in the "Crawl" area of the Search Console. The "Crawl errors" report shows whether any errors have occurred in the last 90 days; these could have prevented the Googlebot from accessing some areas of your website. The "URL errors" category points to missing 301 redirects and pages that were not found (404 errors). A site query on Google gives you an additional overview. To do this, enter your domain in the following format in the Google search box:
site:exampledomain.de

Check which pages are affected

If you are asked whether you own this domain, you should first register the page in the Google Search Console. Log in and choose "Add property" on the start page, then enter your domain. You will receive information on how to confirm your ownership; the easiest way is to download the provided verification file and upload it to your website. If your site is already "known" to the Googlebot, you will see your indexed URLs at this point. Does the number of pages roughly correspond to the number you put online, or are there major deviations? If there are discrepancies, check the following five points.
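Search Console also offers an HTML meta tag as an alternative verification method. If you choose that route, the tag goes into the head section of your start page and could look roughly like this; the content value stands for the token Google gives you:

<meta name="google-site-verification" content="your-verification-token">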


1. Non-existent XML sitemap

Web crawlers like the Googlebot scour the internet for new content and wander from page to page. At least one link should lead to your page, otherwise it will remain invisible to the bot. With good on-page optimization this is not a problem: every new page will be found at some point. To speed up the process, however, you should create an XML sitemap for Google as an indexing aid.

What XML sitemaps are and how to work with them:

XML sitemaps are standardized text files that contain the structure of your website in machine-readable form and that search engines can easily interpret. They convey not only the URLs to Google, but also the date and frequency of changes and the priority or hierarchy of the page content. Content management systems such as WordPress offer plugins and tools for creating a sitemap, but you can also create one manually. If your uncompressed sitemap is larger than 10 MB, you have to split it into several smaller sitemaps and submit them in a sitemap index file.
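To illustrate, a minimal sitemap with a single entry could look like the following sketch; the domain, date and values are placeholders you would replace with your own:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.beispieldomain.de/beispielseite.htm</loc>
    <lastmod>2020-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>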

Add a sitemap: This is how it works

The most convenient way to submit it to Google is the Sitemaps tool in the Google Search Console. Log in to your account and select the relevant website. In the left tab you will find the item "Crawl" and under it "Sitemaps". If none has been submitted yet, you will see an error message. Click on "Add sitemap"; your URL will appear together with an empty field in which you can enter the path to the sitemap you created. Google also suggests other ways to submit a sitemap. If you are comfortable editing code, you can alternatively indicate the path to your sitemap by adding the following line anywhere in your robots.txt file:
Sitemap: http://beispieldomain.de/sitemap_location.xml

Possible sitemap errors

Even if you have already submitted the sitemap, errors can occur; you can identify them in the "Sitemaps" area of the Search Console. Below are some of the problems that Google lists under "Sitemap errors and solutions".

  • URLs not accessible / URL not allowed
    Check that your file is in the right location and on the right level. Make sure that all URLs start with the same domain name as the location of your sitemap, i.e. consistently with or without www. and consistently with http or https.
  • URLs not accessed / 404 errors
    Google cannot fully process your sitemap. This happens, for example, if some URLs contain too many redirects that the Googlebot cannot follow. Get rid of broken links and set up permanent redirects.
  • Invalid or incomplete URL
    URLs are invalid if they contain unsupported characters, that is, if they are coded illegibly, or if the formatting is specified with htp:// instead of http:// (or vice versa).

2. Duplicate content

Also check whether Google has indexed your preferred page or a different version of the domain name. If http://beispieldomain.de has not been indexed, also add http://www.beispieldomain.de and, if it exists, the https version to your account. Click on your website on the Search Console start page and specify under the "Site settings" gear icon which version Google should index.

Set the canonical tag

Also use the canonical tag to avoid duplicate content: it is placed in the head section of the source code and shows the crawler which of the URLs is the original source. For the preferred domain, this can look like this:

<link rel="canonical" href="http://www.beispieldomain.de/beispielseite.htm"/>

But be careful: the canonical tag is not necessary everywhere, and if handled incorrectly it can cause serious crawling errors. It must not appear in the body area of the page source code or be used twice in the metadata.

3. Technical requirements for indexing

Status Codes:

Also look into your site's HTTP status codes: check regularly whether your 301 redirects are working and whether 404 status codes exist. Pages with a 404 status cannot be found by potential readers or web crawlers. Links that point to such pages are called "dead links".
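How you set up a permanent redirect depends on your hosting. Assuming an Apache server, for example, a single line in the .htaccess file could redirect an old URL to its new address; the paths here are placeholders:

Redirect 301 /old-page.htm http://www.beispieldomain.de/new-page.htm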

robots.txt file:

The problem may also lie in the robots.txt file: a text file in which you can specify which areas of a domain may and may not be crawled by the search engine's crawler. Webmasters can use it to influence the behavior of search engine crawlers. Directories that should not be crawled can be marked with "Disallow".

User-agent: *
Disallow: /example-directory/

With a directive like this, you tell web crawlers to ignore entire areas of the site. These URLs then appear in the Search Console under "Blocked URLs". With the "Fetch as Google" report in the Search Console, you can also find out whether the Googlebot is being blocked by the robots.txt. As a general rule, a careful check of the robots.txt is recommended, at the latest after a relaunch.

Meta tag "noindex":

With the entry "noindex" in the meta tags, a search engine robot is informed that the visited page should not be included in the index. With "noindex", webmasters can thus influence the indexing of their pages. Using the noindex tag (see the example after this list) can be useful for:

  • internal search results pages
  • double category pages
  • copyrighted content
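In the head section of such a page, the instruction could look like this minimal snippet:

<meta name="robots" content="noindex">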

"Nofollow" attribute:

The rel="nofollow" attribute is a small piece of markup in the HTML code of a website. It is used to mark certain links so that they are not taken into account in the formation of the Google index. The rel="nofollow" attribute tells the search engine robots crawling a website that they do not have to, or must not, follow this link.
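A link marked in this way could look like the following example; the URL and link text are placeholders:

<a href="http://www.beispieldomain.de/beispielseite.htm" rel="nofollow">example link</a>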

4. WordPress settings

If you use WordPress as a content management system and your blog is not being indexed, the solution may be very close. Check in the "Settings" area in the left column whether the option "Discourage search engines from indexing this site" is activated. Once you deactivate it, the Googlebot is no longer prevented from showing your pages in search results. Other CMSs have similar settings.

5. Bad Neighborhood

When you have bought a domain, you immediately start thinking about which backlinks will bring new traffic to your site. Link farms or bought links are of course out of the question; what you want are high-quality links with a thematic connection. If your page is still not indexed, look into its history: former owners may have placed "bad neighborhood" links, spam or hidden elements on the page.

Explain the change of ownership to Google

If a bad link points to a website, or an outbound link points to a website with many bad links, then that website is in a bad neighborhood and loses trust with Google. A link can be considered bad quality if one of the web pages involved violates the guidelines of search engines such as Google or Bing. If the page has a previous Google penalty and has been de-indexed for this reason, submit a "request to review the website" and explain to Google that you unknowingly took over a domain that unfortunately did not meet Google's guidelines. Checking and re-indexing is possible, but it can take some time.

Conclusion: indexing is mandatory

The indexing of your start page and subpages is essential for your success on the internet. Why all the work if the page disappears into nirvana? So take the time to check for web crawling errors with the Google Search Console. Follow the webmaster guidelines, and avoid bad links and hidden text. Technical pitfalls such as incorrectly configured robots.txt files, "nofollow" in meta tags or multiple indexing are also common reasons for poor visibility. And of course your content needs to convince Google! That rarely works with a simple landing page without links.

Kathrin Schubert M.A. - the word juggler - is a Romance writer, copywriter and freelance editor. She writes and proofreads web and advertising texts for companies in Germany and abroad. She supports SMEs and start-ups in optimizing their company blogs and websites. She shares her knowledge of search engine optimization and content on her blog www.seotexterin-muenchen.de. Services: Web texts, articles for magazines, customer magazines and blogs, advertising editing. www.kathrin-schubert.de