For years, webmasters have struggled to get pages that require authentication indexed by search engines. They want Google to index those pages, yet they still want ordinary visitors to log in before viewing the information. Search engines like Google do not index pages that ask for authentication, because the login page treats all users, including Googlebot, the same way. This prevents the pages from appearing on the SERP (search engine results page).
To address this, Google has introduced a new feature known as “First Click Free”. With it, users who arrive at the website from a Google search result are allowed to view the page they clicked through to without logging in. However, to view further information, they will be asked for login details. To implement this feature, we need to follow these Google guidelines:
- Users who arrive at your website through a Google search result should be allowed to see the full text of the content they’re trying to access.
- The content you serve must be identical for Googlebot and for the users who visit from Google.
Technical Implementation
To include the restricted content in Google’s search index, the crawler needs to be able to access that content on the site. Since Googlebot cannot access pages behind registration or login forms, we need to configure the website to serve the full text of each document when the request is identified as coming from Googlebot via the user-agent and IP address. It’s equally important that the robots.txt file allows Googlebot to access these URLs.
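The Googlebot check described above can be sketched roughly as follows. This is a minimal illustration, not an official implementation: the function names and the injectable lookup parameters are hypothetical, and it assumes the IP check is done with a reverse DNS lookup followed by a forward lookup to confirm the hostname maps back to the same address. The forward lookup used here only returns IPv4 addresses.

```python
import socket

# Hostname suffixes assumed to belong to Google's crawlers (an assumption
# for this sketch; confirm against Google's current documentation).
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip):
    """Return the hostname that the IP address resolves to."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(hostname):
    """Return the list of IPv4 addresses the hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(user_agent, ip,
                          reverse_dns=_reverse_dns,
                          forward_dns=_forward_dns):
    """Hypothetical helper: True only if both the User-Agent and the IP
    address look like Googlebot. The lookup functions are parameters so
    they can be replaced in tests."""
    # Step 1: the User-Agent header must claim to be Googlebot.
    if "googlebot" not in (user_agent or "").lower():
        return False
    try:
        # Step 2: the IP must reverse-resolve to a Google crawler hostname.
        hostname = reverse_dns(ip).lower()
        if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # Step 3: that hostname must resolve back to the same IP,
        # otherwise the reverse record could have been forged.
        return ip in forward_dns(hostname)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

Checking the user-agent alone is not enough, because any client can send a Googlebot user-agent string; the IP check is what keeps ordinary visitors from bypassing the login this way.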
When a user clicks a Google search result to reach the content, the web server needs to check the “Referer” HTTP request-header field. When the referring URL is on a Google domain such as www.google.com, the website needs to display the entire content of the page, which stays protected from other visitors. In short, the full content is delivered to Googlebot based on its IP address and User-Agent HTTP header, and to visitors from Google based on the Referer header.
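The Referer check might look something like the sketch below. The function names and the `is_logged_in` and `is_verified_crawler` flags are hypothetical, and the host list is an assumption: it accepts only google.com and its subdomains, so a real site would need to extend it for Google's country-specific domains.

```python
from urllib.parse import urlsplit


def came_from_google_search(referer):
    """Hypothetical helper: True if the Referer header points at
    google.com or one of its subdomains (for example www.google.com)."""
    if not referer:
        return False
    try:
        host = (urlsplit(referer).hostname or "").lower()
    except ValueError:
        # Malformed Referer values are treated as "not from Google".
        return False
    # Compare the parsed host, not the raw string, so that a URL such as
    # https://www.google.com.example.net/ is not accepted by mistake.
    return host == "google.com" or host.endswith(".google.com")


def should_serve_full_text(referer, is_logged_in=False,
                           is_verified_crawler=False):
    """Decide whether to show the full page or the login form."""
    return (is_logged_in
            or is_verified_crawler
            or came_from_google_search(referer))
```

Because the first click is identified by the Referer alone, a visitor who then follows an internal link arrives with the site's own URL as the referrer and is asked to log in, which matches the behaviour described above.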
This lets Googlebot crawl the protected content and brings you the quality traffic you seek.
Some people refer to this as cloaking, but Google disagrees. According to Google, “Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.” First Click Free, however, shows the same content to Googlebot and to users arriving from Google, so it can be considered a fair practice.
August 10, 2009 ISSUE #37