Getting https pages indexed is one of those mysteries that makes the life of SEOs more interesting. While we know that most search engines can index them, almost no one knows how to get it done in the shortest possible time.
What is https?
https is the secure version of the http protocol. The difference between the two is that the former transmits encrypted data, while the latter transmits it unencrypted.
The https system uses encryption based on Secure Sockets Layer (SSL) to send information.
Decrypting the information is handled by the remote server and by the user's browser.
It is mainly used by banking entities, online stores, and any type of service that requires the sending of personal data or passwords.
How does https work?
Contrary to popular belief, https does not prevent access to information; it only encrypts it while it is being transmitted. Therefore, the content of a website that uses the https protocol can be read by search engine spiders. What cannot be read is the content that is sent from that website to its server, for example, the username and password used to log in to a private area of the website.
The standard port for this protocol is 443.
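The points above can be illustrated with Python's standard library, which exposes the default ports for both protocols and the TLS/SSL defaults an https client uses (a minimal sketch; no network access is needed):

```python
import http.client
import ssl

# http and https use different default TCP ports.
print(http.client.HTTP_PORT)   # 80
print(http.client.HTTPS_PORT)  # 443

# An https client wraps its TCP socket in TLS/SSL before speaking HTTP.
# ssl.create_default_context() applies safe defaults, such as requiring
# certificate verification of the remote server.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```

Because the encryption happens at the transport layer, whoever terminates the connection (the server, the browser, or a crawler) still sees the page content in the clear, which is why spiders can index https pages.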
How do we know that https is actually indexed?
Google has been indexing https since early 2002, and the rest of the search engines have progressively adapted their technology to index https as well.
The last search engine to do so was MSN, which achieved this goal in June 2006.
If we search for “https://www.” or inurl:https in the main search engines, we will find https pages indexed in them.
How can we index our https?
In principle, our https pages can be indexed naturally, but because encryption makes this protocol slower, spiders sometimes fail to download the pages within the time they allow themselves and leave without indexing them. This is the main problem we can encounter, and we solve it by reducing the download time of these pages.
How can we speed up https indexing?
There are two techniques:
- Google Sitemap: include the https pages in our sitemap (the XML sitemap for search engines, not the sitemap page for human visitors) and register it in Google Sitemaps.
- Guerrilla: distribute links to our https pages around the Internet, so that the spiders indexing the pages that carry those links also enter the https part of our website.
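The first technique can be sketched in a few lines of Python that build a sitemap file listing https URLs. The domain and paths here are hypothetical placeholders; the namespace is the standard sitemaps.org one that Google accepts:

```python
import xml.etree.ElementTree as ET

# Hypothetical https URLs we want the spiders to pick up.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urls = [
    "https://www.example.com/login",
    "https://www.example.com/account",
]

# Build the <urlset> document, one <url><loc>…</loc></url> per page.
urlset = ET.Element("urlset", xmlns=NS)
for u in urls:
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = u

xml = ET.tostring(urlset, encoding="unicode")
print(xml)
```

The resulting file is then uploaded to the site and registered in the Google Sitemaps interface, which invites the spider to fetch those https URLs directly instead of waiting for it to discover them through links.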
How can we prevent our https from being indexed?
It's not as easy as it sounds. We can't simply disallow the https pages in our existing robots.txt: each port needs its own robots.txt, so we'll need one robots.txt for our http pages and another one for our https pages. That is, we'll also need to serve a separate robots.txt file over https itself.
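The effect of that second, https-only robots.txt can be checked with Python's `urllib.robotparser` (a sketch with a hypothetical domain; the file content is what would be served at the https robots.txt URL):

```python
from urllib import robotparser

# Hypothetical robots.txt served only on the https side (port 443):
# "Disallow: /" blocks every path for all crawlers on that port,
# while the robots.txt served on port 80 can keep allowing everything.
https_robots = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(https_robots.splitlines())
print(rp.can_fetch("*", "https://www.example.com/private"))  # False
```

Because crawlers request robots.txt per host and port, this rule only stops them from crawling the https side; the http pages remain crawlable under their own robots.txt.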
If you need help indexing or deindexing your https pages, please feel free to contact us. We will be happy to assist you.
Additional information:
MSN blog about indexing – Article explaining that MSN starts indexing https
http://blogs.msdn.com/livesearch/archive/2006/06/28/649980.aspx
Google's information on how to not index https:
http://www.google.es/support/webmasters/bin/answer.py?answer=35302
More information about Google sitemaps:
Google SiteMaps
http://www.geamarketing.com/articulos/Descubre_indexacion_futuro_Google_SiteMap.php
Free online course on search engine positioning: Search Engine Optimization Course
http://www.geamarketing.com/posicionamiento_buscadores.php