Much has been said about how Barack Obama has used the Internet to publicize his candidacy and to mobilize voters. Many blogs have also commented on the ambitious technology plan of Obama for America (you can read it here).
But one of the things that has most caught my attention, and that few people have noticed, is the change that the robots.txt of the White House website has undergone, very much in line with what Obama preaches.
What is Robots.txt?
It is a text file containing instructions about which pages of a website may and may not be visited by robots. That is, it indicates which parts of the website should not be crawled by robots.
Normally, this is content that appears on the website and that you want to be accessible to people browsing the site, but that you do not want indexed and shown in search engines. It is also used when a content management system generates duplicate content that would otherwise be penalized by search engines.
This file is created following the instructions that can be found here: Robots. All the robots that follow the "Robots Exclusion Protocol" undertake to heed these instructions.
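As a rough illustration of how a robot that follows the protocol might consult these instructions, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` name and the example.com URLs are made up for the example and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /print/
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed robot name; the rules above apply to every robot.
home_allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
private_allowed = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")

print(home_allowed, private_allowed)
```

A well-behaved robot would run a check like this before requesting each page and skip anything under a disallowed path. Nothing forces it to do so: the protocol depends on the robot's cooperation.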
If a website has not created this text file, robots understand that they can index all of it. However, every time a robot requests robots.txt and does not find it, a 404 error is generated. It is therefore advisable to create a blank page and upload it by FTP under the name robots.txt, so that the 404 errors recorded for the site are real ones that the webmaster can track down and fix.
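The claim that a blank robots.txt places no restrictions on robots can be checked with a small sketch, again assuming Python's standard `urllib.robotparser`. The robot name and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allows_everything(robots_txt, sample_urls):
    """Return True if every sample URL may be fetched under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch("ExampleBot", url) for url in sample_urls)


# Hypothetical URLs, used only to probe the rules.
SAMPLE_URLS = [
    "https://www.example.com/",
    "https://www.example.com/speeches/2009.html",
    "https://www.example.com/photos/gallery.html",
]

# A blank file contains no rules, so nothing is excluded.
blank_ok = allows_everything("", SAMPLE_URLS)
print(blank_ok)
```

With a blank file the robots behave as they would with no file at all, but the request for robots.txt succeeds instead of adding a 404 to the server logs.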
Let's return to the White House Robots.txt
Until a few days ago, when I explained in class what a robots.txt file is and what the "Robots Exclusion Protocol" is, I used several examples to illustrate the different types of robots.txt we can create to instruct indexing robots:
- A blank robots.txt page
- A robots.txt page with more or less "normal" instructions
- A robots.txt page that is completely exaggerated and out of place.
Well... Obama has "sabotaged" my examples and done away with my example of bad practice in robots.txt matters: the webmaster of the new White House website has created a new robots.txt that is well made, clear and concise.
The webmaster of George Bush Jr. had created a robots.txt with thousands and thousands of pages that robots were forbidden to access. Needless to say, there was nothing interesting in that content (I once set about reading what they did not want indexed: pictures of the First Lady, speeches, etc.). But it showed clearly that the White House had a somewhat archaic concept of what the Internet and publishing content are.
The new webmaster, in this respect, shows a much clearer idea of what the website of an institution like the White House should be.
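To put a number on the difference between a concise robots.txt and an exaggerated one, the exclusion rules in each file can be counted. The sketch below is a simplified reading of the format, and both sample files are invented for the example: they are not the actual contents of either White House file.

```python
def count_disallow_rules(robots_txt):
    """Count non-empty Disallow rules in robots.txt text (simplified parsing)."""
    count = 0
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # An empty "Disallow:" excludes nothing, so it is not counted.
        if sep and field.strip().lower() == "disallow" and value.strip():
            count += 1
    return count


# Hypothetical concise file: a single rule for all robots.
CONCISE = "User-agent: *\nDisallow: /example-dir/\n"

# Hypothetical bloated file: thousands of individually listed pages.
BLOATED = "User-agent: *\n" + "".join(
    f"Disallow: /hypothetical/page-{i}/text\n" for i in range(2000)
)

concise_count = count_disallow_rules(CONCISE)
bloated_count = count_disallow_rules(BLOATED)
print(concise_count, bloated_count)
```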
OK... but what did this robots.txt look like?
Fortunately, I always include screenshots of what I explain in the slides for my classes, in case the Internet fails me or the classroom has no connection... (how sad to always have to think about this possibility).
So below these lines (at the end of the post) I include the image I had saved, which has now become history... (Look at the scroll bar in the screenshot: it is what shows the magnitude of the listing.)
You can see the current robots.txt page by clicking here: Robots.txt of the White House under Obama.