
Blogger's robots.txt and sitemap

Robots.txt
If you use Blogger as your host, a robots.txt file is generated for you automatically, and you can NOT change it.

To find the robots.txt file, open your web browser, type in your Blogger blog's URL, and add robots.txt to the end of the URL.

For example, if the URL of your blog is http://myblog.blogspot.com, then enter http://myblog.blogspot.com/robots.txt
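If you want to build that address programmatically, the robots.txt location is always the site root plus /robots.txt, whatever page of the blog you start from. A minimal sketch in Python (myblog.blogspot.com is the placeholder address used throughout this post):

```python
from urllib.parse import urljoin

def robots_url(blog_url):
    """Return the robots.txt URL for any URL on the blog."""
    # A leading "/" makes urljoin resolve against the site root,
    # so a post URL and the home page give the same answer.
    return urljoin(blog_url, "/robots.txt")

# Placeholder blog address; substitute your own.
print(robots_url("http://myblog.blogspot.com"))
# http://myblog.blogspot.com/robots.txt
```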

You may find the following entries:
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search

Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=updated

Mediapartners-Google is the AdSense crawler, which crawls pages to determine their content for AdSense. Google only uses this bot to crawl your site if AdSense ads are displayed on it. So the first two lines mean your blog allows Mediapartners-Google to crawl all blog content; nothing is disallowed, which is why the field after "Disallow:" is empty.

"User-agent:* " means all search engine, a star sign '*' means all. The robots.txt instruct all search engines not to craw the subdirectory /search, the purpose is to avoid duplicate content. Your Blogger posts can be reached by archive date (normal), and also by label, each different label on any blog will result in a different URL pointing to the same post.

http://myblog.blogspot.com/2010/01/blogpost1.html
http://myblog.blogspot.com/search/label/label1/blogpost1.html
http://myblog.blogspot.com/search/label/label2/blogpost1.html
http://myblog.blogspot.com/search/label/label3/blogpost1.html

We can see there is only one URL named by date: if posts are indexed only by date, each post has exactly one URL, because each and every post has one and only one post date.

But however many labels you assign to your post, there will be the same number of extra URLs pointing to the same post. "If the search engines were allowed to index by label (under /search subdirectory), they would see 6 extra instances of that post, one per label search. Since the post was already indexed by archive date, those 6 label search instances would be considered duplicate content. The search engines would penalise all 7 URLs for having duplicated content." (Blogger help forum)
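You can check these rules with Python's standard urllib.robotparser module by feeding it the same entries shown above; the blog address and URLs are the placeholder examples from this post:

```python
from urllib.robotparser import RobotFileParser

# The same entries Blogger generates, as shown above.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The date-archive URL is open to ordinary crawlers.
print(rp.can_fetch("Googlebot", "http://myblog.blogspot.com/2010/01/blogpost1.html"))
# Label-search URLs under /search are blocked for ordinary crawlers...
print(rp.can_fetch("Googlebot", "http://myblog.blogspot.com/search/label/label1"))
# ...but the AdSense crawler has its own entry with an empty Disallow,
# so it may fetch them.
print(rp.can_fetch("Mediapartners-Google", "http://myblog.blogspot.com/search/label/label1"))
```

The three lines print True, False, and True, matching the explanation above: only /search is off-limits, and only for crawlers that fall under the `*` entry.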

Sitemap
Now about the sitemap. You will find a line in the robots.txt that starts with 'Sitemap:'. The URL after that label is the location of your sitemap.

"Using the example above, the line would look like:

Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=updated

Back in Google’s Webmaster Tools, the domain name part of the URL would already be included, so you would just need to specify the feeds/posts/default?orderby=updated portion of the sitemap URL. "(Technically Easy)

You can also skip submitting this sitemap yourself: Google Webmaster Tools will look at the robots.txt file for a sitemap if one isn't specified.
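Tools discover the sitemap the same way, by reading the Sitemap: line out of robots.txt. Since Python 3.8, urllib.robotparser exposes this directly via site_maps(); the sketch below again uses the placeholder blog address from this post:

```python
from urllib.robotparser import RobotFileParser

# Blogger's robots.txt including its Sitemap: line, as shown earlier.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=updated
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of every Sitemap: URL, or None if there are none.
print(rp.site_maps())
# ['http://myblog.blogspot.com/feeds/posts/default?orderby=updated']
```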

Google Sitemaps will accept any XML page, so you can also submit your blog feed as the sitemap URL, for example atom.xml.

