Debugging the Network Unreachable / Robots.txt Unreachable Error

On 1 August 2007, I noticed a significant drop in traffic to many of my sites, including my main website. Most webmasters will be familiar with the occasional blip when things go wrong, so you learn to sit tight and monitor the situation. These blips can last anywhere from a day to 2 weeks, and the last thing you want to do is react in panic and make changes that aggravate the situation. However, after 3 weeks with no sign of recovery, I started to worry and decided to do some checking of my own. Along the way, I learnt many lessons.

The symptoms

* Google Sitemaps reported "Errors" that alternated between:

  • Network unreachable: Network unreachable
    We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
  • Network unreachable: robots.txt unreachable
    We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.

* Checking the Diagnostics tab, I started to see many pages listed as unreachable.

 Robots.txt unreachable

* The Diagnostics tab showed that Google had last cached my robots.txt file on July 13, 2007. This suggested that Googlebot had problems accessing my robots.txt file (and my site) after that.

* A look into my server logs showed that Googlebot's last visit was a single visit on August 10. That seemed strange, since in the past Googlebot would visit at least once every hour on average.

* The graph of Google's crawl rate showed that Google stopped crawling my site in late July, and the second graph showed erratic download times coinciding with Googlebot's absence.

googlebot_crawl_chart

googlebot_downloadtime_chart
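The server-log check described in the symptoms can be scripted rather than done by eye. Below is a minimal sketch, assuming the log is in the Apache common/combined format and that Googlebot identifies itself in the user-agent string. The function name and the sample lines are made up for illustration; they are not taken from my actual logs.

```python
import re
from collections import Counter

# Matches the date part of an Apache log timestamp,
# e.g. [10/Aug/2007:06:25:13 +0000]
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(log_lines):
    """Count, per day, the log lines whose text mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Hypothetical sample lines (documentation IP range, invented requests).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    '192.0.2.10 - - [09/Aug/2007:01:02:03 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "' + GOOGLEBOT_UA + '"',
    '192.0.2.10 - - [09/Aug/2007:01:02:09 +0000] "GET / HTTP/1.1" 200 5120 "-" "' + GOOGLEBOT_UA + '"',
    '192.0.2.10 - - [10/Aug/2007:07:45:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "' + GOOGLEBOT_UA + '"',
    '192.0.2.77 - - [10/Aug/2007:08:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for day, count in googlebot_hits_per_day(SAMPLE_LOG).items():
        print(day, count)
```

A day that is missing from the result, or a count that suddenly falls from dozens to one, is the kind of gap I only noticed weeks later. Note that a user-agent match alone does not prove the visitor really was Googlebot.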

I believe that because Googlebot was unable to access my pages, my site dropped out of the search engine results pages (SERPs), so the traffic that used to come from ranking on page one for popular terms stopped. This caused an unnerving drop in revenue.

Checking Google
As usual, my first stop when this sort of thing happens is the online forums. I needed to check whether this was a widespread issue that webmasters were facing. I did find a number of threads on the same problem, but not enough to verify that this was a serious bug on Google's part. However, a small number of forum threads about a particular problem doesn't necessarily mean that it isn't a Google bug. After all, Google IS a machine, made up of hundreds of thousands of servers and many, many algorithms. And if your site is not a heavy-weight traffic generator, it's unlikely that a Google bug affecting your site is going to be top priority for the guys at Mountain View.

Checking Myself and My Sites
Once I had more or less determined that this wasn't a widespread Google bug, I needed to make sure Googlebot didn't stop visiting because of my own doing.

* Did Googlebot stop visiting due to a penalty?
The only thing I could think of that could possibly cause a Google penalty was my reciprocal links pages. I had stopped reciprocal linking a long time ago BUT maintained those pages out of courtesy to the sites that still kept their links to me. However, recent updates to Google's Webmaster Guidelines suggest that reciprocal linking can hurt your site:

Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.

I wasn't sure if my reciprocal links pages were part of the problem, but in the interest of self-preservation, I decided to remove all reciprocal link pages from all my sites.

* Was my robots.txt file okay?
First, Google Webmaster Tools indicated that my robots.txt file was okay, although its cached version was about 30 days old. The long lapse between the current date and the last cached date of the robots.txt file SHOULD have told me that Googlebot had problems accessing the file. The robots.txt validators I used (below) showed there was nothing wrong with my file:

http://www.invision-graphics.com/robotstxt_validator.html

http://validator.czweb.org/robots-txt.php

Conclusion : My robots.txt file seemed okay.
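If you prefer to check the rules yourself instead of relying on an online validator, Python's standard library includes a robots.txt parser. This is a minimal sketch: the helper name, the sample rules and the example.com URLs are assumptions for illustration. It only tells you whether the rules permit Googlebot; it says nothing about whether Googlebot can actually reach the file, which turned out to be my real problem.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt content for illustration.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/notes.html"):
        print(url, googlebot_allowed(SAMPLE_ROBOTS, url))
```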

* Was my sitemap file okay?
Google's sitemaps can seem like an unruly monster to the uninitiated. Many webmasters have reported that the Google sitemap reports can be buggy at times and report errors when there are none. In any case, I used the following tools and references to check the validity of my sitemaps and their schemas:

http://www.validome.org/google/

http://www.smart-it-consulting.com/internet/google/submit-validate-sitemap/

http://www.xml-sitemaps.com/validate-xml-sitemap.html

https://www.google.com/webmasters/tools/docs/en/protocol.html

http://www.sitemaps.org/protocol.php

Conclusion : My sitemap.xml file seemed okay and passed validation by the tools above.
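A rough local version of the same sitemap check is sketched below. It assumes the sitemap uses the sitemaps.org 0.9 namespace and a urlset root element; sitemaps written against the older Google-specific schema would need a different namespace. The function name and sample document are hypothetical, and this is far less thorough than the validators listed above.

```python
import xml.etree.ElementTree as ET

# Namespace assumed here; adjust it if your sitemap declares a different one.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_urls(xml_text):
    """Return the <loc> values of a urlset sitemap, or raise ValueError."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    if not locs:
        raise ValueError("no <loc> entries found")
    return locs


# Hypothetical sitemap for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(sitemap_urls(SAMPLE_SITEMAP))
```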

* Was my .htaccess file okay?
My investigations revealed that incorrect coding in the .htaccess file could cause unnecessary looping (meaning I could have been sending Googlebot in circles and it threw up a red flag), so this had to be checked out. There were no known validators that I could use to verify that my .htaccess file was okay so I took my problem to the forums :

Since I had no experience in .htaccess coding and debugging, I had to rely on the more experienced webmasters who contribute to the forum listed above. Thankfully, all who looked at the content of my .htaccess file cleared it, saying there wasn't any problem with my .htaccess file.

Conclusion : My .htaccess file didn't seem to contain coding errors.

* Were my pages blocking Googlebot?
It was unlikely that my pages were blocking Googlebot, since I had not made major changes and Googlebot had been visiting them up until the last visit on July 13, 2007. However, I needed to make sure I had not inadvertently placed a tag like either of the following in my pages:

  • <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
  • <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

Update : Another way you may be blocking Googlebot is if you load your pages with scripts that leave Googlebot "stranded".

Conclusion : NONE of my pages contained any tags that would have blocked Googlebot.
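Checking every page by hand for those tags gets tedious, so here is a minimal sketch of how the check could be automated with Python's built-in HTML parser. The class and function names are my own invention for illustration, and the sketch only looks at robots/googlebot meta tags; it will not detect the script problem mentioned in the update above.

```python
from html.parser import HTMLParser

BLOCKING_DIRECTIVES = {"noindex", "nofollow", "none"}


class RobotsMetaFinder(HTMLParser):
    """Collects robots/googlebot meta tags that carry blocking directives."""

    def __init__(self):
        super().__init__()
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        content = attr.get("content", "").lower()
        if name in ("robots", "googlebot"):
            directives = {part.strip() for part in content.split(",")}
            if directives & BLOCKING_DIRECTIVES:
                self.blocking.append((name, content))


def find_blocking_meta(html):
    """Return a list of (name, content) pairs for blocking meta tags."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.blocking


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    print(find_blocking_meta(page))
```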

Checking My Web Host Provider
The nature of the error in Google Sitemaps strongly suggested that Google was having problems accessing my site where other bots and search engines didn't. It was a really perplexing situation. My web host initially replied that they had not blocked Googlebot, so it was left to me to find possible loopholes. I turned again to the boards. The difficult part initially was trying to find out the right question to ask. So I started by describing my problem. Then one by one, contributors made suggestions and I followed through on every one of those suggestions.

A contributor commented that, using her header check tool, she found all my URLs were returning a 403 (Forbidden) status. I have not placed a live link to her tool because she has since taken it offline for upgrading and maintenance. In any case, to verify the results her tool was giving me, I used another header check tool. Using this second tool, most of my URLs returned an "Operation Timed Out" error.

I then wondered if my site was the only one experiencing 403 Forbidden and timeout errors. I copied the URLs of all the clients listed on my web host's "prominent clients" page and checked them with the 2 tools. Indeed, I found that many of those sites returned the same 403 Forbidden status on the first tool and timed out on the second tool. I reported this finding to my web host.
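For readers who cannot find a working header check tool, the idea can be sketched in a few lines of Python. This is not how either of the tools I used works internally (I don't know that); it is a simple illustration that sends a HEAD request and reports either the status code or a timeout. The function name, the `opener` parameter and the user-agent constant are assumptions of this sketch. Keep in mind that sending a Googlebot-style user agent from your own machine will NOT reproduce a block that is based on Googlebot's IP addresses.

```python
import socket
import urllib.error
import urllib.request

# User-agent string sent with the request; swap in any value you want to test.
BOT_USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def header_check(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status as a string, or a short description of the failure."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": BOT_USER_AGENT}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses, e.g. 403 Forbidden.
        return str(exc.code)
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, socket.timeout):
            return "timed out"
        return f"unreachable: {exc.reason}"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        return f"unreachable: {exc}"


if __name__ == "__main__":
    # Replace with the URLs you want to check.
    for url in ("http://www.example.com/robots.txt",
                "http://www.example.com/sitemap.xml"):
        print(url, header_check(url))
```

Running a check like this against several sites on the same host, as I did with the "prominent clients" list, helps separate a problem with one site from a problem with the host.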

The first sign that I was probably on to something came when they replied that they needed time to "conduct tests". I signed up for a trial account with WatchMouse.com, a website monitoring service, to see if any more timeout errors were popping up, and indeed they were.

watch mouse website monitoring service

Again I reported this to my web host. Their reply stated that timeouts could be caused by many reasons other than the servers, which I accepted. However, I noted that they had asked their Security Team to investigate the matter, which was another indication that we were on the right track.

Hurray!
Checking my stats on August 30th, I was surprised to see that the server had registered over a hundred visits from Googlebot. I immediately contacted my web host and asked if they had made any changes and they confirmed that they did. So here's what caused the inadvertent blocking of Googlebot according to them :

Our firewall has an automated mechanism which will block IP addresses deemed to be making too many concurrent connections to our server in a short time. Our security department has whitelisted the google network range that is noticed to make these connections. On top of that we have made the firewall less stringent in the sense we will allow a higher threshold of concurrent connections compared to previously. Based on your feedback, the configuration is just right.

It is not the server that has the problem but the datacenter network that is not reachable from certain locations. We have not change any settings at the time. However, it is possible that there are more users who use Google Sitemap, causing increased concurrent connections to the server. For the current issue, it appears that our firewall's stringent policy has temporary block the bot.
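To make my web host's explanation more concrete, here is a toy model of the kind of mechanism they describe: an automatic block for any IP that opens too many connections in a short time, plus a whitelist that exempts a network range. I have no knowledge of how their firewall is actually implemented; the class name, the threshold, the time window and the documentation IP range (192.0.2.0/24) are all assumptions made purely for illustration.

```python
import ipaddress
from collections import defaultdict, deque


class ConnectionLimiter:
    """Toy model: block an IP that opens too many connections within a window."""

    def __init__(self, max_connections, window_seconds, whitelist=()):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        self.whitelist = [ipaddress.ip_network(net) for net in whitelist]
        self.recent = defaultdict(deque)  # ip -> timestamps of recent connections
        self.blocked = set()

    def allow(self, ip, now):
        """Return True if a connection from ip at time now is accepted."""
        address = ipaddress.ip_address(ip)
        if any(address in net for net in self.whitelist):
            return True  # whitelisted ranges are never counted or blocked
        if ip in self.blocked:
            return False
        window = self.recent[ip]
        window.append(now)
        # Forget connections that fall outside the time window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) > self.max_connections:
            self.blocked.add(ip)
            return False
        return True


if __name__ == "__main__":
    strict = ConnectionLimiter(max_connections=3, window_seconds=1.0)
    print([strict.allow("192.0.2.10", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)])

    relaxed = ConnectionLimiter(3, 1.0, whitelist=["192.0.2.0/24"])
    print([relaxed.allow("192.0.2.10", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)])
```

In this model, once an IP is blocked it stays blocked even after it slows down, which matches what I saw: Googlebot did not come back until the host changed their settings. The two changes they describe correspond to adding a whitelist entry and raising max_connections.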

Lessons Learned
In hindsight, it occurred to me that the modifications to my .htaccess file could have caused an increase in the concurrent connections to the server. I had modified my .htaccess file to solve canonicalization problems by redirecting:

  • redundant URLs to new URLs
  • non-www URLs to www URLs
  • index.php to root

I theorize that since these redirections involved hundreds of URLs, it's possible that when I deployed the changes in my .htaccess file in mid July, they triggered the "increase" in concurrent connections as the bots were redirected to the correct pages. In other words, Googlebot attempted to make 2 connections for every page - once to the old/non-www URL and then to the new/www URL. As the concurrent connections increased, they triggered the automated mechanism that blocked Googlebot's IP address. This in turn caused more time-out errors. The spikes in the Googlebot Download Time chart (above) indicate long download times which eventually ended in timeouts. Unfortunately, this affected one of the most important files - the robots.txt file - which every bot needs to read before it accesses a site's pages. These time-outs also made my sitemap inaccessible, so since Googlebot could not access these 2 important files, it could not confirm my site still existed!
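The arithmetic behind this theory can be illustrated with a small model. The sketch below is NOT my .htaccess file; it is a hypothetical Python stand-in that applies the two canonicalization rules (non-www to www, index.php to root) one redirect at a time and counts how many requests a crawler needs before it finally gets a page. The function names and example.com URLs are assumptions, and it remains only a theory that this extra traffic is what tripped the firewall.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the URL this request would be redirected to, or None if it is canonical."""
    parts = urlsplit(url)
    if not parts.netloc.startswith("www."):
        # Rule 1: non-www URLs are redirected to www URLs.
        return urlunsplit(parts._replace(netloc="www." + parts.netloc))
    if parts.path.endswith("/index.php"):
        # Rule 2: index.php is redirected to the directory root.
        return urlunsplit(parts._replace(path=parts.path[: -len("index.php")]))
    return None


def requests_needed(url, max_hops=10):
    """Count the requests a crawler makes, following redirects, to reach the page."""
    count = 1
    target = canonical_redirect(url)
    while target is not None and count <= max_hops:
        count += 1
        target = canonical_redirect(target)
    return count


if __name__ == "__main__":
    for url in ("http://www.example.com/about.html",
                "http://example.com/about.html",
                "http://example.com/index.php"):
        print(url, requests_needed(url))
```

If a crawler still holds hundreds of old non-www URLs, every one of them costs at least two requests instead of one until the crawler has learned the new addresses.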

However, my web host's analysis of the situation brought them to the conclusion that more of their clients (like myself) had begun to use Google Sitemaps and this is the reason for the increase in concurrent connections. Whatever the reason, at last check, Googlebot has resumed its crawl of my site and I must say it is a welcome sight.

Googlebot visits

I've learnt a lot as I struggled with this problem, and I hope this post helps those of you who may be wondering why Googlebot has stopped visiting your site, or who are seeing the "Network unreachable: robots.txt unreachable" error.

Printed from: http://www.homewithandrew.com/index.php/debugging-the-network-unreachable-robots-txt-unreachable-error/ .
© HomeWithAndrew.com 2010.

52 Comments   »

  • Interesting article - I've been having similar issues fetching robots.txt, so I wonder if they might be related. I'll look into it when I get home :-)

    • Andrew says:

      Hi Blogs for Money...

      Hope you find the fault. When I asked on forums whether "network" meant the physical server infrastructure or website network, I got a variety of answers. One other webmaster I know also inadvertently blocked Googlebot with a setting in his server. Whatever the case, one thing I do know is that when Googlebot stops visiting, it leads to a whole lot of other symptoms and usually ends with a revenue drop. I hope you catch the problem before it hurts your earnings.

      Andrew

  • vrsane says:

    Hello,

    I am also having the same problem.. google bot stopped visiting one site .. there are other sites on the same server with different IP's .. they have no problem with google bot ...

    That one site was working fine until last week but now the sitemap is getting rejected with the same error network unreachable...

    Checked the server logs.. google bot is not coming at all..

    Did not do anything that prevents the google bot as you have mentioned in your blog..

    Any thoughts on this issue ..

    Please share , if possible please email me ...

    Regards,
    vrsane

    • Andrew Shim says:

      Hi vrsane...

      Hope you received my email reply, but for the benefit of other readers who might also be affected, I think the best way to deal with this is to troubleshoot systematically.

      It's a three-pronged approach. You have to be open that the error could be at Google's end, your site or your web host, although the nature of the error seems to point to a problem which the web host has to deal with.

      Andrew :mrgreen:

  • baghdad says:

    ok

    i have the same problem

    what shoud i do??

    if there is any modification to .htaccess file

    plz write it

    thanks

    • Andrew says:

      hello baghdad...
      I must admit that .htaccess coding is really alien to me. My .htaccess contents are standard codes used to redirect my index.php to root and non-www pages to www. These were copied from examples in online forums and okayed by my web host support.

      From what I've read, if you have NOT done anything to your website and the error suddenly popped up, the solution is likely to be found at your web host.

      Hope it helps baghdad... :lol:

  • baghdad says:

    :mrgreen: :mrgreen:

    thanks andrew

    but im biggener

    and not understand what you say :oops: :oops:

    plz

    if there is any suggestion to tell my host

    bec i ask my host

    he said the problem not from server

    plz help :sad: :sad: :sad:

    • Andrew says:

      I'm sorry to hear your problem with Googlebot.

      When I first asked my web host, they too said that it was not a problem on their server. So I had to do a lot of investigation and learning on my own. It took me three weeks before I knew WHAT questions to ask.

      I understand that every situation is different when it comes to this error, so my best advice to you is to describe in detail what has happened, and ASK your web host to explain to you possible reasons WHY this error occurred. That way, you learn and take it from there.

      I understand from Vamsee (the earlier commenter) that he checked his server logs and found that his firewall WAS indeed blocking 2 of Googlebot's IPs.

      Here's another post about someone with the same problem and how he found the solution :

      http://www.sabahan.com/2007/08/18/banned-from-google-find-out-how-to-entice-googlebot-to-recrawl-your-site/

      If you don't know how to explain to your webhost, perhaps you can point them to this post or the one above.

      Hope it helps... :smile:

  • baghdad says:

    thanks

    i will do so

    and come back again

    see you :razz:

  • jeni says:

    Hi,
    I just found your site googling for "network unreachable." I'm having the same problem - 140 of my pages are now unreachable according to my Google tools stats. This has been going on for two weeks. I've spent the last few hours testing everything that I can, and trying to think of what I installed on my site in the last two weeks. I just discovered I started having the issue the day after I installed BlogRush - blogrush.com. I uninstalled it a few days ago, so I'll soon see if that has anything to do with this, or if it's just a coincidence.

    I've put a message out to my webhost to see if they can look into the problem. I don't have access to my logs. My sitemap and robots.txt appear fine.

    • Andrew says:

      hello jeni...
      I don't think the BlogRush widget would cause this problem. As the error suggests, it is most likely an issue that your web host needs to deal with. Remember that the Google sitemaps errors may not be real-time. That means that your webhost may have been down while you were not aware and heaven forbid, that WAS the time Googlebot came visiting. This would result in a network unreachable error. You might want to use the tools mentioned in the post to help debug.

      There's also a free service called WatchMouse that will check your website every hour or so. You might want to use this to see if your website is accessible. Note that errors in accessing a website can occur at many points. Don't assume they are only occurring at your webhost's server.

      As for your sitemap and robots.txt files (for the benefit of newbies), they may be viewable in your browser but you might still get the network unreachable error. Remember that this error (network unreachable) is reporting the status of Googlebot's attempt to visit your website. As my webhost mentioned, it could be due to an automatic barring by their server. Unless you are using a free blogging site - like blogspot - you should be able to access your server logs. If the problem persists, it may be time to change to your own domain.

      I hope the error clears up by itself - which has been reported before by other webmasters. This would indicate your server was down at the time of Googlebot's visit.

      Cheers Jeni!

      • jeni says:

        Thanks for your response! According to the Google tools, my site was having trouble on about 5 separate days over the last 2 weeks, for a total of 147 unreachable urls. Google didn't have trouble reaching the sitemap or robots.txt - just regular pages on my site.

        I just signed up for WatchMouse. I've been using InternetSeer for the past 7 years on my other site, and that only monitors once an hour, but I've only had my site down maybe 3 of 4 times in those 7 years, according to InternetSeer.

        I have my own domain, but I don't have a private server. Since I have so many different scripts installed across all of my websites, I think moving hosts would be a bigger nightmare than the loss of Google traffic. I feel like I'm stuck. I've always liked my host for their high uptime %...

        I did try all the other tools you suggested, and found no problems. However, in my .htaccess I also use the redirect to the www version of my site, but I've had this on my site til day one, and the google errors only started two weeks ago. Again I guess it points to a host issue.

        The only other thing I was wondering is if something on my pages is causing the pages to load too slow for Google, causing the errors. Guess I'll just play it by ear and hope my host comes up with something.

      • Andrew says:

        Sounds like you know what you're doing Jeni...

        If you have your own domain and it's hosted on an Apache server, your webhost SHOULD allow you to download your server logs - it's a common feature of every hosting package. If they don't, me thinks it's time to look for another host...

        Since you say that it's the regular pages that are affected, you need to check to see WHEN the errors were occuring. Only your log files can tell you this. When you know the exact time these errors were occuring, and what the errors were (404 etc) then you need to confirm this with your webhost. I know... NO webhost likes to admit fault immediately. Sometimes, it takes a bit of "convincing" before they admit....

        The other possibility would be an internal issue on Google's part. In all the forums that I asked this question, a number of webmasters also confirmed that they experienced similar problems which sort of resolved themselves. This would point to Google being the source of the problem itself.

        However, when you mention that you have lots of scripts running, it does throw up a few red flags. A common webmaster rule is never to make a page script-heavy with too many calls to external scripts or even place scripts within multiple nested tables. It's a long shot, but if the problem persists, what I would do is to create a test page that is script-free. Then add those scripts one by one, and wait to see if Googlebot throws up the same errors.

        I hope you get to the root of the problem Jeni. Just out of curiosity, is this the URL you are talking about?

        • jeni says:

          Yes this url is the one I'm having the problems with - savvyskin.com. I still haven't heard back from my host, so I'm going to keep monitoring with WatchMouse, and check back every day with google tools. I'm trying to keep my blog as simple as possible, but I've installed a lot of plugins and widgets, which could always slow things down. This is my first blog, so I am still learning how to perfect it for speed and function. But in the end I have a feeling the problem is my host (or hopefully a google glitch). Thanks for your help!

  • jeni says:

    Just a quick update. Even though my site doesn't seem to go offline at all, WatchMouse has found all sorts of errors - Timeout during negotiation, Host name lookup failure, Broken pipe, while sending HTTP header, etc. There have been about 20 errors per day! Did you notice such dramatic errors on your site when you got WatchMouse?

    At this point, I'm considering switching hosts just for my one site, but I'm extremely nervous about finding a new host, since the one I chose had excellent reviews when I chose it. And I'm worried it will be a nightmare switching my blog over - not sure if I need to reinstall it on the new server, etc.

  • drmike says:

    Something I discovered with my own problem with the bot is that the robots.txt file can't have blank lines at the end of it nor any blank spaces at the end of the last line. The czweb site that you direct folks to pointed this out to me as being the problem while the other site didn't even mention it.

    Thanks for the information. :)

    • Andrew says:

      yes... it was an eye opener for me too. I've been a programmer long enough to know that even a semi-colon (or lack of it) can throw everything out of kilter but come on... my first impression was... how the H*LL are we supposed to know a thousand and one syntax requirements for every language and file!

      Sigh... live and learn.... :???:

      • drmike says:

        It's a picky little bastard. :evil:

        Actually still can't get Google to see my sites. *sigh*

      • Andrew says:

        I think I only noticed Googlebot had stopped visiting 2 weeks after its last visit. Then it was about another week of frantic searches on forums and the web for possible solutions, and another desperate week of ding-dong with my webhost. All in all, I lost a couple hundred bucks of earnings when traffic disappeared.

        It took a personal plea to the CEO of my webhost before they actually got their Technical Manager into the picture, and still she (the Technical Manager) refused to admit that the problem was on their end. Fortunately, one of the Engineers was a really decent fellow with a conscience. He did his own research and found the problem.

        What I'm saying is... Don't give up drmike. Like everyone who's posted their comments here, your problem probably has a different twist to it that you need to figure out systematically.

        And yes...I really understand - it's like staring at a blank wall everywhere you turn!

  • Stanton Bond says:

    This thread is really helpful. I do have this to add. On EVERY website we're running with Apple WebStar serving software, our robots.txt file and our sitemap.xml files are being read just fine. On EVERY website we're running with Apache web serving software (two different servers), NONE of the robots.txt files appear to be read by Google -- all indicate the infamous Robots.txt file unavailable error.

    This can't be a network issue because the servers are all on the same LAN and the same WAN.

    I keep thinking it must be an Apache thing, but WHAT?

    I believe Apache is the most popular serving software on the internet, so if it's an Apache thing, gobs of people must be fighting this. Any Apache guru's out there?

    • Andrew says:

      Thanks for the great input Stanton. I DID notice that and wondered if it could indeed be the cause. I'm running on Apache, but when checking other sites that returned a 301 error using the tools mentioned above, a number of them were on IIS.

      Perhaps those who posted could leave a note to mention what server you're on. Who knows... we might help thousands of other baffled webbies end their nightmare!

      • jeni says:

        My site is on an Apache server, and Google doesn't have a problem reading my robots.txt file.

        I am still having the same "network unreachable" errors on a lot of my individual pages. WatchMouse is finding all sorts of errors almost every day on my site. Since I don't have any sites on other hosts, I don't know if it's common for WatchMouse to find lots of errors, or if my host is indeed really bad.

        I've been briefly researching new webhosts, but am extremely nervous to make a move. I haven't figured out if I have to reinstall my WordPress software and all my plugins, and have a new Mysql database created at the new host, etc. And I've been reading glowing reviews, and then horrible reviews, about different hosts, and I can't find any with mostly positive reviews.

        My host has written back to me, and said they made some changes to my server to try to make it run better. But, since I'm on a shared host, they didn't have much else to offer me, other than moving to my own server. But since my site is not making much money, I'd probably pay more in hosting than I'm making.

        I'm not sure if the errors are really affecting my Google traffic. In the end, I wonder what harm having my pages "unreachable" is doing, if my pages aren't actually dropping out of Google.

      • Andrew says:

        Hello Jeni... please remember that data has to travel through many points from your server to the visitor, and errors occurring at any one of these points CAN throw up an error on WatchMouse.

        Finding a new webhost can be very scary if it's your first time, but generally speaking, when you decide to move, your current webhost SHOULD port over your WHOLE site (as is) to the new host. This means that they will send over a DUPLICATE of your site to the new host. They will then make the necessary changes to the DNS and wait for it to propagate. All in all, moving webhosts shouldn't take any more than 3 days.

        Yes, everybody will have different views about webhosts. Here's how I checked on my webhost prior to moving:

        - I contacted their Live Support chat and asked questions. If they're a good host, they should be online almost immediately.

        - I asked technical questions. If they can explain everything and clear all my doubts (without sounding like they have better things to do) then they've scored another point.

        - I then asked if they would handle all the necessary tasks to port my site over and they confirmed everything would be handled according to procedure.

        - Then I visited their website and forums to gauge if they were a reliable host. What you don't want to do is to go with a reseller.

        Tell your webhost that the site is running fine, but it may be blocking Googlebot. Check your affected pages against HTML validators and see if they pass, although Googlebot is pretty lenient about this.

        If the Network Unreachable errors are recent, then ask your webhost to provide access to the server logs and go through them line by line for the affected dates. See what the error code is.

        Ask your webhost to check their firewall settings.

        In the meantime, post questions on forums and http://www.google.com/support/webmasters/. I didn't find the answer directly by doing this, but it led me slowly as to WHAT to look for, WHAT questions to ask and WHERE to ask them.

  • Rasheed says:

    Thanks !

    I reached your blog using google trying to find a solution for this problem.

    At last i found the firewall is blocking googlebot from accessing my server.

    I unblocked google's ip from the the black list and i am waiting for google coming visits.

    Thanks for your detailed post !

    10/10

    • Andrew says:

      Great Rasheed! My sources tell me that some web hosts set a low threshold for concurrent visits by the same bot to prevent abuse. In fact, many website owners ARE experiencing this problem but don't even realize it! They just chalk it up to "being penalized" by Google.

  • jeni says:

    Just a quick update! I had sort of forgotten about my "unreachable" problem because despite having 100 unreachable URLs at any given time, my Google traffic had tripled in the last month or two, so I was finally happy.

    Last week, my site went offline for an entire day, and I had enough, and switched hosts finally. I just checked, and I went from having 100 unreachable URLs to 3! Woohoo! It's not making any difference in my Google traffic, but it's a relief nonetheless!

    Case closed!

    • Andrew says:

      Wow! That's great news Jeni! Wasn't that much of a nightmare switching hosts right? However, now that you're on a new server, you need to keep an eagle eye on your stats. If you're using Google Sitemaps, then go to Tools > Set Crawl Rate and you will find 3 graphs that show you the number of pages Google has crawled, the number of kB it has downloaded a day and the Time spent downloading. The third one is important. If you see erratic spikes in this graph, it means Googlebot is having problems accessing your pages.

      Hope all is smooth sailing from here on Jeni!

  • jeni says:

    Yeah switching hosts ended up being really easy. I thought I would have to reinstall WordPress and all the plugins, but it turns out I didn't have to do any of that!

    I just checked the graphs, and there's only one giant spike in the last 90 days. I'll keep monitoring it to see how it pans out with the new host. Thanks!

  • Stefan says:

    This great, thanks a lot Andrew. I had exactly same problem. Lil bit headache, I thought it was my hosting firewall block googlebot IP. Then I read this article, I follow all the steps then violaa..., no more errors on google webmaster tools sitemap. Thanks...

  • Ben says:

    Thank you. I was pulling my hair out at this. Great post. I am on to my hosting provider about this.

    • Andrew says:

      Heh... just a reminder - please check things on your end first. I hate being the one that points the finger at webhosts... I hope things work out for you...

  • Le Trung Thu says:

    Currently! I have same problem with my site. I found your article very interesting. I will try ;)

    Thanks man :|

  • i cant figure out this xml sitemap i cant not get indexed at all

    • Andrew says:

      Hi Avery...
      There is a difference between an ordinary sitemap and an XML sitemap. An ordinary sitemap is just an ordinary (HTML) page that lists the links to subsections and pages. An XML sitemap is a file that conforms to a specific XML format (the Sitemaps protocol described at sitemaps.org) and lists your URLs for search engines to crawl. It is easy to confuse with an RSS feed, which is also written in XML but serves a different purpose: an RSS feed is used by RSS aggregators to "feed" web users with your content without them having to actually visit your site, and they need an RSS reader to do that. You can Google "how to create an xml sitemap" or "RSS readers" to learn more about either one.

      As far as you site is concerned, I searched for "snipersuitghillie" and found 38 entries in Google, with your Home page at PR4. This means that you HAVE indeed been indexed by Google, so you are doing something right! Don't worry. Getting the hang of SEO and Google takes some time.

      All the best.

  • hard money says:

    Hey Andrew,

    Your post really helped me! I have a lot of sites and was getting the exact "network unreachable" and "sitemap error because of robots.txt" that you mentioned. I just forwarded this URL to my webhost and hopefully they can fix it in their firewall rules. Your analysis was spot on and I really appreciate you spelling it all out. If you had some AdSense ads on here I'd click them just to make you some money, but alas you only have the one webhosting ad and I've already got that... ;)

    • Andrew says:

      hi hard money... glad you found the post helpful.

      I now make it a habit to visit Webmaster Tools to check my sites' status and possible errors. Doing that daily is a heck of a lot better than getting one gigantic heart attack when traffic suddenly stops!

      AdSense doesn't do too well on HomeWithAndrew.com so I pulled the ads out long ago. But your kind intentions have made me think about putting up a "tipping jar" script so nice folks like you can buy me a Big Mac! Cheers!

  • Jasper says:

    Andrew, this thread has been my living nightmare for the past 10 days. I have experienced all the problems above and then some. I have done everything to try and get Google and my miserable host--HostGator--to play nicely and help me work through a solution, but they're acting like spoiled brats, blaming each other and pointing the finger. I'm at the end of my rope and I want OUT!!!! I cannot run my business this way. I do not want to spend the next week teaching HostGator how to do their job. I have simply given up and want a new host. PLEASE!!!! Tell me a host that most people like and that has already resolved the issues illustrated here. If anyone knows a decent host, please let me know. I've got 25 blogs that are sitting there with no traffic because Googlebot can't access my plain-as-day, well-written sitemap.xml file (server time-outs) and breaks when it sees my simple, generic Google-supplied robots.txt file.

    Desperate in Boston.

    • Andrew says:

      Wow Jasper... 25 blogs and no visitors. That is really the pits. To be fair, I have NEVER met a System Admin who admits the fault could be on their end until they exhaust all possible options. By then, it may be too late to say "I told you so".

      Let's get down to your problem. If I were you, I would first transfer ONE or TWO domain(s) out to any other host. Then you know the drill. Put out some links or get your pals to put out some links pointing to your site. Hard as it may be, sit tight and wait for Googlebot. Continue to monitor the situation for a couple of days (yeah... it's nerve-wracking). If you're sure that Googlebot is spidering your (transferred) blog without problems, then chances are you are right and there is a problem over at your present host that they may be unable (or unwilling) to fix. However, IF you still can't get Googlebot to visit after your domain transfer, you will need to check 1001 other things that may have gone wrong.

      As far as I'm concerned (from the way they handled my problem above), my present webhost is as reliable and professional as I can expect. You might want to give them a try. Your 2 options are :

      1. The banner on the right leading to Exabytes.com. This is an affiliate ad. If you prefer not to go via my affiliate link, feel free to

      2. Go direct to http://www.exabytes.com

      I really hope you get out of this mess soon Jasper!

  • Jasper says:

    Hey Andrew...thank you very much for your excellent advice. I am definitely going to take it... starting right now. The latest update from my current host, HostGator, is that they suggest I try to verify my site from a different location using a different ISP. They suggest that maybe my current ISP, Verizon, is dropping packets in the transfer. I suspect they believe this because they do not block any bots or spiders coming from Google IPs, and they have been able to verify my sites without an issue. I have three PCs at home--two that connect wirelessly and one direct-connected. I've tried all three PCs and used both Firefox and Internet Explorer, all without success. The odd thing is that a month ago, I had no problem verifying my sites in Webmaster Tools. And a strange new update is this: HostGator verified several of my sites for me, and these sites showed 'no errors' for the sitemap.xml. Google has since spidered the file and now reports errors. Some of these sites only have two or three pages, and I had rebuilt and resent the sitemap.xml file prior to HostGator verifying it, so this can't be a problem on my end. This has to be some problem in communication occurring between Google and HostGator. Well, anyway...I'm going to test the waters with a new host, as per your suggestion, and see how that goes. This thread, by the way, is excellent and I really appreciate your rapid feedback, Andrew! Much appreciated, mate!

  • Ger ke says:

    Hi Andrew, Great post, thanks for this! I am experiencing the same problem. I am in contact with my ISP now and they want to know the google network range so they can whitelist it. Do you know how I can provide them with this?
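For readers with the same whitelisting question: rather than relying on a fixed list of addresses, Google's documented way to confirm that a visitor really is Googlebot is a reverse DNS lookup on the IP from your server logs, followed by a forward lookup on the returned hostname. The sketch below is only an illustration of that check, not something from the original post; the function name `is_googlebot` and the injectable lookup parameters are hypothetical, added so the logic can be tried without network access.

```python
import socket

# Hostname suffixes Google documents for its crawlers' reverse DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes the reverse-then-forward DNS check.

    reverse_lookup and forward_lookup are hypothetical hooks for testing;
    by default they use the standard library resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        # Step 1: the IP must resolve to a Google-owned hostname.
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: that hostname must resolve back to the same IP,
        # otherwise the reverse record could have been spoofed.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as "not verified".
        return False
```

Running `is_googlebot("<ip from your access log>")` against the addresses your firewall has blocked gives you concrete evidence to hand to your host.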

  • ankit says:

    I was facing the same error and I also contacted my host, but they said they are not blocking any IP range. I found that my default WordPress privacy settings were blocking search engines from crawling my site, so I changed that option to allow all search engines. Other search engines like Yahoo and Bing are crawling now, but not Google. Even Google ads are not showing properly: they are showing public service ads on my site, even though my content is original and not adult.
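A quick way to rule out this kind of self-inflicted block is to test your robots.txt rules against the Googlebot user agent before blaming the host. The sketch below uses Python's standard `urllib.robotparser`; the two sample rule sets and the `example.com` URL are assumptions for illustration (a blanket `Disallow: /` is the sort of rule a "block search engines" privacy setting is assumed to produce), not content taken from anyone's actual site.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Assumed example: rules that block every crawler from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Assumed example: an empty Disallow line, which permits everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    page = "http://example.com/some-post/"
    print("blocked rules allow Googlebot:", googlebot_allowed(BLOCK_ALL, page))
    print("open rules allow Googlebot:", googlebot_allowed(ALLOW_ALL, page))
```

To check a live site instead, paste the text served at your own `/robots.txt` into `googlebot_allowed`.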

  • Great post. I am experiencing the exact same problem with only a few of my pages (as compared to all pages, which makes it a bit more difficult to debug) and this was very helpful.

  • To my mind, the "network unreachable" problem occurs when Googlebot tries to make too many concurrent connections and the web server temporarily denies it access. Google then reports this as network unreachable.

  • Sharath says:

    I too had the same problem. I added the following meta tags to all my pages and it all worked out fine.

    <meta name="GOOGLEBOT" content="INDEX, FOLLOW" />
    <meta name="robots" content="noodp,index,follow" />
    <meta name="revisit-after" content="7 days" />

  • Hridoy says:

    This post helped me a lot in solving a problem related to robots.txt. Thanks!

  • Yohan Perera says:

    Actually your article helped me to solve a different kind of problem.

    I am using Pingdom to monitor my blog. Since the day I started using Pingdom with its default 5-minute check, I started to receive frequent down alerts for my blog. I checked with the hosting company several times but they denied any downtime.

    Later on I found out that the firewall guarding my web server has an automatic mechanism that blocks the IP address of any outside host that makes several concurrent connections in a short period of time.

    Since I was using the 5-minute check resolution, Pingdom was making concurrent connections to my web server once every 5 minutes. The firewall mistook this for suspicious activity and blocked Pingdom frequently.

    Pingdom had no idea that it was being blocked by a firewall and couldn't get to the site, so it reported to me that the site was down.
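The firewall behaviour described in this comment can be pictured as a per-IP connection counter over a short time window. The sketch below is a simplified model under stated assumptions, not the actual rule set of any host or firewall product: the class name `ConnectionLimiter`, the thresholds, and the permanent block list are all hypothetical.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Toy model of a firewall that blocks IPs opening too many connections.

    An IP that opens more than `max_connections` connections within
    `window_seconds` is added to a block list and refused from then on.
    """

    def __init__(self, max_connections, window_seconds):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # ip -> timestamps of recent connections
        self.blocked = set()

    def allow(self, ip, now):
        """Record a connection attempt at time `now` (seconds); return True if allowed."""
        if ip in self.blocked:
            return False
        recent = self._recent[ip]
        # Forget attempts that have fallen outside the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_connections:
            # Too many connections in the window: treat the IP as hostile.
            self.blocked.add(ip)
            return False
        return True


if __name__ == "__main__":
    firewall = ConnectionLimiter(max_connections=3, window_seconds=1.0)
    # A crawler or monitor opening four connections almost at once.
    for t in (0.0, 0.1, 0.2, 0.3):
        print(t, firewall.allow("198.51.100.7", t))
```

In this model a well-behaved crawler that fetches several pages in parallel looks identical to an attack, which is why the monitoring service (or Googlebot) ends up blocked while the site appears healthy to the host.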

  • What is the best SEO forum these days?

  • Are there any good SEO forums that discuss keyword research in more detail?

  • @Valerie, there are many search engine optimization forums out there that could help you with keyword research.

Trackbacks/Pingbacks

  1. Google-Sitemap not loaded (403) - vBulletin SEO Forums
  2. Google Sitemap Errors | Stephan Miller
  3. Google Can't find my site - Thirty Day Challenge Forums
  4. Sitemap Google webmaster tool « MaseBlog
  5. Search Engine Optimization Dallas - Fixing my Google Sitemap Errors so I show up in the search engine results | Swanson SEO Services Blog
