When my fingers get itchy and I feel the need to tweak my websites, I make a copy of the page and work on it. So if I wanted to work on the homepage, I'd make a copy, rename the page as index2.php and work on that page. Then, I'd upload the test page to the server to test it. The terrible habit part is I tend to leave those test pages on the server, long after they have served their purpose.
On 2 May 2007, doing my routine morning checks on my earnings and web stats for my main site, I was shocked to see everything had crashed. Web traffic was down to a trickle (20% of its normal level) and, obviously, earnings had plummeted as well. My first reaction was to check my site to make sure it was up and running - web servers have been known to crash. Everything seemed okay. Then I checked my rankings. My homepage had gone MIA for searches where I used to rank number 1 or on the first page of the SERPs. I used all my regular tools to check every keyphrase for which I had consistently ranked high. I was nowhere. Turning to the forums to see if there was a major update going on didn't help. Panic set in.
I hadn't made any changes to my site recently, so it was a big mystery to me. I did a search for my domain name and found that I was missing there too. If you search for your domain name (minus the www. and .com), your homepage should appear as the first entry, unless the name is made up of very common words.
After 3 days of frantic searching, I resigned myself to the fact that somehow Google didn't like my site anymore. Bye-bye number one ranking. Bye-bye Trustrank. Bye-bye online money! The only thing I could do was wait and hope that this was an error on Google's part and that the problem would somehow correct itself. This went on for three miserable weeks.
Then one day - May 18 I think - I decided to do a search for my domain name again. This time, I went through all 800 plus entries in the results and, towards the end of the results set, found two entries with the same title and description as my homepage. My heart skipped a beat. Looking at the URLs, the two pages were:
- index2.php, a copy of my index.php page which I had used for testing and then left on the server.
- an RSS XML file in which I had used the same title and description as my homepage.
The funny thing was that index2.php had absolutely NO inbound links pointing to it, yet there it was in Google's cache, bright as daylight! Because this page was an EXACT copy of my homepage (except for the file name), I was sure that I had found the problem. Google probably saw index.php and index2.php as duplicates and somehow chose to keep index2.php! I went to the URL submission tool in Google Sitemaps and requested that index2.php be removed.
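For anyone who wants to hunt for exact copies like my index2.php, here is a minimal sketch of one way to do it: group files by a hash of their contents and report any hash shared by more than one file. The function name and the file patterns are my own assumptions for illustration, and it only catches byte-for-byte copies, not pages that are merely similar.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root, patterns=("*.php", "*.html", "*.xml")):
    """Group files under `root` whose contents are byte-for-byte identical.

    `patterns` is an assumed default list of web file types; adjust to taste.
    """
    by_hash = defaultdict(list)
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash[digest].append(str(path))
    # Keep only the hashes that are shared by two or more files.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Run it against a local copy of the site, for example find_exact_duplicates("path/to/site"), and each group it returns is a set of files with identical contents.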
I couldn't remove my RSS feed file because many news aggregators were using this feed, and removing it would just cause too many headaches. I decided to block Googlebot from crawling the folder where this file was stored (something I should have done from the very start) and changed the title and description within the file. I then requested that the URL of this file be removed as well. I had to wait about 3 days, but both URLs were finally removed.
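Blocking a folder is done with a robots.txt rule, and it is worth checking that the rule does what you think before relying on it. The sketch below uses Python's standard urllib.robotparser to test a rule. The folder name /feeds/ and the example.com URLs are placeholders I made up, not my real paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/feeds/" stands in for whatever folder holds the feed.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /feeds/
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if `robots_txt` disallows `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The feed folder should be blocked, the homepage should not.
print(is_blocked(ROBOTS_TXT, "Googlebot", "http://example.com/feeds/rss.xml"))
print(is_blocked(ROBOTS_TXT, "Googlebot", "http://example.com/index.php"))
```

The rule names only Googlebot, so other crawlers (and the aggregators reading the feed) are unaffected.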
Another 3 days later, I found my site back in the SERPs again, albeit ranking one position lower. Heck, I couldn't care less. I was BACK! The best part was that traffic picked up again and earnings from every revenue source were back at their levels from just before this mess started.
I did some searching and found that Googlebot DOES sometimes crawl pages that have NO inbound links to them. And, as in my case, these crawled pages remain in Google's cache for about two to three weeks before being purged. I learned that there are a number of possible (read that as "unproven") ways that Google finds pages with no inbound links:
- The URL appears as a referrer in a public referrer log, or trackbacks, etc.
- Google crawls ISP logs
- The Google Toolbar (with Pagerank display enabled) reports the existence of the URL to Google
- Somebody submits the URL to Google using their Add URL form
There is also an answer in Google's Information for Webmasters about how Google crawls pages on our "secret server".
After that incident, I did a major cleanup of my files and deleted every duplicate page that I had created for testing. I also checked every page for duplicate titles and/or descriptions.
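Checking titles and descriptions by hand gets tedious, so here is a rough sketch of how the check could be automated with Python's built-in html.parser. It assumes you feed it the rendered HTML of each page (what a browser or Googlebot receives, not the raw PHP source), and the class and function names are just ones I picked for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicate_heads(pages):
    """`pages` maps a page name to its HTML; return groups sharing title and description."""
    seen = defaultdict(list)
    for name, html in pages.items():
        info = HeadInfo()
        info.feed(html)
        info.close()
        seen[(info.title.strip(), info.description)].append(name)
    # Only keep title/description pairs used by more than one page.
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}
```

Any group it returns is a set of pages that would look like the same page in a results listing, which is exactly what bit me.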
Lesson learnt, and not to be forgotten - clean up after yourself. It's easy to copy a page and do our testing on it, thinking that Googlebot doesn't see it, but it DOES! Don't leave duplicate files on your server even if there are NO inbound links pointing to them, and ALWAYS make sure EVERY page has a unique title and description.
Hope this post will help someone, someday, somewhere...

Hi Andrew, does duplicate content really hurt your site that badly? As webmasters, we always forget about some duplicate files. So, how is your site ranking now? Still #1? Which keyword did you search for?
Hi Poh Ee... I thought duplicate files (even under a different name) were harmless, but apparently not. I have since cleaned my server of EVERY duplicate file. Now I also remove any file I use for testing as soon as I'm done.
Btw. Sorry for the late reply... I was away for a long time on my other business
Thanks Andrew, I have some cleaning up to do. My traffic stopped about 4 hours after I noticed Google had started indexing my new content and adding it to the search results.
My site, DressCodeGuide.com does indeed have some test pages hidden away exactly as you describe, and what's more many of the real pages are very similar, and at the time, had a standard page title and description etc.
You are a life saver. I thought I was gonna have to get a proper job (not that I'm making enough to live on yet, but I lost all hope when traffic fell).
Matt.
Glad to help, Matt. Time is my biggest foe. If not, I would post EVERY mistake I've ever made and how I sort of solved them - which would probably fill quite a few reams of paper. Every mistake we make and learn from eventually becomes a blessing to someone else ;-)
Sorry, the link above is broken, the web site is at DressCodeGuide.