Bing ignores robots.txt

One of the long-standing conventions on the web is that automated search engine crawlers should follow a set of rules about which pages they may and may not visit and index. For most crawlers or bots, all you have to do is properly set up your robots.txt file and, voilà, you control what the bot will and will not visit. GoogleBot tends to respect what is in the robots.txt file, but others, notably BingBot, do not.
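
For reference, a minimal robots.txt looks something like the sketch below; the paths are hypothetical placeholders, and the file simply lists which user agents the rules apply to and which paths they should stay out of:

```
# Applies to every crawler that honors the convention
User-agent: *
# Keep bots out of these (hypothetical) paths
Disallow: /checkout/
Disallow: /customer/
```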

It seems that, as far back as 2012, webmasters have been complaining about BingBot ignoring their robots.txt files.

I have a site that uses Magento’s layered navigation to filter the products on each category page, similar to what you would see at Amazon or Best Buy. When all of those filter combinations are crawled, the result is a lot of duplicate content in the Google and Bing indexes. It also creates unnecessary load on the web servers for pages that the vast majority of visitors will never view.
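
As a rough sketch, and assuming the layered navigation exposes its filters as query parameters (the parameter names below are placeholders, not the actual attribute codes), the robots.txt entries to keep crawlers out of the filtered URLs look like this:

```
User-agent: *
# Block any URL carrying a layered-navigation filter parameter.
# Wildcard matching is supported by both GoogleBot and BingBot.
Disallow: /*?price=
Disallow: /*&price=
Disallow: /*?color=
Disallow: /*&color=
```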

Because BingBot does not follow directions, webmasters are left with a few options, none of them good. The first is to block all traffic from BingBot, which removes you from Bing's search results entirely; always a bad idea. The second is to ignore the problem and spend more on infrastructure to handle the load that BingBot creates. The third is to forcibly block BingBot from the URLs that robots.txt already restricts. The third option is the best, and it is easy to implement in an Apache .htaccess file.
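
Here is a sketch of what that could look like with mod_rewrite, assuming the same placeholder filter parameters as above; it returns a 403 to BingBot for any URL that robots.txt already disallows:

```
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match requests whose user agent identifies as BingBot
    RewriteCond %{HTTP_USER_AGENT} bingbot [NC]
    # ...and whose query string carries a layered-navigation filter
    # (placeholder parameter names)
    RewriteCond %{QUERY_STRING} (^|&)(price|color)= [NC]
    # Answer with 403 Forbidden instead of serving the page
    RewriteRule .* - [F,L]
</IfModule>
```

Returning a 403 rather than a redirect keeps the response cheap for the server and makes the block easy to spot in the access logs.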

