Bing ignores robots.txt

One of the long-standing conventions on the web is that automated search engine crawlers should follow a set of rules about which pages they may visit and index. For many crawlers or bots, all you have to do is properly set up your robots.txt file, and voilà, you control what the bot will and will not visit. Googlebot tends to behave according to what is in the robots.txt file, but others, specifically Bingbot, do not.

It seems that webmasters have been complaining since at least 2012 that Bingbot ignores their robots.txt files.

I have a site that uses Magento’s layered navigation to filter the products on each category page, similar to what you would see at Amazon or Best Buy. However, when a crawler follows every combination of those filters, it results in a lot of duplicate information in the indexes of Google and Bing. In addition, it creates unnecessary load on the web servers for pages that the vast majority of visitors never visit.

Because Bingbot does not follow directions, webmasters have a few options, but none are very good. The first is to block all traffic from Bingbot, which would remove you from Bing's search results, always a bad idea. The second is to ignore the problem and spend more on infrastructure to handle the load that Bingbot creates. The third is to forcibly block Bingbot from accessing the URLs that robots.txt already restricts. The third option seems to be the best, and it is also the easiest to implement in an Apache .htaccess file.
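
To make the third option concrete, here is a minimal sketch of the decision logic, written in Python rather than as an .htaccess rule. It is an illustration only, not the configuration used on my site: the robots.txt contents, the `/shop/filter/` path, and the `status_for` helper are all hypothetical. It uses the standard library's `urllib.robotparser`, which only does simple path-prefix matching, so wildcard rules are not covered.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, assumed for illustration only.
# The filter path stands in for layered-navigation URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shop/filter/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def status_for(parser: RobotFileParser, user_agent: str, path: str) -> int:
    """Return 403 when Bingbot requests a path robots.txt disallows, else 200.

    Other crawlers are left alone here, on the assumption that they
    already honor robots.txt on their own.
    """
    is_bingbot = "bingbot" in user_agent.lower()
    if is_bingbot and not parser.can_fetch("bingbot", path):
        return 403
    return 200


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    bing_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    print(status_for(parser, bing_ua, "/shop/filter/color/red"))
    print(status_for(parser, bing_ua, "/shop/shoes.html"))
```

The same idea carries over to Apache: match the Bingbot user agent, match the restricted URL patterns, and return a 403 instead of serving the page. Matching on the user agent string alone is easy to spoof, so treat this as load reduction rather than access control.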
