Bing ignores robots.txt

One of the long-standing conventions on the web is that automated search engine crawlers should follow a set of rules about which pages they should and should not visit and index. For many crawlers or bots, all you have to do is properly set up your robots.txt file, and voilà, you control what the bot will and will not visit. Googlebot tends to behave according to what is in the robots.txt file, but there are others, specifically Bingbot, that do not.
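As a rough illustration of how a well-behaved crawler is expected to read those rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the paths, and the `example.com` URLs are made up for the example and are not taken from my site. Note that this standard-library parser only does simple prefix matching, so it is not a model of any particular search engine's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /catalogsearch/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant bot asks this question before requesting each URL.
print(parser.can_fetch("bingbot", "https://example.com/checkout/cart"))
print(parser.can_fetch("bingbot", "https://example.com/shoes.html"))
```

A crawler that honors the convention would skip any URL for which `can_fetch` returns `False`.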

Complaints from webmasters that Bingbot ignores their robots.txt files appear to go back as far as 2012.

I have a site that uses Magento’s layered navigation to filter the products on each category page, similar to what you would see at Amazon or Best Buy. However, when a crawler follows every combination of those filters, the result is a lot of duplicate information in the indexes of Google and Bing. It also creates unnecessary load on the web servers for pages that the vast majority of visitors never see.

Because Bingbot does not follow directions, webmasters have a few options, but none are very good. The first is to block all traffic from Bingbot, which would remove you from Bing's search results and is rarely a good idea. The second is to ignore the problem and spend more on infrastructure to handle the load that Bingbot creates. The third is to forcibly block Bingbot from the URLs that robots.txt already restricts. The third option seems to be the best, and it is also the easiest to implement in an Apache .htaccess file.
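To make the third option concrete, here is a minimal Python sketch of the decision such a block has to make: if the request claims to come from Bingbot and targets a URL that robots.txt already disallows, refuse it (for example with a 403). This is not the .htaccess rule itself, only the logic behind it. The path prefixes and filter parameter names are assumptions for illustration, so substitute whatever your own robots.txt and layered navigation actually use.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical: path prefixes that robots.txt already disallows.
DISALLOWED_PREFIXES = ("/catalogsearch/", "/checkout/")

# Hypothetical: query parameters added by layered navigation filters.
FILTER_PARAMS = {"price", "color", "manufacturer"}


def should_block(user_agent: str, url: str) -> bool:
    """Return True when a Bingbot request targets a restricted URL."""
    # Only requests identifying themselves as Bingbot are affected.
    if "bingbot" not in user_agent.lower():
        return False

    parts = urlsplit(url)

    # Restricted path prefixes.
    if parts.path.startswith(DISALLOWED_PREFIXES):
        return True

    # Any layered navigation filter in the query string.
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name in FILTER_PARAMS for name in params)


BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

print(should_block(BINGBOT_UA, "/shoes.html?color=red"))
print(should_block(BINGBOT_UA, "/shoes.html"))
```

The check matches on the User-Agent header alone, which anyone can spoof, but that is harmless here: the only effect is to deny access to pages that crawlers were never supposed to fetch. Regular visitors and other bots are untouched, and Bingbot can still crawl every unfiltered category and product page.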
