Good and Bad Bots: Understanding Spam Bots and Spiders

By: Karl Schneider  | 03/06/2019


Tools for measuring and understanding user behavior, such as Google Analytics, are essential across the marketing world. That's why it's crucial to understand what spam bots and spiders are and how they can skew your data and results if you're not careful; even with security measures in place, these threats can still affect your reporting. Read on to learn what these data threats are and what you can do to fight them.
 
 

Bad Bots

There are many bots with bad intentions, from cloning your content to seeking out vulnerabilities within your system. For example, in a Moz article, Carlos Escalera states that “no matter what security measures it [a website] has, there will always be people trying to abuse its reach for their own interest.” Even CMS platforms such as WordPress must take extra security measures to fight these bad bots, beyond what is built in. Extra security can include using strong usernames and passwords, or installing security plugins to fight hackers. In Google Analytics, you can set up filters, and check the option to exclude known bots and spiders, to keep this traffic out of your reports.
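Your server logs are one place to spot this kind of traffic before it ever reaches your analytics. As a rough illustration, the Python sketch below counts requests from user agents that look automated; the log path, the Apache "combined" log format, and the keyword list are assumptions you would adapt to your own server, not an authoritative blocklist.

```python
import re
from collections import Counter

# Hypothetical example: the log path, the log format (Apache "combined"),
# and the suspicious user-agent keywords are assumptions, not a real blocklist.
LOG_PATH = "access.log"
SUSPECT_KEYWORDS = ("scrapy", "python-requests", "curl", "libwww", "masscan")

# Combined log format: ip - - [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        if any(keyword in agent for keyword in SUSPECT_KEYWORDS):
            hits[(match.group("ip"), agent)] += 1

# Print the most active suspicious clients so you can review (or block) them.
for (ip, agent), count in hits.most_common(10):
    print(f"{count:6d}  {ip}  {agent}")
```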

 

Good Bots

The good bots, unsurprisingly, include the Google and Bing crawlers that index websites and update their search rankings. The great deals you’re getting from that Google Chrome plugin? Those come from a bot that crawls e-commerce websites, locating the best deals. Good bots also look out for copyrighted content and will not scrape content from your website. While these bots mean well, they can still skew your reporting, so filters and security measures remain key.
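One wrinkle is that bad bots sometimes impersonate good ones by faking the Googlebot user agent, so it can help to verify the source. The sketch below follows Google's documented reverse-then-forward DNS check; it's a minimal Python illustration, and the IP in the comment is just a placeholder you would replace with one from your own logs.

```python
import socket

def is_genuine_googlebot(ip: str) -> bool:
    """Check whether an IP that claims to be Googlebot really belongs to Google.

    Google's documented method: reverse-DNS the IP, confirm the hostname ends in
    googlebot.com or google.com, then forward-DNS that hostname and confirm it
    resolves back to the same IP.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)      # reverse DNS lookup
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        return socket.gethostbyname(hostname) == ip    # forward DNS must round-trip
    except (socket.herror, socket.gaierror):
        return False

# Example usage (placeholder IP taken from your own server logs):
# print(is_genuine_googlebot("66.249.66.1"))
```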

 

Spiders

Web crawlers, also called spiders, are a type of bot that can be configured to continually browse the internet. The benign versions are primarily used for tasks such as search engine indexing, collecting updated content from webpages, or validating hyperlinks and HTML. That said, they operate without asking permission and consume resources on every page they visit, which can drive up server load and slow your page load times.
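To make the mechanics concrete, here is a minimal, hypothetical spider in Python. It checks the site's robots.txt before each request, follows links only within the starting domain, and stops after a handful of pages; the start URL, user-agent string, and page limit are illustrative placeholders, not a production crawler.

```python
import urllib.robotparser
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical starting point; swap in a site you are allowed to crawl.
START_URL = "https://example.com/"
USER_AGENT = "FriendlyDemoSpider/0.1"
MAX_PAGES = 10

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A polite spider consults robots.txt before requesting anything.
robots = urllib.robotparser.RobotFileParser(urljoin(START_URL, "/robots.txt"))
robots.read()

seen, queue = set(), [START_URL]
while queue and len(seen) < MAX_PAGES:
    url = queue.pop(0)
    if url in seen or not robots.can_fetch(USER_AGENT, url):
        continue
    seen.add(url)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
    except OSError:
        continue
    extractor = LinkExtractor()
    extractor.feed(html)
    # Stay on the same site so the example doesn't wander the whole internet.
    queue.extend(link for link in (urljoin(url, href) for href in extractor.links)
                 if urlparse(link).netloc == urlparse(START_URL).netloc)
    print("crawled:", url)
```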

A spider or crawler can also be used as a data collection tool, for example to compile a list of email addresses from websites across the internet. Software to do this (including Google Chrome plugins) is plentiful and readily available to anyone. Most of the time, these email-extracting crawlers target industry-specific webpages to build a segmented list of addresses that can then be spammed.

 

What Can You Do to Protect Yourself?

Luckily, tools like Google Analytics have recognized these threats and offer the filtering options mentioned above. Other solutions include the robots.txt file and the .htaccess file, which set the ground rules for what bots visiting your site are allowed to access. Webmasters can use these files to tell well-behaved crawlers to stay out, or to block known bad bots from reaching the site at all. It’s an important first step toward stopping bots from using your website’s data for unapproved purposes.
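As a rough sketch of what those files look like: the robots.txt entry below asks a crawler to stay away, while the .htaccess rules (for Apache servers with mod_rewrite enabled) refuse its requests outright. "BadBot" is a placeholder user-agent name, and keep in mind that robots.txt is only honored by bots that choose to respect it.

```
# robots.txt -- politely ask a specific crawler (placeholder name) to stay out
User-agent: BadBot
Disallow: /

# .htaccess -- on Apache with mod_rewrite, reject requests from that user agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F,L]
```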

 
Understanding what a bot is and how it functions is the first step toward learning how to block it. Telling healthy bots from unhealthy ones really comes down to filtering out the bad and keeping the good. The bots used by Google and Bing, for instance, crawl and index your pages so people can find your website organically.

If you need help differentiating between a good and bad bot, we can help! Just contact us and we can perform an audit of your site. 



 
