How to Keep Robots from Harassing Your Web Pages


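This morning I came across an article on www.aspalliance.com (Stopping Automated Web Robots Visiting ASP/ASP.NET Websites,
http://aspalliance.com/1018_Stopping_Automated_Web_Robots_Visiting_ASPASPNET_Websites).
It mainly covers measures you can take to keep robots from crawling your site excessively. Some of its points are worth discussing, so I have summarized them below: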

1. Some reference criteria for identifying a robot
・ Large numbers of requests from a single IP address or a range of IP addresses within the same subnet (i.e. the first three numbers of the IP address are identical).
・ Large numbers of requests for database driven content compared to the rest of the website.

・ Many requests made from browsers that do not support ASP Sessions.

・ Lots of and increasing numbers of website visitors, but no corresponding increase in transactions (e.g. sales!).

・ Large numbers of spam or automated requests being generated from online forms.
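
The first criterion above (many requests from one subnet) can be checked roughly from a list of client IPs pulled out of an access log. The following is only a minimal sketch: the function names, the /24 prefix and the threshold value are assumptions made for illustration, not something taken from the original article.

```python
from collections import Counter
import ipaddress


def requests_per_subnet(client_ips, prefix_len=24):
    """Count requests per IPv4 subnet (a /24 matches 'first three numbers identical')."""
    counts = Counter()
    for ip in client_ips:
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


def suspicious_subnets(client_ips, threshold=100):
    """Return subnets whose request count reaches the (hypothetical) threshold."""
    counts = requests_per_subnet(client_ips)
    return sorted(subnet for subnet, n in counts.items() if n >= threshold)


# Sample addresses from the documentation ranges, used purely as test data
sample = ["203.0.113.5", "203.0.113.77", "203.0.113.200", "198.51.100.9"]
print(requests_per_subnet(sample))
print(suspicious_subnets(sample, threshold=3))
```

A high count alone does not prove a visitor is a robot (a company proxy can look the same), so a result like this is best treated as a hint to be combined with the other criteria.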
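2. At http://www.robotstxt.org/wc/norobots.html you can find a proposed standard for keeping robots out (unfortunately it is not an authoritative standard and has no binding force). It offers examples and methods that are useful day to day, chiefly creating a robots.txt file and placing it in the site's root directory, for example: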
User-agent: *
Disallow: /
This bars all robots.

To allow all robots to visit:

User-agent: *
Disallow:

User-agent: *
Disallow: /cyberworld/map/ # robots may not visit files under /cyberworld/map/

User-agent: cybermapper # allow the robot named cybermapper
Disallow:

User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html # the file foo.html may not be accessed
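
To see how a well-behaved robot would read rules like these, you can feed them to Python's standard `urllib.robotparser` module. This is a rough sketch: the rules are the last example above, while `SomeOtherBot` and the helper name `build_parser` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The last example above: cybermapper may go anywhere, everyone else is restricted
ROBOTS_TXT = """\
User-agent: cybermapper
Disallow:

User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# "SomeOtherBot" is a hypothetical robot name that falls under the "*" rules
for agent, path in [
    ("cybermapper", "/cyberworld/map/index.html"),
    ("SomeOtherBot", "/cyberworld/map/index.html"),
    ("SomeOtherBot", "/foo.html"),
    ("SomeOtherBot", "/index.html"),
]:
    print(agent, path, parser.can_fetch(agent, path))
```

Keep in mind that this only shows what a cooperative robot would do; as noted above, nothing forces a robot to honor the file.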

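3. If setting up robots.txt is inconvenient, you can also work through the page's meta information, using a robots meta tag, which lets you set the defense against robots for an individual page.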
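
The source page does not show the tag itself, so the sample below assumes the commonly used form `<meta name="robots" content="noindex, nofollow">`. The sketch reads a page the way a cooperative robot might and collects those directives; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (bare attributes), so normalize to ""
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html_text):
    """Return the set of robots meta directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Assumed example page; the tag form is the common convention, not quoted from the source
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>...</body></html>"""

directives = robots_directives(PAGE)
print(sorted(directives))
```

As with robots.txt, the tag is only a request: it works for robots that choose to look for it.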

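4. Slow down a robot's frantic crawling. If you find that a robot is hammering your site and dragging down its performance, you can reduce
the robot's crawl rate: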
User-agent: Slurp
Crawl-delay: 10

This one is aimed at Yahoo; for details see http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html.
In practice, though, some robots are quite smart and will not foolishly swarm your site all at once.
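
From the robot's side, a Crawl-delay value like the one above can be read with `urllib.robotparser` (its `crawl_delay` method is available in Python 3.6 and later). This is a small sketch; the helper name `wait_seconds_for`, the `default` parameter and the `OtherBot` name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The Slurp rule from above, plus an assumed catch-all entry with no delay
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow:
"""


def wait_seconds_for(robots_txt, user_agent, default=0):
    """Return the Crawl-delay for user_agent, or default when none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return default if delay is None else delay


print(wait_seconds_for(ROBOTS_TXT, "Slurp"))
print(wait_seconds_for(ROBOTS_TXT, "OtherBot"))
```

Crawl-delay is not part of the original proposal at robotstxt.org, so whether a given robot honors it depends on that robot.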

Source: http://www.gdgbn.com/wangyezhizuo/13809/
