.htaccess tutorial

htaccess Elite


web site grabber...



Post by Abacus » 03 Aug 2007 16:02

I've been using the following htaccess file for a few years. I'm not sure whether it has ever worked, but can anyone suggest code that will write a log, or any other improvements to it?


errordocument 403 http://www.***/index.htm
errordocument 404 http://www.***/index.htm
errordocument 500 http://www.***/index.htm

(all 'Document Not Found' and other errors get redirected to my own error page.)
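A hedged side note, not part of the original post: according to the Apache documentation, when ErrorDocument points at a full http:// URL, Apache sends the client a redirect, so the visitor (and your log) sees a redirect status instead of the original 403, 404 or 500. Pointing the directives at a local path avoids that. This sketch assumes the elided *** domain above is this same site, so it simply reuses the path /index.htm from the original lines.

```apache
# Local paths (no http://www...) are served internally, so the visitor
# still receives the real 403/404/500 status code.
# Assumption: /index.htm is the same page the original full URLs pointed to.
ErrorDocument 403 /index.htm
ErrorDocument 404 /index.htm
ErrorDocument 500 /index.htm
# Caveat: if a RewriteRule later answers with [F], exclude the 403 page
# from that rule, or Apache cannot serve the error page itself.
```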

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*Ants.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*attach.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BackWeb.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Bandit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BlackWidow.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Buddy.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*CherryPicker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ChinaClaw.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Collector.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Copier.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Crawler.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Crescent.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*DISCo.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Download.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Downloader.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*eCatch.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EirGrabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailCollector.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailSiphon.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailWolf.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Express.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EyeNetIE.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*FileHound.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*FlashGet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Gameboy.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GetYou.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go.Zilla.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go-Ahead-Got-It.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*gotit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Grabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GrabNet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Grafula.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HMView.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Image.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*InterGET.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JetCar.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JOC.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*larbin.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*leech.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LeechFTP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Likse.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Magnet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mag-Net.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mass.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Memo.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MIDown.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mirror.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mister.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Navroad.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NearSite.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetAnts.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetSpider.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NICErsPRO.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Ninja.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetZIP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Octopus.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PageGrabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Papa.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*pcBrowser.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PCUser.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PiX.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Pump.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*RealDownload.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Reaper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Recorder.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ReGet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Siphon.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SiteSnagger.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SmartDownload.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Snagger.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Snake.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Stripper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Sucker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperBot.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperHTTP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Surfbot.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*tAkeOut.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Teleport.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Telesoft.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*VoidEYE.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebAuto.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*[Ww]eb[Bb]andit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebCopier.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebEMailExtrac.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebFetch.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebPictures.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebReaper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebSauger.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*eXtractor.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Vacuum.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Vampire.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebStripper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebWhacker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Whacker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Wget.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Widow.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Wolf.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Xaldon.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Zeus.*$
RewriteRule ^.*$ http://www.use_a_normal_browser_to_view ... lease.com/ [L]
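A more compact sketch of the same idea, offered as a suggestion rather than a drop-in replacement. Each ^.*Name.*$ pattern is equivalent to the bare, unanchored Name, and several entries are already covered by shorter ones (Downloader, RealDownload and SmartDownload by Download; PageGrabber and EirGrabber by Grabber; WebReaper by Reaper). Adding [NC] makes matching case-insensitive, which also makes [Ww]eb[Bb]andit unnecessary, but it is a behaviour change. Broad words such as Image also catch legitimate agents, for example Google's Googlebot-Image. Only a subset of the names is shown, and the final rule assumes the ErrorDocument 403 /403.html setup suggested later in this thread.

```apache
RewriteEngine On
# Bare words match anywhere in the User-Agent, just like ^.*Word.*$.
# [NC] = case-insensitive (broader than the original, case-sensitive list).
RewriteCond %{HTTP_USER_AGENT} (Bandit|ChinaClaw|Collector|Copier|Download|Grabber) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (HTTrack|Reaper|Siphon|Stripper|Teleport|Whacker) [NC,OR]
# ...add the remaining names from the list above in the same way...
RewriteCond %{HTTP_USER_AGENT} (Wget|WebZIP|Xaldon|Zeus) [NC]
# Answer 403 Forbidden for every URL except the error page itself
# (assumes ErrorDocument 403 /403.html in the document root).
RewriteRule !^403\.html$ - [F]
```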

8)
TTFN
Jim
Abacus
 
Posts: 2
Joined: 21 Jul 2007 19:12

Post by mod_rewrite » 20 Aug 2007 15:43

Cool, what kind of log would you like? A list of blocked bots? List of errordocument usage?
mod_rewrite
 
Posts: 102
Joined: 30 Oct 2006 19:55

Post by Abacus » 22 Aug 2007 10:12

I just need an idea of how many users were blocked.

A list of their IP addresses would be a bonus.


Post by produke » 25 Aug 2007 07:22

OK, your current setup is bad for logging because both your ErrorDocuments and your RewriteRule redirect to an external site. What you could do instead is change your RewriteRule so that it fails the request [F] rather than redirecting it. A failed request gets a 403 Forbidden response and is sent wherever your ErrorDocument 403 points, so point ErrorDocument 403 at /403.html and create a blank /403.html file. Every blocked request then shows up in your access log with status 403 (the log file name varies by host; access.log is assumed here). With the usual "combined" log format, this counts the 403 responses and lists the IP addresses they came from, most frequent first:
grep -c '" 403 ' access.log
awk '$9 == 403 { print $1 }' access.log | sort | uniq -c | sort -rn
Each matching log line also shows the user-agent (bot) and the URL that was requested.

ErrorDocument 403 /403.html
# Forbid every URL except /403.html, so the error page itself can still be served
RewriteRule !^403\.html$ - [F]


Or you could just use a log analyser such as AWStats.

see: fight blog spam with apache
produke
 
Posts: 242
Joined: 25 Sep 2006 04:48

