Apache logs – useful one-liners
From time to time I need to analyze Apache access logs real quick. In most cases, a website owner has noticed some strange activity or an unusual traffic peak, and we want to check whether it has something to do with a particular client or a particular day.
For instance, one of our clients noticed an unusually high number of visits across their whole website (an image stock) at the beginning of the current month. The pieces of bash code listed below helped us pinpoint the source of the traffic.
Count traffic by day
To make our work easier, I combined the access logs from the past five days into one combined log file. The first thing I checked was the number of requests per day:
> awk '{print $4}' combined.log | cut -d: -f1 | sort | uniq -c | sort -n
67164 [02/Jun/2019
402913 [29/May/2019
484095 [30/May/2019
1710491 [01/Jun/2019
1936686 [31/May/2019
As you can see, the number of requests on May 31st and June 1st is much higher than on the previous days.
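To make it clearer what each stage of that pipeline does, here is a minimal sketch that runs it on a tiny made-up log. The sample entries and the temporary file are assumptions for illustration only, and the sketch assumes the combined log format, where the timestamp is the fourth whitespace-separated field. It also shows a variation that is not part of the original analysis: keeping two colon-separated parts groups the requests by hour instead of by day.

```shell
# Build a tiny, made-up access log (combined log format assumed).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [31/May/2019:10:00:01 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.64.0"
192.0.2.1 - - [31/May/2019:10:05:12 +0200] "GET /a.jpg HTTP/1.1" 200 2048 "-" "curl/7.64.0"
192.0.2.2 - - [01/Jun/2019:08:30:00 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF

# $4 looks like "[31/May/2019:10:00:01"; cut keeps the part before the first colon,
# sort groups identical days together, uniq -c counts them, sort -n orders by count.
per_day=$(awk '{print $4}' "$log" | cut -d: -f1 | sort | uniq -c | sort -n)
echo "$per_day"

# Variation: keep the first two colon-separated parts to count requests per hour.
per_hour=$(awk '{print $4}' "$log" | cut -d: -f1,2 | sort | uniq -c | sort -n)
echo "$per_hour"

rm -f "$log"
```

The leading `[` stays in the output because it is part of the fourth field; it does not affect the counts.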
Count traffic by IP
In the next step I wanted to check whether the traffic came from one source or from many. To achieve this I used my second one-liner:
> awk '{print $1}' combined.log | sort | uniq -c | sort -n | tail
17593 6.9.64.68
20231 7.8.39.242
30222 4.5.255.138
38483 6.6.50.195
42278 7.8.99.43
49451 5.6.224.145
56956 4.3.107.86
72411 8.5.178.227
115413 6.5.11.183
2834852 2.8.239.35
# IP addresses are not real
As you can see, the majority of the traffic came from a single IP. I looked up its location using https://www.iplocation.net/ and found that the traffic came from Chile, which is not a usual source of traffic for this website.
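On a log with millions of lines, sorting every line just to count it can take a while. As an alternative sketch (not the command used above), awk can do the counting itself in a single pass with an associative array, so only the much shorter list of distinct IPs needs sorting. The sample log below is made up, and the `192.0.2.x` addresses come from a range reserved for documentation.

```shell
# Build a tiny, made-up access log (combined log format assumed).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [31/May/2019:10:00:01 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.64.0"
192.0.2.1 - - [31/May/2019:10:00:02 +0200] "GET /a.jpg HTTP/1.1" 200 2048 "-" "curl/7.64.0"
192.0.2.2 - - [31/May/2019:10:00:03 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
192.0.2.1 - - [31/May/2019:10:00:04 +0200] "GET /b.jpg HTTP/1.1" 200 4096 "-" "curl/7.64.0"
EOF

# count[$1]++ tallies requests per client IP while reading the file once;
# the END block prints "count ip" pairs, which are then sorted by count, highest first.
top_ips=$(awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' "$log" \
  | sort -rn | head -n 10)
echo "$top_ips"

rm -f "$log"
```

Here the busiest clients come first, so `head` takes the place of `tail` in the original pipeline.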
Count by User Agent
As the last step I wanted to check if there is a particular User Agent string for this crawling IP. I wanted to check if this is a bot of some kind or something else. I used this one:
> grep 2.8.239.35 combined.log | awk 'BEGIN { FS = "\"" } ; {print $6}' | sort | uniq -c | sort -n
21 -
97 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
2834725 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36
It looks like someone was using software with a manually set User Agent, or maybe something like Selenium. Please note that the above one-liner can be a little tricky. If you don't get the results you expect, try adjusting the $6 in the awk call; the User Agent doesn't have to be in the sixth column of your log file. Note also that I used a different field separator in the awk call: a quotation mark instead of the default whitespace.
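If you are not sure which column holds the User Agent in your own logs, one way to find out is to print every quote-delimited field of a single log line together with its index. The sketch below does this for one made-up line in the combined log format; the line itself is an assumption for illustration, so your field numbers may differ if your LogFormat is customized.

```shell
# One made-up log line in the combined log format.
line='192.0.2.1 - - [31/May/2019:10:00:01 +0200] "GET / HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'

# Split on the double quote and print each field with its number.
fields=$(printf '%s\n' "$line" \
  | awk -F'"' '{for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i}')
echo "$fields"
```

With this sample line, field 2 is the request, field 4 is the referrer and field 6 is the User Agent. To inspect a real log, the same awk call can be fed with `head -n 1 combined.log` instead of the sample line.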