Apache logs – useful one-liners

From time to time I need to analyze Apache access logs quickly. In most cases, website owners notice some strange activity or an unusual traffic peak, and we want to check whether it is related to a particular client or a particular day.

For instance, one of our clients (an image stock website) noticed an unusually high number of visits across the whole site at the beginning of the current month. The bash one-liners below helped us pinpoint the source of the traffic.

Count traffic by day

To make our work easier, I combined the access logs from the past five days into a single file, combined.log. The first thing I checked was the number of requests per day:

> awk '{print $4}' combined.log | cut -d: -f1 | sort | uniq -c | sort -n

   67164 [02/Jun/2019
  402913 [29/May/2019
  484095 [30/May/2019
 1710491 [01/Jun/2019
 1936686 [31/May/2019

As you can see, the number of requests on May 31st and June 1st was roughly four times higher than on the previous days.
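The post does not say how the five days of logs were combined, so the following is only a sketch under stated assumptions: it assumes rotated files with hypothetical names such as access.log.1 and access.log.2.gz, and GNU gzip, whose `-dcf` options decompress `.gz` files and pass plain-text files through unchanged. The second function is the same per-day count as above, with `substr` dropping the leading `[` from the date.

```shell
# Concatenate plain and gzip-compressed logs to stdout.
# With -d -c -f, GNU gzip decompresses .gz input and copies plain input unchanged.
# File order does not matter for the counts, because the output is sorted later.
combine_logs() {
  gzip -dcf -- "$@"
}

# Count requests per day in a combined-format log.
# Field $4 looks like "[29/May/2019:10:15:32", so characters 2-12
# hold the date without the leading bracket.
requests_per_day() {
  awk '{ print substr($4, 2, 11) }' "$1" | sort | uniq -c | sort -n
}

# Usage (file names are hypothetical):
#   combine_logs access.log access.log.1 access.log.2.gz > combined.log
#   requests_per_day combined.log
```

Sorting by count puts the busiest days at the bottom; drop the final `sort -n` if you would rather see the days in the order `sort` produces them.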

Count traffic by IP

Next, I wanted to check whether the traffic came from a single source or from many. To do this, I used a second one-liner:

> awk '{print $1}' combined.log | sort | uniq -c | sort -n | tail
# IP addresses are not real

The output showed that the majority of the traffic came from a single IP address. I looked up its location using https://www.iplocation.net/ and found that the traffic came from Chile, which is not a usual source of traffic for this website.
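To put a number on "the majority of the traffic", a single awk pass can print each IP's share of all requests next to its count. This is a sketch, assuming the client IP is the first field (`%h`), as in the standard combined log format; the function name is hypothetical.

```shell
# Show the ten busiest client IPs with their share of all requests.
# Assumes the client IP is the first whitespace-separated field.
top_ips_share() {
  awk '{ hits[$1]++; total++ }
       END {
         for (ip in hits)
           printf "%d %.1f%% %s\n", hits[ip], 100 * hits[ip] / total, ip
       }' "$1" | sort -n | tail
}

# Usage:
#   top_ips_share combined.log
```

Each output line has the form `count share ip`, so the last line is the busiest client.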

Count by User Agent

As the last step, I wanted to check whether this crawling IP used a particular User Agent string, to find out whether it was some kind of bot or something else. I used this one:

# 203.0.113.10 is a placeholder: use the IP found in the previous step
> grep '^203\.0\.113\.10 ' combined.log | awk 'BEGIN { FS = "\"" } ; {print $6}' | sort | uniq -c | sort -n
       21 -
       97 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
  2834725 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36

It looks like someone was running software with a manually set User Agent string, or perhaps a browser automation tool such as Selenium. The Chrome 64 version in that string was already more than a year old in mid-2019, which also suggests it was hard-coded. Note that this one-liner can be a little tricky. Instead of the default whitespace, I set awk's field separator (FS) to the double quotation mark, so each quoted part of the log line (request, referer, User Agent) becomes a separate field. In the standard combined log format the User Agent ends up in field $6, but if you don't get the results you expect, adjust $6 in the awk call: depending on your LogFormat, the User Agent may be in a different field.
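The grep and awk steps can also be merged into a single awk pass that matches the IP exactly instead of by regular expression. This is a sketch, assuming the combined log format (so the User Agent is field $6 when FS is a double quote); the function name and the IP in the usage line are hypothetical.

```shell
# Count User Agents for a single client IP in one awk pass.
# With FS set to a double quote, $1 holds "IP - - [date] " and,
# in the combined log format, $6 holds the User Agent.
user_agents_for_ip() {
  local ip="$1" log="$2"
  awk -F'"' -v ip="$ip" '
    {
      split($1, prefix, " ")         # prefix[1] is the client IP
      if (prefix[1] == ip) print $6  # $6 is the User-Agent header
    }' "$log" | sort | uniq -c | sort -n
}

# Usage (203.0.113.10 is a placeholder IP):
#   user_agents_for_ip 203.0.113.10 combined.log
```

Comparing the first field exactly avoids false matches, such as 203.0.113.1 matching inside 203.0.113.10, or the IP appearing in a requested URL or referer.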