Log merging and analysis using Python

From time to time I need to manually review logs gathered over a longer time period. I have to merge them, filter out irrelevant information and review the most common errors. To perform these tasks I created the simple Python script described below.

Configuration and merging multiple files

At the beginning of my script I placed a set of configuration variables. I want to merge files stored in the logs subdirectory relative to my script (please note the double backslashes – on Windows I have to escape them properly), save them into a single file (“all-in-one.txt”) and keep only lines that contain a specific string and do not contain any string from the list of exclusions.

import os
import operator

# configuration
log_files = []
start_dir = os.getcwd() + "\\logs\\"
pattern   = ".log"
all_in_one_filename = start_dir + "all-in-one.txt"
log_error_string = "ERROR:"
# expected patterns
expected = ["W3SVC2"]
# excluded patterns
excluded = ["semrush","YandexBot","User Agent not allowed",
            " INFO: ","BLEXBot","AhrefsBot"]

# get all log files
for dirpath, dirs, files in os.walk(start_dir):
    for filename in [f for f in files if f.endswith(pattern)]:
        log_files.append(os.path.join(dirpath, filename))

# process file by file and save to all-in-one
with open(all_in_one_filename, "w", encoding='ISO-8859-1') as output_file:
    for log_file in log_files:
        print("processing: " + log_file)
        with open(log_file, "r", encoding='ISO-8859-1') as input_file:
            for line in input_file:
                expected_found = False
                excluded_found = False
                for pat in expected:
                    if pat in line:
                        expected_found = True
                for pat in excluded:
                    if pat in line:
                        excluded_found = True
                if expected_found and not excluded_found:
                    output_file.write(line)

print("files merged")

# [...]

As you can see, for the expected and excluded patterns I’m using a simple “pattern in text” test. If you need regular expressions, that is also possible. You may also notice the file encoding – it was added because some of my files contain specific characters that have to be processed properly.
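A regex-based variant of the same filter could look like this – a minimal sketch, where the patterns shown are illustrative examples, not taken from my actual logs:

```python
import re

# compile once, reuse for every line
expected_re = re.compile(r"W3SVC\d+")  # any IIS site id, not just W3SVC2
excluded_re = re.compile(r"semrush|AhrefsBot|BLEXBot| INFO: ")

def keep_line(line):
    # keep the line only when an expected pattern matches
    # and none of the excluded patterns do
    return bool(expected_re.search(line)) and not excluded_re.search(line)

print(keep_line("W3SVC2 GET /index ERROR: boom;"))  # True
print(keep_line("W3SVC2 hit by AhrefsBot"))         # False
```

Compiling the patterns outside the loop matters when you process millions of lines.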

Errors count

I need to count particular errors to see how important a given issue is. To achieve this, I used a dictionary – the keys hold the error text, the values the number of occurrences of a given error.

# [...]

# prepare dictionaries
errors_dict = {}
# retrieve error messages
with open(all_in_one_filename, "r", encoding='ISO-8859-1') as in_file:
    for line in in_file:
        bl_pos = line.find(log_error_string)
        if bl_pos >= 0:
            end_pos = line.find(";", bl_pos)
            if end_pos > 0:
                this_error = line[bl_pos:end_pos]
                if this_error in errors_dict:
                    errors_dict[this_error] += 1
                else:
                    errors_dict[this_error] = 1

# [...]

In the above code, I’m retrieving only the part of the line starting at the “ERROR:” string and ending at the “;” character. This lets me extract only the most important information from each line.
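On a sample line (the log entry below is made up for illustration), the extraction works like this:

```python
log_error_string = "ERROR:"

line = "2023-01-15 10:22:31 W3SVC2 GET /cart ERROR: file not found; referer=/home"
bl_pos = line.find(log_error_string)   # index where "ERROR:" starts
end_pos = line.find(";", bl_pos)       # first ";" after that point
this_error = line[bl_pos:end_pos]
print(this_error)  # -> ERROR: file not found
```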

Sorting and displaying

As the last task, I’m sorting the error dictionary and displaying it. The most common errors are displayed at the end, so I can see them at a glance, even when working in the command line.

# [...]
# sort dict
sorted_errors = sorted(errors_dict.items(), key=operator.itemgetter(1))

# display errors with count
print("Errors found")
for error_string, occurrences in sorted_errors:
    print("{}; {}".format(occurrences, error_string))

print("finished processing")
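The counting and sorting above can also be done with collections.Counter from the standard library, which replaces the manual dictionary bookkeeping – a sketch with made-up error strings:

```python
from collections import Counter

errors = Counter()
for this_error in ["ERROR: file not found", "ERROR: timeout",
                   "ERROR: file not found"]:
    errors[this_error] += 1   # missing keys default to 0

# most_common() sorts descending; reverse it so the most
# frequent errors are printed last, as in the script above
for error_string, occurrences in reversed(errors.most_common()):
    print("{}; {}".format(occurrences, error_string))
```

This keeps the rest of the script unchanged while removing the `if key in dict` branch entirely.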

This is an example of a simple Python solution that can be extended to your needs. In fact, I’m using various versions of this script to handle various logs.