Excel report from the database – the quick way in Python

If you are thinking of a quick way to prepare an Excel report from the data you have in your database, you can use Python. In order to achieve this, you will also need a few things:

  • Pandas – data analysis and manipulation library (I will not use full potential of this one here, but it is handy to manipulate data)
  • openpyxl – a Python library to read and write Excel files
  • database connector (in my case – MySQL connector)
  • database itself (in my example I used test MySQL database)

I will not perform any data manipulation outside the database in this example, so I placed all the “data preparation logic” in the query:

SELECT emp_no, first_name AS "First Name", last_name AS "Last Name", 
    birth_date AS "Birth Date", hire_date AS "Hire Date"
FROM employees
ORDER BY hire_date DESC

Now, the main part of the solution:

import pandas as pd
import mysql.connector
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def run():
    connect()
    employee_data = get_employee_data()
    employee_df = pd.DataFrame.from_dict(employee_data)
    disconnect()

    wb = Workbook()
    ws = wb.active
    ws.title = "Employee Report"
    for row in dataframe_to_rows(employee_df, index=False, header=True):
        ws.append(row)

    wb.save("report.xlsx")

As you can see, it is only a few lines. I connect to the database and pull the employee data. Next, I use Pandas to convert the dictionary returned from the database (I will get to this in a minute) to the Pandas Dataframe. This step is needed because I want to use the “dataframe_to_rows” utility function that is part of openpyxl.

Once I have my data in the dataframe, I create Workbook, select an active Worksheet, and set the title of this Worksheet. Once this preparation is finished, I simply iterate through dataframe rows using “dataframe_to_rows” and append them to the Worksheet.

The last step is to save the file on the drive. Here is how it looks like once generated:

There is no styling – this is a completely different topic 🙂 but you can see that the header contains column names like “First Name”, “Last Name” and so on. This is what I placed in the query itself. The result is returned as dictionary, my data gathering function looks like this:

def get_employee_data():
    query = """
        SELECT emp_no, first_name AS "First Name", last_name AS "Last Name", 
            birth_date AS "Birth Date", hire_date AS "Hire Date"
        FROM employees
        ORDER BY hire_date DESC
        LIMIT 0, 100
    """
    cursor = connection.cursor(dictionary=True)
    cursor.execute(query)
    result = cursor.fetchall()
    cursor.close()
    return result

That is basicly it. In a few steps you are able to obtain a proper Excel file ready for further adjustments.

The complete example code:

import pandas as pd
import mysql.connector
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

connection = None


def run():
    connect()
    employee_data = get_employee_data()
    employee_df = pd.DataFrame.from_dict(employee_data)
    disconnect()

    wb = Workbook()
    ws = wb.active
    ws.title = "Employee Report"
    for row in dataframe_to_rows(employee_df, index=False, header=True):
        ws.append(row)

    wb.save("report.xlsx")


def get_employee_data():
    query = """
        SELECT emp_no, first_name AS "First Name", last_name AS "Last Name", 
            birth_date AS "Birth Date", hire_date AS "Hire Date"
        FROM employees
        ORDER BY hire_date DESC
        LIMIT 0, 100
    """
    cursor = connection.cursor(dictionary=True)
    cursor.execute(query)
    result = cursor.fetchall()
    cursor.close()
    return result


def disconnect():
    connection.close()


def connect():
    global connection
    connection = mysql.connector.connect(
        user="DB_USER",
        password="DB_PASS",
        host="DB_HOST",
        database="DB_NAME",
    )