Excel report from the database – the quick way in Python
If you are thinking of a quick way to prepare an Excel report from the data you have in your database, you can use Python. In order to achieve this, you will also need a few things:
- Pandas – data analysis and manipulation library (I will not use full potential of this one here, but it is handy to manipulate data)
- openpyxl – a Python library to read and write Excel files
- database connector (in my case – MySQL connector)
- database itself (in my example I used test MySQL database)
I will not perform any data manipulation outside the database in this example, so I placed all the “data preparation logic” in the query:
SELECT emp_no, first_name AS "First Name", last_name AS "Last Name", birth_date AS "Birth Date", hire_date AS "Hire Date" FROM employees ORDER BY hire_date DESC
Now, the main part of the solution:
import pandas as pd import mysql.connector from openpyxl import Workbook from openpyxl.utils.dataframe import dataframe_to_rows def run(): connect() employee_data = get_employee_data() employee_df = pd.DataFrame.from_dict(employee_data) disconnect() wb = Workbook() ws = wb.active ws.title = "Employee Report" for row in dataframe_to_rows(employee_df, index=False, header=True): ws.append(row) wb.save("report.xlsx")
As you can see, it is only a few lines. I connect to the database and pull the employee data. Next, I use Pandas to convert the dictionary returned from the database (I will get to this in a minute) to the Pandas Dataframe. This step is needed because I want to use the “dataframe_to_rows” utility function that is part of openpyxl.
Once I have my data in the dataframe, I create Workbook, select an active Worksheet, and set the title of this Worksheet. Once this preparation is finished, I simply iterate through dataframe rows using “dataframe_to_rows” and append them to the Worksheet.
The last step is to save the file on the drive. Here is how it looks like once generated:
There is no styling – this is a completely different topic 🙂 but you can see that the header contains column names like “First Name”, “Last Name” and so on. This is what I placed in the query itself. The result is returned as dictionary, my data gathering function looks like this:
def get_employee_data(): query = """ SELECT emp_no, first_name AS "First Name", last_name AS "Last Name", birth_date AS "Birth Date", hire_date AS "Hire Date" FROM employees ORDER BY hire_date DESC LIMIT 0, 100 """ cursor = connection.cursor(dictionary=True) cursor.execute(query) result = cursor.fetchall() cursor.close() return result
That is basicly it. In a few steps you are able to obtain a proper Excel file ready for further adjustments.
The complete example code:
import pandas as pd import mysql.connector from openpyxl import Workbook from openpyxl.utils.dataframe import dataframe_to_rows connection = None def run(): connect() employee_data = get_employee_data() employee_df = pd.DataFrame.from_dict(employee_data) disconnect() wb = Workbook() ws = wb.active ws.title = "Employee Report" for row in dataframe_to_rows(employee_df, index=False, header=True): ws.append(row) wb.save("report.xlsx") def get_employee_data(): query = """ SELECT emp_no, first_name AS "First Name", last_name AS "Last Name", birth_date AS "Birth Date", hire_date AS "Hire Date" FROM employees ORDER BY hire_date DESC LIMIT 0, 100 """ cursor = connection.cursor(dictionary=True) cursor.execute(query) result = cursor.fetchall() cursor.close() return result def disconnect(): connection.close() def connect(): global connection connection = mysql.connector.connect( user="DB_USER", password="DB_PASS", host="DB_HOST", database="DB_NAME", )