Simplifying Expense Categorization with AI and Python

The first step towards efficient personal expense management is understanding how your earned money is being utilized. One practice that I find very helpful is categorizing expenses. This means organizing expenses into different categories such as house rent, food, transportation, clothing, entertainment, medical expenses, and so on.

In this article, I will introduce you to a method for optimizing expense categorization using AI, specifically ChatGPT, and Python. Starting from your bank statement, we will explore how to generate a comprehensive list of your expenses and consult the AI to assign them a category from a set of predefined categories.

Prerequisites

To build our solution, we will need to invoke the OpenAI APIs in our Python script. I recommend following OpenAI’s technical documentation, which provides detailed instructions on how to do this. The main steps are as follows:

Register on OpenAPI.
Request the API Key for authentication.
Follow the instructions on the site that describe how to invoke the API.

Alternatively, you can also use the Azure OpenAI service. The experiments for this blog were conducted using the ChatGPT 3.5 turbo model.

The logic behind

The logic implemented in Python for automatic expense categorization consists of the following steps:

Reading expense transactions from various sources (e.g., bank account transactions, credit card statements). Depending on the file formats you have, the functions that perform data reading must be modified accordingly.
Creating a single DataFrame that contains all the expenses from different sources.
For each expense making an API call to OpenAI by sending the expense descriptions to obtain the correct category. Defining a suitable prompt is crucial to achieve the desired result. We will address this in the next section.
Assigning the suggested categories to the corresponding rows in the DataFrame and exporting the result to an Excel file.

In this blog, we won’t go into technical details or provide the Python code, as I have made the code freely available on my GitHub, with three sample input files.

The prompt

The main input to provide to the API is the prompt, which allows us to give specific instructions and requests to ChatGPT. Each prompt can consist of multiple parts, each associated with a role, to define the behavior and response of the model. The roles we will use, and that are supported by ChatGPT, are:

System: to set the assistant’s behavior and provide specific instructions on how it should act during the conversation.
User: for requests or comments that the model should respond to.

For the system prompt, here is the default message that I have used:

I would like to classify my expenses using specific categories. In input you will have the list of categories and the description of an expense. Please associate a category to the expense. Please only respond with the exact name of the category listed and nothing else. Categories:
{{categories_here}}

From the code, it is possible to customize the prompt and set the preferred categories. The default categories will be as follows: House rent, Supermarket, Internet home, Mobile phone, Gas, Electricity, Bank charges (card, taxes), Online services, Restaurants, Delivery, Aperitifs/bars, Shopping for Home, Clothes, Health, Courses, Technology, Transportation (plane, bus, subway, car).

Regarding the user prompt, it will contain the expense description read from the specified source files. The goal is to obtain a category suggestion associated with the expense from the API. During this phase, the precision and accuracy of the expense description are crucial, as well as the presence of references to stores or entities that the ChatGPT model is capable of recognizing. The description associated with bank transactions can be a source of error, as we have limited control over it.

Furthermore, to improve ChatGPT’s responses, it is possible to enhance the prompt with more details. For example, if there are recurring expenses that the AI doesn’t recognize, you can include the specific description of that expense in the prompt and explicitly ask to associate it with a specific category. This way, even in future analyses, ChatGPT will be able to correctly recognize that particular expense and assign it the desired category.

Results

Below is the Excel generated by the Python script, showing the expense category assigned for each expense. In this simple test case, the categories suggested by ChatGPT in the column label are quite accurate.

In general, it is advisable to use tools like this or similar ones as a resource to obtain suggestions from the AI assistant. However, it is always important to perform a thorough verification of the data generated by the AI.

Conclusion

In this article, we have explored how ChatGPT can be used to assist in categorizing our personal expenses. During active usage of this script, I have observed that utilizing ChatGPT has led to a significant reduction in expense categorization time, with a decrease of 60-70% compared to manual efforts. This demonstrates the potential advantage of using artificial intelligence as support for processes that can be repetitive, such as personal expense categorization. Despite the assistance provided by AI, human supervision and review remain crucial to ensure reliable results.

Prerequisites

The logic behind

The prompt

Results

Conclusion

By Mario Longo

Leave a Reply Cancel reply