CSV, PANDAS and Python

We will analyze the data for the historical prices of SEB investment funds using powerful Python programming language modules – CSV and PANDAS.

The question may arise why should I use SEB Funds for the examples instead of stock price data which are standard source data for these kind of things? The answer is simple – I have savings account linked to these funds and I alone have duty of setting up the allocation.

In this article, we will take the first steps in deciding on which funds we want to see in our portfolio and which we will avoid.

When working with large-scale data tables that contain countless numerical values have to be stored somewhere. Of course, data can also be stored in a database, which would be the right choice if the data is to be regularly updated and stored for a long time, but this time we will look at the CSV format data, which will be our raw material format and cover the databases when we will continue this topic.

If you are more interested in storing data in a database, don’t worry, at the very end I will show you how to save the data into a database using sqlite3 module.

When I started this article, I was planning to create a Python code that will automatically take data from the SEB website for the proposed investment funds, but then this article would be just about it, not the work with CSV files (which is a much simpler topic) and their analysis. We will get back to this topic when it comes to obtaining data from websites.

What is CSV?

CSV or comma-separated values is a file format that is basically the same as widely known TXT files. The main difference is that in CSV all entries are separated by a comma, semicolon or any other separator by authors choice actually, depending on the regional settings. The world standard is a comma and decimal sign is dot “.”.

Regardless of separator selection, it is the same throughout the file and the structure of any CSV file is a table. With or without headings. In our example we will use CSV files with the historical prices of SEB investment funds.

Python library for CSV

As I mentioned earlier, unfortunately, I will not be able to briefly show how to get these historical data automatically, so I will have to download them manually. The time it takes depends, of course, on how many funds you will want to look at. I did not accept all, but 13 funds got together. You will do something.

I’m not really into the manual job, so I saved all CSV files according to their ISIN codes available on the homepage. It looks like this:

It would be helpful to get the list of these files in the Python code. I will not go to this step, but I would add that here we will get an ISIN code list without the .csv suffix. If you want to keep them csv, instead of simply add it to isin.append (i [: 12]) isin.append (i):

Now load one of the CSV files and look or get things far:

If you see something like that, then great!

CSV functions

  • csv.reader
  • csv.writer
  • csv.register_dialect
  • csv.unregister_dialect
  • csv.get_dialect
  • csv.list_dialects
  • csv.field_size_limit

In this article we will discuss only first two though.

Working with CSV files

Soon, we’ve already seen an excellent example of csv.reader. It’s just that simple. In my code above, the function csvFile.close () was superfluous, since using With open () should not be a problem when the file remains open, but keep in mind that files should be closed after their use. Working with With open () is the safest way, because even if the code gets stuck somewhere, the file will be closed.

The next step is to create a new CSV file with the csv.writer, but what do you write here with csv.writer (‘File’, ‘w’) writerows (‘Hi, the world!’) If we can do something more complicated right away, but more interesting?

Let’s do data aggregation

Things that we need to do:

  1. Create a new list (eg data = []);
  2. Collect data from all CSV files with ISIN codes in the title;
  3. Save data to a new CSV file.
  4. Get acquainted with the PANDAS module 🙂

So the first step is pretty simple, but for the second one, let’s just say:

I hope that I will not miss you for a cycle or with open, because everything seems to be simplifying. Things or data become completely abusive when we’re up to formatting. Also, now it might be easier to put ISIN codes in the column names, but it seems to me, then I would not only lose you, but would even spoil yourself: D

Here we read the string of csv data with str (row [0]) [: 10] and str (row [0]) [11: -1] for two reasons:

  1. We do not want the list in the list, so we read the contents of each respective list;
  2. In order to successfully continue, we need two entries in place of a single entry with a date and historical price: date and value.

Saving aggregated data to CSV file

When we are already so far, then the great stumbling block with the CSV module is coming to an end. You just have to save. In this case, I would like to add names, so let’s write one line before we start printing all other data:

Whew, finally, the boring part of the deal. Getting started with PANDAS or pd is the right thing to do, as is the case with shortening the code.

CSV and PANDAS

Let’s start by eliminating the shortcomings in our previous table by turning ISIN codes against dates and converting dates into dates:

Changing the date type data will have a slight effect on the speed of the code execution, but this is a necessary step to continue. If you want to remove it print (csvtab.dtypes). it’s useful at the start to make sure the data types in the table are correct.

Analysis of available data

The most simple analysis of all of our data is possible with a very simple command: table.describe (). As we rotate ISIN codes for column names, we will have statistics for each fund. Here’s a bounce from the results:

You may need to add, but remember that after any data manipulation, take some of the results and check the correctness of the results. I’m not saying that we have made mistakes here, but there is a negative experience working with other widely used, user-friendly systems. 🙂

Means, medians, standard deviations and numPy

You may need to add, but remember that after any data manipulation, take some of the results and check the correctness of the results. I’m not saying that we have made mistakes here, but there is a negative experience working with other widely used, user-friendly systems. :)…

Let’s examine how our result in the table of the stomach looks:

It does not look bad, but if we print this to a csv file and would like to continue processing and analyzing data with Excel, or by simply checking our Python result in another environment, it would be difficult for us to combine ISIN numbers with dates. It might be more convenient to blame ISIN codes for column names. You can easily do this with the .unstack () command:

Now the result is already practically 4 tables (amounts, standard deviations, average and median prices) that are logically ordered and in the foreseeable order (by dates).

Saving the data to MS Excel, CSV or SQLite database

Acknowledging I was surprisingly challenging to export to MS Excel. Perhaps this is due to the fact that at present my computer only has Linux Mint. But the good news is that I did manage it and we will look at how to save the options in both the MS Excel file and the two csv files.

MS Excel:

The print team, of course, is not obligatory, but I wanted to see a confirmation that at last everything succeeded. 🙂

CSV

Exporting to txt and csv files has always been the easiest solution:

SQLite

Since last week I wrote about the SQLite Python module, here is an example of how to export data to a SQLite database that would also be suitable for such a project:

This time we will stop here, but we will definitely return to Pandas and data analysis, as well as these investment funds.

If You like this post and would love to see more of this content, please spare a coin for further development of it

[wpedon id=”473″ align=”center”]