Is P2P lending worth it?

Based on statistics from Mintos P2P lending platform.

If You browse the net for passive income ideas, sooner or later You will read that P2P lending is a great source of passive income, although usually with a note that it is pretty risky as well. In this article I will share the results of different investment simulations based on P2P statistics found on the Mintos home page.

The stats from Mintos

  • Average annual return: 11.89%
  • Share of defaulted loans: 7.69%

Impressive.

Simulation – Mintos case

I ran 1 million simulated investments, each with a 7.69% chance to default; if a loan doesn't go bust, it yields an 11.89% annual return. Even though I reran the simulation a couple of times (so several more millions of simulated investments), I got the same result (a minimal sketch of such a simulation follows the results below):

  • Investors see a default rate of 7.7%
  • Investors' annual return is a 4.3% loss.
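For reference, here is a minimal sketch of how such a simulation can be set up. The function name and the no-recovery assumption are mine; how you model recoveries and loan terms changes the outcome considerably, so don't expect it to reproduce the exact figures above.

    import random

    def simulate(n=1000000, default_rate=0.0769, annual_return=0.1189):
        # One-year loans of 1 unit each; a default loses the whole principal
        # (no recovery assumed), a surviving loan repays principal plus interest.
        total = 0.0
        for _ in range(n):
            if random.random() >= default_rate:   # the loan survived
                total += 1 + annual_return
        return total / n - 1                      # portfolio-level annual return

    print('Simulated annual return: {:.2%}'.format(simulate()))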

Simulation – a bit worse than Mintos

For this simulation I assumed that the return stays roughly the same, since most P2P lenders offer 10% – 12% returns, but the default rate is somewhat higher:

  • 11.8% of investments default
  • Investments bring a 13% loss

Simulation – Bondora's case

It is really difficult to get default rates from Bondora directly but I found some info here.

Based on that I set the default rate to 24% and the return rate to a whopping 20.93%.

With these risky-end stats we lose 32% annually, but let's check the safest option on Bondora: a 13.06% risk of default with returns averaging 25.8%:

  • Annual return: -3.56%

Conclusion

It looks like unless a P2P platform provides risk-free investment opportunities, it does not seem appealing, except when losing 4% per year is your most profitable option.

Additionally, there is the credit risk of the P2P company itself. These companies have not been around for long, and we don't know if they will live through the next financial crisis.

After playing around with my simulator I found that we can get into profit if we manage to lower the default risk to 4%. But there are platforms that offer safe investment opportunities where the only risk You take is the platform itself going bust. One such platform is Twino.eu (I can't run a simulation for it since they don't publish default rates for loans on their platform), but there must be others as well.

Just wanted to add one final conclusion: The high number of recommendations to invest in P2P lending can mean only one thing – P2P lending companies have some good affiliate rates. You can check the affiliate program whenever somebody or some article suggests You should invest in P2P lending. 

Stay safe! 🙂

Evaluating apartment market price with Machine Learning

I have taken a pause from blogging for quite some time because we (my fiancée and I) have been busy lately. Busy looking for our first home – an apartment which we could design as we please and later rent out once we can afford to buy a house. #DreamBig

If You have gone through the process of manually scanning listing boards hoping to find the “perfect one”, then You know it’s a nightmare. It does not matter what country or city You live in. It’s terrible. So I decided to build a tool that would assist me with this task. The web scraping project was started, and it quickly turned into a machine learning project once I understood that the data can be misleading and I am not competent enough to evaluate what the market price for a given apartment should be.

Since the web scraping part is pretty darn easy (though surprisingly long due to all the cleaning that has to be done), I will skip it and review the Machine Learning algorithm instead. Also, the script is pretty messy and has been running stable for only 15 days now. I just don’t want to clean it up for public viewing. Not this week.

As usual, I will give the full code at the end of article.

The beginning: Imports

So for this project I currently use two algorithms: Gradient Boosting and Random Forest. (Yes, I love that lazy random forest… 🙂 ) And as luck would have it, both are part of the Scikit-Learn ensemble package.

Other important mentions are pandas and the default function for the training set split (sklearn.cross_validation.train_test_split).
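For clarity, the import block could look roughly like this (sklearn.cross_validation is where train_test_split lived in pre-0.20 scikit-learn; newer versions moved it to sklearn.model_selection):

    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions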

The setup

As You might have suspected, the data set (even though it’s built “in-house”) is neither split into training and testing sets nor perfectly filtered and formatted. I did make sure it’s formatted as well as possible within a reasonable number of code lines, so we won’t obsess over formatting and pre-processing.

What we will have to do, however, is read the data into a dataframe from the CSV that my previous code exported, filter it down to the relevant columns and drop NaN values.
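A minimal sketch of that step; the file name and the column names are placeholders for whatever your scraper actually exported:

    df = pd.read_csv('apartments.csv')                        # exported by the scraper
    df = df[['Link', 'Street', 'Region', 'District', 'Project',
             'Rooms', 'Floor', 'MaxFloor', 'Size', 'Price']]  # keep relevant columns
    df = df.dropna()                                          # drop incomplete rows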

 

Nuts and bolts: The functions

So here comes the stuff that makes things more interesting – functions for calculating average prices on 3 different levels:

  • Street – Region – District – Project
  • Region – District – Project
  • Region – Project

In this data, Street is the first word of the street name, Region is the state or city in which the Street and District are located, and District is the district in which the apartment is located. By Project we mean the type of building in which the apartment is located. This is a mandatory field in Sell posts, so sellers should know what type their building is.

Important note: the data consists of information provided by sellers. And if it’s not obvious in the country you live in, I will point out that sellers are not always concerned about the correctness of the information in their posts. Sometimes mistakes are made by accident, sometimes intentionally. A frequent issue in this dataset is an incorrect deal type (Sell/Rent) or an incorrect Project.

With three functions we will try to get average prices excluding the apartment in question. So if we are reading data for the 15th apartment in the data set, we will start by excluding it from the data and only then calculate the mean price per square meter. If the algorithm fails to find the value we are looking for, we fall back on a larger aggregation: Street Price -> Local Price -> Project Price.
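Here is a sketch of the street-level function under the column names assumed above; the function names are mine, not the original script’s:

    def street_price(df, i):
        # Mean price per m2 of all listings sharing Street/Region/District/Project
        # with row i, excluding row i itself; None when there are no peers.
        row = df.loc[i]
        peers = df.drop(i)
        for col in ['Street', 'Region', 'District', 'Project']:
            peers = peers[peers[col] == row[col]]
        if len(peers) == 0:
            return None
        return (peers['Price'] / peers['Size']).mean()

    # local_price and project_price look the same, just with fewer filter columns,
    # which gives the fallback chain:
    #   value = street_price(df, i) or local_price(df, i) or project_price(df, i)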

We aim for two things with this:

  1. Letting the machine learning algorithm know that this specific flat is somewhat different from several others based on average values. Each city and street should have different average values, given that there is enough data for them.
  2. Providing a simple hint of what the price should be, since most if not all real estate valuation methods imply comparison with similar properties.

 

The loop

The big event is here: the loop through the dataframe to clean, reformat and add data to the existing dataset.

Basically, we will just make sure that we have integers for Rooms, Floor and Max Floor, and floats for prices and apartment size. We will also add data using our previous 3 functions, but first we will check that the link is a string and the floors are not astronomical. 🙂

Once we have gathered everything in a list of lists, we will build another dataframe named xData, which will be the actual source for our machine learning practice.

I had some bizarre errors along the way while writing this short program, so I added a few more rows to it just to check what’s left of my initial dataset and whether the data types are correct.
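A sketch of what that loop can look like, with my own sanity thresholds (e.g. the 50-floor cap) standing in for the original checks:

    rows = []
    for i in df.index:
        row = df.loc[i]
        # skip junk rows: the link must be a string and the floors must be sane
        if not isinstance(row['Link'], str) or int(row['MaxFloor']) > 50:
            continue
        rows.append([int(row['Rooms']), int(row['Floor']), int(row['MaxFloor']),
                     float(row['Size']), float(row['Price']),
                     street_price(df, i)])   # plus the fallback functions above

    xData = pd.DataFrame(rows, columns=['Rooms', 'Floor', 'MaxFloor',
                                        'Size', 'Price', 'StreetPrice'])
    print(len(xData))       # what's left of the initial dataset
    print(xData.dtypes)     # are the data types correct?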

Finally: The ML part

So we finally got to the fun part – training our machine learning models and testing them. Actually, this part is so straightforward I could have copied it from an example shown on http://scikit-learn.org (but I didn’t). Still, I won’t go into too much detail here, as You have seen this code dozens of times if You are interested in ML.

I would recommend playing around with the features I used here until You get the best results for yourself. From what I noticed, changing to other features does not positively affect the resulting model scores.

Tip: I observed that on my data a Gradient Boosting model with max_depth equal to the number of features provides the highest scores!
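Put together, the training and testing part could look like this sketch (the feature names follow the columns assumed earlier):

    features = ['Rooms', 'Floor', 'MaxFloor', 'Size', 'StreetPrice']
    xTrain, xTest, yTrain, yTest = train_test_split(
        xData[features], xData['Price'], test_size=0.25)

    gbr = GradientBoostingRegressor(max_depth=len(features))  # the tip from above
    rfr = RandomForestRegressor()
    for model in (gbr, rfr):
        model.fit(xTrain, yTrain)
        print(type(model).__name__, model.score(xTest, yTest))  # R^2 on test data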

Conclusion

Despite the simplicity of this model and the limited observation count, it provides surprisingly high accuracy – missing the price by more than 30% only around 10% of the time. To my mind that is quite low, taking into account that some flats are in extremely bad shape (so the prediction represents the potential price after renovation) and some are extremely luxurious (in which case it’s not for You; otherwise You wouldn’t be looking at the actual market value of the property).

Accuracy (% of total)

As always, there are things to be improved here, but overall I am satisfied with the result, since the outcome gets more precise with each day of new data. To improve this algorithm I would need to bring dates and the rent/sell field into play, as well as treat the post text somehow, possibly with a bag-of-words method. But I am not sure I will be doing any of that, since I have found the apartment we were looking for and bought it. 🙂

Full code as promised:

Visualizing portfolio historical performance with Python

Hey there!

In this article we are going to check out how to examine and visualize the historical performance of a portfolio. This can be any portfolio, but since we scraped prices for SEB funds in our previous article (Web scraping investment fund prices with Python), we will use those prices here as well. If You haven’t read that article, it would be good to at least check out the full code used there, as we will import it and use one variable from it. By doing so we will also make the whole thing run and update the prices for us here.

The setup

This is how we will do this:

  1. Define functions for gathering historical prices for our portfolio allocation
    • Gather prices per chosen funds
    • Calculate total portfolio price based on weights in allocation
  2. Define a function that gets all the ISIN codes of our funds and runs the price update
  3. Put together our portfolio returns and the SEB fund average returns for performance comparison

Pretty easy but I will still guide You through it. No worries about that.

The walk-through

The imports

We will need 3 imports, plus an optional one:

  • pandas – for data cleaning and manipulation.
  • matplotlib.pyplot for data visualization.
  • datetime to get today’s date.

Optional:

We could just read the MS Excel file instead, but I wanted this file to update the previous one. I know this could be done in a simpler way, but whatever. What’s done is done.

We could also say that matplotlib.pyplot is optional, since it is possible to get charts from pandas DataFrames as well, but whenever I run the code from the terminal I only see the matplotlib charts and not the pandas ones, so I would still trust the main chart to matplotlib.
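So the import block boils down to something like this:

    import pandas as pd
    import matplotlib.pyplot as plt
    from datetime import date

    today = date.today()    # today's date, for stamping the output (assumed use)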

Function for loading prices from MS Excel

This function will read the Excel file we previously saved and retrieve an aggregated portfolio price based on the given allocation and the prices from sheet ‘DATA’. It requires ‘allocation’ as a pandas DataFrame with fund ISIN codes and allocation weights for our current portfolio.

In order to get that nice (and single) list of prices we will need to go through these steps:

1. Create a list of column names (ISIN codes) for allocation

Initially I believed I could just go with allocation.index as a list for filtering, but something in pandas proved me wrong. The issue was that ‘allocation‘ had ISIN codes that were not saved in the Excel file. Reminder: we dropped funds that didn’t have at least 1000 prices/days when gathering prices.

Why did we include ISINs in the allocation for which we don’t have prices? It happens when we want to take the average fund performance by giving each ISIN equal weight. This shouldn’t be a problem if You use the same ISIN codes that we have for our portfolio.

Quick workaround – looping through all the ISIN codes saved in prices and then through allocation.index, putting the ISINs that matched into a separate list – filteredColumn.

2. Filter and adjust prices according to given allocation

This task calls for another loop!

But if You think about it, we could have done it all in the previous loop… All the bright ideas come to me too late. I am not willing to change this setup (since it works well).

Anyway. We will do the following:

  1. A basic double loop over the fund prices DataFrame index and the filteredColumn ISINs
  2. Appending to a temp list p for our portfolio price.
  3. Drop NA values everywhere to have only clean data

Here the parameter errors=’coerce’ means that if an error happens while converting DATE to a date, it delivers NA, which we drop on sight.

3. Group by date and deliver some stats

Lastly, we want to use these neat and clean price dates as the index, as they are unique (or at least should be), so we will use groupby, which assigns whatever we are grouping on as the index.

If for any reason there were duplicate dates, we would sum up the prices of our portfolio anyway and escape an error. But so far there haven’t been any duplicates.

And as a cherry on top (this is optional), we will get some statistics on our prices. We could just go with pd.DataFrame.describe(), but we don’t need all of that. We will settle for STD, mean return and Sharpe ratio:

It looks a bit ugly, but I believe that using an extra variable when it’s not needed just puts extra weight on memory, so I went with a raw portfolio.pct_change().dropna() instead of assigning it to a variable.

To be perfectly honest with You, I will drop these print() lines of code as soon as this article gets published, because I have some bigger plans for this in the near future and I would prefer it not to run non-obligatory calculations. Meanwhile, the current output is:

Before You continue, I just wanted to say that the .dropna() paired with pct_change() is not there to drop erroneous entries. It is for the first record, which is always empty when You use .pct_change().

That being said, here’s the full getPortPrices function:
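A condensed sketch of the function, following the three steps above (the sheet name ‘DATA’ comes from the text; the ‘WEIGHT’ column name and the workbook layout are my assumptions):

    def getPortPrices(allocation, excelFile='fundPrices.xlsx'):
        # Read the sheet saved by the previous article's code
        prices = pd.read_excel(excelFile, sheet_name='DATA')
        prices['DATE'] = pd.to_datetime(prices['DATE'], errors='coerce')

        # 1. ISINs present both in the saved prices and in the allocation
        filteredColumn = [c for c in prices.columns if c in allocation.index]

        # 2. weighted portfolio price, NAs dropped on sight
        portfolio = prices[['DATE']].copy()
        portfolio['PRICE'] = sum(prices[c] * allocation.loc[c, 'WEIGHT']
                                 for c in filteredColumn)
        portfolio = portfolio.dropna()

        # 3. group by date (sums away duplicate dates) and print the stats
        portfolio = portfolio.groupby('DATE').sum()
        r = portfolio['PRICE'].pct_change().dropna()
        print('STD:', r.std(), 'Mean:', r.mean(), 'Sharpe:', r.mean() / r.std())
        return portfolio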

Function for initiating SEBscraper and getting ISIN codes

This is a short one – it imports our previous project SEBscraper as seb and initiates the price update, from which we take the DataFrame funds. As we noticed in the previous function, You have to be careful with this one, because we didn’t take prices for all the funds we got from the overview page. We only took the ones with significant history (more than 1000 prices).

Putting everything together

Finally, all the prerequisites have been fulfilled and we can get our hands dirty with the data.

We will start with a print() to let ourselves know that we are starting the process, and initialize the price update. After the prices are updated and we have the latest list of fund ISIN codes that SEB offers, we will read the prices from the Excel file where they are saved.

We want to process them in two ways:

  1. Get our current portfolio allocation
  2. Get an average of SEB investment fund performance

Well, we will calculate the performance, but to get there we first assign equal weights to all funds. We can also add other allocations we might like. I have prepared a split into BONDS, EQUITY and MIX based on the fund names. Here’s a reminder of how it looks:

If You wanted to comment that my chosen allocation looks like I’m 60+, then I agree. Way too conservative, and I expect to see that in the charts as well.

Here’s where we start:
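A sketch of that glue code; how SEBscraper exposes its results (here seb.funds with an ‘ISIN’ column) and the totalPrices layout are my assumptions:

    print('Updating prices...')
    import SEBscraper as seb                      # runs the price update on import
    funds = seb.funds                             # latest fund list with ISIN codes

    # equal weights across all funds = the "average SEB fund" benchmark
    bench = pd.DataFrame({'WEIGHT': 1.0 / len(funds)}, index=funds['ISIN'])
    myAllocation = bench.head(3).copy()           # placeholder: your own funds here
    myAllocation['WEIGHT'] = 1.0 / len(myAllocation)

    totalPrices = pd.concat({'Portfolio': getPortPrices(myAllocation)['PRICE'],
                             'SEB average': getPortPrices(bench)['PRICE']}, axis=1)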

So now we have everything ready in DataFrame totalPrices. The code gives the following output:

As we can see, our bench (a.k.a. the SEB investment fund average) outperforms our allocation in returns and Sharpe ratio, while our allocation beats it in capital preservation. But since I am not ancient, I would prefer to take more risk for bigger winnings… more on choosing optimal allocations in the next article.

The plot

So we get to the fun part – data visualization. The first thing I want to stress here is that we are dealing only with dates for which we have prices. I would assume these are all working days. In other words, picking the last 360 records would not equal 1 year. In this code I have assumed that each month has an equal number of business days: 21. That means quarterly data is 63 days and annual is 250. This will be relevant real soon.

And by soon I meant right now. We will check the standard deviation (STD) for the past 250 days and also the price movements for the same period with a one-month moving average.
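In code, that could be sketched as follows (totalPrices being the DataFrame from above):

    MONTH, QUARTER, YEAR = 21, 63, 250            # business-day approximations

    lastYear = totalPrices.tail(YEAR)
    lastYear.pct_change().rolling(MONTH).std().plot(title='Rolling monthly STD')
    lastYear.rolling(MONTH).mean().plot(title='Prices, 1-month moving average')
    plt.show()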

For the STD, which represents risk, we see that our portfolio is 10x safer and more stable than the average:

But it has its trade-off in average gains:

At some points we would of course enjoy having the portfolio we have here (like in all those price drops below 0), but in general we can see that the average SEB fund enjoyed a higher rise in price than our allocation. We can also notice here that 250 days back does not equal one year; it’s more like 3/4 of a year. We can note this as further work that needs to be done. For comparison purposes it’s fine, as long as all date ranges are the same within a single comparison.

Let’s take a peek at our price movement frequency in a histogram:

Yeah… so it’s not all bad if we do not take into account inflation and such. I mean, it’s sort of mostly positive… right next to zero. The fact that price changes were mostly upwards is actually really good if You think about it. It just sucks that those upward movements are not significantly large.

Let’s move on and check the price movements for the last 63 dates, adding some extra stats as well:

We also get the following output:

Nothing to be proud of, since we have been in the red for the last ~3 months. This keeps sending us signals that there is a better allocation among these funds. If we assume we can choose among only these funds, we can be sure there is a more optimal allocation than the one we currently have. And this assumption is true for me, since I have life insurance linked to these funds and the privilege of setting the allocation (another reason why we will definitely discuss asset allocation optimization).

One last plot to view is a scatter plot of STD/returns, to see how well risk and return go hand in hand. Obviously we want the least risk for the most return. But let’s see if that is the case. From the previous metrics we can already anticipate low risk scores and not-so-good-looking returns:
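A sketch of that scatter plot:

    returns = totalPrices.pct_change().dropna()
    plt.scatter(returns.std(), returns.mean())
    for name in returns.columns:                  # label each point
        plt.annotate(name, (returns[name].std(), returns[name].mean()))
    plt.xlabel('Risk (STD of daily returns)')
    plt.ylabel('Average daily return')
    plt.show()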

You might guess which one is our portfolio in there. It’s a clear indicator that we need to reallocate the portfolio for better performance.

Saving the summary figures to MS Excel file

So now we can save our results in an Excel spreadsheet and close this topic, so we can move on to the optimization. 🙂

As You might have noticed, I added a few more periods to the summary, so it has quarterly, annual and 5-year figures. Or at least what we consider the equivalents of those time periods within this project.

Full code:

As always, here’s a full code for this project (go to web scraping fund prices article to get the code for SEBscraper):

Simulating outcomes of Black & Red roulette game series

This one might be a little off topic, but I was really surprised how many people have tried and got burned by this crazy casino strategy. I hope most people visiting my blog wouldn’t even attempt to outplay the casino, but in case You were wondering how to test your “strategy” for doing so, let’s look at an example Black & Red roulette game strategy.

The Black & Red strategy

The main principle is very simple – You bet on either Red or Black and stick with it until You win. To cover the losses You double your bet after each loss. That’s it. Simple as that, and that’s exactly what should make anybody doubt the strategy. Surely a strategy that beats the casino should be as difficult as rocket science. Let’s test it.

The code

Let’s calculate the average outcomes of this strategy.
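Here is a minimal sketch of such a simulation, assuming a European wheel (18/37 chance to win an even-money bet) and that a gambler stops once the next doubled bet can no longer be covered:

    import random

    def play(cash=1000, baseBet=5):
        # One gambler: bet on a colour, double after every loss, stop when
        # the next bet can no longer be covered.
        bet, rounds, maxBet = baseBet, 0, baseBet
        while cash >= bet:
            rounds += 1
            cash -= bet
            if random.random() < 18 / 37:
                cash += 2 * bet        # even-money win returns double the stake
                bet = baseBet          # restart the progression
            else:
                bet *= 2               # double after a loss
                maxBet = max(maxBet, bet)
        return rounds, maxBet

    results = [play() for _ in range(100000)]
    print('average rounds before going bust:',
          sum(r for r, _ in results) / len(results))
    print('average largest bet required:',
          sum(b for _, b in results) / len(results))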

The outcomes

In order to get more or less reliable data, let’s run 100k tests with a pool of 1k bucks, betting 5 as the initial bet each time. After some serious calculations I got these results:

After these runs I did another round of 100k simulations just to check whether the results are significantly different. This is what I got on my second trial:

To put the money pool movements graphically:

So let’s read the results:

  • If You are about to gamble 1k USD You should have at least 1375.45 USD (5 USD bet x 27509% bet increase) to save your ass if things go south.
  • Be mentally prepared to lose 963.95 USD (5 USD bet x 19279% lost bets) or more, since these are only averages.
  • You should quit while you’re not losing yet; going bust inevitably happens at around a thousand rounds.
  • I must admit that our standard deviation and winnings-by-color measurements look shady, so I wouldn’t draw conclusions from them.

One other thing I wanted to check – do the outcome numbers change if my cash pool changes? Meaning, if I have 10 grand instead of one, or only 100 bucks to gamble, but I am still a big chicken and bet 5 as my initial bet. As we saw from the previous results, the bet increases significantly anyway.

So let’s see this in more detail.

Having 10x the cash

This time we will gamble away 10 000 dollars. How fast can we lose 10k USD?

Let’s try one more time just to see if the result changes significantly:

So let’s read the results:

  • If You are about to gamble 10k USD You should have at least 12 189.70 USD (5 USD bet x 243794% bet increase) to save your ass if things go south.
  • Be mentally prepared to lose 7 724.70 USD (5 USD bet x 154494% lost bets) or more, since these are only averages.
  • You should quit while you’re not losing yet; going bust inevitably happens at around nine thousand rounds.

How long do 100 dollars last in roulette?

This one is more realistic, since I wouldn’t consider gambling more than 100 dollars when I have a 0% chance of getting ahead of the game.

And another round of simulation for comparison:

So let’s read the results:

  • If You are about to gamble 100 USD You should have at least 177.05 USD (5 USD bet x 3541% bet increase) to save your ass if things go south.
  • Be mentally prepared to lose 111.15 USD (5 USD bet x 2223% lost bets) or more, since these are only averages.
  • You should quit while you’re not losing yet; going bust is inevitable, and with this small pool it happens much sooner.

Conclusions

First of all, I must admit that this test has turned out to be a fail at this stage. More work is needed to get a statistically significant outcome, but since this wasn’t really a project I initially wanted to spend nearly as much time on as I already have, I will call it quits here and ask You to comment if You have any advice on how to make this simulation more accurate and memory efficient. Currently this model eats a lot of memory, and I am not 100% sure about the outcomes, as they suggest we can win money with this strategy, which is not true in reality.

That being said, here’s what we can conclude:

  • The amounts of winnings and losses have a strong positive correlation with the money being gambled – if we gamble more money, we get higher losses and winnings percentage-wise.
  • We can expect that at some point our bets will be higher than the amount of money we wanted to gamble. So we would need at least as much money to spare as we plan to gamble in order to win this game, and also be willing to go all in to come out clean.
  • The number of rounds it takes to lose all our money also grows with a bigger money pool, while a bigger initial bet decreases the number of rounds needed to lose it all.

Last notes

While these simulations show a theoretical chance of winning a significant amount of money, keep in mind that You have to be rather crazy and willing to go all in. You would also need strong discipline and a lot of extra cash for this to have a realistic probability of working.

Web scraping investment fund prices with Python

In this article we will look at a relatively short Python project that reads the names of all SEB investment funds, their ISIN codes and their historical prices. The work itself is easy and we could do it manually, but I personally got bored at the fourth fund and its prices. So let’s do it all automatically with the power of Python!

Why SEB investment funds?

I could think of only 2 reasons at the moment. In this case we will do this for educational purposes but in general we could do this if:

  • we want to invest (and of course analyze the time series data) but there is little to no money we can spare for investing. But we are SEB’s clients, and bank clients usually purchase the funds offered by the bank at a reduced commission and can also invest relatively insignificant amounts.
  • we want to practice web scraping on some basic pages. Believe me, in finance it’s hard to find web pages more basic than those of big old banks that have been around for a century. They never had any necessity to upgrade their web presence (up until now…).

When I started this article I was still working at SEB, and from those times I still have life insurance for which I have to determine the allocation of my money across their funds. In the future we can expect a continuation of this topic with analysis of the time series data, as SEB itself does not offer a way to look at the historical performance of your portfolio. This task is left to me (and You). If You have a similar problem with SEB or another bank, this series of articles might be not only educational but even useful.

The Code

For those who don’t care about my monologue around the code, here’s the complete code:

For those who want to view this code in separate parts and continue reading my monologue, I chose to split this fairly short code into 3 steps:

  • Acquisition of fund name and ISIN code
  • Reading each individual funds’ pages and finding prices
  • Save prices in DataFrame

Acquisition of fund name and ISIN code

For those who wonder what ISIN is:

The International Securities Identification Number (ISIN) is a code that uniquely identifies a specific securities issue. The organization that allocates ISINs in any particular country is the country’s respective National Numbering Agency (NNA).

Source: Investopedia

So it’s clear – an ISIN is a unique security reference made up of two letters (based on the country) and 10 more characters, a combination of 12 characters in total. In our case it also has added value – it is included in the HTML addresses of the SEB fund price pages. In other words, we will collect them because they are vital for our plan, but we will return to this in the next section on reading the individual fund pages.

How are SEB investment fund pages built/structured?

Open the SEB investment fund price page and right-click to select Inspect (or the equivalent in whatever web browser you are using). You will see the website’s design from the inside. I use Google Chrome, so in my case I right-click and choose Inspect. You should see something like this (on the right side):

I am not going to run an introduction-to-HTML course here, so I will describe it in my own words. Hopefully some would call it plain English.

Here you are looking for an element that starts with <table>, which obviously marks a table element of the page. Everything from <table> to </table> consists of rows, which in turn are marked as <tr>…</tr>, and inside the tr tags we expect to see column separators <td>…</td> containing the data. Try to expand these elements on your own and shortly You will see that we are interested in the second <td> element of each <tr> that contains information about SEB funds. The records we are looking for should look something like a <td> holding a link whose address contains the ISIN and whose text is the fund name, e.g. “SEB aktīvais plāns”. As you can see, such a record holds both the ISIN and the fund name. So you need to find every <td> entry containing these two elements, but first you must read the page with Python:

So here we use 2 modules: requests and BeautifulSoup from bs4. We will use requests to get the data into our Python session and save it to the variable fundList. Then we want to process the HTML object into a more readable format, so we take the HTML text from fundList and parse it with BeautifulSoup. Don’t forget to add .text or .content to your requests response variable when passing it to BeautifulSoup. I must admit that this part is hard to mess up.
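A minimal sketch of that step; the address below is a placeholder for the fund overview page you inspected:

    import requests
    from bs4 import BeautifulSoup

    fundList = requests.get('https://www.seb.lv/fund-prices')   # placeholder URL
    soup = BeautifulSoup(fundList.text, 'html.parser')

    rows = soup.find_all('tr')     # every table row on the page
    cells = soup.find_all('td')    # every table cell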

BeautifulSoup lets you use a function like .find_all(‘tag‘). This method selects all HTML elements/tags that match whatever tag we provide it with. The plain .find(‘keyword‘) also works, but it stops at the first element. So let’s write .find_all(‘tr’) and .find_all(‘td’), because these two elements are the only ones that interest us. We don’t need .find_all(‘table’), since every table is built with <tr> and <td> tags anyway.

The isin list should be defined early on, otherwise we will get an error as soon as we use the .append() method to add new fund data to the list.

Saving information to DataFrame

It is more convenient to work with 2D data if it’s stored in a DataFrame rather than in lists or arrays. DataFrames provide various built-in operations that we can perform on the given data tables:
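For example (isin and names being the lists filled by the loop above; the values shown are just samples):

    import pandas as pd

    isin = ['LV0000400000']                  # sample value
    names = ['SEB aktīvais plāns']           # sample value
    funds = pd.DataFrame({'ISIN': isin, 'NAME': names})
    print(funds.head())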

Reading each individual funds’ pages and finding prices

By clicking on a fund, or on the corresponding element in the section above, we see that the fund addresses have a simple structure. It is enough to substitute an ISIN code from our isin list (e.g. isin[1]) into the address and we get the HTML address of an individual fund.

Structure: Where to find prices?

Same as before, we need to find the part of the HTML code that interests us. It is quite fortunate that these pages are not designed according to the latest fashion with jQuery requests or Flash. Then it would be harder to scrape them… at least for me.

When looking at the code inside Python you can use the soup.prettify() method, which indents the text block to make it look more structured, but we can of course also inspect the page in the browser. If you quickly scroll through the code, You will realize that this time we are interested in a place that starts with:

a bunch of rows with the data in the middle, and it ends with this:

Once you locate the information, you need to define a starting point and an ending point. I would definitely not recommend slicing from position 1 to position 35 (random numbers, just to show an example). Maybe some readers could recommend a more efficient way of handling this, but I will use the .find() text method to locate the starting/ending points of each price per date.

So let’s start reading:

If You are reading this article to learn how to download information from websites with Python, I have to warn you that most of the time goes into cleaning up your data or searching for the right cleaning solution. I had to play around a bit until I got this part done right… or at least working. I wanted to find a fancier solution for this data cleaning issue, but at the end of the day I didn’t care how it looks as long as it works.

As you can see, I use the txt variable to temporarily store the piece of text that soup.find_all(‘script’) delivered. The main task around txt is to clean the text as much as possible so that we don’t have to do much cleaning later on. Here we use the simple search function to find start and end points based on what we saw in the HTML code. The main task is to find key words or signs that are unique to these price records.
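A sketch of that search loop; the start/end markers below are purely illustrative – use whatever unique strings you actually found around the price series when inspecting the page:

    txt = str(soup.find_all('script'))              # raw text holding the series
    txt = txt.replace('\\n', '').replace(' ', '')   # pre-clean before searching

    prices, pos = [], 0
    while True:
        start = txt.find('data:[', pos)             # illustrative start marker
        if start == -1:
            break
        end = txt.find(']', start)                  # illustrative end marker
        prices.append(txt[start + len('data:['):end])
        pos = end + 1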

How we cut our expenses by 30%

Are You struggling to save money for things You would love to have? Stuck living paycheck to paycheck? Are Your bills eating up all your income, and are You sick of hearing all those “get extra income” tips? Then You have come to the right article. I will share how we cut approximately 30% of our costs by slightly changing our routines and building new habits.

I believe everyone wants to spend less on casual/boring things like bills and household supplies in order to have more money for relaxation, better lifestyle, hobbies or just something to put aside in their savings account. Recently we managed to cut monthly spending by about a third. And I’ll tell you how to do it!

In short, I could name two actions:

  • Keep track of monthly expenses
  • Plan weekly meals.

If You are not the kind who enjoys reading and such, the shortcut could be to just buy ready-to-use (cheap) templates and go with them. Like these, for example:

Finance Planner, Printable kit

Printable Weekly Meal Planner

And there are plenty of others like these. You can even get a Household planner if You would like (though it’s not necessarily needed if You have both templates above). It’s a super easy concept, and for those who are still interested I will share a bit more about our experience and how we started.

Monthly costs

When we look at a budget from a cost-cutting perspective, we are most interested in the highest-cost items, our largest expenses. We want to know where the money disappears and whether all the spending is justified. So we began to explore bank account transactions and receipts. It’s not rocket science, but it’s important to understand that it’s a two-step process: analyse, then take action.

We can break it down into these smaller steps:

  1. Collect information
  2. Categorize the expenses
  3. Create cost overview
  4. Create a summary
  5. Make a decision and execute it

Let’s go through the whole process.

Collect information

I find it hard to gather cash expenses because going through a bunch of receipts is just so boring, so I try my best to pay with a debit card wherever possible. The downside is that it can be hard to categorize payments later on, but we have aligned our expense categories to the level of detail of our bank account transactions. In other words, we try to sort bank transactions by the recipient of the transfer. Of course, a portion of the costs cannot be covered by the debit card (rent, market purchases, etc.), but most of our expenses go through the debit card nowadays.

To make it easier to “work” with bank account transactions, enter the information from receipts into Excel together with the account transactions, and try to stick to a format that works for both. You will lose some detail from the receipts, but it will be easier to draw statistics out of it later on.

This raw data needs to be aggregated and split into categories in order to get the bigger picture. We try to categorize the expenses to see how they change month by month. It also helps to predict how much we should have in one or the other category at the beginning of the month.

Tip: Align your budget period to your income frequency.

Get paid weekly – do weekly expense planning. Get paid monthly – do monthly planning.

Of course it does not hurt to do a monthly budget even if You get paid every week, but ideally You need to have an expense plan the moment You receive money. It’s harder for people who don’t have regular payment dates, like the self-employed. If You are one of them, I would recommend sticking to a monthly plan. Actually, if You are self-employed You most likely know everything about creating a budget. 🙂

Categorize the expenses

You can choose the categories You need yourself, and a more personalized plan (one better fitting your needs) is a better plan. We have rather general categories that we will have to review at some point to make them more precise. The more general the categories, the easier it is to split the expenses, but the less insight they give… Today we are still using the following categories:

  • Food
  • Apartment expenses
  • Household supplies
  • Invoices
  • Entertainment
  • Car
  • Unnecessary expenses
  • Savings

The most important category of all is Unnecessary expenses. In it we put all the payments we could have lived without or didn’t really need. The longer we practice tracking our costs and staying mindful of what ends up in this category, the smaller it gets. Actually, we stopped spending money on totally stupid things pretty quickly once we started realizing how much we waste every month. Now we are trying to recognize the spending we didn’t actually need. It’s a much tougher process today compared to when we started, but we still try to do it.

The main purpose of this process is not to download a lot of bank account transactions and/or collect a bunch of papers in the form of receipts, but to estimate which expenses were really necessary.

As soon as You begin to realize what your expense weakness is, these expenses will decrease, because You will already know in the back of your mind that, if bought, they will land in the bad category (whatever You name it) of your expense list.

For me, the weakness was AliExpress purchases – I bought all kinds of unnecessary stuff (gadgets, tablets, watches, etc.) and most of it ended up in the garbage. We have bought a lot of toys from Ali for our cat, but we wouldn’t categorize that as useless spending. It’s for our cat… 🙂 And actually, pet toys from AliExpress are very good compared to local store goods, and the prices are a lot better, so nothing is black or white – we cannot say that buying from AliExpress automatically means Unnecessary expenses.

Create cost overview

Once all of the above is done, we need to put everything together and see how it looks in the total picture. We need to sum the expenses per category (You can do this in the previous step; that’s how we do it now as well).

Here is an example of what this monthly summary can look like:

Here it’s important to be honest with yourself and, first of all, evaluate whether the summary seems plausible to you (whether there is any mistake in sorting the costs by category). Also try to explain to yourself why certain figures are higher than others. A really good practice, after your first quarter of doing this expense analysis, is to keep an average per category and check the numbers that are way off the average. Usually there are large deviations from the average in February due to Valentine’s Day, and in months when You have to pay taxes, insurance, et cetera. Comment on your figures if needed, but don’t leave a large deviation unexplained.

We recommend including average figures not only to spot unusually high or low deviations, but also because they are useful when planning your funds for the next month. You might not believe me right now, but knowing how much You will need for each category really makes it easier to plan ahead. Now, on payday, I clearly see how much excess money we have based on our average monthly spending, while previously I had to guess or, even worse, assumed that I had a lot of money in my pocket.

Create a summary

Summarizing the above:

  • To minimize the biggest and most unnecessary costs, You must first be aware of them and define them precisely for yourself.
  • Once You clearly know where You spend a lot and where You don’t need to spend at all, You will be able to recognize these unnecessary expenses and refrain from them.

It’s not obligatory to make your own summary in Excel, but it’s important to know clearly where You are ready and willing to spend your hard-earned money.

Make a decision and execute it

It’s good that we have a clear overview of our expenses, but serious results require serious effort. If You just wanted to get rid of some unnecessary spending, this should be enough, and a weekly meal plan will give it an extra kick. But if You are in a serious debt situation, You might be willing to go a few steps further in cutting your expenses. What are the next steps? Well, You name it. There are so many, but one thing is common to them all – You need to make a decision to apply them and then execute that decision.

A good starting point is to sit down with your list of expenses and go through each item on it. Look at the things You didn’t mark as unnecessary expenses. Identify the things You want less than getting out of debt, or less than the savings goal that brought You to this article. Then think about the order in which You will get rid of those things, but be careful here. Try to find a way that’s least painful for You and your family members; otherwise You won’t get rid of them.

Weekly meal plan

If the former seemed elementary, then this should not be a problem at all. It’s just as simple as it sounds – plan when, where and what You will eat. You’ll save money on spontaneous decisions! We borrowed this idea from representatives of the Zero Waste movement – SeekTheSimple.

It goes hand in hand with what I wanted to highlight above – spending money only on what you choose to spend it on. Do not rely on impulses, because these impulses are often generated artificially and you’ll end up at McDonald’s ordering a big set with Big Tasty burgers…

This does not mean You cannot plan to eat outside the house, but if You still want to save some money, schedule both the meals at home and the ones outside.

Start when You and your spouse are fed. Sit at a table and compile a list of meals for the next week. Give yourselves the freedom to swap the days on which you will have them, but stick to the dishes you put on the list. I would recommend compiling a list of three things:

  1. Meals for the next 7 days (we usually plan from Sunday to Sunday)
  2. Required ingredients for each meal
  • Where to buy each ingredient

And note here: the more You can buy at the marketplace, the more You save compared to the months when You did not plan your meals.

For us, the meal plan is relatively simple – we have 7 meals (dinners) planned, breakfast is random (whatever we love, since it’s the most important meal of the day and we cannot skip it) and lunch is usually the dinner from the day before. We experiment with different recipes, including some I would call odd – from vegan and vegetarian recommendations. Sometimes it turns out really good, sometimes… well, not really. I would recommend You do that as well, not only for the new experiences: cutting back on meat also lets you prepare cheaper meals, which in turn helps your budget.

Marketplace / Shops

We buy a lot of produce at the nearest marketplace because we noticed that a lot of things are cheaper there. We do not feel a difference in quality, and it comes from local farmers (potentially healthier than supermarket produce). But if something is available at the marketplace, it does not automatically mean that the store sells a lower-quality alternative or that the marketplace is cheaper. No, prices and quality must be watched very carefully. If you’re not careful, you have a good chance of getting a bad deal.

Evaluate what You are ready to pay for, what You are ready to endure, and for how much. Each of us, of course, has his/her own view on things. We, for example, are not vegetarians, and for us Eco goods are not “must have” items. If we know that a product is not unhealthy, then it is Eco enough for us.

As a practical example, eggs – we know they can be bought from a farmer for 3 euros, laid by healthy hens, but we will nevertheless take eggs from the store and pay only 1.20 euros (less than half the price!). Eggs from the store are a lot less “Eco” than the ones sold by farmers, but we don’t value that highly enough to pay more than double.

I also recommend You find your favorite places to buy food, and I definitely recommend exploring the local market and what it offers.

Additional suggestions

If your goal is to save money, be sure to look for recipes with ingredients that last a long time (e.g. beans, buckwheat, rice) and plan at least 2 meals a week using these products. When You buy them, I recommend buying the biggest package You can afford and store at home. The earlier You fill up your storage room, the longer You have these dishes at a “discount”.

If You loved this article and would like to support the further development of this blog, please donate what You can afford.


CSV, PANDAS and Python

We will analyze the historical price data of SEB investment funds using powerful Python modules – CSV and PANDAS.

The question may arise: why use SEB funds for the examples instead of stock price data, the standard source for this kind of thing? The answer is simple – I have a savings account linked to these funds, and I alone have the duty of setting up the allocation.

In this article, we will take the first steps in deciding on which funds we want to see in our portfolio and which we will avoid.

Large-scale data tables containing countless numerical values have to be stored somewhere. Of course, data can be stored in a database, which would be the right choice if the data is to be regularly updated and stored for a long time, but this time we will look at CSV-format data, which will be our raw material format, and cover databases when we continue this topic.

If you are more interested in storing data in a database, don’t worry: at the very end I will show you how to save the data into a database using the sqlite3 module.

When I started this article, I was planning to create Python code that automatically takes data from the SEB website for the offered investment funds, but then this article would be just about that, not about working with CSV files (a much simpler topic) and analyzing them. We will get back to that topic when it comes to obtaining data from websites.

What is CSV?

CSV, or comma-separated values, is a file format that is basically the same as the widely known TXT file. The main difference is that in CSV all entries are separated by a comma, a semicolon or, actually, any other separator of the author’s choice, often depending on regional settings. The world standard is a comma as the separator, with a dot “.” as the decimal sign.

Whatever separator is chosen, it stays the same throughout the file, and the structure of any CSV file is a table, with or without headings. In our example we will use CSV files with the historical prices of SEB investment funds.

Python library for CSV

As I mentioned earlier, unfortunately I will not be able to briefly show how to get this historical data automatically, so I had to download it manually. The time it takes depends, of course, on how many funds You want to look at. I did not take all of them, but I ended up with 13 funds. That will do.

I’m not really into manual work, so I saved all the CSV files under their ISIN codes, available on the homepage. It looks like this:

It would be helpful to get the list of these files into the Python code. I will not dwell on this step, but I would add that here we get a list of ISIN codes without the .csv suffix. If you want to keep the .csv suffix, simply use isin.append(i) instead of isin.append(i[:12]):
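A sketch of that step, assuming the files sit in the working directory:

    import os

    isin = []
    for i in os.listdir('.'):                # folder with the downloaded files
        if i.endswith('.csv'):
            isin.append(i[:12])              # 12-character ISIN, '.csv' dropped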

Now let’s load one of the CSV files and see how far we get:
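Continuing from the isin list above, a minimal reader could look like this:

    import csv

    with open(isin[0] + '.csv', 'r') as csvFile:   # one of the ISIN-named files
        for row in csv.reader(csvFile):
            print(row)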

If you see something like that, then great!

CSV functions

  • csv.reader
  • csv.writer
  • csv.register_dialect
  • csv.unregister_dialect
  • csv.get_dialect
  • csv.list_dialects
  • csv.field_size_limit

In this article we will discuss only the first two, though.

Working with CSV files

We’ve already seen an excellent example of csv.reader above. It’s just that simple. In my code above, the csvFile.close() call was superfluous, since when using with open() there should be no problem with the file remaining open, but keep in mind that files should be closed after use. Working with with open() is the safest way, because even if the code gets stuck somewhere, the file will be closed.

The next step is to create a new CSV file with csv.writer. But why stop at a trivial “Hello, world!” example of csv.writer when we can do something more complicated, and more interesting, right away?

Let’s do data aggregation

Things that we need to do:

  1. Create a new list (eg data = []);
  2. Collect data from all CSV files with ISIN codes in the title;
  3. Save data to a new CSV file.
  4. Get acquainted with the PANDAS module 🙂

So the first step is pretty simple, but for the second one, let’s just say:

I hope I won’t lose you with the loop or the with open(), because everything still seems simple. Things (or rather the data) become completely unruly when we get to formatting. Also, it might be easier to put the ISIN codes into column names right now, but it seems to me that then I would not only lose you, but even spoil things for myself :D

Here we slice the CSV data string with str(row[0])[:10] and str(row[0])[11:-1] for two reasons (a sketch of the full loop follows this list):

  1. We do not want a list inside a list, so we read out the contents of each row’s list;
  2. To continue successfully, we need two entries in place of the single combined entry: a date and a value.
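The collection loop could then be sketched like this (continuing with the isin list and the slicing just described):

    data = []
    for code in isin:
        with open(code + '.csv', 'r') as f:
            for row in csv.reader(f):
                # each row arrives as one 'YYYY-MM-DD value' string,
                # hence the slicing into a separate date and value
                data.append([code, str(row[0])[:10], str(row[0])[11:-1]])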

Saving aggregated data to CSV file

Now that we’ve come this far, the great stumbling block of the CSV module is coming to an end. We just have to save. In this case I would like to add column names, so let’s write one line before we start writing all the other data:
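For example:

    with open('fundPrices.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['ISIN', 'DATE', 'PRICE'])   # the header line
        writer.writerows(data)                        # everything gathered above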

Whew, the boring part is finally done. Getting started with PANDAS, or pd, is the right move, not least because it shortens the code.

CSV and PANDAS

Let’s start by eliminating the shortcomings of our previous table: pivoting ISIN codes against dates and converting the date strings into actual dates:
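A sketch of those two steps with pandas (file and column names follow the snippets above):

    import pandas as pd

    csvtab = pd.read_csv('fundPrices.csv')
    csvtab['DATE'] = pd.to_datetime(csvtab['DATE'])               # strings -> dates
    csvtab = csvtab.pivot(index='DATE', columns='ISIN', values='PRICE')
    print(csvtab.dtypes)                                          # sanity check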

Converting to the date type will slightly affect the execution speed of the code, but it is a necessary step. You can always check with print(csvtab.dtypes); it’s useful at the start to make sure the data types in the table are correct.

Analysis of available data

The simplest analysis of all our data is possible with one very simple command: table.describe(). Since we pivoted the ISIN codes into column names, we get statistics for each fund. Here’s an excerpt from the results:

I may need to add: remember, after any data manipulation, to take a few of the results and check their correctness. I’m not saying we have made mistakes here, but I have had negative experiences with other widely used, user-friendly systems. 🙂

Means, medians, standard deviations and NumPy


Let’s examine how our result looks in stacked table form:

It does not look bad, but if we print this to a CSV file and want to continue processing and analyzing the data in Excel, or simply check our Python results in another environment, it would be difficult to combine the ISIN numbers with the dates. It is more convenient to turn the ISIN codes into column names. You can easily do this with the .unstack() command:
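A sketch of how those four statistics can be produced and unstacked; grouping by year is my assumption, the original may have used another period:

    longtab = pd.read_csv('fundPrices.csv')
    longtab['DATE'] = pd.to_datetime(longtab['DATE'])

    stats = longtab.groupby([longtab['DATE'].dt.year, 'ISIN'])['PRICE'] \
                   .agg(['count', 'std', 'mean', 'median'])
    wide = stats.unstack()           # ISIN codes become column names again
    print(wide)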

Now the result is practically 4 tables (counts, standard deviations, average and median prices) that are logically arranged and in a predictable order (by dates).

Saving the data to MS Excel, CSV or SQLite database

I admit, exporting to MS Excel was surprisingly challenging. Perhaps this is because my computer currently only runs Linux Mint. But the good news is that I did manage it, and we will look at how to save the results both to an MS Excel file and to two CSV files.

MS Excel:
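A sketch, reusing wide and csvtab from above (pandas needs openpyxl or xlsxwriter installed for this):

    with pd.ExcelWriter('fundStats.xlsx') as writer:
        wide.to_excel(writer, sheet_name='STATS')
        csvtab.to_excel(writer, sheet_name='PRICES')
    print('Export done!')                            # the confirmation print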

The print command, of course, is not obligatory, but I wanted to see a confirmation that everything finally succeeded. 🙂

CSV

Exporting to txt and csv files has always been the easiest solution:
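For example:

    wide.to_csv('fundStats.csv')            # the statistics table
    csvtab.to_csv('fundPricesWide.csv')     # the pivoted price table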

SQLite

Since I wrote about the SQLite Python module last week, here is an example of how to export the data to an SQLite database, which would also suit such a project:
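A sketch using pandas’ to_sql on a plain sqlite3 connection:

    import sqlite3

    with sqlite3.connect('funds.db') as conn:
        csvtab.to_sql('fund_prices', conn, if_exists='replace')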

This time we will stop here, but we will definitely return to Pandas and data analysis, as well as these investment funds.

If You like this post and would love to see more content like this, please consider supporting the further development of this blog.
