Visualizing portfolio historical performance with Python

Hey there!

In this article we are going to check out how to examine and visualize the historical performance of a portfolio. This can be any portfolio but since we scraped prices for SEB funds in our previous article (Web scraping investment fund prices with Python) we will use those prices here as well. If You haven’t read that article it would be good to at least check out the full code used there as we will import it and use one variable from it. Also by doing so we will initiate the whole thing to run and update the prices for us here.

The set up

This is how we will do this:

  1. Define functions for gathering historical prices for our portfolio allocation
    • Gather prices per chosen funds
    • Calculate total portfolio price based on weights in allocation
  2. Define function that would get all the ISIN codes of our funds and run the price update
  3. Put together our portfolio return, SEB fund average returns for performance comparison

Pretty easy but I will still guide You through it. No worries about that.

The walk-through

The imports

We will need 3 things to be imported and one optional:

  • pandas – for data cleaning and manipulation.
  • matplotlib.pyplot for data visualization.
  • datetime to get today’s date.

Optional:

We could just read MS Excel in stead but I wanted to have this file updating the previous one. I know this could be done in a simpler way but whatever. What’s done, done.

 #Importing standard libraries for data cleaning & visualization:
import pandas as pd
import matplotlib.pyplot as plt
import datetime

Also we could say that matplotlib.pyplot is optional since it is possible the get the charts form pandas DataFrames as well, but whenever I run it from terminal I see only matplotlib and no pandas charts so I would still trust the main chart to matplotlib.

Function for loading prices form MS Excel

This function will read excel file we previously saved and retrieve an aggregated portfolio price based on given allocation and prices from sheet ‘DATA’. It requires ‘allocation’ as a pandas DataFrame with fund ISIN codes and allocation weights for our current portfolio.

In order to get that nice (an single) list of of prices we will need to go through these steps:

1. Create a list of column names (ISIN codes) for allocation

Initially I believed that I can just go with allocation.index as a list for filtering, but something in pandas proved me wrong. The issue was that ‘allocation‘ had ISIN codes that were not saved in excel. Reminder: We dropped funds that didn’t have at least 1000 prices/days when gathering prices.

Why we included ISINs for which we don’t have prices in allocation? It happens when we want to take average fund performance by giving each ISIN equal weights. This shouldn’t be a problem if You would use the same ISIN codes that we have for our portfolio.

Quick workaround – looping though the all ISIN codes saved in prices and then through allocation.index putting ISINs that matched in separate list – filterColumn.

def getPortPrices(allocation):
#   Always nice to inform yourself about opening file and other processes
    print('\nReading fund data...')
    fundPrices = pd.read_excel('SEB funds.xlsx','DATA', index_col='DATE')
    
#   Creating a list that will be used as a filter for ISIN columns later on:
    filterColumn = []
    for c in fundPrices.columns:
        for a in allocation.index:
            if c == a:
                filterColumn.append(c)

2. Filter and adjust prices according to given allocation

This task calls for another loop!

But if You think of it we could have done it all together in previous loop… All the bright ideas come to me too late. I am not willing to change this setup (since it works well).

Anyway. We will do the following:

  1. Basic double-loop operation on fund prices DataFrame index and filteredColumn ISINS
  2. Append a temp list p for our portfolio price.
  3. Drop NA values everywhere to have only clean data
#   Getting the prices for given allocation and calculating total average price
#   based on allcation weights
    p = []
    print('Calculating portfolio aggregated price...\n')
    for index, row in fundPrices.iterrows():
        for i in filterColumn:
            p.append([pd.to_datetime(index), float(row[i]*allocation[i])])
    portfolio = pd.DataFrame(p, columns=list(['DATES','PRICES']), dtype=float).dropna()
    
#   Making sure types are read correctly
    pd.to_numeric(portfolio['PRICES'])
    pd.to_datetime(portfolio['DATES'], errors='coerce').dropna()

Here the parameter errors=’coerce’ means that if there would happen to be an error while converting DATE to date then it would deliver NA which we would drop on sight.

3. Group by date and deliver some stats

Lastly we want to use these neat and clean price dates as index as they are unique (at least should be) so we will use function group that would assign whatever we are grouping on as an index.

If for any reason there would be duplicate dates we would sum up the prices of our portfolio anyway and escape an error. But so fare there haven’t been any duplicates..

And as a cherry on the top, and this is optional, we will get some statistics on our prices. We could just go with pd.DataFrame.describe() but we don’t need all of that stuff. We will settle with STD, Mean return and Sharpe ratio:

# We will work with Percent change or return figures going forward so let's 
#   caluclate them and drop N/A (first row)
    portfolio = portfolio.groupby('DATES')['PRICES'].sum()
    
    print('STD: {}%'.format(round(100*portfolio.pct_change().dropna().std(),5)))
    print('Mean return: {}%'.format(round(100*portfolio.pct_change().dropna().mean(),5)))
    print('Sharpe: {}\n'.format(round(portfolio.pct_change().dropna().std()/portfolio.pct_change().dropna().mean(),2)))
    
    return portfolio

It looks a bit ugly but I believe that using an extra variable when it’s not needed just puts some extra weight on memory so I went with raw portfolio.pct_change().dropna() in stead of assigning it to a variable.

To be perfectly honest with You, I will drop these print() lines of code as soon as this article gets published because I have some bigger plans for this in near future and I would prefer it not doing non-obligatory calculations. Meanwhile the current output is:

STD: 0.29853%
Mean return: 0.01213%
Sharpe: 24.6

Before You continue I just wanted to say that .dropna() in case with pct_change() is not to drop erroneous entries. It is for the first record that’s always empty when You use .pct_change().

That being said, here’s the full code getPortPrices function:

def getPortPrices(allocation):
#   Always nice to inform yourself about opening file and other processes
    print('\nReading fund data...')
    fundPrices = pd.read_excel('SEB funds.xlsx','DATA', index_col='DATE')
    
#   Creating a list that wil be used as a filter for ISIN columns latr on:
    filterColumn = []
    for c in fundPrices.columns:
        for a in allocation.index:
            if c == a:
                filterColumn.append(c)

#   Getting the prices for given allocation and calculating total average price
#   based on allocation weights
    p = []
    print('Calculating portfolio aggregated price...\n')
    for index, row in fundPrices.iterrows():
        for i in filterColumn:
            p.append([pd.to_datetime(index), float(row[i]*allocation[i])])
    portfolio = pd.DataFrame(p, columns=list(['DATES','PRICES']), dtype=float)
    
#   Making sure types are read correctly
    pd.to_numeric(portfolio['PRICES'])
    pd.to_datetime(portfolio['DATES'], errors='coerce').dropna()
    
#   We will work with Percent change or return figures going forward so let's 
#   calculate them and drop N/A (first row)
    portfolio = portfolio.groupby('DATES')['PRICES'].sum()
    
    print('STD: {}%'.format(round(100*portfolio.pct_change().dropna().std(),5)))
    print('Mean return: {}%'.format(round(100*portfolio.pct_change().dropna().mean(),5)))
    print('Sharpe: {}\n'.format(round(portfolio.pct_change().dropna().std()/portfolio.pct_change().dropna().mean(),2)))
    
    return portfolio

Function for initiating SEBscraper and getting ISIN codes

This is a short one – it imports our previous project SEBscraper as seb and initiates price update from which we take DataFrame funds. As we noticed in previous function You have to be careful with this one because we didn’t take prices for all funds that we got from overview page. We only took the ones that had significant history (more than 1000 prices)

def getISINs():
#   Importing isin variable from previous project    
    import SEBscraper as seb
    isin = seb.funds
    return isin

Putting everything together

Finally all the pre-requirements have been fulfilled and we can get our hands wet with the data.

we will start with the print() to let ourselves know that we are starting the process and initialize the price update. After prices are updated and we got the latest list of fund ISIN codes that SEB offers, we will read the prices out form the Excel file where they are saved.

We want to process them in two ways:

  1. Get our current protfolio allocation
  2. Get an average of SEB investment fund performance

Well, we will calculate the performance but in order to get to that, we will assign equal values to all funds. We can also add other allocations that we might like. I have prepared the split in BONDS, EQUITY and MIX based on their names. Here’s a reminder how it looks:

If You wanted to comment on my chosen allocation that it looks like I’m 60+ then I agree. Way too conservative and I expect to see it in charts as well.

Here’s where we start:

#Getting the AVG SEB fund performance:
print('Initialize price update and get the ISIN codes:')
isin = getISINs()['ISIN']
aBench = pd.DataFrame({'ISIN':isin, 'ALLOCATION':round(1/len(isin), 3)}).groupby('ISIN')['ALLOCATION'].sum()

#Getting the allocation we have set in our MS Excel file "SEB portfolio"
print('Getting the portfolio prices:')
fundAlloc = pd.read_excel('SEB portfolio.xlsm', 'FUNDS', index_col='Fund Name', dtype={'ISIN':str, 'ALLOCATION':float})
pAlloc = fundAlloc.filter(items=['ISIN', 'ALLOCATION']).loc[(fundAlloc['ALLOCATION'] != 0.00) & (fundAlloc.index != 'Total')].groupby('ISIN')['ALLOCATION'].sum()

#Setting up comparison DataFrame for prices and price changes:
totalPrices = pd.DataFrame({'PORTFOLIO':getPortPrices(pAlloc), 'SEB AVG':getPortPrices(aBench)}, index=getPortPrices(pAlloc).index).dropna()

So now we have everything ready in DataFrame totalPrices. The code gives the following output:

Initialize price update and get the ISIN codes:
List of ISINs and fund names gathered!

All prices are up to date!
Getting the portfolio prices:

Reading fund data...
Calculating portfolio aggregated price...

STD: 0.02481%
Mean return: 0.00133%
Sharpe: 18.72

Reading fund data...
Calculating portfolio aggregated price...

STD: 0.29853%
Mean return: 0.01213%
Sharpe: 24.6

Reading fund data...
Calculating portfolio aggregated price...

STD: 0.02481%
Mean return: 0.00133%
Sharpe: 18.72

AS we can see our bench a.k.a SEB investment fund average performance outperforms our allocation in returns and Sharpe ratio and our allocation beats it in capital preservation. But since I am not ancient, I would prefer to be taking more risks for bigger winnings…but more on choosing optimal allocations in next article.

The plot

So we get to the fun part – data visualization. First thing I want to stress here is that we are dealing with dates for which we have prices for. So I would assume that all of these are working days. In other words, picking last 360 days would not be equal to 1 year. In this code I have made an assumption that each month has equal amount of business days = 21. And that means that quarterly data is 63 days and annual – 250. This will be relevant real soon.

And by soon I meant right now. We will check standard deviation (STD) for past 250 days and also check price movements for the same period with moving average for a month.

#Creating some visualizations for our comparison
#We use rolling means to smoothen the lines on chart.
print(totalPrices[-250:].pct_change().std())

plt.plot(totalPrices[-250:].pct_change().rolling(21).mean())
plt.title('Price change movement comparison')
plt.ylabel('Gains/Loses')
plt.xlabel('Dates')
plt.legend(totalPrices.columns)
plt.show()

For the STD that represents risk we see that our portfolio is 10x safer and more stable than the average:

PORTFOLIO 0.000254
SEB AVG 0.002815

But it has it’s trade-off in average gains:

At some points we would of course enjoy having the portfolio that we have here. Like, in all those price drops below 0, but in general we can see that average SEB fund enjoyed higher rise in price than our

allocation. Also what we can notice here is the fact that 250 days back is not equal one year. More like 3/4 of a year. We can note this in further work that’ s needed to be done. In this case for comparison reasons it’s fine as long as all date ranges are the same for single comparison.

Let’s take a peek at our price movement frequency in histogram:

#Checking how return distribution of our portfolio looks like
totalPrices['PORTFOLIO'].pct_change().plot.hist(title='Price change frequency', bins=20)

Yeah..so it’s not all bad if we do not take into account inflation and stuff. I mean it’s sort of mostly positive…right next to zero. The fact that price changes were mostly upwards is actually really good if You thin about it. It just sucks that those are not significantly large movements upwards.

Let’s move on and check the price movements for last 63 dates and also add some extra stats with it:

#latest price movements:
totalPrices[-63:].pct_change().rolling(5).mean().plot.line(title='Price change movements during last 63 working days (Rolling mean for 5 wdays)')
print('Risk:\n{}'.format(totalPrices[-63:].pct_change().std()))
print('\nReturn:\n{}'.format(totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1))
print('\nSHARPE: \n{}'.format((totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1)/totalPrices[-63:].pct_change().std()))

We also get the following output:

Risk:
PORTFOLIO 0.000212
SEB AVG 0.003091
dtype: float64

Return:
PORTFOLIO -0.001604
SEB AVG 0.015277
dtype: float64

SHARPE: 
PORTFOLIO -7.581200
SEB AVG 4.942028
dtype: float64

Nothing to be proud of since we are in the read for last ~3 months. This just keeps sending us signals that there is a better allocation among these funds. If we assume we have a choice among only these funds we can be sure that we can find a more optimal allocation than the one we have currently. And this assumption is true for me since I have a life insurance linked to these funds and I have the privilege of setting the allocation. (another reason why we will definitely discuss asset allocation optimization)

One last plot to view is a scatter plot for STD/returns to see how well risk and return go hand in hand. Obviously we want to have the least risk for the most returns. But let’s see if this will be the case. From previous metrics we can already anticipate low risk scores and not so good looking returns.:

allFunds = pd.read_excel('SEB funds.xlsx','DATA', index_col='DATE').dropna()
bb = pd.DataFrame({'RETURNS':100*(allFunds[-1:].mean()/allFunds[:1].mean()-1),'STD':allFunds.std(), 'SHARPE':(allFunds[-1:].mean()/allFunds[:1].mean()-1)/allFunds.std()})
final = bb.append(summary.filter(items=['STD','RETURNS','SHARPE']))

plt.scatter(bb['STD'].values, bb['RETURNS'].values, s=bb['SHARPE'].values*100, c='g')
plt.scatter(summary['STD'].values, summary['RETURNS'].values, s=summary['SHARPE'].values*100, c='r')
plt.title('Return vs Risk relationships')
plt.xlabel('Risk')
plt.ylabel('Return')
plt.show()

You might guess which one is our portfolio in there. It’s a clear indicator that we need to reallocate the portfolio for better performance.

Saving the summary figures to MS Excel file

So now we can save our results in an Excel spread sheets and close this topic so we can move on to the optimization. 🙂

summary = pd.DataFrame({\
                        'RETURNS.90':(totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1), \
                        'RETURNS.360':(totalPrices[-1:].mean()/totalPrices[-250:-249].mean()-1), \
                        'RETURNS.1000':(totalPrices[-1:].mean()/totalPrices[-1000:-999].mean()-1), \
                        'RETURNS':(totalPrices[-1:].mean()/totalPrices[:1].mean()-1), \
                        'STD.90':totalPrices[-63:].pct_change().std(), \
                        'STD.360':totalPrices[-250:].pct_change().std(), \
                        'STD.1000':totalPrices[-1000:].pct_change().std(), \
                        'STD':totalPrices.pct_change().std(), \
                        'SHARPE.90':(totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1)/totalPrices[-63:].pct_change().std(), \
                        'SHARPE.360':(totalPrices[-1:].mean()/totalPrices[-250:-249].mean()-1)/totalPrices[-250:].pct_change().std(), \
                        'SHARPE.1800':(totalPrices[-1:].mean()/totalPrices[-1250:-1249].mean()-1)/totalPrices[-1250:].pct_change().std(), \
                        'SHARPE':(totalPrices[-1:].mean()/totalPrices[:1].mean()-1)/totalPrices.pct_change().std(), \
                        }, dtype='float', index=totalPrices.columns)

#Getting todays date and saving the figures in one excel:
n = datetime.datetime.date(datetime.datetime.now())
with pd.ExcelWriter('PORTFOLIO SUMMARY ' + str(n) + '.xlsx') as pdExcel:
        summary.to_excel(pdExcel, 'OVERALL')
        totalPrices.pct_change().to_excel(pdExcel, 'PCT CHANGE')
        totalPrices.to_excel(pdExcel, 'PRICES')
        final.to_excel(pdExcel, 'PERFORMANCE')
        pdExcel.save()

print('\nData saved to MS Excel file: PORTFOLIO SUMMARY ' + str(n) + '.xlsx')

As You might have noticed I added bit more periods to the summary so it has quarterly, annual and 5 year figures. Or at least what we consider as equivalents of those time periods within this project.

Full code:

As always, here’s a full code for this project (go to web scraping fund prices article to get the code for SEBscraper):

#Importing standard libraries for data cleaning & visualization:
import pandas as pd
import matplotlib.pyplot as plt
import datetime

#We have Tables with SEB funds prices in MS Excel file "SEB funds"
#which we gathered using our Python project "SEBscraper"
def getPortPrices(allocation):
#   Always nice to inform yourself about opening file and other processes
    print('\nReading fund data...')
    fundPrices = pd.read_excel('SEB funds.xlsx','DATA', index_col='DATE')
    
#   Creating a list that will be used as a filter for ISIN columns latr on:
    filterColumn = []
    for c in fundPrices.columns:
        for a in allocation.index:
            if c == a:
                filterColumn.append(c)

#   Getting the prices for given allocation and calculating total average price
#   based on allocation weights
    p = []
    print('Calculating portfolio aggregated price...\n')
    for index, row in fundPrices.iterrows():
        for i in filterColumn:
            p.append([pd.to_datetime(index), float(row[i]*allocation[i])])
    portfolio = pd.DataFrame(p, columns=list(['DATES','PRICES']), dtype=float)
    
#   Making sure types are read correctly
    pd.to_numeric(portfolio['PRICES'])
    pd.to_datetime(portfolio['DATES'], errors='coerce').dropna()
    
#   We will work with Percent change or return figures going foward so let's 
#   caluclate them and drop N/A (first row)
    portfolio = portfolio.groupby('DATES')['PRICES'].sum()
    
    print('STD: {}%'.format(round(100*portfolio.pct_change().dropna().std(),5)))
    print('Mean return: {}%'.format(round(100*portfolio.pct_change().dropna().mean(),5)))
    print('Sharpe: {}\n'.format(round(portfolio.pct_change().dropna().std()/portfolio.pct_change().dropna().mean(),2)))
    
    return portfolio

def getISINs():
#   Importing isin variable from previous project    
    import SEBscraper as seb
    isin = seb.funds
    return isin

#And here goes the action code:
    
#Getting the AVG SEB fund performance:
print('Initialize price update and get the ISIN codes:')
isin = getISINs()['ISIN']
aBench = pd.DataFrame({'ISIN':isin, 'ALLOCATION':round(1/len(isin), 3)}).groupby('ISIN')['ALLOCATION'].sum()

#Getting the allocation we have set in our MS Excel file "SEB portfolio"
print('Getting the portfolio prices:')
fundAlloc = pd.read_excel('SEB portfolio.xlsm', 'FUNDS', index_col='Fund Name', dtype={'ISIN':str, 'ALLOCATION':float})
pAlloc = fundAlloc.filter(items=['ISIN', 'ALLOCATION']).loc[(fundAlloc['ALLOCATION'] != 0.00) & (fundAlloc.index != 'Total')].groupby('ISIN')['ALLOCATION'].sum()

#Setting up comparison DataFrame for prices and price changes:
totalPrices = pd.DataFrame({'PORTFOLIO':getPortPrices(pAlloc), 'SEB AVG':getPortPrices(aBench)}, index=getPortPrices(pAlloc).index).dropna()
    
"""
In reality working day count differs month to month and year to year but for our purposes
we will assume the count is fixed 21 working day per month, 63 per quarter, 250 per year and 1250 for 5 years
"""
#Creating some visualizations for our comparison
#We use rolling means to smoothen the lines on chart.
print(totalPrices[-250:].pct_change().std())
plt.plot(totalPrices[-250:].pct_change().rolling(60).mean())
plt.title('Price change movement comparison')
plt.ylabel('Gains/Loses')
plt.xlabel('Dates')
plt.legend(totalPrices.columns)
plt.show()

#Checking how return distribution of our portfolio looks like
totalPrices['PORTFOLIO'].pct_change().plot.hist(title='Price change frequency', bins=25)

#latest price movements:
totalPrices[-63:].pct_change().rolling(5).mean().plot.line(title='Price change movements during last 63 working days')
print('Risk:\n{}'.format(totalPrices[-63:].pct_change().std()))
print('\nReturn:\n{}'.format(totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1))
print('\nSHARPE: \n{}'.format((totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1)/totalPrices[-63:].pct_change().std()))

print('\nIn total there are {} records of prices from {} to {}\n'.format(len(totalPrices.index), totalPrices.index[0], totalPrices.index[-1]))

summary = pd.DataFrame({\
                        'RETURNS.90':(totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1), \
                        'RETURNS.360':(totalPrices[-1:].mean()/totalPrices[-250:-249].mean()-1), \
                        'RETURNS.1000':(totalPrices[-1:].mean()/totalPrices[-1000:-999].mean()-1), \
                        'RETURNS':(totalPrices[-1:].mean()/totalPrices[:1].mean()-1), \
                        'STD.90':totalPrices[-63:].pct_change().std(), \
                        'STD.360':totalPrices[-250:].pct_change().std(), \
                        'STD.1000':totalPrices[-1000:].pct_change().std(), \
                        'STD':totalPrices.pct_change().std(), \
                        'SHARPE.90':(totalPrices[-1:].mean()/totalPrices[-63:-62].mean()-1)/totalPrices[-63:].pct_change().std(), \
                        'SHARPE.360':(totalPrices[-1:].mean()/totalPrices[-250:-249].mean()-1)/totalPrices[-250:].pct_change().std(), \
                        'SHARPE.1800':(totalPrices[-1:].mean()/totalPrices[-1250:-1249].mean()-1)/totalPrices[-1250:].pct_change().std(), \
                        'SHARPE':(totalPrices[-1:].mean()/totalPrices[:1].mean()-1)/totalPrices.pct_change().std(), \
                        }, dtype='float', index=totalPrices.columns)

allFunds = pd.read_excel('SEB funds.xlsx','DATA', index_col='DATE').dropna()
bb = pd.DataFrame({'RETURNS':100*(allFunds[-1:].mean()/allFunds[:1].mean()-1),'STD':allFunds.std(), 'SHARPE':(allFunds[-1:].mean()/allFunds[:1].mean()-1)/allFunds.std()})
final = bb.append(summary.filter(items=['STD','RETURNS','SHARPE']))

plt.scatter(bb['STD'].values, bb['RETURNS'].values, s=bb['SHARPE'].values*100, c='g')
plt.scatter(summary['STD'].values, summary['RETURNS'].values, s=summary['SHARPE'].values*100, c='r')
plt.show()

#Getting todays date and saving the figures in one excel:
n = datetime.datetime.date(datetime.datetime.now())
with pd.ExcelWriter('D:\Development\Python Projects\FONDI\PORTFOLIO SUMMARY ' + str(n) + '.xlsx') as pdExcel:
        summary.to_excel(pdExcel, 'OVERALL')
        totalPrices.pct_change().to_excel(pdExcel, 'PCT CHANGE')
        totalPrices.to_excel(pdExcel, 'PRICES')
        final.to_excel(pdExcel, 'PERFORMANCE')
        pdExcel.save()

print('\nData saved to MS Excel file: PORTFOLIO SUMMARY ' + str(n) + '.xlsx')

Leave a Reply

Your email address will not be published. Required fields are marked *