
How I learned Python - Pandas

Published Jul 10, 2018. Last updated Jan 06, 2019.

About me

I hold a bachelor's degree in Business Administration from the University of São Paulo, Brazil. My background has always been in finance, with a focus on capital markets. Now I have found my true passion: programming.

Why I wanted to learn Python - Pandas

I'm a strong believer that automation will take over all areas of the investment industry in the coming years, making them more cost-efficient and accessible. My goal is to combine IT and financial knowledge to make this transition as beneficial as possible to the world.

Learning to program was the first and perhaps the most important step toward becoming an active agent in that process. Once I figured out how finance and programming talk to each other, both became ultra fun. It's now a matter of putting in a lot of hard work to make them work together as well as possible.

How I approached learning Python - Pandas

If you don't come from computer science and want to learn programming as fast and as deeply as possible, my advice is to start with a very difficult challenge. In my case, I wanted to create a risk system that calculated the average term (duration) and the parametric and non-parametric VaRs of the hedge funds' portfolios. This challenge can be broken down into the following steps:

  1. Get the portfolios into a well-structured data form. In my case, every portfolio came as an XML file, so I had to parse the XML trees to build a readable Pandas DataFrame.

  2. For every portfolio, there are ISINs (an asset's "ID"), quantities and total values. The exception is the cash account since it doesn't actually have an ISIN.

  3. For every ISIN, it's possible to find the historical price table. The hedge fund I used to work for (like pretty much any fund) had a Bloomberg terminal, and it's possible to "connect" the terminal to Python using the Bloomberg API for developers. Because the data is fetched in a raw form, I recommend using Python packages like tia and blpinterface: they transform the fetched data into nice pandas DataFrames.

  4. Now that we have all the data, it's time to do the computations. Each portfolio has its own percentage weight for each asset, and these weights can be written as a matrix. The parametric and non-parametric VaRs are basically multiplications of that weight matrix by the historical prices (of course, this is a big simplification). The result should be a single value indicating the risk. A similar rationale applies to the calculation of the duration. If you want to understand VaR in more depth, I recommend this page: https://en.wikipedia.org/wiki/Value_at_risk

  5. Because all the results fit in a single sheet that can be converted to HTML, I could use it as the body of automated emails sent to my colleagues. After all, this was the final objective of my task.
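The steps above can be sketched roughly as follows. This is not the original risk system: the XML layout, the `portfolio_to_frame`, `portfolio_returns`, `historical_var` and `parametric_var` helpers, and the price table are all assumptions made for illustration, and the hard-coded prices stand in for the Bloomberg history of step 3. VaR is expressed here as a one-day loss, as a fraction of portfolio value, at a 95% level.

```python
import xml.etree.ElementTree as ET
from statistics import NormalDist

import numpy as np
import pandas as pd

# Hypothetical XML layout; the real broker files are not shown in this post.
PORTFOLIO_XML = """<portfolio fund="Example Fund">
  <position isin="US0378331005" quantity="100" value="15000.0"/>
  <position isin="US5949181045" quantity="50" value="10000.0"/>
  <position isin="" quantity="1" value="5000.0"/>
</portfolio>"""


def portfolio_to_frame(xml_text):
    """Steps 1-2: one row per position, plus each position's weight."""
    root = ET.fromstring(xml_text)
    rows = [
        {
            # The cash account has no ISIN, so give it a placeholder label.
            "isin": pos.get("isin") or "CASH",
            "quantity": float(pos.get("quantity")),
            "value": float(pos.get("value")),
        }
        for pos in root.iter("position")
    ]
    frame = pd.DataFrame(rows)
    frame["weight"] = frame["value"] / frame["value"].sum()
    return frame


def portfolio_returns(prices, weights):
    """Step 4: daily asset returns multiplied by the weight vector."""
    returns = prices.pct_change().dropna()
    # Only assets with a price history are kept; anything else (such as
    # cash) is left out, which treats its return as zero.
    aligned = weights.reindex(returns.columns).fillna(0.0)
    return returns @ aligned


def historical_var(returns, level=0.95):
    """Non-parametric VaR: the empirical loss quantile of past returns."""
    return -np.quantile(returns, 1.0 - level)


def parametric_var(returns, level=0.95):
    """Parametric VaR under the assumption of normally distributed returns."""
    z = NormalDist().inv_cdf(level)
    return z * returns.std() - returns.mean()


positions = portfolio_to_frame(PORTFOLIO_XML)
weights = positions.set_index("isin")["weight"]

# Made-up prices standing in for the Bloomberg history of step 3.
prices = pd.DataFrame(
    {
        "US0378331005": [100.0, 101.0, 99.0, 102.0, 100.5, 103.0],
        "US5949181045": [200.0, 198.0, 201.0, 202.0, 199.0, 204.0],
    }
)
daily = portfolio_returns(prices, weights)
report = pd.DataFrame(
    {
        "historical_var_95": [historical_var(daily)],
        "parametric_var_95": [parametric_var(daily)],
    }
)
email_body = report.to_html(index=False)  # step 5: HTML table for the email
```

A real system would need far more history than six prices, and the duration calculation is left out because the post does not describe its inputs.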

Now, job done 😄

Challenges I faced

The first issue I faced was getting the XML files of the portfolios. They came by email... So here's what I did:

  • I found this code on GitHub: https://gist.github.com/baali/2633554. It downloads all attachments into a single folder. The problem with this code is that it's very, very slow, and I didn't always want to download all attachments, only files that contained certain keywords or the most recent files (e.g. from the last 150 emails). So I edited it heavily.
  • The attachments for this project (and some other ones) were downloaded all at once. After that, they were moved to other folders, where the scripts would read them.
  • This routine ran many, many times during the day so as not to miss anything.
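The filtering idea in the bullets above might look something like the sketch below. It is not the edited gist: `matching_attachments` and `fetch_recent_messages` are hypothetical names, and the host, user and password arguments are placeholders. Only the standard-library `imaplib` and `email` modules are used. The demo at the bottom runs offline on a hand-built message, since the IMAP part needs a real mailbox.

```python
import email
import imaplib
from email import policy
from email.message import EmailMessage


def matching_attachments(message, keywords):
    """Return (filename, bytes) for attachments whose name contains a keyword."""
    found = []
    for part in message.walk():
        if part.get_content_disposition() != "attachment":
            continue
        filename = part.get_filename() or ""
        if any(word.lower() in filename.lower() for word in keywords):
            found.append((filename, part.get_payload(decode=True)))
    return found


def fetch_recent_messages(host, user, password, limit=150):
    """Yield the newest `limit` inbox messages (requires a real IMAP account)."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX", readonly=True)
        _, data = imap.search(None, "ALL")
        # Message numbers come back oldest first, so keep only the tail.
        for num in data[0].split()[-limit:]:
            _, fetched = imap.fetch(num, "(RFC822)")
            yield email.message_from_bytes(fetched[0][1], policy=policy.default)


# Offline demo with a hand-built message instead of a live mailbox.
sample = EmailMessage()
sample["Subject"] = "Daily portfolios"
sample.set_content("Files attached.")
sample.add_attachment(
    b"<portfolio/>",
    maintype="application",
    subtype="xml",
    filename="portfolio_fund_a.xml",
)
sample.add_attachment(
    b"not relevant",
    maintype="application",
    subtype="octet-stream",
    filename="invoice.pdf",
)

wanted = matching_attachments(sample, ["portfolio"])
```

Limiting the fetch to the last few messages and checking filenames before saving anything is what avoids the slow "download everything" behaviour described above.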

Another challenge is that data often comes in different structures, which makes things harder to manage. In step 3 of the process described above, the ISINs of the portfolios didn't match Bloomberg's ticker system, and Bloomberg doesn't provide a nice table of ISINs and their respective Bloomberg tickers, so I had to build my own dictionary to map one to the other. The solution was to use the "SECF" function in the terminal, spend some hours compiling the data, and save it in CSV format.
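Once such a CSV exists, joining it to the positions is a one-line pandas merge. The sketch below assumes a two-column file with hypothetical column names `isin` and `bbg_ticker`; the rows and tickers are illustrative and are not taken from the original mapping.

```python
import io

import pandas as pd

# Stand-in for the hand-compiled CSV file; contents are illustrative only.
MAPPING_CSV = io.StringIO(
    "isin,bbg_ticker\n"
    "US0378331005,AAPL US Equity\n"
    "US5949181045,MSFT US Equity\n"
)
mapping = pd.read_csv(MAPPING_CSV)

positions = pd.DataFrame(
    {
        "isin": ["US0378331005", "US5949181045", "BRPETRACNPR6"],
        "quantity": [100, 50, 200],
    }
)

# A left join keeps every position; validate= raises an error if the
# mapping file accidentally lists the same ISIN twice.
merged = positions.merge(mapping, on="isin", how="left", validate="many_to_one")

# ISINs still missing from the mapping show up with a NaN ticker.
unmapped = merged.loc[merged["bbg_ticker"].isna(), "isin"].tolist()
```

Listing the unmapped ISINs on every run shows which entries still need to be looked up and added to the file.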

The final (and most difficult) challenge was finding out that many of the XML files were simply wrong. It took many calls with the broker to get them fixed.

Key takeaways

Needless to say, Object-Oriented Programming is a must to make this project as easy as possible. If you are a beginner, I highly recommend Python for assimilating OOP concepts. After this, learning any other language will be a matter of syntax.

Tips and advice

Programming, like traditional processes, is about controlling and manipulating flows of information. The more elegantly the code is written and the processes are designed, the more efficient the final product will be. Seeing productivity multiplied x times and costs reduced by y is extremely rewarding.

And it's super fun.

Next steps

Find the next extra cool problem to be solved! 😄

Discover and read more posts from Rafael Klein Inaimo