Python for Finance in Excel — Filling in Blanks in Financial Trading Data

Bogdan Tudose
The Marquee Group
Published in
6 min readAug 30, 2023

--

A common data cleaning exercise in finance is to fill in blank rows when pulling in share prices of tickers from different stock exchanges. Data providers such as Bloomberg, Yahoo Finance and Refinitiv usually skip holidays and weekends when you are pulling in trading data. However, when you merge data from a company that is dual listed, you could have holidays on the NYSE that don’t exist on TSX, and vice versa. If you then plot the share prices, you might get your line charts randomly dropping to 0 if your data has blanks.

One solution in Excel is to do an IF Statement that says, “if there is no price, pull in the price from row above.” With Python, this is much simpler with a quick .fillna() formula from the pandas package.

The demo below walks through how I use .fillna() to populate missing share prices for Enbridge (ENB), a Canadian energy company that’s dual-listed on NYSE and TSX.

The demo Excel file can be downloaded from my Github repo, where I will be posting more demos in the future: https://github.com/dbogt/PythonExcel

Here is also the direct link to the file: https://github.com/dbogt/PythonExcel/raw/main/Python%20in%20Excel%20Demos%20-%20Demo%201%20-%20Fillna.xlsx

Step 1 — Grab the data with Excel

I used the Stocks feature on the Data ribbon of Excel to create the tickers and then the STOCKHISTORY formula to pull in data from Jan 4, 2022 to Dec 31, 2022.

Formulas in cells A6 and D6 below:

=STOCKHISTORY(A1,$A$3,$A$4,0,1)

=STOCKHISTORY(D1,$A$3,$A$4,0,1)

Stockhistory formula in Excel to pull historical share prices.

Step 2 — Merge the Data and Fill Blanks with Python in Excel

Before we can merge the two data sets with Python and fill in the blanks, we have to first create two variables that convert normal ranges in Excel into Python “dataframes” which are a special data type in the pandas package in Python. These dataframes are the equivalent of the Table tool in Excel. Once data is stored as a dataframe, you can easily refer to columns using tableName[‘Header’], very similar to how Excel uses Table1[Header] instead of cell references.

One nice thing I noticed while playing with the Python implementation in Excel, is that it handles dynamic arrays fairly well. The codes seen in screenshot below:

  • In cell H4 I created the variable to keep track of the TSX data of Enbridge. You can quickly generate this code by writing =PY( df_tsx= and then selecting the share prices that were pulled in Step 1 with StockHistory.
df_tsx = xl("A6#", headers=True)
  • In cell H5 I did the same type of code with the NYSE data of Enbridge:
df_nyse = xl("D6#", headers=True)
  • In cell H6 I merged the two tables using the pandas formula .merge and saved as variable “merged”
merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))
  • In cell H7 I fixed the merged data set by sorting on Date (old to new) and then using the .fillna() formula to “copy” over prices from rows above to empty rows below whenever there was a blank
merged_fixed = merged.sort_values('Date').fillna(method='ffill')
Python code directly in Excel to merge data sets and fill in blanks.

Let’s take a step back and dive deeper on what the .merge and .fillna formulas are doing.

Merge Formula

The pandas cheat sheet brochure has a nice graphic on the second page on how merge function works:

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

In the pandas documentation there is also a whole section explaining how to merge data sets:

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

Let’s breakdown our code and explain piece by piece what each part of the formula was doing:

merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))
  • The formula can be used in this style table1.merge(table2)
    – In our case table1 was df_tsx (TSX data of Enbridge) and table2 df_nyse (NYSE data of Enbridge)
    – These are also referred to as the table on the “left” and table on the “right” because one is on the left of the word .merge, and the other on the right, inside the formula
  • how = ‘outer’ is telling Python to keep all the rows for all the dates
    – Outer means keep “all” the data, even if there may be missing rows from one table or the other
    – If we had said how= ‘inner’, it would keep only dates that overlap between the two exchanges and exclude any days where there was a holiday in either country
    – If we had said how= ‘left’, it would have kept only the TSX trading days and “miss” out on the dates where there was a holding in Canada (even if NYSE was open those days) and vice versa if we had said how=’right’ (keep NYSE trading days, miss out on U.S. holidays)
  • on = ‘Date’ is telling Python to merge on the “Date” column
    – Think of this as what are we using for the “VLOOKUP” if we had done this using only Excel
  • suffixes = (‘_TSX’, ‘_NYSE’) is telling Python to add a suffix to columns that have the same name but are not being merged on
    – Since Date was used for the merge, it only shows up once in the new consolidated table
    – However, “Close” column is different in each data set (one is the CAD prices on TSX, and the other the USD prices on NYSE) so we need to keep track of which Close column is which
    – If we don’t specify a suffixes part in the formula, pandas package will just create “Close_x” and “Close_y” header names

Once the data sets are merged, we will have “NA’s” or blanks on days where there wasn’t an overlap in dates.

If you want to see what days will have “problems” check out the trading holidays on NYSE and TSX:

For example, in May 2022, in Canada markets were closed on Victoria Day on May 23, while in US markets were closed on May 30 for Memorial Day. The same issue happens for Thanksgiving. In Canada, markets were closed on October 10, 2022, and in US, markets were closed on November 24, 2022.

Missing share prices in May for Canadian and US holidays

Fillna formula

Let’s now go over how we fixed the merged data:

merged_fixed = merged.sort_values('Date').fillna(method='ffill')
  • We took the “merged” variable and made sure the dates were sorted first in ascending order by date using .sort_values(‘Date’)
  • .fillna(x) is then used to fill in any NA errors or blanks with “x”
  • For example .fillna(0) would fill in any blanks with the number 0
  • In this case we don’t want to have a day where the share price was 0, since it could ruin our time-series charts and any analysis we’re doing (e.g. if we’re calculating daily returns later, it might show a day where stock dropped 100%, even though that wasn’t the case)
  • Instead we used method=‘ffill’ to say that we want to “forward fill”, i.e. take the row above and fill it forward (or downward) in the blanks rows below

If we wanted to sort our table in descending order (newest date to old), then the code would look like this:

merged_fixed = merged.sort_values('Date', ascending=False).fillna(method='bfill')

Here is what those May 2022 days look like now after the fillna formula, the blue cells were copied over to the yellow blank rows below:

Fixed merged table next to original table. Share prices have been copied down to fill in missing prices.

Disclaimer: Depending on the analysis you are trying to do, this might not be the correct solution. Sometimes you may want to populate blanks with a 0, sometimes with the median or average of the numbers in the column, and sometimes you may just want to remove any blank rows. With the pandas package in Python you can remove rows with .dropna() which we’ll cover in a future demo.

Below are all the codes used in the Excel file:

--

--

Bogdan Tudose
The Marquee Group

Bogdan delivers a variety of courses at The Marquee Group that focus on financial modeling, data sciences, and programming. www.linkedin.com/in/tudosebogdan/