Spreadsheets Weren’t Meant to be Databases

Show of hands, how many of you have built an excel sheet and used it to store information?

Probably everyone, right? When you’re tracking your weight loss progress or tracking your stock holdings, it’s not a bad idea. The problem is when you start tracking things like businesses, contacts at those businesses, sales you’ve made, and what items were in the sale.

As you see this can get out of hand pretty quickly, especially when you start having fields that are uneven. For example we’ll compare two of my customers. We’ll call one A Electric and B Electric.

A Electric has 5 people at the entire company, this includes 2 electricians, a project manager, an owner who is also the estimator and salesman, and a receptionist/accountant.

B Electric has two project managers, two estimators, two superintendents, four foremen, an office manager, an accountant, sixteen electricians, an equipment manager, a business development manager, and a truck driver.

These fields aren’t going to align into matching columns.

What we need is a relational database.

I’m just going to do ahead build one using SQL lite.

Libraries Needed

I need three libraries to do this. Pandas, OS, and sqlite3. Pandas is the basic data science framework for python. There are others, but Pandas is the most commonly used. OS is my preferred library for pathing. sqlite3 is the library for building the database and interacting with it.

#importing libraries
import pandas as pd #pandas is our main ds framework
from os import path #os for pathing
import sqlite3 #for building and accessing our database

Pathing Directories

Next, we need data directories and a database directory. We can use multiple data directories if our data is coming from different sources. We can also have the database directory and data directory be the same, but I don’t do this because I want to be able to move the data later if needed.

#handle directories
DATADIR = 'C:\\Users\\Brian Reher\\Desktop\\ffb\\ltcwff-files-main\\data' #pathing our data
DBDIR = 'C:\\Users\\Brian Reher\\Desktop\\dataBase' #pathing to where I put the db

Creating a Database and Connecting it to the Directory

Next we need to create the database itself and a connection to it. This database will be called ‘dataBase’ and will be an sqlite file

#create connection which includes generation of the database itself
conn = sqlite3.connect(path.join(DBDIR, 'dataBase.sqlite'))

Loading the Data into Python

Before we can put the data into python, we need to load it into python.

#load the data into the CSV
playerGame = pd.read_csv(path.join(DATADIR, 'game_data_sample.csv')) #load PlayerGame
player = pd.read_csv(path.join(DATADIR, 'game_data_player_sample.csv'))
game = pd.read_csv(path.join(DATADIR, 'game_2017_sample.csv'))
team = pd.read_csv(path.join(DATADIR, 'teams.csv'))

Writing to the Database

Now all that’s left to do is to write the variables to the database. Each spreadsheet will be its own table within the database.

#Writing to SQL
playerGame.to_sql('player_game', conn, index=False, if_exists='replace')
player.to_sql('player', conn, index=False, if_exists='replace')
game.to_sql('game', conn, index=False, if_exists='replace')
team.to_sql('team', conn, index=False, if_exists='replace')

Testing

So if we’ve done everything right, we have a database. Looking in the database folder, we can see an SQL file was indeed created.

However that only tells us that there is a file, not what is in the file. Let’s run a query to test it. Python’s SQL lite library allows us to run SQL queries in the python IDE.

#select all player data
df = pd.read_sql( #this string tells pandas to look for sql code
    """
    SELECT * 
    FROM player
    """, conn)

We new have a dataframe (df) that is a selection of all of our player data. Let’s see how many rows and columns that is using the shape function.

df.shape
(50, 5)

We have 50 players selected and five columns. That sounds right. Let’s take the first 10 rows to make sure though.

df.head(5)

So we have a playerID, the season, team, position and player. There are methods of cleaning, joining, etc. But what we sought out to do was create an SQL database. Nothing more. We will save the rest for another time.

Leave a Reply

Your email address will not be published. Required fields are marked *