How to get Data Off Your Company’s Excel Forms (Part One)

“Hey Brian, can you work up a sales summary for the past year?”

Sure I can, let’s take a look a the records, and….. oh.

So there’s no database? Just a folder full of these custom, manually filled out forms like this? Well, we can do this. You owe me a beer though.

So we’re going to start with the “lunchbreak” version of this. Lunchbreak meaning I cook up some spaghetti code that does the job, but probably isn’t all that efficient for the user, but that’s the tradeoff for something that get done in my lunch hour.

What does this program need to do at the bare minimum?

  1. Open an excel file
  2. Take the information from specific cells in the excel file
  3. Write the information to a table so that we can make a nice sales summary

Libraries We Need

Some genius decided that we only needed to read files created at the turn of the century, so we can no longer use the read xlrd library. (Thanks dingus). So we’ll have to use a different library to read our sheet.

use pip to install openpyxl and you’re good. Now we can load our libraries.

#load applicable libraries
import pandas as pd
import pandas as pd
import os as os
from os import path
import openpyxl
import xlwt
from xlwt import Workbook

Now for this version, we’re only going to read a specific file. The jerk who asked us to skip lunch can manually enter the file name in the last part. Also I’m pathing to the desktop, don’t judge.

df = openpyxl.load_workbook('C:\\Users\\Brian\\Desktop\\fileRead1.xlsx')

Now I’m going to do the main thing here, probably why you clicked the link. We’re going to read the individual cells in the excel sheets.

#read specific cells
sheet = df.active
orderNum = sheet['C3'].value
cusNum = sheet['H3'].value
cusName = sheet['E6'].value
orderDate = sheet['H6'].value
#read specific cells
sheet = df.active

#info
orderNum = sheet['C3'].value
cusNum = sheet['H3'].value
cusName = sheet['E6'].value
orderDate = sheet['H6'].value
    

#Order one
itemOneName = sheet['C9'].value
itemOneQuant = sheet['F9'].value
itemOnePrice = sheet['G9'].value
itemOneSub = (itemOneQuant * itemOnePrice)

#order two
itemTwoName = sheet['C10'].value
itemTwoQuant = sheet['F10'].value
itemTwoPrice = sheet['G10'].value
itemTwoSub = (itemTwoQuant * itemTwoPrice)

#order three
itemThreeName = sheet['C11'].value
itemThreeQuant = sheet['F11'].value
itemThreePrice = sheet['G11'].value
itemThreeSub = (itemThreeQuant * itemThreePrice)

#order three
itemFourName = sheet['C12'].value
itemFourQuant = sheet['F12'].value
itemFourPrice = sheet['G12'].value
itemFourSub = (itemFourQuant * itemFourPrice)


This is basically the idea. Now one thing that kind of sucks, is that if the excel sheet has a formula, then you basically have to impute the formula instead of just copy/pasting the code and changing the cell numbers. You can see this on the itemFourSub variable above.

Also, when your boss made this form, he made it so that the sales rep entered the customer number in the same cell as the customer number heading. Here’s how we fix this mess.

#Cleaning
cusNum = cusNum.replace('Cust#: ','') #removes string from number

Now you get just the customer number from the cell instead of putting a bunch of garbage info in there that prevents it from being a number.

One last thing I want to do is to create variables for the totals. We add Illinois sales tax of 1.08 to this.

#total price
orderTotal = itemOneSub + itemTwoSub + itemThreeSub + itemFourSub
ilTax = 1.08
finalPrice = (orderTotal * ilTax)

Now finally, we write the code to a workbook

#write to a table on excel
# Workbook is created
wb = Workbook()
  
# add_sheet is used to create sheet.
sheet1 = wb.add_sheet('orders')

#header

#cusInfoHead
sheet1.write(0, 0, 'orderNum')
sheet1.write(0, 1, 'cusNum')
sheet1.write(0, 2, 'cusName')
sheet1.write(0, 3, 'orderDate')

#orderOneHead
sheet1.write(0, 4, 'itemOneName')
sheet1.write(0, 5, 'itemOneQuant')
sheet1.write(0, 6, 'itemOnePrice')
sheet1.write(0, 7, 'itemOneSub')

#orderTwoHead
sheet1.write(0, 8, 'itemTwoName')
sheet1.write(0, 9, 'itemTwoQuant')
sheet1.write(0, 10, 'itemTwoPrice')
sheet1.write(0, 11, 'itemTwoSub')

#order details
sheet1.write(0, 12, 'itemThreeName')
sheet1.write(0, 13, 'itemThreeQuant')
sheet1.write(0, 14, 'itemThreePrice')
sheet1.write(0, 15, 'itemThreeSub')

#order details
sheet1.write(0, 16, 'itemFourName')
sheet1.write(0, 17, 'itemFourQuant')
sheet1.write(0, 18, 'itemFourPrice')
sheet1.write(0, 19, 'itemFourSub')

#order details
sheet1.write(0, 20, 'orderTotal')
sheet1.write(0, 21, 'ilTax')
sheet1.write(0, 22, 'finalPrice')

#Write the data to the relevant fields

#customer info
sheet1.write(1, 0, orderNum)
sheet1.write(1, 1, cusNum)
sheet1.write(1, 2, cusName)
sheet1.write(1, 3, orderDate)

#order details
sheet1.write(1, 4, itemOneName)
sheet1.write(1, 5, itemOneQuant)
sheet1.write(1, 6, itemOnePrice)
sheet1.write(1, 7, itemOneSub)

#order details
sheet1.write(1, 8, itemTwoName)
sheet1.write(1, 9, itemTwoQuant)
sheet1.write(1, 10, itemTwoPrice)
sheet1.write(1, 11, itemTwoSub)

#order details
sheet1.write(1, 12, itemThreeName)
sheet1.write(1, 13, itemThreeQuant)
sheet1.write(1, 14, itemThreePrice)
sheet1.write(1, 15, itemThreeSub)

#order details
sheet1.write(1, 16, itemFourName)
sheet1.write(1, 17, itemFourQuant)
sheet1.write(1, 18, itemFourPrice)
sheet1.write(1, 19, itemFourSub)

#order details
sheet1.write(1, 20, orderTotal)
sheet1.write(1, 21, ilTax)
sheet1.write(1, 22, finalPrice)

wb.save('table.xls')

And what we’re left with is a nice row of data in an xls sheet with a header.

This leaves us with something that can be read by pandas, or can be copy/pasted into an excel sheet with other forms. There’s a lot of ways we can play with this and make it better, but this is the super basic code to make this solution work. I had fun with it though and will be applying it to my workplace after a few upgrades. (Hence the part one in the title). Would love to hear from anyone who has any suggestions.

Leave a Reply

Your email address will not be published. Required fields are marked *