Putting Python to work

Published Jun 29, 2018. Last updated Dec 26, 2018.

When I began to learn to code, I was working as a lease operator or “Pumper” for an oil and gas company. My job involved driving out to the middle of nowhere and checking on pumping units along with oil and gas batteries. I was basically a human computer, measuring the production of oil, gas, and produced water.

While I enjoyed the independent nature of the job and working outside, it soon became very monotonous, especially the amount of repetitive data entry that was required. At the same time, I was enrolled in college full-time studying computer science, so I decided to write a program that would automate the boring parts of the job.

At that point in my journey I just hadn’t learned enough to get the job done. The attempt did lead me to discover coding in Python, though, and that’s where the journey got interesting.

[Image: python.png]

I know this cartoon is a little old, but it’s exactly how I felt. All of a sudden I was able to do things with my computer that actually made it a tool! I wasn’t using someone else’s productivity software any more; I was learning how to get things done on my own.

I also discovered the rich ecosystem of online tutorials, videos, and content created by so many who had gone before me, and I’d be remiss not to name Al Sweigart, Michael Kennedy, and Harrison Kinsley for their outstanding contributions to my early steps in Python.

Unless you’re a very special type of person, you don’t like data entry tasks. One of the first major tasks I was able to automate with Python, and one that I’ve been able to earn income from, is the automation of Excel spreadsheets.

I had a fairly easy way in to experimenting with this early on. My wife, a great number of her friends, and, of course, her coworkers are Petroleum Landmen, which basically means that they do the massive amounts of paperwork required for oil companies to drill wells.

Any time my wife got together with any of her friends or coworkers, it was basically a non-stop gripe fest about how bad their systems for dealing with all of the paperwork were and how much time they wasted manually transferring data from one spreadsheet to another.

One thing led to another, and I found myself volunteering to help a friend with a data entry task that was taking her two weeks to complete by hand: the transfer of a massive PDF document into a multi-page Excel spreadsheet.


Here we’re dealing with a 1,671-page example, but routinely it’s more than 2,000 pages.

I know that up to this point I’ve been singing the praises of Python, but in my search for a way to move large amounts of data from the PDF to Excel, I found a very helpful tool: Tabula.

Tabula is basically able to capture all table data, or similarly formatted data, in a PDF and export it to a CSV file. It’s not perfect, and there can be formatting errors, but it’s still a very fast way to pull large amounts of data from a PDF. The formatting you can handle afterwards. To capture the data you want, you can let Tabula automatically select the areas it thinks are tables, or you can select the areas you want manually.
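
Before fixing anything, it helps to know where the export went wrong. The snippet below is only a rough sketch, not the script used for this project: it assumes the CSV has already been exported from Tabula, and the function name `find_ragged_rows` and the sample data are made up for illustration. It flags every row whose column count differs from the most common one, which is one simple way to spot cells that were dropped or split during extraction.

```python
import csv
import io
from collections import Counter


def find_ragged_rows(csv_text):
    """Return (expected_width, [(row_no, row), ...]) for rows whose column
    count differs from the most common column count in the file."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Ignore completely empty lines when deciding what "normal" looks like.
    widths = Counter(len(row) for row in rows if row)
    if not widths:
        return 0, []
    expected = widths.most_common(1)[0][0]
    ragged = [
        (row_no, row)
        for row_no, row in enumerate(rows, start=1)
        if row and len(row) != expected
    ]
    return expected, ragged


# Made-up sample standing in for a Tabula export; not real lease data.
sample = (
    "Tract,Owner,Acres\n"
    "1,Smith,40\n"
    "2,Jones\n"            # a cell went missing during extraction
    "3,Brown,80,extra\n"   # a cell was split in two
)

expected, ragged = find_ragged_rows(sample)
print("expected columns:", expected)
for row_no, row in ragged:
    print("check row", row_no, row)
```

For a real export you would read the file's contents from disk and pass the text in. On a document with thousands of pages, a short list of suspect rows is far easier to review than the whole file.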


Tabula allows you to select a table area and use it on one or all pages of the document

The ability to customize the areas you want to pull text from makes for an easy data collection step and allows Tabula to be used across a wide variety of documents. There are a number of ways to pull PDF data using Python, such as pdfminer and PyPDF2, but I found Tabula to be faster and more flexible. I want to reemphasize that there were a number of formatting errors using Tabula; however, fixing those actually became a great learning tool later on.
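
To give a feel for what that kind of cleanup can look like, here is a minimal sketch using only Python's standard `csv` module. The specific problems it handles (stray whitespace, blank rows, and a header row repeated partway through the file) are assumptions about typical extraction errors, not a list of what Tabula actually produced here, and `clean_rows` and the sample data are hypothetical.

```python
import csv
import io


def clean_rows(csv_text):
    """Return (header, data_rows) with whitespace trimmed, blank rows
    dropped, and repeated copies of the header removed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = None
    data = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows that are empty or only whitespace
        if header is None:
            header = cells  # first non-blank row is treated as the header
            continue
        if cells == header:
            continue  # header repeated, e.g. once per PDF page (assumed)
        data.append(cells)
    return header, data


# Made-up sample standing in for a messy export; not real lease data.
sample = (
    "Tract, Owner ,Acres\n"
    "1, Smith ,40\n"
    ",,\n"
    "Tract,Owner,Acres\n"
    "2,Jones, 80\n"
)

header, data = clean_rows(sample)
print(header)
for row in data:
    print(row)
```

Once the rows are clean lists of strings like this, they can be written back out with `csv.writer` or handed to an Excel library such as openpyxl.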

I’m going to break now because I’m about to go on a long-awaited, hopefully well-deserved vacation. I hope to find the time while I’m relaxing to complete the next part of this series, but I want to leave behind a few thoughts.

When I started learning to program, I came at it from several directions at once: teaching myself, taking traditional CS classes at school, and devouring every online tutorial I could find. I found the heavy emphasis on web development hard to get past.

All I initially wanted was to make my life easier by automating some tasks, and almost every resource I could find was dedicated to web development. While it’s incredibly important for any software developer to understand web development, I think that it scares off a large number of would-be programmers because of how complex it can be.

I think that if more programmers were able to start out doing something like the task I’m writing about now and then ease their way into web development, more might stick with it. Just my opinion.


See you after the vacation.

Discover and read more posts from Michael Porter