Codementor Events

How and why I built a Data Submission Platform

Published Jul 04, 2018

About me

I have worked for Deloitte Consulting for a number of years, primarily in the public sector/education space.

The problem I wanted to solve

You've got 5,600 client sites. They're spread across an area of roughly 65,000 square miles (about the size of Wisconsin). They have a mix of connectivity ranging from zero to full fibre.

Each of these sites needs to send you a file of about 100 MB each month, every month, and they need it to be:

a) fast
b) reportable
c) repeatable
d) easy to use

This all needs to be loaded into a GIANT SQL Warehouse with a maximum time of 5 minutes per submission (from upload to import).

What is Data Submission Platform?

I built a series of tools:

  1. Submission - handles auth, uploads and verification
  2. Governor - data checks, validation, rejection and queuing
  3. Uploader - fast bulk import into data warehouse
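At a high level, the three tools form a pipeline: Submission hands verified uploads to Governor, which queues validated files for Uploader. A minimal sketch of that flow (in Python for illustration; all function names, checks and limits here are hypothetical, not the actual implementation):

```python
# Hypothetical sketch of the three-stage pipeline described above.

def submission(file_bytes: bytes, user: str) -> dict:
    """Submission: handles auth, upload and basic verification."""
    if not user:
        raise PermissionError("not authenticated")
    return {"user": user, "size": len(file_bytes), "payload": file_bytes}

def governor(item: dict, max_size: int = 100 * 1024 * 1024) -> dict:
    """Governor: data checks, validation, rejection and queuing."""
    if item["size"] > max_size:
        raise ValueError("file too large - rejected")
    item["status"] = "queued"
    return item

def uploader(item: dict) -> str:
    """Uploader: bulk import into the warehouse (stubbed out here)."""
    return f"imported {item['size']} bytes for {item['user']}"

result = uploader(governor(submission(b"row1,row2", "site-0001")))
```

Each stage only needs the previous stage's output, which is what made it possible to split them into three separate applications later.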

Tech stack

PHP/MySQL

C#/ASP.net & MS SQL

The process of building Data Submission Platform

First, the PHP/MySQL submission system was created; this was the initial blueprint for the whole platform.

Next, I created the Upload Tool, also running on PHP/MySQL. It used the CodeIgniter framework and several custom PHP modules that I wrote to fit the purpose. Authentication was handled by a custom auth module.

Once this proved too slow, I changed the entire system by:

  1. Splitting its functions into three separate applications (Submission, Governor, Uploader)
  2. Changing the tech stack to C#/ASP.net/MSSQL
  3. Reworking the Upload Tool heavily to process submissions in parallel and asynchronously
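The parallel rework in step 3 can be sketched like this (in Python rather than the C# the real system uses; the worker count and timings are purely illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_submission(name: str) -> str:
    # Stand-in for the real import work; the sleep simulates
    # I/O-bound processing of one uploaded file.
    time.sleep(0.1)
    return f"{name}: done"

files = [f"site-{i:04d}.csv" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_submission, files))
elapsed = time.perf_counter() - start
# Four workers handle eight 0.1 s jobs in roughly 0.2 s of wall time
# instead of the 0.8 s a sequential loop would take.
```

The same idea scales to however many concurrent imports the warehouse can tolerate.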

Challenges I faced

My initial build did everything in one application: a PHP/MySQL monstrosity that required constant updating and monitoring. It also violated the primary requirement: speed. When 5,600 people all try to upload 100 MB at the same time, you have new and exciting problems.

Then, when 5,600 100 MB files land on your server and you have to process them as quickly as possible, you realise that a speed of 5 minutes per submission means roughly 19 days of solid processing. That's not going to work.
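The back-of-the-envelope arithmetic behind those numbers, as a quick sketch:

```python
# 5,600 sites, each taking 5 minutes to process sequentially.
sites = 5600
minutes_per_submission = 5

total_minutes = sites * minutes_per_submission       # 28,000 minutes
sequential_days = total_minutes / 60 / 24            # ~19.4 days

# After the rework, ~30 seconds per submission (still before
# counting any parallelism, which cuts wall time further).
seconds_after_rework = 30
reworked_days = sites * seconds_after_rework / 3600 / 24  # ~1.9 days
```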

Completely overhauling and rewriting the entire system brought the time down to roughly 30 seconds per submission. Much healthier.

Key learnings

  1. Never stop improving, never stop recoding
  2. Kill your darlings - just because you like a function or method doesn't mean it should live.
  3. In some cases, speed is everything. It doesn't matter if the UI isn't as pretty as you'd like; make it functional, make it efficient.
  4. As mammoth as it may seem, don't be afraid to completely change your tech stack if it's necessary.

Tips and advice

  1. Bulk Insert is your friend - for bulk transactions into SQL Server, it's great.
  2. Parallel processing is even more your friend. If one submission takes 60 s, two workers can handle two in those same 60 s: an effective 30 s each.
  3. Document! Document everything. All the time. If I hadn't documented my initial system properly, my time to pivot would have been much longer.
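As a sketch of tip 1, a T-SQL `BULK INSERT` statement can be generated per uploaded file. The table name, file path and options below are hypothetical, not taken from the real system:

```python
def bulk_insert_sql(table: str, file_path: str) -> str:
    # Build a T-SQL BULK INSERT statement for one uploaded file.
    # TABLOCK and a large BATCHSIZE generally help throughput on
    # SQL Server; the delimiters and values here are illustrative.
    return (
        f"BULK INSERT {table} "
        f"FROM '{file_path}' "
        "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', "
        "TABLOCK, BATCHSIZE = 50000);"
    )

stmt = bulk_insert_sql("dbo.Submissions", r"D:\uploads\site-0001.csv")
```

In practice the statement would be executed against SQL Server by the Uploader once Governor has queued the file.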

Final thoughts and next steps

I now need to build a really robust reporting framework around this system and retool some of the current processes into Node/React. Extra fun.

Discover and read more posts from Rob Valentine