Patrick B Cullinane

Mentor
5.0 (1 review)
US$15.00 for every 15 mins
Sessions/Jobs: 1
First 15 mins free for your first session
ABOUT ME
Data Scientist with experience tackling real-world business problems

I am a self-taught programmer and data scientist with experience in the finance, marketing, and construction industries. I've built machine learning pipelines using Python, Spark, and other cloud technologies. I am passionate about natural language processing, and topic modeling in particular. I've been writing Python for 3+ years, and I am deeply knowledgeable in its applications in data science and data engineering.

Eastern Time (US & Canada) (-04:00)
Joined March 2020
EXPERTISE
3 years experience | 1 endorsement
I am a self-taught programmer with experience writing Python both in Jupyter notebooks and in scripts. My specialties are data wrangling with pandas and the application of functional programming in machine learning. I am also skilled at object-oriented programming.
1 year experience
I have applied machine learning to real-world use cases professionally for a year, and for several years before that on my own. I am deeply skilled with sklearn and the various steps needed to create a viable machine learning pipeline.
1 year experience
I have worked on cloud-based Spark machine learning pipelines.
1 year experience
1 year experience
Experience working with CloudFront, EMR, S3, and other technologies.
1 year experience

REVIEWS FROM CLIENTS

5.0
(1 review)
Connor James
April 2020
Patrick is excellent, very experienced and good at explaining and resolving problems. Highly recommend.
SOCIAL PRESENCE
GitHub
predicting_stock
Jupyter Notebook
1
0
Fantasy_football
multidimensional knapsack problem
Python
1
2
EMPLOYMENTS
Data Scientist
IBM
2019-10-01 to Present
- Create cloud-based machine learning pipelines using Spark.
- Create NLP models for use in network analysis, sentiment analysis, and other business applications.
- Use optimization techniques such as linear programming to maximize business outcomes for clients.
C++
Pandas
Nltk
Docker
Python 3
NLP (Natural Language Processing)
Neural Networks
Apache Spark
Kubernetes
Data Scientist Intern
BerlandTeam
2019-06-01 to 2019-10-01
- Create machine learning pipelines for the analysis of social media data.
- Conduct marketing-survey analysis, including factor analysis, PCA, and other dimensionality reduction techniques.
Python 3
Google Cloud Platform
Business Analyst
Bond Brothers
2017-01-01 to 2019-06-01
- Analyze project budgets for schedule and cost tracking purposes.
- Create data analysis pipelines for use in cost reporting.
Python 3
PROJECTS
Contract Analyzer
self
2019
The purpose of this project is to take a legal document, such as a contract, model its topics, and create a pipeline that tags parts of the document with a relevant label. This notebook focuses on the preprocessing of the data, the topic modeling, and the creation of the training set. Ultimately, the code in this repo will be useful for people who want to understand a complex legal document, such as a credit card agreement, more clearly.

The data comes from the following link: https://www.consumerfinance.gov/credit-cards/agreements/ The Consumer Financial Protection Bureau (CFPB) collects credit card agreements from creditors on a quarterly basis and posts them at the link above. The CFPB organizes the data with one directory per participating company, each containing all of that company's agreements. For Q4 of 2018 there are 652 companies, and each company has on average 2-4 agreements.

For most people, contract documents are not fun to read: they are usually written in complex legal jargon, and the style is purposely dry so as to spell out worst-case scenarios. That said, it is important to understand what you or your business is getting into before signing any sort of agreement. Because it takes a certain type of expertise to understand these documents, I feel it would be interesting to see if we can leverage natural language processing techniques to tag them.

This repo will enable you to input a credit card agreement PDF and output labeled sections of the document, making it easier to read. Please see example.ipynb for a walkthrough on how to use this repo. The notebook contract_reader.ipynb has further details on how the repo is constructed.
GitHub
Pandas
Machine Learning
Nltk
Python 3
NLP (Natural Language Processing)
Headline Analysis
self
2019
The dataset contains over 200K headlines from the Huffington Post between 2012 and 2018. It has six columns that capture the category, headline, author, link, description, and date the article was published. There are 40 different categories ranging from politics to education; the top categories are politics, wellness, and entertainment. For the purposes of this notebook we won't be using the other columns, but it is worth noting that each date may have more than one headline. More information about the data can be found below this abstract.

The goal of the notebook is to take the headline column and use topic modeling to recreate the categories. Since we already have hand-labeled category information, it will be interesting to see if our models match the ground truth data. To accomplish this we will use non-negative matrix factorization (NMF) to 1) choose the optimal number of topics and 2) associate documents and terms with those topics. NMF is explained in further detail below, but basically it decomposes a document-term matrix into two non-negative factors from which you can read off document-topic and topic-term associations.

The project will happen in multiple stages:
1) preprocess the text
2) create a document-term matrix using tf-idf
3) create the NMF model using the document-term matrix
4) select the optimal number of topics using word2vec and calculate topic coherence
5) based on the optimal k topics, print out the top terms and documents and compare them to the original labels for accuracy
Python
Pandas
Machine Learning
Nltk
NLP (Natural Language Processing)