Josiah Berkebile

Mentor
Rising Codementor
US$10.00 per 15 minutes
5 sessions/jobs
ABOUT ME
Hadoop Big Data Cloud Engineer

I have specialized in Big Data technologies, especially Hadoop technologies like Apache Spark, Flume, HBase, HDFS, Hive LLAP, and Impala. This career has led me into developing applications that implement machine learning models, predictive algorithms, and NLP algorithms, and that ingest large datasets. I'm very well versed in concurrent and parallel programming and am equally comfortable with object-oriented and functional programming approaches.

I really love teaching people and sharing my knowledge. I promise that in the time that I spend mentoring you, I will pour into you as much of my knowledge as I can to give you the best chance possible in the industry.

English
Central Time (US & Canada) (-05:00)
Joined October 2016
EXPERTISE
4 years experience
I worked on a fast-paced, goal-oriented Ruby on Rails Agile web app development team at a healthcare IT company for 6-9 months, and then single-handedly maintained a large Ruby on Rails app for a large financial institution for about 2 years. I've also done a few hackathons that involved building a web app for a non-profit in less than 24 hours; for 2 of the 3 hackathons, I organized and led the team. Here's the end result of some of those challenges: http://principalsconnect.com/ http://adoption.kvc.org/
Software Architects
RVM
Ruby on Rails
8 years experience
I have 4 years of extensive experience writing Java applications that run in a Hadoop environment. I've developed data pipelines in Apache Crunch, created custom Flume Clients and Flume Sources that plug in to running Flume Agents via the Flume API, and written MapReduce algorithms that process data at the petabyte scale. Some of these Java applications ran as services and used Jersey to expose a web API and Solr to index data sitting in NoSQL storage. Spring is fairly heavyweight, so the various Hadoop engineering teams I've been on have chosen Dagger as the dependency injection framework for our applications, since it's faster and more lightweight.
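As a rough illustration of the MapReduce pattern mentioned above, here is a minimal single-process Python sketch with toy data (real Hadoop jobs run the map and reduce phases distributed across a cluster, but the shape of the computation is the same):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) pairs, as a Hadoop mapper would
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "jumps over the lazy dog"]
print(reduce_phase(map_phase(lines)))  # {'the': 2, 'quick': 1, ...}
```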
4 years experience
I have four years of experience using Scala in Hadoop and data engineering environments. I've written a LOT of Scala. Some of the Flume plugins I've written have been in Scala. I've written a Flume Client that used Scalatra to listen for events via webhooks and then distributed them to an array of Flume Clients using the Akka concurrency framework. I've extended and refactored a Scala Akka web scraper, and I've leveraged ScalaTest and ScalaCheck in practically all of my Scala projects. In my latest projects, I've used the Typelevel/Cats library to take advantage of the categorical types it provides (which facilitates a more Haskell-like style of code architecture). I've also written a few Spark applications in Scala: some of my smaller projects have involved implementing PageRank and collaborative filtering algorithms, and the most ambitious project I've done on Spark was an NLP classifier. I have a decent amount of experience architecting, profiling, and tuning Spark applications. All of the applications I've written in Scala have been highly concurrent; some have leveraged Software Transactional Memory libraries like ScalaSTM to simplify multi-threaded interactions with shared memory.
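To illustrate the PageRank algorithm mentioned above, here is a minimal single-machine Python sketch (the link graph and constants are toy values; a Spark version distributes the same per-iteration rank exchange across partitions):

```python
def pagerank(links, iterations=20, d=0.85):
    """Iterative PageRank over a {page: [outbound links]} dict."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # Each page starts with the damping base, then collects
        # shares of rank from the pages that link to it.
        new_rank = {page: (1 - d) / n for page in links}
        for page, outs in links.items():
            for target in outs:
                new_rank[target] += d * rank[page] / len(outs)
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "c" ends up with the highest rank
```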
Apache Hadoop
Apache Spark
StanfordNLP
8 years experience
Python is a versatile language, and my experience with it is equally versatile. I've used Python for data exploration with NumPy and Pandas, translated Python code from data scientists into PySpark or Scala Spark applications, written systems automation scripts, and written scripts for moving or ingesting data between databases and systems at the petabyte scale. Most of my experience is in Python 2.7, but I'm also familiar with legacy Python 2.6 as well as Python 3.
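A tiny example of the kind of NumPy/Pandas exploration described above (the toy DataFrame and its column names are made up purely for illustration):

```python
import numpy as np
import pandas as pd

# Toy per-event data standing in for a much larger ingested dataset
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "bytes": [120, 80, 300, 150, 50],
})

# Typical first pass: aggregate per key, then summarize the distribution
per_user = df.groupby("user")["bytes"].agg(["sum", "mean"])
print(per_user)
print("p95 of event size:", np.percentile(df["bytes"], 95))
```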
2 years experience
I used Haskell as my vehicle for learning functional programming. Since Haskell is a pure functional language, it would not let me fall back on my old procedural, object-oriented habits. I succeeded in learning enough Haskell on my own to become productive in the language, which enabled me to pick up Scala more easily than most other members of the engineering teams I've been a part of. If you are learning functional programming or just getting started with Haskell, I can certainly get you to a level where you will be productive in the language.

REVIEWS FROM CLIENTS

Josiah's profile has been carefully vetted and approved as a Codementor. Connect with Josiah now, and leave a review for them once you're done!
EMPLOYMENT
Senior Data Engineer
Pinsight Media
April 2018 - Present
Mostly writing Spark processing pipelines on very large (1/2 petabyte or more) datasets.
Scala
Linux
Shell
Pandas
Apache Spark
Apache Kafka
Apache Hadoop
Apache Airflow
Hadoop Architect
Triple-I Corporation | AMC Theatres
May 2017 - April 2018
I played a lead role in getting AMC Theatres' Big Data initiative off the ground. Responsibilities and accomplishments:
- Extended a Spark sentiment analyzer written in Scala using Stanford CoreNLP to analyze complex customer feedback
- Wrote a custom Flume Source plugin in Java and Scala + Cats for ingesting a vendor's real-time HTTPS event stream
- Used Scala, Akka, Scalatra, and Cats to develop an HTTP-based custom Flume Client
- Co-administered a CDH 5 (Cloudera) cluster
- Trained peers/engineers in Hadoop software development and Scala programming
- Advised on development process and workflow
- Conducted exploratory research and generated project ideas
- Developed new solutions/apps leveraging Hadoop technologies including Flume, Spark, Impala, Hive, and HBase
- Deployed new Hadoop apps and plugins to a Kerberized CDH 5 cluster
- Rigged applications to execute through SysVinit, Upstart, or systemd
- Rigged system-initiated applications to auto-authenticate to Kerberos using keytabs
- Served as automation engineer and advisor
- Applied Haskell-style functional programming in Scala using Cats
- Imported deeply nested JSON files into Hive and Impala and flattened them into a traditional SQL table structure
- Wrote real-time data-ingestion-to-HDFS apps using Linux shell scripting, Python, Java, and Scala
- Created a Docker CDH 5 development sandbox for prototyping
Java
Scala
Pandas
Machine Learning
NLP (Natural Language Processing)
Apache Spark
Apache Hadoop
Apache flume
Python 2
Hadoop Architect
Oalva, Inc
October 2016 - April 2017
Using Python, I automated full SQL translation of databases containing thousands of schemas and over 100,000 queries; the translator converted SQL between multiple data stores such as Teradata, Hive LLAP, and HAWQ/Greenplum. I ran performance profiling, tuning, and analysis on Hive LLAP and HAWQ/HDP/Greenplum databases. I wrote a data comparator using Python and Pandas that reported the differences between data sets and the reason for each difference (precision error vs. incorrect value vs. NULL value, etc.). I also wrote a concurrent Python script for automating the transfer of data from Hive into both Netezza and HBase; data extraction and loading were both done using dynamic concurrent processing in Python. Finally, I built a proof of concept running Spark SQL on top of Hive LLAP.
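The data comparator described above might be sketched roughly like this, assuming two pre-aligned columns (the label names and tolerance are illustrative, not the production tool's):

```python
import pandas as pd

def classify_diffs(expected, actual, tol=1e-6):
    """Compare two aligned Series and label why each pair of values differs.

    Labels are illustrative: match, null_mismatch,
    precision_error (floats within tol), or incorrect_value.
    """
    labels = []
    for e, a in zip(expected, actual):
        if pd.isna(e) and pd.isna(a):
            labels.append("match")
        elif pd.isna(e) or pd.isna(a):
            labels.append("null_mismatch")
        elif e == a:
            labels.append("match")
        elif isinstance(e, float) and isinstance(a, float) and abs(e - a) <= tol:
            labels.append("precision_error")
        else:
            labels.append("incorrect_value")
    return labels

expected = pd.Series([1.0, 2.0, None, 4.0])
actual   = pd.Series([1.0, 2.0000000001, 3.0, 5.0])
print(classify_diffs(expected, actual))
```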
SQL
Pandas
HBase
Teradata
Systems Programming
Netezza
Apache hawq
Hive llap
Python 2
Apache Hive
PROJECTS
Overnight Website Challenge
KVC Health Systems, The Nerdery
2014
Built a new website for KVC Health Systems in 24 hours.
HTML/CSS
Ruby on Rails
PostgreSQL
Heroku
JavaScript
Overnight Website Challenge
PrincipalsConnect
2017
Built a website in 24 hours for PrincipalsConnect
HTML/CSS
Ruby on Rails
PostgreSQL
Heroku
Continuous Integration
Docker
React
JavaScript
Continuous Deployment
Redux