Data Science Tutorials and Insights

Learn about the latest trends in Data Science. Read tutorials, posts, and insights from top Data Science experts and developers for free.


Data Science tutorials, posts, and more

Cheat Sheet: Python For Data Science

This cheat sheet is a handy addition to your learning: it covers the basics that any beginner needs to get started with data science in Python, organized into seven topics.

How to Set Up Hadoop in Pseudo Distributed Mode on Single Cluster

This step-by-step tutorial will teach you how to set up Apache Hadoop in Pseudo-Distributed Mode on a single cluster.

Spark & Python: MLlib Basic Statistics & Exploratory Data Analysis

In this Spark and Python tutorial, you'll learn more about MLlib basic statistics and exploratory data analysis.

Data Science with Python & R: Data Frames II

This tutorial continues the series on learning data science with Python & R. In this part, you will learn about data selection and function mapping.

Spark & R: Downloading data and Starting with SparkR using Jupyter notebooks

In this tutorial, we will use the 2013 American Community Survey dataset and start up a SparkR cluster using IPython/Jupyter notebooks.

Data Science with Python & R: Data Frames I

This series of tutorials on Data Science compares how different concepts in the discipline can be implemented in the two dominant ecosystems today: R and Python.

Data Science with Python & R: Exploratory Data Analysis

In this article, we will take an exploratory look at the crucial steps of the data analytics process in Python and R.

Spark & Python: Working with RDDs (II)

This is a Spark and Python tutorial that teaches you how to work with RDDs (Part II).

Spark & Python: MLlib Decision Trees

In this tutorial, you'll learn how to use Spark's machine learning library MLlib to build a Decision Tree classifier for network attack detection, using the complete datasets to test Spark's capabilities with large datasets.

Adding Flow Control to Apache Pig using Python

So you like Pig, but it's cramping your style? Are you not sure what Pig is about? Are you keen to write some code that writes code for you? If so, then this is for you.

Spark & Python: MLlib Logistic Regression

In this tutorial, you will learn how to use Spark's machine learning library MLlib to build a Logistic Regression classifier for network attack detection.

Spark & Python: Working with RDDs (I)

This tutorial introduces two different ways of getting data into Spark's basic data structure, the RDD.

Building a Movie Recommendation Service with Apache Spark & Flask - Part 2

This Apache Spark tutorial goes into detail on how to use Spark machine learning models, or other kinds of data analytics objects, within a web service. Python makes this task very easy, thanks to Spark's own Python capabilities and to Python-based frameworks such as Flask.

Building a Movie Recommendation Service with Apache Spark & Flask - Part 1

Spark & Python: SQL & DataFrames

This tutorial will introduce you to Spark's capabilities. Using SQL and data frames, you can easily perform exploratory data analysis.

Building Web Data Products with R & Shiny

In this tutorial, we will introduce Shiny, a web development framework and application server for the R language. In simple terms, Shiny turns data analyses into interactive web apps.

Data Science with Python & R: Sentiment Classification Using Linear Methods

In this tutorial, you'll learn how to perform sentiment classification using linear methods with Python and R.

Data Science with Python & R: Dimensionality Reduction and Clustering

An important step in data analysis is data exploration and representation. In this tutorial, we will see how combining Principal Component Analysis (PCA) with clustering lets us represent higher-dimensional data in a two-dimensional space while, at the same time, grouping that data into clusters of similar points and finding hidden relationships in it.

Spark & R: Loading Data into SparkSQL Data Frames

In this second Spark & R tutorial, we will read data into a SparkSQL data frame as well as have a quick look at the schema.

Building Data Products with Python: Adding User Management to a Django website

This is the second tutorial in our series on how to build data products with Python. In it, we will add user management, an important step: once we are able to identify individual users, we will be ready to generate user recommendations through machine learning.