The cheat sheet is a handy addition to your learning, as it covers the basics, brought together in seven topics, that any beginner needs to know to get started doing data science with Python.
This step-by-step tutorial will teach you how to set up Apache Hadoop in pseudo-distributed mode on a single-node cluster.
In this tutorial, you’ll get a brief introduction to machine learning with Python and Weka, a data processing and machine learning tool. You will build a simple spam filter for emails and learn core machine learning concepts along the way.
This tutorial walks through the steps required to install Cassandra and Spark on a Debian system and get them working together via Scala.
In this analysis we will use SparkR's machine learning capabilities to try to predict property value from other variables present in the 2013 American Community Survey dataset.
This analysis uses SparkR's ability to handle large datasets to explore the 2013 American Community Survey dataset, specifically its geographical features.
In this third tutorial (see the previous one) we will introduce more advanced SparkSQL concepts with R, which you can find in the SparkR documentation, and apply them to the 2013 American Community Survey housing data. These concepts relate to data frame manipulation, including data slicing, summary statistics, and aggregations.
In this second Spark & R tutorial, we will read data into a SparkSQL data frame and take a quick look at its schema.
In this tutorial we will use the 2013 American Community Survey dataset and start up a SparkR cluster using IPython/Jupyter notebooks.
This is the third part of our tutorial on how to build a web-based wine review and recommendation system using Python technologies such as Django, Pandas, SciPy, and Scikit-learn. In this part, you will learn how to use machine learning to recommend wines to users based on their preferences.
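To give a flavour of what preference-based recommendation can look like, here is a minimal sketch in plain Python. It is not the tutorial's own method: it assumes a simple user-based collaborative filter with cosine similarity, and the `ratings` data, the user and wine names, and the `cosine` and `recommend` helpers are all hypothetical and invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (not from the tutorial): user -> {wine: score from 1 to 5}
ratings = {
    "ana": {"Rioja": 5, "Malbec": 4, "Riesling": 1},
    "ben": {"Rioja": 4, "Malbec": 5, "Syrah": 5},
    "cho": {"Riesling": 5, "Chablis": 4, "Rioja": 1},
}


def cosine(a, b):
    """Cosine similarity computed over the wines both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = sqrt(sum(a[w] ** 2 for w in shared))
    norm_b = sqrt(sum(b[w] ** 2 for w in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank wines the user has not rated by a similarity-weighted mean score."""
    mine = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with no overlap
        for wine, score in theirs.items():
            if wine in mine:
                continue  # only suggest wines the user has not rated yet
            scores[wine] = scores.get(wine, 0.0) + sim * score
            weights[wine] = weights.get(wine, 0.0) + sim
    predicted = {w: scores[w] / weights[w] for w in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


print(recommend("ana", ratings))
```

With this toy data, "ana" is most similar to "ben", so the wine that only "ben" has rated ranks first. A real system would likely lean on Pandas and Scikit-learn for the same steps and would need to handle sparse data and new users.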