Pandas is a data analysis library written in Python. In this post, I will show how it can help you quickly get insights from different datasets.
We will install pyenv first. pyenv is a convenient tool if you want to use multiple Python versions on your laptop.
$ brew install pyenv
Install python 3 and pandas
We use Python 3 here.
$ pyenv install 3.5.0
$ pyenv global 3.5.0
$ pyenv rehash
$ pip install pandas
First, we need to import the pandas library. Create a
demo.py file and add the line below.
import pandas as pd
You can read a JSON file using
pd.read_json. It stores the data in a DataFrame, which you can think of as an in-memory table with labeled rows and columns.
# read data from json and store in a DataFrame
user_df = pd.read_json('users.json')
# show the first 5 rows
user_df.head()
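To try pd.read_json without a users.json file on disk, here is a minimal sketch that parses an in-memory JSON string instead. The sample records and the name column are made up for illustration; only the email column is implied by this post, since it is used later for the join.

```python
import io

import pandas as pd

# Hypothetical sample data; the real users.json may have different fields.
users_json = """
[
  {"email": "amy@example.com", "name": "Amy"},
  {"email": "bob@example.com", "name": "Bob"},
  {"email": "cat@example.com", "name": "Cat"}
]
"""

# read_json accepts a path or a file-like object; StringIO stands in for a file here.
user_df = pd.read_json(io.StringIO(users_json))

# head(n) returns the first n rows (5 when n is omitted).
print(user_df.head(2))
print(user_df.shape)
```

Each JSON object becomes one row, and each key becomes a column.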
Loading CSV data is basically the same operation as above, just with a different file format. pandas supports many file formats, such as JSON, CSV, and Excel.
quiz_df = pd.read_csv('quiz.csv')
quiz_df.head()
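The same idea works for CSV. The sketch below reads a small in-memory CSV so it runs without quiz.csv. The rows are invented for illustration, and the column names are the ones used elsewhere in this post.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real quiz.csv likely has more columns.
quiz_csv = """email,years,familiar language
amy@example.com,3,python
bob@example.com,7,java
dan@example.com,5,python
"""

# The first line of the CSV is used as the header row by default.
quiz_df = pd.read_csv(io.StringIO(quiz_csv))

print(quiz_df.head())
# dtypes shows the type pandas inferred for each column.
print(quiz_df.dtypes)
```

Checking dtypes right after loading is a cheap way to confirm that numeric columns such as years were not parsed as strings.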
Now we can start looking for insights in the data. First, let's find the maximum value of years in the quiz data.
# find max year in quiz data
max_years = quiz_df['years'].max()
print(max_years)
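Next, let's get the rows that have the max year. pandas uses a boolean mask to filter data: comparing a column to a value gives one True or False per row, and indexing the DataFrame with that result keeps only the True rows. A boolean mask is a powerful tool when you want to query data with complicated conditions.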
quiz_df['years'] == max_years
quiz_df[quiz_df['years'] == max_years]
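As a rough sketch of what a more complicated condition can look like, the example below combines two masks. The data is made up for illustration, and the threshold of 5 years is an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample data using the column names from this post.
quiz_df = pd.DataFrame({
    "email": ["amy@example.com", "bob@example.com",
              "dan@example.com", "eve@example.com"],
    "years": [3, 7, 5, 7],
    "familiar language": ["python", "java", "python", "python"],
})

# A comparison on a column yields a boolean Series, one value per row.
is_max = quiz_df["years"] == quiz_df["years"].max()
print(is_max.tolist())

# Combine masks with & (and), | (or), ~ (not).
# Each condition needs its own parentheses because & binds tighter than >= and ==.
experienced_python = quiz_df[
    (quiz_df["years"] >= 5) & (quiz_df["familiar language"] == "python")
]
print(experienced_python)
```

Python's and / or keywords do not work on a Series, which is why the element-wise operators are used here.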
# aggregate average years in quiz data
mean_years = quiz_df['years'].mean()
print(mean_years)
#%%
# aggregate familiar language count
result = quiz_df['familiar language'].value_counts()
print(result)
#%%
# find users of the most popular language
# value_counts() sorts by count in descending order, so the first label is the most popular
popular_language = result.index[0]
quiz_user_with_popular_language = quiz_df[quiz_df['familiar language'] == popular_language]
print(quiz_user_with_popular_language)
#%%
# join quiz with user using right join
quiz_with_user = pd.merge(user_df, quiz_df, how='right', left_on='email', right_on='email')
print(quiz_with_user)
#%%
# drop rows with missing user data
result = quiz_with_user.dropna()
print(result)
#%%
# find users willing to use code editor
result = result[result['will you want to use code editor'] == 'T']
print(result)
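To make the join and dropna steps easier to follow, here is a small self-contained sketch with invented data. The name column and the sample rows are assumptions. The indicator=True and subset= arguments are standard pandas options, not something the code above uses.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from this post.
user_df = pd.DataFrame({
    "email": ["amy@example.com", "bob@example.com"],
    "name": ["Amy", "Bob"],
})
quiz_df = pd.DataFrame({
    "email": ["amy@example.com", "bob@example.com", "dan@example.com"],
    "years": [3, 7, 5],
    "familiar language": ["python", "java", "python"],
})

# value_counts sorts by count in descending order, so the first label is the most common.
counts = quiz_df["familiar language"].value_counts()
popular_language = counts.index[0]
print(popular_language)

# A right join keeps every quiz row; quiz rows without a matching user get NaN
# in the user columns. indicator=True adds a _merge column showing the row's origin.
quiz_with_user = pd.merge(user_df, quiz_df, how="right", on="email", indicator=True)
print(quiz_with_user[["email", "name", "_merge"]])

# dropna(subset=...) drops only rows that are missing the listed columns.
matched = quiz_with_user.dropna(subset=["name"])
print(matched["email"].tolist())
```

A plain dropna() removes a row if any column is missing, so passing subset is a safer choice when the quiz data itself may contain blank answers.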