
Counting number of occurrences on Pandas DataFrame columns

Gustavo Bragança
Aug 27, 2016
<p><a href="https://upload.wikimedia.org/wikipedia/commons/4/49/Veronica_Belmont_interviewing.jpg"><img alt="interview" class="right" src="https://upload.wikimedia.org/wikipedia/commons/4/49/Veronica_Belmont_interviewing.jpg" style="height:338px; margin-left:100px; margin-right:100px; width:512px"></a></p>
<p>Let's suppose you interviewed 12 people and asked them whether they sleep and whether they are hungry. You then loaded the table into Python as a <a href="http://pandas.pydata.org/">Pandas</a> <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html">DataFrame</a>, and it looks like the one below.</p>
<table border="1">
<thead>
<tr> <th> </th> <th>ID</th> <th>Do you sleep?</th> <th>Are you hungry?</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>1</td> <td>Yes</td> <td>No</td> </tr>
<tr> <th>1</th> <td>2</td> <td>Yes</td> <td>Maybe</td> </tr>
<tr> <th>2</th> <td>3</td> <td>No</td> <td>Yes</td> </tr>
<tr> <th>3</th> <td>4</td> <td>Yes</td> <td>No</td> </tr>
<tr> <th>4</th> <td>5</td> <td>No</td> <td>Yes</td> </tr>
<tr> <th>5</th> <td>6</td> <td>No</td> <td>Maybe</td> </tr>
<tr> <th>6</th> <td>7</td> <td>Yes</td> <td>Yes</td> </tr>
<tr> <th>7</th> <td>8</td> <td>No</td> <td>No</td> </tr>
<tr> <th>8</th> <td>9</td> <td>Yes</td> <td>Maybe</td> </tr>
<tr> <th>9</th> <td>10</td> <td>Yes</td> <td>Yes</td> </tr>
<tr> <th>10</th> <td>11</td> <td>Maybe</td> <td>Maybe</td> </tr>
<tr> <th>11</th> <td>12</td> <td>Maybe</td> <td>Yes</td> </tr>
</tbody>
</table>
<p>Now you want to count how many times each answer was given to each question. This takes two steps:</p>
<ol> <li>We first <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html">melt</a> the data, which stacks the two question columns into one long table with a row per (question, answer) pair: <pre>import pandas as pd

# We don't need the IDs, so we skip that column.
melted_data = pd.melt(data, value_vars=['Do you sleep?', 'Are you hungry?'],
                      var_name='question', value_name='answer')
</pre> </li> <li>Then we group the melted rows by question and answer with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html">groupby</a> and count the rows in each group: <pre>melted_data.groupby(by=['question', 'answer'])['answer'].count()</pre> </li> </ol>
<p>And we get:</p>
<pre>question         answer
Are you hungry?  Maybe     4
                 No        3
                 Yes       5
Do you sleep?    Maybe     2
                 No        4
                 Yes       6
Name: answer, dtype: int64</pre>
<p>Very cool, right?</p>
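<p>To try the two steps end to end, here is a minimal sketch. It assumes the table is rebuilt by hand with <code>pd.DataFrame</code> (the post does not show how <code>data</code> was loaded), with the values copied from the table above; the name <code>counts</code> is only for illustration.</p>

```python
import pandas as pd

# Assumption: the survey table is typed in by hand; the original post
# does not show how `data` was actually loaded.
data = pd.DataFrame({
    "ID": list(range(1, 13)),
    "Do you sleep?": ["Yes", "Yes", "No", "Yes", "No", "No",
                      "Yes", "No", "Yes", "Yes", "Maybe", "Maybe"],
    "Are you hungry?": ["No", "Maybe", "Yes", "No", "Yes", "Maybe",
                        "Yes", "No", "Maybe", "Yes", "Maybe", "Yes"],
})

# Step 1: melt the two question columns into long format.
# Each of the 12 people contributes one row per question (24 rows in total).
melted_data = pd.melt(
    data,
    value_vars=["Do you sleep?", "Are you hungry?"],
    var_name="question",
    value_name="answer",
)

# Step 2: group by (question, answer) and count the rows in each group.
counts = melted_data.groupby(by=["question", "answer"])["answer"].count()
print(counts)
```

<p>The result is a Series indexed by (question, answer), so a single count can be read with something like <code>counts.loc[("Do you sleep?", "Yes")]</code>.</p>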