
How do I add intermediate sum columns to a pandas dataframe?

Martin Czygan
May 12, 2015
<p>Create an initial frame:</p>
<pre><code>&gt;&gt;&gt; import pandas as pd
&gt;&gt;&gt; df = pd.DataFrame(['A', 'C', 'D', 'D', 'A', 'B'], columns=['Winner'])
</code></pre>
<p>We will use the unique values of the <code>Winner</code> column as column names, so stash them:</p>
<pre><code>&gt;&gt;&gt; names = ('A', 'B', 'C', 'D')  # or: sorted(df["Winner"].unique().tolist())
</code></pre>
<p>Derive the "win" event frame, with one row per game and a 1 in the winner's column:</p>
<pre><code>&gt;&gt;&gt; events = pd.DataFrame([[int(i == j) for i in names] for j in df["Winner"]], columns=names)
</code></pre>
<p>The <code>events</code> frame looks like this:</p>
<pre><code>&gt;&gt;&gt; events
   A  B  C  D
0  1  0  0  0
1  0  0  1  0
2  0  0  0  1
3  0  0  0  1
4  1  0  0  0
5  0  1  0  0
</code></pre>
<p>Now we can use pandas' <a href="http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.cumsum.html" rel="nofollow">cumulative sum function</a>, which turns each column into a running total of wins:</p>
<pre><code>&gt;&gt;&gt; events.cumsum()
   A  B  C  D
0  1  0  0  0
1  1  0  1  0
2  1  0  1  1
3  1  0  1  2
4  2  0  1  2
5  2  1  1  2
</code></pre>
<p>Finally, join the running totals back onto the original frame:</p>
<pre><code>&gt;&gt;&gt; df.join(events.cumsum())
  Winner  A  B  C  D
0      A  1  0  0  0
1      C  1  0  1  0
2      D  1  0  1  1
3      D  1  0  1  2
4      A  2  0  1  2
5      B  2  1  1  2
</code></pre>
<p>This tip was originally posted on <a href="http://stackoverflow.com/questions/30095096/How%20to%20I%20add%20intermediate%20sum%20columns%20to%20a%20pandas%20dataframe?/30096059">Stack Overflow</a>.</p>
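<p>The steps above can also be collected into one short script. The sketch below is not part of the original answer: it swaps the nested list comprehension for <code>pd.get_dummies</code>, which builds the same 0/1 indicator columns, and wraps the steps in a helper named <code>add_running_counts</code>, a hypothetical name chosen for illustration. It assumes a pandas version in which <code>get_dummies</code> accepts the <code>dtype</code> argument.</p>

```python
import pandas as pd


def add_running_counts(df, column):
    """Hypothetical helper: append one running-count column per distinct value."""
    # One indicator column per distinct value of `column`;
    # dtype=int yields 0/1 integers rather than booleans.
    events = pd.get_dummies(df[column], dtype=int)
    # cumsum() turns each indicator column into a running total down the rows.
    return df.join(events.cumsum())


df = pd.DataFrame(['A', 'C', 'D', 'D', 'A', 'B'], columns=['Winner'])
result = add_running_counts(df, 'Winner')
print(result)
```

<p>Because <code>get_dummies</code> derives the column names from the data, the <code>names</code> tuple no longer has to be maintained by hand. The trade-off is that a value which never appears in <code>Winner</code> gets no column at all.</p>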
