
Calculating the mean of the data

Ray Phan
Feb 02, 2015
<p>Before we start, let's take a moment to explain what the parameter <code>trim</code> is.</p>
<p><code>trim</code> is the fraction of the data that you cut off from each end before computing what you need, assuming the data is in sorted order. With <code>trim = 0.1</code>, for example, the first 10% and the last 10% of the data are discarded, and you compute the mean over the remaining 80%. With <code>trim = 0.5</code>, you cut off everything except the very middle, which is the median. Note that this is a normalized fraction, nominally in <code>[0,1]</code>; because the same fraction is removed from both ends, a value above 0.5 would discard more data than exists. The fraction is multiplied by <code>size()</code> to determine the index at which to start accumulating, denoted by <code>low</code>, and the index at which to stop, denoted by <code>high</code>. <code>high</code> is simply <code>size() - low</code>, because the amount of data cut off on both sides must be symmetric.</p>
<p>This is sometimes called the <strong>alpha trimmed mean</strong>, and is more commonly known as the <a href="http://en.wikipedia.org/wiki/Truncated_mean" rel="nofollow"><strong>truncated mean</strong></a>. It is called the alpha trimmed mean because alpha defines the fraction cut off from the beginning and end of your sorted data. In our case, <code>alpha = trim</code>.</p>
<p>Now onto your questions.</p>
<hr>
<h1>Question #1</h1>
<p><code>*this</code> refers to the current instance of the class, which is of type <code>StatData</code>. The code is ultimately trying to access <code>items</code>, which seems to be a container of numbers of type <code>Real</code>. However, as Neil Kirk explained in his comment, and as Hi I'm Dan has said, this is a very unsafe use of <code>const_cast</code>: it casts away <code>const</code> just so that <code>items</code> can be sorted. 
This is <strong>very bad</strong>: if the object was originally declared <code>const</code>, modifying it through the cast is undefined behavior.</p>
<h1>Question #2</h1>
<p>This check ensures that you never divide by zero when calculating the mean. The divisor is <code>size() - 2*low</code>, which is positive only while <code>size()</code> is greater than <code>2*low</code>. They check to see if <code>size() &lt; 2*low</code> to ensure that the summation of your data is divided by a number <code>&gt; 0</code>, which is what we expect from the arithmetic mean. If the check fails, computing the mean is not possible, and the code should output an error.</p>
<h1>Question #3</h1>
<p>You divide by <code>size() - 2*low</code> because <code>trim</code> discards the same amount of data from the beginning and from the end. That amount is <code>low</code> elements on one side and <code>low</code> elements on the other. Note that <code>high</code> marks where accumulation stops at the upper end, and the number of elements after that point is also <code>low</code>. In total, <code>2*low</code> elements are eliminated, so you subtract that from <code>size()</code> to get the number of values that actually went into the sum.</p>
<p>This tip was originally posted on <a href="http://stackoverflow.com/questions/25335544/Calculating%20the%20mean%20of%20the%20data/25335587">Stack Overflow</a>.</p>