× {{alert.msg}} Never ask again
Get notified about new tutorials RECEIVE NEW TUTORIALS

BigQuery count distinct ColX is not equal to count of group by ColX

Marton
Nov 18, 2015
<p>If you use the DISTINCT keyword, the function returns the number of distinct values for the specified field. <strong>Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact - the documentation is also clear about this.</strong></p> <p>If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.</p> <p>To compute the exact number of distinct values, use <code>EXACT_COUNT_DISTINCT</code>. Or, for a more scalable approach, consider using GROUP BY on the relevant field(s) and then applying COUNT(*). The GROUP BY approach is more scalable but might incur a slight up-front performance penalty.</p> <p><a href="https://cloud.google.com/bigquery/query-reference#aggfunctions" rel="nofollow">https://cloud.google.com/bigquery/query-reference#aggfunctions</a></p> <p>This tip was originally posted on <a href="http://stackoverflow.com/questions/32431490/BigQuery%20count%20distinct%20ColX%20is%20not%20equal%20to%20count%20of%20group%20by%20ColX/32432994">Stack Overflow</a>.</p>
comments powered by Disqus