Codementor Events

Why doesn't my application scale?

Published Mar 21, 2018

Writing applications that scale as demand increases is critical to their success in production. Whether the growth comes from user requests, computing operations, or data volume, developers must ensure their applications keep performing.
I have developed business applications for many years, but only recently took an interest in the measured performance of algorithm implementations. In this article I look at how mergesort's performance varies depending on whether it is implemented with .NET System.Collections objects (e.g. List<T>) or with primitive types (e.g. int[]).

Simulation
I ran a Monte Carlo simulation over sets of random data to obtain the average running time of each implementation, starting with 100 items and going all the way up to 10,000,000.
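The simulation harness itself is not shown in the post; a minimal sketch of how such an average could be measured (the Benchmark class and AverageMillis method names here are my own, not the original code) might look like this:

```csharp
// Hypothetical timing harness: averages the running time of a sort
// routine over several random inputs of a given size, in the spirit
// of the Monte Carlo simulation described above.
using System;
using System.Diagnostics;

public static class Benchmark
{
    public static double AverageMillis(Action<int[]> sort, int size, int trials)
    {
        var rng = new Random(42);           // fixed seed for repeatability
        double totalMs = 0;
        for (int t = 0; t < trials; t++)
        {
            var data = new int[size];
            for (int i = 0; i < size; i++) data[i] = rng.Next();

            var sw = Stopwatch.StartNew();
            sort(data);
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / trials;
    }

    public static void Main()
    {
        foreach (int size in new[] { 100, 10_000, 1_000_000 })
            Console.WriteLine($"{size,9} items: {AverageMillis(Array.Sort, size, 5):F2} ms");
    }
}
```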
Implementation A (credit to Daniel Knight) uses two List<int> objects to store the data while it is being sorted and merged. Here is a small snippet of the code:
[Code snippet image: Mergesort using Collections]
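Since the snippet above survives only as an image, here is a minimal sketch of a List<int>-based mergesort in the same spirit (my reconstruction, not Daniel Knight's actual code):

```csharp
// Sketch of a List<T>-based mergesort. Note that RemoveAt(0) shifts
// every remaining element down by one slot, which is linear work
// hidden inside what looks like a simple "take the next item" step.
using System;
using System.Collections.Generic;

public static class MergeSortLists
{
    public static List<int> Sort(List<int> items)
    {
        if (items.Count <= 1) return items;

        int mid = items.Count / 2;
        var left  = Sort(items.GetRange(0, mid));
        var right = Sort(items.GetRange(mid, items.Count - mid));
        return Merge(left, right);
    }

    static List<int> Merge(List<int> left, List<int> right)
    {
        var merged = new List<int>();
        while (left.Count > 0 && right.Count > 0)
        {
            // Take the smaller head; RemoveAt(0) is the costly step.
            if (left[0] <= right[0]) { merged.Add(left[0]); left.RemoveAt(0); }
            else                     { merged.Add(right[0]); right.RemoveAt(0); }
        }
        merged.AddRange(left);
        merged.AddRange(right);
        return merged;
    }
}
```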
Implementation B (credit to Robert Sedgewick and Kevin Wayne from Princeton University) uses int[] arrays. Here is a small snippet of the code:
[Code snippet image: Mergesort using primitive types]
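That snippet is also only available as an image; an array-based version in the spirit of Sedgewick and Wayne's mergesort (again a sketch, not their exact code) looks roughly like this:

```csharp
// Array-based mergesort: one auxiliary array allocated once, plain
// index arithmetic in the merge, and no resizing or shifting anywhere.
using System;

public static class MergeSortArray
{
    public static void Sort(int[] a)
    {
        var aux = new int[a.Length];
        Sort(a, aux, 0, a.Length - 1);
    }

    static void Sort(int[] a, int[] aux, int lo, int hi)
    {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        Sort(a, aux, lo, mid);
        Sort(a, aux, mid + 1, hi);
        Merge(a, aux, lo, mid, hi);
    }

    static void Merge(int[] a, int[] aux, int lo, int mid, int hi)
    {
        Array.Copy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++)
        {
            if      (i > mid)         a[k] = aux[j++];  // left half exhausted
            else if (j > hi)          a[k] = aux[i++];  // right half exhausted
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else                      a[k] = aux[i++];
        }
    }
}
```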
While both implementations yield the same result, a sorted list, the difference in performance becomes evident as the data grows. Have a look at this chart:
[Chart: average running time of both implementations as the number of items grows]

The running time for 10 MM items using Implementation A took so long that I had to kill the process!
Now, I'll be the first to admit that .NET helps developers a lot by providing objects that already sort, merge, resize, and so on, and I often write code using its classes without knowing what's happening behind the scenes.

Analysis
So why is the performance so different if the algorithm is the same? It has to do with the fact that List<T> is the generic equivalent of an ArrayList: its backing array is grown dynamically when needed. This is great when the size of your data is not fixed, but resizing the array comes at a cost. In Implementation A we rely on the Add and Remove methods. If we add enough items to the list, .NET automatically allocates a larger backing array and copies everything over; when we remove items, the remaining elements are shifted down (the capacity itself only shrinks if we explicitly call TrimExcess).
The List.Add method is an O(1) operation when the item count is less than the list's capacity. When a resize is triggered, however, it becomes an O(n) operation, where n is the item count. This matters when we are talking about 100K or 1 MM items, as the chart above shows.
The List.Remove method performs a linear search and then shifts the remaining elements, so it is an O(n) operation.
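The growth of the backing array is easy to observe directly. A small sketch (the CapacityGrowth helper is hypothetical, not part of any .NET API):

```csharp
// Watching List<int>.Capacity grow: each time Count would exceed
// Capacity, .NET allocates a larger backing array and copies every
// existing element into it.
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Returns the distinct capacities observed while adding n items.
    public static List<int> CapacityGrowth(int n)
    {
        var list = new List<int>();
        var capacities = new List<int>();
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (capacities.Count == 0 || list.Capacity != capacities[capacities.Count - 1])
                capacities.Add(list.Capacity);
        }
        return capacities;
    }

    public static void Main()
    {
        foreach (int c in CapacityGrowth(1000))
            Console.WriteLine($"Capacity grew to {c}");

        // When the final size is known up front, presizing the list
        // avoids every one of those reallocations:
        var presized = new List<int>(1000);
        Console.WriteLine($"Presized capacity: {presized.Capacity}");
    }
}
```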
On the other hand, indexing into an int[] array is a constant-time, O(1) operation, and the array never needs to be resized.
So while the .NET collection objects provide great functionality, we need to understand how their operations are implemented before choosing them to solve a problem.
When dealing with big volumes of data, to paraphrase Robert Sedgewick: you can spend a lot of money on hardware, or a lot of time waiting, or you can implement a good algorithm and be far more effective.

References
msdn.microsoft.com

Discover and read more posts from Paolo Délano Alonso