Collections /

Cache Threading and Extensibility

Last updated Jan 20, 2021
1

An overview of how to support threading and optimize accessibility when working with caches, with specific reference to the n-way associative cache problem and the solution code available here: https://github.com/jordankingg/NWaySetAssocCache

Threading

Multiple threads can be used in a program to reduce effective processing time, thus speeding up your programs. However, there is an overhead required for the system to manage multiple threads, as well as an additional overhead from the code required to protect your code from breaking in a multi-threaded environment.

While adding support for multi-threading will cause a performance impact on your application, if done properly this impact will usually be fairly minor and the performance benefits of using multiple threads will far outweigh this necessary overhead.

Cache threading considerations

For "library" applications (ex. a cache app designed for use within other programs), actually creating and using multiple threads within the library is relatively uncommon - after all, we don't want library threading to interfere with threading performed within the containing app using this library!

However, it is common practice to add support for multiple threads within libraries when there is a reasonable chance that a containing app may wish to use threads. If this is not done in the library code, then containing apps using multiple threads may encounter poor performance, data corruption, and/or freezing and crashing when executing the library functions.

If performance is paramount, there may also be multiple versions of a library released (ex. both a "thread-safe" and a "non thread-safe" version), where the non thread-safe version will offer better performance in single-threaded applications.

Threading in sample solution

The sample solution is definitely NOT thread-safe. It does not use thread-safe collections, and there is no "locking" code to directly protect against threading concerns.

So in this section, I'll instead explain why it is not thread safe. Let's use an example: consider a multi-threaded application that is using your cache app. We have Thread 1 and Thread 2 both looking for value "X" in the cache at around the same time.

Thread 1 doesn't find the value in the cache, and is about to perform a replacement of the existing value "Y" with the value "X" from main memory. However just before it does, Thread 2 takes over and does it's search for "X". Since Thread 1 didn't get to finish the replacement, Thread 2 still doesn't see "X" and so begins to perform a second identical search for "X" in main memory.

This is already an extra, unnecessary search that would not have been needed if Thread 1 had just waited a tiny bit longer. But even worse, Thread 2 might have taken over right after the replacement algorithm removed a value from the cache but before it added the new value - in this case, Thread 2 might complete its own search and remove ANOTHER value (remember that Thread 1 already removed a value for the replacement algorithm, but Thread 2 doesn't know that), and then add the "X" value into the cache. Thread 1 takes over again, and will continue where it left off - adding this same value to the cache a second time.

Can you see the main issue with the base code here? Why is it possible for this threading issue to occur, and what can be done to prevent it (while still actually using multiple threads)?

The main issue here is that the replacement operation can be interrupted part way through. As such, the simplest way to prevent this issue in a multi-threaded environment is to ensure that once a thread starts the sensitive operation, it completes the entire operation before allowing any other thread to take over control.

This "locking" of the thread for the operation can be done a few different ways, See "Managed threading" and "Thread-safe collections" sections for more details.

Managed threading

The manual adjustment to your code to support multiple threads is called "managed threading", and while it might not take too many changes to the code to implement this, it can take some time to figure out all the places where threading can cause issues and determine the best way to fix it. Multi-threading can get complicated quickly!

The standard way to handle multiple threads in your code is to utilize a "lock", "semaphore", or "mutex", as appropriate. This requires a solid understanding of threading concepts to implement, mainly because you need to know where exactly in your code to implement these protections.

Be sure to take a look at the "Thread-safe collections" section for an easier method to make your cache app thread-safe!

Thread-safe collections

There is a "quick and easy" way to protect your cache library app against these issues - namely, thread-safe collections!

The default collection libraries like Set, Queue, Dictionary, etc. are non thread-safe, which is why your app might experience issues in a multi-threaded environment when managing the cache using these libraries. Thankfully, C# includes alternative thread-safe Collections libraries that you can use instead!

These collections work just the same for you with all the same methods available, the difference being that the internal code executed when you call a function like "put" or "delete" is thread-safe. In other words, these alternative thread-safe collections have already done the "managed threading" work for you. It is possible that a cache app might have operations on custom objects like CacheSet or Algorithm that require extra coding to protect for mutli-threaded environments, but from what I can see in the sample solution a simple switch to using these thread-safe collections would be all that's needed to make the cache app thread-safe.

As a side-note, it is worth noting that the main concern for threading issues in a cache app is performing write operations on the cache itself (ie. adding, deleting, or updating values in the cache). If you only ever read data from a collection, it doesn't really matter how multiple threads interact with your code. It is when this data is changed in one thread that issues in other threads could occur.

Extensibility

The term "extensible" can refer to code being well-designed and easy to extend in general, or more specifically it can refer to using interfaces and/or delegates (though most commonly, interfaces) to allow more freedom when working with your code.

In the "general" sense, this essentially means writing good, clean code. Included here are some links to help get an idea what "good code" really means.

Extensibility in sample solution

The sample solution does a good job of being extensible, in both the "specific" and "general" sense.

In general, this code is nicely organized into classes, with clear, formatted comments for each function. It consistently uses the camelCase naming convention, and has clear, informative variable and function names. Each function does what the name of the function implies that it does, and nothing else.

The ICacheAlgorithm is a great example of the "specific" type of extensibility. Using an interface to represent a replacement algorithm allows the user to specify all the of functionality the replacement algorithm must support. It is also broken into separate functions (ie. as opposed to a single "doReplacement" function), which adds specificity to the requirements and makes it a bit more clear to a developer how they can implement their own algorithm to work for this.

Along the same lines, the <K, V> generics seen throughout the code allow this cache to work with essentially any data types imaginable.

Now, extensibility in this sense can definitely go too far. For example, this developer could have created an abstract class "StandardAlgorithm" that implemented ICacheAlgorithm and provided some functionality common to both LRU and MRU, with the LRU and MRU classes extending this StandardAlgorithm class.

Technically, this would have been even more extensible, as a user could choose to extend StandardAlgorithm for their custom algorithm and save themselves a bit of work since StandardAlgorithm already did some functions for them. But, would this really be necessary?

And that is really the question to ask when deciding how "extensible" to make your code. If you can see it being useful to make things a bit more flexible (ex. using generics or interfaces), then do it! But if it wouldn't really add much to the program and would just be extra lines of code for a developer to sift through, it's probably not worth it.

Like this collection? Save it to read later.
1