Mikko's code blog

.NET C# cache-class for caching task-objects

April 13, 2016 • ☕️☕️☕️ 13 min read

Caching is useful, because it saves resources by persisting local versions of values, that are in some sense expensive to produce from the original source. Expensive usually means that getting the value takes a relatively long time, which is often the case e.g. when fetching values from a HTTP API.

Since C# also has a Task-concept, and async-await syntax, for long lasting operations, caches and Task-objects seem to have a somewhat natural relationship.

For a simple cache, there is, of course, no need to reinvent the wheel as .NET provides e.g. System.Runtime.Caching.MemoryCache out of the box. MemoryCache is a very general cache-implementation that we will be using when we implement a special cache that can handle Task-objects.

Why to cache Task-objects?

A traditional solution for caching would be to cache the end results of the expensive operations, not the operation-objects (Task-object in this case) them selves.

Caching end results can have one problem. Namely, if more than one cache user needs a certain value at relatively same time, some cache users might end up generating the same expensive value simultaneously. Using cache was meant to prevent unnecessary expensive operations! All the (relatively concurrent) cache users will have to first run the expensive operation to get the end result, because the cache remains empty while someone already is generating the value. Then, when all the users of the cache have their own version of the value, they all try to add it to cache, but only the first value will be used and the others will be thrown away. So much expensive value generation done for nothing!

Saving non-completed Task-objects to cache displays to all the users of the cache, that generating a certain value has already begun, and thus there is no need to start to generate another one. Also, running the generation is cache’s responsibility in this cache implementation.

What if the Task-object fails?

Task could fail when e.g. a function that retrieves a value over HTTP fails to produce a value because of a network or server error. In practice, an exception is thrown. Caching failed values is called negative caching. Our TaskCache is not a negative cache in the sense, that it does not store tasks that are known to be in a failed state. TaskCache removes failed Tasks, but there is still the problem, that before the Task runs into a failed state, any number of cache users could have gotten it. There is no way of knowing if a Task will succeed, before it actually does. This means that users of the cache can receive failing tasks.

Any cache call site, of course, would in any case have to prepare to handle the failures in the value generation, even if there was no cache. Someone would have to generate the value, and no matter where this is done, there is the same risk of a failure.

A potential problem with the TaskCache is that a cache call site will now have to prepare to handle the exceptions that could result from the factory functions provided by the call site itself or any other call site. Probably the usual case is that all the call sites, if there even is more than one, use the same factory function, but this does not have to be the case.

ITaskCache interface

public interface ITaskCache
{
    /// <summary>
    /// Return from the cache the value for the given key. If value is already present in cache,
    /// that value will be returned. Otherwise value is first generated with the given method.
    ///
    /// Return value can be a completed or running task-object. If the task-object is completed,
    /// it has run succesfully to completion. Most often when a running task is returned,
    /// it is the task returned by the function the caller has given as a parameter, but the
    /// returned task might also have a different origin (from another call to this same method).
    /// If the cache contains a task that will end up throwing an exception in the future, the same
    /// task instance is returned to all the callers of this method. This means that any given
    /// caller of this method should anticipate the type of exceptions that could be thrown from
    /// the updateFunc used by any of the caller of this method.
    ///
    /// To prevent the problem described above, as a convention, all the call sites of his method
    /// (if more than one) should use the same updateFunc-parameter and also be prepared for the
    /// exceptions that the the updateFunc could throw.
    /// </summary>
    /// <typeparam name="T">Type of the value.</typeparam>
    /// <param name="key">Key that matches the wanted return value.</param>
    /// <param name="valueFactory">Function that is run only if a value for the given key is not already present in the cache.</param>
    /// <returns>Returned task-object can be completed or running. Note that the task might result in exception.</returns>
    Task<T> AddOrGetExisting<T>(string key, Func<Task<T>> valueFactory);

    /// <summary>
    /// Invalidate the value for the given key, if value exists.
    /// </summary>
    /// <param name="key"></param>
    void Invalidate(string key);

    /// <summary>
    /// Does the cache alrealy contain a value for the key.
    /// </summary>
    /// <param name="key"></param>
    /// <returns></returns>
    bool Contains(string key);

    /// <summary>
    /// Empties the cache from all entries.
    /// </summary>
    void Clear();
}

At first, AddOrGetExisting might seem a weird function name. I mean, add or get, which is it? What that really means is: I want to get the value for this key. If you don’t happen to already have it, here’s a recipe how to make it, so that you can make it and then return it to me. So the function name should maybe be GetAndAlsoFirstGenerateAndAddItIfYouDontHaveIt. I chose to use AddOrGetExisting, because System.Runtime.Caching.MemoryCache uses same method name for similar purpose. Note, that this method in MemoryCache returns null, if there was not an existing value. On the other hand, AddOrGetExisting in our TaskCache always returns a value.

TaskCache Implementation

public class TaskCache : ITaskCache
{
    private MemoryCache _cache { get; } = MemoryCache.Default;
    private CacheItemPolicy _defaultPolicy { get; } = new CacheItemPolicy();

    public async Task<T> AddOrGetExisting<T>(string key, Func<Task<T>> valueFactory)
    {

        var asyncLazyValue = new AsyncLazy<T>(valueFactory);
        var existingValue = (AsyncLazy<T>)_cache.AddOrGetExisting(key, asyncLazyValue, _defaultPolicy);

        if (existingValue != null)
        {
            asyncLazyValue = existingValue;
        }

        try
        {
            var result = await asyncLazyValue;

            // The awaited Task has completed. Check that the task still is the same version
            // that the cache returns (i.e. the awaited task has not been invalidated during the await).    
            if (asyncLazyValue != _cache.AddOrGetExisting(key, new AsyncLazy<T>(valueFactory), _defaultPolicy))
            {
                // The awaited value is no more the most recent one.
                // Get the most recent value with a recursive call.
                return await AddOrGetExisting(key, valueFactory);
            }
            return result;
        }
        catch (Exception)
        {
            // Task object for the given key failed with exception. Remove the task from the cache.
            _cache.Remove(key);
            // Re throw the exception to be handled by the caller.
            throw;
        }
    }

    public void Invalidate(string key)
    {
        _cache.Remove(key);
    }

    public bool Contains(string key)
    {
        return _cache.Contains(key);
    }

    public void Clear()
    {
        // A snapshot of keys is taken to avoid enumerating collection during changes.
        var keys = _cache.Select(i => i.Key).ToList();
        keys.ForEach(k => _cache.Remove(k));
    }
}

Using MemoryCache provides an out-of-the-box solution for storing objects and value eviction policy and functionality. MemoryCache can for example remove the oldest records from the cache when system memory starts to be used up to a certain level.

AsyncLazy is a convenience class for handling Task-objects with Lazy-class. For implementation, check out the complete code at GitHub or the original source.

AsyncLazy guarantees that the value generation is only run when actually needed, and not every time someone calls AddOrGetExisting.

Testing TaskCache

Every decent implementation comes with a set of tests. These are implemented using NUnit framework. All tests are successful

Tests ensure among other things that the value generation is only done when needed.

[TestFixture]
public class TaskCacheTests
{
    private class TestValue
    {
        public string Value { get; set; }

        public TestValue(string value)
        {
            Value = value;
        }
    }


    private ITaskCache _cache;

    [SetUp]
    public void Setup()
    {
        _cache = new TaskCache();
        _cache.Clear();
    }

    /// <summary>
    /// Test that the cache returns the value that the valueFactory function generates.
    /// </summary>
    [Test]
    public async Task AddOrGetExisting_ReturnsValueFromValueFactory()
    {
        string testValue = "value1";
        Func<Task<TestValue>> valueFactory = () => Task.FromResult(new TestValue(testValue));

        var value = await _cache.AddOrGetExisting("key1", valueFactory);

        Assert.AreEqual(testValue, value.Value);
    }

    /// <summary>
    /// Test that for subsequent calls the value is only generated once, and after that
    /// the same generated value is returned.
    /// </summary>
    /// <returns></returns>
    [Test]
    public async Task AddOrGetExisting_GeneratesTheValueOnlyOnce()
    {
        string testValue = "value1";
        string testkey = "key1";
        int valueGeneratedTimes = 0;
        Func<Task<TestValue>> valueFactory = () =>
        {
            valueGeneratedTimes++;
            return Task.FromResult(new TestValue(testValue));
        };

        var value1 = await _cache.AddOrGetExisting(testkey, valueFactory);
        Assert.AreEqual(testValue, value1.Value);

        var value2 = await _cache.AddOrGetExisting(testkey, valueFactory);
        Assert.AreEqual(testValue, value2.Value);

        var value3 = await _cache.AddOrGetExisting(testkey, valueFactory);
        Assert.AreEqual(testValue, value3.Value);

        Assert.AreEqual(1, valueGeneratedTimes, "Value should be generated only once.");
    }

    [Test]
    public async Task AddOrGetExisting_ValueIsRebuildAfterInvalidation()
    {
        string testValue = "value1";
        string testkey = "key1";
        int valueGeneratedTimes = 0;
        Func<Task<TestValue>> valueFactory = () =>
        {
            valueGeneratedTimes++;
            return Task.FromResult(new TestValue(testValue));
        };

        var value1 = await _cache.AddOrGetExisting(testkey, valueFactory);
        Assert.AreEqual(testValue, value1.Value);

        var value2 = await _cache.AddOrGetExisting(testkey, valueFactory);
        Assert.AreEqual(testValue, value2.Value);

        Assert.AreEqual(1, valueGeneratedTimes, "Value should be generated only once.");

        _cache.Invalidate(testkey);

        var value3 = await _cache.AddOrGetExisting(testkey, valueFactory);
        Assert.AreEqual(testValue, value3.Value);

        Assert.AreEqual(2, valueGeneratedTimes, "Value should be regenerated after invalidation.");
    }

    /// <summary>
    /// Modifying a key-value-pair in the cache should not affect other key-value-pairs
    /// (when eviction policys are not causing changes).
    /// </summary>
    [Test]
    public async Task AddOrGetExisting_DifferentKeysInCacheFunctionIndependently()
    {
        string testValue1 = "value1";
        string testValue2 = "value2";
        string testkey1 = "key1";
        string testkey2 = "key2";

        int value1GeneratedTimes = 0;
        Func<Task<TestValue>> buildValue1Func = () =>
        {
            value1GeneratedTimes++;
            return Task.FromResult(new TestValue(testValue1));
        };

        int value2GeneratedTimes = 0;
        Func<Task<TestValue>> buildValue2Func = () =>
        {
            value2GeneratedTimes++;
            return Task.FromResult(new TestValue(testValue2));
        };


        var value1get1 = await _cache.AddOrGetExisting(testkey1, buildValue1Func);
        var value2get1 = await _cache.AddOrGetExisting(testkey2, buildValue2Func);
        var value1get2 = await _cache.AddOrGetExisting(testkey1, buildValue1Func);
        var value2get2 = await _cache.AddOrGetExisting(testkey2, buildValue2Func);

        Assert.AreEqual(1, value1GeneratedTimes, "Value 1 should be built only once.");
        Assert.AreEqual(1, value1GeneratedTimes, "Value 2 should be built only once.");

        Assert.AreEqual(testValue1, value1get1.Value);
        Assert.AreEqual(testValue1, value1get2.Value);
        Assert.AreEqual(testValue2, value2get1.Value);
        Assert.AreEqual(testValue2, value2get2.Value);

        // Invalidation should affect only the right key-value-pair.
        _cache.Invalidate(testkey1);

        var value1get3 = await _cache.AddOrGetExisting(testkey1, buildValue1Func);
        var value2get3 = await _cache.AddOrGetExisting(testkey2, buildValue2Func);

        Assert.AreEqual(2, value1GeneratedTimes, "Value 1 should be rebuilt.");
        Assert.AreEqual(1, value2GeneratedTimes, "Value 2 should (still) be built only once.");

        Assert.AreEqual(testValue1, value1get3.Value);
        Assert.AreEqual(testValue2, value2get3.Value);
    }

    [Test]
    public async Task AddOrGetExisting_FailedTasksAreNotPersisted()
    {
        string testValue = "value1";
        string testkey = "key1";
        string exceptionMessage = "First two calls will fail.";

        int valueGeneratedTimes = 0;
        Func<Task<TestValue>> valueFactory = () =>
        {
            valueGeneratedTimes++;

            return Task.Factory.StartNew(() =>
            {
                if (valueGeneratedTimes <= 2)
                {
                    throw new Exception(exceptionMessage);
                }
                return new TestValue(testValue);
            });
        };

        var cacheTask = _cache.AddOrGetExisting(testkey, valueFactory);
        await SilentlyHandleFaultingTask(cacheTask, exceptionMessage);
        Assert.IsTrue(cacheTask.IsFaulted, "First value generation should fail.");
        Assert.AreEqual(1, valueGeneratedTimes, "Value should be build 1 times.");

        cacheTask = _cache.AddOrGetExisting(testkey, valueFactory);
        await SilentlyHandleFaultingTask(cacheTask, exceptionMessage);
        Assert.IsTrue(cacheTask.IsFaulted, "Second value generation should fail.");
        Assert.AreEqual(2, valueGeneratedTimes, "Value should be build 2 times, because first failed.");

        cacheTask = _cache.AddOrGetExisting(testkey, valueFactory);
        var cacheValue = await cacheTask;
        Assert.IsTrue(cacheTask.IsCompleted, "Value generation should succeed the third time.");
        Assert.AreEqual(3, valueGeneratedTimes, "Value should be build 3 times, because first two times failed.");
        Assert.AreEqual(testValue, cacheValue.Value, "Cache should return correct value.");

        cacheTask = _cache.AddOrGetExisting(testkey, valueFactory);
        cacheValue = await cacheTask;
        Assert.IsTrue(cacheTask.IsCompleted, "Value generation should succeed the fourth time.");
        Assert.AreEqual(3, valueGeneratedTimes, "Value should be build 3 times, because first two times failed, but third succeeded.");
        Assert.AreEqual(testValue, cacheValue.Value, "Cache should return correct value.");
    }

    [Test]
    public async Task Contains_ReturnsTrueWhenKeyExists()
    {
        string testkey1 = "key1";
        string testkey2 = "key2";

        Func<Task<TestValue>> valueFactory = () =>
        {
            return Task.FromResult(new TestValue("test"));
        };


        Assert.AreEqual(false, _cache.Contains(testkey1));
        Assert.AreEqual(false, _cache.Contains(testkey2));

        await _cache.AddOrGetExisting(testkey1, valueFactory);

        Assert.AreEqual(true, _cache.Contains(testkey1));
        Assert.AreEqual(false, _cache.Contains(testkey2));
    }

    [Test]
    public async Task AddOrGetExisting_ExceptionsFromValueGenerationCanBeHandled()
    {
        string testkey = "key";
        string testValue = "value";
        string testExceptionMessage = "this is exception";

        int valueFactoryCalledTimes = 0;
        Func<Task<TestValue>> valueFactory = () =>
        {
            valueFactoryCalledTimes++;
            return Task.Factory.StartNew(() =>
            {
                Thread.Sleep(10);
                if (valueFactoryCalledTimes != -9999)
                {
                    // Throw always
                    throw new Exception(testExceptionMessage);
                }
                return new TestValue(testValue);
            });
        };

        Exception exception = null;

        // Use the cache.
        try
        {
            var v = await _cache.AddOrGetExisting(testkey, async () =>
            {
                var res = await valueFactory();
                return res.Value;
            });
            Assert.Fail("This point should never be reached, because the valueFactory should always throw.");
        }
        catch (Exception ex)
        {
            exception = ex;
        }

        Assert.AreEqual(testExceptionMessage, exception.Message, "Exception from valueFactory should be catched.");

        // Use the cache again.
        try
        {
            var v = await _cache.AddOrGetExisting(testkey, async () =>
            {
                var res = await valueFactory();
                return res.Value;
            });
            Assert.Fail("This point should never be reached, because the valueFactory should always throw.");
        }
        catch (Exception ex)
        {
            exception = ex;
        }

        Assert.AreEqual(2, valueFactoryCalledTimes, "Value generation should be called two times, because failed results should be evicted from the cache.");
    }

    /// <summary>
    /// Test the following scenario:
    /// - Cache user A gets a Task from the cache and starts to await it.
    /// - While A is awaiting, the value is invalidated.
    /// - After the A's await is done, A should have a result that was generated after the invalidation, and not the original (now invalidated) result that A started to await in the first step. This "result swich" should be invicible to A.
    /// </summary>
    /// <returns></returns>
    [Test]
    public async Task AddOrGetExisting_DoesNotReturnResultsThatWereInvalidatedDuringAwait()
    {
        string key = "key";
        string earlierValue = "first";
        string laterValue = "second";

        var laterTaskStart = new ManualResetEvent(false);
        var firstValueFactoryStarted = new ManualResetEvent(false);
        var laterValueFactoryStarted = new ManualResetEvent(false);
        var firstValueFactoryContinue = new ManualResetEvent(false);
        var laterValueFactoryContinue = new ManualResetEvent(false);

        int valueFactory1Executed = 0;
        int valueFactory2Executed = 0;

        Func<Task<TestValue>> valueFactory1 = () =>
        {
            return Task.Factory.StartNew(() =>
            {
                firstValueFactoryStarted.Set();
                firstValueFactoryContinue.WaitOne();
                valueFactory1Executed++;
                return new TestValue(earlierValue);
            });
        };

        Func<Task<TestValue>> valueFactory2 = () =>
        {
            return Task.Factory.StartNew(() =>
            {
                laterValueFactoryStarted.Set();
                laterValueFactoryContinue.WaitOne();
                valueFactory2Executed++;
                return new TestValue(laterValue);
            });
        };


        var cacheUserTask1 = Task.Factory.StartNew(async () =>
        {
            return await _cache.AddOrGetExisting(key, valueFactory1);
        });

        var cacheUserTask2 = Task.Factory.StartNew(async () =>
        {
            laterTaskStart.WaitOne();
            return await _cache.AddOrGetExisting(key, valueFactory2);
        });

        // Wait until the first value get from cache is in the middle of the value generation.
        // At this point, a Task that is running but not completed has been added to the cache.
        // CacheUserTask1 is awaiting for the Task to complete.
        firstValueFactoryStarted.WaitOne();

        // While the first value get is still running, invalidate the value.
        _cache.Invalidate(key);

        // Second get from the cache can now begin.
        // Because the first (still uncompleted) value was invalidated, cacheUserTask2's fetch should start a new value generation.
        laterTaskStart.Set();

        // New value generation has started but not yet completed.
        laterValueFactoryStarted.WaitOne();

        // Let first value generation run to completion.
        firstValueFactoryContinue.Set();

        // Let second value generation run to completion.
        laterValueFactoryContinue.Set();

        await Task.WhenAll(new List<Task>() { cacheUserTask1, cacheUserTask2 });


        Assert.AreEqual(laterValue, cacheUserTask1.Result.Result.Value,
            "The first fetch from the cache should have returned the value generated by the second fetch, because the first value was invalidated while still running.");
        Assert.AreEqual(laterValue, cacheUserTask2.Result.Result.Value,
            "The second fetch should have returned the later value.");

        Assert.AreEqual(1, valueFactory1Executed, "The first valueFactory should have been called once.");
        Assert.AreEqual(1, valueFactory2Executed, "The second valueFactory should have been called once.");
    }

    private async Task SilentlyHandleFaultingTask(Task task, string expectedExceptionMessage)
    {
        try
        {
            await task;
        }
        catch(Exception ex)
        {
            Assert.AreEqual(expectedExceptionMessage, ex.Message);
        }
    }
}

The complete code in a Visual Studio solution is available at my GitHub page.


Mikko Haapanen

Written by Mikko Haapanen who lives and works in Helsinki, Finland building useful things. Twitter