As a result, the small cache solution cannot scale out to multiple clusters. However, if we only put a cache in front of each cluster, the load across clusters can become imbalanced. One way to mitigate this issue is to use multiple upper-layer cache nodes, but this raises the question of how to allocate hot objects to those nodes. Traditional cache allocation mechanisms are suboptimal. Cache partitioning incurs low overhead for cache coherence, but cannot increase the cache throughput linearly with the number of cache nodes; cache replication achieves the opposite: its throughput grows linearly with the number of cache nodes, but keeping the replicas coherent is expensive.
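
The trade-off between the two traditional mechanisms can be illustrated with a minimal sketch. It rests on a simplified model that is our assumption for illustration only: every cache node sustains the same query rate `node_capacity`, `popularity` gives each hot object's fraction of all queries (summing to 1), partitioning places each object on one node chosen by a hash, and replication places every object on all nodes with queries spread evenly. All function names are hypothetical.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Map an object to one cache node with a stable hash (partitioning)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_throughput(popularity: dict, num_nodes: int, node_capacity: float) -> float:
    """Highest total query rate before the most loaded node saturates.

    Each object lives on exactly one node, so the node holding the
    largest share of the queries is the bottleneck.
    """
    share = [0.0] * num_nodes
    for key, fraction in popularity.items():
        share[node_for(key, num_nodes)] += fraction
    return node_capacity / max(share)


def replication_throughput(num_nodes: int, node_capacity: float) -> float:
    """Every node holds every hot object, so queries spread evenly."""
    return num_nodes * node_capacity


def copies_per_write(scheme: str, num_nodes: int) -> int:
    """Number of cached copies a write must update to stay coherent."""
    return 1 if scheme == "partition" else num_nodes


if __name__ == "__main__":
    # A skewed (Zipf-like) workload over 100 hot objects.
    weights = {f"obj{i}": 1.0 / i for i in range(1, 101)}
    total = sum(weights.values())
    popularity = {k: w / total for k, w in weights.items()}

    for m in (2, 4, 8, 16):
        print(
            m,
            round(partition_throughput(popularity, m, 1.0), 2),
            replication_throughput(m, 1.0),
            copies_per_write("replication", m),
        )
```

Under this model, the partitioned throughput is capped by the node that receives the most popular objects and therefore never exceeds `num_nodes * node_capacity`, while replication reaches that bound at the cost of updating `num_nodes` copies on every write.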