Caching

Introduction

Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. A cache is like short-term memory: it has a limited amount of space, but is typically faster than the original data source and contains the most recently accessed items. Caches can exist at all levels of an architecture, but are often found at the level nearest the front end, where they can return data quickly without taxing downstream levels.

A distributed application stores data permanently in secondary storage (usually a SQL or NoSQL database). Reading from and writing to secondary storage are costly operations, yet modern applications demand fast reads and writes. This is where caching helps: a cache stores data in primary memory, which is faster than secondary storage. However, primary memory is limited, so a cache can hold only a limited amount of data.

A cache server can be placed almost anywhere in a distributed system: between the web server and the application server, and/or between the application server and the database server.
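
To make this concrete, here is a minimal cache-aside read sketch in Python. The `InMemoryCache` class, the dict standing in for the database, and the capacity limit are all illustrative assumptions, not a specific library's API.

```python
class InMemoryCache:
    """Toy primary-memory cache with a hard capacity limit."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))  # evict an arbitrary entry
        self.store[key] = value

def read(key, cache, database):
    value = cache.get(key)      # fast path: primary memory
    if value is None:           # cache miss
        value = database[key]   # slow path: secondary storage (key assumed present)
        cache.set(key, value)   # populate the cache for next time
    return value

database = {"user:1": "Alice"}           # stand-in for the real database
cache = InMemoryCache()
print(read("user:1", cache, database))   # miss -> loaded from the database
print(read("user:1", cache, database))   # hit  -> served from the cache
```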

Cache invalidation

A cache requires some maintenance: cached data should stay coherent with the source of truth, i.e. the database. If data is updated in the database, it should also be updated in the cache; otherwise the application behaves inconsistently for the user. There are three common schemes for handling writes; a sketch contrasting all three follows the list below.

  1. Write-through cache
    Data is written to both places: first to the cache and then to the database. Since the data is written to both places, consistency is maintained, and data is not lost in case of a crash or system failure because it also resides in permanent storage.
    The disadvantage of this scheme is that writing to both places as part of the same operation makes write latency higher.
  2. Write-around cache
    In this scheme, we bypass the cache and write data directly to the database.
    The disadvantage of this scheme is that a read request for recently written data will be a "cache miss", and the data must be read from back-end storage, which increases read latency.
  3. Write-back cache
    In this scheme, we write data to the cache only. Later, at regular intervals or under certain conditions, we write the data back to the database. This makes write operations very fast.
    The disadvantage of this scheme is that data initially lives only in the cache, so there is a high risk of losing it in case of a crash or system failure.
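
Here is a minimal sketch of the three schemes, assuming toy dicts stand in for the cache and database and that some scheduler calls `flush` for the write-back case; none of this is a specific library's interface.

```python
cache, database, dirty_keys = {}, {}, set()

def write_through(key, value):
    cache[key] = value     # write to the cache...
    database[key] = value  # ...and to the database in the same operation

def write_around(key, value):
    cache.pop(key, None)   # drop any stale cached copy
    database[key] = value  # write only to the database

def write_back(key, value):
    cache[key] = value     # write to the cache only (fast)
    dirty_keys.add(key)    # remember to persist later

def flush():
    """Persist dirty entries; data is at risk of loss until this runs."""
    for key in dirty_keys:
        database[key] = cache[key]
    dirty_keys.clear()
```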

Cache eviction policies

  1. First In First Out (FIFO): The cache evicts the block that was added first, without any regard to how often or how many times it was accessed before.
  2. Last In First Out (LIFO): The cache evicts the block that was added most recently, without any regard to how often or how many times it was accessed before.
  3. Least Recently Used (LRU): Discards the least recently used items first; see the LRU sketch after this list.
  4. Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
  5. Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
  6. Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
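
As a concrete example, here is a minimal LRU cache sketch built on Python's `collections.OrderedDict`; the capacity of 2 is an arbitrary illustrative choice.

```python
from collections import OrderedDict

class LRUCache:
    """Least Recently Used cache: evicts the entry untouched the longest."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordering tracks recency of use

    def get(self, key):
        if key not in self.entries:
            return None                    # cache miss
        self.entries.move_to_end(key)      # mark as most recently used
        return self.entries[key]

    def set(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        self.entries[key] = value

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")              # "a" is now the most recently used
cache.set("c", 3)           # evicts "b", the least recently used
print(list(cache.entries))  # ['a', 'c']
```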
