Write-Amount-Aware Management Policies for STT-RAM Caches
Abstract:
Spin-transfer torque random access memory (STT-RAM) technology has emerged as one of the most promising memory technologies owing to its non-volatility, high density, and low leakage power. However, STT-RAM has certain drawbacks, such as high write energy consumption and a limited number of write cycles. To enable the adoption of STT-RAM in cache memories, new cache hierarchy management policies are required to overcome these drawbacks. In this brief, we evaluated several cache hierarchy management policies in the context of static random access memory (SRAM) L1 caches and an STT-RAM L2 cache. We found that a nonexclusive policy is superior to non-inclusive and exclusive policies in terms of energy consumption and endurance. We also propose a sub-block-based management policy, because write energy consumption is proportional, and endurance inversely proportional, to the amount of written data. A combination of the proposed policy with a nonexclusive policy reduces L2 cache energy consumption by 33.3% (31.5%) and improves the lifetime by 56.3% (56.8%) in a single-core (quad-core) system. The proposed architecture of this paper analyses logic size, area, and power consumption using Xilinx 14.2.
Existing System:
Park et al. proposed a partial line update (PLU) scheme. This scheme partitions a cache line into fixed-size sub-blocks in the L1 cache and assigns a dirty bit per sub-block. When a dirty block is evicted from the L1 cache, its dirty-bit array is transferred along with it to the STT-RAM-based L2 cache. The L2 cache checks this array and writes only the modified sub-blocks, provided the corresponding cache line already resides in the L2 cache. If a write-back miss occurs in the L2 cache, the entire cache line must be written irrespective of how much of it was modified. Thus, as the number of write-back misses increases, the effectiveness of the PLU scheme decreases.
Figure 1: Three sub-block-based cache management schemes. (a) PLU. (b) DSBP. (c) Discard Unused.
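As a concrete illustration of the PLU write-back path described above, the following C sketch models per-sub-block dirty bits and the hit/miss behavior. The line geometry (eight 8-byte sub-blocks), type names, and the returned write count are illustrative assumptions, not taken from the paper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SUBBLOCKS      8   /* assumed sub-blocks per cache line */
#define SUBBLOCK_BYTES 8   /* assumed sub-block size in bytes   */

typedef struct {
    uint8_t data[SUBBLOCKS][SUBBLOCK_BYTES];
    bool    dirty[SUBBLOCKS];  /* one dirty bit per sub-block (L1 side) */
} l1_line_t;

typedef struct {
    uint8_t data[SUBBLOCKS][SUBBLOCK_BYTES];
    bool    valid;             /* line already resides in L2?          */
} l2_line_t;

/* PLU write-back: on a hit only dirty sub-blocks are written; on a
 * write-back miss every sub-block must be written, so no energy is saved. */
static unsigned plu_writeback(const l1_line_t *src, l2_line_t *dst)
{
    unsigned written = 0;
    for (int i = 0; i < SUBBLOCKS; i++) {
        if (!dst->valid || src->dirty[i]) {
            memcpy(dst->data[i], src->data[i], SUBBLOCK_BYTES);
            written++;         /* counts STT-RAM sub-block writes */
        }
    }
    dst->valid = true;
    return written;
}
```

The returned count makes the scheme's weakness visible: it equals SUBBLOCKS on every write-back miss, which is why PLU's benefit shrinks as the write-back miss rate grows.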
Alves et al. proposed a dead sub-block prediction (DSBP) scheme for SRAM-based caches. This scheme saves the leakage power of SRAM caches by turning off dead sub-blocks. However, it requires downward forwarding of sub-block requests to record the pattern history in non-L1 caches. This means that the scheme is built on an inclusive cache hierarchy, thereby allowing redundant writes to lower-level caches. Redundant writes are not a critical problem in their scheme, because it was proposed for power gating in SRAM-based caches. The scheme may also stall on sub-block misses owing to the absence of a miss-handling mechanism for missed sub-block units. Separately, a sector cache scheme was proposed to reduce the tag-bit overhead by enabling adjacent cache lines to share the same tag bits.

Fig. 1 compares the sub-block-based schemes with regard to their write-back operations, in which sub-block writes are indicated by arrows. PLU allows only dirty sub-blocks to be written into the corresponding cache block on a write-back hit. Upon a write-back miss, however, PLU does not allow partial updates because all the data must be maintained. In inclusive caches this situation need not be considered, because a write-back miss cannot occur; other hierarchies may encounter it. In addition, PLU never filters out useless clean sub-blocks. In contrast, DSBP and our Discard Unused scheme do not maintain all data in both the L1 and L2 caches; both keep only the usable portions of the data. DSBP was designed for power gating in SRAM-based caches, and therefore still allows writes of useless sub-blocks before power gating. Our scheme prevents useless clean write-backs regardless of whether the corresponding cache block resides in the L2 cache.
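To make the Fig. 1 comparison concrete, the following sketch contrasts the sub-block write masks of PLU and Discard Unused on an L1 eviction (bit i set means sub-block i is written to the STT-RAM L2). The mask encoding, the function names, and the miss-side behavior of Discard Unused are our plausible reading for illustration, not the paper's exact hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* PLU: dirty sub-blocks only on a write-back hit; the full line on a
 * write-back miss, because all data must be maintained in L2. */
static uint8_t plu_wb_mask(uint8_t dirty, bool wb_hit)
{
    return wb_hit ? dirty : 0xFF;
}

/* Discard Unused: clean, unused sub-blocks are never written; on a
 * write-back miss only the used portion of the line is installed. */
static uint8_t discard_unused_wb_mask(uint8_t dirty, uint8_t used, bool wb_hit)
{
    return wb_hit ? dirty : used;
}
```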
Disadvantages:
- Power consumption is high
Proposed System:
We implemented our proposed policy on top of a nonexclusive policy: relative to a non-inclusive baseline, this combination reduced L2 cache energy consumption by 33.3% (31.5%) and improved the lifetime by 56.3% (56.8%) in a single-core (quad-core) system, whereas an exclusive policy showed worse performance, energy, and lifetime results. On top of the nonexclusive policy alone, the proposed sub-block-based policy contributed an additional 14% (11.8%) reduction in L2 cache energy consumption and an 18.6% (19.4%) improvement in lifetime, at a small performance cost. We make the following two major contributions:
1) We evaluated inclusion-related policies for the first time and found that the nonexclusive policy is the best choice with regard to L2 cache energy consumption and lifetime.
2) We proposed a sub-block-based cache hierarchy management policy to reduce STT-RAM write energy consumption and improve the lifetime.
Cache Hierarchy Management Policies:
Multilevel cache structures come in several designs. Fig. 2 shows four possible cache hierarchy management policies. In some processors, all data in the L1 caches must also be included in the L2 cache; this is called an inclusive policy. An advantage of an inclusive cache is that when the processors in a chip multiprocessor want to remove a cache block, they need to check only the L2 cache; in cache hierarchies that do not enforce such inclusion, the L1 caches must also be checked. Whenever a cache block is evicted from the L2 cache, the corresponding L1 cache block is also evicted to satisfy the inclusion property, a step called back-invalidation.
Figure 2: Four cache hierarchy management policies. (a) Inclusive. (b) Exclusive. (c) Non-inclusive. (d) Nonexclusive.
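A minimal sketch of the back-invalidation step follows, assuming a simple modulo index/tag split and an invented L1 geometry; the structure and field names are ours, for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_SETS 64   /* assumed L1 geometry */
#define L1_WAYS 4

typedef struct { uint64_t tag; bool valid; } l1_tag_t;

static l1_tag_t l1_tags[L1_SETS][L1_WAYS];

/* When the inclusive L2 evicts a block, any L1 copy must also be
 * invalidated so that the L1 contents stay a subset of the L2. */
static void back_invalidate(uint64_t block_addr)
{
    unsigned set = (unsigned)(block_addr % L1_SETS);
    uint64_t tag = block_addr / L1_SETS;
    for (int w = 0; w < L1_WAYS; w++) {
        if (l1_tags[set][w].valid && l1_tags[set][w].tag == tag)
            l1_tags[set][w].valid = false;   /* drop the L1 copy */
    }
}
```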
Sub-Block-Based Scheme:
Conventional caches communicate with other cache levels and the main memory at cache-block granularity. Thus, an entire cache block must be loaded and written back irrespective of its usage, even though a certain portion of a cache block is never used during its residence in the caches. Consequently, this method of cache block management consumes unnecessary write energy. To conserve write energy, we propose a policy called Discard Unused, which enables sub-block-sized data movement. Our Discard Unused scheme is based on the prediction that spatial locality is revealed while a block resides in the L1 data cache and that sub-blocks unused there will not be used later. The Discard Unused scheme tracks used sub-blocks while a cache block resides in the L1 data cache by maintaining an additional status (used) bit per sub-block alongside the dirty bit. Fig. 3 shows an overview of the Discard Unused cache structure.
Figure 3: Overview of Discard Unused cache management policy
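The bookkeeping Discard Unused adds in the L1 data cache can be sketched as follows. The sub-block count and all names are assumptions for illustration, and only the tracking state (not the data array) is shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBBLOCKS      8   /* assumed sub-blocks per line     */
#define SUBBLOCK_BYTES 8   /* assumed sub-block size in bytes */

typedef struct {
    bool used[SUBBLOCKS];   /* set on any load or store touch */
    bool dirty[SUBBLOCKS];  /* set on a store touch only      */
} du_line_state_t;

/* Every L1 access marks the sub-block it falls in as used
 * (and additionally as dirty if it is a store). */
static void du_access(du_line_state_t *s, unsigned byte_off, bool is_store)
{
    unsigned sb = byte_off / SUBBLOCK_BYTES;
    s->used[sb] = true;
    if (is_store)
        s->dirty[sb] = true;
}

/* On eviction, sub-blocks never used in L1 are simply discarded
 * rather than written into the STT-RAM L2. */
static uint8_t du_evict_mask(const du_line_state_t *s)
{
    uint8_t mask = 0;
    for (int i = 0; i < SUBBLOCKS; i++)
        if (s->used[i])
            mask |= (uint8_t)(1u << i);
    return mask;
}
```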
Advantages:
- Power consumption is low
Software implementation:
- Modelsim
- Xilinx ISE