Resource-Efficient SRAM-based Ternary Content Addressable Memory

Abstract:

Static random access memory (SRAM)-based ternary content addressable memory (TCAM) offers TCAM functionality by emulating it with SRAM. However, this emulation suffers from reduced memory efficiency when mapping the TCAM table onto SRAM units, because the number of physical addresses in an SRAM unit is limited. This brief offers a novel memory architecture called resource-efficient SRAM-based TCAM (REST), which emulates TCAM functionality using optimal resources. The SRAM unit is divided into multiple virtual blocks to store the address information contained in the TCAM table. This approach virtually increases the overall address space of the SRAM unit, allowing a greater portion of the TCAM table to be mapped onto SRAM and increasing the emulated TCAM bits per SRAM at the cost of reduced throughput. When implemented on a Xilinx Kintex-7 field-programmable gate array, a 72×28-bit REST consumes only one 36-kbit SRAM and a few distributed RAMs, using only 3.5% of the memory resources of a conventional SRAM-based TCAM (hybrid-partitioned TCAM). The proposed architecture is analyzed for logic size, area, and power consumption using Xilinx ISE 14.2.

Existing System:

A typical TCAM has the advantage of fast searching over SRAM-based searching solutions, but it also has drawbacks. A TCAM cell uses more transistors than an SRAM cell; hence, it has a higher production cost per bit of memory storage and exhibits lower storage efficiency than SRAM devices of comparable bit density and access time. This drawback arises mainly from supporting the "don't care" state, which requires additional hardware resources in terms of the bits used to represent it. To overcome the drawbacks of a typical TCAM, such as relatively high energy consumption, a complex memory structure, limited storage density, low scalability, and the heavy licensing and royalty costs imposed by some TCAM vendors, SRAM-based TCAM has emerged as a research area in the recent literature.

SRAM-based TCAM memory architectures generally require SRAMs to map the TCAM bits [5]–[7], [13]. The search operation uses a fixed-length binary query string (QS) as the address to access the SRAM, which returns a vector containing information about matched address locations. Logical 1s and 0s in the vector are treated as matched and mismatched address locations, respectively. A priority encoder is then used to prioritize the locations when multiple matching bits are present in the vector.
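
As a concrete illustration of this lookup flow, the following Python sketch models the SRAM word layout, the match vector, and a simple priority encoder behaviorally; the entry format and function names are illustrative assumptions, not part of any specific published design.

    # Behavioral model of an SRAM-based TCAM search (illustrative only).
    # Each SRAM word is a W-bit match vector: bit j = 1 means TCAM entry j
    # matches the n-bit pattern used as the SRAM address.

    def build_sram(tcam_entries, n):
        """tcam_entries: list of n-character strings over {'0', '1', 'x'}."""
        def matches(entry, pattern):
            return all(e in ('x', p) for e, p in zip(entry, pattern))

        sram = []
        for addr in range(2 ** n):
            pattern = format(addr, f'0{n}b')
            vector = [1 if matches(e, pattern) else 0 for e in tcam_entries]
            sram.append(vector)
        return sram

    def search(sram, qs):
        """qs: n-bit binary query string; returns the match vector and the
        highest-priority (lowest-index) matching location, or None."""
        vector = sram[int(qs, 2)]
        for location, bit in enumerate(vector):   # simple priority encoder
            if bit:
                return vector, location
        return vector, None

    # Example: a 4-entry, 4-bit TCAM table.
    table = ['10xx', '1100', '0x01', 'xxxx']
    sram = build_sram(table, n=4)
    print(search(sram, '1011'))   # entries 0 and 3 match; entry 0 wins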

Various SRAM-based TCAMs have been implemented on the FPGA platform. A scalable TCAM implemented on a Xilinx FPGA uses Xilinx primitive block RAMs (BRAMs) and distributed RAMs to emulate classical TCAM. In that design, all the SRAMs are activated during a search operation, which increases the overall power consumption. To avoid this increase, a hierarchical low-power search operation was proposed, which activates RAMs hierarchically based on the match condition found in previous RAM blocks. Similarly, an FPGA-based packet classification engine and UE-TCAM implemented TCAM using Altera and Xilinx primitive BRAMs, respectively.

Large TCAMs have also been handled by logically dividing them into relatively small TCAMs. Researchers implemented hybrid-partitioned SRAM-based TCAMs such as HP-TCAM, Z-TCAM, and E-TCAM, all of which use SRAMs and logic circuits to construct TCAM functionality. They partition the set of TCAM bits into groups of various sizes and map them onto their SRAM-based TCAM architectures. These architectures use two SRAMs: the first SRAM stores information about the presence of the QS among the TCAM bits, and the second SRAM stores the address information (AI) for the corresponding QS. HP-TCAM additionally stores, in the first SRAM, an index that is later used to generate the address for the second SRAM.
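
A minimal behavioral sketch of this two-SRAM organization is shown below; the table contents, index generation, and naming are simplified assumptions for illustration and do not reproduce the exact HP-TCAM mapping.

    # Two-SRAM lookup used by hybrid-partitioned designs (simplified sketch).
    # sram1[sub_qs] -> (present, index): whether this substring occurs in the
    #                  TCAM table, plus an index used to address the second SRAM.
    # sram2[index]  -> AI word (match vector over the stored addresses).

    def two_sram_lookup(sram1, sram2, sub_qs):
        present, index = sram1[sub_qs]
        if not present:            # substring never occurs: no match possible
            return None
        return sram2[index]        # address information for this substring

    # Toy contents: substring 0b01 occurs and maps to AI word 0, others do not.
    sram1 = {0b00: (0, 0), 0b01: (1, 0), 0b10: (0, 0), 0b11: (0, 0)}
    sram2 = [0b1010]               # 4-bit AI word: addresses 1 and 3 match
    print(two_sram_lookup(sram1, sram2, 0b01))   # -> 10 (0b1010)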

Disadvantages:

  • Power consumption is high
  • Area coverage is high

Proposed System:

REST emulates classical TCAM in a resource-efficient way. It searches a QS against a set of stored strings of the same size (data strings) in the TCAM table and outputs the matched address for the string.

The REST memory architecture makes use of VBs in the SRAM units to achieve memory efficiency at the cost of reduced throughput. The high-level organization of a single REST block of dimension W×H (W addresses and H bits per address) is shown in Fig. 1. A unit REST block includes a single-port SRAM with m VBs for the AI table, m distributed RAMs for the early elimination (EE) tables, a single multiplexer with m inputs, a single de-multiplexer with m outputs, a priority encoder, and multiple AND gates to perform the TCAM emulation.

Figure 1: Memory architecture of a single REST block of dimensions W×H

The H-bit input QS is partitioned into m b-bit substrings. The partitioned substrings are simultaneously used as the addresses of m distributed RAMs, each with a dimension of 2^b×1 bit. In the REST memory architecture, the distributed RAMs serve as the EE tables, as shown in Fig. 1. Precomputed lookup table (LUT) data in each distributed RAM indicate the presence of the corresponding b-bit substring at an early stage. The output of each distributed RAM is a single bit, and the m bits from the m distributed RAMs are logically ANDed together to produce the ENABLE signal for the AI table. If the enable signal is logically low, the remaining REST block operations are deactivated; this corresponds to the EE operation.
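
The EE operation can be modeled behaviorally as follows; the substring width b, the RAM contents, and the helper names are assumptions chosen only for illustration.

    # Early-elimination (EE) sketch: split the H-bit QS into m b-bit substrings,
    # look each one up in a 2^b x 1 distributed RAM, and AND the single-bit
    # outputs to form the ENABLE signal for the AI table.

    def early_elimination(qs_bits, ee_tables, b):
        """qs_bits: H-bit binary string; ee_tables: list of m lists of 2^b bits."""
        m = len(ee_tables)
        assert len(qs_bits) == m * b
        enable = 1
        for i, table in enumerate(ee_tables):
            sub = qs_bits[i * b:(i + 1) * b]      # i-th b-bit substring
            enable &= table[int(sub, 2)]          # 1 only if substring present
        return enable                             # 0 -> skip AI-table access

    # Toy example: H = 8, m = 2, b = 4. Each EE table marks which 4-bit
    # substrings occur anywhere in the stored data strings.
    ee_tables = [[0] * 16, [0] * 16]
    ee_tables[0][0b1010] = 1      # substring '1010' occurs in the upper half
    ee_tables[1][0b0011] = 1      # substring '0011' occurs in the lower half
    print(early_elimination('10100011', ee_tables, b=4))   # -> 1 (proceed)
    print(early_elimination('11110011', ee_tables, b=4))   # -> 0 (eliminated)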

Figure 2: Organization of the REST blocks to create a larger TCAM

To emulate large TCAMs, the TCAM must be logically partitioned into smaller TCAMs, each of which is mapped to one REST block. Fig. 2 shows an example of the array structure built from REST blocks (with their internal priority encoders omitted), bitwise AND gates, and a single priority encoder. All n REST blocks operate in parallel and produce n·W bits of AI for the bitwise AND operation. The priority encoder selects one bit from the n·W bits when multiple matching addresses exist.
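
The following sketch illustrates one plausible way the per-block AI vectors could be combined, assuming rows of blocks cover disjoint address ranges and columns cover different bit-slices of the data string; the exact arrangement is defined by Fig. 2 and may differ.

    # Combining REST blocks into a larger TCAM (sketch of one plausible
    # arrangement): blocks in the same row cover the same W addresses but
    # different bit-slices, so their W-bit AI vectors are bitwise ANDed;
    # rows cover disjoint address ranges, so the per-row results are
    # concatenated and fed to a single priority encoder.

    def combine_rest_blocks(ai_vectors):
        """ai_vectors[row][col] is the W-bit AI vector (list of 0/1) produced
        by one REST block; returns the highest-priority match or None."""
        combined = []
        for row in ai_vectors:
            w = len(row[0])
            anded = [1] * w
            for vec in row:                       # bitwise AND across bit-slices
                anded = [a & b for a, b in zip(anded, vec)]
            combined.extend(anded)                # concatenate rows (n*W bits)
        for addr, bit in enumerate(combined):     # priority encoder
            if bit:
                return addr
        return None

    # Two rows of two blocks, W = 4: only address 5 (row 1, offset 1) matches
    # in both bit-slices of its row.
    print(combine_rest_blocks([
        [[0, 0, 1, 0], [0, 0, 0, 0]],    # row 0: AND -> 0000
        [[0, 1, 0, 1], [0, 1, 0, 0]],    # row 1: AND -> 0100
    ]))                                   # -> 5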

Virtual blocks of SRAM:

Typically, a single-port SRAM unit with 2^n addresses and W bits per address, i.e., SRAM (2^n×W), can be used to store the W addresses and n-bit data strings in the TCAM table, thereby emulating a TCAM with the dimension of W×n. These W bits correspond to the AI in the AI table.

Figure 3: VBs of single-port SRAM unit

The size of each VB determines the dimensions of the partitioned TCAM table. Fig. 3 shows a generalized example of the 2^n×W SRAM used in the REST memory architecture with m VBs; each VB has a dimension of (2^n/m)×W. In total, a TCAM of m×[(2^n/m)×W] bits can be emulated.
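
The partitioning arithmetic can be illustrated as follows; the specific values (a 512×72 configuration of a 36-kbit SRAM and m = 4 VBs) are assumptions chosen only for this example.

    # Virtual-block (VB) arithmetic for a 2^n x W single-port SRAM divided
    # into m equal VBs, following the expressions in the text.

    def vb_dimensions(n, W, m):
        depth_per_vb = (2 ** n) // m          # addresses per VB: 2^n / m
        bits_per_vb = depth_per_vb * W        # (2^n / m) x W bits per VB
        total_bits = m * bits_per_vb          # m x [(2^n / m) x W] bits in total
        return depth_per_vb, bits_per_vb, total_bits

    # Example: a 36-kbit SRAM configured as 512 x 72 (n = 9, W = 72) with m = 4.
    print(vb_dimensions(n=9, W=72, m=4))      # -> (128, 9216, 36864)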

Advantages:

  • Power consumption is low
  • Area coverage is low

Software implementation:

  • Modelsim
  • Xilinx ISE