Energy-Efficient Reduce-and-Rank UsingInput-Adaptive Approximations

**Abstract:**

Approximate computing is an emerging design paradigm that exploits the intrinsic ability of applications to produce acceptable outputs even when their computations are executed approximately. In this paper, we explore approximate computing for a key computation pattern, reduce-andrank (RnR), which is prevalent in a wide range of workloads, including video processing, recognition, search, and data mining. An RnR kernel performs a reduction operation (e.g., distance computation, dot product, and L1-norm) between an input vector and each of a set of reference vectors, and ranks the reduction outputs to select the top reference vectors for the current input. We propose three complementary approximation strategies for the RnR computation pattern. The first is interleaved reductionand-ranking, wherein the vector reductions are decomposed into multiple partial reductions and interleaved with the rank computation. Leveraging this transformation, we propose the use of intermediate reduction results and ranks to identify future computations that are likely to have a low impact on the output, and can, hence, be approximated. The second strategy, inputsimilarity-based approximation, exploits the spatial or temporal correlation of inputs (e.g., pixels of an image or frames of a video) to identify computations that are amenable to approximation. The third strategy, reference vector reordering, rearranges the order in which the reference vectors are processed such that vectors that are relatively more critical in evaluating the correct output, are processed at the beginning of RnR operation. The number of these critical reference vectors is usually small, which renders a substantial portion of the total computation to be amenable to approximation. These strategies address a key challenge in approximate computing—identification of which computations to approximate—and may be used to drive any approximation mechanism, such as computation skipping or precision scaling to realize performance and energy improvements. A second key challenge in approximate computing is that the extent to which computations can be approximated varies significantly from application to application, and across inputs for even a single application. Hence, input-adaptive approximation, or the ability to automatically modulate the degree of approximation based on the nature of each individual input, is essential for obtaining optimal energy savings. In addition, to enable quality configurability in RnR kernels, we propose a kernel-level quality metric that correlates well to application-level quality, and identify key parameters that can be used to tune the proposed approximation strategies dynamically. We develop a runtime framework that modulates the identified parameters during the execution of RnR kernels to minimize their energy while meeting a given target quality. To evaluate the proposed concepts, we designed quality-configurable hardware implementations of six RnR-based applications from the recognition, mining, search, and video processing application domains in 45-nm technology.The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.

**Existing System:**

Applications from several domains, such asrecognition, data mining, analytics, vision, search,graphics, multimedia, and signal processing exhibit theproperty of intrinsic application resilience, which is theability to produce outputs of acceptable quality even whenthe underlying computations are executed in an impreciseor approximate manner. This intrinsic resilience arises fromseveral factors: 1) the robustness of these applicationsto noisy/redundant input data; 2) the lack of a unique,golden answer, or the inability of humans to perceive minorvariations in the application output; and 3) the statistical anditerative nature of their computations. Approximatecomputing is an emerging design paradigm that leverages theintrinsic resilience of applications to execute computationsapproximately, and more efficiently, leading to improvementsin energy or performance.

We develop three complementary strategies to approximateRnR kernels, viz., interleaved reduction-and-ranking, inputsimilarity-based approximation, and reference vector reordering. The first strategy, interleaved reduction-and-ranking,decomposes the reduction computation into multiple partialreductions and utilizes the intermediate partial reduction output to modulate the degree of approximation introduced in subsequent computations. Recall that the reduction output is usedto rank and select a subset of reference vectors. Therefore,the partial reduction output can be used to predict whether thecurrent reference vector is likely to eventually appear in theselected subset. The second strategy, input-similarity-basedapproximation, leverages the spatial or temporal correlation ofsuccessive inputs (e.g., pixels in an image or frames of a video)present in many applications. We utilize the similarity betweenthe current and previous input to infer the degree to which thecomputations on the current input can be approximated. Thethird strategy, reference vectorreordering, rearranges the orderin which reference vectors are processed by the RnR kernelso that vectors that are likely to appear in the eventual top-ksubset are identified earlier than the original case, resulting ina higher number of computations being skipped. Reorderingthe vectors also enables us to terminate RnR operation inadvance without even processing a substantial portion of thereference vectors, yet face minimal quality loss. Note thatthese approximation strategies only identify, which futurecomputations could be approximated based on intermediateresults, and, hence, can be used in conjunction with anyapproximate computing mechanism to improve energy efficiency. Toward this end, we utilize two popular approximatedesign techniques, viz., computation skipping andprecision scaling.

**Disadvantages**:

- Power consumption is high

**Proposed System:**

The primary objective of this paper is to designinput-adaptive systems that are equipped with the ability todynamically modulate the extent to which computations areapproximated at runtime according to the nature of eachindividual input and the specified quality constraint. Theneed for input adaptability stems from the fact that evenfor a given application output quality, there is significantheterogeneity in the degree to which different inputs to theapplication can be approximated. On the other hand, qualityconfigurability is essential, since the intrinsic resilience of anapplication varies based on the context in which its outputs areconsumed, and hence, the same application may be required tooperate at different output qualities for optimal power savings.In this section, we demonstrate the need for input adaptiveas well quality configurable execution using an example.We consider a popular classification algorithm,k-nearestneighbors (KNNs), and use it in two different applicationcontexts—digit recognition and eye detection.

Given an input, the KNN algorithm identifiesk vectorsfrom the training set that are closest to the input (i.e., top-k).It then assigns the input to the class that is most commona the neighbors. A typical implementation of KNN wouldmaintain a list of top-kvectors and iteratively insert vectors into the list based on its proximity to the input. If the distanceof the vector is greater than those in the top-k list, it issimply discarded. Since the discarded vectors do not affectthe application output, the number of updates to the top-kvector list provides a measure of the intrinsic resilience ofthe algorithm.

Quality – configurablereducesand-rank:designapproach:

In this paper, we consider the design of quality-configurablesystems in the context of a commonly used computationpattern, namely, RnR. RnR computational kernels are prevalentin a number of emerging and existing application domains,such as recognition, data mining, vision, search, and videoprocessing. Some prominent examples of RnR algorithmsinclude KNNs, KMEANS, MPEG encoder, generalizedlearning vector quantization (GLVQ), image segmentation (IMG-SEG), and the SOBEL operator. In theseapplications, RnR constitutes a significant fraction of theruntime, as shown in Table I. The computation involved in anRnR kernel is shown in Fig. 1. It takes an input vector (A)anda set of reference vectors (R1…RN) as its input and producesa ranked list of the reference vectors as its output. It consistsof two steps. The reduce step performs vector reductionof the input vector with each reference vector to producea list of scalars. These scalars are then fully or partially(top-kscalars) sorted in the rank step to produce the output.The KNNalgorithm, is an example ofan algorithm comprising the RnR kernel. In this case, the inputvector is reduced with each training vector to compute a list ofdistances, which are then ranked based on the distance valuesto obtain the top-k nearest training vectors.

Table 1: PREVALENCE OFRnR KERNEL

Table 2: RnR kernel operation.

**Advantages:**

- Power consumption is low

**Software implementation:**

- Modelsim
- Xilinx ISE