Interconnection Allocation between Functional Units and Registers in High-Level Synthesis
Abstract:
Data path interconnection on VLSI chips usually consumes a significant amount of both power and area. In this paper, we focus on the port assignment problem for binary commutative operators for interconnection complexity reduction. First, the port assignment problem is formulated on a constraint graph, and a practical method is proposed to find a valid and initial solution. For solution optimization, an elementary spanning-tree-transformation-based local search algorithm is proposed. To improve the efficiency of optimization, a matrix formulation, which meets the simplex tabuleau format, is proposed and thus the simplex method is adopted for optimization. Moreover, operation pivoting and successive pivoting are discussed for algorithm speedup. The experimental results show that on the randomly generated test cases, the matrix based algorithm shows the highest solution optimality and is five times faster than the elementary transformation method. On the real high-level synthesis benchmarks, the matrix-based method reduced 14% interconnections, while the previous greedy algorithm reduced 8% on average. The proposed architecture of this paper analysis the logic size, area and power consumption using Vivado HLS.
Existing System:
The port assignment is an important step in connectivity binding in the HLS to reduce interconnection complexity. Given a fixed binding of FUs and registers, the interconnection is allocated from the registers to FUs through MUXes, which is called port assignment problem. It is first addressed and is proved to be NP-complete. When the operators are all binary commutative, it is more accurately defined as the port assignment problem for binary commutative operators (PAP-BCO). Since the port assignment is performed on each FU after binding, its high-quality solutions are highly expected since they directly determine the interconnection complexity and MUX usage. Besides, if the port assignment can be combined into register binding step, it can help evaluate the solution quality of register binding in terms of interconnection complexity. Therefore, the execution time of port assignment algorithm must be short when it is combined into an iterative register binding algorithm and being solved repeatedly. Consequently, producing optimal solutions and estimating interconnection complexity quickly and accurately are the essential demands for port assignment algorithms.
There are already several literature works that deal with the port assignment problem. The work performed global permutation of all the inputs of an FU during MUX generation, and the work designed an integer linear programming (ILP) algorithm; for both algorithms, the complexity is a concern. Chenet al. proposed a greedy algorithm by swapping the operands of an operator if some operands are assigned to the registers that drive both ports. However, it has a limitation that operand swapping will not help when there is a series of operations that has circular dependencies. The work tried to decrease the MUX size discrepancy between the two ports. It reformulated the PAP-BCO problem as PAP-BCO*, which accounted for evenly distributing input signals a two MUXs, but it did not propose practical algorithms for PAP-BCO problem.
We also have published several works for port assignment problem. Conget al. were the first to propose a practical spanning-tree-based algorithm, where elementary tree transformation is adopted for solution optimization. Conget al.extended the algorithm from to also take MUX power into consideration. Conget al.still adopted the spanning tree for initial solution generation, but used a Fiduccia and Mattheyses (FM)-based method for optimization instead of elementary tree transformation. All these proposed algorithms excel other existing works, but they still have problems. For example, the elementary tree transformation used for solution optimization was time consuming, which degraded the algorithm efficiency. For port assignment solution perturbation, the FM-based algorithm allowed only one vertex being moved or two vertices being swapped in each iteration of local search; thus the search area was small, which limited the solution optimality.
Disadvantages:
- Area is high
Proposed System:
We improve by proposing a new matrix-based algorithm and compare it. The contributions of this paper are stated as follows, a which 1) and 2) have been proposed and 3) to 5) are additional contributions.
1) A spanning tree and conflict graph based method is first proposed to obtain feasible solutions for the port assignment problem. It not only generates high-quality initial solutions but can also estimate interconnection complexity during register binding with little time cost.
2) An elementary-tree-transformation-based method is proposed for solution optimization. In each iteration, the spanning tree structure is changed by elementary tree transformation to get a smaller conflict graph.
3) To improve the efficiency of solution optimization, a matrix formulation is proposed by defining two operations ⊕ and ⊗; because the coefficient matrix agrees with the simplex tableau format, the simplex method is adopted and pivoting are performed for optimization.
4) To improve the efficiency of pivot selection, the properties of pivotings are studied and successive pivotings are proposed; three pivoting rules are proposed to filter unpromising or redundant pivoting calculations, which significantly speeds up the pivoting selection.
5) Our algorithm is tested on real benchmarks to evaluate the improvements in terms of power, area, and delay; moreover, the testbench FFT is implemented on an FPGA to get actual improvements of power, area, and clock frequency achieved by our algorithm.
Conflict-graph-based initial solution generation:
We first give some graph notations from, which will be used in this paper.
1) A spanning tree of graph G is denoted by T. The tree edge son the spanning tree are denoted b yet and the non-tree edge are denoted by ec.
2) A fundamental cycle, also called fundamental loop, is denoted by FL. Each fundamental loop FL has only one nontree edge ec and each ec has a unique fundamental loop, denoted by FL(ec).
3) A fundamental cut set is denoted by FC. Similarly, each tree edge et has a unique fundamental cut set, denoted By FC(et).
4) The fundamental loop size refers to the number of edges in the loop. The parity of a non-tree edge ec, denoted by p(ec), is defined on the size of fundamental loop FL(ec): if the loop size is odd, the parity of ec is odd and ec is an odd edge; otherwise, the parity of ec is even and ec is an even edge.
Advantages:
- Area is reduced
Software implementation:
- Vivado HLS