FAULT-TOLERANT ADAPTIVE ROUTING IN DRAGONFLY NETWORKS
ABSTRACT
Dragonfly networks are widely used in current high-performance computers and high-end servers, so fault-tolerant routing in dragonfly networks is essential. The rich interconnects give the network good fault-tolerance ability. A new deadlock-free adaptive fault-tolerant routing algorithm, based on a new two-layer safety information model, is proposed by mapping the routers in a group, and the groups of the dragonfly network, into two separate hypercubes. The new fault-tolerant routing algorithm tolerates both static and dynamic faults. Using the new safety information model, our method can determine at the source whether a packet can reach its destination, which avoids dead ends and aimless misrouting. Extensive simulation results show that the proposed fault-tolerant routing algorithm even outperforms the previous minimal routing algorithm in fault-free networks in many cases.
PROPOSED SYSTEM:
We present a fault-tolerant routing algorithm for dragonfly networks. The faulty dragonfly network is mapped to two hypercubes (one for the router groups, and one for the routers in the group that contains the source or destination), and a new safety model is defined on this mapping. Because the fault model is new, the safety model differs from the earlier model known as local safety. The fault-tolerant routing algorithm also differs substantially from earlier fault-tolerant routing algorithms. Earlier work presented fault-tolerant routing for crossbar-based fully connected on-chip networks, which are quite different from dragonfly networks.
Our method requires two indistinguishable buffers for each input port, which can provide better performance than designs with two different virtual channels (VCs). The impact of this design on performance has been studied, and it provides balanced utilization of the buffer resources. The main difference is that VC allocation may not be necessary. The design cost can be the same as that of the two-VC design because the same amount of buffer space is used for each input port. The new fault-tolerant routing algorithm uses the flow control scheme proposed in that paper.
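To make the hypercube mapping and the "decide at the source" idea more concrete, the following is a minimal sketch in Python. It is not the safety information model of the proposed algorithm, whose details are not given here. It assumes that routers (or groups) are simply labeled with consecutive integers, that two labels are hypercube neighbors when they differ in exactly one bit, and that labels beyond the number of real nodes are treated as absent. The reachability test is a plain breadth-first search over fault-free hypercube links, used as a stand-in for the safety check. All function names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable


def hypercube_dimension(num_nodes: int) -> int:
    """Smallest dimension n such that 2**n >= num_nodes (at least 1)."""
    return max(1, (num_nodes - 1).bit_length())


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two labels differ."""
    return bin(a ^ b).count("1")


def hypercube_neighbors(node: int, dim: int) -> Iterable[int]:
    """Labels that differ from `node` in exactly one of `dim` bits."""
    return (node ^ (1 << bit) for bit in range(dim))


def reachable_at_source(
    src: int,
    dst: int,
    num_nodes: int,
    faulty_nodes: FrozenSet[int] = frozenset(),
    faulty_links: FrozenSet[FrozenSet[int]] = frozenset(),
) -> bool:
    """Stand-in feasibility check (assumption, not the paper's model):
    breadth-first search over fault-free hypercube links from src to dst."""
    if src in faulty_nodes or dst in faulty_nodes:
        return False
    dim = hypercube_dimension(num_nodes)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in hypercube_neighbors(node, dim):
            # Skip labels with no real node, visited nodes, and faulty nodes.
            if nxt >= num_nodes or nxt in seen or nxt in faulty_nodes:
                continue
            # Skip links that are marked faulty.
            if frozenset((node, nxt)) in faulty_links:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False
```

For example, with eight routers mapped to a 3-dimensional hypercube, `reachable_at_source(0, 7, 8, frozenset({1, 2}))` still finds a fault-free path through router 4, while marking routers 1, 2, and 4 as faulty isolates router 0. A real safety model would precompute this kind of information so that the source does not need to search the network for every packet.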
EXISTING SYSTEM:
Kim, Dally, Scott, and Abts applied the dragonfly network to supercomputers. They proposed a minimal routing algorithm for dragonfly networks, under which any packet traverses a single global link. The indirect routing scheme originated in earlier work, in which the global adaptive routing decision uses information that is not directly available at the source router. Jiang et al. proposed four indirect routing schemes: credit round trip (CRT), progressive adaptive routing (PAR), piggyback routing (PB), and reservation routing (RES). Jiang et al. also proposed two new endpoint congestion-control protocols, Small-Message SRP (SMSRP) and Last-Hop Reservation Protocol (LHRP).
The Cray Cascade system uses a variant of the dragonfly architecture. It is a two-layer dragonfly network that requires multiple virtual channels (VCs) to provide deadlock-free adaptive routing. Hastings et al. studied the impact of global link arrangements. Camarero et al. presented a comprehensive analysis of the topological properties of the dragonfly network, providing balancing conditions for network dimensioning, and introducing and classifying several alternatives for the global connectivity and trunking level.
Garcia et al. proposed a new deadlock-free misrouting scheme that employs an escape subnetwork to prevent deadlocks, rather than a fixed order in the VCs. Two misrouting schemes were presented. Opportunistic Local Misrouting obtains the best performance by providing the highest routing freedom and relying on a deadlock-free escape path to the destination for every packet. Faizian et al. proposed a traffic pattern-based adaptation mechanism for intra-group communication in dragonfly networks. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take the inferred traffic pattern plus link loads into consideration when making adaptive routing decisions.
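The minimal routing idea mentioned at the start of this section (a local hop, a single global hop, then a local hop) can be sketched as follows. This is an illustration only. It assumes that routers inside a group are fully connected and that every pair of groups shares exactly one global link. The round-robin link assignment in `build_global_links` is a hypothetical arrangement chosen for simplicity, not the arrangement of any real system, and all names are illustrative.

```python
from typing import Dict, List, Tuple

Router = Tuple[int, int]  # (group index, router index within the group)
LinkTable = Dict[Tuple[int, int], Tuple[Router, Router]]


def build_global_links(num_groups: int, routers_per_group: int) -> LinkTable:
    """Hypothetical arrangement: one global link per pair of groups,
    with gateway routers assigned round-robin inside each group."""
    links: LinkTable = {}
    next_gateway = [0] * num_groups
    for g1 in range(num_groups):
        for g2 in range(g1 + 1, num_groups):
            r1 = (g1, next_gateway[g1] % routers_per_group)
            r2 = (g2, next_gateway[g2] % routers_per_group)
            next_gateway[g1] += 1
            next_gateway[g2] += 1
            links[(g1, g2)] = (r1, r2)  # exit router, entry router
            links[(g2, g1)] = (r2, r1)
    return links


def minimal_route(src: Router, dst: Router, links: LinkTable) -> List[Router]:
    """Minimal path under the assumptions above: at most one local hop,
    one global hop, and one more local hop."""
    path = [src]
    if src[0] != dst[0]:
        exit_router, entry_router = links[(src[0], dst[0])]
        if exit_router != src:
            path.append(exit_router)  # local hop inside the source group
        path.append(entry_router)     # the single global hop
    if path[-1] != dst:
        path.append(dst)              # local hop inside the destination group
    return path


def global_hops(path: List[Router]) -> int:
    """Count the hops in a path that cross between groups."""
    return sum(1 for a, b in zip(path, path[1:]) if a[0] != b[0])
```

With three groups of two routers each, `minimal_route((0, 0), (2, 1), build_global_links(3, 2))` yields a four-router path that crosses exactly one global link. Adaptive and fault-tolerant schemes deviate from this path when the gateway routers or the global link are congested or faulty.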
CONCLUSIONS
The dragonfly network is widely used in current commercial machines, such as the IBM Power 7-IH and 775 systems, and Cray Cascade and its later versions. An efficient fault-tolerant routing algorithm for dragonfly networks is therefore essential. A new deadlock-free adaptive fault-tolerant routing algorithm for dragonfly networks, based on a new two-layer safety information model, is proposed by mapping the routers in a group, and the groups of the dragonfly network, into two separate hypercubes. The new fault-tolerant routing algorithm does not require any virtual channels, just two indistinguishable buffers at each input port. The new method tolerates both static and dynamic faults. Using the new safety information model, our method determines at the source whether a packet can reach its destination. Extensive simulation results show that the proposed fault-tolerant routing algorithm even outperforms the previous minimal routing algorithm MIN in fault-free networks in many cases, as well as the general fault-tolerant routing algorithm.
REFERENCES
[1] R. V. Boppana and S. Chalasani, “A framework for designing deadlock-free wormhole routing algorithms,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 2, pp. 169-183, Feb. 1996.
[2] R. V. Boppana and S. Chalasani, “Fault-tolerant wormhole routing algorithms for mesh networks,” IEEE Trans. Computers, vol. 44, no. 7, pp. 848-864, July 1995.
[3] A. Bhatele, W. D. Gropp, N. Jain, and L. V. Kale, “Avoiding hotspots on two-level direct networks,” in Proc. of Int. Conf. on High Performance Computing, Networking, Storage and Analysis, Article No. 76, http://dx.doi.org/10.1145/2063384.2063486, 2011.
[4] C. Camarero, E. Vallejo, and R. Beivide, “Topological characterization of Hamming and dragonfly networks and its implications on routing,” ACM Trans. Architec. Code Optim., vol. 11, no. 4, Article 39, Dec. 2014.
[5] C. L. Chen and G. M. Chiu, “A fault-tolerant routing scheme for meshes with nonconvex faults,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 5, pp. 467-475, May 2001.
[6] G.-M. Chiu and S.-P. Wu, “A fault-tolerant routing strategy in hypercube multicomputers,” IEEE Trans. Computers, vol. 45, no. 2, pp. 143-155, Feb. 1996.
[7] B. V. Dao, J. Duato, and S. Yalamanchili, “Dynamically configurable message flow control for fault-tolerant routing,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 1, pp. 7-22, Jan. 1999.
[8] G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, B. Alverson, T. Johnson, J. Kopnick, M. Higgins, and J. Reinhard, “Cray Cascade: A scalable HPC system based on a dragonfly network,” in Proc. SC12, Article No. 103, Dec. 2012.
[9] E. Faizian, M. S. Rahman, M. A. Mollah, X. Yuan, S. Pakin, and M. Lang, “Traffic pattern-based adaptive routing for intra-group communication in dragonfly networks,” in Proc. of 24th Annual Symposium on High-Performance Interconnects, pp. 19-26, 2016.
[10] M. Garcia, E. Vallejo, R. Beivide, M. Odriozola, C. Camarero, G. Rodríguez, J. Labarta, and C. Minkenberg, “On-the-fly adaptive routing in high-radix hierarchical networks,” in Proc. of Int. Conf. on Parallel Processing, pp. 280-288, 2012.