A Design Tool for High Performance Image Processing on Multicore Platforms
Abstract
The design and implementation of smart vision systems often involves mapping complex image processing algorithms into efficient, real-time implementations on multicore platforms. In this paper, we describe a novel design tool developed to address this challenge. A key component of the tool is a new approach to hierarchical dataflow scheduling that integrates a global scheduler with multiple local schedulers. The local schedulers are lightweight modules that work independently. The global scheduler interacts with the local schedulers to optimize overall memory usage and execution time. The proposed design tool is demonstrated through a case study involving an image stitching application for large-scale microscopy images.
Existing System
Many high-level languages and tools that incorporate dataflow models of computation have been developed for signal processing applications. Examples include CAL/Orcc [5], [6], PREESM [7], the Multi-Dataflow Composer (MDC) tool [8], DIF [4], and HTGS [1]. HI-HTGS, the tool described in this paper, places special emphasis on high performance multicore implementation and on integrated optimization of memory management and task scheduling, using a single actor, dynamic invocation (SADI) scheduling model [2]. HI-HTGS combines the abstract dataflow graph analysis features of DIF with the APIs of HTGS, which enable construction, integration, and iterative optimization of high performance software components and task graphs. While DIF focuses on high-level dataflow analysis in which the detailed functionality of individual graph components (actors) is abstracted, HTGS provides extensive infrastructure for creating fully functional, high performance task graph implementations. Thus, the features of DIF and HTGS are highly complementary, and their integration through HI-HTGS provides new capabilities for automated, model-based analysis, implementation, and optimization of multicore signal and information processing systems.
Proposed System
In this section, we experimentally study the performance of HI-HTGS on an image stitching application for large-scale microscopy. In this application, a microscope collects a grid of overlapping images. Each image in the grid is called a tile. The objective of image stitching is to derive positional translations between adjacent pairs of image tiles and to integrate the translated tiles into a single mosaic. This application is data intensive. Because each image tile is as large as 3 MB and its FFT result is about 12 MB, the application needs to allocate at least 15 MB of memory for a single image tile. The dataset that we experimented with is a 42×59 grid of 2478 image tiles. The grid encompasses over 35 GB of pixel data. If intermediate results are not released in a timely manner, the application can quickly run out of physical memory on the targeted computing platform. To implement the image stitching application using HI-HTGS, we model the application as a windowed synchronous dataflow (WSDF) graph [9] and specify this graphical model using the DIF Language.
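The memory pressure described above can be checked with a short calculation. The following Python sketch is illustrative only and is not the scheduling policy used by HI-HTGS. It uses the tile sizes and grid dimensions quoted in the text, and it makes three assumptions that do not come from the source: the grid has 42 rows and 59 columns, tiles are visited in row-major order, and a tile (with its FFT result) is released as soon as the translations for all of its adjacent pairs have been computed. The function names are hypothetical.

```python
"""Back-of-the-envelope memory model for the stitching workload (illustrative only)."""

# Figures taken from the text; the row/column orientation is an assumption.
GRID_ROWS, GRID_COLS = 42, 59
TILE_MB, FFT_MB = 3, 12  # per-tile image size and FFT result size


def footprint_without_release_mb(rows, cols):
    """Memory needed if every tile and its FFT result stay resident."""
    return rows * cols * (TILE_MB + FFT_MB)


def adjacent_pairs(rows, cols):
    """Number of horizontally or vertically adjacent tile pairs."""
    return rows * (cols - 1) + cols * (rows - 1)


def peak_live_tiles(rows, cols):
    """Simulate a row-major traversal with reference-counted release.

    Hypothetical policy: a tile is released once the translation for
    each of its adjacent pairs has been computed.
    """
    def degree(r, c):
        # Number of neighbors (2 to 4 on any grid larger than one row or column).
        return (r > 0) + (r < rows - 1) + (c > 0) + (c < cols - 1)

    pending = {}  # live tile -> number of pairs still to be computed
    peak = 0
    for r in range(rows):
        for c in range(cols):
            pending[(r, c)] = degree(r, c)
            peak = max(peak, len(pending))
            # Pair the new tile with its already-loaded west and north neighbors.
            for neighbor in ((r, c - 1), (r - 1, c)):
                if neighbor in pending:
                    pending[neighbor] -= 1
                    pending[(r, c)] -= 1
            # Release every tile whose pairs are all done.
            for tile in ((r, c - 1), (r - 1, c), (r, c)):
                if pending.get(tile) == 0:
                    del pending[tile]
    return peak


if __name__ == "__main__":
    total = footprint_without_release_mb(GRID_ROWS, GRID_COLS)
    peak = peak_live_tiles(GRID_ROWS, GRID_COLS)
    print("tiles:", GRID_ROWS * GRID_COLS)
    print("adjacent pairs:", adjacent_pairs(GRID_ROWS, GRID_COLS))
    print("resident memory without release (MB):", total)
    print("peak live tiles with release:", peak)
    print("peak memory with release (MB):", peak * (TILE_MB + FFT_MB))
```

Under these assumptions, keeping everything resident would require 2478 × 15 MB = 37,170 MB, whereas the simplified release policy keeps at most 60 tiles (about 900 MB) live at once. A real multithreaded pipeline would hold more data in flight, so the second figure is best read as a lower bound that shows why timely release matters.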
CONCLUSIONS
In this paper, we have presented a software tool for design and implementation of multicore image processing systems. This tool consists of two main parts: the DIF-based analysis engine, which applies the Dataflow Interchange Format (DIF) Package, and the HTGS-based runtime system, which builds on the Hybrid Task Graph Scheduler (HTGS). The tool allows system designers to incorporate powerful techniques for performance optimization and memory management while specifying applications at a high level of abstraction and using significant amounts of automation. Our experiments demonstrate the ability of our new design tool to provide this high level of abstraction and automation while generating efficient implementations on a diverse set of platforms. Useful directions for future work include extending the hierarchical scheduling techniques developed in this work to heterogeneous platforms, such as CPU/GPU platforms.
REFERENCES
[1] T. Blattner, W. Keyrouz, M. Halem, M. Brady, and S. S. Bhattacharyya, “A hybrid task graph scheduler for high performance image processing workflows,” in Proceedings of the IEEE Global Conference on Signal and Information Processing, 2015, pp. 634–637.
[2] J. Wu, T. Blattner, W. Keyrouz, and S. S. Bhattacharyya, “Model-based dynamic scheduling for multicore implementation of image processing systems,” in Proceedings of the IEEE Workshop on Signal Processing Systems, 2017.
[3] S. Sriram and S. S. Bhattacharyya, Embedded Multiprocessors: Scheduling and Synchronization, 2nd ed. CRC Press, 2009, ISBN: 1420048015.
[4] C. Hsu, M. Ko, and S. S. Bhattacharyya, “Software synthesis from the dataflow interchange format,” in Proceedings of the International Workshop on Software and Compilers for Embedded Systems, Dallas, Texas, September 2005, pp. 37–49.
[5] J. Eker and J. W. Janneck, “Dataflow programming in CAL — balancing expressiveness, analyzability, and implementability,” in Proceedings of the IEEE Asilomar Conference on Signals, Systems, and Computers, 2012, pp. 1120–1124.
[6] H. Yviquel, A. Lorence, K. Jerbi, G. Cocherel, A. Sanchez, and M. Raulet, “Orcc: multimedia development made easy,” in Proceedings of the ACM International Conference on Multimedia, 2013, pp. 863–866.
[7] M. Pelcat, J. Piat, M. Wipliez, S. Aridhi, and J.-F. Nezan, “An open framework for rapid prototyping of signal processing applications,” EURASIP Journal on Embedded Systems, vol. 2009, January 2009, article No. 11.
[8] F. Palumbo, N. Carta, and L. Raffo, “The multi-dataflow composer tool: A runtime reconfigurable HDL platform composer,” in Proceedings of the Conference on Design and Architectures for Signal and Image Processing, 2011.
[9] J. Keinert, C. Haubelt, and J. Teich, “Modeling and analysis of windowed synchronous algorithms,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, May 2006.
[10] E. A. Lee and D. G. Messerschmitt, “Static scheduling of synchronous data flow programs for digital signal processing,” IEEE Transactions on Computers, vol. 36, no. 1, pp. 24–35, 1987.
[11] E. A. Lee and S. Ha, “Scheduling strategies for multiprocessor real time DSP,” in Proceedings of the Global Telecommunications Conference, vol. 2, 1989, pp. 1279–1283.
[12] Google, Inc., “Protocol buffers,” 2017, https://developers.google.com/protocol-buffers/, visited on April 26, 2017.
[13] P. Hamill, Unit Test Frameworks. O’Reilly & Associates, Inc., 2004.
[14] C. D. Kuglin and D. C. Hines, “The phase correlation image alignment method,” in Proceedings of the International Conference on Cybernetics and Society, 1975, pp. 163–165.