Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications.
PhD thesis, Univ. of Twente.
CTIT Ph.D.-thesis series No. 15-368
Official URL: http://dx.doi.org/10.3990/1.9789036539159
Software-defined radio (SDR) applications are often computationally intensive stream processing applications. They achieve low throughput on homogeneous multi-core architectures and could therefore benefit significantly from stream processing accelerators.
The integration of stream processing accelerators in an architecture is often facilitated by a network-on-chip (NoC).
Crossbars or mesh-based NoCs provide guaranteed throughput but tend to have unacceptably high hardware costs.
We propose a low-cost heterogeneous multi-processor architecture for real-time stream processing applications together with dataflow models for real-time analysis.
This architecture allows compositional temporal dataflow analysis based on independently characterized components.
The proposed architecture contains a low-cost ring-shaped interconnect which provides all-to-all guaranteed throughput communication while being work-conserving.
Furthermore, cost-effective integration of stream processing accelerators is enabled by combining two low-cost rings and using a small shell in each NI, thereby realizing credit-based hardware flow control for accelerators.
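The credit-based flow control mentioned above can be illustrated with a toy software model. This is only a sketch of the general technique, not the hardware shell described in the thesis: the class name `CreditLink`, its methods, and the buffer size are hypothetical. The sender starts with one credit per receiver buffer slot, spends a credit for every word it sends, and gets a credit back whenever the receiver frees a slot, so the receiver buffer can never overflow.

```python
from collections import deque


class CreditLink:
    """Toy model of credit-based flow control (hypothetical, for illustration)."""

    def __init__(self, buffer_slots):
        self.capacity = buffer_slots
        self.credits = buffer_slots  # sender-side credit counter
        self.buffer = deque()        # receiver-side buffer

    def try_send(self, word):
        """Send one word only if a credit is available; return True on success."""
        if self.credits == 0:
            return False             # sender stalls; nothing is dropped
        self.credits -= 1
        self.buffer.append(word)
        # Invariant of the scheme: occupancy never exceeds the buffer size.
        assert len(self.buffer) <= self.capacity
        return True

    def consume(self):
        """Receiver removes one word and returns one credit to the sender."""
        if not self.buffer:
            return None
        word = self.buffer.popleft()
        self.credits += 1
        return word
```

With `CreditLink(2)`, a third consecutive `try_send` is refused until `consume` has returned a credit.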
To improve the utilization of stream processing accelerators, we propose a sharing approach to multiplex multiple real-time streams of data over accelerators.
Data streams between tasks are transferred using our dual-ring interconnect.
Software tasks communicate directly using our distributed software FIFO implementation while communication involving stream processing accelerators is handled by our hardware credit-based flow control.
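As a rough illustration of a software FIFO between one producer task and one consumer task, the sketch below uses a circular buffer with separate read and write counters. It is an assumption-laden example, not the distributed implementation from the thesis: the class `SpscFifo` and its layout are hypothetical, and a real distributed version would place the buffer and counters in specific memories and update them over the interconnect.

```python
class SpscFifo:
    """Single-producer, single-consumer circular buffer (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_count = 0  # advanced only by the producer
        self.read_count = 0   # advanced only by the consumer

    def push(self, item):
        """Producer side: return False if the FIFO is full."""
        if self.write_count - self.read_count == self.capacity:
            return False
        self.slots[self.write_count % self.capacity] = item
        self.write_count += 1  # publish the item after it has been written
        return True

    def pop(self):
        """Consumer side: return None if the FIFO is empty.

        None doubles as the "empty" marker, so items must not be None.
        """
        if self.write_count == self.read_count:
            return None
        item = self.slots[self.read_count % self.capacity]
        self.read_count += 1   # release the slot after it has been read
        return item
```

Because each counter has exactly one writer, the producer and consumer never update the same variable, which is what makes this pattern attractive when the two tasks run on different processors.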
In order to reason about the worst-case behavior of our architecture, temporal dataflow models are constructed to obtain bounds on throughput and latency.
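To give a feel for how a dataflow model yields a throughput bound, the sketch below applies the standard maximum cycle ratio result for homogeneous dataflow graphs: the minimum achievable period is the largest ratio of total firing time to initial tokens over all cycles. The producer-consumer graph, the function names, and the numbers are assumptions made for illustration; the models in the thesis are more detailed.

```python
from fractions import Fraction


def min_period(cycles):
    """Return the maximum cycle ratio.

    cycles: list of (total_firing_time, initial_tokens) pairs, one per cycle
    of a homogeneous dataflow graph.
    """
    return max(Fraction(time, tokens) for time, tokens in cycles)


def producer_consumer_period(t_prod, t_cons, buffer_slots):
    """Minimum period of a producer and consumer coupled by a bounded buffer."""
    cycles = [
        (t_prod, 1),                      # self-loop: producer fires one at a time
        (t_cons, 1),                      # self-loop: consumer fires one at a time
        (t_prod + t_cons, buffer_slots),  # data edge plus back edge holding free slots
    ]
    return min_period(cycles)
```

For firing times of 4 and 6 time units, one buffer slot gives a period of 10, while two or more slots give a period of 6, at which point the slower task is the bottleneck. The guaranteed throughput is the reciprocal of the period.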
Three case studies have been carried out to evaluate the hardware costs and performance of the proposed architecture.
For these case studies, several instances of the proposed architecture have been implemented on a Xilinx Virtex-6 FPGA.
We show that, in our architecture, the use of accelerators improves maximum throughput by 366% and that sharing accelerators can reduce hardware costs by over 63%.
The results from our case studies show that our ring interconnect has a very small hardware cost and performs within the bounds derived by our dataflow analysis models.
We conclude that a considerable reduction of hardware costs can be attained by replacing traditional interconnects with our dual communication ring interconnect.
We also conclude that cost-effective integration of shared accelerators can improve application performance, which demonstrates the merit of our approach.
|Item Type:||PhD Thesis|
|Research Group:||EWI-CAES: Computer Architecture for Embedded Systems|
|Research Project:||COMMIT/SENSA: Sensor Networks for Public Safety|
|Uncontrolled Keywords:||real-time processing, data streaming, stream processing, accelerator sharing, dataflow|
|Deposited On:||04 November 2015|