A Comparison of Real Time Stream Processing Frameworks

Jonathan Curtis, Dublin Institute of Technology, Ireland

Document Type Dissertation

A dissertation submitted in partial fulfilment of the requirements of Dublin Institute of Technology for the degree of M.Sc. in Computing (Advanced Software Development) 2018

Abstract

The need to process the ever-expanding volumes of information being generated daily in the modern world is driving radical changes in traditional data analysis techniques. As a result of this, a number of open source tools for handling real-time data streams has become available in recent years. Four, in particular, have gained significant traction: Apache Flink, Apache Samza, Apache Spark and Apache Storm. Despite the rising popularity of these frameworks, however, there are few studies that analyse their performance in terms of important metrics, such as throughput and latency. This study aims to correct this, by running several benchmarks against these frameworks.