Scalable real-time processing with Spark Streaming: implementation and design of a Car Information System
Philipp M. Grulich

TL;DR
This paper demonstrates how Spark Streaming can be used to build a scalable, real-time Car Information System, highlighting its architecture, implementation, and evaluation for fault tolerance and scalability.
Contribution
It presents a detailed design and implementation of a Car Information System using Spark Streaming, emphasizing scalability, fault tolerance, and adaptability for similar applications.
Findings
Spark Streaming enables scalable, fault-tolerant stream processing.
The system architecture is adaptable for various real-time data applications.
Evaluation shows effective performance but highlights challenges due to Spark's rapid development.
Abstract
Streaming data processing is a hot topic in big data these days, because it makes it possible to process a huge number of events with low latency. One of the most commonly used open-source stream processing platforms is Spark Streaming, which is demonstrated and discussed in this paper based on a real-world use case. The use case is a Car Information System (CIS), which is an example of a classic stream processing system. First, the system is designed and engineered, whereby the application architecture is created carefully because it should be adaptable to similar use cases. At the end of this paper, the CIS and Spark Streaming are evaluated using the Goal Question Metric model. The evaluation proves that Spark Streaming is capable of stream processing in a scalable and fault-tolerant manner. But it also shows that Spark is a very fast-moving project, which could…
Taxonomy
Topics: Advanced Database Systems and Queries · Software System Performance and Reliability · Data Management and Algorithms
