SPO-Join: Efficient Stream Inequality Join

Abstract

Stream inequality join combines tuples from different streams based on inequality conditions and is a fundamental operator in distributed data stream processing. It is known to be computationally expensive because the index structures used to find matching tuples must be continuously updated (existing methods employ variations of the B+ tree). To significantly alleviate this problem, we propose SPO-Join, a novel solution that combines a mutable B+ tree for efficient insertions with an immutable sorted-array-based data structure for efficient searching. Further, our method is designed to run efficiently on distributed stream processing engines; we provide an open-source implementation for Apache Storm. Our experiments on real-world and synthesized datasets suggest that SPO-Join outperforms state-of-the-art index-based stream inequality join solutions.
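
The general idea of pairing a mutable structure for inserts with immutable sorted arrays for searches can be illustrated with a minimal Python sketch. This is not the SPO-Join implementation: the class name `TwoTierIndex`, the `flush_threshold` parameter, and the flush policy are all hypothetical, and a sorted Python list stands in for the mutable B+ tree. It only shows how a probe for the predicate `stored_key < probe` might be answered by binary search over both tiers.

```python
import bisect
from typing import List, Tuple


class TwoTierIndex:
    """Hypothetical two-tier index: one mutable part plus immutable sorted runs."""

    def __init__(self, flush_threshold: int = 4) -> None:
        self.flush_threshold = flush_threshold
        # Kept sorted on insert; an assumed stand-in for a mutable B+ tree.
        self.mutable: List[int] = []
        # Frozen sorted arrays that are only searched, never updated.
        self.immutable: List[Tuple[int, ...]] = []

    def insert(self, key: int) -> None:
        """Insert into the mutable part; freeze it once it reaches the threshold."""
        bisect.insort(self.mutable, key)
        if len(self.mutable) >= self.flush_threshold:
            self.immutable.append(tuple(self.mutable))
            self.mutable = []

    def keys_less_than(self, probe: int) -> List[int]:
        """Return all stored keys k with k < probe, in ascending order."""
        matches: List[int] = []
        for run in self.immutable:
            # Binary search finds the cut point; everything before it matches.
            matches.extend(run[:bisect.bisect_left(run, probe)])
        matches.extend(self.mutable[:bisect.bisect_left(self.mutable, probe)])
        return sorted(matches)


if __name__ == "__main__":
    index = TwoTierIndex(flush_threshold=3)
    for key in [5, 1, 9, 3, 7]:
        index.insert(key)
    print(index.keys_less_than(6))
```

With a threshold of 3, the first three keys are frozen into the run `(1, 5, 9)` while `3` and `7` remain in the mutable part, so the probe `6` yields `[1, 3, 5]`. Inserts only ever touch the small mutable part, and each frozen run is searched with a single binary search.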

Type
Conference paper
Publication
In Proceedings of the International Conference on Extending Database Technology (EDBT 2025)
Kaustubh Beedkar
Assistant Professor