SPO-Join: Efficient Stream Inequality Join

Adeel Aslam, Kaustubh Beedkar, Giovanni Simonini

June, 2024

Abstract

Stream inequality join combines tuples arriving from different streams based on inequality conditions and is a fundamental operator in distributed data stream processing. It is known to be computationally expensive because the index data structures used to find matching tuples must be continuously updated (existing methods employ a variation of the B+ tree). To significantly alleviate this problem, we propose SPO-Join, a novel solution that combines a mutable B+ tree for efficient insertions with an immutable sorted-array-based data structure for efficient searching. Further, our proposed method is designed to be executed efficiently on distributed stream processing engines, and we provide an open-source implementation for Apache Storm. Our experiments on real-world and synthesized datasets suggest that SPO-Join exhibits superior performance compared to state-of-the-art index-based stream inequality join solutions.
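
The split between a mutable structure for insertions and an immutable sorted array for searching can be illustrated with a toy sketch. This is not the authors' implementation: a sorted Python list stands in for the B+ tree, a tuple stands in for the immutable array, and the class name, the `merge_threshold` parameter, and the merge policy are assumptions made purely for illustration.

```python
import bisect


class HybridInequalityIndex:
    """Toy index answering the predicate `stored_key < probe`.

    Assumptions (not from the paper): a sorted list replaces the mutable
    B+ tree, a tuple replaces the immutable sorted array, and the mutable
    part is frozen after a fixed number of insertions.
    """

    def __init__(self, merge_threshold=4):
        self.mutable = []      # recent keys, kept sorted on every insert
        self.immutable = ()    # older keys, sorted once and then read-only
        self.merge_threshold = merge_threshold

    def insert(self, key):
        # New tuples only ever touch the small mutable part.
        bisect.insort(self.mutable, key)
        if len(self.mutable) >= self.merge_threshold:
            # Freeze: fold the mutable keys into a new immutable array.
            self.immutable = tuple(sorted(self.immutable + tuple(self.mutable)))
            self.mutable = []

    def count_less_than(self, probe):
        # Binary search in both parts; the sum is the number of matches.
        return (bisect.bisect_left(self.immutable, probe)
                + bisect.bisect_left(self.mutable, probe))


index = HybridInequalityIndex(merge_threshold=3)
for key in [5, 1, 9, 3]:
    index.insert(key)
matches = index.count_less_than(4)  # stored keys smaller than 4 are 1 and 3
```

In this sketch a probe from the opposite stream searches both parts and adds the results, so insertions stay cheap while most of the stored data is searched in a compact, read-only array.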

Type

Conference paper

Publication

In Proceedings of the International Conference on Extending Database Technology (EDBT 2025)

Kaustubh Beedkar

Assistant Professor