A Multiple Compression Approach using Attribute-based Signatures
Abstract
Background: With the increasing volume of data collected for advanced analytical and AI applications, data storage remains a significant challenge. Despite advancements in storage technologies, the cost of maintaining vast datasets continues to grow. Compression techniques have been widely used to address this issue, but existing systems primarily rely on a single, typically lossless method, which limits adaptability to varying data characteristics.
Methods: This paper introduces COMPASS, a multiple compression approach that applies different compression techniques to different subsets of data within a database. COMPASS partitions relational data into rows or columns and selects the most suitable compression scheme for individual columns or column groups. Two versions of COMPASS are proposed: (i) COMPASS-D, which utilizes K-Means clustering based on data values; and (ii) COMPASS-E, which employs K-Means clustering based on column entropy to group similar columns efficiently. The effectiveness of COMPASS is evaluated using the Envmon dataset, a real-world environmental monitoring database, and compared against monolithic compression methods.
Results: Experimental results demonstrate that COMPASS significantly reduces disk space usage compared to traditional compression techniques. COMPASS-E achieves superior performance in terms of compression time and proximity to the optimal compression ratio, outperforming COMPASS-D. In worst-case scenarios, COMPASS methods offer 22% more savings compared to baseline techniques, with best-case savings reaching 56% (~2× improvement).
Conclusion: The proposed COMPASS framework offers a flexible and adaptive approach to database compression by leveraging multiple schemes tailored to different data subsets. This results in improved storage efficiency and reduced computational overhead. Future work will explore additional data characteristics and clustering methods to further enhance COMPASS’s adaptability and efficiency.
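The entropy-based grouping described for COMPASS-E can be illustrated with a minimal sketch. It is not the authors' implementation: the codec menu (`zlib`, `bz2`, `lzma`), the text serialization, the hand-written one-dimensional K-Means, and every function name below are assumptions chosen for illustration, since the abstract does not specify them. The sketch computes the Shannon entropy of each column, clusters columns by that single feature, and keeps, for each column group, whichever codec yields the smallest output.

```python
import bz2
import lzma
import math
import zlib
from collections import Counter

# Hypothetical codec menu; the abstract does not name the candidate schemes.
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def column_entropy(values):
    """Shannon entropy (bits per value) of a column's empirical distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def kmeans_1d(points, k, iters=50):
    """Plain Lloyd's K-Means on scalars; returns one cluster label per point."""
    ordered = sorted(points)
    # Deterministic initialisation: spread centroids across the sorted points.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def serialize(columns, names):
    """Assumed storage layout: one comma-separated text line per column."""
    return "\n".join(",".join(map(str, columns[n])) for n in names).encode()


def compress_by_entropy(columns, k=2):
    """Group columns by entropy, then pick the smallest-output codec per group.

    Returns {column_group: (codec_name, raw_bytes, compressed_bytes)}.
    """
    names = list(columns)
    labels = kmeans_1d([column_entropy(columns[n]) for n in names], k)
    plan = {}
    for cluster in sorted(set(labels)):
        group = [n for n, lab in zip(names, labels) if lab == cluster]
        raw = serialize(columns, group)
        sizes = {name: len(fn(raw)) for name, fn in CODECS.items()}
        best = min(sizes, key=sizes.get)
        plan[tuple(group)] = (best, len(raw), sizes[best])
    return plan


# Synthetic example table (not the Envmon dataset).
table = {
    "station": ["A"] * 500,
    "status": ["ok"] * 499 + ["fail"],
    "temp": [20 + (i * 37 % 101) / 10 for i in range(500)],
    "humidity": [40 + (i * 53 % 97) / 5 for i in range(500)],
}
for group, (codec, raw_size, packed_size) in compress_by_entropy(table).items():
    print(group, codec, raw_size, packed_size)
```

Clustering on a single scalar per column keeps the grouping step cheap compared with clustering on the raw values, which is one plausible reading of why an entropy-based variant could have a lower compression time. The value-based variant (COMPASS-D) would need a different feature representation that the abstract does not describe, so it is not sketched here.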