SoK: A Tale of Reduction, Security, and Correctness-Evaluating Program Debloating Paradigms and Their Compositions

Muaz Ali, Muhammad Muzammil, Faraz Karim, Ayesha Naeem, Rukhshan Haroon, Muhammad Haris, Huzaifah Nadeem, Waseem Sabir, Fahad Shaon, Fareed Zaffar, Vinod Yegneswaran, Ashish Gehani, Sazzadur Rahaman

Automated software debloating of program source or binary code has tremendous potential to improve both application performance and security. Unfortunately, measuring and comparing the effectiveness of various debloating methods is challenging due to the absence of a universal benchmarking platform that can accommodate diverse approaches. In this paper, first, we present DebloatBenchA, an extensible and sustainable benchmarking platform that enables comparison of different research techniques. Then, we perform a holistic comparison of the techniques to assess the current progress.

Abstract

Automated software debloating of program source or binary code has tremendous potential to improve both application performance and security. Unfortunately, measuring and comparing the effectiveness of various debloating methods is challenging due to the absence of a universal benchmarking platform that can accommodate diverse approaches. In this paper, first, we present DebloatBenchA (debloating benchmark for application), an extensible and sustainable benchmarking platform that enables comparison of different research techniques. Then, we perform a holistic comparison of the techniques to assess the current progress.

In the current version, we integrated four software debloating research tools: Chisel, Occam, Razor, and Piece-wise. Each tool is representative of a different class of debloaters: program source, compiler intermediate representation, executable binary, and external library. Our evaluation revealed interesting insights (i.e., hidden and explicit tradeoffs) about existing techniques, which might inspire future research. For example, all the binaries produced by Occam and Piece-Wise were correct, while Chisel significantly outperformed others in binary size and Gadget class reductions. In a first-of-its-kind composition, we also combined multiple debloaters to debloat a single binary. Our performance evaluation showed that, in both ASLR-proof and Turing-complete gadget expressively cases, several compositions (e.g., Chisel-Occam, Chisel-Occam-Razor) significantly outperformed the best-performing single tool (i.e., Chisel).

Cite

@inproceedings{ali2023sok,
  title={SoK: A Tale of Reduction, Security, and Correctness-Evaluating Program Debloating Paradigms and Their Compositions},
  author={Ali, Muaz and Muzammil, Muhammad and Karim, Faraz and Naeem, Ayesha and Haroon, Rukhshan and Haris, Muhammad and Nadeem, Huzaifah and Sabir, Waseem and Shaon, Fahad and Zaffar, Fareed and others},
  year={2023},
  organization={ESORICS}
}

Artifacts

Tags

Program Debloating, Debloating Comparison, Benchmark