Pinpointing Performance Inefficiencies in Java
Pengfei Su, Qingsen Wang, Milind Chabbi, Xu Liu

TL;DR
JXPerf is a lightweight Java performance analysis tool that identifies wasteful memory operations with minimal overhead, enabling practical optimizations in production environments.
Contribution
The paper introduces JXPerf, a novel hardware-assisted tool for pinpointing wasteful memory operations in Java. It overcomes two problems of traditional bytecode instrumentation: prohibitive overhead, and blindness to inefficiencies in machine code generation.
Findings
JXPerf incurs only 7% runtime overhead and 7% memory overhead.
Using JXPerf-guided optimizations yields significant speedups.
The tool effectively attributes inefficiencies to specific code locations.
Pinpointing Performance Inefficiencies in Java
Pengfei Su (College of William & Mary, USA), Qingsen Wang (College of William & Mary, USA), Milind Chabbi (Scalable Machines Research, USA), and Xu Liu (College of William & Mary, USA)
(2019)
Abstract.
Many performance inefficiencies, such as inappropriate choice of algorithms or data structures, developers’ inattention to performance, and missed compiler optimizations, show up as wasteful memory operations. Wasteful memory operations are those that produce/consume data to/from memory that may have been avoided. We present JXPerf, a lightweight performance analysis tool for pinpointing wasteful memory operations in Java programs. Traditional bytecode instrumentation for such analysis (1) introduces prohibitive overheads and (2) misses inefficiencies in machine code generation. JXPerf overcomes both of these problems. JXPerf uses hardware performance monitoring units to sample memory locations accessed by a program and uses hardware debug registers to monitor subsequent accesses to the same memory. The result is a lightweight measurement at the machine-code level with attribution of inefficiencies to their provenance: machine and source code within full calling contexts. JXPerf introduces only 7% runtime overhead and 7% memory overhead, making it useful in production. Guided by JXPerf, we optimize several Java applications by improving code generation and choosing superior data structures and algorithms, which yield significant speedups.
Keywords: Java profiler, performance optimization, PMU, debug registers
ESEC/FSE ’19: Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, August 26–30, 2019, Tallinn, Estonia. Copyright: acmlicensed, 2019. Price: 15.00. DOI: 10.1145/3338906.3338923. ISBN: 978-1-4503-5572-8/19/08. CCS concepts: General and reference → Metrics; General and reference → Performance; Software and its engineering → Software maintenance tools.
1. Introduction
Managed languages, such as Java, have become increasingly popular in various domains, including web services, graphical user interfaces, and mobile computing. Although managed languages significantly improve development velocity, they often perform worse than native languages. Being a step removed from the underlying hardware is one of the performance handicaps of programming in managed languages. Despite their best efforts, programmers, compilers, runtimes, and layers of libraries can easily introduce subtle performance inefficiencies into managed program executions. Such inefficiencies can easily go unnoticed (if not carefully and periodically monitored) or remain hard to diagnose (due to layers of abstraction and detachment from the underlying code generation, libraries, and runtimes).
Performance profilers abound in the Java world to help developers understand their programs’ behavior. Profiling for execution hotspots is the most popular approach (perf; Levon:OProfile; jprofiler-WWW; yourkit-WWW; visualvm-WWW; oracle-studio-WWW). Hotspot analysis tools identify frequently executed code regions regardless of whether the execution is efficient or inefficient (useful or wasteful), and hence place a significant burden on the developer to judge whether a hotspot offers any scope for optimization. Derived metrics such as Cycles-Per-Instruction (CPI) or cache miss ratio offer slightly better insight into hotspots but are still not a panacea. Consider a loop that repeatedly computes the exponential of the same number, which is obviously wasteful work; the CPI metric simply rewards such code with a low CPI value, which is conventionally considered a sign of efficiency.
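As a hypothetical illustration of the loop just described (the class name, method names, and constants are ours, not the paper's), the sketch below contrasts a loop that recomputes `Math.exp` of a loop-invariant value on every iteration with a version that hoists the call out of the loop. Both compute the same sum; the first simply performs far more work, and nothing about its per-instruction efficiency, and hence its CPI, flags that work as redundant. Whether a JIT compiler would hoist the call on its own is not assumed here.

```java
// Hypothetical illustration (names and constants are ours, not the paper's):
// a loop that recomputes the exponential of a loop-invariant value.
public final class RepeatedExp {

    // Wasteful: Math.exp(x) does not depend on i, yet it is evaluated n times.
    static double wasteful(double x, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += Math.exp(x) * i;
        }
        return sum;
    }

    // Hoisted: the loop-invariant value is computed once, before the loop.
    static double hoisted(double x, int n) {
        double e = Math.exp(x);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += e * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        double x = 1.5;
        int n = 1_000_000;
        // Same result, very different amounts of work.
        System.out.println(wasteful(x, n) + " vs " + hoisted(x, n));
    }
}
```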
There is a need for tools that specifically pinpoint wasteful work and guide developers to the code regions where optimization is needed. Our observation, which is justified by myriad case studies in this paper, is that many inefficiencies show up as wasteful operations when inspected at the machine code level, and those that involve the memory subsystem are particularly egregious. Although this observation is not new for native languages (Chabbi:2012:DTP:2259016.2259033; Wen:2017:REV:3037697.3037729; witch; Su:2019:RLS:3339505.3339628), its application to Java code is new, and the problem is particularly severe in managed languages. The following inefficiencies often show up as wasteful memory operations.
**Algorithmic inefficiencies:** frequently performing a linear search shows up as frequently loading the same value from the same memory location.
**Data structural inefficiencies:** using a dense array to store sparse data, where the array is repeatedly reinitialized to store different data items, shows up as frequent store-followed-by-store operations to the same memory location without an intervening load operation.
**Suboptimal code generation:** missed inlining can show up as storing the same values to the same stack locations; missed scalar replacement shows up as loading the same value from the same, unmodified, memory location.
**Developers’ inattention to performance:** repeatedly calling the same method with the same arguments in successive loop iterations can show up as silent stores (consecutive writes of the same value to the same memory location). For example, the Java implementation of the NPB-3.0 benchmark IS (Bailey:1991:NPB:125826.125925) invokes the expensive power method inside a loop, and in each iteration the call pushes the same parameters onto the same stack location. Interestingly, this inefficiency is absent in the C version of the code because the developer carefully hoisted the power function out of the loop.
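The memory-operation patterns in this list can be made concrete with a small, software-only sketch. It assumes a recorded trace of (address, operation, value) accesses, a hypothetical format of ours, and compares each access with the previous access to the same address to label dead stores, silent stores, and silent loads. This is not how JXPerf gathers its data; it only illustrates what the patterns look like. For simplicity, a store that rewrites the value a location already holds is labeled only as a silent store, even when the preceding access was also a store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, software-only sketch (not JXPerf's implementation): labels
// wasteful pairs of consecutive accesses to the same address in a trace.
public final class WastePatterns {

    enum Op { LOAD, STORE }

    enum Waste { DEAD_STORE, SILENT_STORE, SILENT_LOAD }

    // One memory access: address, kind, and the value read or written.
    static final class Access {
        final long addr;
        final Op op;
        final long value;
        Access(long addr, Op op, long value) { this.addr = addr; this.op = op; this.value = value; }
    }

    /** Returns one label per wasteful (previous, current) pair on the same address, in trace order. */
    static List<Waste> classify(List<Access> trace) {
        Map<Long, Access> last = new HashMap<>();   // most recent access per address
        List<Waste> found = new ArrayList<>();
        for (Access cur : trace) {
            Access prev = last.put(cur.addr, cur);
            if (prev == null) continue;              // first touch of this address
            if (cur.op == Op.STORE) {
                if (prev.value == cur.value) {
                    found.add(Waste.SILENT_STORE);   // location already held this value
                } else if (prev.op == Op.STORE) {
                    found.add(Waste.DEAD_STORE);     // earlier store overwritten before any load
                }
            } else if (prev.op == Op.LOAD && prev.value == cur.value) {
                found.add(Waste.SILENT_LOAD);        // same value reloaded from unmodified memory
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Access> trace = List.of(
                new Access(0x10, Op.STORE, 0),   // reinitialize a slot
                new Access(0x10, Op.STORE, 7),   // overwrites 0 before any load
                new Access(0x20, Op.LOAD, 3),
                new Access(0x20, Op.LOAD, 3),    // reloads the same, unmodified value
                new Access(0x10, Op.STORE, 7));  // writes the value already present
        System.out.println(classify(trace));
    }
}
```

Running `main` prints `[DEAD_STORE, SILENT_LOAD, SILENT_STORE]`.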
This list suffices to provide an intuition about the class of inefficiencies detectable by observing certain patterns of memory operations at runtime. Some recent Java profilers (Xu:2013:RTO:2509136.2509512; Nguyen:2013:CDC:2491411.2491416; Dhok:2016:DTG:2950290.2950360; toddler; ldoctor) identify inefficiencies of this form. However, these tools are based on exhaustive Java bytecode instrumentation, which suffers from two drawbacks: (1) high runtime overhead (up to 200×), which prevents them from being used for production software; and (2) missing insights into lower-level layers, e.g., inefficiencies in machine code.
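To make the overhead contrast concrete, the following is a hypothetical software simulation of the sampling-plus-watchpoint idea from the abstract, restricted to dead stores. Instead of checking every access, it "samples" one access, arms a simulated debug register on that address, and examines only the next access to the same address. The assumptions are ours and loudly simplified: a PMU sample is modeled as every `period`-th access, there is a single debug register, samples arriving while it is armed are dropped, and addresses are non-negative. Real hardware sampling and watchpoint handling differ in many details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of sampling plus a single watchpoint (not JXPerf's code).
// Assumptions: a "PMU sample" is every period-th access; one debug register;
// samples arriving while the register is armed are dropped; addresses are >= 0.
public final class SampledDeadStores {

    static final class Access {
        final long addr;
        final boolean isStore;
        Access(long addr, boolean isStore) { this.addr = addr; this.isStore = isStore; }
    }

    /** Returns {pairsChecked, deadStores} observed through the simulated watchpoint. */
    static long[] run(List<Access> trace, int period) {
        long watchedAddr = -1;          // -1 means the simulated debug register is free
        boolean watchedWasStore = false;
        long checked = 0, dead = 0;
        for (int i = 0; i < trace.size(); i++) {
            Access a = trace.get(i);
            if (a.addr == watchedAddr) {
                // Simulated trap: the next access to the watched address closes the pair.
                checked++;
                if (watchedWasStore && a.isStore) dead++;  // store overwritten with no load in between
                watchedAddr = -1;                           // free the register for a new sample
            } else if (watchedAddr == -1 && i % period == 0) {
                // Simulated PMU sample: arm the watchpoint on the sampled address.
                watchedAddr = a.addr;
                watchedWasStore = a.isStore;
            }
        }
        return new long[] {checked, dead};
    }

    public static void main(String[] args) {
        // Synthetic trace: a 10-element array reinitialized 100 times (store after store).
        List<Access> trace = new ArrayList<>();
        for (int rep = 0; rep < 100; rep++) {
            for (long addr = 0; addr < 10; addr++) {
                trace.add(new Access(addr, true));
            }
        }
        long[] r = run(trace, 7);
        System.out.println("pairs checked: " + r[0] + ", dead stores: " + r[1]);
    }
}
```

On this synthetic trace the simulation examines 71 pairs (all of them dead stores), whereas checking every access exhaustively would examine 990 pairs. The sampling period trades detection coverage for lower overhead, which is the intuition behind sampling-based measurement.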
