\teaser
[Teaser figure. Row labels: Original (top); Low Smoothing, 40% (middle); High Smoothing, 80% (bottom).]
Examples of the (top) 4 input datasets (see section 4 for a description of the data). TopoLines results are shown for (middle) low and (bottom) high levels of smoothing, defined by the percent of local extrema removed. TopoLines works by preserving high amplitude extrema and flattening low amplitude ones while maintaining low residual error. See the supplement for the measures described in section 3.
TopoLines: Topological Smoothing for Line Charts
P. Rosen1\orcid0000-0002-0873-9518
and A. Suh1,2\orcid0000-0001-6513-8447
and C. Salgado1
and M. Hajij3\orcid0000-0002-2625-9286
1University of South Florida, Tampa FL, USA
2Tufts University, Medford, MA, USA
3KLA Tencor, Ann Arbor, MI, USA
Abstract
Line charts are commonly used to visualize a series of data values. When the data are noisy, smoothing is applied to make the signal more apparent. Conventional methods used to smooth line charts, e.g., using subsampling or filters, such as median, Gaussian, or low-pass, each optimize for different properties of the data. The properties generally do not include retaining peaks (i.e., local minima and maxima) in the data, which is an important feature for certain visual analytics tasks. We present TopoLines, a method for smoothing line charts using techniques from Topological Data Analysis. The design goal of TopoLines is to maintain prominent peaks in the data while minimizing any residual error. We evaluate TopoLines for 2 visual analytics tasks by comparing to 5 popular line smoothing methods with data from 4 application domains.
\ccsdesc[500]Human-centered computing Information visualization
\ccsdesc[500]Human-centered computing Visualization design and evaluation methods
\printccsdesc
Volume 39, Issue 3
1 Introduction
Line charts are used to analyze data in a variety of applications, including identifying stock trends, tracking weather changes, and understanding brain activity. While significant increases in data availability allow users to create plots with many data points, relieving the resulting visual clutter requires additional data processing, such as smoothing. However, the way smoothing modifies the data can affect the performance of visual analytics tasks. We consider smoothing in the context of 2 low-level tasks [AES05]: finding extrema (i.e., local minima and maxima) and retrieving a value. These tasks, in essence, require that a smoothing method both retain extrema and minimize the residual error it introduces (i.e., the difference between the input and output data).
A variety of smoothing techniques are available. Uniform subsampling, for example, skips data on a regular interval, and while trivial to implement, its output optimizes for no particular quality of the input. Other common methods, such as median, Gaussian, and low-pass cutoff filters, retain low-frequency aspects of the data but potentially lose extrema. Irregular sampling, such as Douglas-Peucker [Ram72, DP73], does a better job of preserving extrema, but it retains little detail in the smoothing process.
We address the weaknesses of prior approaches by applying Topological Data Analysis (TDA) to line chart smoothing. We do this by using TDA to capture a hierarchical relationship between extrema that allows removing those of “low importance”. At the same time, TopoLines minimizes the residual error between extrema, retaining much of the detail from the input data.
Previously, Kozlov and Weinkauf released Persistence1D, a TDA-based class for filtering extrema in 1D data by their persistence [KW14]. There has also been work on topological smoothing of 2D and 3D functions [CSvdP10, EMP06, RSM∗19, TFL∗17]. However, we could find no prior studies that compared topological smoothing to conventional techniques in line charts. Therefore, our contributions are: 1) a description of 1D topological smoothing; 2) optimizations of topological smoothing for the visual analytics tasks of retrieving a value and finding extrema; and 3) an analytical evaluation of the effectiveness of TopoLines and 5 conventional smoothing methods on 4 dataset types.
Our results show that TopoLines is the most effective approach for many, but not all, combinations of data type and task. Almost as important, our results demonstrate the general ineffectiveness of several conventional methods, including median filters, cutoff filters, and uniform subsampling in the tasks and data evaluated.
2 TopoLines: Topologically Smoothed Line Charts
TopoLines smoothing requires 2 steps: 1) extraction of the topology of the data using persistent homology, and 2) smoothing the output by removing extrema based upon a user-selectable threshold.
2.1 Persistent Homology of a Line Chart
We provide a practical description of persistent homology using the line chart in Figure 2 as an example while leaving further details and theoretical justifications to [EH08].
We use the lower-star filtration of the simplicial complex, $F$, i.e., the points and edges, on the function $f: F \rightarrow \mathbb{R}$. The lower-star filtration of the data tracks the creation and merging of connected components of the sublevelset $|F|_i = f^{-1}(-\infty, f_i]$, as $f_i$ is swept from $-\infty \rightarrow \infty$, represented by the blue region in Figure 2. The filtration is calculated by first sorting the points of $f$ in increasing order. Then, points are inserted into $|F|$ one at a time. An edge is added between any neighboring points already in $|F|_i$.
The relationship between connected components is tracked using a merge tree parameterized by $f$. When a component first appears at $f_i$, caused by a local minimum, a leaf node is added to the merge tree at $f_i$. For example in Figure 2(a), the orange connected component is formed at E, and an equivalent leaf node is created in the merge tree. As the plane is swept higher, as in Figure 2(b), new connected components (A in yellow and C in green) are created.
When 2 components merge, representing a local maximum, a merge node is created in the merge tree at $f_i$ and connected to the merged components. In Figure 2(c)/2(d), the green and orange connected components merge at D, a local maximum. The connected components are combined, in orange, and a merge node is added to the merge tree.
When a merge node is created, it is also paired with a leaf node (i.e., a local maximum is paired with a local minimum). In particular, it is paired with the minimum from the two merging components with the larger value. Referring to Figure 2(c)/2(d), the point D is paired with the minimum from the green and orange components with the larger value, in this case point C. In other words, $f(C) > f(E)$, therefore, $[C, D)$ form an extrema pair. The new merged component in orange continues with minimum E. Similarly, in Figure 2(d)/2(e), at B, the values of the minimum of yellow, $f(A)$, and orange, $f(E)$, are compared, and $[A, B)$ are paired.
The output of the operation is the set of all extrema pairs, $C = \{[b_0, d_0), [b_1, d_1), \ldots, [b_m, d_m)\}$, where $b_i$ and $d_i$ are the local minimum and maximum, respectively, and $m$ is the number of pairs.
Boundaries require special handling, as can be seen in Figure 2. If a boundary point is a local minimum, e.g., A, it is connected to a point at $+\infty$. Similarly, a local maximum boundary point is connected to $-\infty$, e.g., F. The additional points ensure all extrema are paired. The algorithm has $O(n \log n)$ complexity by using the disjoint-set data structure to track connected components. The complexity improves to $O(n + m \log m)$ by removing all non-extrema from the input before merge tree construction.
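The sweep described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `persistence_pairs` is hypothetical, ties are broken by index, and the global minimum is simply reported as unpaired rather than handled with the $\pm\infty$ boundary points described above.

```python
def persistence_pairs(values):
    """Pair local minima with local maxima via a lower-star sweep.

    Returns (pairs, essential): pairs is a list of (min_index, max_index),
    and essential is the index of the global minimum, which never merges
    in this simplified sketch (no +/- infinity boundary points are added).
    """
    n = len(values)
    # Sweep the points in increasing order of value (ties broken by index).
    order = sorted(range(n), key=lambda i: (values[i], i))
    parent = [None] * n   # disjoint-set forest; None = not yet inserted
    birth = {}            # component root -> index of that component's minimum

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def age(root):
        return (values[birth[root]], birth[root])

    pairs = []
    for i in order:
        parent[i] = i
        roots = [find(j) for j in (i - 1, i + 1)
                 if 0 <= j < n and parent[j] is not None]
        if not roots:
            birth[i] = i              # local minimum: a new component appears
        elif len(roots) == 1:
            parent[i] = roots[0]      # regular point: extend the component
        else:
            a, b = roots              # local maximum: two components merge
            old, young = (a, b) if age(a) < age(b) else (b, a)
            # The component whose minimum has the larger value ends here.
            pairs.append((birth[young], i))
            parent[young] = old
            parent[i] = old
    essential = order[0] if n else None
    return pairs, essential
```

For example, with made-up heights for the points A through F that follow the ordering in Figure 2, `persistence_pairs([2, 5, 3, 4, 1, 6])` pairs index 2 with 3 (C with D) and index 0 with 1 (A with B), and reports index 4 (E) as the unpaired global minimum.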
2.2 Topological Simplification
The set of extrema pairs, $C$, is used to guide smoothing, as follows. For each pair, a measure known as persistence is calculated, which is simply the difference in function value between the local minimum and local maximum of the pair, i.e., $p_i = |f(d_i) - f(b_i)|$. In effect, this measures the peak-to-peak amplitude.
The simplification is controlled either by a user-specified persistence threshold, $t$, which removes the pairs $\{C_i \mid p_i < t\}$, or by a percentage, $q$, which removes pairs after ranking/sorting them, $\{C_i \mid \mathrm{rank}(C_i) < q \cdot m\}$. To reconstruct the line, the extrema that are not removed, in addition to the boundary points, are first placed into the output. For Figure 1(f), this includes A, B, E, and F. Next, the intermediate data is calculated.
As pointed out by prior work on 2D manifolds [EMP06] and contour trees [CSvdP10], removing a pair of critical points from the function is as simple as “flattening” the function.
For a 1D function, this equates to making the function monotonic between neighboring extrema. For example, in Figure 1(f), removing the $[C, D)$ critical point pair requires modifying the function such that it is monotonically decreasing between critical points B and E.
The design space of possible modifications is quite broad: any monotonic function satisfies the topological constraint. We apply the additional constraint that the remainder of the function is modified as little as possible. To accomplish this, we use isotonic regression [Bar72], which is a monotonic regression technique that minimizes the squared error. The time complexity of isotonic regression and our reconstruction is $O(n)$.
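One standard way to compute an isotonic regression is the pool-adjacent-violators algorithm (PAVA). The sketch below is an assumption about how the monotonic segments could be fitted, not a reproduction of the authors' code; the function names are hypothetical. It fits the values between two retained extrema with the least-squares nondecreasing (or nonincreasing) sequence.

```python
def isotonic_increasing(ys):
    """Least-squares nondecreasing fit via pool-adjacent-violators (PAVA)."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # Pool blocks while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    out = []
    for total, count in blocks:
        out.extend([total / count] * count)  # every member takes the block mean
    return out


def isotonic_decreasing(ys):
    """Least-squares nonincreasing fit, obtained by negating the input."""
    return [-v for v in isotonic_increasing([-y for y in ys])]
```

As a usage example with made-up values, flattening a small bump between a retained maximum and minimum, `isotonic_decreasing([5, 3, 4, 1])`, returns `[5.0, 3.5, 3.5, 1.0]`: the two violating interior values are replaced by their mean, while the endpoints are unchanged because they bound the segment.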
3 Evaluation
We compare TopoLines to 5 other smoothing methods:
A median filter (see Figure 3(a)) is a nonlinear rank filter, which is particularly good at removing salt-and-pepper noise [Arc05]. For each input datum, the filter extracts a surrounding neighborhood window and outputs its median value. Smoothing is increased by enlarging the window size.
The Gaussian filter (see Figure 3(b)) is a commonly used convolutional filter in signal and image processing [KS11].
The approach applies a stencil, whose weights come from a normal distribution, to an input neighborhood. The smoothing level is changed by adjusting the standard deviation of the distribution.
A low-pass cutoff filter (see Figure 3(c)) converts the scalar data into the frequency domain via Discrete Fourier Transform (DFT) [CT65], zeros frequencies above a cutoff threshold, and computes the new scalar values with an inverse DFT. The level of smoothing is adjusted by modifying the cutoff frequency.
Uniform subsampling (see Figure 3(d)) selects points at regular intervals. Between selected points, linear interpolation is used. The smoothing level is increased by sampling fewer points.
Douglas-Peucker [Ram72, DP73] (see Figure 3(e)) is a non-uniform subsampling approach that optimizes the $\ell_\infty$-norm of the residual error. The algorithm starts by selecting the boundary points of the input and connects them with linear interpolation. Points are then iteratively added by inserting the input point with the largest distance to the output. The process repeats until a user-specified threshold distance is reached.
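To make the comparison concrete, four of the five baselines above can be sketched in plain Python (Douglas-Peucker is omitted for brevity). These are minimal illustrations under stated assumptions, not the implementations used in the evaluation: the function names, the shrinking-window boundary handling, the $3\sigma$ kernel truncation, the naive $O(n^2)$ DFT, and the choice to always keep the last point when subsampling are all assumptions made here.

```python
import cmath
import math


def median_filter(xs, radius):
    """Median of a window of up to 2*radius+1 values (shrinks at boundaries)."""
    out = []
    for i in range(len(xs)):
        window = sorted(xs[max(0, i - radius): i + radius + 1])
        mid = len(window) // 2
        if len(window) % 2:
            out.append(window[mid])
        else:
            out.append(0.5 * (window[mid - 1] + window[mid]))
    return out


def gaussian_filter(xs, sigma):
    """Weighted average with normal-distribution weights, renormalized at edges."""
    radius = max(1, int(3 * sigma))
    out = []
    for i in range(len(xs)):
        total = weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(xs), i + radius + 1)):
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            total += w * xs[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out


def lowpass_cutoff(xs, keep):
    """Zero every DFT frequency above `keep`, then invert (naive O(n^2) DFT)."""
    n = len(xs)
    spec = [sum(xs[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    # Bins k and n-k carry the same frequency, so they are kept or zeroed together.
    spec = [c if min(k, n - k) <= keep else 0 for k, c in enumerate(spec)]
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def uniform_subsample(xs, step):
    """Keep every `step`-th point (plus the last) and interpolate linearly."""
    n = len(xs)
    kept = list(range(0, n, step))
    if kept[-1] != n - 1:
        kept.append(n - 1)
    out = []
    for a, b in zip(kept, kept[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            out.append((1 - t) * xs[a] + t * xs[b])
    out.append(xs[-1])
    return out
```

For instance, `median_filter([0, 0, 10, 0, 0], 1)` removes the isolated spike entirely, which illustrates both why the median filter suits salt-and-pepper noise and why it can discard a peak that a reader might care about.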
3.1 Task Analysis
We considered a variety of low-level tasks based upon the taxonomy of Amar et al. [AES05] and settled upon 2 tasks on which we hypothesized TopoLines would perform well. For each, we only consider the resulting impact on the modification of the data, not the perceptual impact of the smoothing (see future work in section 5). For each task, we provide a brief description along with average and worst case analytical measures of performance.
Retrieve Value is a task focused on finding a specific function value on a given chart. An example query would be, “What was the GOOG stock (Figure 1(d)) price on April 15, 2018?” The accuracy of retrieving a value is dependent upon how closely the values of the smoothed data reflect the values in the input data. We measure this by considering the residual error between the original and smoothed data using vector norms.
For the average case performance, we consider the $l_1$-norm: $\|l\|_1 = \sum_{i=1}^{n} |x_i - x'_i|$, which measures the sum of the absolute value of errors. Since the data length is fixed, comparing the sum of errors is equivalent to comparing the average error.
For the worst case performance, we consider the $l_\infty$-norm: $\|l\|_\infty = \max_i |x_i - x'_i|$, which measures only the point of the largest difference between the input and output data.
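Both residual measures are direct to compute. The following short sketch (function names are hypothetical) assumes the original and smoothed series have the same length and are sampled at the same positions.

```python
def l1_error(original, smoothed):
    """Sum of absolute differences between input and output (average case)."""
    return sum(abs(x - y) for x, y in zip(original, smoothed))


def linf_error(original, smoothed):
    """Largest absolute difference between input and output (worst case)."""
    return max(abs(x - y) for x, y in zip(original, smoothed))
```

For the made-up series `[1, 2, 3]` and `[1, 4, 2]`, the $l_1$ error is 3 and the $l_\infty$ error is 2.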
Find Extrema is a task concerned with identifying minima and maxima in the data. An example query would be, “What are the dates of the top 3 peaks of GOOG (Figure 1(d))?” Performing this task requires that extrema remain in the data after smoothing. To measure the performance, we calculated the topological difference between the input and smoothed data using methods from TDA [EH10].
First, the persistent homology of the original and smoothed data are calculated, as described in subsection 2.1, to create 2 sets of extrema pairs $C$ and $C'$, respectively. For technical reasons, all pairs with infinite persistence are removed, and all pairs of 0-persistence $[c, c)$ are added to make the cardinality infinite [KMN17]. Let $\eta$ be a bijection between the 2 sets.
The average case is measured using the 1-Wasserstein distance, $W_1(C, C') = \inf_{\eta: C \rightarrow C'} \sum_{c \in C} \|c - \eta(c)\|_1$, between the input and output extrema pairs, which identifies the average perturbation of extrema. The worst case is measured using Bottleneck distance, $W_\infty(C, C') = \inf_{\eta: C \rightarrow C'} \sup_{c \in C} \|c - \eta(c)\|_\infty$, which only returns the difference in the extrema with the largest distortion.
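For very small sets of pairs, both distances can be evaluated by brute force, which makes the definitions easy to check by hand. The sketch below is only an illustration: the function name is hypothetical, pairs are assumed to be finite `(birth, death)` value tuples, the 0-persistence pairs are modeled by letting any point be matched to the diagonal, and the search over all bijections takes factorial time, so practical implementations would use an assignment solver instead.

```python
from itertools import permutations


def diagram_distances(C, C2):
    """Brute-force (W1, bottleneck) between two small sets of (birth, death) pairs."""
    n, m = len(C), len(C2)
    size = n + m  # each side is padded with diagonal slots for the other side

    def cost(i, j, norm):
        a = C[i] if i < n else None    # None stands for a 0-persistence pair
        b = C2[j] if j < m else None
        if a is None and b is None:
            return 0.0
        if a is None or b is None:
            p = b if a is None else a
            gap = abs(p[1] - p[0])
            # Distance from p to its nearest diagonal point under each norm.
            return gap if norm == 1 else gap / 2.0
        db, dd = abs(a[0] - b[0]), abs(a[1] - b[1])
        return db + dd if norm == 1 else max(db, dd)

    matchings = list(permutations(range(size)))
    w1 = min(sum(cost(i, eta[i], 1) for i in range(size)) for eta in matchings)
    bottleneck = min(max((cost(i, eta[i], 0) for i in range(size)), default=0.0)
                     for eta in matchings)
    return w1, bottleneck
```

As a made-up example, `diagram_distances([(0, 4), (1, 2)], [(0, 3.5)])` returns `(1.5, 0.5)`: the prominent pair moves by 0.5, and the small pair is matched to the diagonal at a cost of 1 under the $l_1$ norm and 0.5 under the $l_\infty$ norm.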
Baseline.
Each smoothing method offers an adjustable simplification parameter, whose interpretation and output are approach dependent. This variation prevents us from directly using the threshold for comparing methods.
Instead, we use approximate entropy as a calibration measure since it has been shown to be a good proxy for line chart complexity [RMCW19] (see Figure 3).
Comparison.
To compare methods, we evaluated each technique using the 4 metrics, described above, across the full range of approximate entropy values. Each technique/metric then had the best fit line calculated, and the approaches were ranked by their area under the curve from smallest to largest. In other words, for a given measure, the methods are ranked by which produces the lowest error across the range of entropy values. See the supplemental materials for all measures and best fit lines.
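The ranking step can be sketched as follows, assuming that the best fit line is an ordinary least-squares straight line and that the area is integrated over a common entropy range supplied by the caller. The function names and the sample numbers in the usage note are hypothetical, not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def area_under_line(slope, intercept, lo, hi):
    """Exact integral of slope*x + intercept over [lo, hi]."""
    return 0.5 * slope * (hi ** 2 - lo ** 2) + intercept * (hi - lo)


def rank_methods(samples, lo, hi):
    """Rank methods by area under their fitted error line, smallest first.

    samples maps a method name to a list of (entropy, error) measurements.
    """
    areas = {}
    for name, points in samples.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        slope, intercept = fit_line(xs, ys)
        areas[name] = area_under_line(slope, intercept, lo, hi)
    return sorted(areas, key=areas.get)
```

With toy measurements `{"a": [(0, 0), (1, 1)], "b": [(0, 0), (1, 2)]}` over the range `[0, 1]`, the areas are 0.5 and 1.0, so `rank_methods` returns `["a", "b"]`.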
4 Results and Discussion
We test our method using 4 application domains (see the teaser figure) of 5 datasets each.
Radio astronomy data are 5 spectral “lines” that measure the frequency and amplitude of radio waves emitted by extraterrestrial matter (i.e., gas and dust) and were downloaded from [alm].
Climate data are daily high temperatures recorded from July to July over 5 periods (2013/14 through 2017/18) at a large metropolitan airport, downloaded from [YKI∗18].
The EEG data each contain a window from 5 (of 32 total) channels of a single subject undergoing a visual attention task and were acquired from [Del].
Stock trends contain daily closing values for 5 companies (Amazon, Google, Intel, Toyota, and Tesla) over 5 years, starting in February 2015, collected from Yahoo Finance.
All source code is available at https://github.com/USFDataVisualization/TopoLines, and results are available at https://USFDataVisualization.github.io/TopoLines.
The results for all data, measures, and smoothing approaches are summarized in Figure 4.
For all datasets, TopoLines performed best in both average and worst case for the retrieve value task, with the only exception being a second-best finish for average case with the stock data. For the find extrema task, TopoLines performed best or second-best in the average and worst cases for astro and climate data. For the EEG and stock data, TopoLines performed mostly unremarkably. Our best explanation for this result is that the high frequency of the noise makes many of the local extrema that TopoLines is trying to preserve unimportant for these data.
Among the conventional smoothing methods, Gaussian smoothing performed reasonably well overall for the retrieve value task, and Douglas-Peucker performed well for finding extrema. The other methods (uniform subsampling, cutoff filter, and median filter) did not perform consistently well at either task on multiple data types. We, therefore, recommend care in choosing to use them, at least for the tasks we evaluated.
5 Conclusions
In conclusion, we presented a topology-based line chart smoothing method called TopoLines. In the process, we showed that TopoLines has the potential to perform well for certain visual analysis tasks. However, all of these methods, including TopoLines, would benefit from an evaluation framework that considers a broader set of tasks and perceptual differences resulting from their use. In the future, we would like to build upon our current task list and run user studies to evaluate how the effects of smoothing on line charts are perceived. We hope to formulate a set of guidelines, based on these studies, that would be helpful for deciding which smoothing methods are best to use in practice.
Acknowledgments
We would like to thank Bei Wang for providing valuable feedback on this project. This work was supported in part by grants from the National Science Foundation (IIS-1513616 and IIS-1845204).