The Joint Weighted Average (JWA) Operator

Stephen B. Broomell; Christian Wagner

arXiv:2302.11885·cs.AI·May 6, 2024

TL;DR

This paper introduces a novel joint weighted averaging operator that seamlessly combines source worth and evidence worth in information aggregation, grounded in compositional geometry, with broad interdisciplinary potential.

Contribution

It conceptually integrates source and evidence weighting approaches into a unified operator using compositional geometry, providing a new semantic interpretation.

Findings

01 Systematic integration of source and evidence weights.

02 Framework based on compositional geometry.

03 Potential applications across disciplines.

Abstract

Information aggregation is a vital tool for human and machine decision making in the presence of uncertainty. Traditionally, approaches to aggregation broadly diverge into two categories, those which attribute a worth or weight to information sources and those which attribute said worth to the evidence arising from said sources. The latter is pervasive in the physical sciences, underpinning linear order statistics and enabling non-linear aggregation. The former is popular in the social sciences, providing interpretable insight on the sources. While prior work has identified the need to apply both approaches simultaneously, it has yet to conceptually integrate both approaches and provide a semantic interpretation of the arising aggregation approach. Here, we conceptually integrate both approaches in a novel joint weighted averaging operator. We leverage compositional geometry to underpin…

Tables

Table 1: Aggregation Example

	Expert 1	Expert 2	Expert 3
Linear Weights	0.60	0.30	0.10
	Min	Mid	Max
Order Weights	0.05	0.50	0.45
	Expert 1 (Max)	Expert 2 (Mid)	Expert 3 (Min)
Joint Weights	0.64	0.35	0.01
Judgments	90	50	10

Aggregates
Linear Weights (LWA)	70.00
Order Weights (OWA)	66.00
Joint Weights (JWA)	74.94

Table 2: Variable Simulation Parameters

Set	Validity $σ_{x_{i}, y}$										var( $σ_{x_{i}, y}$ )
1	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.00
2	0.91	0.95	0.97	0.99	1.01	1.02	1.03	1.04	1.04	1.05	0.00
3	0.68	0.80	0.89	0.96	1.01	1.06	1.10	1.14	1.17	1.20	0.03
4	0.03	0.10	0.23	0.42	0.65	0.94	1.27	1.66	2.10	2.60	0.79
5	0.00	0.01	0.03	0.10	0.25	0.51	0.95	1.62	2.59	3.95	1.79
6	0.00	0.00	0.00	0.00	0.02	0.10	0.34	1.00	2.57	5.96	3.70
7	0.00	0.00	0.00	0.00	0.00	0.00	0.03	0.23	1.52	8.22	6.65

Equations

$$LWA(x_{1},\dots,x_{n})=\sum_{i=1}^{n}w_{i}x_{i}.$$

$$OWA(x_{1},\dots,x_{n})=\sum_{i=1}^{n}v_{i}x_{\pi(i)}$$

$$OWAWA(x)=\alpha\,LWA(x)+(1-\alpha)\,OWA(x).$$

$$SDOWA(x)=G(w,v)\,LWA^{w}(x)+(1-G(w,v))\,OWA^{v}(x),$$

$$G(w,v)=\frac{sd(w)}{sd(w)+sd(v)}.$$

$$a\geq 0;\quad b\geq 0;\quad c\geq 0;\quad\text{and}\quad a+b+c=d.$$

$$\mathbf{x}\oplus\mathbf{y}=\left\{\frac{x_{1}y_{1}}{\sum_{k=1}^{K}x_{k}y_{k}},\dots,\frac{x_{K}y_{K}}{\sum_{k=1}^{K}x_{k}y_{k}}\right\}.$$

$$JWA=(\mathbf{w}_{\pi}\oplus\mathbf{v})\,\mathbf{x}_{\pi}^{T}.$$

Full text

The Joint Weighted Average (JWA) Operator

Stephen B. Broomell

Department of Psychological Sciences

Purdue University

West Lafayette, IN, USA

Christian Wagner

Lab for Uncertainty in Data and Decision Making (LUCID)

School of Computer Science

University of Nottingham

Nottingham, UK

Abstract

Information aggregation is a vital tool for human and machine decision making, especially in the presence of noise and uncertainty. Traditionally, approaches to aggregation broadly diverge into two categories, those which attribute a worth or weight to information sources and those which attribute said worth to the evidence arising from said sources. The latter is pervasive in particular in the physical sciences, underpinning linear order statistics and enabling non-linear aggregation. The former is popular in the social sciences, providing interpretable insight on the sources. Thus far, limited work has sought to integrate both approaches, applying either approach to a different degree. In this paper, we put forward an approach which integrates–rather than partially applies–both approaches, resulting in a novel joint weighted averaging operator. We show how this operator provides a systematic approach to integrating a priori beliefs about the worth of both source and evidence by leveraging compositional geometry–producing results unachievable by traditional operators. We conclude and highlight the potential of the operator across disciplines, from machine learning to psychology.

1 Introduction

Information aggregation is a vital tool for decision making, especially in the presence of noise and uncertainty. Whether it be for a computerized system or for human decision making, evidence derived from information sources serves as decision inputs, and the discord between pieces of evidence can often be addressed by considering several such sources. For example, museums commonly use multiple thermometers to reliably maintain an ideal temperature, while expert committees rely on the presence of multiple experts to address discord and individual knowledge gaps. Prior research on this topic can be divided into two different approaches, based on distinct priorities often associated with the research discipline. The first approach determines the worth of any piece of evidence based on the worth of its source [Clemen and Winkler, 1993, Weiss and Shanteau, 2003, Winkler and Clemen, 1992]. The second approach determines the worth of a given piece of evidence relative to a given criterion or function [Yager, 1988, Grabisch et al., 2000, Havens et al., 2015, Wagner et al., 2017].

In the social sciences, the traditional emphasis is on human judgments, with the discord between judgments reflecting uncertainty about the validity, redundancy, and reliability of a set of expert judges [Weiss and Shanteau, 2003, Winkler and Clemen, 1992]. Aggregation across a suitably diverse set of such judges typically leads to improved judgment accuracy, a phenomenon termed the wisdom of crowds [Davis-Stober et al., 2014]. As such, the focus here is on the information sources themselves, for example, understanding the properties of each judge to identify the best judges to rely on for a given decision context [Budescu and Chen, 2015, Chen et al., 2016, Clemen and Winkler, 1993, Davis-Stober et al., 2015, Lamberson and Page, 2012]. Viewing the work from this perspective highlights a potential problem: the diversity of the set of experts is not universal, i.e. there may be some questions for which the set of experts provides a diverse set of responses, while the responses to another question by the same set of experts may not be at all diverse. This may be desirable or it may result in unexpected bias. We will not discuss this further here, but future work will explore it in relation to the operator proposed in this paper (see footnote 1).

Footnote 1: In the social sciences, the emphasis of research is frequently on gaining understanding of the information sources rather than generating a general process that can apply to novel contexts. Even in research areas such as the wisdom of crowds, which are interested in accurate prediction, approaches from a social science perspective are often guided by a focus on the relationships between the information sources as opposed to the relationship between the evidence and the question in focus.

Conversely, in the physical sciences, broadly speaking, the primary emphasis is on the quality or performance of fusing data across a number of sources, such as from sensors. These sources may have different properties or may simply behave differently in different environments due to contextual factors such as placement location. With aggregation here typically targeting robust performance across environments through aggregation operators such as the fuzzy integral [Grabisch et al., 2000], less emphasis is often placed on the explanation of the fusion process, e.g., in terms of properties of sources (see footnote 2).

Footnote 2: We highlight that we are not referring to work seeking to explain the behavior of the operator, such as [Murray et al., 2021].

The differing emphases of these two areas of study have resulted in two different strategies for information aggregation. In this paper, we propose a novel approach to integrate these two strategies. We start by briefly considering each of the extant strategies, their motivation, and the resulting operationalization within aggregation.

1.1 Linear and Order Weighting

The first information aggregation strategy is operationalized by a linear combination based on the information sources, which underlies many approaches to information aggregation for wisdom of crowds [Budescu and Chen, 2015, Davis-Stober et al., 2014, 2015, Huang et al., 2022]. This approach reflects linear models pervasive in the social sciences and psychology, such as linear regression, serving as an interpretable modelling means allowing scientists to understand the relative importance of each information source in an overall aggregate [Dawes and Corrigan, 1974]. We call these Linear Weighted Averages (LWA) because they are represented as a linear combination of the information sources with weights reflecting properties of, or assigned to these sources (e.g., their recency, reliability, relevance, trustworthiness, or validity), conceptualised as source-specific worth, and operationalized as source-specific weights.

While this linear combination of information provides a powerful normative framework with distinct benefits such as interpretability and robustness to over-fitting [Dawes, 1979], its restricted degrees of freedom in turn limit what type of combinations of information can be modelled.

The second information aggregation strategy is operationalized by ordered combinations of the evidence or judgment arising from the information sources. Here, in computer science and, more broadly, the physical sciences, flexible non-linear operators such as linear order-statistics have been applied with great success in applications from cyber-security to data-fusion [Murray et al., 2021, Miller et al., 2016]. For example, the Order Weighted Average (OWA) [Yager, 1988] retains much of the simplicity and interpretability of the LWA but allows for non-linear combinations of evidence. To achieve this, the OWA focuses not on the notion of worth arising from each source, but instead on the worth associated with the actual information, i.e. the evidence arising from each given source at a given point in time.

To illustrate how these two strategies apply in a single context, consider the problem of estimating how much product to produce, using judgments from 3 experts. First, we may have some prior knowledge of the relative performance of each expert such that we assign weights {0.60, 0.30, 0.10} to experts 1, 2 and 3 respectively. Table 1 displays the linear aggregate of a set of hypothetical judgments. Second, we may have a desire to avoid underproduction of our product such that we assign weights {0.05, 0.50, 0.45} to the lowest, middle, and highest judgment produced by the experts. These weights result in an aggregate that is mainly composed of the middle and highest judgment, putting very little weight on the lowest judgment. Table 1 displays the ordered aggregate of the same set of hypothetical judgments. The two strategies result in different averages with the linear weights focusing on our relative trust in the experts and the ordered weights focusing on our concerns about data context and the given evidence itself.
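The arithmetic behind these two aggregates can be written out explicitly. The sketch below assumes, as the judgments row of Table 1 indicates, that experts 1, 2 and 3 report 90, 50 and 10 respectively:

$$LWA = 0.60\cdot 90 + 0.30\cdot 50 + 0.10\cdot 10 = 54 + 15 + 1 = 70,$$

$$OWA = 0.05\cdot 10 + 0.50\cdot 50 + 0.45\cdot 90 = 0.5 + 25 + 40.5 = 66.$$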

From the above, it is clear that beyond the linear and non-linear characteristics of the LWA and OWA, both operators operationalize fundamentally different foci, with the LWA focusing on the worth of the information sources, and the OWA focusing on the worth of the evidence or judgment contributed by the sources. Recent literature has explored the combination of both approaches by specifying a weighted combination of both operators [Torra and Lv, 2009, Merigó, 2009, Xu and Da, 2003, Cardin and Giove, 2021], thus affording outputs that are influenced to a pre-determined degree by the behavior of each. While this strategy of combination provides a new set of aggregation operations, it is based on the partial consideration of each operator (i.e. each is used to a different degree), rather than the joint consideration of both foci, as outlined above.

Expressed differently, in cases where we have prior information on the worth of sources which is independent from the specific context of judgment, for example, based on ‘years-of-service’ in our example above; as well as a mutually independent assessment of worth associated with the contributed evidence, such as provided through the ‘avoid underproduction’ strategy above, then our aggregation strategy should integrate both of these independent bits of information into one strategy, where one aspect may reinforce another, rather than only applying each to a certain degree.

Revisiting the example of expert judgments for product production, we can imagine how an integrated strategy would favour experienced experts who produce higher estimates than their peers. We can illustrate the difference between a weighted combination of both operators and the joint consideration of both aggregation strategies. Based on the results in Table 1, any weighted aggregation of LWA and OWA would by definition generate an estimate in the range [66, 70], i.e. between the two aggregates being combined, produced in our case by the OWA and LWA respectively. Applying more weight to LWA will lead to a higher estimate, whereas applying more weight to OWA would lead to a lower estimate. This result is slightly counter-intuitive to the goals of the two strategies, as the OWA (focused on down-weighting the lowest judgment) leads to a lower aggregate. This result is due to the fact that the highest judgment is coming from expert 1, who is (by chance) given more weight in the LWA relative to the weight assigned to the highest judgment in the OWA. A joint consideration of the two operators' weights can resolve this problem. Given that the lowest judgment (which has the lowest priority in the OWA) is generated by expert 3 (who has the lowest priority in the LWA), a joint focus on both strategies would put less weight on expert 3's judgment than either strategy alone. This logic results in the combined set of weights displayed in Table 1, with an even larger aggregated estimate than would result from the LWA, OWA, and any weighted combination of them.
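The joint weights in Table 1 can be reproduced by multiplying each expert's linear weight by the order weight attached to that expert's judgment and then renormalizing. This is the perturbation operation that Section 2.5 introduces formally; it is applied here ahead of that definition purely as a worked illustration:

$$0.60\cdot 0.45 = 0.27,\qquad 0.30\cdot 0.50 = 0.15,\qquad 0.10\cdot 0.05 = 0.005,\qquad 0.27 + 0.15 + 0.005 = 0.425,$$

$$\left\{\frac{0.27}{0.425},\ \frac{0.15}{0.425},\ \frac{0.005}{0.425}\right\} = \left\{\frac{54}{85},\ \frac{30}{85},\ \frac{1}{85}\right\} \approx \{0.64,\ 0.35,\ 0.01\},$$

$$JWA = \frac{54\cdot 90 + 30\cdot 50 + 1\cdot 10}{85} = \frac{6370}{85} \approx 74.94.$$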

In this paper, we will introduce a mathematical framework for creating joint weights and exploring the nature of a joint focus on source-and-evidence specific worth of information. In Section II we provide background that discusses the assumptions and properties of information sources and evidence that would lead a decision maker to apply LWA, OWA, or some combination of the two for their information aggregation needs. Next, we introduce the technical details of compositional geometry [Aitchison, 1982, Smithson and Broomell, 2022], which is the mathematical framework used to generate joint weights that create aggregates based on both source and evidence. Finally we review prior work that has proposed weighted averages of the LWA with the OWA. In Section III we introduce our new aggregation operator the Joint Weighted Average (JWA) and provide illustrative examples of applying this aggregator to data. Finally, Section IV provides a discussion and conclusion.

2 Background

2.1 Theoretical context

When discussing the aggregation of information, it is valuable to take a step back and consider the origins of information relevant in such a process. As discussed in the introduction, all aggregation problems share the presence of multiple sources, with each source contributing evidence for a given context. Context here refers to a complex construct; capturing both stationary aspects, such as the domain (e.g. the production decision space in the previous example), but also non-stationary aspects such as the given time and specific instance where evidence is provided, e.g. during a recession, a natural disaster, or a period of sustained growth.

The theories of information aggregation discussed in Section 1 define the weight of evidence arising from a number of sources within a given context through two main concepts:

1. The quality of sources, i.e. capturing the expected quality, experience, suitability, etc. of the given source in contributing evidence within the given context.

2. The quality of evidence, i.e. capturing the perceived quality, ‘reasonableness’, parsimony, etc. of the given evidence from the given source contributed within the context.

The difference between the source and evidence quality is worth articulating. Where researchers can assume that the quality of evidence generated by sources is stationary (i.e., constant even with regard to any changes in context), they simply need to define the quality of the sources to understand the quality of the evidence they generate.

However, other applications do not allow researchers to assume evidence generated by sources is stationary, as there may exist relevant contexts where sources perform differently, and the quality of the evidence from each of the sources can change due to these changes in circumstances. In such an application researchers traditionally have focused, often implicitly, on understanding the quality of the evidence itself, without relying on the quality of the sources.

Thus, in practice, the difference between the quality of sources and the evidence they contribute is often lost, at times for good reason. An example of this is sensor calibration where fairly economical temperature sensors are calibrated post-manufacture or post-installation against a known ground truth, providing effective calibration weights to convert the raw sensor output to an accurate measurement of temperature. As the sensor has been installed in a given setting, we assume the context of each measurement to be stationary, thus we do not differentiate between the quality of the measurement vis-a-vis the quality of the sensor itself.

While the above is intuitive, it is similarly clear why the quality of the source and the quality of the evidence contributed by the source may not be the same when context is non-stationary, making it crucial to differentiate them. For example, as humans we attribute a level of trust and confidence to individuals around us, whether they are family members, professionals, or politicians. When considering their views, we implicitly use this notion of source quality to evaluate the usefulness of advice obtained from different individuals. For example, we seek out and trust the diagnoses provided by medical professionals. However, we may discount the recommendation if we know they own a stake in the company that manufactures the drug they are prescribing. In other words, for this specific decision context, we evaluate the quality of the information provided by the source differently than we generally would.

Indeed, this example mirrors the general practice of experts and professionals disclosing conflicts of interest so that we may discount their judgments when conflicts arise (or so that they may recuse themselves from providing judgment for a particular instance) without changing our evaluation of the overall quality of their expertise, knowledge, or judgments.

In summary, when engaging in aggregation or studying how prior aggregation was conducted, it is important to consider both the quality of sources and the quality of information arising from said sources for the given decision context. Often, the former is more stable than the latter, and different approaches exist to determine either. Crucially, when we know such mappings between quality and context a priori, we apply them implicitly and without a second thought. We naturally rely on our sense of touch rather than our vision–normally superior–in the dark.

However, prior work on information aggregation does not jointly reflect both the quality of source, and the quality of evidence. We therefore seek to fill this gap in the literature by introducing a new aggregation operator that makes explicit use of a priori beliefs about both the quality of sources and the quality of evidence generated by these sources.

2.2 The Linear Weighted Average

The linear weighted average applies source weights to integrate a measure of the worth of each source into the evidence it provides. Let a Linear Weighted Average (LWA) be defined by applying linear weights ( $w_{i}$ ) to the evidence provided by each information source ( $x_{i}$ ) given by,

$$LWA(x_{1},\dots,x_{n})=\sum_{i=1}^{n}w_{i}x_{i}.$$

The weights are convex, i.e. constrained to be non-negative and sum to one.

2.3 The Ordered Weighted Average

The ordered weighted average [Yager, 1988] applies ordered weights to describe a measure of the worth of the evidence. Let an Order Weighted Average (OWA) be defined by applying ordered weights ( $v_{i}$ ) to information transformed by a permutation function $\pi()$ , such that $x_{\pi(1)}\geq...\geq x_{\pi(n)}$ .

$$OWA(x_{1},\dots,x_{n})=\sum_{i=1}^{n}v_{i}x_{\pi(i)}$$

The weights are convex, i.e. constrained to be non-negative and sum to one.
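As a minimal sketch of the LWA and OWA definitions, the following Python functions reproduce the two aggregates from Table 1. The function names `lwa` and `owa` are illustrative and not taken from the paper, and the order weights are listed from the largest judgment to the smallest so that they line up with $x_{\pi(1)}\geq...\geq x_{\pi(n)}$.

```python
def lwa(x, w):
    """Linear weighted average: weight w[i] belongs to source i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def owa(x, v):
    """Ordered weighted average: v[0] applies to the largest value
    and v[-1] to the smallest, following x_pi(1) >= ... >= x_pi(n)."""
    ordered = sorted(x, reverse=True)
    return sum(vi * xi for vi, xi in zip(v, ordered))


judgments = [90, 50, 10]             # experts 1, 2, 3 (Table 1)
source_weights = [0.60, 0.30, 0.10]  # experts 1, 2, 3
order_weights = [0.45, 0.50, 0.05]   # max, mid, min

print(round(lwa(judgments, source_weights), 2))  # 70.0
print(round(owa(judgments, order_weights), 2))   # 66.0
```

Shuffling the judgments changes the LWA but leaves the OWA unchanged, because only the OWA sorts its inputs before weighting them.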

2.4 Axiomatic Basis and prior Work Combining the LWA and OWA

As discussed above, there are many reasons and application domains where an aggregation operator that considers both source and order weights is desirable, from strategic decision making in business, to sensor fusion in autonomous vehicles. Several authors have proposed methods for combining the LWA and OWA. As outlined by Cardin and Giove [2021], the LWA and OWA satisfy very general mathematical desiderata for an aggregator $F(x)$ , including333numbering adopted from Cardin and Giove [2021]:

B1. Compensativeness: $\min(x)\leq F(x)\leq\max(x)$

B2. Monotonicity: if $x\geq y$ then $F(x)\geq F(y)$

B3. Idempotency: $F(x,\ldots,x)=x$

Cardin and Giove [2021] argue that any aggregation function combining LWA and OWA should also satisfy these properties. However, the combined operator adds additional complexity, raising questions about how it should behave relative to the individual LWA and OWA. In considering these issues, Cardin and Giove [2021] also argue that any aggregating function $F(x)$ which mixes linear and ordered weights should satisfy another set of desiderata:

A1. Internal Boundedness: $\min(OWA,LWA)\leq F(x)\leq\max(OWA,LWA)$

A2. Coherence: if $OWA=LWA=k$ then $F(x)=k$

A3. Collapsing: if $w_{1}=\ldots=w_{n}$ then $F(x)=OWA$ , if $v_{1}=\ldots=v_{n}$ then $F(x)=LWA$

The derivation of these additional desiderata (A1 - A3) is not clearly specified by Cardin and Giove [2021]. In the next section we derive some of these desiderata, but we find that internal boundedness (A1) and coherence (A2) preclude an aggregator that allows the joint consideration of both LWA and OWA. We therefore treat A1 - A3 as tentative desiderata, as they provide a useful set of properties through which to understand the similarities and differences of the various proposed aggregators that combine LWA with OWA. Generally, we argue that not all of them may apply.

A simple method for mixing linear and ordered weights is to generate a weighted combination of the two aggregators. This is the OWAWA operator introduced by Merigó [2009]:

$$OWAWA(x)=\alpha\,LWA(x)+(1-\alpha)\,OWA(x).$$

The parameter $\alpha\in[0,1]$ defines the relative contribution of each set of weights. Cardin and Giove [2021] show that this satisfies B1, B2, B3, and A1, A2, but fails to satisfy A3.
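To make the role of $\alpha$ concrete, here is a small Python sketch of the OWAWA mixture applied to the Table 1 example. The helper names are illustrative, and the order weights are listed from the largest judgment to the smallest. Because the result is a convex mixture of the two aggregates, every choice of $\alpha$ stays between the OWA (66) and the LWA (70), which is exactly property A1.

```python
def lwa(x, w):
    """Linear weighted average over sources."""
    return sum(wi * xi for wi, xi in zip(w, x))


def owa(x, v):
    """Ordered weighted average; v[0] applies to the largest value."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x, reverse=True)))


def owawa(x, w, v, alpha):
    """Convex mixture of LWA and OWA with mixing parameter alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * lwa(x, w) + (1.0 - alpha) * owa(x, v)


x = [90, 50, 10]
w = [0.60, 0.30, 0.10]
v = [0.45, 0.50, 0.05]

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Moves linearly from the OWA (66) at alpha=0 to the LWA (70) at alpha=1.
    print(alpha, round(owawa(x, w, v, alpha), 2))
```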

In order to satisfy collapsing (A3), Cardin and Giove [2021] introduced the Standard Deviation OWA (SDOWA) to combine linear and ordered weights:

$$SDOWA(x)=G(w,v)\,LWA^{w}(x)+(1-G(w,v))\,OWA^{v}(x),$$

where

$$G(w,v)=\frac{sd(w)}{sd(w)+sd(v)}.$$

The authors show that this operator satisfies B1, B2, B3, and A1, A2, A3.
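The following is a hedged Python sketch of the SDOWA, not the reference implementation of Cardin and Giove [2021]. It makes two assumptions that the text does not fix: $sd$ is taken to be the population standard deviation (the ratio $G$ is the same for the sample version, since both weight vectors have equal length), and the undefined case where both weight vectors are uniform is resolved by returning the arithmetic mean, which is what the LWA and OWA both equal in that case.

```python
from statistics import pstdev


def lwa(x, w):
    """Linear weighted average over sources."""
    return sum(wi * xi for wi, xi in zip(w, x))


def owa(x, v):
    """Ordered weighted average; v[0] applies to the largest value."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x, reverse=True)))


def sdowa(x, w, v):
    """Mix LWA and OWA in proportion to the spread of their weights."""
    sd_w, sd_v = pstdev(w), pstdev(v)
    if sd_w + sd_v == 0:
        # Assumption: both weight sets are uniform, so G is 0/0;
        # LWA and OWA then both equal the plain mean.
        return sum(x) / len(x)
    g = sd_w / (sd_w + sd_v)
    return g * lwa(x, w) + (1.0 - g) * owa(x, v)


x = [90, 50, 10]
w = [0.60, 0.30, 0.10]
v = [0.45, 0.50, 0.05]
uniform = [1 / 3] * 3

print(sdowa(x, w, v))        # lies between the OWA (66) and the LWA (70)
print(sdowa(x, uniform, v))  # collapses to the OWA, as A3 requires
print(sdowa(x, w, uniform))  # collapses to the LWA, as A3 requires
```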

We argue that there are two primary disadvantages to the SDOWA operator. The first is that, because the relative contribution of the LWA and OWA is completely determined by the standard deviation of their weights, it is difficult both to control and to understand the relative contribution of the LWA and OWA to the final aggregate. The second disadvantage is exemplified in Table 1. It highlights that averages of LWA and OWA are different from an aggregator that jointly considers the linear and order weights. In fact, we show in Section 3 that satisfying A1 precludes the ability to jointly consider the implication of the LWA and OWA weights. This is because a joint consideration of the weights results in instances where the linear and order weights interact, such as when a highly weighted source has also provided the evidence with the highest order weight, resulting in an aggregate that is higher than either the LWA or the OWA.

In fact, both the OWAWA and SDOWA aggregator functions satisfy A1. As such, these operators are insensitive to interactions between the linear and order weights and can only provide an average consideration of the LWA and OWA. We propose that understanding the linear and order weights as compositions in the statistical sense allows for joint consideration of both the source and evidence as well as a better understanding of the behavior of the operator combining them, as outlined over the next sections.

2.5 Compositional Geometry

Our proposed approach to blend a priori beliefs about the quality of sources and the quality of evidence is operationalized by blending the weights assigned to sources (via linear weights) with the weights assigned to the evidence (via order weights) into interpretable joint weights. To achieve this goal, we leverage the fact that non-negative weights that sum to one are compositional, and utilize compositional geometry to combine them.

A composition is formally defined as a collection of components that are constrained to sum to a constant for individual cases (See Aitchison [1982], Van den Boogaart and Tolosana-Delgado [2013], Smithson and Broomell [2022] for tutorials). Proportions and probabilities are examples of compositions which are constrained to sum to 1. For example, Smithson and Broomell [2022] describe how the proportion of the day spent working, relaxing, and exercising (assuming no other activities) could be defined as a composition. If more time is spent working on a given day, then there must be less time spent on relaxing and/or exercising. Therefore, components of a composition cannot be analyzed separately, because the value of one component affects the value of the others.

Compositions have their own geometry defined by a mathematical structure called a simplex. Within the simplex, a composition is defined by each component part. For example, a composition with 3 parts is defined by the set $\{a,b,c\}$ such that

$$a\geq 0;\quad b\geq 0;\quad c\geq 0;\quad\text{and}\quad a+b+c=d.$$

Constraining $d$ to equal $1$ generates a standardized simplex that elegantly reflects compositions made up of proportions and probabilities.

Because each component is dependent on the value of the remaining components, the operation of addition is slightly different in compositional geometry. Compositional addition is known as perturbation, and defines how the relative values of each component will change when added to another composition. Given compositions $\mathbf{x}=\{x_{1},x_{2},x_{3}\}$ and $\mathbf{y}=\{y_{1},y_{2},y_{3}\}$ , we define addition in compositional geometry using the symbol $\oplus$ as,

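$$\mathbf{x}\oplus\mathbf{y}=\left\{\frac{x_{1}y_{1}}{\sum_{k=1}^{K}x_{k}y_{k}},\dots,\frac{x_{K}y_{K}}{\sum_{k=1}^{K}x_{k}y_{k}}\right\}.$$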

In compositional algebra, the additive identity is the uniform composition where $x_{k}=1/K$ for all $k$ .
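Perturbation is short enough to sketch directly in Python. The function name `perturb` and the error handling are illustrative choices rather than anything specified in the source; the example checks that the uniform composition acts as the identity and that the operation is commutative.

```python
def perturb(x, y):
    """Compositional addition: multiply part by part, then renormalize."""
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    products = [xk * yk for xk, yk in zip(x, y)]
    total = sum(products)
    if total == 0:
        # Assumption: treat the all-zero product as undefined.
        raise ValueError("perturbation undefined: every product is zero")
    return [p / total for p in products]


w = [0.60, 0.30, 0.10]
v = [0.45, 0.50, 0.05]
uniform = [1 / 3] * 3

print(perturb(w, uniform))  # recovers w, up to floating-point rounding
print(perturb(w, v))        # parts again sum to one
```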

We apply compositional geometry and algebra as a mathematical representation for the linear and order based weights. The weights assigned to sources and orders are relative, meaning that the individual weights represent a part of a whole. We can use the perturbation operation (6) to blend two sets of weights that are constrained to be non-negative and sum to one, so that the joint weights have several desirable mathematical properties:

(1) scale invariance: results do not depend on the constant sum.

(2) permutation invariance: results do not depend on the order of the parts.

(3) subcompositional coherence: results for subcompositions (any subset of component parts) do not differ from their results when considering the full composition.
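One way to read these three properties for the perturbation operator is sketched below in Python. The checks are illustrative interpretations rather than formal proofs, and the helper names `perturb` and `close` are assumptions introduced here.

```python
def perturb(x, y):
    """Multiply part by part, then rescale so the parts sum to one."""
    products = [xk * yk for xk, yk in zip(x, y)]
    total = sum(products)
    return [p / total for p in products]


def close(parts):
    """Rescale non-negative parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]


w = [0.60, 0.30, 0.10]
v = [0.45, 0.50, 0.05]
joint = perturb(w, v)

# (1) Scale invariance: expressing w in percent changes nothing.
scaled = perturb([100 * wk for wk in w], v)

# (2) Permutation invariance: reordering both inputs reorders the output.
idx = [2, 0, 1]
permuted = perturb([w[i] for i in idx], [v[i] for i in idx])

# (3) Subcompositional coherence: the ratio between the first two parts
# is the same whether or not the third part is included.
sub = perturb(close(w[:2]), close(v[:2]))
ratio_full = joint[0] / joint[1]
ratio_sub = sub[0] / sub[1]
```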

We therefore propose to add the linear weights for each source to the order weights assigned to the evidence generated from the source within the compositional representation, resulting in a perturbation of the source and order weights. The new set of joint weights can be interpreted as a mutual perturbation between the source and order weights. In general, the perturbation of two quantities with different units produces a result with a joint unit. For example, Van den Boogaart and Tolosana-Delgado [2013] (p. 18) outline how a composition of percentages of grams of nutrients for a food item can be transformed to units of energy by adding this composition to a composition that represents the relative kJ/g of each nutrient. Therefore, our joint weights are in the joint unit of source quality times evidence order.

3 The Joint Weighted Average

In this section we introduce the Joint Weighted Average (JWA) as a new aggregation operator that allows for the joint representation of independently defined linear and order weights in computed averages. We therefore propose a new type of blend of LWA and OWA with mathematical properties that differ from the prior literature attempting to combine these two weighting approaches.

Given $n$ sources, let $\mathbf{w}=\{w_{1},w_{2},\ldots w_{i},\ldots w_{n}\}$ be a set of convex source weights and let $\mathbf{v}=\{v_{1},v_{2},\ldots v_{i},\ldots v_{n}\}$ be a set of convex order weights. Both $\mathbf{v}$ and $\mathbf{w}$ are compositions because their components are non-negative and sum to 1. Let $\mathbf{x}=\{x_{1},x_{2},\ldots x_{i},\ldots x_{n}\}$ be the evidence generated by the $n$ sources. We reorder the evidence and the linear weights by the same permutation function $\pi()$ , such that $x_{\pi(1)}\geq x_{\pi(2)}\geq...\geq x_{\pi(n)}$ . Finally, the ordered linear weights are perturbed by the order weights (using the compositional $\oplus$ operator), and these weights are applied to the ordered evidence to generate a weighted average given by

$$JWA=(\mathbf{w}_{\pi}\oplus\mathbf{v})\,\mathbf{x}_{\pi}^{T}.$$

The operations inside the parentheses represent compositional operations. We then take the inner product of the resulting composition with the data in $\mathbf{x}_{\pi}$ to calculate the weighted average.
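The steps just described (sort the evidence, carry the source weights along, perturb by the order weights, take the inner product) can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the name `jwa` is hypothetical, and ties in the evidence are broken by input order, a case the text does not address.

```python
def jwa(x, w, v):
    """Joint weighted average.

    x: evidence per source; w: source weights (same order as x);
    v: order weights, with v[0] applying to the largest value.
    Returns the aggregate and the joint weights in sorted order.
    """
    n = len(x)
    if not (len(w) == len(v) == n):
        raise ValueError("x, w and v must have the same length")
    # Permutation pi: indices of x from largest to smallest value.
    # Assumption: ties keep their input order (sorted is stable).
    order = sorted(range(n), key=lambda i: x[i], reverse=True)
    x_pi = [x[i] for i in order]
    w_pi = [w[i] for i in order]
    # Perturbation of the reordered source weights by the order weights.
    products = [wi * vi for wi, vi in zip(w_pi, v)]
    total = sum(products)
    joint = [p / total for p in products]
    return sum(j * xi for j, xi in zip(joint, x_pi)), joint


judgments = [90, 50, 10]             # experts 1, 2, 3 (Table 1)
source_weights = [0.60, 0.30, 0.10]
order_weights = [0.45, 0.50, 0.05]   # max, mid, min

aggregate, joint_weights = jwa(judgments, source_weights, order_weights)
print(round(aggregate, 2))                    # 74.94
print([round(j, 2) for j in joint_weights])   # [0.64, 0.35, 0.01]
```

Returning the joint weights alongside the aggregate keeps the result interpretable: each weight can be read as the combined worth of a source and of the rank of its evidence.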

The mathematical properties of this operator can be derived from two basic facts: (1) the vector inside the parentheses is non-negative and sums to one, giving the JWA the same properties as a weighted average, and (2) these weights are derived using well-defined compositional geometry. First, because this operator can be represented as a weighted combination of the data $\mathbf{x}_{\pi}$ , it automatically satisfies properties B1 - B3 described in Section 2.4. Second, within compositional geometry, the additive identity is the composition with uniform weight on each component. Therefore, if $w_{\pi(i)}=1/n$ for all $i$ then $\mathbf{w}_{\pi}\oplus\mathbf{v}=\mathbf{v}$ and JWA(x) = OWA(x). If $v_{i}=1/n$ for all $i$ then $\mathbf{w}_{\pi}\oplus\mathbf{v}=\mathbf{w}_{\pi}$ and JWA(x) = LWA(x). The JWA therefore also satisfies A3.
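The collapsing step can be verified in one line from the definition of perturbation. Writing $w_{\pi(i)}$ for the $i$-th reordered source weight (a notational convenience assumed here) and setting $w_{\pi(i)}=1/n$ for all $i$:

$$(\mathbf{w}_{\pi}\oplus\mathbf{v})_{i}=\frac{\frac{1}{n}v_{i}}{\sum_{k=1}^{n}\frac{1}{n}v_{k}}=\frac{v_{i}}{\sum_{k=1}^{n}v_{k}}=v_{i},$$

where the last equality uses $\sum_{k=1}^{n}v_{k}=1$. The same argument with the roles of the two weight vectors exchanged gives $(\mathbf{w}_{\pi}\oplus\mathbf{v})_{i}=w_{\pi(i)}$ when $v_{i}=1/n$ for all $i$.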

The JWA operator diverges from prior operators because it does not satisfy properties A1 and A2. We argue that these two properties preclude an aggregate in which the LWA and OWA weights interact. For property A1, the results in Table 1 demonstrate that the JWA is not bounded by the OWA and LWA: there the JWA (74.94) exceeds $\max(OWA,LWA)$ (70.00), and in general the JWA can likewise fall below $\min(OWA,LWA)$. For property A2, the result $OWA(\mathbf{x})=LWA(\mathbf{x})=k$ can be achieved by having $\mathbf{w}_{\pi}=\mathbf{v}$, meaning that each element of $\mathbf{x}_{\pi}$ receives the same linear and order weight. However, within compositional geometry, if $\mathbf{w}_{\pi}=\mathbf{v}\neq\{1/n,\ldots,1/n\}$ then $\mathbf{w}_{\pi}\oplus\mathbf{v}=\mathbf{v}\oplus\mathbf{v}\neq\mathbf{v}$. In other words, the combined weights resulting from the perturbation operation differ from the original weights, which generally yields aggregates where $OWA(\mathbf{x})=LWA(\mathbf{x})\neq JWA(\mathbf{x})$. This is because evidence receiving weight greater than $1/n$ from both the linear and order weights will receive even more weight through perturbation and, conversely, evidence receiving weight less than $1/n$ from both will receive less weight through perturbation. This result follows from the basic properties of perturbation. The violations of A1 and A2 are therefore a direct consequence of the JWA allowing the LWA and OWA weights to interact.
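A small worked case makes the violation of A2 concrete. The numbers are hypothetical, chosen only for illustration, and $\oplus$ is again assumed to be the standard Aitchison perturbation. Take $\mathbf{x}_{\pi}=(90,50,10)$ and $\mathbf{w}_{\pi}=\mathbf{v}=(0.5,0.3,0.2)$.

```latex
\begin{aligned}
LWA(\mathbf{x}) = OWA(\mathbf{x}) &= 0.5\cdot 90 + 0.3\cdot 50 + 0.2\cdot 10 = 62,\\
\mathbf{w}_{\pi}\oplus\mathbf{v} &= \frac{(0.25,\;0.09,\;0.04)}{0.25+0.09+0.04}
  \approx (0.658,\;0.237,\;0.105),\\
JWA(\mathbf{x}) &= \frac{0.25\cdot 90 + 0.09\cdot 50 + 0.04\cdot 10}{0.38}
  = \frac{27.4}{0.38} \approx 72.11 .
\end{aligned}
```

The first component, weighted above $1/3$ by both compositions, rises from 0.5 to about 0.658, while the other two, both below $1/3$, shrink, so the aggregate moves away from the common LWA and OWA value.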

4 Experiments

We provide a demonstration of the effectiveness of the JWA operator using simulation. Assume we consult $k$ sources over $n$ trials, resulting in a $k\times n$ matrix $X$ that represents the evidence from our sources. For each trial, the aggregate of the source evidence provides a prediction $\hat{y}$ of the criterion variable $y$.

To test how different aggregators perform, we simulate evidence from a collection of sources, and we also simulate the criterion that we are trying to predict. This way we can directly control the statistical properties of the sources, and their evidence, for predicting $y$. We use a very simple model, where the data for $X$ and $y$ are simulated from a joint normal distribution. For a normal distribution, we can set the mean, variance, and covariance of each variable. For simplicity, we set all means and variances equal to the constant 10, so that the worth of the sources is defined by the covariance structure. We set the covariance between sources ($\sigma_{x,x^{\prime}}$) to the constant 2. We want the sources to have some predictive accuracy, which is modelled by the degree to which the evidence, $X$, co-varies with the criterion, $y$. We adopt the term validity from the social sciences to describe the correlation between our observable evidence and the criterion.
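The data-generating model described here can be sketched in code. This is a standard-library illustration under stated assumptions, not the authors' simulation code: the number of sources and the validity values are hypothetical (Table 2 is not reproduced here), and the helper names `covariance_matrix`, `cholesky` and `simulate_trial` are invented for this example. Because all variances equal 10, a validity (correlation) of $\rho$ corresponds to a covariance of $10\rho$ between a source and $y$.

```python
import math
import random


def covariance_matrix(validities, variance=10.0, source_cov=2.0):
    """Covariance of (y, x_1, ..., x_k); validity is corr(x_i, y)."""
    size = len(validities) + 1
    sigma = [[0.0] * size for _ in range(size)]
    for i in range(size):
        sigma[i][i] = variance
    for i in range(1, size):
        # Equal variances, so cov(x_i, y) = validity * variance.
        sigma[0][i] = sigma[i][0] = validities[i - 1] * variance
        for j in range(i + 1, size):
            sigma[i][j] = sigma[j][i] = source_cov
    return sigma


def cholesky(a):
    """Lower-triangular L with L L^T = a; fails if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_trial(low, rng, mean=10.0):
    """Draw one (y, [x_1, ..., x_k]) from the joint normal distribution."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    draw = [mean + sum(low[i][m] * z[m] for m in range(i + 1))
            for i in range(len(low))]
    return draw[0], draw[1:]


validities = [0.3, 0.3, 0.3, 0.3, 0.3]  # hypothetical, not taken from Table 2
low = cholesky(covariance_matrix(validities))
rng = random.Random(0)
y, x = simulate_trial(low, rng)
```

Not every combination of validities and inter-source covariance gives a valid covariance matrix; in this sketch `cholesky` raises an error when the matrix is not positive definite.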

The LWA and OWA have weights that reflect the worth of the sources and the evidence in our simulation. Specifically, the worth of the sources is defined in our simulation by changing the relative validity of each source (we hold all other source attributes constant). In other words, the LWA weights directly reflect the relative validity of each source.

To simulate a clear OWA strategy, our simulation injects changes to the interpretation of the evidence by randomly adding a positive bias (or error) $\delta$ to the evidence generated by two of the sources. We randomly add this bias to 50% of the trials. Since this bias is not a property of any specific source, we randomly select which two sources receive this bias independently for each trial. Therefore, our data context suggests that we should not rely heavily on evidence that is too high as chances are good that this evidence is not a reliable indicator of the criterion $y$ . We therefore set the OWA weights to account for this data context by putting zero weight on the highest two data points with the remaining weights set to $1/(k-2)$ . We acknowledge that both LWA and OWA use prior information in this simulation. However, the aim of the experiment is not to compare the operators in absolute terms, but to illustrate their individual behavior, and above all, the behavior of their combinations, here OWAWA and JWA.
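The bias injection and the matching order weights can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function names, the example evidence and the value of $\delta$ are hypothetical, and "50% of the trials" is modelled as an independent probability of 0.5 per trial.

```python
import random


def inject_bias(x, delta, rng, p_bias=0.5, n_biased=2):
    """Return a copy of one trial's evidence, possibly with bias added.

    With probability p_bias, delta is added to n_biased sources chosen
    uniformly at random for this trial only.
    """
    biased = list(x)
    if rng.random() < p_bias:
        for i in rng.sample(range(len(x)), n_biased):
            biased[i] += delta
    return biased


def trimmed_order_weights(k, n_dropped=2):
    """Order weights (largest value first): zero on the top n_dropped
    positions, equal weight 1/(k - n_dropped) on the rest."""
    return [0.0] * n_dropped + [1.0 / (k - n_dropped)] * (k - n_dropped)


def owa(x, v):
    """Ordered weighted average with v[0] applied to the largest value."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x, reverse=True)))


rng = random.Random(1)
evidence = [9.0, 11.0, 10.0, 12.0, 8.0]  # hypothetical single trial
delta = 20.0                             # hypothetical bias size
v = trimmed_order_weights(len(evidence))

print(owa(evidence, v))                            # mean of the three smallest values
print(owa(inject_bias(evidence, delta, rng), v))   # biased values, if any, are dropped
```

When $\delta$ is large enough to push the two biased sources to the top of the ordering, the zero weights remove exactly those two values, which is why this OWA is insensitive to the size of the bias.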

Each aggregation operator is applied to the source data $X$ to generate an aggregate that represents a prediction $\hat{y}$ of $y$ . We measure performance as the mean squared error (MSE) between $\hat{y}$ and $y$ across the $n$ trials. We vary the validity and bias parameters as shown in Figure 1 and Table 2. We report the mean MSE across 500 replications for each parameter combination listed in Table 2.

In our results we show the performance of 4 different aggregation operators: (a) LWA( $X$ ), (b) OWA( $X$ ), (c) JWA( $X$ ), and (d) OWAWA( $X$ ) (with $\alpha=0.5$ ). We simulate $X$ using the different sets of validity values for the sources displayed in Table 2. For each set of validity values, the LWA weights are equal to these validity values divided by their sum (so they sum to one). Therefore, the LWA operator weights sources by their validity, equally weighting them when the validity of each source is the same, and thus increasing weight on the higher validity sources as the data sets go from 1 (top row of Table 2) to 7 (bottom row in the top half of Table 2), and the variance of the weights increases.
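The four aggregators and the error measure can be written compactly. The sketch below is a hedged illustration rather than the authors' implementation: the validity set and the trial are hypothetical, $\oplus$ is assumed to be the Aitchison perturbation, and OWAWA is taken to be the convex mix of the OWA and LWA outputs, which at $\alpha=0.5$ is their simple average.

```python
def closure(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [wi / total for wi in weights]


def lwa(x, w):
    """Linear weighted average: weights attach to sources."""
    return sum(wi * xi for wi, xi in zip(w, x))


def owa(x, v):
    """Ordered weighted average: v[0] applies to the largest value."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x, reverse=True)))


def jwa(x, w, v):
    """Joint weighted average: perturb ordered linear weights by v."""
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    joint = closure([w[i] * vj for i, vj in zip(order, v)])
    return sum(ji * x[i] for ji, i in zip(joint, order))


def owawa(x, w, v, alpha=0.5):
    """Convex mix of the OWA and LWA outputs (assumed form)."""
    return alpha * owa(x, v) + (1 - alpha) * lwa(x, w)


def mse(predictions, targets):
    """Mean squared error across trials."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


validities = [0.1, 0.2, 0.3, 0.4, 0.5]      # hypothetical validity set
w = closure(validities)                     # LWA weights: validities / their sum
k = len(w)
v = [0.0, 0.0] + [1.0 / (k - 2)] * (k - 2)  # OWA weights: drop the two largest

trial = [9.0, 11.0, 10.0, 32.0, 8.0]        # hypothetical; the fourth source looks biased
for name, value in [("LWA", lwa(trial, w)), ("OWA", owa(trial, v)),
                    ("JWA", jwa(trial, w, v)), ("OWAWA", owawa(trial, w, v))]:
    print(name, round(value, 3))
```

In this single hypothetical trial the LWA is pulled upward by the outlying value, the OWA ignores source validity, and the JWA both drops the two largest values and reweights the remaining sources by their validity.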

Figure 1 displays plots of the MSE of each aggregator by the validity sets in Table 2. Each panel reflects a different level of bias as defined in Table 2. From the left to the right panel, the bias increases from a small value (within 1 standard deviation of the data in $X$) to larger values that can more easily stand out relative to the unbiased sources. We can see that the performance of the LWA tends to improve as the differences between source validity values increase, but only when the bias term $\delta$ is not large. When $\delta$ is large, LWA performs worse as the validity sets increase. This is because the LWA effectively focuses on fewer valid sources when the sources differ more, and these few valid sources can easily become completely unreliable when they are biased by a large $\delta$. The OWA is completely unaffected by the changes in both the validity sets and the bias size. This reflects how this aggregator is designed: it weights all pieces of evidence equally, except that it disregards the largest two.

The behavior of the JWA aggregate capitalizes on merging the LWA and OWA strategies, showing improved performance as the validity sets increase, yet remaining robust to the size of the bias that alters the data context. Indeed, JWA is the best performing aggregator in the right hand panel across all validity sets. The OWAWA aggregator (which simply averages the output of LWA and OWA) outperforms JWA in the left panel, performs similarly to JWA in the middle panel, but performs far worse than JWA in the right hand panel. The left hand panel is a very friendly environment for the LWA, thus benefiting the OWAWA. However, the right hand panel is where the LWA’s performance deteriorates due to the harsh data context. Interestingly, the JWA and OWA do not change substantially across panels because we selected order weights that would be robust to this specific change in data context. While the LWA performance deteriorates, the JWA does not because it relies on the interaction between the LWA and OWA strategies to capitalize on information about source validity while still remaining robust to the random injection of bias. Indeed, the JWA shows subtly increased performance as bias increases, as this facilitates the identification and exclusion of the compromised sources, while benefiting from the accurate weights for the sources remaining.

5 Discussion

We have presented a theoretical framework built on the distinction between the quality of sources and the quality of the evidence they produce. We have shown that prior work on information aggregation does not jointly account for both the quality of sources and the quality of evidence. Through the examples in Table 1 and Figure 1, we highlight the importance of simultaneously considering both the quality of sources and the quality of information arising from said sources, showing how these can be independently defined and leveraged to improve information aggregation and the handling of inter-source uncertainty. We introduce the Joint Weighted Average (JWA) as the first aggregator capable of jointly and systematically focusing on these two features that define the quality of evidence.

The JWA jointly focuses on source-and-evidence specific worth by directly combining the linear and ordered weights using compositional geometry [Aitchison, 1982, Smithson and Broomell, 2022, Pekaslan and Wagner, 2021]. This approach allows researchers more transparency and a strong mathematical framework to understand the JWA’s properties and to augment the JWA for specific applications. Viewing the weights from LWA and OWA as compositions also naturally leads to better control and understanding of the combined weights allowing more flexibility than previously proposed approaches.

This approach reveals that previous work attempting to combine LWA and OWA has suggested that a combined operator must adhere to a set of mathematical properties that precludes this joint focus. Specifically, Cardin and Giove [2021] propose that the output of aggregators that combine LWA and OWA must be bounded between the outputs of LWA and OWA. We show that this property does not allow the strategies of LWA and OWA to be fully merged, and the cost of this property is clearly shown in the right-hand panel of Figure 1. Averaging the outputs of LWA and OWA leads to poor performance if one of the two aggregates faces an environment that degrades its approach, whereas the JWA can withstand such environments by blending their strengths more effectively.

6 Conclusions and Future Work

The JWA opens the door to many applications. First, we plan to explore how this aggregator might be fit to data to estimate the relative contribution of linear and ordered weights in unknown systems that generate aggregates, such as human judgment. As such, this aggregator has the potential to describe psychological processes in a novel way, as well as providing a tool that could automate future aggregations made in this way.

Second, the linear and order weights can be elicited from human users or experts to generate an automated information aggregation system with well known mathematical properties. In the above examples, we provided weights for the sources and the evidence that reflected priorities for aggregates. As such the JWA can be used to define what information would need to be elicited as well as the implications for what aggregates these two sets of weights would produce, to see if human users agree that the resulting aggregates indeed reflect the priorities given.

Finally, combining linear and order weights plays an important role in XAI by providing parameters that allow people to understand the role of the source vs. the role of evidence in aggregation and act upon it. More broadly, at the level of AI and machine learning, where LWA approaches are used pervasively, all the way to the heart of neural networks, broadening the perspective and research to explore joint source and evidence weights holds substantial promise.

Overall, JWA represents a single, simple, interpretable instance of a potentially broad class of weighting strategies that combine individual weighting schemes based on the quality of sources and the quality of evidence. There are further possibilities in how these weights might be combined, including possibilities that are not restricted to compositional geometry. While the properties of the JWA are well defined through compositional geometry, there is still the possibility that other well defined approaches could generate novel and interesting results as well.

Bibliography


1. Aitchison [1982] J. Aitchison. The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 44(2):139–177, 1982. ISSN 00359246. URL http://www.jstor.org/stable/2345821.
2. Budescu and Chen [2015] David V Budescu and Eva Chen. Identifying expertise to extract the wisdom of crowds. Management Science, 61(2):267–280, 2015.
3. Cardin and Giove [2021] Marta Cardin and Silvio Giove. SDOWA: A New OWA Operator for Decision Making. Springer Singapore, Singapore, 2021. ISBN 978-981-15-5093-5. doi: 10.1007/978-981-15-5093-5_28. URL https://doi.org/10.1007/978-981-15-5093-5_28.
4. Chen et al. [2016] Eva Chen, David V Budescu, Shrinidhi K Lakshmikanth, Barbara A Mellers, and Philip E Tetlock. Validating the contribution-weighted model: Robustness and cost-benefit analyses. Decision Analysis, 13(2):128–152, 2016.
5. Clemen and Winkler [1993] Robert T Clemen and Robert L Winkler. Aggregating point estimates: A flexible modeling approach. Management Science, 39(4):501–515, 1993.
6. Davis-Stober et al. [2014] Clintin P Davis-Stober, David V Budescu, Jason Dana, and Stephen B Broomell. When is a crowd wise? Decision, 1(2):79, 2014.
7. Davis-Stober et al. [2015] Clintin P Davis-Stober, David V Budescu, Stephen B Broomell, and Jason Dana. The composition of optimally wise crowds. Decision Analysis, 12(3):130–143, 2015.
8. Dawes [1979] Robyn M Dawes. The robust beauty of improper linear models in decision making. American Psychologist, 34(7):571, 1979.