Beyond $M_{t\bar{t}}$: learning to search for a broad $t\bar{t}$ resonance at the LHC

Sunghoon Jung; Dongsub Lee; Ke-Pan Xie

arXiv:1906.02810·hep-ph·February 12, 2020

TL;DR

This paper employs machine learning, specifically deep neural networks, to enhance the detection of broad $t\bar{t}$ resonances at the LHC, utilizing information beyond the invariant mass spectrum to improve sensitivity regardless of resonance width.

Contribution

It introduces a novel machine learning approach that combines multiple kinematic and angular variables to improve broad resonance searches at colliders, surpassing traditional methods.

Findings

1. Deep neural networks improve sensitivity to broad $t\bar{t}$ resonances.
2. Additional variables like angular correlations and jet mass are crucial.
3. Sensitivity gains are robust against resonance width variations.

Abstract

A resonance peak in the invariant mass spectrum has been the main feature of a particle at collider experiments. However, broad resonances not exhibiting such a sharp peak are generically predicted in new physics models beyond the Standard Model. Without a peak, how do we discover a broad resonance at colliders? We use machine learning techniques to explore answers beyond common knowledge. We learn that, by applying a deep neural network to the case of a $t\bar{t}$ resonance, the invariant mass $M_{t\bar{t}}$ is still useful, but additional information from the off-resonance region, angular correlations, $p_{T}$, and top jet mass is also significantly important. As a result, the improved LHC sensitivities do not depend strongly on the width. The results may also imply that the additional information can be used to improve narrow-resonance searches too. Further, we also detail how we assess…

Tables

Table 1: The cut flows of the signals and background in the resolved region. The events are generated in the $1\ell^{\pm}+\not{E}_{T}+{\rm jets}$ final state, where $\ell$ denotes $e$ and $\mu$. M$i\Gamma j$ denotes the benchmark case with $M_{\rho}=i$ TeV and $\Gamma_{\rho}/M_{\rho}=0.1\times j$.

Process	Event number	Cut 1	Cut 2	Cut 3	Efficiency
M1 $Γ$ 1	$5.00 \times 10^{6}$	$3.32 \times 10^{6}$	$3.02 \times 10^{6}$	$1.81 \times 10^{6}$	36.3%
M1 $Γ$ 2	$5.00 \times 10^{6}$	$3.29 \times 10^{6}$	$2.98 \times 10^{6}$	$1.79 \times 10^{6}$	35.8%
M1 $Γ$ 3	$3.85 \times 10^{6}$	$2.52 \times 10^{6}$	$2.23 \times 10^{6}$	$1.36 \times 10^{6}$	35.3%
M1 $Γ$ 4	$5.00 \times 10^{6}$	$3.25 \times 10^{6}$	$2.93 \times 10^{6}$	$1.75 \times 10^{6}$	34.9%
SM $t \bar{t}$	$4.98 \times 10^{6}$	$2.60 \times 10^{6}$	$2.21 \times 10^{6}$	$1.39 \times 10^{6}$	28.0%

Table 2: The cut flows of the signals and background in the boosted region. The events are generated in the $1\ell^{\pm}+\not{E}_{T}+{\rm jets}$ final state, where $\ell$ denotes $e$ and $\mu$. M$i\Gamma j$ denotes the benchmark case with $M_{\rho}=i$ TeV and $\Gamma_{\rho}/M_{\rho}=0.1\times j$. The setup xptj = 150 is to improve the event generating efficiency of the background; see the text for details.

Process	Event number	Cut 1	Cut 2	Cut 3	Cut 4	Efficiency
M1 $Γ$ 1	$5.00 \times 10^{6}$	$3.32 \times 10^{6}$	$3.02 \times 10^{6}$	$1.17 \times 10^{6}$	$9.61 \times 10^{5}$	19.2%
M1 $Γ$ 2	$5.00 \times 10^{6}$	$3.29 \times 10^{6}$	$2.98 \times 10^{6}$	$1.08 \times 10^{6}$	$8.85 \times 10^{5}$	17.7%
M1 $Γ$ 3	$5.00 \times 10^{6}$	$3.27 \times 10^{6}$	$2.96 \times 10^{6}$	$1.02 \times 10^{6}$	$8.34 \times 10^{5}$	16.7%
M1 $Γ$ 4	$5.05 \times 10^{6}$	$3.28 \times 10^{6}$	$2.96 \times 10^{6}$	$9.92 \times 10^{5}$	$8.06 \times 10^{5}$	15.9%
M5 $Γ$ 1	$5.00 \times 10^{6}$	$2.53 \times 10^{6}$	$2.36 \times 10^{6}$	$1.15 \times 10^{6}$	$8.41 \times 10^{5}$	16.8%
M5 $Γ$ 2	$5.00 \times 10^{6}$	$2.72 \times 10^{6}$	$2.52 \times 10^{6}$	$1.19 \times 10^{6}$	$8.76 \times 10^{5}$	17.5%
M5 $Γ$ 3	$5.00 \times 10^{6}$	$2.81 \times 10^{6}$	$2.59 \times 10^{6}$	$1.19 \times 10^{6}$	$8.85 \times 10^{5}$	17.7%
M5 $Γ$ 4	$5.00 \times 10^{6}$	$2.86 \times 10^{6}$	$2.64 \times 10^{6}$	$1.20 \times 10^{6}$	$8.90 \times 10^{5}$	17.8%
SM $t \bar{t}$ (xptj = 150)	$1.99 \times 10^{7}$	$1.22 \times 10^{7}$	$1.08 \times 10^{7}$	$1.41 \times 10^{6}$	$1.21 \times 10^{6}$	6.10%

Table 3: The selected networks for $M_{\rho}=1$ TeV. $N_{\rm epoch}$ is the epoch number when we cut the training.

Benchmark case	Kinematic region	Observables	$N_{hidden}$ , $N_{node}$ , $L_{r}$ , $D_{r}$ , $B_{s}$ , $N_{epoch}$	Classification accuracy
M1 $Γ$ 1	resolved	low-level	5, 200, 0.001, 0.2, $10^{3}$ , 150	85.2%
	resolved	all	5, 200, 0.001, 0.2, $10^{3}$ , 100	85.1%
	boosted	low-level	5, 200, 0.001, 0.2, $10^{4}$ , 55	67.9%
	boosted	all	4, 300, 0.001, 0.3, $10^{3}$ , 40	70.1%
M1 $Γ$ 2	resolved	low-level	4, 300, 0.003, 0.2, $10^{3}$ , 35	83.2%
	resolved	all	5, 200, 0.001, 0.2, $10^{3}$ , 60	83.2%
	boosted	low-level	5, 200, 0.001, 0.2, $10^{4}$ , 45	65.8%
	boosted	all	4, 300, 0.003, 0.3, $10^{4}$ , 40	68.2%
M1 $Γ$ 3	resolved	low-level	4, 300, 0.001, 0.2, $10^{3}$ , 30	81.6%
	resolved	all	5, 200, 0.001, 0.2, $10^{3}$ , 40	81.6%
	boosted	low-level	4, 300, 0.003, 0.2, $10^{4}$ , 30	65.1%
	boosted	all	4, 300, 0.003, 0.3, $10^{4}$ , 40	67.0%
M1 $Γ$ 4	resolved	low-level	5, 200, 0.001, 0.2, $10^{3}$ , 80	80.8%
	resolved	all	5, 200, 0.001, 0.2, $10^{3}$ , 40	80.6%
	boosted	low-level	4, 300, 0.001, 0.2, $10^{4}$ , 20	64.3%
	boosted	all	4, 300, 0.001, 0.3, $10^{3}$ , 40	66.7%

Table 4: The selected networks for $M_{\rho}=5$ TeV. $N_{\rm epoch}$ is the epoch number when we cut the training.

Benchmark case	Kinematic region	Observables	$N_{hidden}$ , $N_{node}$ , $L_{r}$ , $D_{r}$ , $B_{s}$ , $N_{epoch}$	Classification accuracy
M5 $Γ$ 1	boosted	low-level	4, 300, 0.001, 0.2, $10^{3}$ , 20	79.5%
M5 $Γ$ 1	boosted	all	5, 200, 0.001, 0.1, $10^{4}$ , 30	80.5%
M5 $Γ$ 2	boosted	low-level	4, 300, 0.003, 0.2, $10^{3}$ , 40	78.2%
M5 $Γ$ 2	boosted	all	5, 200, 0.001, 0.1, $10^{4}$ , 45	79.1%
M5 $Γ$ 3	boosted	low-level	4, 300, 0.003, 0.2, $10^{3}$ , 30	77.4%
M5 $Γ$ 3	boosted	all	4, 300, 0.003, 0.3, $10^{4}$ , 40	78.4%
M5 $Γ$ 4	boosted	low-level	4, 300, 0.003, 0.2, $10^{3}$ , 45	76.8%
M5 $Γ$ 4	boosted	all	5, 200, 0.001, 0.3, $10^{4}$ , 45	77.8%

Table 5: The accuracy reach of the chosen neural networks before and after planing away $M_{t\bar{t}}$. The configurations of the DNNs are listed in Tables 3 and 4.

Models	M1 $Γ$ 1		M1 $Γ$ 2		M1 $Γ$ 3		M1 $Γ$ 4		M5 $Γ$ 1	M5 $Γ$ 2	M5 $Γ$ 3	M5 $Γ$ 4
Kinematic region	resolved	boosted	resolved	boosted	resolved	boosted	resolved	boosted	boosted	boosted	boosted	boosted
Low-level input	85.2%	67.9%	83.2%	65.8%	81.6%	65.1%	80.8%	64.3%	79.5%	78.2%	77.4%	76.8%
Planing away $M_{t \bar{t}}$	76.8%	63.7%	75.3%	62.7%	74.1%	62.1%	73.0%	62.3%	65.3%	65.8%	63.7%	64.1%

Equations

L =

$$\frac{\Gamma_{\rho}}{M_{\rho}}\approx\frac{\Gamma_{\rho\to t\bar{t}}}{M_{\rho}}\approx\frac{g_{\rho}^{2}}{8\pi}.$$

$$pp\to\rho\to t\bar{t}\to\ell^{\pm}\nu b\bar{b}jj.$$

$$\frac{1}{s-\hat{M}_{\rho}^{2}(s)+i\sqrt{s}\,\hat{\Gamma}_{\rho}(s)}\approx\frac{1}{s-M_{\rho}^{2}+iM_{\rho}\Gamma_{\rho}},$$

$$M_{T}^{W}=\sqrt{2p_{T}^{\ell}\not{E}_{T}\left[1-\cos\Delta\phi(p_{T}^{\ell},\not{E}_{T})\right]}.$$

$$[N_{\rm in},\underbrace{N_{\rm node},N_{\rm node},\dots,N_{\rm node}}_{N_{\rm hidden}},2],$$

$${\rm signal}\to\begin{pmatrix}0\\1\end{pmatrix},\quad{\rm background}\to\begin{pmatrix}1\\0\end{pmatrix}.$$

$$N_{\rm hidden}=4,~5;$$

$$L_{r}=0.001,~0.003;$$

$$B_{s}=10^{3},~10^{4};$$

$$0<r_{0},r_{1}<1,\quad r_{0}+r_{1}\equiv 1.$$

$$\chi^{2}=\frac{(M_{jj}-M_{W})^{2}}{\sigma_{W}^{2}}+\frac{(M_{jjj}-M_{t})^{2}}{\sigma_{t}^{2}}+\frac{(M_{j\ell\nu}-M_{t})^{2}}{\sigma_{t}^{2}},$$

$$v^{(j)}=\frac{\partial L_{\rm loss}(w,b)}{\partial b^{(j)}},$$

$$W_{m}=N\sum_{n=1}^{N_{\rm node}}\left(w_{mn}^{(1)}\right)^{2},$$

$$\sum_{m=1}^{N_{\rm in}}W_{m}=1.$$

Full text

Beyond $M_{t\bar{t}}$: learning to search for a broad $t\bar{t}$ resonance at the LHC

Sunghoon Jung

Center for Theoretical Physics, Department of Physics and Astronomy, Seoul National University, Seoul 08826, Korea

Dongsub Lee

Center for Theoretical Physics, Department of Physics and Astronomy, Seoul National University, Seoul 08826, Korea

Ke-Pan Xie

Center for Theoretical Physics, Department of Physics and Astronomy, Seoul National University, Seoul 08826, Korea

Abstract

A resonance peak in the invariant mass spectrum has been the main feature of a particle at collider experiments. However, broad resonances not exhibiting such a sharp peak are generically predicted in new physics models beyond the Standard Model. Without a peak, how do we discover a broad resonance at colliders? We use machine learning techniques to explore answers beyond common knowledge. We learn that, by applying a deep neural network to the case of a $t\bar{t}$ resonance, the invariant mass $M_{t\bar{t}}$ is still useful, but additional information from the off-resonance region, angular correlations, $p_{T}$, and top jet mass is also significantly important. As a result, the improved LHC sensitivities do not depend strongly on the width. The results may also imply that the additional information can be used to improve narrow-resonance searches too. Further, we also detail how we assess machine-learned information.

I Introduction

Discovering new physics through a new resonance is one of the most exciting opportunities. A “narrow” resonance peak, being sharply localized in the energy spectrum, allows for the most efficient discovery above continuum backgrounds as well as for precision measurements of the particle mass, width and other properties. However, the widths of new (and presumably heavier) resonances in new physics can easily be much larger than those of the Standard Model (SM) particles. The width generally grows with the mass of a resonance, and a new strong coupling may induce rapid decay, as in composite Higgs models Barducci et al. (2013); Greco and Liu (2014); Barducci and Delaunay (2016); Liu et al. (2019a) or warped extra-dimensional models Kelley et al. (2011); Ask et al. (2012). Also, more decay channels to lighter beyond-SM particles may open up, which further increases the width.

The large width causes several difficulties in collider experiments. Above all, without a sharp peak, the discovery becomes challenging, as the signal is spread over a large range of energy above continuum backgrounds. For example, the ATLAS result based mostly on the invariant mass distribution Aaboud et al. (2018a) shows that for an $M=1$ TeV Kaluza-Klein gluon, the measured (expected) cross section upper limit $\sigma(pp\to g_{KK}\to t\bar{t})$ increases from 1.4 (1.2) pb to 4.7 (2.7) pb when the width-to-mass ratio $\Gamma/M$ varies from 10% to 40%. In addition, the phenomenological study in Ref. Liu et al. (2019a) shows that for the minimal composite Higgs model with the third generation left-handed quark $q_{L}=(t_{L},b_{L})^{T}$ being fully composite, a vector $t\bar{t}$ resonance as light as $M=1$ TeV is still allowed by the direct search in the $\Gamma/M\gtrsim 20\%$ region.

Second, a broad resonance shape is more susceptible to the energy dependence of the parton luminosity and of the width, to interference with backgrounds or other resonances, and to mixing and overlap with nearby resonances. These effects make discoveries even more challenging and complicated. In particular, the complex interference (the one with imaginary parts in amplitudes) in supersymmetric or two-Higgs doublet models can make broad heavy Higgs bosons decaying to $t\bar{t}$ generally appear not as a pure resonance peak Gaemers and Hoogeveen (1984); Dicus et al. (1994); Craig et al. (2015); Jung et al. (2015) but even as pure dips or nothing Jung et al. (2015). Moreover, nearly degenerate heavy Higgs bosons can overlap significantly, producing complicated resonance shapes Choi et al. (2005); Ellis et al. (2004); Carena and Liu (2016).

Many of these new broad resonances are just beyond the current reach of the LHC. Thus, it is imperative to study the physics of broad resonances and develop efficient discovery methods. However, broad-resonance searches have been studied only in limited cases, e.g., phenomenologically in Refs. Liu et al. (2019a) (third-generation quark pair, $\ell^{+}\ell^{-}$), Kelley et al. (2011) ($\mu^{+}\mu^{-}$), and experimentally in Refs. Aaboud et al. (2018a); Sirunyan et al. (2018a) ($t\bar{t}$), Sirunyan et al. (2018b); Aaboud et al. (2016) ($jj$), Aaboud et al. (2017) ($\ell^{+}\ell^{-}$). In all the cases, the invariant mass had still been used as a main observable, but the question of “how do we (best) discover a broad resonance without a peak?” had not been answered thoroughly (see footnote 1).

Footnote 1: If a broad resonance can decay to multi-top/$W$ channels, it can be searched for using the same-sign di-lepton final state Barducci and Delaunay (2016); Liu et al. (2019a, b), which does not rely on the reconstruction of the invariant mass.

This question may be well suited to deep neural network (DNN) techniques, because the answer is not obvious a priori and even small improvements will be significant. Machine learning has indeed been applied to various problems in particle physics. For example, bump-hunting resonance searches were improved with DNNs Collins et al. (2018, 2019). The DNN is one class of machine learning algorithms. Coming with various network structures such as fully-connected networks Hajer et al. (2018); Baldi et al. (2014, 2016); Luo et al. (2017); Lee et al. (2018); Pearkes et al. (2017), convolutional neural networks Guo et al. (2018); de Oliveira et al. (2016); Cogan et al. (2015); Li et al. (2019) and others Fraser and Schwartz (2018); Louppe et al. (2019); Abdughani et al. (2018); Ren et al. (2019); Henrion et al. (2017), DNNs have shown remarkable performance in the exploration of physics beyond the SM, often better than other machine learning algorithms such as boosted decision trees (BDT). We refer to Refs. Guest et al. (2018); Abdughani et al. (2019) and references therein for reviews of the DNN applications in LHC physics.

In this paper, we consider a spin-1 broad $t\bar{t}$ resonance at the LHC (Sec. II). Being the heaviest particle in the SM, the top quark has been regarded as an important portal to new physics. As a first step toward a more general study of broad resonances, we ignore any interference effects and nearby resonances (Sec. III). We use a fully-connected DNN to explore answers beyond common knowledge (Sec. III). Finally, we assess whether and what the DNN can learn, even beyond what we know well (Sec. IV).

II Benchmark model

For simplicity, here we consider a gauge singlet vector resonance $\rho$ interacting strongly with the SM right-handed top quark $t_{R}$ , and the relevant Lagrangian is

[TABLE]

where $\rho_{\mu\nu}=\partial_{\mu}\rho_{\nu}-\partial_{\nu}\rho_{\mu}$, and $g_{1}$ is the SM hypercharge gauge coupling. This model is also considered in Refs. Greco and Liu (2014); Liu et al. (2019b); Liu and Mahbubani (2016). Note that the $\rho_{\mu}$ mixes with the SM gauge field $B_{\mu}$ in Eq. (1). Given $g_{\rho}\gg g_{1}$, the mixing angle is $\sin\theta\approx g_{1}/g_{\rho}$ before the electroweak symmetry breaking (EWSB). Therefore, after transforming to the mass eigenstates, the interactions between the $\rho$ resonance and SM fermions will be $\sim g_{\rho}$ for $t_{R}$ and $\sim Yg_{1}^{2}/g_{\rho}$ for other fermions (including $t_{L}$ and other light quarks), with $Y$ being the hypercharge of the corresponding fermion. The physical mass of $\rho$ is $M_{\rho}=m_{\rho}$. EWSB gives $\mathcal{O}(v^{2}/m_{\rho}^{2})$ corrections to the above picture, and the details can be found in Appendix C of Ref. Liu et al. (2019b).

Due to the large coupling $g_{\rho}$ , the $\rho$ resonance decays to $t\bar{t}$ with a branching ratio $\sim 100\%$ , and the width-to-mass ratio is

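$$\frac{\Gamma_{\rho}}{M_{\rho}}\approx\frac{\Gamma_{\rho\to t\bar{t}}}{M_{\rho}}\approx\frac{g_{\rho}^{2}}{8\pi}. \tag{2}$$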

For $g_{\rho}=3$ and 4, this ratio reaches 36% and 64%, respectively. Thus a broad $\rho$ is easily realized in the model described by Eq. (1). Note that $\Gamma_{\rho\to t\bar{t}}\leqslant\Gamma_{\rho}$: if $\rho$ has other strong dynamical decay channels, such as the decay to low-mass top partners (which are not listed in our simplified model), $\Gamma_{\rho}$ is typically several times larger than $\Gamma_{\rho\to t\bar{t}}$, so a large $\Gamma_{\rho}/M_{\rho}$ can be obtained even for smaller $g_{\rho}$. We consider $M_{\rho}=1$ and 5 TeV as two benchmarks, and for each mass point the width-to-mass ratios $\Gamma_{\rho}/M_{\rho}=10\%$, 20%, 30% and 40% are considered. The corresponding benchmark cases are then identified as M$i\Gamma j$, with $i=1$ or 5 denoting the mass (in units of TeV) and $j=1$, 2, 3, 4 being $10\times\Gamma_{\rho}/M_{\rho}$. For example, M1$\Gamma$4 is the benchmark for $M_{\rho}=1$ TeV and $\Gamma_{\rho}/M_{\rho}=40\%$.
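As a quick numerical check of the quoted ratios, the following minimal sketch evaluates $\Gamma_{\rho}/M_{\rho}\approx g_{\rho}^{2}/(8\pi)$ and its inverse. The function names are illustrative only, and the inversion assumes that $\rho\to t\bar{t}$ is the only sizeable decay channel, which need not hold for the benchmarks.

```python
import math


def width_to_mass_ratio(g_rho: float) -> float:
    """Return Gamma_rho / M_rho ~ g_rho^2 / (8 pi), assuming rho -> t tbar dominates."""
    return g_rho ** 2 / (8.0 * math.pi)


def coupling_for_ratio(ratio: float) -> float:
    """Invert the relation above: the g_rho that would give a chosen Gamma_rho / M_rho."""
    return math.sqrt(8.0 * math.pi * ratio)


if __name__ == "__main__":
    for g_rho in (3.0, 4.0):
        print(f"g_rho = {g_rho:.0f}: Gamma/M = {width_to_mass_ratio(g_rho):.1%}")
    for j in (1, 2, 3, 4):
        ratio = 0.1 * j
        print(f"Gamma/M = {ratio:.0%}: g_rho = {coupling_for_ratio(ratio):.2f}")
```

If additional decay channels are open, the coupling needed for a given total width is smaller than the value returned by `coupling_for_ratio`.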

At the LHC, the $\rho$ resonance can be produced via the Drell-Yan process ( $q\bar{q}\to\rho$ ) through the $\rho$ -light quark interaction. Among the various decay channels of the $t\bar{t}$ , we choose to focus on the semi-leptonic final state

$$pp\to\rho\to t\bar{t}\to\ell^{\pm}\nu b\bar{b}jj. \tag{3}$$

The dominant background is then the SM $t\bar{t}$ process, which contributes $81\%\sim 88\%$ of the total backgrounds Aaboud et al. (2018a). For simplicity, we only consider this background. It should be emphasized that although we provide a benchmark model as physical motivation here, our results are general for all heavy singlet spin-1 resonances with top quark portal.

III Searching for a broad $t\bar{t}$ resonance

In this section, we describe technical details of our work and show final cross section limits. First, we describe how we parameterize a broad resonance, and how we build learning datasets and train DNN for each benchmark signal case. Then we derive improved cross section upper limits.

III.1 Breit-Wigner description

We assume a single, isolated broad resonance far away from any other resonances and thresholds, and ignore any interference effects. Then we use the following Breit-Wigner description of the propagator of a broad resonance

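$$\frac{1}{s-\hat{M}_{\rho}^{2}(s)+i\sqrt{s}\,\hat{\Gamma}_{\rho}(s)}\approx\frac{1}{s-M_{\rho}^{2}+iM_{\rho}\Gamma_{\rho}}, \tag{4}$$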

where the nominal resonance mass $M_{\rho}$ and the width $\Gamma_{\rho}$ are fixed constants. The energy dependence of the mass $\hat{M}_{\rho}(s)$ from the real part of the self-energy correction is higher-order, hence small irrespective of the large width. On the other hand, the energy dependence of the width $\hat{\Gamma}_{\rho}(s)\propto\sqrt{s}$ from the imaginary part can induce corrections as large as $\sim$ 100 (10)% for the broad resonances considered in this paper, $\Gamma_{\rho}/M_{\rho}\sim 40~(20)\%$. But, within this range of the width, the resonance shape remains relatively undistorted, albeit with some shifts of the peak position and height Liu et al. (2019a); Kelley et al. (2011); An et al. (2012). Also, the fixed mass and width have been used in LHC searches of broad resonances Aaboud et al. (2018a); Sirunyan et al. (2018a). Thus, we use Eq. (4) with fixed $M_{\rho}$ and $\Gamma_{\rho}$, both for simplicity and for comparison purposes.
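To illustrate the size of the peak shift mentioned above, here is a minimal sketch comparing the squared propagator for a fixed width with one for a running width. The normalization $\hat{\Gamma}_{\rho}(s)=\Gamma_{\rho}\sqrt{s}/M_{\rho}$ is an assumption made for this example (the text only states $\hat{\Gamma}_{\rho}(s)\propto\sqrt{s}$), parton luminosities are ignored, and all function names are hypothetical.

```python
import math


def bw_fixed(sqrt_s: float, mass: float, width: float) -> float:
    """|1 / (s - M^2 + i M Gamma)|^2 with constant mass and width."""
    s = sqrt_s ** 2
    return 1.0 / ((s - mass ** 2) ** 2 + (mass * width) ** 2)


def bw_running(sqrt_s: float, mass: float, width: float) -> float:
    """Same lineshape with an energy-dependent width.

    Assumption: Gamma_hat(s) = Gamma * sqrt(s) / M, so sqrt(s) * Gamma_hat(s) = s * Gamma / M.
    """
    s = sqrt_s ** 2
    return 1.0 / ((s - mass ** 2) ** 2 + (s * width / mass) ** 2)


def peak_position(lineshape, mass: float, width: float, n_points: int = 20001) -> float:
    """Scan sqrt(s) from 0.5 M to 1.5 M and return the grid point with the largest lineshape."""
    lo, hi = 0.5 * mass, 1.5 * mass
    grid = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    return max(grid, key=lambda x: lineshape(x, mass, width))


if __name__ == "__main__":
    mass, width = 1000.0, 400.0  # GeV, i.e. Gamma/M = 40%
    print("fixed-width peak  :", peak_position(bw_fixed, mass, width))
    print("running-width peak:", peak_position(bw_running, mass, width))
```

Under this assumed form the running-width peak sits at $\sqrt{s}=M_{\rho}/\sqrt{1+(\Gamma_{\rho}/M_{\rho})^{2}}$, below the nominal mass, while the two lineshapes agree exactly at $\sqrt{s}=M_{\rho}$.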

III.2 Preparing training data

The model described by Eq. (1) is written in the universal FeynRules output file Alloul et al. (2014). We generate parton-level events of the signals and background using the 5-flavor scheme within the MadGraph5_aMC@NLO Alwall et al. (2014) package. All spin correlations of the final state $\ell^{\pm}\nu b\bar{b}jj$ objects are kept. The phase space integration region is set to $|\sqrt{s}-M_{\rho}|\leqslant 15\times\Gamma_{\rho}$, which is large enough for us to simulate the full on- and off-shell effects of the $\rho$ resonance. The interference between $pp\to\rho\to t\bar{t}$ and the SM $t\bar{t}$ background is negligible Aaboud et al. (2018a), and is thus not considered here. We normalize the SM $t\bar{t}$ cross section to the next-to-next-to-leading order calculation with next-to-next-to-leading logarithmic soft-gluon resummation from the Top++2.0 package Czakon and Mitov (2014); Czakon et al. (2013); Czakon and Mitov (2013, 2012); Bärnreuther et al. (2012); Cacciari et al. (2012), and the $K$-factor is 1.63. The parton-level events are matched to the $+1~{\rm jet}$ final state and then interfaced to Pythia 8 Sjöstrand et al. (2015) and Delphes de Favereau et al. (2014) for parton shower and fast detector simulation. As for the detector setup, we mainly use the CMS configuration, but with the following modifications: the isolation $\Delta R$ parameters for electrons, muons and jets are set to 0.2, 0.3 and 0.5, respectively. The $b$-tagging efficiency (and mis-tag rate for $c$-jets, light-flavor jets) is corrected to $0.77$ (and $1/6$, $1/134$) according to Ref. Aaboud et al. (2018b). We generate $\gtrsim 5\times 10^{6}$ events for the background and each signal benchmark.

We define two kinematic regions. The first one is called the resolved region, in which the decay products of the top quarks (i.e., $\ell^{\pm}\nu b\bar{b}jj$) are identified as individual objects. This region is defined as follows:

1. Exactly one charged lepton $\ell^{\pm}=e^{\pm}$ or $\mu^{\pm}$ with $p_{T}^{\ell}>30$ GeV and $|\eta^{\ell}|<2.5$. Events containing a second lepton with $p_{T}^{\ell}>25$ GeV are vetoed.

2. $\not{E}_{T}>20$ GeV and $\not{E}_{T}+M_{T}^{W}>60$ GeV, where the $W$-transverse mass is defined as

$$M_{T}^{W}=\sqrt{2p_{T}^{\ell}\not{E}_{T}\left[1-\cos\Delta\phi(p_{T}^{\ell},\not{E}_{T})\right]}.$$

3. At least four jets with $p_{T}^{j}>25$ GeV and $|\eta^{j}|<2.5$, and at least one of the leading four jets is $b$-tagged.

The cuts are mainly based on Ref. Aaboud et al. (2018a), but with some simplifications. The cut flows of the signals and background are listed in Table 1. We only consider the $M_{\rho}=1$ TeV benchmark cases in this kinematic region. The SM $t\bar{t}$ cross section is 68.9 pb after taking the $K$-factor into account.
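The missing-energy requirement of the resolved selection can be sketched as follows. This is only an illustration: it assumes the standard definition $M_{T}^{W}=\sqrt{2p_{T}^{\ell}\not{E}_{T}[1-\cos\Delta\phi]}$, takes momenta in GeV and angles in radians, and uses hypothetical function names rather than any code from the analysis.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def w_transverse_mass(pt_lep: float, phi_lep: float, met: float, phi_met: float) -> float:
    """W transverse mass built from the lepton pT and the missing transverse momentum."""
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(delta_phi(phi_lep, phi_met))))


def passes_met_cut(pt_lep: float, phi_lep: float, met: float, phi_met: float) -> bool:
    """Apply MET > 20 GeV and MET + M_T^W > 60 GeV."""
    mtw = w_transverse_mass(pt_lep, phi_lep, met, phi_met)
    return met > 20.0 and met + mtw > 60.0


if __name__ == "__main__":
    # Back-to-back lepton and missing momentum, 40 GeV each
    print(w_transverse_mass(40.0, 0.0, 40.0, math.pi))
    print(passes_met_cut(40.0, 0.0, 40.0, math.pi))
```

A lepton collinear with the missing momentum gives $M_{T}^{W}=0$, so such an event passes only if $\not{E}_{T}$ alone exceeds 60 GeV.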

The second kinematic region is the boosted region, in which the hadronic decay products of the top quark are combined into a fat jet. The corresponding event selection criteria are:

1. Exactly one charged lepton $\ell^{\pm}=e^{\pm}$ or $\mu^{\pm}$ with $p_{T}^{\ell}>30$ GeV and $|\eta^{\ell}|<2.5$. Events containing a second lepton with $p_{T}^{\ell}>25$ GeV are vetoed.

2. $\not{E}_{T}>20$ GeV and $\not{E}_{T}+M_{T}^{W}>60$ GeV.

3. Exactly one top-jet with $p_{T}^{j_{\rm top}}>300$ GeV and $|\eta^{j_{\rm top}}|<2.0$ that satisfies $\Delta\phi(j_{\rm top},\ell^{\pm})>2.3$. The top-jet is reconstructed with an $R=1.0$ cone in the anti-$k_{t}$ algorithm, and is trimmed with $R_{\rm cut}=0.2$ and $f_{\rm cut}=0.05$ ATL (2015a). We use a simplified top-tagging procedure in event selection. The top-tagging efficiency and the mistag rate are set to 80% and 20% respectively, based on Ref. ATL (2015b), which makes use of jet invariant mass and $N$-subjettiness Thaler and Wang (2008); Kaplan et al. (2008); Thaler and Van Tilburg (2011, 2012); Plehn and Spannowsky (2012); Kasieczka et al. (2015).

4. Exactly one selected jet with $p_{T}^{j_{\rm sel}}>25$ GeV and $|\eta^{j_{\rm sel}}|<2.5$. In addition, the selected jet should have $\Delta R(j_{\rm sel},j_{\rm top})>1.5$ and $\Delta R(j_{\rm sel},\ell)<1.5$.

The cuts here are again mainly based on Ref. Aaboud et al. (2018a), and the cut flows for signals and background are listed in Table 2. In this region, we consider both $M_{\rho}=1$ and 5 TeV signals. To increase the event generation efficiency for the background, in this region we require the SM $pp\to t\bar{t}\to\ell^{\pm}\nu b\bar{b}jj$ process to have at least one final state parton (including the $b$-parton) with $p_{T}>150$ GeV. This is done by setting xptj = 150 in MadGraph5_aMC@NLO. We have checked that this setup does not lose generality, but improves the event generation efficiency by a factor of $\sim 6$. The background cross section after cuts is 2.88 pb after taking the $K$-factor into account.
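The angular requirements of the boosted selection can be sketched as below. Objects are represented as plain dictionaries with `pt` (GeV), `eta` and `phi` keys, which is a choice made for this example only; top tagging, trimming and the lepton and $\not{E}_{T}$ cuts are not modelled, and the function names are hypothetical.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def delta_r(a: dict, b: dict) -> float:
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two objects."""
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))


def passes_top_jet_cut(top_jet: dict, lepton: dict) -> bool:
    """pT > 300 GeV, |eta| < 2.0 and |delta phi(top-jet, lepton)| > 2.3."""
    return (top_jet["pt"] > 300.0
            and abs(top_jet["eta"]) < 2.0
            and abs(delta_phi(top_jet["phi"], lepton["phi"])) > 2.3)


def is_selected_jet(jet: dict, top_jet: dict, lepton: dict) -> bool:
    """pT > 25 GeV, |eta| < 2.5, far from the top-jet and close to the lepton."""
    return (jet["pt"] > 25.0
            and abs(jet["eta"]) < 2.5
            and delta_r(jet, top_jet) > 1.5
            and delta_r(jet, lepton) < 1.5)


if __name__ == "__main__":
    lepton = {"pt": 50.0, "eta": 0.1, "phi": 0.0}
    top_jet = {"pt": 450.0, "eta": -0.3, "phi": 3.0}
    jet = {"pt": 60.0, "eta": 0.4, "phi": 0.5}
    print(passes_top_jet_cut(top_jet, lepton), is_selected_jet(jet, top_jet, lepton))
```

The two $\Delta R$ conditions together pick out the jet on the leptonic-top side of the event, away from the hadronic top-jet.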

The events after cuts are collected to make training and validation/test datasets. For the resolved region, we have $1\ell^{\pm}+\not{E}_{T}+4$ jets, in total 6 reconstructed objects in the final state, and 26 low-level kinematic observables can be used as input features: $E^{\ell}$, $p_{T}^{\ell}$, $\eta^{\ell}$ and $\phi^{\ell}$ from the charged lepton; $\not{E}_{T}$, $\phi^{\not{E}_{T}}$ from the missing transverse momentum; $E^{j_{i}}$, $p_{T}^{j_{i}}$, $\eta^{j_{i}}$, $\phi^{j_{i}}$ and $b^{j_{i}}$ from the 4 leading jets, with $i=1$, 2, 3, 4. Here $b^{j}$ is the $b$-tagging observable, which is 1 for a $b$-tagged jet and 0 otherwise. Some examples of the low-level observable distributions are shown in Fig. 1(a). For each benchmark case (i.e. M1$\Gamma$1 $\sim$ M1$\Gamma$4), we build a training dataset and a validation/test dataset. Each of the two datasets has 1,000,000 events, with nearly equal numbers of signal and background events.

For the boosted region, $1\ell^{\pm}+\not{E}_{T}+1~\text{top-jet}+1~{\rm selected~jet}$, in total 4 objects, are reconstructed, and we can extract 15 low-level observables as input features: the first 6 are from $\ell$ and $\not{E}_{T}$, same as in the resolved region; the other 9 consist of $E^{j_{\rm sel}}$, $p_{T}^{j_{\rm sel}}$, $\eta^{j_{\rm sel}}$, $\phi^{j_{\rm sel}}$, $b^{j_{\rm sel}}$ from the selected jet, and $E^{j_{\rm top}}$, $p_{T}^{j_{\rm top}}$, $\eta^{j_{\rm top}}$, $\phi^{j_{\rm top}}$ from the top-jet. Some examples of the low-level observable distributions are illustrated in Fig. 1(b). For each benchmark case (i.e. M1$\Gamma$1 $\sim$ M1$\Gamma$4, and M5$\Gamma$1 $\sim$ M5$\Gamma$4), we randomly mix equal numbers of signal and background events to get 800,000 events for training and another 800,000 events for validation/test.
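The feature counting for the two regions can be made explicit with a small sketch. The feature names below are hypothetical labels chosen for this example; only the grouping (4 lepton, 2 missing-momentum, 5 per $b$-taggable jet, 4 for the top-jet) follows the text.

```python
LEPTON_FEATURES = ["E_lep", "pT_lep", "eta_lep", "phi_lep"]
MET_FEATURES = ["MET", "phi_MET"]


def jet_features(label: str, with_btag: bool = True) -> list:
    """Energy, pT, eta, phi and (optionally) the b-tag flag of one jet."""
    names = [f"E_{label}", f"pT_{label}", f"eta_{label}", f"phi_{label}"]
    if with_btag:
        names.append(f"btag_{label}")
    return names


def resolved_features() -> list:
    """Lepton + missing momentum + four leading jets."""
    names = LEPTON_FEATURES + MET_FEATURES
    for i in range(1, 5):
        names += jet_features(f"j{i}")
    return names


def boosted_features() -> list:
    """Lepton + missing momentum + selected jet + top-jet (no b-tag flag for the top-jet)."""
    return (LEPTON_FEATURES + MET_FEATURES
            + jet_features("jsel") + jet_features("jtop", with_btag=False))


if __name__ == "__main__":
    print(len(resolved_features()), len(boosted_features()))
```

Keeping the ordering fixed in one place like this makes it easier to map a column of the input array back to a physical observable when interpreting the trained network.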

III.3 Training the DNN

The DNN classifier is implemented using the Keras Chollet et al. (2015) package (with Tensorflow Abadi et al. (2016) as the backend). The architecture of the DNN is as follows,

$$[N_{\rm in},\underbrace{N_{\rm node},N_{\rm node},\dots,N_{\rm node}}_{N_{\rm hidden}},2],$$

where $N_{\rm hidden}$ and $N_{\rm node}$ are the number of hidden layers and the number of neurons per hidden layer, respectively. The number of input features is $N_{\rm in}=26$ (15) for the resolved (boosted) region. All the input features are rescaled to have average 0 and standard deviation 1 before training. We label the events with column matrices to match the two neurons in the output layer:

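$${\rm signal}\to\begin{pmatrix}0\\1\end{pmatrix},\quad{\rm background}\to\begin{pmatrix}1\\0\end{pmatrix}. \tag{6}$$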
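The bookkeeping described above (layer widths, feature rescaling and two-component labels) can be sketched in plain Python. This is not the Keras implementation used in the paper; the helper names are hypothetical, the label convention assumed here is signal $\to(0,1)$ and background $\to(1,0)$, and the standardization uses the population standard deviation, which is an assumption.

```python
import statistics


def layer_sizes(n_in: int, n_hidden: int, n_node: int) -> list:
    """Widths of all layers: input, n_hidden hidden layers of n_node neurons, 2 outputs."""
    return [n_in] + [n_node] * n_hidden + [2]


def standardize(column: list) -> list:
    """Rescale one feature column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0.0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(x - mean) / std for x in column]


def one_hot(is_signal: bool) -> list:
    """Two-component label: signal -> [0, 1], background -> [1, 0]."""
    return [0, 1] if is_signal else [1, 0]


if __name__ == "__main__":
    print(layer_sizes(26, 5, 200))
    print(standardize([1.0, 2.0, 3.0]))
    print(one_hot(True), one_hot(False))
```

With this label ordering, the second output neuron is the one whose response is large for signal-like events.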

The Rectified Linear Unit (ReLU) activation function is used for all the hidden layers, while the softmax activation function is adopted for the output layer. The loss function is categorical_crossentropy, and the optimizer is Adam. To get the best configuration of the DNN, we try various choices of the hyper-parameter combination as follows,

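$$\begin{aligned}N_{\rm hidden}&=4,~5;\\ N_{\rm node}&=200,~300;\\ L_{r}&=0.001,~0.003;\\ D_{r}&=0.1,~0.2,~0.3;\\ B_{s}&=10^{3},~10^{4};\end{aligned}$$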

where $L_{r}$ is the initial learning rate, $D_{r}$ is the dropout rate, and $B_{s}$ is the batch size. For each benchmark case, there are in total 48 different DNN configurations, from which we select the best one based on the learning curves with the following criteria:

1. If the validation/test accuracy curve achieves its maximum when crossing the training accuracy curve, and meanwhile the validation/test loss curve reaches its minimum and crosses the training curve, we select that configuration and cut the training at that epoch. This early stop is to prevent over-fitting.

2. If more than one configuration shows the behavior mentioned above, then we select the one with the higher validation/test accuracy and lower validation/test loss; if more than one network still remains, we choose the one whose learning curves fluctuate less.
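The hyper-parameter scan and the second selection criterion can be sketched as follows. The values of $N_{\rm hidden}$, $L_{r}$ and $B_{s}$ are those quoted for the scan; the values used here for $N_{\rm node}$ (200, 300) and $D_{r}$ (0.1, 0.2, 0.3) are read off the entries of Tables 3 and 4 and are an assumption to that extent, although they do reproduce the stated total of 48 configurations. The dictionary keys and the `select_best` helper are hypothetical.

```python
from itertools import product

GRID = {
    "n_hidden": [4, 5],
    "n_node": [200, 300],             # assumed from the entries of Tables 3 and 4
    "learning_rate": [0.001, 0.003],
    "dropout_rate": [0.1, 0.2, 0.3],  # assumed from the entries of Tables 3 and 4
    "batch_size": [1000, 10000],
}


def configurations(grid: dict = GRID) -> list:
    """Enumerate every combination of the hyper-parameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def select_best(results: list) -> dict:
    """Among acceptable runs, prefer higher validation accuracy, then lower validation loss.

    Each entry is a dict with 'val_acc' and 'val_loss' keys (hypothetical format).
    """
    return max(results, key=lambda r: (r["val_acc"], -r["val_loss"]))


if __name__ == "__main__":
    print(len(configurations()))
```

The first criterion (choosing the stopping epoch from the crossing of training and validation curves) and the final tie-break on curve fluctuations rely on inspecting the learning curves and are not captured by this sketch.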

The details of training and the chosen configurations are listed in Tables 3 and 4 of the Appendix. For the $M_{\rho}=1$ TeV models, the DNN can reach a classification accuracy of $\geqslant 80\%$ in the resolved region and of $\geqslant 65\%$ in the boosted region. For the $M_{\rho}=5$ TeV case, the accuracy is $\geqslant 76\%$ in the boosted region.

The softmax activation function for the output layer guarantees the output responses of the 0th neuron ( $r_{0}$ ) and the 1st neuron ( $r_{1}$ ) satisfy

$$0<r_{0},r_{1}<1,\quad r_{0}+r_{1}\equiv 1.$$

Therefore, we can consider $r_{1}$ only, and denote it as $r$. Due to the label definition in Eq. (6), if the DNN is well trained, the distribution of $r$ should have a peak around 1.0 (0.0) for the signal (background), for both the training data and the validation/test data. Figure 2 shows the distributions of the validation/test data for benchmark cases with $\Gamma_{\rho}/M_{\rho}=40\%$ as an illustration. The DNN for M1$\Gamma$4 shows worse performance in the boosted region compared to the one in the resolved region. This is because the two peaks in the neuron output from the signal and the SM background are not well separated. In fact, this is a generic feature for all $M_{\rho}=1$ TeV benchmark cases. It is mainly due to the boosted region cuts, which require a top-jet with $p_{T}^{j_{\rm top}}>300$ GeV. As a result, most of the SM $t\bar{t}$ background events are around this value. However, for an $M_{\rho}=1$ TeV resonance, its decay product $t/\bar{t}$ acquires a transverse momentum $\sim 500$ GeV, quite similar to the cut threshold. Therefore, the signal and background look similar (see the $p_{T}^{j_{\rm top}}$ distribution in Fig. 1(b)), and thus the separation is not efficient. On the other hand, for an $M_{\rho}=5$ TeV resonance, $p_{T}^{j_{\rm top}}\sim 2.5$ TeV, and the DNN works very well, as plotted in the bottom of Fig. 2.
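The quoted transverse-momentum scales follow from two-body kinematics, as this small sketch shows. It assumes an on-shell resonance produced at rest with the tops emitted perpendicular to the beam, so that $p_{T}$ equals the top momentum $\sqrt{M_{\rho}^{2}/4-m_{t}^{2}}$; the top mass of 173 GeV is a value assumed here, not one quoted in the text.

```python
import math

TOP_MASS_GEV = 173.0  # assumed value, not quoted in the text


def top_momentum(resonance_mass: float, top_mass: float = TOP_MASS_GEV) -> float:
    """Momentum of each top quark in the rest frame of a resonance decaying to t tbar."""
    return math.sqrt(0.25 * resonance_mass ** 2 - top_mass ** 2)


if __name__ == "__main__":
    for mass in (1000.0, 5000.0):  # GeV
        print(f"M_rho = {mass:.0f} GeV: maximal top pT ~ {top_momentum(mass):.0f} GeV")
```

For a broad resonance the off-shell tail spreads this scale considerably, so the numbers should be read as typical values rather than sharp edges.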

III.4 Setting bounds for the signal

We treat the neuron output $r$ as an observable, and fit its distribution shape to get the cross section upper limit of $pp\to\rho\to t\bar{t}$ for a given integrated luminosity. For the $M_{\rho}=1$ TeV benchmark cases, we use a binned $\chi^{2}$ fitting method by dividing the $0<r<1$ range into 50 bins. For the $M_{\rho}=5$ TeV benchmarks, as the signal cross sections are expected to be tiny, we instead use the un-binned fitting method described in Refs. Yang and Li (2018); CMS (2011) to improve the efficiency. In each case, we consider the statistical uncertainty and assume a 12% systematic uncertainty for the background. To include the effect of other subdominant backgrounds besides $t\bar{t}$ (i.e. $W+{\rm jets}$, multi-jet, etc.), we further rescale the cross section by a factor of $1.23=1/0.81$ and $1.14=1/0.88$ for the resolved and boosted regions, respectively. Those factors come from the fact that $t\bar{t}$ contributes 81% (88%) of the total background for the resolved (boosted) region Aaboud et al. (2018a). This simple rescaling could overestimate the final contributions from subdominant backgrounds, and result in somewhat conservative estimates of the cross section bounds.
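A minimal sketch of the binned ingredients is given below: the background rescaling factors, a 50-bin histogram of $r$, and a simple $\chi^{2}$. The $\chi^{2}$ form used here, with a per-bin variance of $b_{i}+(0.12\,b_{i})^{2}$ and no correlations between bins, is an assumption made for illustration and is not necessarily the statistic used in the analysis; all function names are hypothetical.

```python
def background_scale(ttbar_fraction: float) -> float:
    """Factor that inflates the t tbar background to account for subdominant backgrounds."""
    return 1.0 / ttbar_fraction


def histogram(values: list, n_bins: int = 50) -> list:
    """Count values of r in [0, 1] using n_bins equal-width bins (r = 1 goes in the last bin)."""
    counts = [0] * n_bins
    for r in values:
        index = min(int(r * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def chi2(signal: list, background: list, syst: float = 0.12) -> float:
    """Sum over bins of s_i^2 / (b_i + (syst * b_i)^2); assumed form, uncorrelated bins."""
    total = 0.0
    for s, b in zip(signal, background):
        if b <= 0:
            continue  # skip empty background bins in this simplified sketch
        total += s * s / (b + (syst * b) ** 2)
    return total


if __name__ == "__main__":
    print(f"resolved: {background_scale(0.81):.2f}, boosted: {background_scale(0.88):.2f}")
```

In a sketch of this kind the signal histogram would be scaled by a trial cross section and the limit taken where the $\chi^{2}$ reaches the chosen confidence-level threshold.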

The signal strength upper limits are derived for the unfolded parton-level cross section $\sigma(pp\to\rho\to t\bar{t})$, which can be compared with the final results in experimental papers, e.g. Refs. Aaboud et al. (2018a); Sirunyan et al. (2018a). Our results are shown in Fig. 3, in which the expected and measured upper limits of Ref. Aaboud et al. (2018a) are also plotted as references, as they use the same final state and similar selection cuts. One can see that the DNN results are rather insensitive to the width of the $\rho$ resonance compared to the traditional approach, achieving better constraints in the large-width region. [Footnote 2: We also checked that the DNN results are better than those from the more traditionally used BDT.] For the $M_{\rho}=1$ TeV benchmark, the result is obtained by a combined fit of the resolved and boosted regions. Individually, the resolved and boosted regions yield cross sections of $\sim 3$ pb and $\sim 1$ pb, respectively. Although the networks in the resolved region have a higher accuracy ($\geqslant 80\%$) than those in the boosted region ($\geqslant 65\%$) in Table 3, they actually give a worse measurement of the cross section. This is because the boosted cuts remove many background events and hence improve the fitting performance. That is also the reason why we only consider the boosted region for $M_{\rho}=5$ TeV: the production rate for such a high-mass $\rho$ is so small that we have to use the boosted region to suppress the background. The DNN bounds for the 5 TeV signal benchmark are comparable to the experimentally measured ones, but still better than the experimentally expected ones. As the training uses random numbers to initialize the weights and biases, the final results differ slightly from run to run, even for a given DNN configuration. To take this training uncertainty into account, we run the chosen DNN configuration 15 times for each benchmark case. For the $M_{\rho}=1$ TeV case, the relative fluctuation is small and thus not shown, while for the $M_{\rho}=5$ TeV case, the standard deviations of the runs are shown as vertical error bars in Fig. 3.

IV Figuring out what the machine had learned

In this section, we attempt to assess the information learned by the DNN using three methods, each discussed in its own subsection. In this way, we can figure out not only which information has been learned, but also which information is most important.

IV.1 Testing high-level observables

It is important to know whether a DNN has learned well-known, useful but complicated features. In fact, it has been argued that some machine learning methods such as jet images Cogan et al. (2015) do not efficiently capture invariant mass features de Oliveira et al. (2016).

Our approach is to train another set of DNNs using additional high-level observables whose features we want to test. By comparing the performances of these new DNNs with the original DNNs trained with only low-level observables, we can test whether those particular high-level (i.e. physically motivated) features have been effectively learned or not. [Footnote 3: The definition of “learned” could be ambiguous, but we use subjective criteria discussed in the text.] This “saturation approach” has been widely used in particle physics research Baldi et al. (2014); Datta and Larkoski (2017).

To construct high-level observables, we first reconstruct the $t$ and $\bar{t}$. The longitudinal momentum of the neutrino is solved by requiring the leptonically decaying $W$ to be on-shell, i.e. $M_{\ell\nu}=M_{W}$. For the resolved region, the assignment of the 4 reconstructed jets is done by minimizing

[TABLE]

for various jet permutations, where $\sigma_{W}=0.1\times M_{W}$ and $\sigma_{t}=0.1\times M_{t}$. For the boosted region, one top quark is identified with the top-jet and the other is reconstructed from the combination $\ell^{\pm}\nu j_{\rm sel}$. Once the $t$ and $\bar{t}$ are reconstructed, we are able to define the following 7 high-level observables for the signal $pp\to\rho\to t\bar{t}$:

1. The invariant mass $M_{t\bar{t}}$ of the $t\bar{t}$ system.

2. The polar angle and azimuthal angle in the Collins-Soper frame Collins and Soper (1977). We label the leptonically and hadronically decaying tops with subscripts “tl” and “th”, respectively. Hence we have $\cos\theta^{\rm CS}_{\rm tl}$, $\cos\theta^{\rm CS}_{\rm th}$, $\phi^{\rm CS}_{\rm tl}$ and $\phi^{\rm CS}_{\rm th}$, 4 observables in total.

3. The polar angles in the Mustraal frame Richter-Was and Was (2016), $\cos\theta_{1}^{\rm Mus.}$ and $\cos\theta_{2}^{\rm Mus.}$.

The first observable reveals the resonance feature, while the latter 6 observables reflect the spin-1 nature of the $\rho$ resonance. For the boosted region, to take into account the features of the top jet, we introduce 3 additional high-level observables, i.e.

1. The invariant mass $M_{j_{\rm top}}$ of the top jet.

2. The $N$-subjettiness observables $\tau_{21}$ and $\tau_{32}$ of the top jet Kaplan et al. (2008); Thaler and Van Tilburg (2011, 2012); Plehn and Spannowsky (2012); Kasieczka et al. (2015).

Those observables have been shown to be important in identifying the color structure of the hard process Kasieczka et al. (2015); Joshi et al. (2012); Soper and Spannowsky (2014). In our scenario, the signal results from a color-singlet resonance, while the background comes from QCD processes, and the jet mass and $N$-subjettiness can help to reveal this difference Joshi et al. (2012). Moreover, such jet substructure observables can be more independent of resonance characteristics and kinematics.
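The on-shell condition $M_{\ell\nu}=M_{W}$ used above to reconstruct the neutrino leads to a quadratic equation for its longitudinal momentum. The sketch below illustrates one common way of solving it and is not necessarily the procedure used in this analysis. It assumes a massless lepton and neutrino, identifies the neutrino transverse momentum with the missing transverse momentum, picks the solution with the smaller $|p_{z}|$, and falls back to the real part when the discriminant is negative. The function name and the value $M_{W}=80.4$ GeV are illustrative choices.

```python
import math

M_W = 80.4  # GeV, assumed W boson mass for this sketch

def neutrino_pz(lep, met_x, met_y, m_w=M_W):
    """Longitudinal neutrino momentum from the constraint M(l, nu) = m_w.

    lep: (px, py, pz) of the charged lepton in GeV, treated as massless.
    met_x, met_y: missing transverse momentum, taken as the neutrino pT.
    """
    px, py, pz = lep
    e_l = math.sqrt(px * px + py * py + pz * pz)
    pt2_l = px * px + py * py
    pt2_nu = met_x * met_x + met_y * met_y
    a = 0.5 * m_w ** 2 + px * met_x + py * met_y
    disc = a * a - pt2_l * pt2_nu
    if disc < 0.0:
        # No real solution: keep the real part (assumed convention).
        return a * pz / pt2_l
    root = e_l * math.sqrt(disc)
    solutions = ((a * pz + root) / pt2_l, (a * pz - root) / pt2_l)
    # Two real solutions: choose the smaller |pz| (assumed convention).
    return min(solutions, key=abs)

# Hypothetical event: lepton momentum and missing transverse momentum in GeV.
pz_nu = neutrino_pz((40.0, 0.0, 10.0), -20.0, 30.0)
```

With $p_{z}^{\nu}$ fixed in this way, the leptonic top four-momentum follows by adding the lepton, the neutrino and the assigned jet.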

Some distributions of these high-level observables are shown in Fig. 4(b). Note that the spin correlations as well as the jet substructure observables are rather insensitive to the width of the $\rho$, as expected. For the 5 TeV resonance, the mass peak at $M_{t\bar{t}}\sim 5$ TeV almost disappears for $\Gamma_{\rho}/M_{\rho}\geqslant 10\%$; instead, there is a peak at $\sim 1$ TeV, due to the parton-distribution support of off-shell effects and the hard $p_{T}$ cuts. Most identified top-jets in both signal and background originate correctly from the top quark, thus the differences shown in the distributions of $M_{j_{\rm top}}$ and $\tau_{32}$ come from the color structure of the hard process. For example, the background’s $M_{j_{\rm top}}$ distribution is slightly broader and its $\tau_{32}$ is slightly larger than the signal’s. This is because the top-jets from QCD $t\bar{t}$ are color connected with the initial state and consequently emit more radiation. Using these “all observables” (i.e. the union of low- and high-level observables) as inputs, we train a new set of DNNs; the best network configurations are again surveyed and detailed in Tables 3 and 4 of the Appendix.

We compare the performances of the original and new DNNs using receiver operating characteristic (ROC) curves. The area under the curve (AUC) is used as a metric of the performance. Some of the comparisons are shown in Fig. 5. First, in the resolved region, shown in the top panel, we find that adding high-level observables changes the ROC curves only slightly. Not only the AUC but also the background efficiencies show small changes. This means that the inclusion of high-level observables does not improve the accuracy; the original DNN had already learned those high-level features successfully from the low-level inputs.
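As an illustration of the metric, the following sketch builds an ROC curve by scanning a threshold on the neuron output $r$ and integrates it with the trapezoidal rule. It is a toy example under stated assumptions: the score samples are drawn from hypothetical beta distributions rather than from any trained network, and the function names are ours.

```python
import numpy as np

def roc_curve(scores_sig, scores_bkg, n_thresholds=101):
    """Signal and background efficiencies for cuts r >= threshold."""
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    eff_sig = np.array([(scores_sig >= t).mean() for t in thresholds])
    eff_bkg = np.array([(scores_bkg >= t).mean() for t in thresholds])
    return eff_bkg, eff_sig

def auc(eff_bkg, eff_sig):
    """Area under the ROC curve (signal efficiency vs background efficiency)."""
    # Efficiencies fall as the threshold rises, so reverse them to get
    # an increasing background efficiency before integrating.
    x = eff_bkg[::-1]
    y = eff_sig[::-1]
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

# Toy neuron outputs: signal peaks towards 1, background towards 0.
rng = np.random.default_rng(0)
scores_sig = rng.beta(5.0, 2.0, size=5000)
scores_bkg = rng.beta(2.0, 5.0, size=5000)

eff_bkg, eff_sig = roc_curve(scores_sig, scores_bkg)
auc_value = auc(eff_bkg, eff_sig)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so two networks can be compared by the difference of their AUC values on the same validation/test data.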

In the boosted region, while $M_{t\bar{t}}$, $M_{j_{\rm top}}$ and the spin correlations can be derived from the four-momenta of reconstructed objects, the $N$-subjettiness cannot be inferred from the low-level inputs. Therefore, adding high-level features can bring improvements. As shown by the ROC curves in the bottom two panels of Fig. 5, the improvement is sizable for M1$\Gamma$4 but relatively small for M5$\Gamma$4. This may be because the event topology of the M5 boosted cases becomes so simple that many features are more correlated.

IV.2 Ranking input observables by importance

Which information does the DNN rely on most in distinguishing a broad resonance from the continuum background? To answer this, we attempt to identify which connections between neurons and layers carry the most weight. Following Ref. Nielsen (2015), we define the learning speed of the $j$-th hidden layer as

[TABLE]

where $\vec{b}^{(j)}$ is the bias vector of the $j$-th hidden layer, while $\mathcal{L}_{\rm loss}$ is the loss function. As the target of machine learning is to find the global minimum of $\mathcal{L}_{\rm loss}$, $v^{(j)}$ approximately reflects the training sensitivity of a specific layer. When training the DNN, the larger the $v^{(j)}$ a layer acquires, the more important it is. We found that for all individual benchmark cases M$i\Gamma j$, the first hidden layer has the highest learning speed, several times larger than that of the other layers. For example, for the M1$\Gamma$4 case in the resolved region, the learning speeds are $v^{(1)}=0.457$, $v^{(2)}=0.086$, $v^{(3)}=0.033$, $v^{(4)}=0.016$ and $v^{(5)}=0.008$. This means that good features are typically learned most efficiently in the first hidden layer.
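A per-layer learning speed of this kind can be evaluated by backpropagation. The sketch below assumes the definition of Ref. Nielsen (2015), namely the norm of the gradient of the loss with respect to the bias vector of each layer, and applies it to a small toy network. The architecture (sigmoid activations, a single sigmoid output, mean binary cross-entropy), the random data and all function names are assumptions of this sketch, not the configuration of Eq. (5).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Return the list of activations, starting with the input itself."""
    acts = [x]
    for w, b in zip(weights, biases):
        acts.append(sigmoid(acts[-1] @ w + b))
    return acts

def bce_loss(weights, biases, x, y):
    """Mean binary cross-entropy of the network output against labels y."""
    p = forward(weights, biases, x)[-1][:, 0]
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def layer_learning_speeds(weights, biases, x, y):
    """Norm of d(loss)/d(bias vector) for every layer, first layer first.

    The last entry belongs to the output layer; the others to hidden layers.
    """
    acts = forward(weights, biases, x)
    n_events = x.shape[0]
    # Gradient w.r.t. the output pre-activation for sigmoid + cross-entropy.
    delta = (acts[-1] - y[:, None]) / n_events
    speeds = []
    for j in range(len(weights) - 1, -1, -1):
        # The bias gradient is the sum of delta over events.
        speeds.append(np.linalg.norm(delta.sum(axis=0)))
        if j > 0:
            a = acts[j]  # activation of the previous layer
            delta = (delta @ weights[j].T) * a * (1.0 - a)
    return speeds[::-1]

# Toy network: 4 inputs, three hidden layers of 8 nodes, 1 output.
rng = np.random.default_rng(1)
sizes = [4, 8, 8, 8, 1]
weights = [0.5 * rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
x = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200).astype(float)

speeds = layer_learning_speeds(weights, biases, x, y)
```

In practice one would evaluate these norms during training on the actual network; the toy values produced here carry no physical meaning.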

For our DNN architecture described in Eq. (5), the weights of the first hidden layer form an $N_{\rm in}\times N_{\rm node}$ matrix, whose elements are denoted as $w^{(1)}_{mn}$ with $m=1,\cdots,N_{\rm in}$ and $n=1,\cdots,N_{\rm node}$. As all the input features are rescaled to have mean 0 and standard deviation 1, the magnitude of the weight $w^{(1)}_{mn}$ reflects the correlation strength between the $m$-th input and the $n$-th neuron in the first hidden layer. Motivated by this, we further define

[TABLE]

as a measure of the importance of the $m$ -th input feature. The normalization $\mathcal{N}$ is such that

[TABLE]
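For illustration, an importance measure of this type can be computed directly from the first-layer weight matrix. The sketch below assumes that $W_{m}$ is the sum over first-layer neurons of $|w^{(1)}_{mn}|$, normalized so that the $W_{m}$ sum to one; this specific form is our assumption for the example, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def input_importance(w1):
    """Normalized importance of each input from first-layer weights.

    w1: array of shape (N_in, N_node) holding the weights w_mn.
    Assumed form: W_m proportional to sum_n |w_mn|, with sum_m W_m = 1.
    """
    raw = np.abs(w1).sum(axis=1)  # one value per input feature m
    return raw / raw.sum()

# Toy weight matrix with 2 inputs and 2 first-layer neurons.
w1 = np.array([[1.0, -1.0],
               [3.0, 1.0]])
importance = input_importance(w1)
```

The ranking is only meaningful when the inputs are standardized, as stated above, since otherwise the weight magnitudes also absorb the scale of each input.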

Figure 6 shows the $W_{m}$ of each input observable from the DNN trained using both low- and high-level observables. Above all, $M_{t\bar{t}}$, which we expected to be less useful for a broad resonance, is still one of the most important observables even when the resonance is broad. This is particularly true for a low-mass broad resonance in the resolved region (upper panel). In the case of a heavy resonance in the boosted region (lower panel), its importance is relatively reduced, partly because some invariant-mass information has already been used in the selection of the boosted region. In such cases, the top-jet mass and transverse momentum, which are somewhat correlated with $M_{t\bar{t}}$ and the width, can significantly complement the search, as shown in the bottom panel. In addition, the invariant mass of the top-jet is another important input feature because it reflects the color flow difference between signal and background. On the other hand, the $N$-subjettiness observables again turn out to be relatively less useful.

Remarkably, there is much other useful information, particularly from the angular distributions $\eta^{\ell,j}$ and $\cos\theta_{1,2}^{\rm Mus.}$. From Figs. 1 and 4(b), we can see that these observables are relatively uncorrelated with the resonance width. We have indeed checked that the cross entropies Roxlo and Reece (2018) between these observables and $M_{t\bar{t}}$, which can quantify their correlations, are not so high. As we will see in the next subsection, this information is useful even in the off-shell region away from the resonance, hence less correlated with the width. Thus, these features are useful in searches for broad resonances. This may also imply that narrow-resonance searches can be improved by adding off-resonance information; this is partly because a large fraction of the signal still comes from the low-energy off-resonance region, where the parton-luminosity support is much larger (although buried under larger backgrounds). We leave this for a future study.

IV.3 Planing away $M_{t\bar{t}}$

We have observed that $M_{t\bar{t}}$ is still important, but that there is indeed useful information uncorrelated with it. How much of the discovery capability can be attributed to that uncorrelated (whether known or unknown) information? Using the data planing method de Oliveira et al. (2016); Chang et al. (2018), we plane away the feature in the invariant mass spectrum. We attach a weight to each event so that the weighted distribution of $M_{t\bar{t}}$ becomes flat for both signal and background; the details of the chosen network configurations and more results are given in Table 5 of the Appendix. A new set of DNNs trained with such planed data must learn information uncorrelated with $M_{t\bar{t}}$, and the difference between the performance with and without $M_{t\bar{t}}$ offers a quantitative answer to the question “how much information is there beyond the invariant mass”.

In practice, to avoid large fluctuations, we use only the region $M_{t\bar{t}}\in[0.5,3]$ TeV with a 20 GeV bin size for all signal cases. This means that for the 5 TeV signals, we consider only off-resonance events; note that the majority of the signal comes from the low-energy region supported by larger parton luminosities.
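The planing weights described above can be sketched as follows. This is a minimal illustration assuming that each event is weighted by the inverse of the population of its $M_{t\bar{t}}$ bin, with 20 GeV bins on $[0.5,3]$ TeV as in the text; events outside the window receive zero weight. The function name, the overall normalization and the toy spectrum are assumptions of the sketch.

```python
import numpy as np

def planing_weights(m_tt, lo=500.0, hi=3000.0, bin_width=20.0):
    """Per-event weights that flatten the M_tt histogram on [lo, hi) in GeV.

    Returns (weights, edges). Events outside the window get weight 0.
    """
    n_bins = int(round((hi - lo) / bin_width))
    edges = np.linspace(lo, hi, n_bins + 1)
    inside = (m_tt >= lo) & (m_tt < hi)
    idx = np.digitize(m_tt[inside], edges) - 1       # bin index of kept events
    counts = np.bincount(idx, minlength=n_bins)      # population of each bin
    weights = np.zeros(m_tt.shape, dtype=float)
    weights[inside] = 1.0 / counts[idx]              # inverse bin population
    # Normalize so that the kept events have unit average weight (a choice).
    weights[inside] *= inside.sum() / weights[inside].sum()
    return weights, edges

# Toy falling spectrum in GeV, standing in for signal or background events.
rng = np.random.default_rng(2)
m_tt = 500.0 + rng.exponential(400.0, size=20000)
w, edges = planing_weights(m_tt)
```

The weights would be computed separately for the signal and background samples and then passed as per-event weights to the loss during training, so that the network cannot separate the classes using the $M_{t\bar{t}}$ shape alone.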

After $M_{t\bar{t}}$ is planed away, the classification accuracies drop from $\geqslant 80\%$ to $\geqslant 73\%$ for $M_{\rho}=1$ TeV in the resolved region and from $\geqslant 65\%$ to $\geqslant 62\%$ in the boosted region. For the $M_{\rho}=5$ TeV cases, the accuracies drop from $\geqslant 76\%$ to $\geqslant 63\%$ in the boosted region. As the accuracies are still significantly higher than random guessing (i.e. 50%), we conclude that the DNNs retain some capability to distinguish signal from background, even though they are blind to $M_{t\bar{t}}$ and most events are from the off-resonance region (for the 5 TeV cases). Clearly, on top of $M_{t\bar{t}}$ and the width, the original DNNs had learned extra information (such as the aforementioned angular correlations).

Indeed, we have checked that the weights $W_{m}$ for various angular and angular-correlation observables, after planing $M_{t\bar{t}}$, are relatively high. From Figs. 1 and 4(b), one can also see that they are largely independent of the width. Helicity conservation (hence, the angular correlations) can hold somewhat independently of the invariant mass, as the range of the invariant mass considered is always much larger than the top mass. Thus, we conclude that much of the angular information can come from the off-resonance region, and that such off-resonance information (although buried under larger backgrounds) can enhance the discovery power. As a result, as shown in Fig. 3, the final performance is not only improved but also becomes rather insensitive to the resonance width.

A final remark is that there could still be useful information, unknown to us, that is not identified in our analysis.

V Conclusion

In an attempt to develop methods to discover broad $t\bar{t}$ resonances, we have found that $M_{t\bar{t}}$ is still one of the most important observables, but that additional information from both on- and off-resonance regions can significantly enhance the discovery capability. As a result, the cross section upper limits can be improved by $\sim 60\%$ for $\Gamma_{\rho}/M_{\rho}\sim 40\%$, and the improved LHC sensitivities do not strongly depend on the width of the resonance. Since resonances in new physics beyond the SM are often broad, the lessons and techniques presented here can be used to search for them efficiently.

The most useful observables turn out to be $M_{t\bar{t}}$ (even for broad resonances), $p_{T}^{j_{\rm top}}$, $M_{j_{\rm top}}$, angular distributions and color correlations. The usefulness of $M_{t\bar{t}}$ even for broad-resonance searches is not necessarily obvious a priori, and correlated observables such as $p_{T}^{j_{\rm top}}$ are found to complement it further. Angular information (some of which comes from the off-resonance region) and $M_{j_{\rm top}}$ (which can measure color flow structures irrespective of resonance characteristics) are relatively uncorrelated with the width and $M_{t\bar{t}}$, making the improved LHC sensitivities less dependent on the width. Lastly, as we trained using only low-level inputs, our results also show that high-level observables such as $M_{t\bar{t}}$ are effectively learned by the DNN.

We have assessed this machine-learned information in three ways: by explicitly testing those high-level observables, by ranking input (low- and/or high-level) observables using the weights of the network, and by planing away features correlated with $M_{t\bar{t}}$. Notably, there can still be unknown useful information that is not easily identified in our analysis. Thus, being able to communicate more efficiently with networks will enable better explorations of nature, beyond what we know.

Acknowledgements.

We would like to thank Shawn Jia, Jinmian Li, Hui Luo, Tao Xu, Daneng Yang and Zhao-Huan Yu for discussions and the anonymous referee for useful suggestions. SJ and KPX are supported by Grant Korea NRF 2015R1A4A1042542, NRF 2017R1D1A1B03030820, SJ also by POSCO Science Fellowship, and DL by NRF 0426-20170003, NRF 0409-20190120.

Appendix A The chosen DNN configurations and their performances

The selected DNN configurations for $M_{\rho}=1$ and 5 TeV are listed in Table 3 and Table 4, respectively. The selection criteria are described in Section III.3. The epochs at which we stop the training are listed in the fourth columns. One can see that for an individual signal benchmark in a given kinematic region, the DNN with low-level observables usually requires more training epochs than the DNN with all observables, if they have the same configuration. That is because the DNN needs more time to learn about the physics of the signal process if no hint is given to it. The classification accuracies (on the validation/test data) of the networks are given in the fifth columns.

Table 5 shows the accuracy reach of the DNNs before and after planing away the key observable $M_{t\bar{t}}$. The data in the second row, i.e. the accuracies before planing, are taken from the fifth columns of Tables 3 and 4, while the accuracies after planing, listed in the third row, are obtained by the weighted training described in Section IV.3.

Bibliography


1. Barducci et al. (2013): D. Barducci, A. Belyaev, S. De Curtis, S. Moretti, and G. M. Pruna, JHEP 04, 152 (2013), eprint 1210.2927.
2. Greco and Liu (2014): D. Greco and D. Liu, JHEP 12, 126 (2014), eprint 1410.2883.
3. Barducci and Delaunay (2016): D. Barducci and C. Delaunay, JHEP 02, 055 (2016), eprint 1511.01101.
4. Liu et al. (2019a): D. Liu, L.-T. Wang, and K.-P. Xie (2019a), eprint 1901.01674.
5. Kelley et al. (2011): R. Kelley, L. Randall, and B. Shuve, JHEP 02, 014 (2011), eprint 1011.0728.
6. Ask et al. (2012): S. Ask, J. H. Collins, J. R. Forshaw, K. Joshi, and A. D. Pilkington, JHEP 01, 018 (2012), eprint 1108.2396.
7. Aaboud et al. (2018a): M. Aaboud et al. (ATLAS), Eur. Phys. J. C 78, 565 (2018a), eprint 1804.10823.
8. Gaemers and Hoogeveen (1984): K. J. F. Gaemers and F. Hoogeveen, Phys. Lett. 146B, 347 (1984).
