Unveiling CP property of top-Higgs coupling with graph neural networks at the LHC

Jie Ren; Lei Wu; Jin Min Yang

arXiv:1901.05627·hep-ph·January 8, 2020

TL;DR

This paper introduces a graph neural network approach to determine the CP nature of the top-Higgs coupling at the LHC, achieving effective discrimination between CP-even and CP-odd interactions with realistic data volumes.

Contribution

The study applies message passing neural networks to particle physics data, providing a novel method for probing the CP properties of the top-Higgs interaction.

Findings

01. Effective CP discrimination at the LHC with 300 fb$^{-1}$ of data

02. Uses the semi-leptonic top-Higgs decay channel

03. Achieves high classification accuracy

Abstract

The top-Higgs coupling plays an important role in particle physics and cosmology. Precision measurements of this coupling can provide insight into new physics beyond the Standard Model. In this paper, we propose to use a Message Passing Neural Network (MPNN) to reveal the CP nature of the top-Higgs interaction through the semi-leptonic channel $pp\to t(\to b\ell^{-}\nu_{\ell})\bar{t}(\to\bar{b}jj)h(\to b\bar{b})$. Using the test statistic constructed from the event classification probabilities given by the MPNN, we find that the pure CP-even and CP-odd components can be well distinguished at the LHC with at most 300 fb$^{-1}$ of experimental data.

Equations

$$\mathcal{L}\supset-\frac{y_{t}}{2}\,\bar{t}\,(\cos\xi+i\gamma_{5}\sin\xi)\,t\,H$$

$$\bm{s}_{i}^{0}=\mathrm{ReLU}(W_{e}\bm{x}_{i}+\bm{b}_{e})$$

$\bm{m}_{i}^{t}$

$\bm{s}_{i}^{t}$

$$(\hat{\bm{d}}_{ij})_{k}=\exp\left[-\frac{(d_{ij}-\mu_{k})^{2}}{2\sigma^{2}}\right]$$

$$(\bm{p}_{i})_{k}=\frac{\exp(W_{o}\bm{s}_{i}+\bm{b}_{o})_{k}}{\sum_{k'}\exp(W_{o}\bm{s}_{i}+\bm{b}_{o})_{k'}},\quad(k=1,2,3)$$

$$\bm{p}=\frac{1}{N}\sum_{i}\bm{p}_{i}$$

$L_{h}(D)$

$L_{A}(D)$

$$\ln Q(D)=\ln\frac{L_{A}(D)}{L_{h}(D)}$$

$p_{h}(\ln Q^{*})$

$p_{A}(\ln Q^{*})$

Full text

Jie Ren

CAS Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

School of Physics, University of Chinese Academy of Sciences, Beijing 100049, China

Lei Wu

School of Physics Science and Technology, Nanjing Normal University, Nanjing, 210023, China

Jin Min Yang

CAS Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

School of Physics, University of Chinese Academy of Sciences, Beijing 100049, China

Department of Physics, Tohoku University, Sendai 980-8578, Japan

I Introduction

The discovery of the Higgs boson Aad et al. (2012); Chatrchyan et al. (2012) is a great step in the long quest for the origin of mass. Precision measurement of the Higgs couplings is one of the main goals of future LHC experiments; it will further reveal the electroweak symmetry breaking mechanism and shed light on new physics beyond the Standard Model (SM). Among these couplings, the top-Higgs interaction is particularly interesting. On one hand, owing to its large size, this coupling dominates the renormalization group evolution of the Higgs potential and thus plays a unique role in determining the scale of new physics. On the other hand, the direct extraction of this coupling from the QCD process $pp\to t\bar{t}H$ is very challenging, since the process has a small production rate and rather complicated final states. Very recently, the ATLAS and CMS collaborations have reported the observation of $t\bar{t}H$ production in several Higgs decay channels, such as $\gamma\gamma$ and $W^{+}W^{-}$, at the 13 TeV LHC Aaboud et al. (2018); Sirunyan et al. (2018). The LHC Run-3, with a higher luminosity, will have great potential to decipher the structure of the top-Higgs coupling.

The general top-Higgs interaction can be parameterized as

$$\mathcal{L}\supset-\frac{y_{t}}{2}\,\bar{t}\,(\cos\xi+i\gamma_{5}\sin\xi)\,t\,H, \tag{1}$$

where $y_{t}=\sqrt{2}m_{t}/v$ and $\xi=0$ in the SM Aguilar-Saavedra (2009), with $v=174$ GeV being the vacuum expectation value of the Higgs field. The presence of the $\sin\xi$ term leads to CP violation in the top-Higgs coupling. It affects the Higgs production and decay channels, the electric dipole moments and the flavor physics observables, which are measured to agree well with the SM and thus set strong constraints on the CP-violating top-Higgs coupling Cirigliano et al. (2016); Kobakhidze et al. (2017). However, these indirect bounds are model-dependent and can be evaded in some extensions of the SM. Therefore, the most robust test of the top-Higgs coupling in Eq. (1) comes from the direct measurement of $t\bar{t}H$ production at colliders. Several observables, such as the $t\bar{t}$ spin correlation and the charge asymmetry in $t\bar{t}H$ production, have been constructed to probe the CP-violating top-Higgs coupling at the LHC Gunion and He (1996); Ellis et al. (2014); Bramante et al. (2014); Demartin et al. (2014); Aguilar-Saavedra et al. (2015); Godbole et al. (2015); Buckley and Goncalves (2016); Li et al. (2016, 2018); Cao et al. (2018).

On the other hand, machine learning (ML) techniques applied to object-level kinematic variables of jets, leptons and photons can achieve exceptional performance in separating the signal from the background Bertone et al. (2016); Baldi et al. (2014, 2015); Bridges et al. (2011); Buckley et al. (2012); Bornhauser and Drees (2013); Caron et al. (2017). A successful example is the use of boosted decision trees in the LHC experiments that led to the discovery of the Higgs boson Roe et al. (2005). Since a collision event can be seen as a geometrical pattern formed by a number of final state objects, a graph is a natural mathematical representation of an event, and it can be analyzed efficiently by an appropriate ML approach. Among the ML algorithms that deal with graphs, the Message Passing Neural Networks (MPNNs) Gilmer et al. (2017) are particularly suited to graph classification and are as flexible as the original Graph Neural Networks (GNNs) Gori et al. (2005); Scarselli et al. (2009) as nonlinear end-to-end models that relate the target output to the input graphs. An MPNN consists of a number of learnable functions acting on the graph nodes, and can be efficiently trained using supervised learning techniques. So far, MPNNs have been successfully applied in supersymmetry exploration Abdughani et al. (2018), jet physics Henrion et al. (2017) and other fields Gilmer et al. (2017).

In this paper, we use an MPNN to investigate the CP nature of the top-Higgs coupling in Eq. (1). We design and train a specific MPNN to classify the collider events, and then perform a hypothesis test based on a variable constructed from the output of the MPNN. As a proof of concept, we focus on the semi-leptonic top decay channel of the process $pp\to t\bar{t}H(\to b\bar{b})$ at the LHC.

II Methods

For convenience, we denote the Higgs boson with CP-even $t\bar{t}H$ coupling ( $\xi=0$ ) as $h$ , and the one with CP-odd $t\bar{t}H$ coupling ( $\xi=\pi/2$ ) as $A$ . At the LHC, the $t\bar{t}h$ and $t\bar{t}A$ signals have the same background events dominantly coming from the process $pp\to t\bar{t}b\bar{b}$ .

We choose event graphs Abdughani et al. (2018) as the representation of collider events, and design an MPNN specific to the classification of the collider events, whose outputs are the probabilities of the input event graph being a $t\bar{t}h$, $t\bar{t}A$ or $t\bar{t}b\bar{b}$ event, respectively. Then, we construct a variable from the output of the MPNN and perform a hypothesis test.

II.1 Event graph

As the input of the MPNN, we represent each collider event as an event graph. As an illustration, FIG. 1 shows an event graph for a specific simulated $t\bar{t}h$ event.

For a given collider event, the nodes in the graph represent the final state objects, including the reconstructed photons, leptons, jets and missing transverse momentum (MET). Each node has a compact seven-dimensional feature vector $\bm{x}_{i}=(I_{1},I_{2},I_{3},I_{4},p_{T},E,m)$ that describes the major properties of the corresponding final state. Apart from the transverse momentum $p_{T}$, energy $E$ and mass $m$, the first four features indicate the type of final state: (1) it is a photon ( $I_{1}=1$ ) or not ( $I_{1}=0$ ); (2) it is a lepton ( $I_{2}=$ charge) or not ( $I_{2}=0$ ); (3) it is a $b$-jet ( $I_{3}=1$ ), a light jet ( $I_{3}=-1$ ) or not a jet ( $I_{3}=0$ ); (4) it is the MET ( $I_{4}=1$ ) or not ( $I_{4}=0$ ).

Each pair of nodes is linked by an edge, which is weighted by the geometrical distance between the corresponding pair of final states. We choose to use $d_{ij}=\sqrt{\Delta(\eta_{i},\eta_{j})^{2}+\Delta(\phi_{i},\phi_{j})^{2}}$ to measure the geometrical distance between two final states $i$ and $j$ , where $\eta$ and $\phi$ are the pseudo-rapidity and azimuthal angle, respectively.

Notice that the differential cross section of a collider event is invariant under rotation of the whole event about the beam axis. To respect this important geometrical symmetry, we exclude the azimuthal angle from the node features and encode only the differences of azimuthal angles in the edge weights. With this design, the event representation and classification are stable under any rotation of the event about the beam axis. Note also that (1) the number of nodes in an event graph depends on the specific collider event, (2) there is no ordering of nodes, and (3) the data in event graphs are exact. These are the main differences between the event graph representation and other collider event representations used as input for ML models.
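The event graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the dictionary keys (`kind`, `pt`, `E`, `m`, `eta`, `phi`, `charge`), the helper names, the wrapping of the azimuthal difference into $[-\pi,\pi]$ and the choice $\eta=0$ for the MET node are all hypothetical.

```python
import math

def node_features(obj):
    """Feature vector (I1, I2, I3, I4, pT, E, m) for one final state object."""
    kind = obj["kind"]
    i1 = 1 if kind == "photon" else 0
    i2 = obj.get("charge", 0) if kind == "lepton" else 0
    i3 = {"bjet": 1, "jet": -1}.get(kind, 0)
    i4 = 1 if kind == "met" else 0
    return (i1, i2, i3, i4, obj["pt"], obj["E"], obj["m"])

def delta_phi(phi_i, phi_j):
    """Azimuthal difference wrapped into [-pi, pi] (an assumption)."""
    d = (phi_i - phi_j) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def distance(obj_i, obj_j):
    """Edge weight d_ij = sqrt(d_eta^2 + d_phi^2)."""
    d_eta = obj_i["eta"] - obj_j["eta"]
    return math.hypot(d_eta, delta_phi(obj_i["phi"], obj_j["phi"]))

def build_event_graph(objects):
    """Return node feature vectors and a fully connected set of weighted edges."""
    n = len(objects)
    nodes = [node_features(o) for o in objects]
    edges = {(i, j): distance(objects[i], objects[j])
             for i in range(n) for j in range(i + 1, n)}
    return nodes, edges

# Toy event: one lepton, one b-jet and the MET (eta = 0 assumed for the MET).
event = [
    {"kind": "lepton", "pt": 40.0, "E": 50.0, "m": 0.0, "eta": 0.5, "phi": 0.2, "charge": -1},
    {"kind": "bjet", "pt": 60.0, "E": 80.0, "m": 5.0, "eta": -1.0, "phi": 2.9},
    {"kind": "met", "pt": 30.0, "E": 30.0, "m": 0.0, "eta": 0.0, "phi": -3.0},
]
nodes, edges = build_event_graph(event)
```

Because $\phi$ enters only through differences, shifting every `phi` by the same angle leaves both `nodes` and `edges` unchanged, which is the rotational symmetry discussed above.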

II.2 Network architecture

The architecture of our MPNN is shown in FIG. 2. It has one node embedding layer, two message passing and node update layers and one output layer.

The node embedding layer embeds each node feature vector $\bm{x}_{i}$ into a higher dimensional node state vector $\bm{s}_{i}^{0}$ by applying a linear transformation and the rectified linear unit (ReLU) activation function,

$$\bm{s}_{i}^{0}=\mathrm{ReLU}(W_{e}\bm{x}_{i}+\bm{b}_{e}),$$

where $W_{e}$ and $\bm{b}_{e}$ are learnable parameters. The state vector $\bm{s}_{i}^{0}$ only encodes the node features $\bm{x}_{i}$ without any information about the geometrical pattern of the graph.

In the following two message passing and node update layers, the nodes exchange the information contained in their state vectors by passing messages. At layer $t$, each node $i$ collects the messages sent from every other node $j$ and then updates its state vector,

[TABLE]

where the brackets denote vector concatenation, and the $W$'s and $\bm{b}$'s are learnable parameters in each layer. Note that, to make the edge weight $d_{ij}$ more suitable for a linear transformation, we expand it onto 21 Gaussian bases to form a weight vector $\hat{\bm{d}}_{ij}$, whose components are

$$(\hat{\bm{d}}_{ij})_{k}=\exp\left[-\frac{(d_{ij}-\mu_{k})^{2}}{2\sigma^{2}}\right],$$

where $\mu_{k}$ are linearly distributed in range [0, 5] and $\sigma=0.25$ . The message passing mechanism is the key to automatically extract features of the input event graph, which can efficiently disseminate the information among all the nodes taking into account the connections between the nodes. After the two message passing and node update layers, each node state vector can be viewed as an encoding of the whole event graph.
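As a small illustration of the Gaussian basis expansion of an edge weight, the sketch below assumes that the 21 centers $\mu_{k}$ are evenly spaced and include both endpoints of [0, 5], so that their spacing equals $\sigma=0.25$. The function name `gaussian_expand` is hypothetical.

```python
import math

def gaussian_expand(d, n_basis=21, d_min=0.0, d_max=5.0, sigma=0.25):
    """Expand a scalar distance d onto n_basis Gaussian bases."""
    step = (d_max - d_min) / (n_basis - 1)  # 0.25 for the default settings
    centers = [d_min + k * step for k in range(n_basis)]
    return [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for mu in centers]

d_hat = gaussian_expand(1.0)  # 21 components, largest at the center mu = 1.0
```

A distance that coincides with a center gives a component equal to 1 there and $e^{-1/2}$ at the two neighbouring centers, so nearby distances produce smoothly overlapping weight vectors.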

In the output layer, each node $i$ produces three probabilities $\bm{p}_{i}$ by applying a linear transformation and the softmax activation function on its state vector $\bm{s}_{i}$ ,

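$$(\bm{p}_{i})_{k}=\frac{\exp(W_{o}\bm{s}_{i}+\bm{b}_{o})_{k}}{\sum_{k'}\exp(W_{o}\bm{s}_{i}+\bm{b}_{o})_{k'}},\quad(k=1,2,3).$$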

To stabilize the classification performance, we average the output over all the nodes as the final output of the MPNN,

$$\bm{p}=\frac{1}{N}\sum_{i}\bm{p}_{i},$$

where $N$ is the number of nodes in the input event graph. The three components of $p$ are the probabilities of the single input event graph $e$ being the $t\bar{t}h$ , $t\bar{t}A$ and $t\bar{t}b\bar{b}$ event, respectively, denoted as $p(h|e)$ , $p(A|e)$ and $p(b|e)$ .
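The output layer and the node averaging can be sketched in plain Python as follows. The function names and the toy weights are hypothetical, and the node state vectors are taken as given; the sketch only mirrors the softmax and the average over nodes described above.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def node_logits(W_o, b_o, s_i):
    """Linear transformation W_o s_i + b_o for one node state vector."""
    return [sum(w * s for w, s in zip(row, s_i)) + b for row, b in zip(W_o, b_o)]

def event_probabilities(W_o, b_o, states):
    """Average the per-node class probabilities over all N nodes."""
    per_node = [softmax(node_logits(W_o, b_o, s)) for s in states]
    n = len(per_node)
    return [sum(p[k] for p in per_node) / n for k in range(3)]

# Toy example: two nodes with two-dimensional states, three output classes.
W_o = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b_o = [0.0, 0.0, 0.0]
p = event_probabilities(W_o, b_o, [[2.0, 0.0], [0.0, 1.0]])
```

Since each per-node vector sums to one, the averaged vector does too, whatever the number of nodes in the event.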

It is worth noting that the MPNN is a dynamic neural network, which can be viewed as a stack of several learnable transformations acting on each single node or pair of graph nodes. Therefore, the MPNN intrinsically scales with the size of the input event graph.

II.3 Training

The MPNN can be efficiently trained using supervised learning. We choose cross entropy as the loss function. The gradients of the loss with respect to the learnable parameters are evaluated on each mini-batch of 500 examples. The learnable parameters are optimized using the Adam optimizer Kingma and Ba (2014) with a fixed learning rate of 0.001. The training is performed for up to 300 epochs, and we choose the MPNN parameters that lead to the best generalization performance (minimum loss) on the validation set. All of the above is implemented with the open-source deep learning framework PyTorch http://pytorch.org/ with extensive GPU acceleration.
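For reference, the cross entropy of a mini-batch $B$ can be written out as below. The one-hot label $y_{e,k}$ (equal to 1 if event $e$ belongs to class $k$ and 0 otherwise) and the averaging over the batch are assumptions made here for illustration; the text does not state whether the loss is summed or averaged.

$$\mathcal{L}_{\mathrm{CE}}=-\frac{1}{|B|}\sum_{e\in B}\sum_{k=1}^{3}y_{e,k}\ln p_{k}(e),$$

where $p_{k}(e)$ is the $k$-th component of the MPNN output for event $e$ and $|B|=500$ for the mini-batches described above.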

II.4 Hypothesis test

If the top-Higgs coupling is CP-even, the event sample collected in experiments will come from the $t\bar{t}h$ process plus the dominant $t\bar{t}b\bar{b}$ background process. Otherwise, it consists of a mixture of $t\bar{t}A$ and $t\bar{t}b\bar{b}$ events. Therefore, we define variables that can discriminate between event samples of the two scenarios. From the single-event probabilities output by the MPNN, we construct two likelihoods

[TABLE]

to measure the consistency of a given event sample $D$ with each of the two scenarios. In the CP-even scenario, $L_{h}(D)\gg L_{A}(D)$; in the CP-odd scenario, $L_{h}(D)\ll L_{A}(D)$. It is worth noting that the products run only over events with $p(h|e)$ and $p(A|e)$ larger than $p(b|e)$. Namely, we exclude the background-like events from the evaluation, which effectively reduces the contamination from background.

To perform a hypothesis test, here we choose to use the log-likelihood ratio

$$\ln Q(D)=\ln\frac{L_{A}(D)}{L_{h}(D)}$$

as the test statistic. The distributions of $\ln Q$ in the two scenarios, denoted as $f_{h}$ and $f_{A}$, respectively, can be obtained numerically by evaluating a large number of random simulated event samples, namely, by performing pseudo-experiments.
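A minimal sketch of the test statistic follows. It assumes, since the explicit likelihood definitions are not reproduced here, that each likelihood is the product of the corresponding single-event probabilities, $L_{X}(D)=\prod_{e}p(X|e)$, restricted to events with both $p(h|e)$ and $p(A|e)$ larger than $p(b|e)$. The function name is hypothetical.

```python
import math

def log_likelihood_ratio(sample):
    """ln Q = ln L_A - ln L_h over the non-background-like events.

    sample: iterable of (p_h, p_A, p_b) tuples, one per event.
    Assumes L_X(D) is the product of p(X|e) over the selected events.
    """
    ln_q = 0.0
    for p_h, p_a, p_b in sample:
        if p_h > p_b and p_a > p_b:  # drop background-like events
            ln_q += math.log(p_a) - math.log(p_h)
    return ln_q

# The second event is background-like and does not contribute.
ln_q = log_likelihood_ratio([(0.6, 0.3, 0.1), (0.2, 0.1, 0.7)])
```

Working with sums of logarithms instead of products avoids numerical underflow when a sample contains thousands of events.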

Because generating a huge number of simulated events can be extremely time-consuming, we adopt the bootstrap technique. First, for each process $X$, we generate a simulated event dataset in which the number of events is large enough and the events have equal weights. Then, we construct event samples of process $X$ by randomly sampling the corresponding dataset with replacement. The number of events $n$ in an event sample follows a Poisson distribution $P(n|\lambda)$ with the average number of events $\lambda=\epsilon_{X}\sigma_{X}L$, where $\sigma_{X}$ is the production cross section of process $X$, $L$ is the integrated luminosity and $\epsilon_{X}$ is the event selection efficiency for process $X$. In the CP-even (odd) scenario, a pseudo-experimental event sample is the union of a $t\bar{t}h$ ( $t\bar{t}A$ ) sample and a $t\bar{t}b\bar{b}$ sample.
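The bootstrap procedure can be sketched with the Python standard library alone. The names `poisson` and `pseudo_experiment` are hypothetical, and the Poisson sampler (Knuth's multiplication method applied in chunks, so that large $\lambda$ does not underflow) is an implementation choice made for this illustration, not something specified in the text.

```python
import math
import random

def poisson(lam, rng):
    """Draw n ~ Poisson(lam) by summing draws for chunks of at most 30."""
    n, remaining = 0, lam
    while remaining > 0:
        chunk = min(remaining, 30.0)
        limit, prod, k = math.exp(-chunk), 1.0, 0
        while True:  # Knuth: count uniform factors until product <= e^-chunk
            prod *= rng.random()
            if prod <= limit:
                break
            k += 1
        n += k
        remaining -= chunk
    return n

def pseudo_experiment(datasets, expected, rng):
    """Union of bootstrap samples, one per process, with Poisson sizes."""
    sample = []
    for process, lam in expected.items():
        n = poisson(lam, rng)
        sample.extend(rng.choices(datasets[process], k=n))  # with replacement
    return sample

rng = random.Random(1)
toy = {"tth": ["h1", "h2", "h3"], "ttbb": ["b1", "b2", "b3", "b4"]}
cp_even_sample = pseudo_experiment(toy, {"tth": 5.0, "ttbb": 20.0}, rng)
```

In an actual study the toy strings would be replaced by per-event MPNN outputs, and the expected counts by $\epsilon_{X}\sigma_{X}L$ for each process.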

Given the distributions of $\ln Q$ in the two scenarios and the $\ln Q^{*}$ calculated from the observed experimental data $D^{*}$ , as shown in FIG. 3, we can evaluate the $p$ -values of rejecting the CP-even scenario and the CP-odd scenario by the integrals

[TABLE]

respectively.
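With pseudo-experiments in hand, the two $p$-values can be estimated as tail fractions of the sampled $\ln Q$ values. The integration directions used below are an assumption inferred from $\ln Q=\ln(L_{A}/L_{h})$: CP-even samples populate low $\ln Q$, so large observed values disfavour the CP-even scenario, and vice versa. The function name is hypothetical.

```python
def p_values(ln_q_h, ln_q_a, ln_q_obs):
    """Empirical p-values for rejecting the CP-even and CP-odd scenarios.

    ln_q_h, ln_q_a: ln Q values from pseudo-experiments in each scenario.
    ln_q_obs: ln Q* computed from the observed data.
    """
    # Upper tail of f_h: how often CP-even gives ln Q at least this large.
    p_h = sum(1 for x in ln_q_h if x >= ln_q_obs) / len(ln_q_h)
    # Lower tail of f_A: how often CP-odd gives ln Q at least this small.
    p_a = sum(1 for x in ln_q_a if x <= ln_q_obs) / len(ln_q_a)
    return p_h, p_a

p_h, p_a = p_values([-3.0, -2.0, -1.0, 0.0], [0.0, 1.0, 2.0, 3.0], 0.5)
```

In this toy case none of the CP-even pseudo-experiments reaches the observed value, while one in four CP-odd pseudo-experiments lies at or below it.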

III Results

Simulated events of the $t\bar{t}h$, $t\bar{t}A$ and $t\bar{t}b\bar{b}$ processes are generated separately using MadGraph5 Alwall et al. (2014) at the 13 TeV LHC. Showering and hadronization are performed by Pythia8 Sjöstrand et al. (2015). Detector simulation is done by Delphes de Favereau et al. (2014) with the ATLAS configuration. CheckMATE2 Drees et al. (2015) is used to perform event selection. Leptons are required to have $p_{T}>20$ GeV and $|\eta|<2.5$, and jets are required to have $p_{T}>25$ GeV and $|\eta|<2.5$. $b$-tagging is performed with 60% nominal efficiency. We focus on the semi-leptonic channel, requiring events to have exactly one lepton, four $b$-jets and at least two light jets in the final state.

After event selection, the detection cross sections $\epsilon\sigma$ for the $t\bar{t}h$, $t\bar{t}A$ and $t\bar{t}b\bar{b}$ processes are 3.78, 1.82 and 27.5 fb, respectively. We collect 900,000 examples with balanced numbers of $t\bar{t}h$, $t\bar{t}A$ and $t\bar{t}b\bar{b}$ events as the training set for optimizing the parameters of our MPNN, while another 300,000 examples are collected as the validation set for performance evaluation.
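As a worked example, inserting these detection cross sections into $\lambda=\epsilon_{X}\sigma_{X}L$ for an integrated luminosity of $L=300$ fb$^{-1}$ gives the expected numbers of selected events per pseudo-experiment:

$$\lambda_{t\bar{t}h}=3.78\times300=1134,\qquad\lambda_{t\bar{t}A}=1.82\times300=546,\qquad\lambda_{t\bar{t}b\bar{b}}=27.5\times300=8250.$$

In either scenario the selected sample is therefore dominated by the $t\bar{t}b\bar{b}$ background.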

We show in FIG. 4 the distributions of the output of our trained MPNN evaluated on the validation set. It is clear that the MPNN has successfully learned important discriminative event features for different processes. The background events are prone to have higher $p(b|e)$ , while the $t\bar{t}h$ , $t\bar{t}A$ events get higher $p(h|e)$ and $p(A|e)$ , respectively.

We perform millions of pseudo-experiments for each of the two scenarios. The pseudo-experiment events are taken from the validation set. In FIG. 5, we show the probability distributions of the log-likelihood ratio $\ln Q$ from pseudo-experiments (left panel) and the receiver operating characteristic (ROC) curves of the hypothesis test (right panel) for different values of the integrated luminosity. As the luminosity increases, the overlap between the two distributions shrinks significantly and the ROC curves move closer to the corner. When the integrated luminosity reaches 300 fb$^{-1}$, the two distributions are almost completely separated. This indicates that the CP-even and CP-odd components can be well distinguished at the LHC with at most 300 fb$^{-1}$ of experimental data.
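A ROC curve of this kind can be obtained from the two sets of pseudo-experiment $\ln Q$ values by sweeping a decision threshold. The sketch below is illustrative only: it assumes the test decides "CP-odd" when $\ln Q$ is at or above the threshold, and the function names and axis convention (false positive rate versus true positive rate) are hypothetical rather than taken from FIG. 5.

```python
def roc_curve(ln_q_h, ln_q_a):
    """ROC points (fpr, tpr), sorted by fpr, for the rule ln Q >= threshold."""
    thresholds = sorted(set(ln_q_h) | set(ln_q_a))
    points = [(0.0, 0.0)]  # threshold above every value: nothing selected
    for thr in thresholds:
        fpr = sum(x >= thr for x in ln_q_h) / len(ln_q_h)  # CP-even mislabelled
        tpr = sum(x >= thr for x in ln_q_a) / len(ln_q_a)  # CP-odd identified
        points.append((fpr, tpr))
    return sorted(points)

def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points sorted by fpr."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

separated = area_under_curve(roc_curve([-3.0, -2.0, -1.0], [1.0, 2.0, 3.0]))
overlapping = area_under_curve(roc_curve([0.0, 1.0], [0.0, 1.0]))
```

Fully separated distributions give an area of 1, while identical distributions give 0.5, which is the sense in which the curves approach the corner as the luminosity grows.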

IV Acknowledgements

We thank Dr. Andrew Fowlie for helpful discussions. This work was supported by the National Natural Science Foundation of China (NNSFC) under grant Nos. 11705093, 11675242, 11821505 and 11851303, by Peng-Huan-Wu Theoretical Physics Innovation Center (11747601), by the CAS Center for Excellence in Particle Physics (CCEPP), by the CAS Key Research Program of Frontier Sciences and by a Key R&D Program of Ministry of Science and Technology under number 2017YFA0402200-04.

Bibliography

1. Aad et al. (2012): G. Aad et al. (ATLAS), Phys. Lett. B 716, 1 (2012), arXiv:1207.7214 [hep-ex].
2. Chatrchyan et al. (2012): S. Chatrchyan et al. (CMS), Phys. Lett. B 716, 30 (2012), arXiv:1207.7235 [hep-ex].
3. Aaboud et al. (2018): M. Aaboud et al. (ATLAS), Phys. Lett. B 784, 173 (2018), arXiv:1806.00425 [hep-ex].
4. Sirunyan et al. (2018): A. M. Sirunyan et al. (CMS), Phys. Rev. Lett. 120, 231801 (2018), arXiv:1804.02610 [hep-ex].
5. Aguilar-Saavedra (2009): J. A. Aguilar-Saavedra, Nucl. Phys. B 821, 215 (2009), arXiv:0904.2387 [hep-ph].
6. Cirigliano et al. (2016): V. Cirigliano, W. Dekens, J. de Vries, and E. Mereghetti, Phys. Rev. D 94, 016002 (2016), arXiv:1603.03049 [hep-ph].
7. Kobakhidze et al. (2017): A. Kobakhidze, N. Liu, L. Wu, and J. Yue, Phys. Rev. D 95, 015016 (2017), arXiv:1610.06676 [hep-ph].
8. Gunion and He (1996): J. F. Gunion and X.-G. He, Phys. Rev. Lett. 76, 4468 (1996), arXiv:hep-ph/9602226 [hep-ph].