Search for pair production of vector-like quarks in the fully hadronic final state
CMS Collaboration

TL;DR
This paper reports on two searches for vector-like T and B quark pair production in fully hadronic final states at 13 TeV, setting new mass limits and improving upon previous results using advanced analysis techniques.
Contribution
It presents two complementary searches for vector-like quarks in fully hadronic final states, a traditional cut-based analysis and a novel multiclassification analysis, improving sensitivity and setting new mass exclusion limits.
Findings
Mass limits for T and B quarks range from 740 to 1370 GeV.
The analyses achieve improved sensitivity over previous searches.
The use of the boosted event shape tagger enhances jet classification accuracy.
Abstract
The results of two searches for pair production of vector-like T or B quarks in fully hadronic final states are presented, using data from the CMS experiment at a center-of-mass energy of 13 TeV. The data were collected at the LHC during 2016 and correspond to an integrated luminosity of 35.9 fb$^{-1}$. A cut-based analysis specifically targets the bW decay mode of the T quark and allows for the reconstruction of the T quark candidates. In a second analysis, a multiclassification algorithm, the "boosted event shape tagger," is deployed to label candidate jets as originating from top quarks and W, Z, and Higgs bosons. Candidate events are categorized according to the multiplicities of identified jets, and the scalar sum of all observed jet momenta is used to discriminate signal events from the quantum chromodynamics multijet background. Both analyses probe all possible branching fraction…
Abstract
The results of two searches for pair production of vector-like T or B quarks in fully hadronic final states are presented, using data from the CMS experiment at a center-of-mass energy of 13 TeV. The data were collected at the LHC during 2016 and correspond to an integrated luminosity of 35.9 fb$^{-1}$. A cut-based analysis specifically targets the bW decay mode of the T quark and allows for the reconstruction of the T quark candidates. In a second analysis, a multiclassification algorithm, the “boosted event shape tagger,” is deployed to label candidate jets as originating from top quarks and W, Z, and Higgs bosons. Candidate events are categorized according to the multiplicities of identified jets, and the scalar sum of all observed jet momenta is used to discriminate signal events from the quantum chromodynamics multijet background. Both analyses probe all possible branching fraction combinations of the T and B quarks and set limits at 95% confidence level on their masses, ranging from 740 to 1370 GeV. These results represent a significant improvement relative to existing searches in the fully hadronic final state.
0.1 Introduction
With the discovery of a light Higgs boson (H) by the ATLAS and CMS Collaborations in 2012 [1, 2, 3], the standard model (SM) is complete as a low-energy effective theory describing all known fundamental particles and their interactions. However, several questions remain, for example, why the mass of the observed Higgs boson is 125 GeV, whereas quantum loop corrections would be expected to drive the mass up towards the Planck scale. Many models of new physics beyond the SM predict additional particles that can affect the quantum corrections to the Higgs boson mass and resolve this so-called “hierarchy problem”. Proposed new states include supersymmetric partners of SM particles and fourth-generation quarks.
Chiral fourth-generation quarks, t′ or b′, with identical properties to the SM third-generation t and b quarks, but with larger masses, are effectively excluded because of their impact on the Higgs boson production cross section. However, many models of new physics, such as those predicting a composite Higgs boson [4, 5, 6, 7, 8], or “little-Higgs” models [9, 10], include fourth-generation particles of a new type, called vector-like quarks (VLQs), labeled T and B, having electric charges of $+2e/3$ and $-e/3$, respectively. These VLQs do not obtain their mass via the Higgs boson Yukawa coupling, and will not affect the values of the Higgs boson production cross section or decay width. Therefore, these are viable search candidates for the LHC experiments, and are predicted to have masses at the TeV scale [11], allowing the hierarchy problem to be resolved. We do not search for the related X and Y particles.
The VLQs are called “vector-like” because their left-handed and right-handed chiralities transform under the same symmetry group of the SM electroweak gauge bosons. This leads to several decay modes of the VLQs, through charged- and neutral-current interactions. Although decays to light first- and second-generation quarks are possible, the dominant decay modes of the VLQs are to third-generation SM quarks [12]. The possible decay modes of the VLQs to the third-generation quarks are as follows (charge-conjugate modes implied):
\begin{equation}
\mathrm{T} \to \mathrm{bW},\ \mathrm{tZ},\ \mathrm{tH}; \qquad \mathrm{B} \to \mathrm{tW},\ \mathrm{bZ},\ \mathrm{bH}. \tag{1}
\end{equation}
Specific model assumptions can influence the proportions of these VLQ decay modes. Both single and pair production of VLQs are possible, with single production dominating at larger VLQ masses (around 2 TeV), while single and pair production rates are comparable for VLQ masses of about 1 TeV. This analysis considers only the pair production of VLQs.
Both the ATLAS and CMS Collaborations have recently presented searches for pair production of VLQs. The CMS Collaboration has searched for T and B quarks in the dilepton final state, targeting VLQ decays to Z bosons [13], and excluding T (B) quark masses up to 1280 (1130) GeV. An analysis from CMS including single-lepton, dilepton, and multilepton final states [14] probes all decay modes of the VLQs, and excludes T quark masses in the range 1140–1300 GeV and B quark masses in 910–1240 GeV, depending on the combination of the VLQ branching fractions. Finally, a CMS result optimized for the bWbW channel, using single-lepton final states, excludes T quark masses up to 1295 GeV [15]. The ATLAS Collaboration has recently presented a search for VLQ pair production in the fully hadronic channel, with sensitivity to all possible decay modes of the VLQs [16]. This analysis most strongly excludes T and B quarks when they decay to Higgs bosons, with mass exclusion limits of 1010 GeV. The ATLAS Collaboration has also performed a combination of searches utilizing various final states, resulting in mass exclusion limits of up to 1370 GeV [17].
In this paper, we describe two independent analyses targeting pair production of vector-like quarks in fully hadronic final states. We first present an analysis that employs a traditional strategy, utilizing W boson tagging and b quark tagging algorithms. This analysis specifically targets the bW decay mode of the T quark, but is used to evaluate sensitivity to all possible decays of the T or B quark, and is referred to as the “cut-based analysis”. The second analysis uses a novel machine learning technique to identify and classify different varieties of Lorentz-boosted particles that originate from VLQ decays. This strategy allows the analysis to target all the decay modes of the T or B quark. We refer to this analysis as the “NN (neural network) analysis”.
The cut-based analysis uses dedicated algorithms to efficiently identify jets consistent with W bosons and the hadronization of b quarks. These algorithms allow the reconstruction of each T quark present in the event, providing a mechanism to further reduce the contribution of background processes. At least four jets are required to be present, and events are classified according to the number of jets that are identified as being consistent with a W boson, to obtain signal regions of varying signal purities. The $H_\mathrm{T}$ distribution, defined as the scalar sum of jet transverse momenta ($p_\mathrm{T}$), is used for signal discrimination in each category. The NN analysis uses a neural network algorithm with a multiple-class output to identify jets as consistent with one of six distinct decay topologies from highly boosted particles: top quark, W boson, Z boson, Higgs boson, b quark, and light u/d/s/c quark or gluon (denoted “light jets”). Events with exactly four jets are considered for the analysis, which is the expected final state for fully hadronic decays of VLQ pairs, as seen in Eq. 1. The multiplicities of jets falling into each of the six categories are used to define 126 independent signal regions, in which the value of $H_\mathrm{T}$ is used to discriminate signal from the expected background processes.
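The count of 126 signal regions matches the number of ways four indistinguishable jets can be distributed among six categories. The following minimal sketch reproduces that count and computes the scalar sum of jet transverse momenta. The function names, category labels, and jet values are hypothetical and are not taken from the analysis software.

```python
from itertools import combinations_with_replacement

# Six jet categories named in the text; the string labels are illustrative.
CATEGORIES = ("t", "W", "Z", "H", "b", "light")


def scalar_pt_sum(jet_pts):
    """Scalar sum of jet transverse momenta (same units as the inputs)."""
    return sum(jet_pts)


def multiplicity_regions(n_jets=4, categories=CATEGORIES):
    """Enumerate every unordered assignment of n_jets jets to the categories.

    Each region is returned as a tuple of per-category multiplicities.
    """
    regions = []
    for combo in combinations_with_replacement(categories, n_jets):
        regions.append(tuple(combo.count(c) for c in categories))
    return regions


regions = multiplicity_regions()
print(len(regions))  # 126
print(scalar_pt_sum([620.0, 480.0, 450.0, 410.0]))  # 1960.0
```

Each tuple sums to four, so an event with exactly four classified jets falls into exactly one region.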
The main background contribution in these fully hadronic final states comprises multijet events from quantum chromodynamics (QCD) processes. Techniques based on control samples in data are used to predict the expected QCD multijet background yield and shape. In the cut-based analysis, control regions are used to measure QCD multijet background yields and shapes, which are then extrapolated to the signal regions. In the NN analysis, misidentification rates for each of the six categories of jets considered in the multiclassification algorithm are used to predict the level of contribution of multijet events in the signal regions. Each method is validated using samples of observed and simulated events.
The paper has the following structure. Section 0.2 provides a description of the CMS detector and trigger system. The event reconstruction, including jet reconstruction, jet substructure, and the multiclassification algorithm used in the NN analysis, is described in Section 0.3. The data sets and simulated samples used are presented in Section 0.4. Information about the definition of the signal and control regions is included in Section 0.5. The methods employed to predict the QCD multijet background from data for each analysis are explained in Section 0.6, and details of the systematic uncertainties affecting the analyses are itemized in Section 0.7. Signal region yields and distributions are given in Section 0.8, and the statistical analysis used to extract the results is described in Section 0.9. Finally, the results of the two analyses are presented in Section 0.10, and a summary is given in the last section.
0.2 The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity ($\eta$) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
Events of interest are selected using a two-tiered trigger system [18]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 μs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [19].
0.3 Event reconstruction
To reconstruct and identify each individual particle in an event, a “particle-flow algorithm” [20] that uses an optimized combination of information from the various elements of the CMS detector is employed. The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy.
The reconstructed vertex with the largest value of summed physics-object $p_\mathrm{T}^2$ is taken to be the primary proton-proton (pp) interaction vertex. Here the physics objects are the jets, clustered using the jet finding algorithm [21, 22] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector $p_\mathrm{T}$ sum of those jets.
The output of the particle-flow algorithm provides a list of particles that are used as inputs to the jet finding algorithm. Charged hadrons that are not associated with the primary interaction vertex are removed before jet finding to mitigate the effects of additional (“pileup”) interactions occurring in the same or neighboring bunch crossings as the interaction of interest. The anti-$k_\mathrm{T}$ clustering algorithm [21] is used, as implemented in the FastJet software package [22], to produce two collections of jets, the first obtained with a distance parameter of $R = 0.4$ (AK4 jets), and the second obtained with $R = 0.8$ (AK8 jets), where $R$ is the radius of the jet in the $\eta$-$\phi$ plane (where $\phi$ is the azimuthal angle). The AK8 jets are used to identify the hadronic decays of massive SM particles, including top quarks, and W, Z, and Higgs bosons, while the AK4 jets are used to identify other hadronic activity in the event. The cut-based analysis uses both AK4 and AK8 jets, while the NN analysis uses only the AK8 jets.
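To make the role of the distance parameter concrete, the sketch below evaluates the standard anti-kT distance measures of Ref. [21] for a pair of particles. It is only an illustration of the metric, not of the FastJet implementation. The function names are hypothetical, and the default `R=0.4` is the conventional AK4 value, assumed here.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with dphi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def antikt_dij(pt_i, eta_i, phi_i, pt_j, eta_j, phi_j, R=0.4):
    """Anti-kT pairwise distance: min(pt_i^-2, pt_j^-2) * dR_ij^2 / R^2."""
    dr = delta_r(eta_i, phi_i, eta_j, phi_j)
    return min(pt_i ** -2, pt_j ** -2) * dr ** 2 / R ** 2


def antikt_dib(pt_i):
    """Anti-kT beam distance: pt_i^-2."""
    return pt_i ** -2


# A soft particle 0.2 away from a hard one is merged before the hard particle
# is declared a jet, because d_ij is smaller than d_iB.
hard = (100.0, 0.0, 0.0)
soft = (1.0, 0.2, 0.0)
print(antikt_dij(*hard, *soft) < antikt_dib(hard[0]))  # True
```

Because the metric is weighted by the inverse square of the larger transverse momentum, soft particles cluster around hard ones first, which gives jets of roughly circular shape with radius R.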
The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be within 5 to 10% of the true momentum over the whole $p_\mathrm{T}$ spectrum and detector acceptance. Pileup interactions can contribute additional tracks and calorimetric energy deposits, increasing the apparent jet momentum. To mitigate this effect, tracks identified to be originating from pileup vertices are discarded and an offset correction is applied to correct for remaining contributions. Jet energy corrections are derived from simulation studies so that the average measured response of jets becomes identical to that of particle-level jets. In situ measurements of the momentum balance in dijet, photon+jet, Z+jet, and multijet events are used to determine any residual differences between the observed and simulated jet energy scale, and to derive appropriate corrections [23]. Additional selection criteria are applied to each jet to remove jets potentially dominated by instrumental effects or reconstruction failures. The jet energy resolution is approximately 15% at 10 GeV, 8% at 100 GeV, and 4% at 1 TeV.
0.3.1 Jet substructure
To identify the hadronic decays of highly Lorentz-boosted objects, including top quarks and W, Z, and Higgs bosons, jet substructure information provides powerful discrimination from massive jets originating from QCD multijet production.
The mass of the jet itself can discriminate QCD jets from boosted heavy objects. A grooming algorithm is applied to jet constituents to better estimate the mass of the originating particle of the jet. In the algorithm used, the constituents of the AK8 jets are reclustered using the Cambridge–Aachen algorithm [24, 25]. The “modified mass drop tagger” algorithm [26], also known as the “soft-drop” (SD) algorithm, with angular exponent $\beta = 0$, soft cutoff threshold $z_\mathrm{cut} < 0.1$, and characteristic radius $R_0 = 0.8$ [27], is applied to remove soft, wide-angle radiation from the jet. The SD mass ($m_\mathrm{SD}$) is used to determine the consistency of a jet with a given boosted heavy object.
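The grooming step can be pictured through the soft-drop condition applied at each declustering step of the reclustered jet. The sketch below checks that condition for a single splitting. The function name is hypothetical, and the default parameter values (`z_cut=0.1`, `beta=0.0`, `r0=0.8`) are commonly used choices that are assumed here for illustration.

```python
def passes_soft_drop(pt1, pt2, delta_r12, z_cut=0.1, beta=0.0, r0=0.8):
    """Soft-drop condition for one splitting into two subjets.

    The splitting is kept if
        min(pt1, pt2) / (pt1 + pt2) > z_cut * (delta_r12 / r0) ** beta.
    Otherwise the softer branch is discarded and the procedure continues
    on the harder branch.
    """
    momentum_fraction = min(pt1, pt2) / (pt1 + pt2)
    return momentum_fraction > z_cut * (delta_r12 / r0) ** beta


# A balanced two-prong splitting survives; a very soft emission is removed.
print(passes_soft_drop(300.0, 200.0, 0.4))  # True
print(passes_soft_drop(495.0, 5.0, 0.4))    # False
```

With `beta=0.0` the angular factor is always 1, so the condition reduces to a pure momentum-sharing cut, which is the modified mass drop tagger limit mentioned above.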
In addition to $m_\mathrm{SD}$, information about the distribution of particles within the jet can be used for further discrimination. A quantity called “N-subjettiness” [28, 29] is used to determine the consistency of a jet with $N$ or fewer subjets. The N-subjettiness values are defined as
\begin{equation}
\tau_N = \frac{1}{d_0} \sum_i p_{\mathrm{T},i} \min\left[\Delta R_{1,i}, \Delta R_{2,i}, \ldots, \Delta R_{N,i}\right],
\end{equation}
where the index $i$ refers to each jet constituent, and $\Delta R_{k,i}$ is the angular distance between jet constituent $i$ and candidate subjet axis $k$. The quantity $d_0$ is a normalization constant. To identify boosted top quarks, the ratio $\tau_3/\tau_2$ is used to target the expected three-subjet signature, while for W, Z, and Higgs bosons, the ratio $\tau_2/\tau_1$ is used because of the expected two-subjet decay topology.
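A minimal sketch of the N-subjettiness calculation follows, assuming that the candidate subjet axes are already known and that the normalization constant is the conventional choice of the summed constituent transverse momentum times a characteristic radius. The axis-finding step is omitted, and all names and values are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with dphi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def n_subjettiness(constituents, axes, r0=0.8):
    """N-subjettiness for N = len(axes).

    constituents: list of (pt, eta, phi) tuples
    axes:         list of (eta, phi) candidate subjet axes
    r0:           characteristic radius in the normalization (assumed value)
    """
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    weighted_sum = 0.0
    for pt, eta, phi in constituents:
        nearest = min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        weighted_sum += pt * nearest
    return weighted_sum / d0


# Toy two-prong jet: two equal-pt constituents separated by 0.4 in eta.
jet = [(100.0, 0.2, 0.0), (100.0, -0.2, 0.0)]
tau1 = n_subjettiness(jet, [(0.0, 0.0)])
tau2 = n_subjettiness(jet, [(0.2, 0.0), (-0.2, 0.0)])
print(tau1, tau2)  # 0.25 0.0
```

A small ratio of the two-axis value to the one-axis value, as in this toy jet, indicates that the jet is better described by two subjets than by one.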
Jets originating from bottom quarks, which hadronize and subsequently decay, are selected with an algorithm to identify and reconstruct displaced vertices, along with their associated tracking information. Known as the combined secondary vertex algorithm (CSVv2) [30], it provides several working points of varying efficiencies and misidentification rates. In the cut-based analysis, the CSVv2 algorithm is applied to AK4 jets using a working point corresponding to a misidentification probability in simulated $\mathrm{t\bar{t}}$ events of 0.01 for u/d/s/g jets and an efficiency for identifying genuine b jets of approximately 0.63. In the NN analysis, the CSVv2 algorithm is applied to the subjets of the AK8 jets to increase the categorization efficiency for decays of top quarks and Z and Higgs bosons, which can have one or more displaced vertices within the jet. A CSVv2 working point is not explicitly used in the NN analysis; instead, the output value of the CSVv2 discriminator for each subjet is used as an input to the multiclassification algorithm to categorize jets.
In the cut-based analysis, a working point for identifying merged decay products of a highly boosted W boson in a single jet (W tagging) is chosen. To be considered for W tagging, an AK8 jet must pass a minimum transverse momentum requirement. To be W tagged, the jet must additionally fall within an SD mass window and satisfy an upper bound on the N-subjettiness ratio used for two-subjet decays. This working point corresponds to an efficiency of about 0.50 to identify genuine W jets and a misidentification probability of about 0.03 [31]. Because the tagging response is observed to depend on the W jet momentum, an additional correction is applied to ensure that the W tagged jet peak is stable and that the W tagging efficiency remains roughly constant as a function of jet momentum.
0.3.2 Boosted event shape tagger (BEST) algorithm
The NN analysis does not focus on a single VLQ decay mode and thus the expected signatures can contain various combinations of top and bottom quarks along with W, Z, and Higgs bosons. Using standard cut-based working points for each type of particle leads to complications with overlaps in selection criteria when considering many different final states simultaneously. For this reason, a new algorithm is used that simultaneously attempts to identify six categories of jets: t, W, Z, H, b, and light jets. The algorithm is called the boosted event shape tagger (BEST) algorithm, as first detailed in Ref. [32], and uses hypothesized reference frames to determine the consistency of a jet with the expected topology from top quark, W, Z, and H decays, b quarks, and light jets. The algorithm uses a neural network to classify jets according to one of those six possibilities. The NN analysis presented here is the first CMS result to use the BEST algorithm.
The BEST algorithm relies on the fact that jets from very high energy (“highly boosted”) heavy-particle decays will have a distinct topology in the rest frame of the decaying object. For example, the decay of a highly boosted t quark produces three collimated particles in the laboratory frame, but in the rest frame of the t quark, the three distinct jet directions lie in a plane. By Lorentz-boosting the particles or constituents in a jet back to the rest frame, it can be seen whether the distribution of particles is consistent with that expected from a top quark decay. This boost transformation is applied four different times to obtain four sets of jet constituents. The boost transformation is performed assuming the jet originates from a top quark, W, Z, or H, after forming the boost vector by using the jet four-vector with the mass altered to be that of the particle under consideration, while keeping the jet momentum constant.
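The boost described above can be sketched as follows: the jet three-momentum is kept fixed, the energy is recomputed from a hypothesized mass, and every constituent is boosted with the resulting velocity. This is a minimal illustration, not the BEST implementation. The function name, the data layout, and the approximate mass values in the dictionary are assumptions.

```python
import math

# Approximate particle masses in GeV, used only as illustrative hypotheses.
HYPOTHESIS_MASSES_GEV = {"t": 172.5, "W": 80.4, "Z": 91.2, "H": 125.0}


def boost_to_hypothesis_frame(constituents, jet_p3, mass_hypothesis):
    """Boost constituents into the rest frame implied by a mass hypothesis.

    constituents:    list of (E, px, py, pz) four-vectors in the lab frame
    jet_p3:          (px, py, pz) of the jet, kept constant
    mass_hypothesis: mass assigned to the jet (must be positive)
    The jet momentum is assumed to be nonzero.
    """
    px, py, pz = jet_p3
    p_squared = px * px + py * py + pz * pz
    energy = math.sqrt(p_squared + mass_hypothesis ** 2)
    bx, by, bz = px / energy, py / energy, pz / energy  # boost velocity
    b_squared = bx * bx + by * by + bz * bz
    gamma = 1.0 / math.sqrt(1.0 - b_squared)

    boosted = []
    for e, cx, cy, cz in constituents:
        b_dot_p = bx * cx + by * cy + bz * cz
        coeff = (gamma - 1.0) * b_dot_p / b_squared - gamma * e
        boosted.append((
            gamma * (e - b_dot_p),
            cx + coeff * bx,
            cy + coeff * by,
            cz + coeff * bz,
        ))
    return boosted


# Two massless constituents forming a jet of invariant mass 600 GeV.
toy = [(500.0, 300.0, 0.0, 400.0), (500.0, -300.0, 0.0, 400.0)]
for vec in boost_to_hypothesis_frame(toy, (0.0, 0.0, 800.0), 600.0):
    print([round(x, 6) for x in vec])
```

When the hypothesized mass matches the true jet mass, as in this toy case, the boosted constituents are back to back and their momenta sum to zero. With a wrong hypothesis a residual net momentum remains, which is what makes the event-shape variables in each frame discriminating.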
The sets of jet constituents resulting from each boost transformation are used to compute kinematic quantities, including Fox–Wolfram moments [33], aplanarity, sphericity, and isotropy, based on the eigenvalues of the sphericity tensor [34], and the jet thrust [35]. In each boosted reference frame, jet constituents are reclustered to obtain a set of objects relative to the transformed jet axis. These objects are used to compute the longitudinal asymmetry, defined as the ratio of the longitudinal-component sum of the momenta to the $p_\mathrm{T}$ sum of this set of objects. This ratio gives another way to compute the isotropy of constituents that is expected for a jet consistent with one of the hypothesized particles. Additionally, the jet SD mass, jet pseudorapidity, charge, the two N-subjettiness ratios, and subjet CSVv2 scores from the original jet reference frame are used. In total, 59 kinematic quantities from the original and transformed sets of constituents are used as inputs to a deep neural network to discriminate between the different jet species. These kinematic quantities are validated by examining distributions in data and simulated events, where good agreement in shape is observed.
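As one example of the event-shape inputs, the sketch below computes Fox–Wolfram moments from a set of three-momenta, such as the constituents in one of the boosted frames. Normalization conventions vary. Here the moments are normalized to the squared sum of momentum magnitudes, which is an assumption, as are the function names.

```python
import math


def legendre(order, x):
    """Legendre polynomial P_order(x) via the three-term recurrence."""
    if order == 0:
        return 1.0
    previous, current = 1.0, x
    for n in range(1, order):
        previous, current = current, ((2 * n + 1) * x * current - n * previous) / (n + 1)
    return current


def fox_wolfram_moment(momenta, order):
    """Fox-Wolfram moment H_order for a list of nonzero (px, py, pz) momenta.

    H_l = sum_{i,j} |p_i| |p_j| P_l(cos theta_ij) / (sum_i |p_i|)^2
    """
    norms = [math.sqrt(px * px + py * py + pz * pz) for px, py, pz in momenta]
    total = sum(norms)
    moment = 0.0
    for p_i, n_i in zip(momenta, norms):
        for p_j, n_j in zip(momenta, norms):
            dot = p_i[0] * p_j[0] + p_i[1] * p_j[1] + p_i[2] * p_j[2]
            cos_ij = max(-1.0, min(1.0, dot / (n_i * n_j)))
            moment += n_i * n_j * legendre(order, cos_ij)
    return moment / total ** 2


# A back-to-back pair has vanishing odd moments and maximal even moments.
pair = [(0.0, 0.0, 10.0), (0.0, 0.0, -10.0)]
print([fox_wolfram_moment(pair, l) for l in range(3)])  # [1.0, 0.0, 1.0]
```

Ratios such as the second moment divided by the zeroth are often used in practice, since they are insensitive to the overall normalization choice.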
The BEST neural network is trained using samples of simulated AK8 jets that originate from the decay of heavy resonances and that correspond to the final state objects (t, W, Z, H, b, or light jets). The jets in the training sample are matched to the object of interest using the generator-level information. Samples with heavy resonance masses from 1 to 4 TeV are used to populate the jet $p_\mathrm{T}$ range from 0.4 to 2 TeV. The neural network is trained using the Python-based scikit-learn package, using the MLPClassifier module [36]. The network architecture consists of 3 hidden layers with 40 nodes in each layer using a rectified-linear activation function. There are six output nodes, corresponding to the six particle species of interest. A sample of 500 000 jets is used to train the network, split evenly between the six training samples. The six outputs from the network represent probabilities for the jet to originate from the corresponding particle. The classification of an AK8 jet is chosen according to the output node with the highest probability. Several validation studies have been performed in different samples of data events enriched in different types of processes: a muon+jets sample containing boosted top quarks and boosted W bosons, a sample containing events from QCD processes enriched in gluon-initiated jets, and a sample of photon+jets events enriched in quark-initiated jets. In each of these samples, we find good agreement in the shape and rate of the BEST neural network inputs, as well as the output probabilities [37].
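The final classification step, taking the output node with the highest probability and then counting jets per category, can be sketched as below. The label order, function names, and probability values are hypothetical. The network itself is not reproduced here.

```python
from collections import Counter

# Assumed ordering of the six output nodes; the real ordering is not given.
LABELS = ("t", "W", "Z", "H", "b", "light")


def classify_jet(probabilities, labels=LABELS):
    """Return the label of the output node with the highest probability.

    Ties are resolved in favor of the first label in the ordering.
    """
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best_index = max(range(len(labels)), key=lambda k: probabilities[k])
    return labels[best_index]


def event_category(jet_probabilities, labels=LABELS):
    """Per-category jet multiplicities for one event, in label order."""
    counts = Counter(classify_jet(p, labels) for p in jet_probabilities)
    return tuple(counts[label] for label in labels)


event = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # most t-like
    [0.10, 0.60, 0.15, 0.05, 0.05, 0.05],  # most W-like
    [0.05, 0.05, 0.10, 0.65, 0.10, 0.05],  # most H-like
    [0.05, 0.05, 0.05, 0.10, 0.70, 0.05],  # most b-like
]
print(event_category(event))  # (1, 1, 0, 1, 1, 0)
```

The resulting multiplicity tuple identifies which jet-multiplicity category the event belongs to.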
0.4 Data set and simulated samples
Both the cut-based and NN analyses use the data set collected by the CMS experiment at the CERN LHC in 2016, corresponding to an integrated luminosity of 35.9 fb$^{-1}$ of pp collisions. Events in the cut-based analysis are selected online using a trigger algorithm requiring an $H_\mathrm{T}$ value of at least 800 GeV, or 700 GeV if a jet with mass above 50 GeV is present. Events are also selected by two further triggers, which require a single jet with either $p_\mathrm{T} > 450$ GeV, or $p_\mathrm{T} > 360$ GeV together with a mass above 30 GeV. The above trigger selection is measured to be fully efficient for the signal regions, with corrections applied for percent-level inefficiencies in control regions. Events in the NN analysis are selected online using the above trigger algorithms in combination with all other algorithms requiring multijet topologies. The trigger requirements for the NN analysis are fully efficient in the signal and control regions, because of the higher jet momenta considered.
Methods utilizing data are employed to estimate the dominant background from QCD multijet production; however, samples of simulated events are used to validate the background estimation techniques described in Section 0.6. These samples of QCD multijet events are generated at leading order with PYTHIA [38, 39].
Simulated events are used to model the subdominant background contributions. The largest of these in both analyses comes from the SM pair production of top quarks, generated at next-to-leading order (NLO) with POWHEG v2 [40, 41] and showered with PYTHIA 8.212, using the event tune CUETP8M2T4 [42]. The production of a W or Z boson in association with additional jets, where the W/Z boson decays to quarks, is generated at leading order (LO) with MadGraph5_aMC@NLO 2.2.2 [43, 44] and also showered with PYTHIA 8.212. Diboson events (WW, WZ, ZZ) are generated at LO with PYTHIA, and rare top quark production processes ($\mathrm{t\bar{t}W}$, $\mathrm{t\bar{t}Z}$, $\mathrm{t\bar{t}t\bar{t}}$) are generated at NLO with MadGraph5_aMC@NLO and showered with PYTHIA. Background contributions from Higgs boson production in the dominant gluon fusion mode with decays to $\mathrm{b\bar{b}}$ and W$^+$W$^-$ are included via events generated with MadGraph5_aMC@NLO plus PYTHIA and POWHEG v2 plus PYTHIA, respectively. Backgrounds other than $\mathrm{t\bar{t}}$ that use PYTHIA use the CUETP8M1 event tune [45]. The cut-based analysis considers only the $\mathrm{t\bar{t}}$ and W+jets background contributions. Other processes such as Z+jets were measured to contribute only negligibly to the total background expectation, and therefore were not further investigated.
Event samples of pair-produced vector-like T and B quarks, with masses ranging from 0.7 to 1.8 TeV in increments of 100 GeV, are generated at LO using MadGraph5_aMC@NLO [46] + PYTHIA. They are inclusive with respect to the VLQ decay mode, and are generated with equal branching fractions for T/B quark decays to each of the three modes (tH/bH, tZ/bZ, bW/tW). Events are weighted to produce results for different combinations of branching fractions, and are normalized to theoretical cross section expectations calculated at the next-to-next-to-leading order (NNLO), including next-to-leading-logarithmic order soft-gluon resummation, with Top++2.0 [47], as listed in Table 0.4.
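One plausible way to perform the branching fraction reweighting is to weight each event by the ratio of target to generated branching fractions for each of the two VLQ decays. The sketch below illustrates this for T quark pairs, under the assumption that the decay mode of each VLQ is known from generator-level information. The function and variable names are hypothetical.

```python
# Generated branching fractions: equal for the three T quark decay modes.
GENERATED_BF = {"bW": 1.0 / 3.0, "tZ": 1.0 / 3.0, "tH": 1.0 / 3.0}


def branching_fraction_weight(decay_modes, target_bf, generated_bf=GENERATED_BF):
    """Event weight that converts generated branching fractions to target ones.

    decay_modes: decay mode of each VLQ in the event, e.g. ("bW", "tZ")
    target_bf:   mapping from mode to target branching fraction (sums to 1)
    """
    if abs(sum(target_bf.values()) - 1.0) > 1e-9:
        raise ValueError("target branching fractions must sum to 1")
    weight = 1.0
    for mode in decay_modes:
        weight *= target_bf[mode] / generated_bf[mode]
    return weight


# Scenario in which the T quark always decays to bW.
all_bw = {"bW": 1.0, "tZ": 0.0, "tH": 0.0}
print(branching_fraction_weight(("bW", "bW"), all_bw))
print(branching_fraction_weight(("bW", "tZ"), all_bw))
```

Averaged over the generated mixture of decay-mode pairs, the weights have a mean of one, so the overall normalization to the pair production cross section is preserved.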
References
- [1] ATLAS Collaboration, “Observation of a new particle in the search for the standard model Higgs boson with the ATLAS detector at the LHC”, Phys. Lett. B 716 (2012) 1, doi:10.1016/j.physletb.2012.08.020, arXiv:1207.7214.
- [2] CMS Collaboration, “Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC”, Phys. Lett. B 716 (2012) 30, doi:10.1016/j.physletb.2012.08.021, arXiv:1207.7235.
- [3] CMS Collaboration, “Observation of a new boson with mass near 125 GeV in pp collisions at $\sqrt{s}$ = 7 and 8 TeV”, JHEP 06 (2013) 081, doi:10.1007/JHEP06(2013)081, arXiv:1303.4571.
- [4] R. Contino, L. Da Rold, and A. Pomarol, “Light custodians in natural composite Higgs models”, Phys. Rev. D 75 (2007) 055014, doi:10.1103/PhysRevD.75.055014, arXiv:hep-ph/0612048.
- [5] R. Contino, T. Kramer, M. Son, and R. Sundrum, “Warped/composite phenomenology simplified”, JHEP 05 (2007) 074, doi:10.1088/1126-6708/2007/05/074, arXiv:hep-ph/0612180.
- [6] D. B. Kaplan, “Flavor at SSC energies: A new mechanism for dynamically generated fermion masses”, Nucl. Phys. B 365 (1991) 259, doi:10.1016/S0550-3213(05)80021-5.
- [7] M. J. Dugan, H. Georgi, and D. B. Kaplan, “Anatomy of a composite Higgs model”, Nucl. Phys. B 254 (1985) 299, doi:10.1016/0550-3213(85)90221-4.
- [8] S. Blasi and F. Goertz, “Softened goldstone-symmetry breaking”, (2019). arXiv:1903.06146.
