Searching in CMS Open Data for Dimuon Resonances with Substantial Transverse Momentum
Cari Cesarotti, Yotam Soreq, Matthew J. Strassler, Jesse Thaler, Wei Xue

TL;DR
This paper introduces a pT-enhanced dimuon search strategy using CMS Open Data, significantly improving sensitivity to potential new particles produced in heavy particle decays, with implications for future 13 TeV data analyses.
Contribution
It develops a novel pT-based dimuon search method utilizing open data, providing stronger model-independent limits and better sensitivity to new physics signals.
Findings
Limits are up to nine times stronger than inclusive searches.
The method improves sensitivity to particles from heavy decays.
Sensitivity is expected to increase by roughly an order of magnitude with 13 TeV data.
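Table 1: Dimuon event yields in CMS11a after successive cuts, with opposite-sign (OS) and same-sign (SS) events shown separately.

| Selection | Dimuon events | |
|---|---|---|
| CMS11a | 6,241,576 | |
| Baseline acceptance (leading/subleading muon pT and pseudorapidity cuts) | 2,961,681 | |
| Tight muon cuts (tight muon ID, transverse and longitudinal IP cuts) | 2,155,900 | |
| | OS events | SS events |
| Opposite sign vs. same sign | 1,895,756 | 260,144 |
| Z-mass region (60–120 GeV) | 794,623 | 30,105 |
| Kinematic cuts matching Ref. CMS:2011aa | 699,270 | 9,726 |
| Muon isolation | 642,219 | 78 |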
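Table 2: Inputs to the Z cross-section extraction, with central values and relative uncertainties.

| Quantity | Central value | Uncertainty |
|---|---|---|
| Integrated luminosity L | 2.11 fb⁻¹ | 2.2% |
| Kinematic acceptance A | 0.392 | 2.4% |
| Trigger/reconstruction efficiency √ε_trig (i.e. per muon) | 0.924 | 2.4% |
| Isolation efficiency √ε_iso (i.e. per muon) | 0.966 | 1.5% |
| Background | n/a | 1.0% |
| Combined (L·A·ε_trig·ε_iso) | 0.659 fb⁻¹ | 5.3% |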
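Table 3: Event selection for the dimuon resonance search and the resulting event counts.

| Selection | Dimuon events | |
|---|---|---|
| Baseline acceptance and tight muon cuts (pT > 15 GeV and 10 GeV, pseudorapidity and IP cuts chosen to match Table 1) | 2,155,900 | |
| Search region (OS, dimuon mass window, tightened transverse and longitudinal IP cuts) | 561,364 | |
| | Isolated sample (isolation cut) | Prompt sample (tighter transverse IP cut) |
| No dimuon pT cut | 188,924 | 412,002 |
| Dimuon pT > 25 GeV | 46,798 | 91,264 |
| Dimuon pT > 60 GeV | 7,668 | 11,208 |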
| Source | Central value | Uncertainty | Incremental effect on expected limit |
|---|---|---|---|
| Resolution | 1.1% (1.3%) | 0.4% | 10% (7%) (profiled) |
| Line shape modeling | 1 | 5% | |
| Integrated luminosity | 2.11 fb⁻¹ | 2.2% | 0.6% (multiplicative) |
| IP-cut efficiency (prompt sample only) | 0.97 | 1.5% | |
Cari Cesarotti
Department of Physics, Harvard University, Cambridge, MA 02138
Yotam Soreq
Theoretical Physics Department, CERN, Geneva, Switzerland
Department of Physics, Technion, Haifa 32000, Israel
Matthew J. Strassler
Department of Physics, Harvard University, Cambridge, MA 02138
Jesse Thaler
Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
Department of Physics, Harvard University, Cambridge, MA 02138
Wei Xue
Theoretical Physics Department, CERN, Geneva, Switzerland
Abstract
We study dimuon events in 2.11 fb⁻¹ of 7 TeV pp collisions, using CMS Open Data, and search for a narrow dimuon resonance with moderate mass (14–66 GeV) and substantial transverse momentum (pT). Applying dimuon pT cuts of 25 GeV and 60 GeV, we explore two overlapping samples: one with isolated muons, and one with prompt muons without an isolation requirement. Using the latter sample requires information about detector effects and QCD backgrounds, which we obtain directly from the CMS Open Data. We present model-independent limits on the product of cross section, branching fraction, acceptance, and efficiencies. These limits are stronger, relative to a corresponding inclusive search without a pT cut, by factors of as much as nine. Our “pT-enhanced” dimuon search strategy provides improved sensitivity to models in which a new particle is produced mainly in the decay of something heavier, as could occur, for example, in decays of the Higgs boson or of a TeV-scale top partner. An implementation of this method with the current 13 TeV data should improve the sensitivity to such signals further by roughly an order of magnitude.
Preprint numbers: MIT-CTP/5044, CERN-TH-2018-188
I Introduction
The CERN Open Data portal CERNOpenData aims to make data from the Large Hadron Collider (LHC) publicly available as a long-term archive, with the first research-grade data from the CMS experiment released in 2014 CMSOpenData . In order to identify any issues that might interfere with their use by physicists of the future, it is important that open data frameworks be tested today. There are good scientific motivations to make use of this resource CMS:OpenAccess . Open data makes it possible for scientists outside of the LHC collaborations to study Standard Model (SM) processes that are not well modeled by Monte Carlo (MC) generators, such as rare QCD backgrounds. Together with detector-simulated samples, open data also makes it possible to test event analysis strategies that rely on a detailed understanding of detector effects. The value of the CMS Open Data for exploratory studies of QCD has been demonstrated in Refs. Tripathee:2017ybi ; Larkoski:2017bvj ; see Refs. Madrazo:2017qgh ; Andrews:2018nwy ; Andrews:2019faz for machine-learning studies on detector-simulated CMS samples, Refs. Kile:2017ryy ; Kile:2017ccn ; Kile:2017psu for QCD studies on archival ALEPH data, and Ref. CidVidal:2018blh for a diphoton analysis with public LHCb data.
In this paper, we report the first utilization of the CMS Open Data in a search for Beyond the Standard Model (BSM) phenomena. We seek a new particle that decays promptly to a muon pair and is typically produced with substantial transverse momentum (pT). Our analysis is based on 2.11 fb⁻¹ of 7 TeV center-of-mass-energy pp collision events recorded by the CMS experiment during the first part of 2011 and made public through the CERN Open Data portal CMS:DiMuonPrimary . We perform a narrow resonance search in the dimuon mass range 14–66 GeV and study the effect of modest cuts on the dimuon pT, namely pT > 25 GeV and pT > 60 GeV; this approach (which we will refer to as “pT-enhanced”) could be applied to larger pT values as well, or alternatively to a cut on the dimuon boost factor. This type of search strategy was suggested some time ago Strassler:MITBerkeley , as one of several unconventional approaches for finding low-mass dilepton and diphoton resonances Strassler:KITP08 , but to our knowledge it has never been carried out as a public analysis by the LHC collaborations. For this reason, the mass and pT regime we cover is relatively unexplored. Moreover, our model-independent approach is complementary to highly targeted searches.
A low-mass, high-pT particle is well motivated. LHC-accessible hidden sectors of new particles without SM gauge interactions can result in narrow neutral resonances appearing at any mass. These scenarios are often called hidden valleys Strassler:2006im or dark sectors Essig:2013lka ; Alexander:2016aln ; famous examples arise in twin Higgs models Chacko:2005pe and asymmetric dark matter Nussinov:1985xr ; Kaplan:2009ag . Such hidden sectors would have small direct production rates at the LHC and at all previous colliders, but indirect production through the decay of a heavier particle may be much larger than direct production. This heavier particle could be a known SM state (e.g. the Higgs boson or the top quark) or one that is as yet undiscovered (e.g. a top partner that has escaped detection due to its exotic decays, or a heavy Higgs), and its production rate may be much larger at the LHC than at lower-energy colliders. When indirect production via decay is common, a particle from a hidden sector may typically have moderate to high pT, and a search involving a pT cut may preserve the signal while reducing SM backgrounds sharply.
For the specific case of a new resonance decaying to dimuons, Drell-Yan (DY) and QCD backgrounds (including both real muons from hadron decays and fake muons) fall rapidly with the dimuon transverse momentum pT. In many models, the signal’s pT spectrum is harder than that of the background, so even a rather modest pT cut increases sensitivity. While this is not the case for minimal dark photon models with kinetic mixing Okun:1982xi ; Galison:1983pa ; Holdom:1985ag ; Pospelov:2007mp ; ArkaniHamed:2008qn ; Bjorken:2009mm (see related discussion in Ref. Hoenig:2014dsa ), it is common to any scenario where the resonance is produced from the decay of a heavier state. It is also the case, for example, in the SM search for the Higgs boson decaying to muon pairs, where the dimuon pT is used to define event categories Aaboud:2017ojs or as part of a multivariate discriminant Sirunyan:2018hbu .
In addition to having substantial pT, the new particle may often be produced in association with other hidden-sector particles Strassler:2006im ; Han:2007ae , whose decay products might be clustered together in the detector. This clustering could significantly reduce the efficiency of any lepton isolation cut on the signal. To ensure sensitivity to the broadest range of models, one may wish to relax or drop the tight isolation criteria that are usually applied in dilepton searches, and instead reduce QCD backgrounds through a stringent impact parameter (IP) cut to select real prompt muons.¹ With the availability of CMS Open Data and corresponding simulated samples, we can test the efficacy of an IP-cut-based search strategy and look for prompt but non-isolated dimuon resonances.

¹ Another potential failure mode for isolation can occur even with a single isolated, highly boosted resonance, whose decay products ruin each other’s isolation; cf. early studies of lepton jets Baumgart:2009tn ; Cheung:2009su ; Falkowski:2010cm ; Falkowski:2010gv ; Aad:2015sms and photon jets Dobrescu:2000jt ; Larios:2001ma ; Toro:2012sv ; Draper:2012xt ; Ellis:2012zp . One may evade this by excluding companion muons when imposing isolation. We do not do so here, but this should be implemented for any search targeting lower masses and/or higher pT cuts.
The use of a pT cut to separate a BSM signal from SM backgrounds has a long history. In the LHC era, there has been intense interest in particles with a large boost, such that their decay products become highly collimated. Searches for particles that decay to one or more highly boosted W/Z/Higgs bosons or top quarks have been widely proposed and carried out, for instance in Ref. Chatrchyan:2012tw ; see Refs. Larkoski:2017jix ; Asquith:2018igt for recent reviews of boosted hadronic objects. Searches for new particles that are themselves produced with a high boost have also been proposed Strassler:2006im ; Han:2007ae ; Aguilar-Saavedra:2017zuc ; Chakraborty:2017mbz ; Aguilar-Saavedra:2017rzt ; Aguilar-Saavedra:2018xpl ; Collins:2018epr , and although some have been implemented Chatrchyan:2012cg ; Khachatryan:2015wka ; Sirunyan:2018mgs , there have been none to our knowledge in the purely dimuon or dielectron channels. Moreover, as we show here, enhanced sensitivity across the mass region of interest may be obtained even with moderate pT cuts, for which boost factors are typically much more modest.
In this paper, we present summaries of our pT-enhanced dimuon search results. More details and additional results will be presented in future work. In Sec. II, we validate our use of the CMS 2011 dimuon data set by performing a measurement of the Z boson cross section. In Sec. III, we describe our dimuon resonance search strategy, with results shown in Sec. IV. Implications for various benchmark scenarios are sketched in Sec. V, and we conclude in Sec. VI.
II Validation of the Dimuon Data Set
II.1 Basic Selection Criteria
Our analysis is based on the DoubleMu primary data set from CMS Run 2011A CMS:DiMuonPrimary , hereafter referred to as “CMS11a”, and benefits from the excellent performance of the CMS muon system Chatrchyan:2012xi ; Chatrchyan:2013sba . We select events that pass the HLT_Mu13_Mu8 high-level dimuon trigger, which nominally requires pT > 13 GeV (8 GeV) for the leading (subleading) muon.² To mitigate trigger threshold effects, we impose a further cut of pT > 15 GeV on the leading muon and pT > 10 GeV on the subleading muon, irrespective of their electric charge. We also impose a pseudorapidity cut on |η|, since the muon resolution degrades in the forward region. We performed a validation study using the prescaled HLT_DoubleMu7 trigger, with a nominal threshold of pT > 7 GeV on both muons. After our baseline selection, the muon pT spectra from HLT_Mu13_Mu8 and HLT_DoubleMu7 are statistically equivalent, demonstrating that we are indeed working in the trigger plateau region.³

² The DoubleMu primary data set has 22 high-level trigger paths, none of which impose a muon isolation requirement, except HLT_DoubleMu5_IsoMu5, which is not used here.

³ Note that Ref. Chatrchyan:2013tia , which used the same trigger on the 2011 data set, applied looser kinematic requirements on the leading and subleading muons. In Ref. Hoenig:2014dsa , the results of Ref. Chatrchyan:2013tia were recast as a dark photon search, albeit with weaker limits than derived here due to the use of relatively coarse mass bins.
For all of our analyses, we require that the muons pass the tight muon selection criteria defined in Ref. Chatrchyan:2012xi .⁴ This means that the muon is reconstructed both as a “global muon” with a good-quality global track fit (a χ²/ndof requirement) and as a “tracker muon” with more than 10 inner-tracker hits. As a baseline IP requirement, the reconstructed muon tracks must intersect the primary vertex within fixed distances in the transverse (x–y) plane and along the z direction.

⁴ This tight definition is taken from the 2010 CMS performance study Chatrchyan:2012xi . To our knowledge, there is no dedicated muon performance study from CMS on the 2011 data. There is a study on the CMS 2012 data that recommends slightly different tight muon selection criteria CMS-DP-2014-020 , but that study is limited to muons in a restricted kinematic range.
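The baseline selection described above can be sketched as a per-event filter on the two leading muons. This is a minimal illustration, not the analysis code: the `Muon` record and its field names are hypothetical, the 15 GeV and 10 GeV thresholds come from the text, and the pseudorapidity and IP thresholds are placeholders chosen only so that the example runs.

```python
from dataclasses import dataclass


@dataclass
class Muon:
    """Hypothetical minimal muon record (field names are illustrative)."""
    pt: float       # transverse momentum [GeV]
    eta: float      # pseudorapidity
    dxy: float      # transverse impact parameter w.r.t. the primary vertex [mm]
    dz: float       # longitudinal impact parameter w.r.t. the primary vertex [mm]
    is_tight: bool  # passes the tight-muon ID (global + tracker muon criteria)


# Thresholds quoted in the text.
PT_LEAD_MIN = 15.0  # GeV, leading muon
PT_SUB_MIN = 10.0   # GeV, subleading muon
# Placeholders only: NOT the values used in the analysis.
ETA_MAX = 2.0
DXY_MAX_MM = 2.0
DZ_MAX_MM = 10.0


def passes_baseline(muons):
    """True if the two highest-pT muons pass the baseline selection (any charge)."""
    if len(muons) < 2:
        return False
    lead, sub = sorted(muons, key=lambda m: m.pt, reverse=True)[:2]
    if lead.pt <= PT_LEAD_MIN or sub.pt <= PT_SUB_MIN:
        return False
    for mu in (lead, sub):
        if not mu.is_tight or abs(mu.eta) >= ETA_MAX:
            return False
        if abs(mu.dxy) >= DXY_MAX_MM or abs(mu.dz) >= DZ_MAX_MM:
            return False
    return True
```

Keeping the thresholds as module-level constants makes it easy to swap in the actual values, or to tighten the IP cuts later for the search samples.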
We now present two validation studies of the CMS11a trigger stream. These same baseline requirements will be used in our dimuon search in Sec. III.
II.2 Comparison to Monte Carlo Samples
The first validation study, shown in Fig. 1, involves comparing the opposite-sign dimuon spectrum in the CMS11a data set with MC samples provided by CMS, which are generated using the CMS GEANT4-based Agostinelli:2002hh detector simulation. We impose isolation cuts on the muons to reduce QCD backgrounds to negligible levels; more details are given in Eq. (2) below.
In the mass range around the Z pole, we compare to a Z-pole Monte Carlo sample (ZMC) CMS:ZMC obtained from MadGraph 5 v1.1.0 Alwall:2011uj interfaced with Pythia 6.4.25 Sjostrand:2006za ⁵ with tune Z2 Field:2011iq and TAUOLA 2.4 Jadach:1993hs , adjusting the ZMC normalization to match the Z boson peak in CMS11a. In the lower mass range, we compare CMS11a to a DY Monte Carlo sample (DYMC) CMS:DYMC obtained from Pythia 6 with tune Z2, adjusting the DYMC normalization to match the top of the trigger turn-on curve around 30 GeV. We also impose an unusual upper bound on the dimuon pT of the data and MC events, because the DYMC sample, lacking parton shower/matrix element matching, underestimates the high-pT tail of the data in this lower mass range.⁶

⁵ Since the information provided with Ref. CMS:ZMC (and other similar MC samples) does not specify the Pythia version used for event generation, we cite version 6.4.25, which has tune Z2 as an official option. An earlier version might have been used, with the tune Z2 settings.

⁶ We must rely on the unmatched Pythia-only DY sample here, because the 2011 CMS Open Data release provided MadGraph/Pythia matched samples for DY plus jets CMS:DY1JetMC ; CMS:DY2JetMC ; CMS:DY3JetMC ; CMS:DY4JetMC but not for DY plus 0 jets. This highlights the importance of stress-testing archival data strategies, to ensure that relevant information is not inadvertently omitted.
The CMS11a data set and the DYMC/ZMC samples show fairly good agreement in Fig. 1, including the shape of the Z pole and the shape of the trigger turn-on region. Below the Z pole, disagreements are mostly within the expected theoretical uncertainties of the simulations, which are of order 10%–20%. Where the DYMC and ZMC samples meet, there is a small mismatch, again within the expected theoretical uncertainties of the simulations. (Strictly speaking, the DYMC and ZMC samples are defined by the generator-level dimuon mass, not the reconstructed mass, so near the boundary the DYMC and ZMC curves in Fig. 1 actually include a few events from the other sample.) Above the Z pole, the ZMC sample lacks a background component present in data at high mass. The Z-pole region shows that the ZMC underestimates the width of the resonance in data, a known effect Chatrchyan:2012xi . While CMS has documented three different methods (MuScleFit, Rochester, and SIDRA) to correct the MC resolution CERNOpenDataMuonRecommendations , details are not publicly available for the reprocessed CMS Open Data samples. (Implementing the “Summer11” SIDRA correction CMSSIDRA on the “Summer11LegDR” ZMC sample leads to oversmearing of the peak.) We have no independent way to determine the corrections, but fortunately we will not need them elsewhere. Within these limitations, the general agreement provides confidence that our data sample is in accord with expectations.
In carrying out this check, we should have first applied a scale factor correction on the muon pT to the data. This scale factor is a function of the muon kinematics, including the azimuthal angle φ. However, this information for the CMS11a data set is not yet public, and we are therefore unable to use it directly. We can obtain some partial information as follows. A study in Ref. Chatrchyan:2012xi shows how the uncorrected Z mass, as a function of the charge-weighted muon azimuthal angle, varies in 2010 data. Since we find that the corresponding variation in the CMS11a data set is much smaller, we infer that improved calibrations were applied to it. We also find that the Z mass in our sample varies only slightly across the muon kinematic range. In summary, we find evidence that the largest variations in the scale factor have already been corrected in CMS11a, and that any residual corrections to be accounted for are far below the 1% level.
Meanwhile, the recommendation from CMS for the 2011 data is that, if unable to apply muon scale factor corrections, one should take the scale factor to be a fixed value with an assigned uncertainty CERNOpenDataMuonRecommendations . This uncertainty has a negligible effect on the cross-checks in this section, so we do not account for it.
II.3 Extracting the Z Boson Cross Section
For a second validation study, we extract the cross section for Z bosons decaying to muons using the CMS11a data set. Our analysis is modeled on the CMS measurement of the Z → μμ cross section on 36 pb⁻¹ of 2010 data CMS:2011aa (see also Ref. Khachatryan:2010xn ). We impose the same kinematic cuts: the same pT and |η| requirements on both muons, and a dimuon mass window of 60–120 GeV. This ensures that our acceptance, and the theoretical SM cross section in the Z-mass window, should match Ref. CMS:2011aa , though the trigger and isolation criteria are different. In Table 1, we show the number of dimuon events that pass these cuts, separated by whether the two leading muons have charges with the same sign (SS) or opposite sign (OS).
The quantity σ(pp → Z)·B(Z → μ⁺μ⁻) can be obtained from the number of Z candidates, N_Z, via

$$\sigma(pp \to Z)\,\mathcal{B}(Z \to \mu^+\mu^-) = \frac{N_Z}{L\,A\,\epsilon_{\mathrm{trig}}\,\epsilon_{\mathrm{iso}}}, \tag{1}$$

where L is the integrated luminosity, A is the kinematic acceptance, ε_trig is the combined trigger/reconstruction efficiency for the sample, and ε_iso represents the sample’s isolation efficiency. The central values and uncertainties for these quantities are summarized in Table 2 and described briefly below.
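The extraction divides the number of Z candidates by the product of integrated luminosity, kinematic acceptance, and the trigger/reconstruction and isolation efficiencies. As a quick consistency check on the central values in Table 2, the sketch below assumes that the dimuon efficiencies are the squares of the per-muon entries listed there; with that assumption, the product reproduces the tabulated combined value.

```python
def effective_luminosity(lumi_fb, acceptance, eff_trig_per_muon, eff_iso_per_muon):
    """Return L * A * eps_trig * eps_iso in fb^-1.

    Assumes the dimuon efficiencies are the squares of the per-muon values.
    """
    eps_trig = eff_trig_per_muon ** 2
    eps_iso = eff_iso_per_muon ** 2
    return lumi_fb * acceptance * eps_trig * eps_iso


denominator = effective_luminosity(2.11, 0.392, 0.924, 0.966)
print(f"{denominator:.3f} fb^-1")  # 0.659 fb^-1
```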
Integrated luminosity information is provided with the CMS Open Data CMS:2011Lumi . To determine the integrated luminosity L, we sum over the luminosity blocks where the trigger was active, obtaining 2.16 fb⁻¹ delivered and 2.11 fb⁻¹ recorded for CMS11a.⁷ CMS quotes a 2.2% luminosity uncertainty for 2011 CMS-PAS-SMP-12-008 , and we take this as a systematic uncertainty.

⁷ Strangely, there are 7 luminosity blocks where the recorded luminosity is zero, despite the fact that they contain a total of 17 events where the trigger fired. Removing these events has a negligible impact on our results.
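The luminosity bookkeeping described above, summing over luminosity blocks where the trigger was active and flagging blocks with zero recorded luminosity, can be sketched as follows. The dictionary keys are hypothetical stand-ins for whatever per-block information the luminosity files provide; this does not reproduce any official CMS luminosity tool.

```python
def sum_luminosity(blocks):
    """Sum delivered/recorded luminosity over blocks where the trigger was active.

    Each block is a dict with hypothetical keys 'delivered', 'recorded'
    (same units, e.g. pb^-1) and 'trigger_active' (bool). Indices of active
    blocks with zero recorded luminosity are returned for inspection.
    """
    delivered = 0.0
    recorded = 0.0
    zero_recorded = []
    for index, block in enumerate(blocks):
        if not block["trigger_active"]:
            continue
        delivered += block["delivered"]
        recorded += block["recorded"]
        if block["recorded"] == 0.0:
            zero_recorded.append(index)
    return delivered, recorded, zero_recorded


toy_blocks = [
    {"delivered": 1.0, "recorded": 0.9, "trigger_active": True},
    {"delivered": 2.0, "recorded": 0.0, "trigger_active": True},
    {"delivered": 5.0, "recorded": 5.0, "trigger_active": False},
]
print(sum_luminosity(toy_blocks))  # (3.0, 0.9, [1])
```

Returning the zero-luminosity block indices, rather than silently dropping them, makes it straightforward to check how many triggered events they contain.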
Though we cannot cross-check the luminosity uncertainty independently, we did verify that when we break the data into subsets, the number of events in the Z boson peak divided by the integrated luminosity is nearly constant. The same is true for the number of non-Z Drell-Yan events, which further confirms that the trigger functioned stably during the run. That said, there is some jitter in these ratios, of order 2%. We have no information about the source of this jitter, which could stem from the luminosity measurement, the trigger/reconstruction efficiency, or other sources. To be conservative, we assign this uncertainty to the trigger/reconstruction efficiency; see below.
Using the ZMC sample, we find a kinematic acceptance factor of A = 0.392, to be compared with the value quoted in Ref. CMS:2011aa ; we take the relative discrepancy as a systematic uncertainty. There is also a 1.9% theoretical uncertainty on A noted in Ref. CMS:2011aa , which we combine in quadrature for a total uncertainty of 2.4%.
Since Ref. CMS:2011aa uses a single-muon trigger (whereas we use a dimuon trigger), and applies cuts for muon quality and isolation that differ from ours, we must determine the corresponding efficiencies ourselves; details on this procedure will be presented in future work. For the trigger/reconstruction efficiency ε_trig, we must rely on truth information from the ZMC sample, but the result can be cross-checked against 2011 CMS estimates, such as those in Ref. CMSMuonTwiki ; these show that, for the single-muon efficiency, MC and data agree to within 2%, which we take as a systematic uncertainty. We combine this in quadrature with the uncertainty inferred from the jitter in the ratio of Z boson events to recorded luminosity, 2% on the dimuon efficiency (1.4% per muon), giving a total uncertainty on the single-muon efficiency of 2.4%.
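The arithmetic behind the 1.4% and 2.4% figures above is a standard quadrature sum. The sketch assumes that the 2% dimuon-level jitter is shared equally and independently between the two muons, so that each muon carries 2%/√2.

```python
import math


def add_in_quadrature(*rel_uncertainties):
    """Combine independent relative uncertainties (in %) in quadrature."""
    return math.sqrt(sum(u * u for u in rel_uncertainties))


# A 2% jitter on the dimuon efficiency, shared equally and independently
# between the two muons, corresponds to 2%/sqrt(2) per muon.
jitter_per_muon = 2.0 / math.sqrt(2.0)
mc_vs_data_per_muon = 2.0  # % agreement between MC and data
total_per_muon = add_in_quadrature(mc_vs_data_per_muon, jitter_per_muon)
print(f"{jitter_per_muon:.1f}% per muon, total {total_per_muon:.1f}%")
# 1.4% per muon, total 2.4%
```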
To impose isolation, we require that the combined isolation variable I_comb lie below a fixed threshold for each muon, where

$$I_{\mathrm{comb}} = \frac{1}{p_T^{\mu}}\left(\sum_{\Delta R<R_0} p_T^{\mathrm{track}} + \sum_{\Delta R<R_0} E_T^{\mathrm{ECAL}} + \sum_{\Delta R<R_0} E_T^{\mathrm{HCAL}}\right), \tag{2}$$

and the numerator is the sum of the transverse momenta of all tracks within a cone of radius R₀ around the muon, together with the transverse energy of all ECAL (electromagnetic calorimeter) and HCAL (hadronic calorimeter) deposits within the same cone, without removing double counting (see Ref. Chatrchyan:2012xi ). To determine the isolation efficiency ε_iso, we use multiple methods, including truth information from ZMC and a tag-and-probe analysis on the CMS11a data, and these agree to within 1%. To be conservative, we take a 1.5% systematic uncertainty.
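A combined isolation variable of this kind can be computed per muon as sketched below. Several details are assumptions: the sum is normalized to the muon pT (the usual CMS relative isolation), the cone radius of 0.3 is only a placeholder default, the input tuples are a hypothetical format, and the caller is assumed to have already removed the muon's own track and calorimeter footprint. As in the text, track and calorimeter contributions are simply added, with no removal of double counting.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def combined_isolation(muon, tracks, ecal, hcal, cone=0.3):
    """Relative combined isolation of one muon.

    muon: (pt, eta, phi). tracks: (pt, eta, phi) tuples; ecal/hcal: (et, eta, phi)
    tuples. The caller is assumed to have removed the muon's own track and
    calorimeter footprint. `cone` is a placeholder radius.
    """
    mu_pt, mu_eta, mu_phi = muon
    total = 0.0
    for collection in (tracks, ecal, hcal):
        for value, eta, phi in collection:
            if delta_r(mu_eta, mu_phi, eta, phi) < cone:
                total += value
    return total / mu_pt


iso = combined_isolation(
    (40.0, 0.0, 0.0),
    tracks=[(2.0, 0.1, 0.1), (5.0, 1.0, 1.0)],
    ecal=[(1.0, 0.0, 0.2)],
    hcal=[(1.0, 0.0, -3.1)],
)
print(iso)  # 0.075
```

The φ wrapping in `delta_r` matters: without it, objects near φ = ±π would appear far apart even when they are adjacent.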
This analysis is essentially background free. This can be seen, for instance, in the CMS DY study Chatrchyan:2013tia , where backgrounds from other SM processes and from QCD (i.e. real and fake muons from all hadronic sources) together add up to less than 1% of the signal. This can be checked by a direct calculation, except for the QCD background, which we probe using SS muon events; Table 1 shows that these are removed efficiently by the isolation cut. Combining the uncertainties from Table 2 in quadrature leads to a relative uncertainty of approximately 5%.
Inserting the number of Z candidates from Table 1 into Eq. (1), we find
[TABLE]
in the Z-mass window of 60–120 GeV, where the first uncertainty is statistical and the second is from the uncertainties in Table 2. This agrees with the next-to-next-to-leading-order SM prediction quoted in Ref. CMS:2011aa (obtained from FEWZ Gavin:2010az and MSTW08 Martin:2009iq ), with the value measured in Ref. CMS:2011aa (both in the muon channel alone and with electron/muon averaging), and with the 2011 CMS result Chatrchyan:2013tia .
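For orientation, the arithmetic of the cross-section formula can be reproduced from the tabulated inputs: the isolated OS count from Table 1, and the combined denominator and relative uncertainty from Table 2. This is a back-of-the-envelope illustration that ignores background subtraction and any other refinement, so it should not be read as the quoted measurement.

```python
import math

n_z = 642_219          # OS dimuon events after the isolation cut (Table 1)
eff_lumi_fb = 0.659    # combined L*A*eps_trig*eps_iso in fb^-1 (Table 2)
rel_syst = 0.053       # combined relative uncertainty (Table 2)

sigma_pb = n_z / eff_lumi_fb / 1000.0              # 1 pb = 1000 fb
stat_pb = math.sqrt(n_z) / eff_lumi_fb / 1000.0    # Poisson error on the count
syst_pb = rel_syst * sigma_pb
print(f"{sigma_pb:.0f} +- {stat_pb:.1f} (stat) +- {syst_pb:.0f} (syst) pb")
# 975 +- 1.2 (stat) +- 52 (syst) pb
```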
III Resonance Search Strategy
We now describe our analysis strategy for setting new bounds on the production of a dimuon resonance. Our results are largely model independent, up to subtleties described below. The overall methodology is straightforward. Taking events in the trigger stream, we impose minimal additional cuts on the pT and η of the muons. We then define separate isolated and prompt samples that overlap but are useful for different classes of signal models. Finally, we impose three different cuts on the dimuon transverse momentum to select boosted kinematics. Within these samples (six in total), we search for a narrow bump, with a width appropriate to the CMS dimuon mass resolution and a Crystal-Ball-like line shape. We employ a profiled-likelihood method using approximate formulas from Ref. Cowan:2010js , with certain details motivated by Ref. Williams:2017gwf .
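To give a feel for the asymptotic formulas referenced above, the sketch below implements the single-bin counting versions of the median discovery and exclusion significances for the Asimov data set, in the spirit of Ref. Cowan:2010js . It is deliberately stripped down: the actual search fits a line shape over a mass range and profiles nuisance parameters, neither of which is done here.

```python
import math


def z_discovery(s, b):
    """Median discovery significance for s signal on b background events
    (Asimov data set, single counting bin, no systematic uncertainties)."""
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))


def z_exclusion(s, b):
    """Median significance with which background-only data would exclude s
    (Asimov data set, single counting bin, no systematic uncertainties)."""
    return math.sqrt(2.0 * (s - b * math.log1p(s / b)))


# For s << b both reduce to the familiar s / sqrt(b).
print(round(z_discovery(10, 10_000), 3), round(z_exclusion(10, 10_000), 3))
```

Using `math.log1p` keeps the expressions numerically stable when s/b is tiny, which is the regime of a narrow bump on a large background.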
III.1 Defining Isolated and Prompt Samples
The initial event selection mirrors that of Sec. II and Table 1. As summarized in Table 3, we place pT cuts of 15 (10) GeV on the leading (subleading) muon to ensure that we are above the trigger threshold. We require these two muons to satisfy a pseudorapidity cut because of the degraded resolution at forward angles, and we demand that they satisfy the baseline transverse and longitudinal IP requirements. Next, we tighten the transverse and longitudinal IP cuts and limit ourselves to OS events in a dimuon mass window that allows for searches in the mass range 14–66 GeV.
We then define two overlapping samples for study:
1. an isolated sample, where the two leading muons satisfy an isolation requirement on the combined isolation variable [defined in Eq. (2)], which dramatically suppresses the QCD background; and
2. a prompt sample, where no isolation cut is imposed but the transverse IP cut on the two leading muons is tightened further, substantially reducing the QCD background and leaving it comparable to the irreducible DY background.
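The overlapping sample definitions, combined with the inclusive dimuon pT cuts discussed below, can be sketched as a labeling function that reports every (sample, pT cut) combination an event enters. The isolation and transverse-IP thresholds are hypothetical arguments supplied by the caller, since their values are not reproduced here; an event can enter both samples.

```python
PT_CUTS_GEV = (0.0, 25.0, 60.0)  # inclusive, then the two dimuon pT cuts


def assign_samples(pt_mumu, iso, dxy_um, iso_max, dxy_prompt_max_um):
    """Return the (sample, pT cut) pairs that a search-region OS dimuon enters.

    iso, dxy_um: values for the two leading muons. iso_max and
    dxy_prompt_max_um are hypothetical thresholds supplied by the caller.
    """
    is_isolated = all(value < iso_max for value in iso)
    is_prompt = all(abs(value) < dxy_prompt_max_um for value in dxy_um)
    labels = []
    for cut in PT_CUTS_GEV:
        if cut > 0.0 and pt_mumu <= cut:
            continue  # fails this (inclusive) dimuon pT cut
        if is_isolated:
            labels.append(("isolated", cut))
        if is_prompt:
            labels.append(("prompt", cut))
    return labels
```

With two samples and three pT cuts, this yields at most six labels per event, matching the six samples described in the search strategy.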
From the ZMC and DYMC samples, and cross-checking using data, we infer that this tighter IP cut in the prompt sample accepts a large majority of typical prompt signals, an effect we correct for later.⁸ Note that access to the CMS Open Data was essential for validating the prompt sample, since it involves QCD backgrounds whose magnitude cannot be precisely predicted a priori, as well as detector effects related to the IP resolution. (Though not directly comparable, one can also infer the potency of the IP cut to reduce QCD backgrounds from Fig. 7b of Ref. Chatrchyan:2012xi .)

⁸ The Z sample of Sec. II, and any high-pT sample of DY with isolation imposed, are almost free of QCD contamination. This can be inferred from the number of SS dimuon events and from the lack of a tail in the IP distribution. In these nearly pure samples of prompt dimuons, which closely resemble our signals, we can directly estimate the relevant efficiency by counting events as a function of the IP cut.
As control samples, we take SS muons separated into prompt and non-prompt subsamples, and OS muons where we reverse either the isolation cut or the tighter IP cut. Nothing striking appears in these samples, which adds confidence that any features observed in the signal samples are not a result of kinematic sculpting.
Finally, within the isolated and prompt samples, we consider additional subsamples, defined inclusively, in which we impose a dimuon pT cut. The sequence of cuts is chosen based on a principle: a signal at the expected exclusion level (2σ) for one cut should be discoverable (5σ) after the next, tighter cut, assuming both cuts have identical signal acceptance. As we will see explicitly in Sec. V, the latter assumption is more sensible than it might at first appear; a harder pT cut often retains a large fraction (60%–100%) of the signal that passes the next-looser cut. Based on this principle, we take three pT cuts (no cut, pT > 25 GeV, and pT > 60 GeV), which reduce the background at each step by approximately the same factor, as shown in Fig. 2. (One should continue this procedure as far as possible, but the next natural cut, at higher pT, leaves little data in CMS11a; we do not study it here.) Of course, this reduction factor is not entirely uniform across the dimuon mass spectrum: because of the trigger’s impact, the factor for the first pT cut is smaller at low dimuon mass and rises just above that region.
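The cut-spacing principle has a simple quantitative form. If the signal yield S is unchanged and S/√B must go from 2 to 5, the background must shrink by (5/2)² = 6.25 per step. The sketch below computes this factor and, for comparison, the step-by-step ratios of the event counts in Table 3. Those counts are integrated over the whole search region and include the trigger-sculpted low-mass region, so they are not expected to equal the idealized factor exactly.

```python
# Principle: with the same signal yield S, S/sqrt(B) should go from 2 (expected
# exclusion) to 5 (discovery), so B must shrink by (5/2)**2.
required_factor = (5.0 / 2.0) ** 2
print(required_factor)  # 6.25

# Event counts integrated over the search region (Table 3):
# no dimuon pT cut, pT > 25 GeV, pT > 60 GeV.
counts = {
    "isolated": [188_924, 46_798, 7_668],
    "prompt": [412_002, 91_264, 11_208],
}
reduction = {
    name: [n[i] / n[i + 1] for i in range(len(n) - 1)]
    for name, n in counts.items()
}
for name, factors in reduction.items():
    print(name, [round(f, 1) for f in factors])
# isolated [4.0, 6.1]
# prompt [4.5, 8.1]
```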
The behavior seen in Fig. 2 also explains why we chose, in this study, to cut on pT rather than on the dimuon boost. The backgrounds at fixed pT are, somewhat accidentally, rather flat across this mass range; they are relatively easy to fit, and we obtain bounds that are fairly uniform as a function of mass. By contrast, backgrounds at fixed boost drop much more sharply across this mass range, complicating the fitting procedure.
III.2 Resonance Line Shape
We next search for a bump, scanning across a range of values for the dimuon invariant mass m_μμ. As a potential signal, we assume a narrow resonance, with intrinsic width far smaller than the detector resolution.
The choice of line shape and resolution for our search requires some care, because the line shape for a signal is not model independent. First, there is a radiative tail from QED emission off the muons, whose precise form depends logarithmically on the resonance mass. Next, and more importantly, the muon momentum resolution, and therefore the dimuon mass resolution, is a function of the pT and especially the η of the muons. Since different models produce different muon pT and η distributions, their line shapes will have different widths. Finally, even at fixed pT and η, both the CMS11a data and the corresponding MC samples indicate that the resolution has small non-Gaussian tails.
In order to understand the CMS dimuon mass resolution, which depends on the muon pT and η, we have studied the kinematic dependence of the CMS muon momentum resolution, using the line shape of a narrow hadronic resonance in the CMS11a data. (Samples of other hadronic resonances are either less abundant or less pure.) The excellent tracking granularity means that the resolution on the dimuon opening angle is subdominant to the momentum resolution of the individual muons. Because of the trigger, these resonances are highly boosted, so their two muons are very close in η and in roughly the same pT range. Fitting the line shape with a Crystal Ball function to account for both the radiative tail and the resolution allows us to estimate the resolution as a function of η in a lower pT range and, with larger uncertainties, in a higher pT range. We also have an estimate of the resolution from the CMS MC samples, where we can directly relate generator-level and detector-level values. Comparing data and MC indicates that the MC underestimates the resolution in the real data by about 10%, but the kinematic dependence is otherwise well modeled within the region of interest to us. Therefore, in bins where the CMS11a sample is large, we use the results from our fit to the resonance as our central value, and at higher pT, where the sample is too small, we use the resolution found in the CMS MC samples, multiplied by 1.1, as our central value. Convolving these results against typical signal distributions, we find a dimuon mass resolution at the percent level for low-pT signals, slowly increasing for higher-pT signals.
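The Crystal Ball function used for the resonance fit is a Gaussian core joined to a power-law tail on the low-mass side, which models radiative losses. Below is a minimal, unnormalized NumPy version of the standard parametrization, with the tail coefficients fixed by requiring continuity of the function and its first derivative at the junction. It is an illustrative implementation, not the fitting code behind the results.

```python
import numpy as np


def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalized Crystal Ball shape: Gaussian core, power-law tail on the low side.

    For t = (x - mean) / sigma, the core exp(-t^2/2) is used for t > -|alpha|,
    and A * (B - t)**(-n) below, with A and B fixed by continuity of the
    function and its first derivative at t = -|alpha|.
    """
    t = (np.asarray(x, dtype=float) - mean) / sigma
    a = abs(alpha)
    A = (n / a) ** n * np.exp(-0.5 * a * a)
    B = n / a - a
    core = np.exp(-0.5 * t * t)
    # Clip t so the tail expression stays finite where it is not selected.
    tail = A * (B - np.minimum(t, -a)) ** (-n)
    return np.where(t > -a, core, tail)
```

To fit a binned mass spectrum, one would multiply this shape by a free normalization (plus a background term) and pass it to a least-squares or likelihood minimizer, for example `scipy.optimize.curve_fit`.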
The uncertainty in the resolution is difficult for us to determine, since the current release of CMS Open Data does not provide any detailed information concerning muon resolution and its uncertainty. When resolution corrections cannot be applied in detail, CMS recommends CERNOpenDataMuonRecommendations taking a fixed systematic uncertainty on the muon resolution. This appears consistent with the uncertainties found in the most up-to-date public information, from 2010 Chatrchyan:2012xi . Based on our resonance line-shape studies in the CMS11a sample, the corresponding uncertainty on the dimuon mass resolution appears too large, but since we cannot quantify this reliably, we follow the above recommendation.
As noted earlier, we also follow the CMS recommendation for our data set to take the scale factor on the muon to be CERNOpenDataMuonRecommendations . The resulting scale uncertainty on of is approximately half the size of our bins, and the size of our signal resolution. We cannot model the scale factor uncertainty properly, since we have no information about the dependence of the scale factor on kinematic quantities, and thus no information about event-to-event correlations. But a constant (event-independent) scale factor of 1.0014 would shift the dimuon mass by less than 100 MeV for GeV, too small to affect our analysis, and an uncorrelated one would combine in quadrature with the uncertainty of in the resolution, leaving it unchanged to the available precision. We consequently do not account for this uncertainty in our results.
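The claim about a constant scale factor can be checked with simple arithmetic. For effectively massless muons, m² = 2 p1 p2 (1 − cos θ), so a common momentum scale factor s rescales the dimuon mass by s, and the shift is (s − 1) m. The sketch below assumes the 14-66 GeV search range stated in the abstract. `mass_shift_gev` is an illustrative helper name.

```python
SCALE_FACTOR = 1.0014      # constant scale factor quoted in the text
M_MIN, M_MAX = 14.0, 66.0  # search range in GeV (from the abstract)


def mass_shift_gev(mass_gev, scale=SCALE_FACTOR):
    """Dimuon mass shift (GeV) when both muon momenta are scaled by `scale`."""
    return (scale - 1.0) * mass_gev


for m in (M_MIN, 40.0, M_MAX):
    print(f"m = {m:5.1f} GeV -> shift = {1000 * mass_shift_gev(m):5.1f} MeV")
```

At the top of the range the shift is about 92 MeV, just under the 100 MeV quoted above.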
The appropriate line shape has a Gaussian core and a radiative tail. [Footnote 9: A Gaussian line shape, without accounting for the radiative tail, gives limits 10%–15% smaller than those presented below.] The most important role of the tail is to deplete signal from the Gaussian core, so its integral must be approximately correct for the core to be properly normalized. We cannot determine this entirely from data, because the radiative tail from the , the cleanest resonance, disappears under the continuum background. For this reason, an MC-based approach to modeling this well-understood QED phenomenon is more accurate.
We therefore first generate a high-statistics sample of decays with Pythia 8.235 Sjostrand:2014zea , in a specific model for the kinematics (model M1 defined in Sec. V.1), for GeV and for GeV. This generator includes photon final state radiation (FSR), so the dimuon mass distribution has a tail below the delta function spike at , whose size depends on . We then smear this result with a Gaussian, applied event by event according to the - and -dependent single-muon resolution obtained above. The amount of smearing is chosen so that for GeV we reproduce the desired - and -dependent resolution in the core of the peak to within 0.05%, much smaller than the uncertainties on the resolution of . We then apply the same procedure for other to obtain a predicted line shape (with a slow dependence on and specific to model M1) for the central value of the resolution. We repeat the procedure, increasing or decreasing the smearing by an amount that is independent of and , to obtain other choices of resolution that we need later in our statistical analysis. [Footnote 10: The CMS11a data and MC reveal subtleties in the efficiency for muon reconstruction when a hard muon overlaps with a hard FSR photon. But this issue only affects dimuons far into the radiative tail, and does not impact our results.]
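The following is a minimal sketch of the event-by-event smearing step. It assumes each muon's pT is smeared by an independent Gaussian whose relative width depends on pT and η. `toy_resolution` is a hypothetical placeholder, not the CMS resolution derived above. FSR is not generated here; the sketch only smears a given pair of muons. The single multiplier `scale` is one simple way to vary the smearing by an amount independent of pT and η.

```python
import math
import random


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two massless muons."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))


def toy_resolution(pt, eta):
    """HYPOTHETICAL relative pT resolution, growing with pT and |eta|."""
    return 0.01 + 0.0001 * pt + 0.005 * abs(eta)


def smear_event(mu1, mu2, rng, scale=1.0):
    """Smear each muon's pT by a Gaussian of relative width scale * toy_resolution."""
    smeared = []
    for pt, eta, phi in (mu1, mu2):
        sigma = scale * toy_resolution(pt, eta)
        smeared.append((pt * (1.0 + rng.gauss(0.0, sigma)), eta, phi))
    (pt1, eta1, phi1), (pt2, eta2, phi2) = smeared
    return dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2)
```

The mass scales as the square root of the product of the two pT values. With equal per-muon relative resolutions σ, the relative mass smear is therefore about σ/√2.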
Our generated statistics are high enough that we can use the smeared MC directly as our prediction. As a check, we studied smoothing our prediction by fitting it with a single- or double-sided Crystal Ball function. These fits give results that differ by up to 3% on expected limits and up to 6% on observed limits, but these differences arise from an imperfect fit in the peak region, not from low MC statistics in the tails. Nevertheless, our prediction has intrinsic uncertainties, both from the modeling of photon FSR and from the fact that detector effects produce slightly non-Gaussian smearing, but these are common to all samples and vary little, if at all, with . We assign to these effects a conservative 5% Gaussian uncertainty on the best-fit signal strength, applied uniformly to all samples and masses. The impact on our 95% confidence-level upper limits is then very small, as we will see below.
III.3 Systematic Uncertainties
Our results include systematic uncertainties associated with the four effects in Table 4. For the dimuon resolution, we take central values of for the samples and for , and we profile over the resolution uncertainty as described in Sec. III.4 below. As discussed further in Sec. IV.2, we externalize the uncertainties associated with the acceptance and trigger/reconstruction efficiencies, since they are model dependent.
The three remaining uncertainties are from line-shape modeling, luminosity, and (for the prompt sample only) IP cut efficiency. The latter two effects have an obvious multiplicative impact on the limit. Less obvious is that the line-shape uncertainty also has a dominantly multiplicative effect. The reason is that, as far as fitting the signal is concerned, changing the tail of the line shape primarily changes the normalization of the Gaussian-like core. While it is possible to profile over these multiplicative uncertainties, we can use a simpler rescaling procedure since these multiplicative effects are relatively small.
Let the signal strength be multiplicatively proportional to a dimensionless quantity with Gaussian uncertainty and central value . Assume further that the log-likelihood profiled over all other quantities is effectively Gaussian, such that the quantity can be treated as having Gaussian uncertainty and central value :
[TABLE]
Marginalizing over and , keeping fixed, and taking the limit, [Footnote 11: Strictly speaking, we have to assume that , which is a reasonable approximation when evaluating the 95% lower/upper limit.] Eq. (4) becomes
[TABLE]
where and . Thus, the profiled log-likelihood is shallower than when , increasing the size of the confidence intervals. For instance, the expected 95% upper limit increases by
[TABLE]
with .
Because the corrections from these multiplicative uncertainties are quadratic in , their effect on our results is small. When combined in quadrature in Table 4, the line-shape uncertainty dominates, leading to a shift in the expected limits of around 0.6%. Note that Eq. (6) is obtained after profiling over the resolution uncertainty and background fit, which explains why the impact of the multiplicative corrections is diluted in this analysis.
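The quadratic dependence can be seen in a toy version of the rescaling argument. We use our own notation here, which may not match Eqs. (4)-(6) symbol for symbol. Let the fitted signal strength μ̂ have Gaussian uncertainty σ_μ. Let a multiplicative nuisance r, with central value 1 and width σ_r, scale the signal, so that −2 ln L = (rμ − μ̂)²/σ_μ² + (r − 1)²/σ_r². Minimizing over r gives (μ − μ̂)²/(σ_μ² + μ²σ_r²). The Asimov (μ̂ = 0) upper limit then grows by 1/√(1 − qσ_r²), which is roughly 1 + qσ_r²/2. The threshold q = 2.706 (one-sided 95%) is an assumed convention. This sketch also omits the profiling over resolution and background that the text says dilutes the effect.

```python
import math

# One-sided 95% CL threshold on -2 ln(L/L_max), i.e. 1.645^2 (assumed convention)
Q95 = 2.706


def m2lnl(mu, r, mu_hat, sigma_mu, sigma_r):
    """-2 ln L for signal strength mu with multiplicative nuisance r (central value 1)."""
    return (r * mu - mu_hat) ** 2 / sigma_mu ** 2 + (r - 1.0) ** 2 / sigma_r ** 2


def profiled_m2lnl(mu, mu_hat, sigma_mu, sigma_r):
    """Closed form of m2lnl minimized over r: variances add as sigma_mu^2 + mu^2 sigma_r^2."""
    return (mu - mu_hat) ** 2 / (sigma_mu ** 2 + mu ** 2 * sigma_r ** 2)


def expected_upper_limit(sigma_mu, sigma_r, q=Q95):
    """Asimov (mu_hat = 0) upper limit: solve profiled_m2lnl(mu) = q for mu > 0."""
    if q * sigma_r ** 2 >= 1.0:
        raise ValueError("multiplicative uncertainty too large for a finite limit")
    return math.sqrt(q) * sigma_mu / math.sqrt(1.0 - q * sigma_r ** 2)


def limit_inflation(sigma_r, q=Q95):
    """Relative increase of the expected limit; about q * sigma_r**2 / 2 for small sigma_r."""
    return expected_upper_limit(1.0, sigma_r, q) / expected_upper_limit(1.0, 0.0, q) - 1.0
```

For σ_r = 5%, the expected limit rises by only about 0.34%. This is why uncertainties of this size barely move the limits.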
III.4 Procedure for Setting Limits
We use the following procedure to obtain limits on production, with more justification presented below. For each mass value, we select a window centered around of width , binning the data in 140 bins. We then fit the mass spectrum within the window to a background model, with or without a signal (whose line shape is described in Sec. III.2) added at the center of the window. The background is modeled as a fifth-order polynomial, including all orders from 0 to 5, with six free parameters that we profile over. The signal resolution is profiled over the above-mentioned 0.4% uncertainty, treated as Gaussian. (When profiling the resolution, we still keep the window size fixed to 35 times the central value of the resolution.) Using this signal shape and background model, we determine a p-value for rejecting the background-only hypothesis, and evaluate observed and expected 95% upper limits on the number of signal events. The expected limit is determined from the Asimov data set Cowan:2010js in the standard way. We incorporate the uniform systematic uncertainties shown in Table 4 by adding them to the likelihood and computing their effects on the limits analytically (see Eq. (6) above).
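The core of the scan at a single mass point can be sketched as follows. This sketch simplifies the actual analysis in several ways:

- a least-squares fit with 1/√n weights stands in for the full binned likelihood;
- the signal is a pure Gaussian rather than the MC line shape with its radiative tail;
- the resolution is fixed rather than profiled;
- the local significance is taken as √Δχ² between the background-only and signal-plus-background fits.

The window is 35σ_m wide and split into 140 bins, as in the text. `local_fit_significance` is a hypothetical name.

```python
import numpy as np


def local_fit_significance(centers, counts, m0, sigma_m, order=5):
    """Simplified local fit at one mass hypothesis m0.

    centers, counts: bin centres (GeV) and counts in a window centred on m0.
    Background: polynomial with all orders 0..order in x = (m - m0) / half-width.
    Signal: Gaussian of width sigma_m at m0 with a free (possibly negative) amplitude.
    Returns (fitted peak amplitude per bin, signed local significance).
    """
    half_width = 0.5 * (centers.max() - centers.min())
    x = (centers - m0) / half_width
    bkg_cols = [x ** k for k in range(order + 1)]
    sig_col = np.exp(-0.5 * ((centers - m0) / sigma_m) ** 2)
    # Neyman chi^2 weights 1/sqrt(n): a Gaussian approximation to Poisson errors
    w = 1.0 / np.sqrt(np.maximum(counts, 1.0))

    def weighted_lsq(cols):
        a = np.column_stack(cols) * w[:, None]
        b = counts * w
        coef, *_ = np.linalg.lstsq(a, b, rcond=None)
        resid = b - a @ coef
        return float(resid @ resid), coef

    chi2_b, _ = weighted_lsq(bkg_cols)
    chi2_sb, coef = weighted_lsq(bkg_cols + [sig_col])
    amp = float(coef[-1])
    delta = max(chi2_b - chi2_sb, 0.0)  # fit improvement from the signal template
    return amp, float(np.sign(amp) * np.sqrt(delta))
```

Repeating this fit as m0 steps across the mass range yields a local-significance curve. This curve is analogous to the p-value plots discussed below.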
The choice of the above background fitting method is motivated as follows. The available MC samples from CMS do not allow us to reliably predict the background in all relevant kinematic regions, so we cannot determine a fit function a priori over the whole mass range. We therefore fit the background locally in a window around each dimuon mass value, and we use a polynomial fit because of the somewhat intricate shape of the background. The use of a polynomial background fit in a centered mass window was advocated in Ref. Williams:2017gwf and employed in Ref. Aaij:2017rft . In this approach, both the degree of the polynomial and the size of the window (relative to the resolution ) must be chosen.
In order for the background to be well modeled by a polynomial, we should choose a high-order polynomial and a small mass window. In particular, a window larger than roughly covers so much of the data that it defeats the purpose of local fitting. Because we center the mass window, adding an odd-order to an even-order polynomial has almost no effect on our results, as a parity-odd term is orthogonal to a Gaussian signal and nearly orthogonal to a more realistic signal with a radiative tail Williams:2017gwf . We therefore consider odd-order polynomials of third order or higher (since a linear fit function gives bad fits with any reasonable choice of window), and windows no larger than (to be compared to recommended in Ref. Williams:2017gwf ).
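The parity argument can be checked numerically. The sketch computes normalized overlaps between monomials tᵏ and two signal shapes, with t = (m − m0)/σ_m over a symmetric window. The first shape is a Gaussian. The second adds an exponential low-side tail that stands in for the radiative tail. The tail shape and `alpha` = 1.5 are illustrative assumptions, not the line shape of Sec. III.2.

```python
import numpy as np

# Window of +-17.5 resolution units, finely sampled and symmetric about t = 0
t = np.linspace(-17.5, 17.5, 7001)


def overlap(a, b):
    """Normalized inner product <a, b> / (|a| |b|) on the sampled window."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


gauss = np.exp(-0.5 * t ** 2)
alpha = 1.5  # hypothetical start of the low-side tail, in units of sigma_m
# Exponential tail, continuous with the Gaussian (value and slope) at t = -alpha
tailed = np.where(t > -alpha, gauss, np.exp(0.5 * alpha ** 2 + alpha * t))

for k in (2, 3, 4, 5):
    print(k, overlap(t ** k, gauss), overlap(t ** k, tailed))
```

For the Gaussian, the odd overlaps vanish up to rounding. With the tail, the t³ overlap becomes nonzero, but in this toy it remains much smaller than the t² overlap.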
On the other hand, a mass window that is too small, or a polynomial of too high an order, leads to a spurious “ringing” effect: a large excess at one mass can affect the fits at nearby masses, generating subsidiary correlated p-value spikes on either side of a real spike. These correlated spikes, visible by eye, are also detectable through the distribution of spikes as a function of local p-value, and equivalently through unreasonably large global p-values relative to the maximum local p-value. We find that avoiding the ringing effect requires a window of at least 25 (30) for a cubic (quintic) polynomial. Our results are stable for a range of window sizes above these values, except in the trigger turn-on region for the inclusive subsample, which we mask in the limits below. A seventh-order polynomial appears to require a window too large for good fits.
Limits obtained using the quintic, with more nuisance parameters, are generally higher than those for the cubic. We therefore use the quintic as the more conservative option, effectively soaking up the systematic uncertainty associated with the choice of background model by profiling over two additional parameters. We retain the cubic as a cross-check, and we also check the stability of the limits using windows of and . In the spirit of Ref. Williams:2017gwf , we tested the impact of discretely profiling over the cubic and quintic models, finding results that were generally intermediate between those of the two polynomials taken separately. Details and further justifications of our methods will be provided in future work, in which we also “search” for and observe, in the prompt sample, the SM meson decay .
IV Limits On Dimuons Using Cuts on Transverse Momentum
IV.1 Search Results
We now show limits on production from our pT-enhanced dimuon search. Results for the isolated sample are shown in Fig. 3, for the three pT cuts. Due to trigger-related effects, we show results only for GeV for the inclusive subsample; below 20 GeV, the and GeV samples are very similar and thus redundant, while between 20 and 35 GeV, the rapid variation of the data makes our methods unreliable. Results could be obtained if the shape of the trigger threshold could be predicted precisely a priori, but this is not possible for us, especially for the prompt sample.
The left column in Fig. 3 shows the p-values as a function of the dimuon mass, and the right column shows the observed and expected 95% upper bounds on the quantity
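$$\sigma \times \mathcal{B}(\mu^+\mu^-) \times \mathcal{A} \times \epsilon_{\mathrm{trig/reco}} \times \epsilon_{\mathrm{iso}} \tag{7}$$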
namely, the product of the production cross section, its branching fraction to muons, the acceptance for events to pass our cuts, the combined dimuon trigger/reconstruction efficiency for muons in these events, and the corresponding dimuon isolation efficiency.
Similar results for the prompt sample are shown in Fig. 4. Since there is no need to account for an isolation efficiency, our bound is on
$$\sigma \times \mathcal{B}(\mu^+\mu^-) \times \mathcal{A} \times \epsilon_{\mathrm{trig/reco}} \tag{8}$$
Note that we have explicitly corrected for the IP cut efficiency; see Table 4.
IV.2 Use of the Results
To use the results of Figs. 3 and 4 in a model-specific search, one must generate a signal and compute its acceptance and efficiencies, and then combine that with our limits to obtain a bound on the signal cross section times branching ratio. For this reason, Table 4 does not include any uncertainties on the acceptance or the efficiencies , since these depend on the specific model that one wants to constrain. The degree of detail with which this must be done depends on the goals of the user. In many applications, knowing limits to within a factor of 2 is sufficient, and it is rare that knowing them better than 10% is both necessary and feasible. Indeed, signal generation is often done at tree level, or at best at one loop, meaning that substantial uncertainties are intrinsic to the methodology.
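The following is a minimal sketch of the recasting step described above. It assumes the user has already estimated the acceptance and efficiencies for their own model. `cross_section_limit_fb` is a hypothetical helper. For the prompt sample, the isolation efficiency is omitted (left at 1), since that bound does not include it.

```python
def cross_section_limit_fb(product_limit_fb, acceptance, eff_trig_reco, eff_iso=1.0):
    """Recast a bound on sigma*BR*A*eps (times eps_iso for the isolated sample)
    into a bound on sigma*BR, in fb.

    Leave eff_iso = 1 for the prompt sample, whose bound contains no isolation efficiency.
    """
    denom = acceptance * eff_trig_reco * eff_iso
    if not 0.0 < denom <= 1.0:
        raise ValueError("acceptance and efficiencies must lie in (0, 1]")
    return product_limit_fb / denom
```

For example, suppose the bound is 40 fb with A = 0.5, ε = 0.8, and ε_iso = 0.5 (all hypothetical values). The recast bound on σ × BR is then 200 fb.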
The trigger and reconstruction efficiency , while not constant, generically has weak model dependence. Under many circumstances, unless high precision is needed, it is reasonable to take and combine this uncertainty with the comparable or larger uncertainties on the signal generator. A key exception arises when the typical has a large transverse boost with , in which case the muons can often be so collimated that the muon trigger system may fail to detect both muons. [Footnote 12: This effect, and a corresponding precipitous loss in efficiency in the forward region, can be seen clearly in the distribution of the in the CMS11a sample.] This situation requires a dedicated study of .
By contrast, the isolation efficiency and the acceptance can depend strongly on the specific signal model and its parameters; see Sec. V. Fortunately, acceptance is very similar at generator level and detector level. For isolation, which we have studied using a combination of CMS MC and CMS data, the situation is more complex. If the generator-level dimuon isolation efficiency is low, below 60%–70%, the prompt sample should be used instead of the isolated sample, and is not needed. If it is high () at generator level, then the absolute difference between generator- and detector-level efficiencies is typically less than 10% and so an uncertainty of this order may be taken. In the region between, the differences between generator and detector level must be studied with more care. However, for the limits with a cut of 25 or 60 GeV, a detector-level isolation efficiency of – makes the sensitivities of the prompt and isolated samples comparable. The user can then choose whether to use the prompt samples, at the cost of slightly lower but more certain sensitivity, or to study the isolation with more precision so as to benefit from the slightly higher sensitivity of the isolated samples.
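To decide between the samples, one can divide the isolated-sample bound by the detector-level isolation efficiency and compare it directly with the prompt-sample bound. The sketch below does this and also returns the break-even efficiency. The example inputs are the approximate expected bounds quoted in Sec. IV.3: about 40 fb (isolated) vs. 60 fb (prompt) for the 25 GeV cut, and 15 fb vs. 20 fb for the 60 GeV cut. The sketch ignores the uncertainty on ε_iso and any difference between observed and expected limits. `preferred_sample` is a hypothetical name.

```python
def preferred_sample(iso_bound_fb, prompt_bound_fb, eff_iso):
    """Compare isolated- and prompt-sample bounds on the same footing.

    The isolated bound includes eps_iso, so dividing it out gives a bound on
    sigma*BR*A*eps comparable to the prompt one.
    Returns (sample name, bound on sigma*BR*A*eps in fb, break-even eps_iso).
    """
    iso_equiv = iso_bound_fb / eff_iso
    break_even = iso_bound_fb / prompt_bound_fb
    if iso_equiv < prompt_bound_fb:
        return "isolated", iso_equiv, break_even
    return "prompt", prompt_bound_fb, break_even
```

With those inputs, the break-even isolation efficiency is about 0.67 for the 25 GeV cut and 0.75 for the 60 GeV cut.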
A user requiring higher precision will need to estimate and , and their uncertainties, as we have done in our study above, using information from CMS MC and CMS data, as well as data/MC comparison studies such as those in Ref. Chatrchyan:2012xi . Details of how we performed these estimates will be given in future work. [Footnote 13: Specifically, since the resolution, the trigger/reconstruction efficiency, and the conversion factors from generator-level to detector-level isolation efficiency are dominantly a function of the single muon and , we may try in the future to release this information in the same format as Ref. Aaij:2018xpt to allow for easier recasting of our bounds.] The precision user will also need to account for uncertainties on the acceptance , and possible important corrections and uncertainties due to the muon resolution and scale factor. Finally, the user must estimate the appropriate signal line shape and resolution to confirm it is within the uncertainties of our assumptions in Sec. III.2, or if not, must correct for it, replacing our line shape with one appropriate to another model. In addition, the precision user should consider that there are small residual uncertainties in the choice of window and fitting function in Sec. III.4, and there is no agreed-upon procedure for quantifying such uncertainties in the literature.
IV.3 Interpretation of the Limits
Let us now examine the results of Figs. 3 and 4, keeping in mind that the prompt and isolated samples overlap (as do the samples with different pT cuts) and are therefore correlated. For – GeV, i.e. where the trigger is efficient, the pT > 25 (60) GeV cut gives expected bounds, relative to the sample with no pT cut, that are smaller by a factor of for the isolated sample and a factor of for the prompt sample. For well below GeV, the GeV cut gives expected bounds smaller than the sample by slightly less (more) than a factor of for the isolated (prompt) sample. More specifically, in the isolated sample, our expected bounds are in the range of 40 (15) fb for pT > 25 (60) GeV, and correspondingly 60 (20) fb for the prompt sample.
The most significant excursions from expectation in the p-value plots are for the inclusive prompt sample, in the 2–3σ range. However, an estimate of the global p-value for this plot, following the methods of Ref. Gross:2010qma , gives 0.032, slightly below 2σ significance. (This result is obtained by counting up-crossings at a baseline significance-squared of ; changing this to or leaves the answer nearly unchanged.) The global significance of the other plots is below 1σ, including the prompt GeV subsample, whose largest local excess (discussed further below) is nearly 3σ.
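The up-crossing method of Ref. Gross:2010qma can be sketched in a few lines. First, count how often the significance-squared curve crosses a baseline u0 from below. Then extrapolate to the largest observed excess. This sketch assumes one-sided significances and ignores deficits. The scan values used in the test are made up. `global_p_value` and `z_from_p` are hypothetical names.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal (mean 0, sigma 1)


def z_from_p(p):
    """One-sided Gaussian significance corresponding to a p-value."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def global_p_value(local_z, u0):
    """Up-crossing estimate of the global p-value, after Gross and Vitells.

    local_z: signed local significances along the mass scan (deficits negative).
    u0: baseline level for the significance-squared u = z^2.
    Returns (global p-value, number of up-crossings of u0).
    """
    u = [max(z, 0.0) ** 2 for z in local_z]  # only excesses count
    n_up = sum(1 for a, b in zip(u, u[1:]) if a < u0 <= b)
    z_max = max(local_z)
    p_local = 1.0 - _STD_NORMAL.cdf(z_max)
    return p_local + n_up * math.exp(-(z_max ** 2 - u0) / 2.0), n_up
```

As a consistency check on the number quoted above, a global p-value of 0.032 corresponds to a one-sided significance of about 1.85σ. That is slightly below 2σ.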
One excess, at 29.5 GeV in the prompt sample with GeV, merits mention because it lies in a mass region that is already of some interest Heister:2016stz ; Sirunyan:2018wim (see Refs. Godunov:2018qsu ; vanBeveren:2018hnp for follow-up phenomenological studies). At this mass value, the background is rejected at local significance. Most likely this is a statistical fluctuation; two spikes of comparable size appear elsewhere in the same plot, and another appears at 32.5 GeV for GeV. However, let us briefly consider whether this excess could reflect a signal. No corresponding spike is present for the sample with GeV, but this does not by itself argue against a signal; we will see examples of signals with this behavior in Sec. V (e.g. the dotted red curve in Fig. 7). Note also that this excess is not necessarily inconsistent with the CMS results in this mass range Sirunyan:2018wim : although CMS has larger samples from both Run I and Run II, their analysis imposes different cuts (requiring a tag and a central jet veto), which would have very low acceptance for certain signals to which we would be sensitive. For any particular signal, a detailed recasting of the CMS results would be needed, which is beyond our scope here. [Footnote 14: Our analysis is insensitive to the specific excess in decays observed in Ref. Heister:2016stz , despite hundreds of expected events in CMS11a. As shown in Appendix B of Ref. Heister:2016stz , the typical pT of the excess is low in the frame. In the CMS11a data, then, our pT cut has very low acceptance, unless a second production mechanism at the LHC creates additional dimuons at higher pT.]
The most dramatic p-value spike in the GeV plot, at 42.7 GeV, has been unrealistically enhanced as a result of the large uncertainty in the resolution (adopted from the CMS recommendation; see Sec. III.2 above). This is reflected in the extreme narrowness of the spike and the lack of a similarly large excess in the limit plot at that mass. This effect can occur when an excess in the data has a width smaller than the central value , in which case the fit to a narrow signal may be excellent, resulting in a very small p-value. On the other hand, a narrower signal faces smaller backgrounds, so the observed limit (for a fixed p-value) is lower than would be expected for a significant signal with width . The excursion of the observed limit above the expected limit is therefore relatively small. A reduced uncertainty on on the low side, as our studies suggest would be appropriate, would make the p-values at such locations less significant, with little effect on the observed limits at those masses. We have confirmed this by profiling over the mass resolution using instead of the nominal ; the most dramatic effect is to reduce the significance of the p-value peak at 42.7 GeV by . Little or no effect on other p-values or on the limits is seen in this or other samples. Thus, at locations with significant p-value spikes but a much less significant excess in the limit plot, some caution is advisable.
We additionally caution that small changes in our fitting method can lead to shifts in the local significance of excesses of order . (Changes to the expected and observed limits are smaller.) For example, adjusting the fitting window from to or is sufficient to see effects of this size, as is using the cubic model instead of the quintic one.
One can only say, therefore, that the data show no clearly significant excesses. More important, however, is that applying our methods to Run II data would yield limits an order of magnitude stronger. Such an analysis would immediately reveal or exclude any particle hypothetically responsible for any of the excesses in our plots.
As a further check, we show the dimuon spectrum with in Fig. 5. For the GeV samples, the number of events is such that all the 2σ excursions can be seen by eye, providing a useful cross-check on our results. This figure also illustrates our earlier remark that, while there is virtually no QCD background in the isolated sample, the DY and QCD backgrounds are of similar size in the prompt sample, with QCD falling faster with pT than DY.
Let us note, finally, that only technical issues deter us from applying stronger pT cuts, or from searching at higher or lower masses. At higher masses and/or with higher pT cuts, the event counts become very low and our fitting procedure requires more care; the strategy of Ref. Williams:2015xfa may be helpful in this context. At lower masses and/or with higher pT cuts, muons become increasingly collimated. As mentioned above, excessive collimation causes the muon trigger system to become inefficient at separating the two muons, especially at high . A more careful study of trigger and reconstruction efficiencies (or use of the much larger single muon stream) would be required. We do not address these issues here, but nothing should prevent the LHC experimental collaborations from extending a pT-enhanced dimuon search strategy into these more extreme kinematic regions.
V Implications for Benchmark Scenarios
In this section, we briefly consider the implications of our bounds for benchmark signals. As discussed in Sec. IV.2, full application of the bounds requires a detailed discussion of how to obtain the various efficiencies for a particular model, which will be presented in future work. Here we demonstrate only that simple models exist in which remains large with our cuts (and is unsuppressed). For these models, which include cases where the is produced in the decay of a heavier particle, our pT-enhanced search strategy offers much improved sensitivity, because the trigger/reconstruction efficiency is mostly independent of the pT cut, and any significant change in isolation efficiency can be addressed through the judicious use of the isolated and prompt samples. By contrast, as we discuss at the end of this section, our strategy is not aimed at the minimal dark photon models Okun:1982xi ; Galison:1983pa ; Holdom:1985ag ; Pospelov:2007mp ; ArkaniHamed:2008qn ; Bjorken:2009mm , where is predominantly produced via kinetic mixing with the photon/ of strength .
V.1 Production of via Decay
In models where the is produced predominantly in the decay of a heavier particle, our pT cuts often increase sensitivity. To see this, consider the two simple theoretical models shown in Fig. 6, which both contain a scalar (possibly identified with the 125 GeV Higgs ) and a vector that decays to muons:
- M1: , where is a pseudoscalar dominantly decaying to gluon pairs (or perhaps to ); and
- M2: , , , where are neutral fermions and the decay of is similar to that of an LSP in R-parity-violating supersymmetry.
If in M1, or if either or in M2, the resonance will have substantial pT in most events. In both models, the final state of interest is a muon pair plus jets and no missing transverse momentum, for which there are few searches at the LHC. [Footnote 15: One exception is Ref. Chatrchyan:2012tw , though that search required the dimuons to reconstruct a boson and imposed the equivalent of to have mass above 500 GeV.] In Fig. 7 we show the dimuon pT distribution (normalized to unity) in model M1 for GeV and for two choices of , along with the pT distribution of the background in CMS11a between 39 and 41 GeV. The peaking of the signal above a rapidly falling background makes clear why our pT cuts are effective for models in this class.
In model M2, if , then could potentially occur and produce four-lepton events, which are powerfully constrained by multi-lepton searches. For any , however, there are choices of and where this is kinematically forbidden to occur on shell, while still allowing . Furthermore, in some models can be highly suppressed, for example by approximate symmetries or small couplings. In any case, our analysis is model independent, so the fact that other searches may rule out some parts of parameter space for particular models does not affect the validity of our results.
For model M1, we expect the isolated sample to yield the best limits, since the decay products of the pseudoscalar are unlikely to contaminate the muon isolation cones. To assess the degree to which the pT-enhanced dimuon strategy improves upon an inclusive search, consider the case that is identified with the 125 GeV Higgs boson. Using Pythia 8.235 Sjostrand:2014zea , we estimated the signal acceptance as a function of the pT cut, namely . The absolute signal acceptance for the inclusive search is –80% for . But the relevant quantity when evaluating the benefits of a pT cut is the relative acceptance between a pT-enhanced search with, say, GeV and an inclusive search with no pT cut. In Fig. 8 (left), we see that –100% when and . (This is not surprising since, for GeV, the momentum in the rest frame always exceeds 25 GeV.) Since our expected bounds for GeV and GeV are lower by a factor of 2–3 compared to those in an inclusive search (see Fig. 3), this cut allows us to strengthen the expected limit on for model M1 by over a substantial portion of the kinematically allowed range. [Footnote 16: Note that, in this model and within the mass range of interest, the efficiencies are weak functions of the pT cut.]
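The parenthetical remark follows from two-body kinematics. In the parent rest frame, each daughter carries momentum √λ/(2M), with λ = [M² − (m1 + m2)²][M² − (m1 − m2)²]. This momentum shrinks as the daughter masses grow. The sketch evaluates it for a 125 GeV parent and finds, by bisection, the largest partner mass that keeps the momentum above a threshold. The specific masses in the test are illustrative. `two_body_momentum` and `max_partner_mass` are hypothetical names.

```python
import math


def two_body_momentum(m_parent, m1, m2):
    """Daughter momentum (GeV) in the parent rest frame for parent -> 1 + 2."""
    if m1 + m2 > m_parent:
        raise ValueError("decay is kinematically forbidden")
    lam = (m_parent ** 2 - (m1 + m2) ** 2) * (m_parent ** 2 - (m1 - m2) ** 2)
    return math.sqrt(max(lam, 0.0)) / (2.0 * m_parent)


def max_partner_mass(m_parent, m1, p_min, tol=1e-6):
    """Largest m2 keeping the rest-frame momentum >= p_min, found by bisection.

    Valid because the momentum decreases monotonically as m2 grows.
    Returns None if even a massless partner falls below p_min.
    """
    lo, hi = 0.0, m_parent - m1
    if two_body_momentum(m_parent, m1, lo) < p_min:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if two_body_momentum(m_parent, m1, mid) >= p_min:
            lo = mid
        else:
            hi = mid
    return lo
```

For instance, consider a 125 GeV parent and a 40 GeV dimuon resonance. The rest-frame momentum stays above 25 GeV as long as the partner is lighter than about 73.7 GeV.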
The largest improvement comes in the range , where our expected bounds from the isolated sample for are in the range of 35–45 fb. For , we estimate , 47%, , and . (We will discuss these efficiencies further in future work; the isolation efficiency is smaller than in Table 2 because the muons are softer and the Higgs process is accompanied by more initial state radiation.) Using the observed bound from Fig. 3, we obtain a limit for of
[TABLE]
where we have conservatively taken the uncertainty on the 7 TeV total Higgs cross section to be with a flat prior. Because of the high relative signal acceptance, of 85%, this limit is more than a factor of 2.5 lower than what is expected when no cut is applied. A simple scaling of our model-independent result suggests that limits of better than could be expected from LHC Run II data, even after a penalty from higher trigger thresholds.
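The gain from the cut combines two factors. The bound on σ × BR equals the bound on σ × BR × A × ε divided by A × ε. The ratio of inclusive to pT-cut limits is therefore the ratio of the model-independent bounds times the relative acceptance. This assumes, as footnote 16 notes for this model, that the efficiencies barely depend on the cut. `limit_improvement` is a hypothetical name.

```python
def limit_improvement(bound_ratio, relative_acceptance):
    """Factor by which a pT cut strengthens the limit on sigma*BR.

    bound_ratio: inclusive bound on sigma*BR*A*eps divided by the bound with the pT cut.
    relative_acceptance: A(with pT cut) / A(inclusive) for the signal model.
    Assumes the efficiencies are (nearly) independent of the pT cut.
    """
    if bound_ratio <= 0.0 or not 0.0 < relative_acceptance <= 1.0:
        raise ValueError("need bound_ratio > 0 and 0 < relative_acceptance <= 1")
    return bound_ratio * relative_acceptance
```

For example, a bound ratio of 3 combined with a relative acceptance of 85% gives an improvement factor of 2.55. A bound ratio of 2 gives 1.7.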
Of course, a search targeted specifically at this model could obtain even stronger limits through an - and -dependent cut and by adding the channel. In this context, it is interesting to consider other models to which our limit applies and which have been constrained by existing analyses. Both CMS Khachatryan:2017mnf and ATLAS Aaboud:2018esj have searched for , whose signature is identical to ours if and . Both analyses required two b-tagged jets and constrained the jets and muons to reconstruct a Higgs boson; ATLAS further required that the invariant mass of the jets be similar to that of the muons. Using 19.7 fb⁻¹ of 8 TeV data, CMS obtained a limit (for GeV) of , which was also achieved by ATLAS with 36.1 fb⁻¹ of 13 TeV data. The order-of-magnitude improvement compared to Eq. (9) is not surprising considering the higher energy and integrated luminosity, along with the optimized targeting of a particular model, which greatly reduces background. On the other hand, our limit continues to apply with little change even if , and also applies to variants of model M1 in which the does not decay to , situations to which the ATLAS and CMS limits do not generally apply. This illustrates the complementarity of targeted and model-independent search strategies, and the importance of each.
If , then the pT-enhanced strategy yields a higher relative acceptance, and the pT cut can be raised. As an example, we show in Fig. 8 (right) the relative acceptance of the dimuon GeV cut, for GeV. With this cut, expected limits on can improve by as much as a factor of 5 relative to an inclusive search.
For model M2, either the isolated or prompt samples could yield the stronger limit, depending on the precise mass hierarchy. Specifically, in the regime , the is boosted, so the and produced in its decay are both boosted and collimated, as are their decay products. Therefore, the muon isolation efficiency for the signal will be degraded, and the prompt sample may give better limits in this regime. We leave further details about M2 to future work. Here we simply note that, according to our Pythia 8 simulation, both for GeV and for GeV are much higher than 50% in much of the kinematic range, again implying that a pT-enhanced search can significantly outperform an inclusive search.
Beyond dimuon resonance searches, other LHC analyses could be sensitive to models such as M1 and M2. If the is the Higgs boson or is produced by mixing with the Higgs, then and production rates are not negligible. In such cases, the pT-enhanced search described here should be compared not only with an inclusive search of the dimuon spectrum but also with multilepton searches. At the same integrated luminosity, the multilepton signal from is two orders of magnitude smaller than the total cross section, but in certain kinematic regimes it has small backgrounds. The relative sensitivity of the two classes of searches may depend on the model and its parameters, on the integrated luminosity, and on the specific design of the multilepton search, whose efficiencies and acceptance for low-pT leptons must be carefully accounted for. We have not attempted a detailed comparison, but for the model and parameters corresponding to our limit in M1, Eq. (9), fewer than four multilepton events arise for fb⁻¹, before accounting for efficiencies and acceptance. Even with the full Run I data set, losses due to efficiencies and acceptance suggest that a limit from multilepton searches will not dramatically improve on Eq. (9). Run II multilepton searches at ATLAS and CMS (such as Refs. Khachatryan:2017qgo ; CMS-PAS-EXO-18-005 ; Sirunyan:2017qkz ) could presumably set stronger limits than we can achieve using CMS11a, but it is not obvious how they would compare with our method applied to the full Run II data set; a detailed study would be required.
However, if is produced not by mixing with the Higgs but through a separate coupling to gluons, then the and processes are absent, eliminating the multilepton signal. Likewise, if the muons are often non-isolated, the multilepton search loses its sensitivity. In such cases, our pT-enhanced dimuon search competes only with inclusive dimuon searches, and often performs better, as we have already seen. This is likely true for many other models in which a high-pT dilepton resonance is the dominant observable effect. For such models, limits obtained from the results presented here may improve upon existing public limits, though a complete study of the Run II literature would be needed to confirm this.
Most importantly, when applied to the Run II data set, the pT-enhanced search strategy should give bounds that are several times smaller than those of a Run II inclusive search, and up to an order of magnitude below those presented here. We therefore view the discovery potential of this strategy as noteworthy.
V.2 Production of via Kinetic Mixing: The Dark Photon Scenario
By contrast, our pT-enhanced search strategy is not effective, and is indeed counterproductive, for the popular benchmark dimuon resonance scenario known as the minimal dark photon model Okun:1982xi ; Galison:1983pa ; Holdom:1985ag ; Pospelov:2007mp ; ArkaniHamed:2008qn ; Bjorken:2009mm . Here, is predominantly produced via kinetic mixing with the photon/ of strength , and the pT distribution of the signal is the same as for the DY background. Consequently, any cut on pT reduces sensitivity to , because it removes signal without changing . (As discussed in Ref. Hoenig:2014dsa , imposing a cut on is still useful to avoid the turn-on behavior of the dimuon trigger.)
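The argument can be made quantitative in the background-dominated, Gaussian regime, where the expected limit scales as √B/S. Suppose a cut keeps a fraction f_S of the signal and a fraction f_B of the background. It then changes the sensitivity by a factor f_S/√f_B. If the signal pT distribution matches the DY background, then f_S = f_B and the factor is √f_B < 1. The efficiencies in the test are illustrative. `cut_sensitivity_gain` is a hypothetical name.

```python
import math


def cut_sensitivity_gain(signal_eff, background_eff):
    """Change in S/sqrt(B) when a cut keeps fractions signal_eff and background_eff.

    Background-dominated, Gaussian approximation: the expected limit scales as
    sqrt(B)/S, so a gain above 1 means a stronger expected limit.
    """
    if not (0.0 < signal_eff <= 1.0 and 0.0 < background_eff <= 1.0):
        raise ValueError("efficiencies must lie in (0, 1]")
    return signal_eff / math.sqrt(background_eff)
```

A DY-like signal (equal efficiencies) always loses sensitivity. A signal that keeps most of its events while the background is cut hard can gain substantially.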
Nevertheless, our inclusive search (without a dimuon pT cut) in the isolated sample can be compared to previous results. At present, LHCb has the best LHC limits in the 10.6–70 GeV mass range Ilten:2016tkc ; Aaij:2017rft , though BaBar is more sensitive below 10 GeV Lees:2014xha and future ATLAS/CMS searches are expected to be more sensitive above 40 GeV Curtin:2014cca . (For a recent study of different dark photon and vector resonance bounds, see Refs. Ilten:2018crw ; Bauer:2018onh .) The LHCb data sample has a lower integrated luminosity (1.6 fb⁻¹) and narrower acceptance than the CMS11a sample, but the higher production rate at 13 TeV more than compensates. Thus, in the region above 35 GeV, our limits on the kinetic-mixing strength from this inclusive isolated subsample should be comparable to, but slightly weaker than, those of LHCb Aaij:2017rft . Following the analysis of Ref. Ilten:2016tkc , we obtain an estimated limit at a benchmark mass point that confirms this expectation. (A less stringent limit was estimated in Ref. Hoenig:2014dsa due to a more conservative treatment of the dimuon mass resolution.) At lower masses, where the dimuon trigger effectively already imposes a pT cut, our limits on the kinetic-mixing strength are further weakened.
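Translating a cross-section limit into a bound on the kinetic-mixing parameter (conventionally written ε) uses the fact that, in the minimal dark photon model, production proceeds only through mixing, so σ scales as ε², while the dimuon branching fraction is independent of ε at fixed mass. The sketch below shows only that scaling step; a full recast along the lines of Ref. Ilten:2016tkc also folds in acceptance, efficiency, and mass resolution. The function name and all numbers are hypothetical.

```python
import math

def epsilon_upper_limit(sigma_br_limit_fb: float, sigma_br_ref_fb: float,
                        eps_ref: float) -> float:
    """Translate a limit on sigma*BR(A' -> mu mu) into a limit on the kinetic mixing.

    Assumes sigma*BR scales exactly as eps**2 at fixed dark-photon mass, i.e. production
    proceeds only through mixing and the dimuon branching fraction does not depend on eps.
    sigma_br_ref_fb is the model prediction evaluated at the reference mixing eps_ref.
    """
    return eps_ref * math.sqrt(sigma_br_limit_fb / sigma_br_ref_fb)

# Hypothetical numbers: a 40 fb limit and a 10 fb model prediction at eps = 1e-3.
print(epsilon_upper_limit(40.0, 10.0, 1e-3))
```

Because of the square root, a factor of 4 weaker cross-section limit costs only a factor of 2 in the mixing bound.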
VI Discussion
Using 2.11 fb⁻¹ of CMS Open Data from 2011, we performed a model-independent pT-enhanced search for a new particle decaying to dimuons. We showed how exploiting moderately boosted kinematics can give significantly lower bounds on a product of physics and detector quantities, because a simple cut on the pT of the dimuon system sharply reduces QCD and DY backgrounds. As long as the new particle is typically produced in the decay of a heavier particle, this type of cut often preserves signal acceptance, and so our results will lead to improved limits on a wide class of models. Our results indicate that limits on some classes of signal models can improve by up to a factor of 9 relative to those from an inclusive dimuon search at the same luminosity. Still greater improvements could be achieved in some models by using even harder pT cuts. A similar strategy would be relevant for diphoton resonances from a particle produced mainly in the decays of heavier particles; see Refs. Strassler:2006im ; Chang:2006bw ; Juknevich:2009ji .
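The gain from a pT cut can be estimated in the background-dominated regime, where the excludable signal yield scales as √B. If the cut keeps a fraction ε_S of the signal and ε_B of the background, the expected limit on the model-level σ·BR improves by ε_S/√ε_B relative to no cut. The sketch below assumes that scaling; the efficiencies are hypothetical and are not the values behind the factor of 9 quoted above.

```python
import math

def limit_improvement(eff_sig: float, eff_bkg: float) -> float:
    """Factor by which an expected limit on sigma*BR improves after a cut.

    Background-dominated regime: the excluded signal yield scales as sqrt(B), so the
    limit on sigma*BR scales as sqrt(eff_bkg) / eff_sig relative to no cut.
    """
    return eff_sig / math.sqrt(eff_bkg)

# Hypothetical efficiencies for a dimuon pT cut (illustrative, not from the paper):
# a resonance from heavy-particle decays keeps most of its events, DY + QCD mostly do not.
print(f"improvement factor: {limit_improvement(eff_sig=0.8, eff_bkg=0.02):.1f}")
```

A value below 1 signals that the cut hurts, as in the dark photon case where eff_sig equals eff_bkg.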
We argued that there exist reasonable and simple models for which a pT-enhanced search would set better limits than any other search strategy implemented to date. Although we studied only the dimuon final state, a combination with dielectrons would further improve the limits on many models. With the much larger integrated luminosity collected during Run II and the higher signal cross sections at 13 TeV (partially counterbalanced by higher trigger thresholds), we estimate that our bounds could shrink by an order of magnitude. Thus, in LHC Run II data, the pT-enhanced search strategy would have considerable discovery potential for a diverse collection of theoretical models over a wide range of resonance masses.
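A rough version of the Run II extrapolation follows from the same √B scaling: the excludable signal yield grows as √(σ_B L), while a fixed model's predicted yield grows as σ_S L. The sketch below computes the resulting ratio of expected limits on a model's signal strength. The inputs are assumptions for illustration only: about 137 fb⁻¹ is taken as the approximate size of the full CMS Run II data set, and the factor-of-2 growth in signal and background rates (net of trigger losses) is invented.

```python
import math

def signal_strength_limit_ratio(lumi_ratio: float, sig_xsec_ratio: float,
                                bkg_xsec_ratio: float) -> float:
    """New/old ratio of the expected limit on a fixed model's signal strength.

    Background-dominated regime: the excludable signal yield scales as sqrt(B) with
    B = sigma_B * L, while the model's signal yield scales as sigma_S * L. Trigger losses
    can be folded into the cross-section ratios as effective efficiencies.
    """
    excludable = math.sqrt(bkg_xsec_ratio * lumi_ratio)
    predicted = sig_xsec_ratio * lumi_ratio
    return excludable / predicted

# Hypothetical inputs: roughly 137 fb^-1 versus 2.11 fb^-1, with signal and background
# rates both assumed to grow by a factor of 2 from 7 to 13 TeV after trigger losses.
ratio = signal_strength_limit_ratio(lumi_ratio=137 / 2.11, sig_xsec_ratio=2.0,
                                    bkg_xsec_ratio=2.0)
print(f"expected limit shrinks by a factor of about {1 / ratio:.0f}")
```

Under these toy assumptions the gain is of order ten, in line with the order-of-magnitude estimate in the text, but a real projection would need the actual 13 TeV backgrounds and trigger thresholds.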
We have also emphasized the importance of searching both with and without an isolation requirement on the leptons. Backgrounds increase by a factor of order 2 when the isolation cut is dropped and replaced with a stringent impact-parameter (IP) cut. On the other hand, in models where the leptons are embedded in a cluster of particles produced in a hidden sector Strassler:2006im ; Han:2007ae ; Baumgart:2009tn , the dimuon isolation efficiency may easily be low enough that the prompt sample provides more sensitivity than the isolated sample.
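The trade-off between the two samples can be made concrete with the S/√B approximation: if dropping isolation multiplies the background by a factor k (of order 2 here) and the prompt selection keeps a signal fraction ε_prompt, the prompt sample wins whenever the isolation efficiency falls below ε_prompt/√k. The sketch below assumes ε_prompt = 1 by default, which is a hypothetical simplification; the function names are illustrative, and the threshold it prints is a consequence of these assumptions rather than a number taken from the paper.

```python
import math

def prompt_over_isolated(eff_iso: float, bkg_ratio: float = 2.0,
                         eff_prompt: float = 1.0) -> float:
    """Ratio of S/sqrt(B) in the prompt sample to that in the isolated sample.

    eff_iso:    fraction of signal dimuons passing the isolation requirement.
    eff_prompt: fraction of signal passing the prompt (impact-parameter) selection.
    bkg_ratio:  B_prompt / B_isolated, taken as ~2 following the text.
    """
    return (eff_prompt / math.sqrt(bkg_ratio)) / eff_iso

def crossover_isolation_efficiency(bkg_ratio: float = 2.0, eff_prompt: float = 1.0) -> float:
    """Isolation efficiency below which the prompt sample is more sensitive."""
    return eff_prompt / math.sqrt(bkg_ratio)

print(f"crossover isolation efficiency: {crossover_isolation_efficiency():.2f}")
for eff_iso in (0.9, 0.5, 0.2):
    better = "prompt" if prompt_over_isolated(eff_iso) > 1 else "isolated"
    print(f"eff_iso = {eff_iso:.1f}: {better} sample more sensitive")
```

Running both selections, as the text recommends, avoids having to know in advance which side of this crossover a given model falls on.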
Finally, we have illustrated for the first time that open collider data has the potential to assist the BSM search program at the LHC. In carrying out a search whose results, while limited, do probe new ground, we hope we have demonstrated two things. First, open data can be used to study questions which are outside the mainstream search program, and thus explore new territory. Second, when important backgrounds are challenging for theorists to simulate reliably, open data can provide those backgrounds directly, making phenomenological studies or prototype analyses far more accurate. As an example, our prompt sample has large QCD backgrounds, and we could not have selected our IP cuts with confidence without the explicit knowledge of the backgrounds obtained from the CMS Open Data. In our view, although searches using current open data are unlikely to uncover BSM phenomena on their own, they can help demonstrate the value of certain search strategies and justify the application of those strategies by the experimental collaborations on much larger data sets.
Acknowledgements.
We thank CERN, the CMS collaboration, and the CMS Data Preservation and Open Access (DPOA) team for making research-grade collider data available to the public. We thank R. Leane, R. Mastandrea, and especially R. D’Agnolo for assistance at certain points in this work. We thank E. Carrera, K. Lassila-Perini, and the CMS DPOA team for help interpreting the muon information in the CMS Open Data. We are grateful for conversations with K. Cranmer, A. Geiser, S. Gori, B. Nachman, S. Somalwar, and especially P. Harris, S. Rappoccio, and M. Williams whose comments on a preliminary draft contributed to significant improvements in our methods. C.C. is supported by the Office of High Energy Physics of the U.S. Department of Energy (DOE) under grant DE-SC0013607. Y.S. thanks the Aspen Center for Physics for support. M.J.S. thanks the Department of Physics at Harvard University for hospitality. J.T. is supported by the DOE under grant DE-SC0012567 and by the Simons Foundation through a Simons Fellowship in Theoretical Physics. W.X. is supported by the DOE under grants DE-SC0012567 and DE-SC0013999 and by the European Research Council grant NEO-NAT.
References
- [1] CERN, “CERN Open Data Portal,” http://opendata.cern.ch
- [2] CMS Collaboration, “CMS Open Data,” http://opendata.cern.ch/research/CMS
- [3] CMS Collaboration, “2018 CMS data preservation, re-use and open access policy,” CERN Open Data Portal (2018), DOI:10.7483/OPENDATA.CMS.7347.JDWH
- [4] A. Tripathee, W. Xue, A. Larkoski, S. Marzani, and J. Thaler, “Jet Substructure Studies with CMS Open Data,” Phys. Rev. D 96 no. 7, (2017) 074003, arXiv:1704.05842 [hep-ph]
- [5] A. Larkoski, S. Marzani, J. Thaler, A. Tripathee, and W. Xue, “Exposing the QCD Splitting Function with CMS Open Data,” Phys. Rev. Lett. 119 no. 13, (2017) 132003, arXiv:1704.05066 [hep-ph]
- [6] C. F. Madrazo, I. H. Cacha, L. L. Iglesias, and J. M. de Lucas, “Application of a Convolutional Neural Network for image classification to the analysis of collisions in High Energy Physics,” arXiv:1708.07034 [cs.CV]
- [7] M. Andrews, M. Paulini, S. Gleyzer, and B. Poczos, “End-to-End Physics Event Classification with the CMS Open Data: Applying Image-based Deep Learning on Detector Data to Directly Classify Collision Events at the LHC,” arXiv:1807.11916 [hep-ex]
- [8] M. Andrews, J. Alison, S. An, P. Bryant, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, and E. Usai, “End-to-End Jet Classification of Quarks and Gluons with the CMS Open Data,” arXiv:1902.08276 [hep-ex]
