Inclusive Flavour Tagging Algorithm

Tatiana Likhomanenko; Denis Derkach; Alex Rogozhnikov

arXiv:1705.08707·hep-ex·May 25, 2017


TL;DR

This paper introduces an inclusive flavour-tagging algorithm for neutral B mesons that leverages machine learning to improve performance in the challenging environment of the LHC, applicable to any proton-proton experiment.

Contribution

It presents a new probabilistic, machine learning-based flavour-tagging algorithm that reduces dependence on lower-level identification, enhancing B meson tagging efficiency.

Findings

01 Improved tagging performance in LHCb data.

02 Reduced reliance on physics process information.

03 Applicable to various proton-proton experiments.

Abstract

Identifying the production flavour of neutral $B$ mesons is one of the most important components needed in the study of time-dependent $CP$ violation. The harsh environment of the Large Hadron Collider makes it particularly hard to succeed in this task. We present an inclusive flavour-tagging algorithm as an upgrade of the algorithms currently used by the LHCb experiment. Specifically, a probabilistic model which efficiently combines information from reconstructed vertices and tracks using machine learning is proposed. The algorithm does not use information about the underlying physics process. It reduces the dependence on the performance of lower-level identification capacities and thus increases the overall performance. The proposed inclusive flavour-tagging algorithm can be used to tag the flavour of $B$ mesons in any proton-proton experiment.

Equations

$$\frac{P(\bar{b})}{P(b)}=\prod_{\text{components}}\frac{P(\bar{b}|B,\text{component},s_{p})}{P(b|B,\text{component},s_{p})}=\prod_{\text{components}}\left(\frac{P(s_{b}\cdot s_{p}>0|B,\text{component})}{P(s_{b}\cdot s_{p}<0|B,\text{component})}\right)^{s_{p}}$$

$$\text{target}=\begin{cases}1,&\text{if }s_{b}\cdot s_{p}>0,\\0,&\text{if }s_{b}\cdot s_{p}<0.\end{cases}$$


Full text

Inclusive Flavour Tagging Algorithm

Tatiana Likhomanenko (1, 2, 3)

Denis Derkach (1, 2)

Alex Rogozhnikov (1, 2)

1 National Research University Higher School of Economics (HSE), RU

2 Yandex School of Data Analysis (YSDA), RU

3 NRC “Kurchatov Institute”, RU


1 Introduction

$B$ mesons contain either a $b$ or a $\bar{b}$ quark, which defines their flavour. The flavour-tagging (FT) algorithms determine the flavour of a reconstructed signal $B$ meson candidate at the production point in proton-proton collisions. The FT algorithms are used to measure differences in the behaviour of particles and antiparticles (e.g. measurements of flavour oscillations of $B_{(s)}^{0}$ mesons) and $CP$ asymmetries to probe the validity of the Standard Model of particle physics.

The production of a $B$ meson is usually accompanied by the production of another $b$ hadron and other particles like kaons, pions, and protons (see Figure 1). At hadron collider experiments the FT algorithms are usually divided into two groups:

• opposite side (OS) taggers use the decay products of $b$ hadrons that are produced together with the signal $B$ (see [1]);

• same side (SS) taggers exploit light particles that evolve from the hadronisation process of the signal $B$ meson like kaons, pions, and protons (see [2]).

The current version of the FT algorithm used by the LHCb [1, 2, 3], CMS [4], ATLAS [5], CDF [6] and D0 [7] experiments tries to identify tracks and vertices produced on the OS/SS sides (SS tagging is done only by LHCb and CDF). It works as follows:

1. The first step finds all tagging tracks and tagging vertices, where the latter are only used in OS taggers.¹ Both the OS and SS algorithms find one, maybe several, tagging tracks, while the OS algorithm also finds one tagging vertex.

(a) for the OS algorithm, only lepton and kaon tracks and $b$ hadron decay or secondary charm hadron vertices are considered;

(b) for the SS algorithm, only pion, proton or kaon tracks are considered;

(c) other physically motivated selections are applied to leave only tracks/vertices which have the characteristics that help to define the flavour;

(d) if more than one track (vertex) is left after the previous steps for the OS or SS, a special rule is applied to select an appropriate track (vertex).

2. Each of the OS/SS algorithms predicts a flavour based on the charge of the tagging track (vertex). Other characteristics of the track (vertex) are used to estimate the probability of an incorrectly predicted flavour (i.e. the misclassification, or mistag, probability).

3. Finally, the predictions of the OS and SS taggers are usually combined in the flavour-tagged analyses.

¹ Note, a tagging track/vertex is a track/vertex whose charge is used to predict the flavour of the signal $B$. Ideal choices for a tagging track are pion, kaon or proton tracks involved in the SS tagger and lepton or kaon tracks coming from a $b$ hadron decay in the OS tagger. The ideal choice of a tagging vertex for the OS is a $b$ hadron decay vertex or a charm hadron vertex coming from a $b$ hadron decay.

The first step naturally follows our physics intuition, but it requires setting ad hoc conditions that rely on a deep understanding of the physics processes. From an analysis point of view, this pipeline has several disadvantages:

• the algorithm relies heavily on the particle identification and reconstructed variables during the selection of the tagging tracks (vertices);

• the process of selecting a tagging particle is based on physics assumptions. This prevents the use of complex selection rules;

• a lot of information is lost since only a couple of tracks (vertices) are selected.

This paper describes a new approach to determining the signal $B$ flavour. It exploits all available information in an event without relying on knowledge of the underlying physics processes, such as a search for a tagging track (vertex).

2 Inclusive Probabilistic Model

The algorithm starts using an inclusive probabilistic model, which combines information from all tracks and vertices for each selected event containing a $B$ candidate to tag. It uses an assumption similar to a naive Bayes model. Specifically, it assumes a strong independence of the tagging information available in the tracks and vertices.²

² Note that the use of a varying number of multipliers for each event is atypical for a naive Bayes approach in machine learning.

Let “components” refer to both tracks and vertices. Additionally, let $s_{p}$ be the charge sign of a component ( $+1$ or $-1$ ) and $s_{b}$ be the flavour of the signal $B$ ( $+1$ for $\bar{b}$ and $-1$ for $b$ ). Then, assume the following:

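$$\frac{P(\bar{b})}{P(b)}=\prod_{\text{components}}\frac{P(\bar{b}|B,\text{component},s_{p})}{P(b|B,\text{component},s_{p})}=\prod_{\text{components}}\left(\frac{P(s_{b}\cdot s_{p}>0|B,\text{component})}{P(s_{b}\cdot s_{p}<0|B,\text{component})}\right)^{s_{p}}$$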

The last equality assumes that the spurious asymmetries introduced by different detection efficiencies for particles and antiparticles in the different regions of the detectors are negligible.

Using this formula, however, requires estimating the probabilities $P(s_{b}\cdot s_{p}>0|B,\text{component})$ and $P(s_{b}\cdot s_{p}<0|B,\text{component})$. Note that these probabilities are estimated from properties of the signal $B$ meson and of the track/vertex, but not from their charges.
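The product over components, $P(\bar{b})/P(b)=\prod\left(P(s_{b}\cdot s_{p}>0|B,\text{component})/P(s_{b}\cdot s_{p}<0|B,\text{component})\right)^{s_{p}}$, is a sum of signed log-odds once the logarithm is taken. The following is a minimal sketch of that combination for a single event, assuming the per-component probabilities are already available; the function name, the input layout and the clipping constant `eps` are illustrative choices, not taken from the paper.

```python
import math


def combine_components(components, eps=1e-12):
    """Return P(b-bar) for one event from per-component estimates.

    components: iterable of (p_same, s_p) pairs, where
      p_same = P(s_b * s_p > 0 | B, component) and s_p is +1 or -1.
    The (p_same, s_p) layout is a hypothetical convention for this sketch.
    """
    log_odds = 0.0
    for p_same, s_p in components:
        p = min(max(p_same, eps), 1.0 - eps)  # avoid log(0)
        # Each component multiplies the ratio by (p / (1 - p)) ** s_p.
        log_odds += s_p * (math.log(p) - math.log(1.0 - p))
    # P(b-bar) / P(b) = exp(log_odds) together with P(b-bar) + P(b) = 1.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Three components: one positive track and two negative ones.
event = [(0.60, +1), (0.50, -1), (0.55, -1)]
p_bbar = combine_components(event)
```

A component with probability 0.5 contributes nothing to the sum, and an event with no components returns 0.5, so uninformative tracks and vertices drop out without any explicit selection.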

This approach has several key properties:

• it combines all available information from the components of the events under the naive probabilistic model;

• it implicitly determines the tagging tracks and vertices by the value of the ratio of the probabilities. Most of the particles will have a very small contribution;

• it does not depend on the tagging particle type (i.e. pion, kaon, electron, muon, proton) and it is not split into OS and SS tagging algorithms;

• it is symmetric with respect to matter/antimatter due to model definition.

Thus, the proposed FT algorithm is an inclusive model.

3 Inclusive Training

Charged $B$ mesons can be tagged using the charge of their decay products, so the flavour of the meson is known ($\bar{b}$ for $B^{+}$ and $b$ for $B^{-}$). $B^{\pm}\to J/\psi[\mu^{+}\mu^{-}]K^{\pm}$ decays are therefore used for training. The charge of the kaon in the signal decay is used to independently infer the flavour of the $B$ meson at production: $P(\bar{b})=P(B^{+})$, $P(b)=P(B^{-})$. The inclusive model is applied to the LHCb data samples that contain reconstructed signal decays $B^{\pm}\to J/\psi[\mu^{+}\mu^{-}]K^{\pm}$. For all events, the tracks with a low probability of being ghosts (fake tracks) form the tracks dataset, and the vertices form the vertices dataset. Note that the tracks and vertices forming the reconstructed signal decay are excluded.

In the probabilistic model, the conditional probabilities $P(s_{b}\cdot s_{p}>0|B,\text{component})$ and $P(s_{b}\cdot s_{p}<0|B,\text{component})$ are unknown. We can estimate them using a classification model. The target for this classification model is:

$$\text{target}=\begin{cases}1,&\text{if }s_{b}\cdot s_{p}>0,\\0,&\text{if }s_{b}\cdot s_{p}<0.\end{cases}$$

Two gradient boosted decision tree (GBDT) models are trained to predict the conditional probability $P(s_{b}\cdot s_{p}>0|B,\text{component})$, one for tracks and one for vertices. The input observables are kinematic properties of the tracks, vertices and signal $B$ meson, the output of a machine-learning-based particle identification algorithm, and track quality criteria. For the $B$ meson, the following features are used: transverse momentum, polar angle, impact parameter with respect to the primary interaction, and pseudorapidity. For tracks, the particle identification algorithm output, polar angle, momentum and transverse momentum are used. Finally, for vertices, the features are the number of tracks forming the vertex, the mean of the track impact parameters, the mean of the track transverse momenta, the mass and momentum (calculated assuming the pion mass for the incoming tracks), the lifetime, and the angle between the signal $B$ and the vertex.
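The training target is 1 when $s_{b}\cdot s_{p}>0$ and 0 when $s_{b}\cdot s_{p}<0$, with one training row per track or vertex. The sketch below shows how such rows could be assembled for one $B^{\pm}$ event; the feature layout and the function name are hypothetical, and the numbers are made up purely for illustration.

```python
def make_training_rows(b_features, s_b, components):
    """Build (features, target) rows for one charged-B event.

    b_features: list of floats describing the signal B (hypothetical layout).
    s_b: +1 for B+ (b-bar), -1 for B- (b), taken from the signal kaon charge.
    components: list of (component_features, s_p) with s_p = +1 or -1.
    """
    rows = []
    for comp_features, s_p in components:
        # target = 1 if the component charge agrees with the B flavour sign.
        target = 1 if s_b * s_p > 0 else 0
        # The charge s_p enters only through the target, never as a feature.
        rows.append((list(b_features) + list(comp_features), target))
    return rows


# One B+ event with two tracks (illustrative values only).
rows = make_training_rows(
    b_features=[5.2, 0.03],
    s_b=+1,
    components=[([0.9, 1.1], +1), ([0.2, 0.4], -1)],
)
```

Rows built this way can be handed to any GBDT implementation that outputs class probabilities, with tracks and vertices kept in separate datasets since they have different features.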

4 Symmetric Calibration

The conditional probability $P(s_{b}\cdot s_{p}>0|B,\text{component})$ predicted by a classification model (i.e. by the GBDT) may be biased (see [8], [9]). Additionally, $P(\bar{b})$ and $P(b)$ computed by the probabilistic model may not be true probabilities due to the naive Bayes assumption. To compensate for these biases, the classifier output must be calibrated. Furthermore, the model should behave in the same way for particles and antiparticles, apart from small production and detection asymmetries. This means that distributions for $P(B^{+})$ and $P(B^{-})$ should be symmetric around $0.5$.

To calibrate the GBDT output, Platt scaling [10] and isotonic regression [11] were tried. For the GBDT output, Platt scaling provides better results than either isotonic regression or the uncalibrated probabilities. When calibrating the final $P(B^{+})$ and $P(B^{-})$, symmetric isotonic regression is used to preserve symmetry in the distributions: the calibration rule $f$ is required to be symmetric, i.e. $f(1-x)=1-f(x)$, where $x$ is $P(B^{+})$. Figure 2 shows distributions for $B^{+}$ and $B^{-}$ before and after the isotonic regression calibration. The comparison between the probability obtained from the inclusive model and a frequency-based estimate of the true probability is shown in Figure 3 before and after the calibration procedure. Distributions for $P(B^{+})$ and $P(B^{-})$ are checked to be symmetric around $0.5$ after calibration (see Figure 5). After the calibration the inclusive model has improved Brier and logarithmic scores (see the scoring rules [12]); at this stage Platt scaling gives worse scores than isotonic regression.
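One way to obtain a calibration rule with $f(1-x)=1-f(x)$ is to fit ordinary isotonic regression on the sample together with its mirror image $(1-x, 1-y)$. The sketch below does this with a plain pool-adjacent-violators pass and linear interpolation between pooled blocks. This construction is an assumption made for illustration; the text does not say how the symmetric isotonic regression was implemented, and the symmetry holds only up to floating-point rounding of $1-x$.

```python
from bisect import bisect_right


def fit_symmetric_isotonic(x, y):
    """Return a non-decreasing f with f(1 - p) = 1 - f(p).

    x: predicted P(B+) per event; y: 1 for a true B+, 0 for a true B-.
    Symmetry is imposed by also fitting the mirrored sample (1 - x, 1 - y).
    """
    pairs = list(zip(x, y))
    pts = sorted(pairs + [(1.0 - xi, 1.0 - yi) for xi, yi in pairs])
    # Group tied x values first so they always share one fitted value.
    groups = []  # [x, sum_y, count]
    for xi, yi in pts:
        if groups and groups[-1][0] == xi:
            groups[-1][1] += yi
            groups[-1][2] += 1
        else:
            groups.append([xi, yi, 1])
    # Pool adjacent violators; each block is [sum_x, sum_y, count].
    blocks = []
    for gx, gy, gn in groups:
        blocks.append([gx * gn, gy, gn])
        # Merge while the previous block mean is >= the last block mean.
        while len(blocks) > 1 and (
            blocks[-2][1] * blocks[-1][2] >= blocks[-1][1] * blocks[-2][2]
        ):
            sx, sy, n = blocks.pop()
            blocks[-1][0] += sx
            blocks[-1][1] += sy
            blocks[-1][2] += n
    xs = [b[0] / b[2] for b in blocks]  # mean x of each block
    vs = [b[1] / b[2] for b in blocks]  # fitted value of each block

    def f(p):
        if p <= xs[0]:
            return vs[0]
        if p >= xs[-1]:
            return vs[-1]
        i = bisect_right(xs, p)  # xs[i - 1] <= p < xs[i]
        t = (p - xs[i - 1]) / (xs[i] - xs[i - 1])
        return vs[i - 1] + t * (vs[i] - vs[i - 1])

    return f


calibrate = fit_symmetric_isotonic([0.1, 0.3, 0.35, 0.8], [0, 1, 0, 1])
```

Because the mirrored sample is invariant under $(x, y)\to(1-x, 1-y)$ and the isotonic fit is unique, the fitted blocks come out mirror-symmetric, which is what makes the interpolated rule symmetric.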

5 Quality Metric

The figure of merit of an FT algorithm is the effective efficiency (see [1, 2, 3]), since the overall statistical power of the flavour-tagged sample is proportional to it. The ROC curve is used as a proxy metric for the effective efficiency when optimizing the FT algorithm. Afterwards, the effective efficiency is checked to have increased with respect to the previous results.
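The effective efficiency is conventionally written as $\varepsilon_{\text{eff}}=\varepsilon_{\text{tag}}(1-2\omega)^{2}$, with $\varepsilon_{\text{tag}}$ the fraction of tagged events and $\omega$ the mistag probability. With calibrated per-event probabilities this becomes the average of $(1-2\eta_{i})^{2}$ over all events, where $\eta_{i}=\min(p_{i},1-p_{i})$. The sketch below computes that per-event form. The definition comes from standard flavour-tagging practice and is not spelled out in the text above; the function name is illustrative.

```python
def effective_efficiency(p_bplus):
    """Per-event tagging power: mean of (1 - 2 * eta)^2, eta = min(p, 1 - p).

    p_bplus: calibrated P(B+) for every event, with 0.5 for untagged events.
    """
    if not p_bplus:
        return 0.0
    # (1 - 2 * min(p, 1 - p))^2 equals (2p - 1)^2 for any p in [0, 1].
    return sum((2.0 * p - 1.0) ** 2 for p in p_bplus) / len(p_bplus)


# Two untagged events and two confidently tagged ones.
power = effective_efficiency([0.5, 0.5, 0.9, 0.1])
```

Events at 0.5 contribute zero, so untagged events lower the result exactly as a reduced tagging efficiency would.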

We analyze the ROC curves and check that the new version has a higher ROC curve at each point. The comparison of the ROC curves is shown in Figure 5. The AUC (area under the ROC curve) is 0.566 for the current OS FT and $0.641$ for the proposed inclusive model. The ROC curves and AUC values were computed for all events, including untagged events, to assess the overall quality of the algorithm.³

³ Untagged events are those events for which none of the tracks and vertices passed the selections; for them the probabilities are set to $P(B^{+})=P(B^{-})=0.5$.
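The AUC equals the probability that a randomly chosen $B^{+}$ event receives a higher $P(B^{+})$ than a randomly chosen $B^{-}$ event, with ties counted as one half. The sketch below computes it directly from that definition, so untagged events set to 0.5 are handled as ties. The quadratic pair loop is chosen for clarity, not speed, and the data are invented for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: P(B+) per event (0.5 for untagged events).
    labels: 1 for a true B+, 0 for a true B-.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties, e.g. two untagged events
    return wins / (len(pos) * len(neg))


# Five events, two of them untagged (score 0.5).
auc = roc_auc([0.8, 0.5, 0.5, 0.3, 0.6], [1, 1, 0, 0, 0])
```

A sample made only of untagged events gives an AUC of exactly 0.5, which is why including them pulls the overall figure towards the diagonal.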

6 Conclusion

We proposed a simple flavour-tagging technique that efficiently combines information from vertices and tracks using a machine learning approach. The inclusive flavour-tagging algorithm does not use information about the underlying physics process and is applicable to FT of $B$ mesons in proton-proton experiments. The results demonstrate a significant improvement on LHCb data: the ROC AUC increases from 0.566 for the current OS FT to 0.641 for the inclusive model.

References

[1] LHCb Collaboration. Opposite-side flavour tagging of $B$ mesons at the LHCb experiment. The European Physical Journal C 72.6 (2012): 1-16.
[2] LHCb Collaboration. A new algorithm for identifying the flavour of $B_{s}^{0}$ mesons at LHCb. Submitted to J. Instrum. arXiv:1602.07252 [hep-ex] (2016).
[3] LHCb Collaboration. $B$ flavour tagging using charm decays at the LHCb experiment. JINST 10 (2015) P10005.
[4] CMS Collaboration. Measurement of the $CP$ -violating weak phase $\phi_{s}$ and the decay width difference $\Delta\Gamma_{s}$ using the $B_{s}\to J/\psi\ \phi(1020)$ decay channel in pp collisions at $\sqrt{s}=8$ TeV. Submitted to Physics Letters B. arXiv:1507.07527 [hep-ex] (2015).
[5] ATLAS Collaboration. Flavor tagged time-dependent angular analysis of the $B_{s}\rightarrow J/\psi\phi$ decay and extraction of $\Delta\Gamma_{s}$ and the weak phase $\phi_{s}$ in ATLAS. Physical Review D 90.5 (2014): 052007.
[6] CDF Collaboration. Measurement of the bottom-strange meson mixing phase in the full CDF data set. Physical review letters 109.17 (2012): 171802.
[7] D0 Collaboration. Measurement of the $CP$ -violating phase $\phi_{S}^{J/\psi\phi}$ using the flavor-tagged decay $B_{s}^{0}\to J/\psi\phi$ in 8 fb$^{-1}$ of $p\bar{p}$ -collisions. Physical Review D 85.3 (2012): 032006.
[8] Winkler, R. L., Murphy, A. H. (1968). “Good” probability assessors. Journal of applied Meteorology, 7(5), 751-758.
[9] Niculescu-Mizil, A., Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning (pp. 625-632). ACM.
[10] Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3), 61-74.
[11] Zadrozny, B., Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 694-699). ACM.
[12] Bickel, J. E. (2007). Some comparisons among quadratic, spherical, and logarithmic scoring rules. Decision Analysis, 4(2), 49-6.
