A parallel algorithm for fast reconstruction of primary vertices on heterogeneous architectures

Agnieszka Dziurda; Maciej Giza; Vladimir V. Gligorov; Wouter Hulsbergen; Bogdan Kutsenko; Saverio Mariani; Niklas Nolte; Florian Reiss; Patrick Spradlin; Dorothea vom Bruch; Tomasz Wojton

PMC · DOI:10.1140/epjc/s10052-025-14225-7·June 2, 2025

Open Access

TL;DR

This paper introduces a new algorithm for quickly and accurately reconstructing particle collision vertices in the LHCb experiment using parallel computing.

Contribution

A novel parallel vertex reconstruction algorithm optimized for heterogeneous computing architectures is introduced.

Findings

01

The algorithm uses cluster finding in trajectory projections and adaptive vertex fitting.

02

Implementations on x86 and GPU architectures are discussed and optimized.

03

Performance is evaluated using simulated data samples.

Abstract

The physics programme of the LHCb experiment at the Large Hadron Collider requires an efficient and precise reconstruction of the particle collision vertices. The LHCb Upgrade detector relies on a fully software-based trigger with an online reconstruction rate of 30 MHz, necessitating fast vertex finding algorithms. This paper describes a new approach to vertex reconstruction developed for this purpose. The algorithm is based on cluster finding within a histogram of the particle trajectory projections along the beamline and on an adaptive vertex fit. Its implementations and optimisations on x86…

Figures13

Primary-vertex-reconstruction (left) efficiency and (right) *z* resolution as a function of the *z* coordinate of the simulated primary vertex. In both plots, the magenta curve refers to the $pp+p$He sample with the *pp*-optimized algorithm implementation, while the tuned performance is shown in green, blue and red for the *pp*, *p*He and …

Primary-vertex-reconstruction efficiency (left) and *z* resolution (right) as a function of the simulated $\hbox{PV}_z$ for the (red) …

Throughput of the algorithms optimised for GPU architecture on various GPU cards and of the x86 one on an AMD EPYC 72F3 server. This includes the preprocessing algorithms producing input to the primary vertex finding and the primary vertex finding algorithm itself. The relative measurement uncertainty of around 0.2% is too small to be seen in the figure

Fig. 13. Breakdown of the primary vertex reconstruction sequence optimised for (left) x86 and for (right) an RTX A5000 GPU architecture. The primary vertex finding algorithms add up to 4% and 10% of the total processing time, respectively. For th…

A comparison between (left) the average number of visible *pp* and *p*-gas interactions and between (right) the number of reconstructed VELO tracks in the PVs. For both comparisons, the minimum bias samples are simulated with Run 2 (dark blue histogram) and Run 3 (orange and light blue histograms) beam conditions

The VELO detector geometry, with modules depicted in blue. The *x–z* coordinate system is also shown. Reproduced from Ref. [11]

Simulated (left) *x* and (right) *z* distribution of *pp* collisions with Run 3 beam conditions

Fig. 4. (Left) distance between the *x*-position of the track, $x_\text{trk}$, when extrapolated to the *z*-position of its origin PV and the simulated *x*-position of the PV, …

Typical histogram filled by the PV reconstruction algorithm with the $z_\text{poca}$ values, following the method explained in the text. The orange markers indicate the position of the simulated reconstructible vertices, defined in Sect. 4. The right plot is a zoom of the top distribution on some of the identified peaks. These are shown as blue vertic…

Multivertex track weights as a function of $\chi^2$ for a vertex candidate with no competing vertices (blue), a competing vertex with …

Primary-vertex-reconstruction efficiency as a function of (left) its multiplicity and (right) its simulated *z* position. The red square, blue circle and green triangle points are obtained using the dedicated x86, GPU and Run2-like implementation, respectively, the grey histograms show the distribution of simulated primary vertices and the hollow red, blue and green points the number of reconstructed primary vertices in the x86, GPU and Run2-like cases, respectively

Fig. 8. Relative primary-vertex-reconstruction efficiency for the (left) GPU and (right) x86 algorithm implementation as a function of…

Primary-vertex resolution for the (left) *x* and (right) *z* coordinate as a function of (top) its multiplicity and (bottom) the simulated primary-vertex *z* position. The red square, blue circle and green triangle points are obtained from the x86, GPU and Run2-like implementation of the primary-vertex-reconstruction algorithm, respectively

Funding2

European Research Council, http://dx.doi.org/10.13039/501100000781
Ministerstwo Edukacji i Nauki, http://dx.doi.org/10.13039/501100004569

Taxonomy

Topics: Particle physics theoretical and experimental studies · Algorithms and Data Compression · Particle Detector Development and Performance

Full text

Introduction

For Run 3 (2022–2026) of the Large Hadron Collider (LHC), the LHCb Upgrade I detector [1, 2] is designed to take data at an instantaneous luminosity of $\mathcal{L} = 2 \times 10^{33}~\textrm{cm}^{-2}~\textrm{s}^{-1}$. This is five times larger than in previous data-taking periods and corresponds to an average number of five visible interactions per proton-proton (pp) bunch crossing (“event”), denoted as $\mu = 5$. The detector also includes an improved fixed-target system, called SMOG2 [3–5], consisting of a storage cell confining target gas in a 20 cm-long region upstream of the nominal pp interaction point. By exploiting the interaction between the LHC beam protons and injected gas (p-gas), LHCb is the only experiment at the LHC capable of simultaneously acquiring pp and p-gas collisions. To cope with the increased event rate, LHCb has implemented its real-time data processing (“trigger”) in a heterogeneous farm of Graphics Processing Unit (GPU) and Central Processing Unit (CPU) processors. The task of this full software trigger [6, 7] is to process data from the detector at a frequency of up to 30 MHz. The raw data rate is reduced from 4 TB/s to around 10 GB/s and then recorded to permanent storage. The trigger is divided into two stages: the first one (HLT1) runs on GPUs and reduces the data rate by a factor of around 30; the second one (HLT2) runs on x86 CPU processors. Each algorithm in the LHCb event reconstruction software has been updated to achieve the desired event throughput and physics performance. This work, ongoing since 2015, has required LHCb to overhaul its previously sequential reconstruction code, in order to exploit modern parallel computing architectures.

A particularly important part of the LHCb reconstruction is finding the positions of the pp collisions (or primary vertices, PVs), and estimating which charged particle trajectories (tracks) are produced in each PV. In physics analyses, the decay time of long-lived particles is estimated using the positions of the primary and secondary vertices, with the latter referring to the point where a particle decays. Additionally, imposing conditions on whether particles are produced at or away from the PV, when appropriate, serves as a powerful criterion for suppressing background contributions. In the LHC Run 2 (2015–2018) the data were taken with $\mu = 1.1$. The PV finding algorithms [8, 9] were optimised to this particular running condition by maximising their efficiency and minimising the rate of wrongly reconstructed PVs. A comparison of the average number of visible pp interactions in Run 2 and Run 3 simulated minimum bias events is shown in the left part of Fig. 1. A significant increase of interactions for Run 3 and the coexistence of pp and p-gas collisions require a full physics performance reoptimisation.

To meet all of these challenges, a new and intrinsically parallel PV reconstruction algorithm has been developed. In this paper, we describe the key principles of this algorithm and its physics and throughput performance, estimated on simulated events. We demonstrate that the algorithm fits within the LHCb real-time processing resources. LHCb's heterogeneous processing framework, Allen [10], can be used to compile an algorithm for both x86 and GPU architectures. This feature is particularly important when executing LHCb's GPU processing algorithms on CPU clusters while producing simulated events. However, LHCb has developed two distinct implementations, each with a logic optimised for the given architecture. Throughout this paper the “x86 algorithm” refers to the dedicated x86 implementation, rather than the GPU algorithm compiled for x86 architectures. We compare the x86 and GPU implementations, explaining the different algorithmic choices made to optimise performance on each architecture.

Fig. 1. A comparison between (left) the average number of visible pp and p-gas interactions and between (right) the number of reconstructed VELO tracks in the PVs. For both comparisons, the minimum bias samples are simulated with Run 2 (dark blue histogram) and Run 3 (orange and light blue histograms) beam conditions

The LHCb Upgrade I detector

The LHCb Upgrade I detector [1, 2] is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$, designed for the study of particles containing b or c quarks. The LHCb coordinate system is a right-handed Cartesian system with its origin at the interaction point. The x-axis is oriented horizontally towards the outside of the LHC ring, the y-axis is pointing upwards with respect to the beamline and the z-axis is aligned with the beam direction.

In the context of the PV reconstruction, the most important component is the silicon pixel vertex detector (VELO), which surrounds the interaction region in the forward and backward directions as presented in Fig. 2. The minimal distance of the silicon sensors to the beam is 5.1 mm, in comparison to 8.2 mm for the 2010–2018 VELO. A thin aluminium envelope separates the vacuum around the LHCb VELO from the LHC beam vacuum.

Fig. 2. The VELO detector geometry, with modules depicted in blue. The x–z coordinate system is also shown. Reproduced from Ref. [11]

The LHCb PV reconstruction algorithm uses as input tracks reconstructed in the VELO, commonly referred to as VELO tracks. In Run 3 these tracks are reconstructed using algorithms implemented for x86 [12] and GPU [13] architectures. Since there is negligible magnetic field in the VELO [14], charged particle trajectories are reconstructed as straight lines and their momentum cannot be measured. Instead, VELO track segments are assigned a transverse momentum of 0.4 $\mathrm{GeV}/c$. The reconstructed pseudorapidity of the track is then used to estimate the momentum. A simplified Kalman filter, which includes the effects of multiple scattering, is applied to estimate the VELO track x- and y-coordinate positions $(x_\text{trk}, y_\text{trk})$, direction $(t_{x,\text{trk}} \equiv \textrm{d}x/\textrm{d}z,\ t_{y,\text{trk}} \equiv \textrm{d}y/\textrm{d}z)$, and their covariance matrix $V$ at a given z-coordinate $z_\text{trk}$ near the interaction point.

A comparison of the number of reconstructed VELO tracks used to form reconstructed PVs for simulated samples with Run 2 and Run 3 beam conditions is shown in the right part of Fig. 1. The essential metric used to determine if a track comes from a primary vertex or from a secondary decay of a long-lived particle is the distance of closest approach of a track to a vertex, called “impact parameter” (IP). Depending on the use-case, the IP $\chi^2$, which is the $\chi^2$ difference of a PV reconstructed with and without the track under consideration, is sometimes preferred.

Fig. 3. Simulated (left) x and (right) z distribution of pp collisions with Run 3 beam conditions

Fig. 4. (Left) distance between the x-position of the track, $x_\text{trk}$, when extrapolated to the z-position of its origin PV and the simulated x-position of the PV, $\hbox{PV}_{x}$. (Right) difference between the z-position of the point of closest approach to the beamline of a track originating from the PV, $z_\text{poca}$, and the simulated $\hbox{PV}_{z}$. Both distributions are obtained using simulated samples with Run 3 beam conditions

Primary vertex finding

Primary-vertex-finding algorithms generally consist of a partitioning (or “seeding”) step to combine tracks into vertex candidates, followed by an adaptive least squares fit that estimates the vertex position and associated covariance matrix. Traditionally, the partitioning is performed by constructing valid two-prong vertices starting from track pairs [15]. These pairs are then combined with the remaining unused tracks in the event to construct multitrack seeds. Because this approach is combinatorial in nature, its complexity grows approximately quadratically with the number of tracks if the number of vertices is larger than one.

The algorithm described here uses a different technique that avoids track-track or track-vertex combinatorics. Its seeding step consists of a one-dimensional histogramming and peak search in the coordinate along the collision axis z. Similar approaches can be found in other LHC experiments [16–18]. In the LHCb experiment, this approach exploits the geometry of the pp interaction region that is spread out in z but narrow in x and y,1 as shown in Fig. 3.

The algorithm relies on external information on the location of the interaction region. For pp collisions, the interaction region is parametrised as a “beamline” with average transverse position coordinates $x_\text{b}$ and $y_\text{b}$ and direction $t_{x,\text{b}} \equiv \textrm{d}x/\textrm{d}z$ and $t_{y,\text{b}} \equiv \textrm{d}y/\textrm{d}z$ at $z=0$. During a data-taking period, called a “fill”, the LHC beamline does not change on scales relevant to the PV finding algorithm, while it can change between fills. Therefore, a dedicated algorithm (not described further here) is executed in less than a second at the start of every fill to determine the beamline position with an uncertainty of a few microns. This is subsequently stored in a database and propagated to the PV reconstruction algorithm. For p-gas fixed target collisions, a single beam passing through the target gas is used, resulting in a beamline inclination, $t_{x,\text{b}}$ and $t_{y,\text{b}}$, equal to half of the effective crossing angle of the two beamlines in the pp interaction region. The beam inclination effect on the PV reconstruction is significantly more pronounced in p-gas collisions, due to the greater lever arm. In contrast, it is negligible for the pp interaction region. The angles are retrieved from the LHC database and propagated to the PV reconstruction algorithm for each fill.

The main motivation for the one-dimensional histogramming approach can be understood by comparing the distribution of PVs in the x and z coordinates shown in Fig. 3 with the resolution of the track coordinates in simulated events. The left part of Fig. 4 presents the difference between the reconstructed track coordinate x, extrapolated to the true z position of the associated PV, and the true x position of that PV. The resolution in the track coordinate transverse to the beamline is comparable to the size of the interaction region. Consequently, when assigning individual tracks to PVs, the spread in the transverse coordinate of the PVs is not relevant.

On the other hand, for most tracks the resolution of the coordinate along the beamline is more than sufficient to separate PVs. This coordinate is defined as the z-coordinate of the point of closest approach to the beamline, given by

$$\begin{aligned} z_\text{poca} &= z_\text{trk} + \frac{(t_{x,\text{trk}}-t_{x,\text{b}})\,(x_\text{b}-x_\text{trk})}{(t_{x,\text{trk}}-t_{x,\text{b}})^2 + (t_{y,\text{trk}}-t_{y,\text{b}})^2} \\ &\quad + \frac{(t_{y,\text{trk}}-t_{y,\text{b}})\,(y_\text{b}-y_\text{trk})}{(t_{x,\text{trk}}-t_{x,\text{b}})^2 + (t_{y,\text{trk}}-t_{y,\text{b}})^2}, \end{aligned} \tag{1}$$

where $x_\text{trk}$, $y_\text{trk}$, $z_\text{trk}$, $t_{x,\text{trk}}$, $t_{y,\text{trk}}$ are the track parameters, while $x_\text{b}$, $y_\text{b}$, $t_{x,\text{b}}$, $t_{y,\text{b}}$ are the beamline parameters. The right part of Fig. 4 shows the difference between the reconstructed $z_\text{poca}$ and the true z coordinate of the associated PV. The distribution is much narrower than the spread of PVs in z-coordinate shown in the right part of Fig. 3. To find the PVs, the $z_\text{poca}$ values of all track segments in an event are filled into a histogram. Since the tracks originating from a certain PV should have similar values of $z_\text{poca}$, peaking distributions in the histogram indicate the presence of a PV, roughly at the z-position of the peak.
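As a minimal sketch of the expression above, the following Python function evaluates the z-coordinate of the point of closest approach for a single straight-line track. The function name `z_poca`, the argument names and the default beamline (lying on the z axis with zero slope) are illustrative assumptions and are not taken from the LHCb software.

```python
def z_poca(x_trk, y_trk, z_trk, tx_trk, ty_trk,
           x_b=0.0, y_b=0.0, tx_b=0.0, ty_b=0.0):
    """z of the point of closest approach of a straight track to the beamline.

    Hypothetical helper: it mirrors the z_poca expression term by term.
    The default beamline (on the z axis, zero slope) is an assumption.
    """
    dtx = tx_trk - tx_b  # track slope relative to the beamline, x projection
    dty = ty_trk - ty_b  # track slope relative to the beamline, y projection
    denom = dtx * dtx + dty * dty
    if denom == 0.0:
        raise ValueError("track is parallel to the beamline; z_poca is undefined")
    return z_trk + (dtx * (x_b - x_trk) + dty * (y_b - y_trk)) / denom


# Toy track produced on the beamline at z = 10 mm with slopes (0.1, 0.05),
# with its state given at z_trk = 50 mm, i.e. at x = 4 mm, y = 2 mm.
z_example = z_poca(x_trk=4.0, y_trk=2.0, z_trk=50.0, tx_trk=0.1, ty_trk=0.05)
```

For this toy track, which points back to z = 10 mm on the beamline, `z_example` equals 10 mm up to floating-point rounding.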

The PV finding algorithm consists of the following steps:

1. VELO track extrapolation to the point of closest approach to the beamline;
2. Histogram filling with the $z_\text{poca}$ value of each track;
3. Peak search in the histogram;
4. Track association to the identified peaks;
5. Vertex fit using the assigned tracks.

The major difference between the two hardware architectures occurs at the track association stage and propagates to the fitting procedure.
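The histogram-filling and peak-search steps listed above can be illustrated with a toy sketch. Only the z range of the histogram is taken from the text; the bin width, the minimum bin content and the naive local-maximum peak search are assumptions made for illustration and should not be read as the peak search used in the x86 or GPU implementations.

```python
import numpy as np

# Histogram range in mm, as quoted in the text. BIN_WIDTH and
# MIN_TRACKS_IN_PEAK are illustrative assumptions, not values from the paper.
Z_MIN, Z_MAX = -550.0, 300.0
BIN_WIDTH = 1.0
MIN_TRACKS_IN_PEAK = 5


def find_seed_positions(z_poca_values):
    """Fill a 1D histogram with z_poca values and return naive peak positions."""
    n_bins = int(round((Z_MAX - Z_MIN) / BIN_WIDTH))
    counts, edges = np.histogram(z_poca_values, bins=n_bins, range=(Z_MIN, Z_MAX))
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Pad with zeros so the first and last bins can be compared to neighbours.
    padded = np.concatenate(([0], counts, [0]))
    is_peak = (
        (counts >= MIN_TRACKS_IN_PEAK)
        & (counts > padded[:-2])   # strictly above the left neighbour
        & (counts >= padded[2:])   # not below the right neighbour
    )
    return centres[is_peak]


# Toy event: two vertices, 30 tracks each, with z_poca spread of +-0.3 mm.
offsets = np.linspace(-0.3, 0.3, 30)
toy_z_poca = np.concatenate((-20.5 + offsets, 35.5 + offsets))
seeds = find_seed_positions(toy_z_poca)
```

In this toy event all tracks from a vertex fall into a single bin, so `seeds` contains the two bin centres at -20.5 mm and 35.5 mm. A realistic peak search also has to cope with overlapping peaks and with the track-dependent $z_\text{poca}$ uncertainty, which this sketch ignores.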

VELO track preparation

A simplified Kalman filter, which includes the effects of multiple scattering, is performed to estimate the track parameters which are subsequently used to compute $z_\text{poca}$ according to Eq. (1). The uncertainty in $z_\text{poca}$ strongly depends on the track slope and the distance to the first hit on the track. For the performance of the histogramming method, it is important to exploit the variation in this uncertainty.

The estimated $z_\text{poca}$ uncertainty can be computed from the state covariance matrix. Given the track parameter covariance matrix $V$ at position $z_\text{trk}$, the covariance matrix for the transverse coordinates extrapolated to position z is given by

$$\begin{aligned} V_{x,y}(z) &= \begin{pmatrix} V_{xx} & V_{xy} \\ V_{xy} & V_{yy} \end{pmatrix} \\ &\quad + \begin{pmatrix} 2{\varDelta z}\, V_{x t_x} & {\varDelta z}\, V_{x t_y} \\ {\varDelta z}\, V_{x t_y} & 2{\varDelta z}\, V_{y t_y} \end{pmatrix} \\ &\quad + \begin{pmatrix} {\varDelta z}^2\, V_{t_x t_x} & {\varDelta z}^2\, V_{t_y t_x} \\ {\varDelta z}^2\, V_{t_y t_x} & {\varDelta z}^2\, V_{t_y t_y} \end{pmatrix}, \end{aligned} \tag{2}$$

with ${\varDelta z} = z - z_\text{trk}$ and where $V_{ij}$ indicates the $ij$ element of the covariance matrix $V$.
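As an illustration, the $xx$ diagonal element of Eq. (2) can be written as a short Python function. This is a minimal sketch rather than the LHCb implementation; the function and argument names are hypothetical, and lengths are assumed to be in mm.

```python
def extrapolated_variance(v_xx, v_xtx, v_txtx, z_trk, z):
    """Variance of the x coordinate at position z.

    v_xx, v_xtx and v_txtx are the (x, x), (x, t_x) and (t_x, t_x)
    elements of the track covariance matrix V given at z_trk.
    The names are illustrative and not taken from the paper.
    """
    dz = z - z_trk
    # Diagonal xx element of Eq. (2): V_xx + 2 dz V_{x t_x} + dz^2 V_{t_x t_x}
    return v_xx + 2.0 * dz * v_xtx + dz * dz * v_txtx
```

The same expression with $y$ and $t_y$ in place of $x$ and $t_x$ gives the $yy$ element.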

In the simplified VELO track fit, the multiple scattering is treated independently in x and y. As a consequence, the estimated x and y track coordinate uncertainties are also assigned identical magnitudes and treated as uncorrelated. Under these assumptions, linear error propagation of Eq. (1) results in

$$\sigma_\text{poca} = \sqrt{V_{xx} / \left( \left( t_{x,\text{trk}} - t_{x,\text{b}} \right)^2 + \left( t_{y,\text{trk}} - t_{y,\text{b}} \right)^2 \right)}.$$

The uncertainty is approximately inversely proportional to the track slope.
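The scaling with the slope can be checked numerically with a small Python sketch of the expression for $\sigma_\text{poca}$. The function name, the default beamline slopes of zero and the handling of a track parallel to the beamline are assumptions made for this illustration.

```python
import math


def sigma_poca(v_xx, tx_trk, ty_trk, tx_beam=0.0, ty_beam=0.0):
    """Uncertainty on z_poca from the position variance V_xx.

    Slopes are taken relative to the beamline slopes (tx_beam, ty_beam).
    Returning infinity for a track parallel to the beamline is a choice
    made for this sketch only.
    """
    slope2 = (tx_trk - tx_beam) ** 2 + (ty_trk - ty_beam) ** 2
    if slope2 == 0.0:
        return math.inf
    return math.sqrt(v_xx / slope2)
```

With `v_xx` fixed, halving both slopes doubles the returned uncertainty, which is the inverse proportionality stated above.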

Together with the track parameters, this constitutes all the necessary inputs for later algorithm stages. The track parameters and the inverted covariance matrix W of each track are computed once and used throughout the rest of the algorithm. This approach has been found to have a negligible impact on the performance of the algorithm, but improves its throughput.

Histogram filling

In the next step, the $z_\text{poca}$ values of the tracks are used to fill a histogram. The histogram boundaries are $[-550, 300]$ mm, spanning both the main pp interaction region ($z \approx 0$ mm) and the SMOG2 gas cell ($z \approx -450$ mm). The bin size, $dz$, is chosen to be 0.25 mm. In the peak search, the minimum distance between two peaks is two times the bin size. Consequently, the bin size determines the minimal PV separation. In practice, in the pp interaction region the algorithm cannot separate vertices that are closer than about 2 mm.

To reduce effects due to the choice of the bin size and the uncertainty of the track extrapolation, a single track may contribute to multiple bins. The contribution to bin $i$ with bin boundaries $[z_{i,\text{min}}, z_{i,\text{max}}]$ is calculated as the integral over the bin of a Gaussian distribution with mean $z_\text{poca}$ and standard deviation $\sigma_\text{poca}$. With the given histogram range, the contributions $\omega_i$ for a single track add up to unity. To reduce computation costs, the integrals are only evaluated on a finite number of bins neighbouring the central bin that contains $z_\text{poca}$. Furthermore, the maximal $\sigma_\text{poca}$ for adding a track to the histogram is set to 1.5 mm for the pp interaction region, and 10 mm for the gas-cell region. An example histogram covering part of the $z$-range is shown in the left panel of Fig. 5, where peaking structures likely corresponding to PVs from pp collisions can be seen. The positions of the reconstructible simulated PVs, defined as in Sect. 4, are indicated by the orange markers.
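The filling step can be sketched in Python as follows. The histogram range and bin size are those quoted in the text; the window of three bins on either side of the central bin (`half_window`) is a hypothetical value, and the exact error function is used here in place of the faster approximations adopted in the actual implementations.

```python
import math

Z_MIN, Z_MAX, DZ = -550.0, 300.0, 0.25     # mm, values from the text
N_BINS = int(round((Z_MAX - Z_MIN) / DZ))  # 3400 bins


def gaussian_cdf(z, mean, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sigma * math.sqrt(2.0))))


def fill_track(hist, z_poca, sigma, half_window=3, max_sigma=1.5):
    """Spread one track over the bins around the one containing z_poca.

    half_window is an assumed setting; max_sigma is 1.5 mm in the
    pp interaction region and 10 mm in the gas-cell region.
    """
    if sigma > max_sigma or not (Z_MIN <= z_poca < Z_MAX):
        return
    central = int((z_poca - Z_MIN) / DZ)
    first = max(central - half_window, 0)
    last = min(central + half_window, N_BINS - 1)
    for i in range(first, last + 1):
        z_low = Z_MIN + i * DZ
        # Integral of the Gaussian over bin i
        hist[i] += (gaussian_cdf(z_low + DZ, z_poca, sigma)
                    - gaussian_cdf(z_low, z_poca, sigma))
```

A well-measured track deposits almost all of its unit weight within the window, while a track with a larger $\sigma_\text{poca}$ loses the part of the Gaussian that falls outside it.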

The integrals of the Gaussian kernel are relatively expensive to compute. Therefore, we have developed two types of approximations. The x86 implementation works with a finite set of template histograms, selected based on the value of $\sigma_\text{poca}$. In the GPU implementation, we rely on a cubic polynomial approximation of the cumulative distribution function of the Gaussian distribution.

Fig. 5. Typical histogram filled by the PV reconstruction algorithm with the $z_\text{poca}$ values, following the method explained in the text. The orange markers indicate the positions of the simulated reconstructible vertices, defined in Sect. 4. The right plot is a zoom of the left distribution on some of the identified peaks. These are shown as blue vertical lines and the borders of the peaks as vertical red lines

Peak search

After filling the histogram, a peak search is performed. In a first step, “proto-clusters” are identified as regions of consecutive bins with content above a threshold. Since two PVs might be very close in z, such that there are no bins below the threshold separating their proto-clusters, a dip search is then performed to split them into seed clusters. First, all significant minima and maxima in the range of histogram bins of a proto-cluster are identified. The proto-cluster is then split into seed clusters at minima which have two neighbouring maxima. The splitting is only done if the track integral of the resulting seed cluster is above a threshold, which effectively means that enough tracks will contribute to the vertex fit.
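The first step, finding runs of consecutive bins above a threshold, can be sketched in Python as below. The function name and the representation of a proto-cluster as a pair of bin indices are choices made for this illustration, and the subsequent dip search is not included.

```python
def find_proto_clusters(hist, threshold):
    """Return (first_bin, last_bin) pairs for runs of bins above threshold.

    The pairs come out ordered in z because the bins are scanned in order.
    """
    clusters = []
    start = None
    for i, content in enumerate(hist):
        if content > threshold:
            if start is None:
                start = i          # a new run begins
        elif start is not None:
            clusters.append((start, i - 1))
            start = None           # the run has ended
    if start is not None:          # run reaching the last bin
        clusters.append((start, len(hist) - 1))
    return clusters
```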

The logic of the splitting of proto-clusters differs between the x86 and GPU implementations. While the GPU implementation iterates through the minima ordered in z as potential splitting points, the x86 implementation considers the smallest minimum between the two largest maxima and does the splitting recursively.

In the second step, the z-position of a cluster is computed as

$$z_\text{seed} \;=\; z_{i} + \delta \times dz,$$

where $i$ is the bin with the maximum content in the cluster and $z_i$ is the midpoint of this bin. The correction $\delta$ is computed from the bin content $N_i$ and that of the neighbouring bins as

$$\delta = \frac{1}{2} \frac{N_{i+1} - N_{i-1}}{2N_{i} - N_{i+1} - N_{i-1}}.$$

As $N_{i\pm 1} < N_i$, this correction is in the range $[-1/2, 1/2]$ and $\delta \rightarrow \pm 1/2$ in the limit $N_{i\pm 1} \rightarrow N_i$. By construction, the produced clusters are ordered in $z$. Once the $z$-position of the seed is computed, its $xy$ position is evaluated using the known beamline.
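A short Python sketch of the seed position, combining the bin midpoint with the correction $\delta$, is given below. The function name and the fallback to $\delta = 0$ for a flat peak are assumptions made for this example; the peak bin is assumed to have a neighbour on each side.

```python
def seed_position(hist, i, z_min=-550.0, dz=0.25):
    """z position of a seed whose maximum-content bin is i (0 < i < len(hist) - 1)."""
    n_prev, n_peak, n_next = hist[i - 1], hist[i], hist[i + 1]
    denominator = 2.0 * n_peak - n_next - n_prev
    # delta is a fraction of a bin, between -1/2 and 1/2
    if denominator > 0.0:
        delta = 0.5 * (n_next - n_prev) / denominator
    else:
        delta = 0.0  # flat peak: keep the bin midpoint (choice of this sketch)
    z_mid = z_min + (i + 0.5) * dz
    return z_mid + delta * dz
```

For a symmetric peak the neighbours cancel and the seed sits at the bin midpoint; a larger content in the upper neighbour moves the seed towards it.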

The right part of Fig. 5 shows the result of the seed reconstruction, zoomed in on a subset of the identified peaks, for a typical event. The edges of each cluster are indicated in red, while the identified peaks are denoted as blue lines. In this event, the simulated PV close to $z = -40$ mm is not reconstructed by the algorithm.

Tracks association

Different track-to-PV association strategies are chosen for the x86 and the GPU implementations of the algorithm. While in the former a track is only associated to one PV, in the latter each track can contribute to multiple PVs with different weights.

The CPU time consumption of the x86 algorithm is dominated by the vertex fit. The time is proportional to the number of tracks per vertex, the number of vertices and the number of iterations of the vertex fit. Therefore, to minimise the CPU time, every track is associated to a single vertex and the track-to-vertex assignment is stored in a look-up table. First, every bin in the $z_\text{poca}$ histogram is assigned an index corresponding to the closest cluster. This is performed by first determining partitioning points, which are defined as the bin in between the upper bound of one cluster and the lower bound of the next cluster. Subsequently, the bins in between the partitioning points are assigned, which effectively requires a single loop over all bins. Once the histogram bins have been assigned, for every track the index of its vertex is found by computing its $z_\text{poca}$ bin, which requires a single loop over all tracks. After the partitioning is performed, the number of tracks per cluster is determined. In the rare case that this is smaller than a threshold (3 for the pp interaction region and 2 for the SMOG2 region), the seed cluster is removed from the list of clusters and the procedure is repeated. The advantage of this approach is that it does not require track-vertex combinatorics, nor any comparison operations.
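The look-up-table association can be sketched in Python as below. This is an illustration under stated assumptions and not the x86 code: clusters are represented as ordered `(first_bin, last_bin)` pairs, the partitioning point is taken as the integer midpoint between neighbouring clusters, tracks outside the histogram are clamped to the edge bins, and all function names are hypothetical.

```python
def build_bin_lookup(clusters, n_bins):
    """Map every histogram bin to the index of a cluster."""
    if not clusters:
        return [-1] * n_bins
    # Partitioning points between neighbouring clusters (midpoint is assumed)
    cuts = [(clusters[k][1] + clusters[k + 1][0]) // 2
            for k in range(len(clusters) - 1)]
    lookup = [0] * n_bins
    k = 0
    for b in range(n_bins):            # single loop over all bins
        while k < len(cuts) and b > cuts[k]:
            k += 1
        lookup[b] = k
    return lookup


def associate_tracks(z_pocas, lookup, z_min, dz):
    """Cluster index of each track, read from the table via its z_poca bin."""
    last = len(lookup) - 1
    return [lookup[min(max(int((z - z_min) / dz), 0), last)] for z in z_pocas]


def prune_and_associate(clusters, z_pocas, n_bins, z_min, dz, min_tracks):
    """Repeat the association until every cluster has at least min_tracks tracks."""
    clusters = list(clusters)
    while clusters:
        lookup = build_bin_lookup(clusters, n_bins)
        assignment = associate_tracks(z_pocas, lookup, z_min, dz)
        counts = [0] * len(clusters)
        for k in assignment:
            counts[k] += 1
        kept = [c for c, n in zip(clusters, counts) if n >= min_tracks]
        if len(kept) == len(clusters):
            return clusters, assignment
        clusters = kept                # drop sparse clusters and redo the table
    return [], [-1] * len(z_pocas)
```

In this sketch the tracks of a removed cluster are picked up by a neighbouring cluster in the next pass, because the table is rebuilt from the remaining clusters.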

The iterative procedure is less suitable for the GPU implementation as tracks have to be re-distributed if a vertex candidate has too few associated tracks. This creates dependencies between the vertex candidates and limits the parallelism that can be achieved. In contrast, the computation of each track’s PV weights can be fully parallelised. The implicit track association is therefore chosen, allowing tracks to contribute to more than one PV during the vertex-fit procedure, as described in the next section.

Vertex fit

To improve the resolution of the vertex position and compute the associated covariance matrix, each seed is fitted with a least squares method. The general vertex-fit procedure is first described for the case of the explicit track association, and subsequently the modifications needed for the implicit case are discussed.

The vertex fit minimises a $\chi^2$ defined as

$$\chi^2 \;=\; \sum_{\text{tracks}\, i} w_i \, \chi^2_i$$

with respect to the vertex position. In this expression, $w_i$ is the weight of track $i$, discussed later, and $\chi^2_i$ is the contribution of track $i$ to the vertex. The latter can be written as

$$\chi^2_i \;=\; r_i^{T} \, V_i^{-1} \, r_i,$$

where $r_i$ is the residual of track $i$ and $V_i$ is the state covariance matrix. In the vertex fit where tracks are explicitly associated to the vertex, the parameters of the fit are the vertex position $\vec{x}_\text{vtx}$ and the outgoing momentum vectors (or possibly direction vectors) $\vec{p}_i$ of all of the tracks. The five-component residual can then be expressed as

$$r_i \;=\; m_i \, - \, h_i( \vec{x}_\text{vtx}, \vec{p}_i ),$$

where $m_i$ are the (five) track parameters and $h_i$ is usually called the measurement model. In this case $V_i$ is the covariance matrix of the track parameters $m_i$. An efficient implementation of the ordinary LHCb vertex fit can be found in [19].

The minimisation of the $\chi^2$ with respect to the momentum vectors of the tracks makes the vertex fit nonlinear. Therefore, to minimise CPU costs, it is chosen not to minimise with respect to these parameters for the PV fit. In the first iteration of a vertex fit, the momentum parameters are usually initialised with the measured momentum parameters of the track. Because these parameters are then kept fixed, only two components of the residual are nonzero. These can be chosen as

$$r_i \;=\; \begin{pmatrix} x_{\text{trk}\,i} + (z_\text{vtx} - z_{\text{trk}\,i}) \cdot t_{x,\text{trk}\,i} - x_\text{vtx} \\ y_{\text{trk}\,i} + (z_\text{vtx} - z_{\text{trk}\,i}) \cdot t_{y,\text{trk}\,i} - y_\text{vtx} \end{pmatrix}.$$

The corresponding $2\times 2$ covariance matrix $V_i$ is given by Eq. (2) with ${\varDelta z} = z_\text{vtx} - z_{\text{trk}\,i}$. The disadvantage of this choice for the residual $r_i$ is that the covariance matrix $V_i$ of the residual depends on the vertex position. To minimise CPU costs, the covariance matrices $V_i$ are evaluated and inverted only once, using the vertex position of the seed, discussed above. Since the initial $z$ position is close to the final fitted $z$ position for the majority of vertex seeds, this choice has negligible impact on the performance. Furthermore, because the $x$ and $y$ projections of the VELO track fit are independent, the matrix $V_i$ is diagonal, which can be exploited to simplify the expressions in the actual implementation.

The vertex $\chi^2$ is minimised with the Newton–Raphson method. The first and second derivatives of the $\chi^2$ are computed with respect to the vertex parameters $\alpha \equiv (x_\text{vtx}, y_\text{vtx}, z_\text{vtx})$ and are given by

$$\frac{\partial \chi^2}{\partial \alpha} \;=\; 2 \sum_{\text{tracks}\, i} w_i H_i^T V_i^{-1} r_i,$$

$$\frac{\partial^2 \chi^2}{\partial \alpha^2} \;=\; 2 \sum_{\text{tracks}\, i} w_i H_i^T V_i^{-1} H_i,$$

where $H_i$ is the derivative (or projection) matrix

$$H_i \;\equiv\; \frac{\partial r_i}{\partial \alpha} \;=\; \begin{pmatrix} -1 & 0 & t_{x,\text{trk}\,i} \\ 0 & -1 & t_{y,\text{trk}\,i} \end{pmatrix}.$$

Note that in these expressions we explicitly ignore the dependence of $V_i$ (and possibly $w_i$) on $\alpha$. Given an initial vertex position $\alpha_0$, the solution that minimises the $\chi^2$ is now given by

$$\alpha_1 \;=\; \alpha_0 - \left( \left. \frac{\partial^2 \chi^2}{\partial \alpha^2} \right|_{\alpha_0} \right)^{-1} \left. \frac{\partial \chi^2}{\partial \alpha} \right|_{\alpha_0},$$

with the derivatives evaluated using $\alpha = \alpha_0$. The estimated covariance matrix for $\alpha_1$ is

$$C \;=\; \left( \frac{1}{2} \, \left. \frac{\partial^2 \chi^2}{\partial \alpha^2} \right|_{\alpha_0} \right)^{-1}.$$
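A single Newton–Raphson update with a diagonal $V_i$ can be sketched in pure Python as follows. The track tuple layout `(x, y, z, tx, ty, wx, wy)`, with `wx` and `wy` the precomputed inverse variances, and the function names are assumptions made for this illustration; the track weights $w_i$ are taken as given inputs.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0.0:
        raise ValueError("singular matrix")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def newton_step(vertex, tracks, weights):
    """One update of (x_vtx, y_vtx, z_vtx); returns the new vertex and C."""
    xv, yv, zv = vertex
    grad = [0.0, 0.0, 0.0]                 # half of the first derivative
    hess = [[0.0] * 3 for _ in range(3)]   # half of the second derivative
    for (x, y, z, tx, ty, wx, wy), w in zip(tracks, weights):
        rx = x + (zv - z) * tx - xv        # residual components
        ry = y + (zv - z) * ty - yv
        # Accumulate w * H^T V^-1 r with H = [[-1, 0, tx], [0, -1, ty]]
        grad[0] -= w * wx * rx
        grad[1] -= w * wy * ry
        grad[2] += w * (tx * wx * rx + ty * wy * ry)
        # Accumulate w * H^T V^-1 H
        hess[0][0] += w * wx
        hess[1][1] += w * wy
        hess[0][2] -= w * wx * tx
        hess[1][2] -= w * wy * ty
        hess[2][2] += w * (wx * tx * tx + wy * ty * ty)
    hess[2][0], hess[2][1] = hess[0][2], hess[1][2]
    cov = invert_3x3(hess)                 # C = (half second derivative)^-1
    step = [-sum(cov[r][c] * grad[c] for c in range(3)) for r in range(3)]
    return tuple(v + s for v, s in zip(vertex, step)), cov
```

The factors of two in the first and second derivatives cancel in the update, which is why only half-derivatives are accumulated. With constant weights the residuals are linear in the vertex parameters, so a single call already returns the minimum.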

The expected $\chi^2$ of the new solution can be computed as

$$\chi^2_1 \;=\; \chi^2_0 + \varDelta \chi^2,$$

with the expected change in $\chi^2$ given by

$$\begin{aligned} \varDelta \chi^2 &\;=\; (\alpha_1 - \alpha_0) \left. \frac{\partial \chi^2}{\partial \alpha} \right|_{\alpha_0} + \frac{1}{2} (\alpha_1 - \alpha_0)^2 \left. \frac{\partial^2 \chi^2}{\partial \alpha^2} \right|_{\alpha_0} \\ &\;=\; \frac{1}{2} (\alpha_1 - \alpha_0) \left. \frac{\partial \chi^2}{\partial \alpha} \right|_{\alpha_0}. \end{aligned}$$
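The simplification of the expression for $\varDelta \chi^2$ follows from the Newton–Raphson update itself. Written out with explicit transposes, which is notation added here for clarity and with all derivatives evaluated at $\alpha_0$, the intermediate steps are

$$\begin{aligned} \frac{\partial^2 \chi^2}{\partial \alpha^2} \, (\alpha_1 - \alpha_0) &\;=\; - \frac{\partial \chi^2}{\partial \alpha}, \\ \frac{1}{2} (\alpha_1 - \alpha_0)^T \, \frac{\partial^2 \chi^2}{\partial \alpha^2} \, (\alpha_1 - \alpha_0) &\;=\; - \frac{1}{2} (\alpha_1 - \alpha_0)^T \, \frac{\partial \chi^2}{\partial \alpha}, \\ \varDelta \chi^2 &\;=\; (\alpha_1 - \alpha_0)^T \, \frac{\partial \chi^2}{\partial \alpha} - \frac{1}{2} (\alpha_1 - \alpha_0)^T \, \frac{\partial \chi^2}{\partial \alpha} \;=\; \frac{1}{2} (\alpha_1 - \alpha_0)^T \, \frac{\partial \chi^2}{\partial \alpha}. \end{aligned}$$

Since the second-derivative matrix is positive definite for a well-constrained vertex, the expected change is negative or zero.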

If not for the presence of the weights $w_i$, the solution would be exact and no iteration would be required. In practice, the weights discussed below make the fit strongly nonlinear. The vertex fit therefore requires multiple iterations, with the residuals and derivatives for the next iteration evaluated using the last best estimate of the vertex position. A convergence criterion is chosen based on the $\Delta \chi^2$ in Eq. (17) and the observed change in $z_\text{vtx}$. The motivation for not relying on $\Delta \chi^2$ alone is that, because of possible large variations in the weights, the $\Delta \chi^2$ evaluated using Eq. (17) is not always a good estimate of the actual change in $\chi^2$. For the x86 implementation the vertex fit is considered converged if $|\Delta \chi^2| < 0.01$ and $|\Delta z_\text{vtx}| < 1~\mu\textrm{m}$. For the GPU implementation, the criterion is $|\Delta z_\text{vtx}| < 0.5~\mu\textrm{m}$ with no requirement on $\Delta \chi^2$. To limit the computational cost of poorly converging fits, the maximum number of iterations is set to ten. The fits typically converge in three to seven iterations.

To reduce the effect of tracks that are mistakenly assigned to the vertex, track contributions to the vertex $\chi^2$ are weighted with a weight $w_i$ that is a function of $\chi_i$. In the x86 implementation these weights are chosen according to Tukey’s bi-square function [20, 21],

$$w_i = \begin{cases} \left( 1 - \chi^2_i/\chi^2_\text{max}\right)^2 & \text{for } \chi^2_i < \chi^2_\text{max} \\ 0 & \text{for } \chi^2_i \ge \chi^2_\text{max}, \end{cases}$$

where $\chi^2_\text{max}$ is a cut-off value set to 12, which was optimised by considering both the effect of the tails on the resolution and the impact on the efficiency for low-multiplicity vertices.
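The iteration with the x86 convergence thresholds and the bi-square weights can be illustrated with a minimal Python sketch. It is a toy under stated assumptions and not the LHCb implementation: only the z coordinate is fitted, each track is reduced to a single z estimate with a Gaussian uncertainty, and the names `tukey_weight` and `fit_vertex_z` are hypothetical.

```python
def tukey_weight(chi2, chi2_max=12.0):
    """Tukey bi-square weight with the cut-off value quoted in the text."""
    if chi2 >= chi2_max:
        return 0.0
    return (1.0 - chi2 / chi2_max) ** 2


def fit_vertex_z(z_tracks, sigmas, z_seed, max_iter=10,
                 max_dchi2=0.01, max_dz=1e-3):
    """Toy 1D adaptive fit (assumption: one z estimate per track, in mm).

    max_dz = 1e-3 mm corresponds to the 1 micrometre x86 threshold.
    Returns (z_vtx, number of iterations, converged flag).
    """
    z_vtx = z_seed
    for iteration in range(1, max_iter + 1):
        first = 0.0   # d(chi2)/dz at the current estimate
        second = 0.0  # d^2(chi2)/dz^2 at the current estimate
        for z_i, s_i in zip(z_tracks, sigmas):
            res = z_i - z_vtx
            # Weights are re-evaluated with the latest vertex estimate.
            w = tukey_weight((res / s_i) ** 2)
            first += -2.0 * w * res / s_i ** 2
            second += 2.0 * w / s_i ** 2
        if second == 0.0:
            # Every track has zero weight, so there is nothing to fit.
            return z_vtx, iteration, False
        dz = -first / second          # Newton step
        dchi2 = 0.5 * dz * first      # expected change in chi2
        z_vtx += dz
        if abs(dchi2) < max_dchi2 and abs(dz) < max_dz:
            return z_vtx, iteration, True
    return z_vtx, max_iter, False


# Five compatible tracks around z = 0.10 mm and one far outlier.
z_fit, n_iter, converged = fit_vertex_z(
    [0.10, 0.12, 0.08, 0.11, 0.09, 5.0], [0.05] * 6, z_seed=0.0)
```

In this toy the outlier at 5 mm receives zero weight in every iteration, so the fitted position is driven by the five compatible tracks only.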

In the GPU implementation, all PVs are fitted simultaneously, as inspired by multivertex fitter algorithms [22], and every track is implicitly associated to every vertex. The weights are chosen such that they depend not only on the $\chi^2_i$ of a track with respect to the closest vertex candidate j, but also on the other vertex candidates [22],

$$w_{ij} = \frac{\exp(-\chi^2_{ij}/2)}{\exp(-\chi^2_\text{max}/2) + \sum_{k} \exp(-\chi^2_{ik}/2)},$$

where $\chi^2_\text{max}$ is a cut-off value and $\chi^2_{ik}$ is the $\chi^2$ of track i relative to vertex k, evaluated as in Eq. (7). This means that a track close to two vertex candidates contributes to both but with a smaller weight than would be the case if it had been explicitly assigned to a PV. Figure 6 illustrates $w_{ij}$ as a function of $\chi^2$ in the case where no other competing vertices are in proximity and in the cases where there is a competing PV with $\chi^2=9$ or $\chi^2=3$ with respect to that track. The parameter controlling the steepness of the curve is $\chi^2_\text{max}$. The smaller its value, the faster the weight falls to zero with increasing $\chi^2$. It is not a strict cut-off as in the case of the Tukey weight introduced earlier, but can be understood as the $\chi^2$ value at which the weight is equal to 0.5 in the case of no other competing vertices. If there are no other consistent vertices nearby, this weight function becomes similar to the Tukey weight. To keep the fit of a given vertex candidate independent of the other vertex fits in the event, the weight is always calculated using the initial position of the other vertices. To reduce the number of duplicate PVs, where one collision is reconstructed as two separate PVs, an additional step searches for PVs within close proximity (by default requiring the ratio between the difference of the two $\text{PV}_{z}$ positions and the sum of the two z variances to be below 25) and rejects the one with fewer associated tracks.

Fig. 6. Multivertex track weights as a function of $\chi^2$ for a vertex candidate with no competing vertices (blue), a competing vertex with $\chi^2=9$ (orange) and $\chi^2=3$ (green), all with $\chi^2_\text{max}=4$. The Tukey biweights with $\chi^2_\text{max}=12$ are shown as a red dashed line. The grey filled area shows the $\chi^2$ distribution for two degrees of freedom
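The behaviour of the multivertex weight can be checked numerically with a short Python sketch. The function name `multivertex_weight` and its argument layout are hypothetical. The default $\chi^2_\text{max}=4$ is the value quoted for the curves in Fig. 6 and is used here only as an example, not as a statement about the production setting.

```python
import math


def multivertex_weight(chi2_ij, chi2_others, chi2_max=4.0):
    """Weight of track i for vertex j.

    chi2_ij     : chi2 of the track with respect to vertex j
    chi2_others : chi2 of the same track with respect to each other vertex
    The sum over k in the denominator runs over all vertices, including j.
    """
    numerator = math.exp(-0.5 * chi2_ij)
    denominator = math.exp(-0.5 * chi2_max) + numerator
    denominator += sum(math.exp(-0.5 * c) for c in chi2_others)
    return numerator / denominator


# No competing vertex: the weight is 0.5 when chi2 equals chi2_max.
w_alone = multivertex_weight(4.0, [])

# A competing vertex that is also compatible with the track lowers the weight.
w_far_competitor = multivertex_weight(1.0, [9.0])
w_near_competitor = multivertex_weight(1.0, [3.0])
```

Without competitors, the numerator and the cut-off term are equal at $\chi^2 = \chi^2_\text{max}$, which gives the weight of 0.5 mentioned in the text. A competitor with a smaller $\chi^2$ adds a larger term to the denominator and therefore reduces the weight more strongly.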

After all vertex positions have been obtained, a final selection is applied to reduce the contamination by secondary decay vertices of long-lived particles to an acceptable level of below 2%. Vertices in the pp interaction region that are not within a distance of $0.3~\textrm{mm}$ from the beamline are discarded.

Physics performance

The physics performance is determined with minimum bias samples produced under Run 3 conditions and the full LHCb simulation [23, 24].2 The same events are used for the evaluation of the x86 and GPU performance, allowing for a direct comparison of the algorithms. In addition, the Run 2 PV reconstruction algorithm, referred to as Run2-like hereafter, has been optimised for Run 3 conditions [25] and is compared with the x86 and GPU implementations described in this paper. The Allen framework allows algorithms developed for GPU architectures to also be compiled for x86 architectures. A detailed comparison of the physics performance and reproducibility between different architectures is beyond the scope of this paper, but in general agreement at the per-mille level is observed.

A reconstructible PV ($\text{PV}^{\text{MC}}_{\text{rcible}}$) is defined as an inelastic interaction which produces at least four reconstructed VELO tracks. This criterion is lowered to three reconstructed VELO tracks for p-gas collisions due to their lower average PV-track multiplicity. A reconstructed vertex ($\text{PV}^{\text{REC}}$) is matched to a simulated reconstructible PV if the distance between the simulated and reconstructed z-coordinates of the PV is less than five times its reconstruction uncertainty. If a simulated PV is matched to more than one reconstructed PV, only the closest match is retained. The reconstructed and matched primary vertices ($\text{PV}^{\text{REC}}_{\text{matched}}$) are then used to measure the following figures of merit.

1. Efficiency, defined as the ratio of reconstructed and matched PVs to the total number of reconstructible PVs in the simulation

$$\epsilon = \frac{\# \text{PV}^\text{REC}_\text{matched}}{\# \text{PV}^\text{MC}_\text{rcible}}.$$

A low efficiency would result in some prompt tracks being identified as originating from decays of long-lived particles, increasing background for the real-time processing and physics analyses.

2. Fake rate, defined as the ratio of reconstructed but not matched PVs to the total number of reconstructed PVs

$$f = \frac{\# \text{PV}^\text{REC} - \# \text{PV}^\text{REC}_\text{matched}}{\# \text{PV}^\text{REC}}.$$

Most fake PVs are secondary vertices from decays of long-lived particles. Thus, a high fake rate would reduce the signal efficiency for physics analyses.

3. Position resolution, defined as the standard deviation of the distribution of the difference between a reconstructed and its matched simulated PV position. The PV resolution is an important component of the decay-time resolution for long-lived particles and of the track IP resolution. For the latter, it is particularly important for high-momentum tracks, which undergo little multiple scattering and whose IP resolution is therefore dominated by the PV resolution itself.

4. Pull, defined as the ratio between the position resolution and the reconstruction uncertainty. An optimal pull distribution has zero mean and unit width, while deviations hint at biases in the PV position reconstruction or an inaccurately estimated covariance matrix.

The algorithm performance is studied as a function of different quantities, such as the PV z position and the number of particles associated to the primary vertex (the PV multiplicity), and for different vertex categories. A PV is defined as close if any reconstructible neighbouring PV is closer than $10~\textrm{mm}$. Otherwise, the PV is labelled as isolated. For the purpose of performance categorisation, PVs are sorted from highest to lowest VELO track multiplicity (1st, 2nd, 3rd, ...). Finally, PVs are benchmarked separately according to the quark species contained in the particles they produce (beauty, charm, strange, other). The PV multiplicity varies across categories, averaging 68, 63, and 37 associated particles for beauty, charm, and strange PVs, respectively. In contrast, the other category has a significantly lower average of just 7 associated particles, resulting in a much lower reconstruction efficiency.
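The matching rule and the first two figures of merit can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, vertices are reduced to their z coordinate, and the one-to-one assignment (each reconstructed PV is used for at most one simulated PV, with the closest pairs taken first) is an assumption that the text does not spell out.

```python
def match_vertices(mc_z, rec, n_sigma=5.0):
    """Match simulated to reconstructed PVs, closest pairs first.

    mc_z : z positions of the reconstructible simulated PVs (mm)
    rec  : list of (z, sigma_z) for the reconstructed PVs (mm)
    Returns a dict {simulated index: reconstructed index}.
    """
    candidates = []
    for i_mc, z_mc in enumerate(mc_z):
        for i_rec, (z_rec, sigma_z) in enumerate(rec):
            distance = abs(z_rec - z_mc)
            if distance < n_sigma * sigma_z:
                candidates.append((distance, i_mc, i_rec))

    matches = {}
    used_rec = set()
    for _, i_mc, i_rec in sorted(candidates):
        # Keep only the closest match for each simulated PV.
        if i_mc not in matches and i_rec not in used_rec:
            matches[i_mc] = i_rec
            used_rec.add(i_rec)
    return matches


def efficiency_and_fake_rate(mc_z, rec, n_sigma=5.0):
    """Return (efficiency, fake rate) following the two definitions above."""
    if not mc_z or not rec:
        raise ValueError("need at least one simulated and one reconstructed PV")
    n_matched = len(match_vertices(mc_z, rec, n_sigma))
    efficiency = n_matched / len(mc_z)
    fake_rate = (len(rec) - n_matched) / len(rec)
    return efficiency, fake_rate


# Three simulated PVs; four reconstructed PVs, two of them near z = 0.
eff, fake = efficiency_and_fake_rate(
    [0.0, 10.0, 50.0],
    [(0.02, 0.01), (0.04, 0.01), (10.01, 0.02), (80.0, 0.05)])
```

In this toy event the simulated PV at 50 mm is not reconstructed, while the second candidate near z = 0 and the candidate at 80 mm are left unmatched and count as fakes.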

Performance for pp collisions

Table 1. Primary-vertex-reconstruction efficiency for the x86, GPU and Run2-like implementations. Different primary vertex categories for the pp conditions are listed as described in the text. All numbers are given in percent

| Category | x86 | GPU | Run2-like |
| --- | --- | --- | --- |
| All | 93.3 | 93.7 | 91.3 |
| beauty | 98.1 | 98.4 | 98.2 |
| charm | 98.0 | 98.3 | 98.3 |
| strange | 93.5 | 93.9 | 91.5 |
| other | 63.3 | 63.5 | 48.9 |
| isolated | 97.6 | 97.6 | 95.4 |
| close | 89.0 | 89.7 | 87.1 |
| 1st | 99.5 | 99.5 | 99.5 |
| 3rd | 96.2 | 96.5 | 95.4 |
| 5th | 90.5 | 91.1 | 87.5 |

The PV reconstruction efficiencies for the different PV categories are summarised in Table 1. On average, both the x86 and GPU implementations reconstruct primary vertices with an efficiency at the level of 93.5%, which is 2% higher than for the Run2-like algorithm. The PV reconstruction efficiency is shown in Fig. 7 as a function of the number of tracks and z position in the simulated pp collisions. The efficiency is expected to be lower for PVs with a smaller number of associated tracks, since the peak resulting from such PVs may not be significant enough to be identified by the peak-finding procedure. This is confirmed by the numbers in Table 1: the PV with the highest multiplicity in the event is found in 99.5% of the cases, while the 5th PV in multiplicity order is only found in about 91% of the cases. On average, the PV efficiency is about 96% for PVs with at least 10 associated tracks. The PV reconstruction efficiency is slightly reduced at the centre of the interaction region along z. This can be explained by the observation that for those z values PVs are more densely populated and are more likely to spatially overlap. Such PVs are harder to distinguish and could be reconstructed as a single PV instead of two distinct ones. Indeed, efficiencies of about 97% and 89% are found for isolated and close vertices, respectively. Both x86 and GPU implementations are highly performant for PVs which produce either beauty or charm particles (about 98%), which are used in the majority of the physics analyses in the LHCb collaboration. 
In comparison with the Run2-like algorithm, both the x86 and GPU implementations are about 17% more efficient at finding PVs with fewer than 10 associated tracks and 3% more efficient in the central z region between $-50$ and $50~\textrm{mm}$.

The peak-finding procedure is based on the z-coordinate of the point of closest approach to the beamline, making it reliant on accurate beamline position measurements. Both the GPU and x86 algorithms are robust, maintaining an efficiency of 99% within a beamline position uncertainty of $50~\mu\textrm{m}$, as illustrated in Fig. 8. The efficiency remains at 99.9% within a beamline position uncertainty of $20~\mu\textrm{m}$, which is selected as the threshold for the beamline position calibration.

Fig. 7. Primary-vertex-reconstruction efficiency as a function of (left) its multiplicity and (right) its simulated z position. The red squares, blue circles and green triangles are obtained using the dedicated x86, GPU and Run2-like implementations, respectively. The grey histograms show the distribution of simulated primary vertices and the hollow red, blue and green points show the number of reconstructed primary vertices in the x86, GPU and Run2-like cases, respectively

Fig. 8. Relative primary-vertex-reconstruction efficiency for the (left) GPU and (right) x86 algorithm implementations as a function of beam position offsets in the x and y directions

The measured fake rate is 1.7% (1.6%) for the x86 (GPU) implementation, considering all reconstructed PVs. It reduces to 0.2% (0.6%) for those with at least 10 associated particles. The majority of fake PVs belong to the close category, for which the fake rates are around 2.5% for both the x86 and GPU implementations. Primary vertices with a smaller number of associated particles and which produce neither beauty nor charm hadrons are more likely to be misidentified. The comparison with the Run2-like algorithm shows a similar fake-rate pattern.

The PV resolutions as a function of the number of tracks in the associated simulated PV and of the z position are shown in Fig. 9. The resolution strongly depends on the number of associated particles and degrades for $-150<z<-100~\textrm{mm}$. This is a consequence of the VELO module spacing shown in Fig. 2. The region $-150<z<-100~\textrm{mm}$ falls within a gap between the detector layers, where the distance to the closest measured point is larger than in other regions in z. While this degradation in resolution is quite significant, it should be noted that only a small fraction of the total number of PVs are produced in this z-region, so the overall effect on the selection of displaced tracks is small. The x86 and GPU implementations show a similar resolution to the Run2-like algorithm for the z-coordinate and an improvement of $5\%$ for the x- and y-coordinates.

Fig. 9. Primary-vertex resolution for the (left) x- and (right) z-coordinate as a function of (top) its multiplicity and (bottom) the simulated primary-vertex z position. The red squares, blue circles and green triangles are obtained from the x86, GPU and Run2-like implementations of the primary-vertex-reconstruction algorithm, respectively

The pull distributions for all implementations show that the PV estimator is unbiased, and uncertainties are well estimated. No dependence is found for the pull mean on either the number of associated tracks in the PV or the z position. Both the x86 and GPU implementations exhibit a similar dependence of the pull width on the number of associated tracks, following a pattern comparable to the resolution shown in Fig. 9. No dependence is found for the z position.
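How the pull mean and width are obtained can be illustrated with a small Python sketch. It assumes the usual per-vertex definition of the pull, namely the difference between the reconstructed and simulated position divided by the estimated uncertainty, and it takes the sample standard deviation as the width. Both choices and the function name `pull_summary` are assumptions made for this example.

```python
from statistics import mean, stdev


def pull_summary(rec_positions, mc_positions, uncertainties):
    """Return (mean, width) of the pull distribution.

    Each pull is (reconstructed - simulated) / estimated uncertainty.
    An unbiased estimator with correct uncertainties gives a mean
    compatible with 0 and a width compatible with 1.
    """
    pulls = [(rec - mc) / sigma
             for rec, mc, sigma in zip(rec_positions, mc_positions, uncertainties)]
    return mean(pulls), stdev(pulls)


# Toy input: residuals of +-0.1 mm with a quoted uncertainty of 0.1 mm.
pull_mean, pull_width = pull_summary(
    [1.0, 2.0, 3.0, 4.0], [0.9, 2.1, 2.9, 4.1], [0.1, 0.1, 0.1, 0.1])
```

A width significantly above one would indicate underestimated uncertainties, and a width below one overestimated ones. In practice the mean and width would be extracted in bins of multiplicity and z position to study the dependences discussed in the text.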

The PVs reconstructed by the x86 and GPU implementations are also compared with each other, with two PVs considered matched if their distance is smaller than three times their combined uncertainty. About 98% of PVs are matched, with a correlation between the x- and z-coordinates of matched PVs of 94.5% and 99.9%, respectively. For the x-coordinate, the correlation increases to 97% for PVs with at least 10 associated tracks.

A fraction of VELO tracks falls within the acceptance of the rest of the LHCb detector, allowing their momentum to be determined with a precision of $0.5{-}1.0$% for momenta in the range $2{-}100~\mathrm{GeV/}c$. These tracks are the primary inputs to LHCb physics analyses, and since their momentum is known, their covariance matrices are more accurate than those of the other VELO tracks. For this subset of tracks, the impact of using the more accurate track parameters and covariance matrices on the PV reconstruction performance has been evaluated with the dedicated x86 implementation. A relative improvement of up to 3–5% is seen for the low-multiplicity PV resolution in the x-direction. A relative improvement below 1% is obtained for z. In the range $1/p_{\text{T}} < 1~c/\textrm{GeV}$ a difference of up to 3–5% is observed, but the track IP $\chi^2$ is found to agree very well with the baseline approach. The PV reconstruction efficiencies are not affected. The overall impact of this choice is therefore found to be negligible for the vast majority of use-cases, and the simpler baseline approach of treating all tracks equally is retained.

Fixed-target primary vertex reconstruction

In view of the simultaneous acquisition of pp and p-gas collisions, the PV reconstruction performance is also studied on events including collisions between LHC beam protons and nuclei at rest in the SMOG2 target. The topology of these collisions differs to a large extent from the pp case, as they occur upstream of the baseline interaction region. The lower centre-of-mass energy of p-gas collisions produces PVs that have a lower average track multiplicity and the created particles are boosted in the forward direction because of the asymmetric momentum in the laboratory frame between the beam and the target. This results in a larger uncertainty when extrapolating the VELO tracks towards the beamline. The PV resolution is thus expected to be significantly worse.

The algorithm reconstruction efficiency and resolution are studied on simulated samples with three different conditions:

stand-alone pHe collisions in the SMOG2 cell;
stand-alone pp collisions in the nominal Run 3 conditions;
overlapped pp and p-gas collisions with injected helium or argon as examples of light or heavy target gases.

As the rate of p-gas collisions in the simultaneous data-taking scenario is not expected to exceed 0.2 per beam crossing, events are simulated with a single p-gas interaction. In the simulation, the gas is assumed to be confined in the region $z \in [-500, -300]~\textrm{mm}$,3 with a triangular longitudinal profile (see Fig. 10), according to a simplified model of the expected pressure profile within the SMOG2 storage cell. By comparing the performance in the three samples, the effect of the presence of the p-gas collisions on the pp reconstruction performance, and vice versa, is assessed.

Table 2. Optimisation of the primary vertex reconstruction for pp and p-gas collisions

| Parameter | pp | p-gas |
| --- | --- | --- |
| z [mm] | [$-300, 300$] | [$-500, -300$] |
| Min. tracks in the PV | 4 | 3 |
| Min. cluster integral | 2.5 | 1.75 |
| Max. track $\sigma_\text{poca}$ | 1.5 | 10 |

Fig. 10. Primary-vertex-reconstruction (left) efficiency and (right) z resolution as a function of the z coordinate of the simulated primary vertex. In both plots, the magenta curve refers to the $pp+p$He sample with the pp-optimised algorithm implementation, while the tuned performance is shown in green, blue and red for the pp, pHe and $pp+p$He samples, respectively. The longitudinal profile of the simulated positions for reconstructible vertices is also shown, on an arbitrary scale, in the left plot.

The PV reconstruction efficiency and resolution obtained with the pp-optimised implementation of the algorithm on simulated overlapped pp and pHe collisions are shown as the magenta curves of Fig. 10 as a function of the PV z position. When running the algorithm with optimal pp settings, the efficiency for p-gas vertices is significantly lower and decreases steeply with z. This is caused by the tight thresholds set in the histogramming and clustering phases of the algorithm, which optimise the speed and the physics performance for pp collisions. A different tuning, summarised in Table 2 [26], is hence defined for p-gas and applied only to vertex candidates with $z < -300~\textrm{mm}$, so as not to affect the PV reconstruction performance for pp collisions. As shown in red in Fig. 10 for overlapped pp and pHe simulated collisions, the specific tuning removes the inefficiency, and the algorithm is verified to provide a comparable efficiency for both types of collisions.
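The region-dependent configuration can be pictured with a minimal sketch. The threshold values are those listed in Table 2; the names `PVTuning`, `tuning_for` and `accept_candidate`, and the use of inclusive comparisons for the minimum thresholds, are assumptions made for illustration and are not taken from the LHCb software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PVTuning:
    """Hypothetical container for the thresholds listed in Table 2."""
    z_min_mm: float
    z_max_mm: float
    min_tracks: int
    min_cluster_integral: float
    max_track_sigma_poca: float


# Values copied from Table 2.
PP_TUNING = PVTuning(-300.0, 300.0, 4, 2.5, 1.5)
PGAS_TUNING = PVTuning(-500.0, -300.0, 3, 1.75, 10.0)


def tuning_for(z_mm: float) -> PVTuning:
    """Select the p-gas tuning for candidates with z < -300 mm, else the pp one."""
    return PGAS_TUNING if z_mm < -300.0 else PP_TUNING


def accept_candidate(z_mm: float, n_tracks: int, cluster_integral: float) -> bool:
    """Apply the minimum thresholds of the relevant region (assumed inclusive)."""
    tuning = tuning_for(z_mm)
    return (n_tracks >= tuning.min_tracks
            and cluster_integral >= tuning.min_cluster_integral)
```

In this picture a three-track candidate with a cluster integral of 2.0 is kept at z = -400 mm but rejected at z = 0 mm, so loosening the p-gas thresholds leaves the pp region untouched.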

This is not the case for the PV resolution, which steeply worsens when moving away from the central VELO region. This is expected as an intrinsic limitation due to the displaced vertex position and large track pseudorapidities in fixed-target collisions. The same conclusions are drawn when considering a heavier gas target, as shown by the performance comparison between the samples with helium or argon gas in Fig. 11, though a better performance can be seen in the argon case, as expected from the higher track multiplicity in such p-gas collisions.

The performance on pp collisions is equivalent in all three conditions. This demonstrates the robustness of the reconstruction algorithm against the additional detector hits introduced by the p-gas collisions. The performance on p-gas collisions is also not affected by the simultaneous presence of pp collisions. Therefore, the results demonstrate that a single vertex reconstruction algorithm, configured differently for the two z regions, achieves optimal performance for both the p-gas and pp physics programmes simultaneously.

Fig. 11. Primary-vertex-reconstruction efficiency (left) and z resolution (right) as a function of the simulated $\text{PV}_z$ for the (red) $pp+p$He and (orange) $pp+p$Ar samples after the threshold tuning.

Vectorisation and parallelism

x86

The described algorithm has sections where the same operation is performed for multiple or all tracks, which makes them prime candidates for vectorisation. The track class model [12] implements a structure-of-arrays design, where the same data members (e.g. $x_{trk}$) of all tracks are contiguous in memory. Chunks of data members can therefore be loaded quickly into vector registers holding $N \in \{4, 8, 16\}$ values. With the help of vector operations [27], N tracks can then be processed at the same time. The track preparation and extrapolation steps are executed this way, and matrix operations in particular are accelerated by vectorisation. The filling of the histogram, the peak search and the partitioning all cost little time and are executed sequentially, since the operations there have too many interdependencies to be vectorised efficiently. Tracks are sorted by their partition with a vector gather operation, and the vertex fit parallelises over all tracks in the same partition.
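The access pattern described above can be sketched as follows. This is only an illustration of the structure-of-arrays layout and of processing tracks in chunks of N: Python does not issue SIMD instructions, so each chunk stands in for one vector register. The class `TracksSoA`, the function `extrapolate_x` and the straight-line extrapolation model are assumptions made for the example.

```python
from array import array


class TracksSoA:
    """Structure of arrays: one contiguous buffer per data member."""

    def __init__(self, x, tx, z):
        self.x = array("d", x)    # x position of each track state
        self.tx = array("d", tx)  # slope dx/dz of each track
        self.z = array("d", z)    # z at which each state is defined

    def __len__(self):
        return len(self.x)


def extrapolate_x(tracks, z_target, lanes=8):
    """Extrapolate x linearly to z_target, `lanes` tracks per step."""
    out = array("d", [0.0]) * len(tracks)
    for start in range(0, len(tracks), lanes):
        stop = min(start + lanes, len(tracks))  # last chunk may be partial
        # Contiguous loads: each slice plays the role of one vector register.
        xs = tracks.x[start:stop]
        txs = tracks.tx[start:stop]
        zs = tracks.z[start:stop]
        # The same arithmetic is applied to every lane of the chunk.
        out[start:stop] = array(
            "d", [x + tx * (z_target - z) for x, tx, z in zip(xs, txs, zs)]
        )
    return out
```

Because every data member is stored contiguously, each chunk is read with a single slice per member, which is the property that allows a real implementation to use one vector load per register.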

Fig. 12. Throughput of the algorithms optimised for GPU architecture on various GPU cards and of the x86 one on an AMD EPYC 72F3 server. This includes the preprocessing algorithms producing input to the primary vertex finding and the primary vertex finding algorithm itself. The relative measurement uncertainty of around 0.2% is too small to be seen in the figure.

Fig. 13. Breakdown of the primary vertex reconstruction sequence optimised for (left) x86 and for (right) an RTX A5000 GPU architecture. The primary vertex finding algorithm accounts for 4% and 10% of the total processing time, respectively. For the GPU architecture, the time spent in every step of the algorithm is individually measured and the primary vertex fit dominates.

GPU

The GPU implementation achieves the necessary throughput by making use of thread- and block-level parallelism. Since events are independent of each other, batches of several hundred events are processed in parallel, with one block per event for every step of the PV reconstruction. Thread-level parallelism is defined individually for every step, as described in the following. The first step, the track extrapolation, can be performed in parallel by assigning one thread to each track, since the track states are independent of each other and are read from and saved to distinct places in memory. The histogram can be filled in a similar manner: one thread is assigned to each extrapolated track, looks up its $z_\text{poca}$ position and increments the corresponding histogram bins, again taking into account the uncertainty on $z_\text{poca}$. To avoid race conditions, where two threads access and write to the same memory location at the same time, atomic functions are used.
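The race-free histogram filling can be emulated on a CPU with a minimal sketch. Here Python worker threads stand in for GPU threads and a lock stands in for an atomic add; neither is how the GPU code is written. The Gaussian spreading of each track over neighbouring bins is one assumed way of accounting for the uncertainty on $z_\text{poca}$, and the names `ZHistogram`, `fill_one_track`, `fill_histogram` and the parameter `n_sigma` are hypothetical.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor


def gauss_cdf(x):
    """Cumulative distribution of the unit Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


class ZHistogram:
    def __init__(self, z_min, z_max, n_bins):
        self.z_min = z_min
        self.width = (z_max - z_min) / n_bins
        self.bins = [0.0] * n_bins
        self._lock = threading.Lock()  # stand-in for a GPU atomic add

    def atomic_add(self, i, value):
        with self._lock:  # only one thread updates a bin at a time
            self.bins[i] += value


def fill_one_track(hist, z_poca, sigma, n_sigma=3.0):
    """Work of one 'thread': spread a track over the bins it overlaps."""
    lo = max(0, int((z_poca - n_sigma * sigma - hist.z_min) // hist.width))
    hi = min(len(hist.bins) - 1,
             int((z_poca + n_sigma * sigma - hist.z_min) // hist.width))
    for i in range(lo, hi + 1):
        left = hist.z_min + i * hist.width
        right = left + hist.width
        # Fraction of the assumed Gaussian that falls inside bin i.
        frac = gauss_cdf((right - z_poca) / sigma) - gauss_cdf((left - z_poca) / sigma)
        hist.atomic_add(i, frac)


def fill_histogram(hist, z_pocas, sigmas, n_workers=4):
    """Fill the histogram with one task per track."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(lambda t: fill_one_track(hist, *t), zip(z_pocas, sigmas)))
```

Without the lock, two workers incrementing the same bin could each read the old value and overwrite the other's update, which is the race condition that atomic functions prevent on the GPU.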

The peak search follows the same sequential logic as the initial CPU implementation. A possible optimisation would be to subdivide the histogram into different, possibly overlapping regions, and to assign one thread to each region to identify the peaks within it.
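The suggested subdivision can be sketched as below. This is not the peak definition used by the algorithm: a peak is taken here, purely for illustration, to be a local maximum above a threshold. Each region only sees its own slice of the histogram and reports maxima strictly inside it, so an overlap of at least two bins guarantees that every bin is interior to some region. The loop over regions stands in for independent threads, and all names are hypothetical.

```python
def peaks_in_slice(values, offset, threshold):
    """Local maxima strictly inside `values`, returned as global bin indices."""
    return [
        offset + i
        for i in range(1, len(values) - 1)
        if values[i] >= threshold
        and values[i] > values[i - 1]    # strictly above the left neighbour
        and values[i] >= values[i + 1]   # plateaus report their first bin only
    ]


def find_peaks_sequential(bins, threshold):
    """Reference: a single scan over the zero-padded histogram."""
    padded = [0.0] + list(bins) + [0.0]
    return peaks_in_slice(padded, -1, threshold)


def find_peaks_by_region(bins, region_size, threshold, overlap=2):
    """Scan overlapping regions independently and merge the results."""
    if overlap < 2 or region_size <= overlap:
        raise ValueError("need overlap >= 2 and region_size > overlap")
    padded = [0.0] + list(bins) + [0.0]  # lets edge bins be tested uniformly
    step = region_size - overlap
    found = set()
    for start in range(0, len(padded) - overlap, step):  # one 'thread' per region
        chunk = padded[start:start + region_size]
        # Padded index p corresponds to histogram bin p - 1.
        found.update(peaks_in_slice(chunk, start - 1, threshold))
    return sorted(found)
```

With this simplified definition the regional scan returns the same bins as the sequential one; a cluster definition that spans many bins would need a correspondingly larger overlap.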

The next two steps, the association of tracks to PV candidates and the PV fitting, are performed in parallel with one thread per vertex candidate, and the $\chi^2$ sum over all tracks, together with the derived quantities, is again parallelised for every vertex candidate. To speed up the calculations and to prevent completely unrelated tracks, whose weights would be almost zero, from contributing to the vertex fit, only tracks within a certain $\chi^2_i$ with respect to the vertex candidate are considered. To speed up the matrix calculations, they are written out explicitly, exploiting the fact that many elements are zero.
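The effect of the $\chi^2_i$ requirement can be illustrated with a one-dimensional sketch, in which the vertex is reduced to its z coordinate and each track to a $z_\text{poca}$ value with an uncertainty. The weight function (a Tukey biweight), the cut value `chi2_max`, the number of iterations and the function name are assumptions chosen for the example, not the settings of the algorithm.

```python
def fit_vertex_z(z_seed, z_pocas, sigmas, chi2_max=9.0, n_iter=5):
    """Iterative weighted mean in z using only tracks with chi2_i < chi2_max."""
    z_v = z_seed
    for _ in range(n_iter):
        sum_w = 0.0
        sum_wz = 0.0
        for z, sigma in zip(z_pocas, sigmas):
            chi2_i = ((z - z_v) / sigma) ** 2
            if chi2_i >= chi2_max:
                continue  # unrelated track: skipped, costs no further arithmetic
            # Assumed adaptive weight (Tukey biweight): 1 at chi2_i = 0, 0 at the cut.
            w = (1.0 - chi2_i / chi2_max) ** 2
            sum_w += w / sigma ** 2
            sum_wz += w * z / sigma ** 2
        if sum_w == 0.0:
            break  # no compatible track: keep the current estimate
        z_v = sum_wz / sum_w
    return z_v
```

For three tracks near z = 0 and one at z = 50 mm, the distant track never enters the sums, so the fitted position stays near zero instead of being pulled towards the unweighted mean of 12.5 mm.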

Throughput performance

The throughput of the PV reconstruction sequence is measured on a CPU and several GPU cards. This includes the VELO raw data decoding, clustering of individual pixel measurements, VELO tracking and the PV reconstruction itself. It should be noted that the pre-processing algorithms have different implementations optimised for the CPU [12] and the GPU [13] architectures. Therefore, the fraction of time spent on PV finding cannot be compared directly between the two, but gives an indication of how well the algorithm is optimised with respect to the other algorithms in the sequence. The measurement of the GPU throughput includes data transfers between the CPU and the GPU. Several CUDA streams are launched in parallel, each processing separate batches of events, to keep the GPU busy with compute operations while data transfers occur. Figure 12 shows the throughput on the different types of hardware for simulated pp-collision events, while Fig. 13 shows the fraction of time spent on PV finding for the algorithms optimised for the x86 and GPU architectures. As discussed in Ref. [28], both implementations, as well as all HLT1 reconstruction algorithms, meet the requirement of processing 30 MHz of input data with the available resources. When processing simulated pHe collisions, as used in the LHCb fixed-target programme, the throughput on a single GPU card decreases by 5%.

Conclusion

A new vertex finding algorithm is developed for the high-level trigger of the LHCb Upgrade detector. It is shown to deliver sufficient physics performance while having a high enough throughput. It is demonstrated that the algorithm can be parallelised at different levels and therefore efficient implementations on both x86 and GPU architectures are possible. Both the x86 and GPU implementations outperform the previous Run 2-like algorithm in terms of reconstruction efficiency and resolution. It is shown that the algorithm can be further extended to the beam-gas region, which is separated from the nominal pp collision region. With small adjustments of the parameters of the algorithm, beam-gas PVs can be reconstructed with high efficiency, despite being more difficult to treat, without disturbing the reconstruction of pp PVs. This offers the LHCb experiment the possibility to take beam-beam and beam-gas collision data simultaneously.

Bibliography

1. LHCb Collaboration, Framework TDR for the LHCb Upgrade: technical design report. CERN-LHCC-2012-007 (2012). https://cds.cern.ch/record/1443882
2. LHCb Collaboration, R. Aaij et al., The LHCb Upgrade I. JINST 19, P05065, 552 (2024). 10.1088/1748-0221/19/05/P05065
3. LHCb Collaboration, LHCb SMOG Upgrade. CERN-LHCC-2019-005 (2019). https://cds.cern.ch/record/2673690
4. A. Bursche et al., Physics opportunities with the fixed-target program of the LHCb experiment using an unpolarized gas target. LHCb-PUB-2018-015 (2018). https://cds.cern.ch/record/2649878
5. LHCb Collaboration, LHCb trigger and online upgrade technical design report. CERN-LHCC-2014-016 (2014). https://cds.cern.ch/record/1701361/
6. LHCb Collaboration, LHCb Upgrade GPU high level trigger technical design report. CERN-LHCC-2020-006 (2020). https://cds.cern.ch/record/2717938
7. M. Kucharczyk, P. Morawski, M. Witek, Primary vertex reconstruction at LHCb. LHCb-PUB-2014-044 (2014). https://cds.cern.ch/record/1756296
8. A. Dziurda, Studies of time-dependent CP violation in charm decays of mesons. CERN-THESIS-2015-246. https://cds.cern.ch/record/2115353