Hunting for Galaxies and Halos in simulations with VELOCIraptor

Pascal J. Elahi; Rodrigo Ca\~nas; Rhys J.J. Poulton; Rodrigo J. Tobar,; James S. Willis; Claudia del P. Lagos; and Chris Power; Aaron S.G.; Robotham

arXiv:1902.01010·astro-ph.CO·June 19, 2019

Hunting for Galaxies and Halos in simulations with VELOCIraptor

Pascal J. Elahi, Rodrigo Ca\~nas, Rhys J.J. Poulton, Rodrigo J. Tobar,, James S. Willis, Claudia del P. Lagos, and Chris Power, Aaron S.G., Robotham

PDF

TL;DR

VELOCIraptor is a new, highly parallel galaxy and halo finder that can identify substructures, including tidally disrupted objects and stellar halos, revealing new insights into subhalo distributions within larger halos.

Contribution

The paper introduces VELOCIraptor, a novel parallel code capable of identifying subhalos and disrupted objects, with improved detection of substructure deep within host halos compared to existing methods.

Findings

01

VELOCIraptor detects subhalos with negligible density contrast.

02

Large subhalos are more common near the center of host halos.

03

VELOCIraptor's results align with HBT+ in well-resolved halos.

Abstract

We present VELOCIraptor, a massively parallel galaxy/(sub)halo finder that is also capable of robustly identifying tidally disrupted objects and separate stellar halos from galaxies. The code is written in c++11, use the MPI and OpenMP API's for parallelisation, and includes python tools to read/manipulate the data products produced. We demonstrate the power of the VELOCIraptor (sub)halo finder, showing how it can identify subhalos deep within the host that have negligible density contrasts to their parent halo. We find a subhalo mass-radial distance dependence: large subhalos with mass ratios of $≳ 1 0^{- 2}$ are more common in the central regions that smaller subhalos, a result of dynamical friction and low tidal mass loss rates. This dependence is completely absent in (sub)halo finders in common use, which generally search for substructure in configuration space, yet is present in…

Tables4

Table 1. Table 1 : Key VELOCIraptor parameters

	Name	Default Value	Comments
Field Halos			Related to field halo search.
	$ℓ_{x}$	0.2	Base 3DFOF configuration-space linking length in units of inter-particle spacing. See Eq. (1), Eq. (2) ¹⁵¹⁵15For historical reasons, the code actually uses the substructure linking length to define the halo linking length, i.e., the code actually takes $α_{x, H}$ and $ℓ_{x, S}$ as input. The typical value of 0.2 is the commonly used one for finding 3DFOF halos and corresponds to identifying overdensities of $≳ 100 ρ_{m}$ .
	$α_{v}$	1.25	Scaling of base 6DFOF velocity-space linking length $ℓ_{v}$ in units of local halo velocity dispersion. See Eq. (2). For virialised objects, typical values should be of order unity. Using large values will effectively transform 6DFOF halos to 3DFOF halos.
Substructure			Related to velocity outlier substructure search.
	$N_{v}$	32	Number of particles used to estimate local velocity density.
	$N_{se}$	256	Number of neighbouring particles used to estimate local velocity density.
	$f_{cell}$	0.01	Fraction of halo contain in cell used to estimate background velocity density. See Eq. (5), Eq. (7-9).
	$ℒ_{th}$	2.5	Outlier threshold when linking particles. See Eq. (14a).
	$α_{x, S}$	0.5	Scaling of the $ℓ_{x}$ , halo linking length. See Eq. (14b).
	$𝒱_{r}$	2	Velocity ratio. See Eq. (14c).
	$Θ_{op}$	0.1	Opening angle in units of $π$ . See Eq. (14d).
Core Search & Major Mergers			Related to core/major merger search.
	$α_{x, C}$	0.8	Scaling of 6DFOF configuration-space linking length in core search. See Eq. (18).
	$α_{v, C}$	1.0	Scaling of 6DFOF velocity-space linking length in core search. See Eq. (18).
	$n_{loop, C}$	5	Number of loops to search for cores.
	$f_{C}$	0.02	Fraction of halo a core must contain.
	$α_{N, C}$	1.2	Scaling of $N_{\min, C}$ in core search.
Unbinding/Cleaning			Related to cleaning of structure catalogue and particles associated to structures.
	$N_{\min}$	20	Key Parameter: minimum number of particles an group must contain. This parameter is likely the most often altered one depending on the science goal in question. For instance, for Smooth Particle Hydrodynamical simulations, objects composed of fewer particles than the number used to calculate the hydrodynamics are dominated by numerical artefacts and can be ignored. .
	$β_{E}$	0.95	Key Parameter: fraction of kinetic energy that potential energy must be for particle to be considered bound. See Eq. (17). As the code will naturally recover the continuum of substructures, from well bound, physically overdense, dynamically cold subhalos to unbound, underdense, dynamically cold streams, this parameter needs to be set with the desired catalogue in mind. If the use desires a catalogue of bound subhalos, then this default value is acceptable. To include more tidal debris objects, this parameter should be decreased.
	$β_{ℒ}$	1	Significance of substructure’s outlier average relative to noise. See Eq. (15).
	$β_{C}$	1	Significance of core relative to noise. See Eq. (22).

Table 2. Table 2 : Simulation parameters

Name	Box size	Number of	Particle Mass	Softening
		Particles		Length
	$L_{box}$ [ $h^{- 1} Mpc$ ]	$N_{p}$	$m_{p}$ [ $h^{- 1} M_{⊙}$ ]	$ϵ$ [ $h^{- 1} kpc$ ]
L40N512	$40$	$512^{3}$	$4.13 \times 10^{7}$	2.6
L210N1536	$210$	$1536^{3}$	$2.21 \times 10^{8}$	4.5

Table 3. Table 3 : VELOCIraptor configuration parameters

	Name	Recommended	Comments
		Value
General Search Options			Related to general structure finding.
	Particle_search_type	1	Integer flag setting particle types to search. All [1], Dark Matter [2], Gas [3], Star [4].
	Search_for_substructure	1	Integer flag whether to search for substructure. Yes/No [1/0].
	Halo_core_search	2	Integer flag whether to search for cores. Off [0], Flag if cores found [1], Find and grow cores [2].
	Unbind_flag	1	Integer flag whether to applying an unbinding process (to substructures). Yes/No [1/0].
	Bound_halos	0	Integer flag whether to applying an unbinding process to halos. Yes/No [1/0].
	Baryon_searchflag	2	Integer flag setting how baryons are treated, specifics dependent on Particle_search_type. No special behaviour [0]; Do not search baryons initially when searching for substructure but assign to nearest dark matter substructures in phase-space, works with All particle or DM particle search, baryons [1]; Baryons are also not used to generate links when searching for field FOF objects and also assigned to nearest dark matter substructures in phase-space, works with All particles searched [2]. DM+Baryons mode is Particle_search_type=1 + Baryon_searchflag=2.
	Minimum_size	20	Minimum number of particles a (sub)halo must be composed of.
	Minimum_halo_size	-1	Minimum number of particles a halo must be composed of, allowing for different halo/subhalo minimum sizes. -1 Sets this to Minimum_size.
	Effective_resolution	-1	For multi-resolution cosmological simulations, can set the effective resolution and there by set the inter-particle spacing (used to scale linking lengths). Code must be compiled for zoom simulations. -1 means the masses of dark matter particles are used to estimate the inter-particle spacing.
	Singlehalo_search	0	Integer flag that indicates input consists of a single halo and no field halo search is required. Yes/No [0/1].
Halo Search Options			Related to field halo search.
	FoF_Field_search_type	3	Integer flag setting the field FOF algorithm to use. Adaptive 6DFOF [3], 3DFOF [5], Uniform 6DFOF (single velocity scale for all halos) [4].
	Keep_FoF	0	Integer flag setting whether to keep the 3DFOF envelop of 6DFOF structures, useful for extracting stellar halos. Yes/No [1/0].
	Halo_linking_length_factor	2.0	Factor by which the linking substructure linking length, Physical_linking_length, is multiplied by to find halos.
	Halo_6D_linking_length
	_factor	1.0	Factor by which the initial physical linking length of a 3DFOF halo is multiplied by when running a 6DFOF. Useful for galaxy searches.
	Halo_6D_vel_linking
	_length_factor	1.25	Factor by which the dispersion of a 3DFOF halo is multiplied by when running a 6DFOF, $α_{v}$ to give linking length. See Eq. (2).
Substructure Search Options			Related to velocity outlier substructure search.
	FoF_search_type	1	Integer flag setting the substructure FOF algorithm to use. Standard phase-space algorithm [1], ROCKSTAR like core search only [6].
	Iterative_searchflag	1	Integer flag setting whether iterative search used. Yes/No [1/0].
	Cell_fraction	0.01	Fraction of halo mass in each cell used to calculate background, $f_{cell}$ . See equations (7)-(9).
	Grid_type	1	Integer flag setting type of criterion used to build mesh. Default is configuration-space shannon entropy criterion [1]; full phase-space Shannon entropy criterion [2].
	Nsearch_velocity	32	Integer setting number of velocity neighbours used to calculate local velocity density $N_{v}$ .
	Nsearch_physical	256	Integer setting number of physical neighbours to search when calculating local velocity density $N_{se}$ .
	Outlier_threshold	2.5	Threshold to apply in phase-space algorithm, $ℒ_{th}$ . See Eq. (14a).
	Physical_linking_length	0.1	Physical linking length. For cosmological simulations , the linking length is in units of inter-particle spacing. Otherwise, in internal units. See Eq. (14b).
	Velocity_ratio	2.0	Velocity ratio allowed in phase-space linking. See Eq. (14c).
	Velocity_opening_angle	0.18	Angle between velocity vectors allowed in phase-space linking. See Eq. (14d).
	Velocity_linking_length		.
	Iterative_threshold_factor	1.0	Factor multiplying $ℒ_{th}$ when using iterative method to identify outlier regions associated with the initial candidate list of spatially compact outlier groups, $γ_{ℒ}$ . Typical values are $≳ 1$ .
	Iterative_linking_length
	_factor	2.0	Factor multiplying physical linking length when using iterative method to identify outlier regions associated with the initial candidate list of spatially compact outlier groups, $γ_{l_{x, S}}$ . Typical values are $≳ 1$ . Common to set this value to Halo_linking_length_factor, thus setting the substructure linking length to that of halos.
	Iterative_Vratio_factor	1.0	Factor multiplying velocity ratio when using iterative method to identify outlier regions associated with the initial candidate list of spatially compact outlier groups, $γ_{𝒱_{r}}$ . Typical values are $≳ 1$ .
	Iterative_ThetaOp_factor	1.0	Factor multiplying opening angle when using iterative method to identify outlier regions associated with the initial candidate list of spatially compact outlier groups, $γ_{Θ_{op}}$ . Typical values are $≳ 1$ .
	Significance_level	1.0	Minimum significance level of group, $β_{ℒ}$ , see Eq. (15).
Core Search Options			Related to core/major merger search.
	Use_adaptive_core_search	0	Integer flag setting how linking lengths are scaled when searching for cores. Only scale the velocity linking length dispersions, useful in dark matter only simulations [0]; scale both configuration and velocity linking lengths using dispersions, useful for galaxy searches [1].
	Use_phase_tensor_core
	_growth	2	Integer flag setting setting how cores are grown. Simple assignment using constant configuration and velocity space dispersion [0] (replace distance in Eq. (21) with $D^{2} = (𝐱_{i} - {\bar{𝐱}}_{j}) / σ_{x} + (𝐯_{i} - {\bar{𝐯}}_{j}) / σ_{v}$ ); Calculate distance using phase-space tensor calculated with initial core via Eq. (21); Calculate distances with phase-space tensor via is Eq. (21), where the tensor is recalculated for each active core at each level [2].
	Halo_core_ellx_fac	1.0	Initial factor by which the physical halo linking length is multiplied when starting core search, should be $\leq 1$
	Halo_core_ellv_fac	1.0	Initial factor by which the velocity halo linking length is multiplied when starting core search, should be $\leq 1$ .
	Halo_core_ncellfac	0.0005	Factor that sets the initial mininum number of particles a core must be composed of by $f_{C} N_{H}$ , where $N_{H}$ is the number of particles in the host being searched.
	Halo_core_adaptive
	_sigma_fac	2	When running fully adaptive core search, this specifies the width of the physical linking length in configuration space dispersion, useful when searching for galaxies in hydrodynamical simulations. Typically values are $2$ .
	Halo_core_num_loops	10	Integer setting number of loops when searching for cores, $Δ_{C}$ .
	Halo_core_loop_ellx_fac	0.8	Factor by which the physical linking length is multiplied at each loop when searching for cores, $α_{x, C} < 1$ , see Eq. (18).
	Halo_core_loop_ellv_fac	1.0	Factor by which the velocity linking length is multiplied at each loop when searching for cores, $α_{v, C} < 1$ , see Eq. (18).
	Halo_core_loop_elln_fac	1.2	Factor by which the minimum number of particles for an active core is multiplied, $α_{N, C}$ .
	Halo_core_phase
	_significance	2.0	Significance a core must be in terms of phase-space distance scaled by dispersions, $β_{C}$ , see Eq. (22).
Unbinding Options			Related to processing candidate groups with unbinding routines.
	Unbinding_type	1	Integer flag setting the unbinding criteria that removes particles considered “unbound” (unbound particles meet Eq. (17)) . Remove unbound particles [0], remove unbound particles and possibly loosely bound particles till the fraction of the system has formally bound particles (setting $β_{E} = 1$ in Eq. (17)) [1].
	Allowed_kinetic_potential
	_ratio	0.95	Ratio of kinetic to potential energy at which a particle is still considered bound, $β_{E}$ , see Eq. (17). Values of $β_{E} ≳ 0.95$ keeps particles that are unlikely to leave the halo within a dynamical time, $β_{E} = 1$ is the commonly used value in configuration-space finders, and $β_{E} ≲ 0.95$ allows one to identify unbound tidal debris.
	Min_bound_mass_frac	0.6	Fraction of formally bound particles required $f_{E}$ .
	Softening_length	0	Set the (simple plummer) gravitational softening length. For cosmological simulations, in units of inter-particle spacing.
	Keep_background_potential	0	Integer flag setting whether one keeps the potential of unbound particles during the unbinding process when determining whether particles are unbound. As objects are treated in isolation, it is more self-consistent to ignore the potential of particles removed from a candidate group but this potential can be retained. Yes/No [0/1].
	Kinetic_reference_frame
	_type	0	Integer flag setting the kinetic frame when determining whether particle is bound. Use the central regions near the centre-of-mass to determine the velocity frame [0]; Use the central region around minimum of the potential to determine the velocity frame [1].
	Min_npot_ref	10	Integer setting the minimum number of particles used to calculate the velocity frame.
	Frac_pot_ref	0.1	Fraction of closest particles (to either the centre-of-mass or minimum potential) used to calculate the velocity frame. Typical values are $\sim 0.1$ .
Units			Converts input units to internal/output units. For accuracy, units should be chosen so quantities are close to unity.
	Length_unit	1.0	Conversion to apply to input velocity units to set internal code and output units. Example: if input was in Gpc and wanted units of gigacubits, then value is 69497252252252000.
	Length_unit_to_kpc	1.0	Conversion applied to the output unit to convert it to a standard unit. Example: if input was in Mpc, output was in pc (Length_unit=1e6), then this would be 1e-3.
	Velocity_unit	1.0	Conversion to apply to input velocity units to set internal code and output units. Example: if input was in km/s and wanted units of feet/forthnight, then value is 3.96850394e9.
	Velocity_to_kms	1.0	Conversion applied to the output unit to convert it to a standard unit. Example: if input was in kpc/Gyr, output was in cm/s (Velocity_unit=97781.3106), then this would be 1e-5.
	Mass_unit	1.0	Conversion to apply to input velocity units to set internal code and output units. Example: if input was in 1e10 solar masses and wanted units of milliounces, then value is 7.01634377e34.
	Mass_to_solarmass	1.0	Conversion applied to the output unit to convert it to a standard unit. Example: if input was in g, output was in earth mass (Mass_unit=1.674481e-28), then this would be 0.000003003.
	Hubble_unit	100.0	The unit of Hubble flow in internal code unit and should be H0/h in the internal Length_unit *Velocity_unit. Example: if internal units are kpc and km/s, then should be 0.1, if Mpc and km/s, then 100.
	Gravity	43.021	Gravitational constant in internal units, Length_unit * (Velocity_unit)² / Mass_unit. Example: if internal units are kpc, km/s, and 1e10 solar masses, should be 43.0211349e3.
	Mass_value	1.0	If code is compiled so as to not store mass (useful to save memory when processing N-Body cosmological simulations with uniform mass resolution), this sets the mass of all particles.
Simulation and Cosmology
	Period	1.0	Period of simulation box. For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	Scale_factor	1.0	Scale Factor of simulation box. For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	h_val	1.0	The so-called little h, where the Hubble constant is h*Hubble_unit. For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	Omega_m	1.0	Cosmological matter density $Ω_{m}$ at $a = 1$ . For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	Omega_cdm	1.0	Cosmological cold dark matter density $Ω_{c d m}$ at $a = 1$ . For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	Omega_b	0.0	Cosmological baryon density $Ω_{b}$ at $a = 1$ . For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	w_of_DE	-1.0	Dark energy equation of state. Not yet implemented.
	Omega_Lambda	1.0	Cosmological dark energy density $Ω_{Λ}$ (or more generally $Ω_{DE}$ at $a = 1$ . For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	Critical_density	1.0	Cosmological critical density $ρ_{c} = 3 H^{2} / 8 π G$ at $a = 1$ . For some input formats (Gadget, RAMSES, HDF5), this is taken from the input file.
	Virial_density	500.0	User defined virial overdensity $Δ$ used to calculate the virial mass. If -1, then output virial masses will refer to the Bryan & Norman (1998) overdensity mass.
Output Options
	Snapshot_value	0	Set if halo IDs should be temporally unique, useful for halo merger tree codes and analysing multiple snapshots. The resulting IDs are then the index of a halo +1 + Snapshot_value*TEMPORALHALOIDVAL, where TEMPORALHALOIDVAL= $10^{12}$ when the code is compiled to use long integers, TEMPORALHALOIDVAL=1000000 when the code is compiled to use only integers (useful for small simulations and if worried about memory).
	Inclusive_halo_masses	2	How masses of halos are calculated. Substructure masses are always calculated using particles exclusively belonging to the object but halos can have masses calculated in several different fashions. Use only particles belonging to the object, giving no abrupt change in spherical overdensity masses as a halo transitions from a halo to a subhalo, though overdensity masses now are not full spherical overdensity masses [0]; Use all particles in the FOF envelop, that is include the contribution from substructures [1]; Use all particles centred on the centre-of-mass, regardless of whether they belong to the halo FOF, background or another halo when calculating spherical overdensity masses [2].
	Comoving_units	0	Integer flag indicating whether the properties output is in physical or comoving little h units. Yes/No [1/0].
	Binary_output	2	Integer flag setting output format. HDF [2], binary [1] or ascii [0].
	Write_group_array_file	0	Integer flag indicating whether to write a single large tipsy style group assignment file that list the group id of every particle. Yes/No [0/1].
	Separate_output_files	0	Integer flag indicating whether separate files are written for field and subhalo groups. Yes/No [0/1].
	Extensive_halo_properties
	_output	0	Integer flag setting whether to calculate/output even more halo properties. Yes/No [0/1].
	Extended_output	0	Integer flag indicating whether to produce extended output for quick particle extraction of particles in groups from input file. Requires more memory as particles store input file and index in the file at which they are located. Yes/No [0/1].
	Spherical_overdensity_halo
	_particle_list_output	0	Output list of particles in spherical overdensity regions of halos. Yes/No [0/1].
Input Options
	Cosmological_input	1	Integer flag indicating input data is from a cosmological simulation. Code uses cosmological information. Yes/No [0/1].
	Input_chunk_size	100000	Amount of information to read from input file in one go, useful for managing memory when reading input data.
	NSPH_extra_blocks	0	Integer setting the number of extra gas/SPH particle related data blocks are to be read/are present in the file if loading gadget snapshot.
	NStar_extra_blocks	0	Integer setting the number of extra star particle related data blocks are to be read/are present in the file if loading gadget snapshot.
	NBH_extra_blocks	0	Integer setting the number of extra black hole/sink particle related data blocks are to be read/are present in the file if loading gadget snapshot.
	HDF_name_convention	1	Integer setting the HDF naming convention to use. Currently implemented conventions are for EAGLE, ILLUSTRIS, GIZMO/SIMBA.
	Input_includes_star
	_particle	1	Integer flag indicating star particles are present in the input data file. Yes/No [0/1]
	Input_includes_bh
	_particle	1	Integer flag indicating black hole/sink particles are present in the input data file. Yes/No [0/1]
	Input_includes_wind
	_particle	1	Integer flag indicating wind particles are present in the input data file. Yes/No [0/1]
	Input_includes_tracer
	_particle	1	Integer flag indicating tracer particles are present in the input data file. Yes/No [0/1]
	Input_includes_extradm
	_particle	1	Integer flag indicating extra dark matter types particles are present in the input data file. Yes/No [0/1]
Additional Options
	Verbose	0	Integer flag on how verbose code is during operation. Minimal [0]; Moderate [1]; Very Verbose [2].
	MPI_particle_total
	_buf_size	-1	Total memory size in bytes used to store particles in temporary buffer such that particles are sent to non-reading mpi processes in one communication round in chunks of size MPI_particle_total_buf_size/Number of MPI process/memory to store a particle. Ensures that communications exceed the memory available in the MPI_COMM_WORLD. If -1, code determines this amount, may not be optimal.
	MPI_part_allocation_fac	0.05	Memory allocated in MPI mode to store particles is (1+MPI_part_allocation_fac)*memory to store all particles. This factor should be $≳ 0$ to allow room for particles to be exchanged between MPI threads without requiring new memory to be allocated.

Table 4. Table 4 : VELOCIraptor Outputted halo/galaxy properties.

	Name	Comments
ID and Type information
	ID	Halo ID. ID = index of halo + 1 + TEMPORALHALOIDVAL*Snapshot_value, giving a temporally unique halo id that can be quickly parsed for an index and a snapshot number.
	ID_mbp	Particle ID of the most bound particle in the group.
	hostHaloID	ID of the host field halo. If an object is a field halo, this is -1.
	Structuretype	Structure types contain information on how the object was found and at what level in the subhalo hierarchy. Field halos are 10. Substructures identified using the local velocity field are type 10+10=20, substructures identified using cores are type 10+5=15. For structures found at level 2 (ie: subhalos within subhalos), the type offset is 20, and so on.
	numSubStruct	Number of substructures. Subhalos can have subsubhalos.
Mass and radius properties		All properties are in output units.
	npart	Number of particles belonging exclusively to the object.
	Mass_tot	Total mass of particles belonging exclusively to the object, $M_{tot}$ .
	Mass_FOF	Total mass of particles in the FOF, $M_{FOF}$ . Is zero for substructure.
	Mass_200mean	Overdensity mass defined by the mean matter density, $M_{200 ρ_{m}}$ . For field halos, if inclusive masses are desired, this is based on the particles in the FOF. If full spherical overdensity masses are desired, then includes all particles (whether they belong to the object, the background or another object) within a spherical region. For subhalos, this is based on particles belonging exclusively to the object.
	Mass_200crit	Overdensity mass defined by the critical density, $M_{200 ρ_{c}}$ . Behaviour like Mass_200mean.
	Mass_BN98	Overdensity mass defined by the mean matter density and $Δ (z)$ given by Bryan & Norman (1998), $M_{Δ (z) ρ_{c}}$ . Behaviour like Mass_200mean.
	Mvir	User defined virial mass, $M_{vir}$ . Behaviour like Mass_200mean.
	R_size	Maximum distance of particles belonging exclusively to the object and the centre-of-mass.
	R_200mean	Radius related to overdensity mass Mass_200mean.
	R_200crit	—"—
	R_BN98	—"—
	Rvir	—"—
	R_HalfMass	Half mass radius based on the Mass_tot.
Position and Velocity		All properties are in output units. Objects need not have positions periodically wrapped.
	Xc	$x$ coordinate of centre-of-mass.
	Yc	—"—
	Zc	—"—
	Xcmbp	$x$ coordinate of most bound particle.
	Ycmbp	—"—
	Zcmbp	—"—
	VXc	$v_{x}$ velocity of centre-of-mass.
	VYc	—"—
	VZc	—"—
	VXcmbp	$v_{x}$ velocity of most bound particle.
	VYcmbp	—"—
	VZcmbp	—"—
Velocity and Angular Momentum		All properties are in output units.
	Vmax	Maximum circular velocity based on particles belonging exclusively to the object, where circular velocities are defined by $V_{circ}^{2} = G M / R$ .
	Rmax	Radius of maximum circular velocity.
	sigV	Velocity dispersion based on the velocity dispersion tensor $σ_{v} = {\| Σ \|}^{1 / 6}$ , where $Σ$ is the velocity dispersion tensor.
	veldisp_xx	The $x, x$ component of the velocity dispersion tensor.
	veldisp_xy	—"—
	veldisp_xz	—"—
	veldisp_yx	—"—
	veldisp_yy	—"—
	veldisp_yz	—"—
	veldisp_zx	—"—
	veldisp_zy	—"—
	veldisp_zz	—"—
	Lx	$x$ component of the total angular momentum about the centre-of-mass using particles belonging exclusively to the object.
	Ly	—"—
	Lz	—"—
	lambda_B	Bullock et al. (2001) spin parameter $λ_{B} =$ using total angular momentum and the spherical overdensity mass, $λ_{B} = \frac{J}{\sqrt{2} M V R}$ .
	Krot	Measure of rotational support about the angular momentum axis $κ_{rot} = {\frac{\sum_{i} 1 / 2 m_{i} j_{z, i} r_{i}}{\sum}}_{i} T_{i}$ , where the first sum is over the motion of particles along the angular momentum axis and the second sum is over kinetic energies (see Sales et al., 2010).
Morphology		All properties are in output units.
	cNFW	Calculated assuming an NFW profile Navarro et al. (1997) following Prada et al. (2012) where we solve $\frac{V_{\max}^{2}}{G M_{Δ} / R_{Δ}} - \frac{0.216 c}{\ln (1 + c) - c / (1 + c)} = 0 .$
	q	We calculate the shape using the reduced inertia tensor Dubinski & Carlberg (1991); Allgood et al. (2006), ${\tilde{I}}_{j, k} = \sum_{n} \frac{m_{n} x_{j, n}^{'} x_{k, n}^{'}}{{(r_{n}^{'})}^{2}}$ , where the sum is over particles exclusively belonging to the object and, ${(r_{n}^{'})}^{2} = {(x_{n}^{'})}^{2} + {(y_{n}^{'} / q)}^{2} + {(z_{n}^{'} / s)}^{2}$ is the ellipsoidal distance between the halo’s centre-of-mass and the $n$ th particle, primed coordinates are in the eigenvector frame of the reduced inertia tensor and $q$ & $s$ are the semi-major and minor axis ratios respectively. Thus q is the semi-major axis ratio.
	s
	eig_xx	—"—
	eig_xy	—"—
	eig_xz	—"—
	eig_yx	—"—
	eig_yy	—"—
	eig_yz	—"—
	eig_zx	—"—
	eig_zy	—"—
	eig_zz	—"—
Energy		All properties are in output units.
	Ekin	The total kinetic energy, $\sum T_{i}$ .
	Epot	The total gravitational potential energy $1 / 2 \sum W_{i}$ , where the 1/2 comes from double counting.
	Efrac	The fraction of particles that are formally bound (i.e., have $W_{i} + T_{i} < 0$ .
Quantities within $R (V_{\max})$		Variety of properties based on particles within $r \leq R (V_{\max})$ .
	RVmax_sigV	Dispersion, like sigV for $r \leq R (V_{\max})$ .
	RVmax_veldisp_xx	Dispersion tensor, like veldisp_xx for $r \leq R (V_{\max})$ .
	RVmax_veldisp_xy	—"—
	RVmax_veldisp_xz	—"—
	RVmax_veldisp_yx	—"—
	RVmax_veldisp_yy	—"—
	RVmax_veldisp_yz	—"—
	RVmax_veldisp_zx	—"—
	RVmax_veldisp_zy	—"—
	RVmax_veldisp_zz	—"—
	RVmax_lambda_B	Spin parameter, like lambda_B for $r \leq R (V_{\max})$ .
	RVmax_Lx	Total angular momentum, like Lx $r \leq R (V_{\max})$ .
	RVmax_Ly	—"—
	RVmax_Lz	—"—
	RVmax_q	Semi-major axis ratio, like q for $r \leq R (V_{\max})$ .
	RVmax_s	—"—
	RVmax_eig_xx	Eigenvectors of morphology, like eig_xx for $r \leq R (V_{\max})$ .
	RVmax_eig_xy	—"—
	RVmax_eig_xz	—"—
	RVmax_eig_yx	—"—
	RVmax_eig_yy	—"—
	RVmax_eig_yz	—"—
	RVmax_eig_zx	—"—
	RVmax_eig_zy	—"—
	RVmax_eig_zz	—"—
Gas quantities		Bulk properties of gas particles/tracers when compiled to process gas properties. Properties unique to gas are T_gas and SFR_gas.
	n_gas	Number of gas particles.
	M_gas	Total gas mass $M_{gas}$ .
	M_gas_Rvmax	Gas mass within $R (V_{\max})$ .
	M_gas_30kpc	Gas mass within 30 pkpc.
	M_gas_500c	Gas mass within a spherical overdensity of $500 ρ_{c}$ .
	Xc_gas	$x$ coordinate of centre-of-mass of gas particles relative to Xc.
	Yc_gas	—"—
	Zc_gas	—"—
	VXc_gas	$x$ coordinate of centre-of-mass velocity of gas particles relative to VXc.
	VYc_gas	—"—
	VZc_gas	—"—
	Efrac_gas	Like Efrac but for gas particles only.
	R_HalfMass_gas	Like R_HalfMass but for gas particles only.
	veldisp_xx_gas	Like veldisp_xx but for gas particles only and relative to the centre-of-mass.
	veldisp_xy_gas	—"—
	veldisp_xz_gas	—"—
	veldisp_yx_gas	—"—
	veldisp_yy_gas	—"—
	veldisp_yz_gas	—"—
	veldisp_zx_gas	—"—
	veldisp_zy_gas	—"—
	veldisp_zz_gas	—"—
	Lx_gas	Like Lx but for gas particles only and relative to the centre-of-mass.
	Ly_gas	—"—
	Lz_gas	—"—
	q_gas	Like q but for gas particles only and relative to the centre-of-mass.
	s_gas	Like s but for gas particles only and relative to the centre-of-mass.
	eig_xx_gas	Like eig_xx but for gas particles only and relative to the centre-of-mass.
	eig_xy_gas	—"—
	eig_xz_gas	—"—
	eig_yx_gas	—"—
	eig_yy_gas	—"—
	eig_yz_gas	—"—
	eig_zx_gas	—"—
	eig_zy_gas	—"—
	eig_zz_gas	—"—
	Krot_gas	Like Krot but for gas particles only and relative to the centre-of-mass.
	T_gas	Average temperature of gas.
	Zmet_gas	Average metallicity of gas.
	SFR_gas	Average star formation rate of gas.
Star quantities		Bulk properties of star particles when compiled to process star properties. Similar to gas properties but has _star instead of _ gas. For brevity, we list only quantities unique to star particles.
	tage_gas	Average stellar age.
Black hole quantities		Bulk properties of black hole particles when compiled to process black hole properties.
	n_bh	Number of black hole particles.
	Mass_bh	Total mass of black hole particles.
Interloper particles		If analysing multi-resolution simulations, low resolution particles are often treated as contaminants. These are bulk properties of low resolution contaminant particles.
	n_interloper	Number of low resolution, interloper particles.
	Mass_interloper	Total mass of low resolution, interloper particles.

Equations53

\frac{( x _{i} - x _{j} ) ^{2}}{ℓ _{x}^{2}} < 1,

\frac{( x _{i} - x _{j} ) ^{2}}{ℓ _{x}^{2}} < 1,

\frac{( x _{i} - x _{j} ) ^{2}}{ℓ _{x}^{2}} + \frac{( v _{i} - v _{j} ) ^{2}}{ℓ _{v}^{2}} < 1,

\frac{( x _{i} - x _{j} ) ^{2}}{ℓ _{x}^{2}} + \frac{( v _{i} - v _{j} ) ^{2}}{ℓ _{v}^{2}} < 1,

F (x, v) = ρ (x) f (v) .

F (x, v) = ρ (x) f (v) .

\frac{F _{S} ( x , v )}{F _{bg} ( x , v )} = [\frac{ρ _{S} ( x )}{ρ _{bg} ( x )}] [\frac{σ _{bg}^{3}}{σ _{S}^{3}}] [\frac{e ^{- (v - v_{S})^{2} /2 σ_{S}^{2}}}{e ^{- (v - v_{bg})^{2} /2 σ_{bg}^{2}}}] .

\frac{F _{S} ( x , v )}{F _{bg} ( x , v )} = [\frac{ρ _{S} ( x )}{ρ _{bg} ( x )}] [\frac{σ _{bg}^{3}}{σ _{S}^{3}}] [\frac{e ^{- (v - v_{S})^{2} /2 σ_{S}^{2}}}{e ^{- (v - v_{bg})^{2} /2 σ_{bg}^{2}}}] .

f_{bg} (v_{k}) = \frac{exp [ - \frac{1}{2} ( v _{k} - v ˉ ( x _{k} )) Σ _{v}^{- 1} ( x _{k} ) ( v _{k} - v ˉ ( x _{k} )) ]}{( 2 π ) ^{3/2} ∣ Σ _{v} ( x _{k} ) ∣ ^{1/2}},

f_{bg} (v_{k}) = \frac{exp [ - \frac{1}{2} ( v _{k} - v ˉ ( x _{k} )) Σ _{v}^{- 1} ( x _{k} ) ( v _{k} - v ˉ ( x _{k} )) ]}{( 2 π ) ^{3/2} ∣ Σ _{v} ( x _{k} ) ∣ ^{1/2}},

S = \frac{1}{lo g n _{bins}} k \sum n_{bins} - \frac{m _{k}}{m _{tot}} lo g \frac{m _{k}}{m _{tot}},

S = \frac{1}{lo g n _{bins}} k \sum n_{bins} - \frac{m _{k}}{m _{tot}} lo g \frac{m _{k}}{m _{tot}},

\overset{ˉ}{x}

\overset{ˉ}{x}

\overset{ˉ}{v}

σ_{i, j}^{2}

u (x) = i = 0 \sum N \frac{w _{i} ( x ) u _{i}}{\sum _{j = 0}^{N} w _{j} ( x )},

u (x) = i = 0 \sum N \frac{w _{i} ( x ) u _{i}}{\sum _{j = 0}^{N} w _{j} ( x )},

R_{k} = ln [f_{l} (v_{k} ∣ x_{k}) / f_{bg} (v_{k} ∣ x_{k})] .

R_{k} = ln [f_{l} (v_{k} ∣ x_{k}) / f_{bg} (v_{k} ∣ x_{k})] .

f_{SG}

f_{SG}

\displaystyle\quad+\exp\left[-\frac{\left(\mathcal{R}-\bar{\mathcal{R}}\right)^{2}}{2\sigma_{\mathcal{R}}^{2}}\right]\Theta(\mathcal{R}-\bar{\mathcal{R}})\Biggr{\}},

L_{k} \equiv (R_{k} - \overset{ˉ}{R}) / σ_{R} .

L_{k} \equiv (R_{k} - \overset{ˉ}{R}) / σ_{R} .

L_{i}, L_{j} \geq L_{th}

L_{i}, L_{j} \geq L_{th}

\frac{( x _{i} - x _{j} ) ^{2}}{( α _{x, S} ℓ _{x} ) ^{2}} < 1,

1/ V_{r} \leq v_{i} / v_{j} \leq V_{r},

cos Θ_{op} \leq \frac{v _{i} \cdot v _{j}}{v _{i} v _{j}},

⟨ L ⟩ \geq \overset{ˉ}{L} (L_{th}) (1 + β_{L} / N) .

⟨ L ⟩ \geq \overset{ˉ}{L} (L_{th}) (1 + β_{L} / N) .

\overset{ˉ}{L} = \frac{L _{th} \int \infty x e ^{- x^{2} /2} d x}{L _{th} \int \infty e ^{- x^{2} /2} d x} = \frac{\frac{2}{π} e ^{- L_{th}^{2} /2}}{1 - erf ( L _{th} / 2 )} .

\overset{ˉ}{L} = \frac{L _{th} \int \infty x e ^{- x^{2} /2} d x}{L _{th} \int \infty e ^{- x^{2} /2} d x} = \frac{\frac{2}{π} e ^{- L_{th}^{2} /2}}{1 - erf ( L _{th} / 2 )} .

β_{E} T + W \geq 0,

β_{E} T + W \geq 0,

ℓ_{x, C} = α_{x, C}^{l} ℓ_{x}, ℓ_{v, C} = α_{v, C}^{l} ℓ_{v},

ℓ_{x, C} = α_{x, C}^{l} ℓ_{x}, ℓ_{v, C} = α_{v, C}^{l} ℓ_{v},

\overset{ˉ}{X}

\overset{ˉ}{X}

σ_{X_{i, j}}

D_{k, n}^{2} = (X_{k} - \overset{ˉ}{X}_{n}) Σ_{X, n}^{- 1} (X_{k} - \overset{ˉ}{X}_{n}),

D_{k, n}^{2} = (X_{k} - \overset{ˉ}{X}_{n}) Σ_{X, n}^{- 1} (X_{k} - \overset{ˉ}{X}_{n}),

D_{p, n}^{2} \geq β_{C},

D_{p, n}^{2} \geq β_{C},

D_{B, DM}^{2} = (x_{B} - \overset{ˉ}{x}_{DM}) / ℓ_{x} + (v_{B} - \overset{ˉ}{v}_{DM}) / σ_{v},

D_{B, DM}^{2} = (x_{B} - \overset{ˉ}{x}_{DM}) / ℓ_{x} + (v_{B} - \overset{ˉ}{v}_{DM}) / σ_{v},

σ_{V, outlier} \equiv [(V_{S} - \overset{ˉ}{v}_{bg}) Σ_{v, bg}^{- 1} (V_{S} - \overset{ˉ}{v}_{bg})]^{1/2}

σ_{V, outlier} \equiv [(V_{S} - \overset{ˉ}{v}_{bg}) Σ_{v, bg}^{- 1} (V_{S} - \overset{ˉ}{v}_{bg})]^{1/2}

d n / d f_{M} = A f_{M}^{- α} exp [- (f_{M} / f_{o})^{β}],

d n / d f_{M} = A f_{M}^{- α} exp [- (f_{M} / f_{o})^{β}],

d n / d V \propto (r / r_{s})^{- α} (1 + r / r_{s})^{- β},

d n / d V \propto (r / r_{s})^{- α} (1 + r / r_{s})^{- β},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

\jid

PASA

\jyear2024

Hunting for Galaxies and Halos in simulations with VELOCIraptor

Pascal J. Elahi1,2,†

Rodrigo Cañas1,2

Rhys J.J. Poulton1,2

Rodrigo J. Tobar1

James S. Willis3

Claudia del P. Lagos1,2

Chris Power1,2

Aaron S. G. Robotham1

1International Centre for Radio Astronomy Research, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia

2ARC Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D)

3Institute for Computational Cosmology (ICC),

†Email: [email protected]

Abstract

We present VELOCIraptor, a massively parallel galaxy/(sub)halo finder that is also capable of robustly identifying tidally disrupted objects and separate stellar halos from galaxies. The code is written in c++11, use the MPI and OpenMP API’s for parallelisation, and includes python tools to read/manipulate the data products produced. We demonstrate the power of the VELOCIraptor (sub)halo finder, showing how it can identify subhalos deep within the host that have negligible density contrasts to their parent halo. We find a subhalo mass-radial distance dependence: large subhalos with mass ratios of $\gtrsim 10^{-2}$ are more common in the central regions that smaller subhalos, a result of dynamical friction and low tidal mass loss rates. This dependence is completely absent in (sub)halo finders in common use, which generally search for substructure in configuration space, yet is present in codes that track particles belonging to halos as they fall into other halos, such as hbt+. VELOCIraptor largely reproduces the dependence seen without tracking, finding a similar radial dependence to hbt+ in well resolved halos from our limited resolution fiducial simulation.

doi:

10.1017/pas.2024.xxx

keywords:

methods: numerical – galaxies: evolution – galaxies: halos – dark matter

1 Introduction

Running a cosmological simulation, whether N-Body or full hydrodynamical, is the first step in understanding cosmic structure formation and the evolution of galaxies. A critical step in extracting information from sophisticated, multi-billion particle simulations is the identification of structures, like dark matter halos and synthetic galaxies. Identifying (sub)structures is a non-trivial task and has led to the development of equally sophisticated structure finders (see Knebe et al., 2011; Onions et al., 2012; Knebe et al., 2013a, b, for an overview of (sub)halo/galaxy finding). A variety of codes exist that attempt to excise structures of interest from simulations, with most focusing on searching for overdense, gravitationally self-bound regions within cosmological simulations. For cosmological N-body simulations, these objects are dark matter halos, for hydrodynamical simulations these objects can be galaxies.

The two most common pure halo finders are Friends-of-Friends algorithms (e.g. Davis et al., 1985) and Spherical Overdensity algorithms (e.g. Lacey & Cole, 1994), the former using a linking length based on a desired density criterion, the latter identifying density peaks and grouping all particles within a spherical region that encloses some density (see Knebe et al., 2011, for a more thorough discussion and comparison of halo finding).

Beyond halo finders are those that also attempted to excise substructures residing within the gravitationally collapsed, nonlinear environment of halos, so-called subhalo finders. Subhalo finders can be broadly classified into two types: configuration-space finders and phase-space finders. Older, more common configuration-space finders, like ahf Knollmann & Knebe (2009), subfind Springel et al. (2001), adaptahop Tweed et al. (2009), search for physical overdensities or clustering in configuration space111In practice, even configuration-space finders are pseudo phase-space finders as candidate objects must be passed through an unbinding process, whereby unbound particles are removed from a candidate, to return sensible results.. Phase-space finders, like hsf Maciejewski et al. (2009) and rockstar Behroozi et al. (2013), use extra velocity information to identify overdensities and clustering in the full phase-space.

Different (sub)halo finders suffer from different issues (see Knebe et al., 2013b, for a in depth discussion of structure finding). Configuration-space base finders rely on saddle points in the density field in some form or another to separate structures. Consequently, subhalos are artificially truncated as they fall towards pericentre and grow again as the move out to apocentre (see Muldrew et al., 2011; Behroozi et al., 2015, for specific examples using subfind & ahf). Phase-space finders are better able to separate these structures since they will overlap less in phase-space, and in principle need not inherently shrink/grow the mass associated with subhalos as they move towards pericentre/apocentre.

Here we present VELOCIraptor (formerly known as STructure Finder, stf Elahi et al., 2011), a phase-space (sub)halo finder capable of identifying dark matter halos and galaxies222Freely available https://github.com/pelahi/VELOCIraptor-STF.git. Documentation is found at http://velociraptor-stf.readthedocs.io/en/latest/. This code can ingest both pure N-Body simulation input and hydrodynamical data. Here we present significant update to the original algorithm described in Elahi et al. (2011).

Our paper is organised as follows: in section §2, we outline the code package, present tests of our algorithm in §3 and conclude in §4 with a summary and discussion.

2 Identifying Structures with VELOCIraptor

VELOCIraptor is a (sub)halo finder that identifies structures in a multi-stage process, the exact details depending on the operational mode it is being used in: identifying dark matter halos, dark matter halos+baryonic content, or just galaxies. VELOCIraptor is built on STF Elahi et al. (2011), providing significant upgrades to the halo finding algorithm, treatment of baryons, the mass reconstruction of major merger events, along with parallelisation and integration into N-body codes (specifically swift Schaller et al., 2016). We describe the various aspects of our code below. For readers interested in input interfaces, output, and general modes of operation we suggest skipping to §2.6. Readers interested in the main benefits and results of VELOCIraptor can skip to §3.

The identification process proceeds in a two stage approach: 1) identify field halos/galaxies; 2) for each field object search for substructure using phase-space information. Unlike almost all other structure finders currently available, this algorithm is also capable of robustly identifying tidally disrupted objects (see Elahi et al., 2013) along with self-bound, physically dense halos/galaxies. A flow chart describing the operational stages is shown in Fig. 1.

2.1 Field Halos

The code first identifies candidate halos using a 3DFOF algorithm (3D Friends-of-Friends in configuration space, see Davis et al., 1985), linking particles together if

[TABLE]

where ${\bf x}_{i}$ is the $i^{\rm th}$ particle’s position, and $\ell_{\rm x}$ is the linking length. This initial linking can also make use of a particle’s type, whether dark matter (N-body), or gas, star (baryon). Cosmological simulations typically set $\ell_{\rm x}=0.2$ times the inter-particle spacing.

Simple FOF algorithms are susceptible to artificially joining two structures together by a single (or a few) particle(s), a so-called particle bridge. We appeal to the physics of the structures we seek to identify, i.e., virialised halos, and use velocity information333In general, artificial particle bridges could be removed by identifying a particle(s) that, if removed, would split the structure into several structures, i.e., those particles that have groups of links whose sole common link is the particle itself.. For each structure $k$ we calculate a velocity dispersion, $\sigma_{v,k}$ , and apply a 6DFOF,

[TABLE]

which splits virialised structures connected by dynamically unrelated particle bridges and tends to remove very unbound particles that may have been grouped by the original FOF algorithm. Here $\ell_{\rm v}=\alpha_{\rm v}\sigma_{v,k}$ , and $\alpha_{\rm v}$ is a scaling term on the order of unity.

Addition of Baryons:

Simulations can contain both N-body (Dark Matter, DM) particles and other particle types and along with the inclusion of extra forces, like the addition of gas tracers and hydrodynamical forces. Fully hydrodynamical cosmological simulations often contain either gas particles (or tracers for codes such as AREPO Springel, 2010, or cells such as RAMSES Teyssier, 2002), star particles, and even sink particles representing supermassive black holes. These baryons tracers can be treated in a special fashion by VELOCIraptor if the appropriate flags are set. If desired, specific particle types can be searched, such as stars to produce a galaxy catalogue. The code can also search all particle types, either treating all particles equally or allowing for special linking behaviour dependent on particle type.

The two most common modes of operation are either to assign baryonic particles to DM structures, so-called DM+Baryons, or to search only star particles and identify galaxies, so-called Galaxies+Baryons. We discuss how the field search operates in both these modes.

Since gas particles are subject to hydrodynamical forces and can clump together to form long filaments, applying a simple FOF algorithm can leading to the artificial linking together of several dynamically distinct structures. Hence the typical mode of operation to group both DM and baryons together is to produce FOF links using DM particles only, i.e., a DM particle can link to other DM particles and baryon particles, but baryon particles are ignored when searching for new FOF links. An application of this mode has been applied to hydrodynamical zoom simulations (e.g. Elahi et al., 2016; Arthur et al., 2017).

When searching for galaxies using star particles, we first identify 3DFOF stellar structures. These structures are then searched using a 6DFOF, with the critical difference between the DM search being that we keep track of the star particles linked in the 3DFOF but not linked in the 6DFOF as a structure. This remnant 3DFOF represents the diffuse, kinematically distinct stellar halos that surround galaxies. An application of this mode has been applied to hydrodynamical simulations to look at the sizes of galaxies and a preliminary investigation of diffuse stellar halos (Cañas et al., 2018, Canas et al, in prep). The code can also use star particles as a basis for links to assign other baryonic particle types to structures in a similar fashion to the DM mode described above.

2.2 Subhalos & Streams

We briefly describe the specifics of identifying substructures here as it is discussed in Elahi et al. (2011). Substructures are identified using a phase-space FOF algorithm on particles that appear to be dynamically distinct from the mean “Maxwellian” halo background, i.e., particles which have a local velocity distribution that differs significantly from the mean, smooth background halo. This approach is capable of not only finding subhalos, but also tidal debris surrounding subhalos as well as tidal streams from completely disrupted subhalos. The method for identifying substructure is shown in Fig. 2.

Dynamically distinct particles:

The algorithm identifies particles that are dynamically distinct from a background distribution by examining velocity space assuming that that a halo’s velocity distribution can be split into a virialised background and substructures. To illustrate this method, consider the phase-space distribution function:

[TABLE]

Here we assume the distribution function is separable into $\rho({\bf x})$ and $f({\bf v})$ , the physical and velocity density distribution functions respectively. Assuming Gaussian velocity distributions for a substructure and a halo, the distribution ratio of a substructure S to the background bg at a given $({\bf x},{\bf v})$ is:

[TABLE]

This ratio has three terms: the physical density contrast; velocity dispersion contrast; and a ratio of Gaussian terms. Subhalos are dynamically cold overdensities, unlike tidal streams, which can have negligible density contrasts and velocity dispersion comparable with the background. Hence, it is common practice to focus on the density ratio to identify subhalos. However, regardless of whether a substructure is a subhalo or tidal debris, the velocity distribution of the particles belonging to the substructure will differ from the background. These particles will have a ratio of at least ${\exp(\delta v^{2}/2\sigma_{\rm bg}^{2})}$ .

This exponential factor, a measure of orbit clustering, is key to our algorithm. Instead of estimating the full phase-space density at a particle’s phase-space position ${\bf X}$ , we measure local velocity density, $f_{\rm l}({\bf v}|{\bf x})$ , as this is less computationally expense and not as noisy. We then divide out the expected velocity density of the background, $f_{\rm bg}({\bf v}|{\bf x})$ , neglecting the first term in Eq. (4) at this stage. Particles belonging to velocity distributions that differ from the background will have ratios of $f_{\rm l}/f_{\rm bg}\gg 1$ .

The local velocity density of a particle $k$ , $f_{\rm l}({\bf v}_{k})$ , is measured using a kernel-scheme with an Epanechnikov smoothing kernel Sharma & Steinmetz (2006). This density is calculated using $N_{\rm v}$ nearest velocity neighbours from the set of $N_{\rm se}$ nearest physical neighbours, where $N_{\rm v}\leq N_{\rm se}$ 444Using a subset of physical neighbours to measure the local velocity density will give a biased result but as the goal is to highlight any clustering in velocity space, this is perfectly acceptable. Typical values are $N_{\rm v}=32,N_{\rm se}=256$ .

The mean background velocity density is characterised by a multivariate Gaussian555Numerical simulations show the velocity distribution of a small region of a cosmological halo are reasonably characterised by a multivariate Gaussian (e.g. Vogelsberger et al., 2009)., thus, the expected background velocity density for a particle $k$ with velocity ${\bf v}_{k}$ is

[TABLE]

where $\bar{\bf v}$ is the mean velocity, and $\Sigma_{v}$ is the matrix representation of velocity dispersion tensor about $\bar{\bf v}$ , both of which depend on the position within the halo, ${\bf x}$ .

The mean field is estimated by splitting the halo into volumes containing enough particles so that the statistical error on bulk quantities calculated for a cell is negligible but not so large that density (and thus the velocity dispersion) varies greatly across the volume. To balance these competing effects, we split the halo into cells containing $N_{\rm cell}$ particles using a KD-Tree (Friedman et al., 1977; Appel, 1985; Barnes & Hut, 1986), iteratively splitting along the spatial dimension that maximises Shannon entropy, $S$ . We calculate $S$ for each dimension by binning particles in $n_{\rm bins}$ that span the extent of the dimension using the formula

[TABLE]

where $m_{k}$ is the mass in the $k^{th}$ bin and $m_{\rm tot}$ is the total mass. This process splits volumes in the dimension with the greatest amount of variation in the spacing between particles, effectively minimises the variation in particle density across any given cell volume.

The cell size sets the background scale, below which we can robustly identify orbital clustering. We typically set $N_{\rm cell}=f_{\rm cell}N_{\rm H}$ , where $f_{\rm cell}\sim 0.01$ is the fraction of $N_{\rm H}$ , the number of particles in the halo. This fraction is increased if $N_{\rm cell}\lesssim 100$ up to a maximum of $\sim 1/8N_{\rm H}$ in order to have an accurate dispersion tensor.

For each volume we calculate the centre-of-mass, centre-of-mass velocity and the velocity dispersion tensor:

[TABLE]

where $M_{\rm cell}$ is the mass contain in the cell and the sums are over all particles in the cell. The velocity quantities are interpolated to a particle’s position with a inverse-distance interpolation scheme using the cell containing the particle and the six neighbouring cells (those that share faces with the cell of interest):

[TABLE]

where $u$ is the quantity we wish to determine at a position $\bf x$ based on cells with center-of-mass positions $\bar{\bf x}_{i}$ , and $w_{i}(\mathbf{x})=|\mathbf{x}-\bar{\mathbf{x}}_{i}|^{-1}$ .

We then calculate the logarithmic ratio for each particle $k$ ,

[TABLE]

As both quantities have noise, this noise must be taken into account to determine if a particle is an outlier of the background distribution and belongs to a substructure. Based on tests using smooth, spherical halos with density profiles ranging from cored isothermal to a steep $r^{-1.5}(1+r/a_{o})^{-1.5}$ generated by galactICs Kuijken & Dubinski (1995); Widrow & Dubinski (2005); Widrow et al. (2008) the $\mathcal{R}$ -distribution is characterised by Skew-Gaussian:

[TABLE]

where $s$ is a measure of the skew or asymmetry, and $\Theta(x)$ is the Heaviside function. The skew arises from the biased estimator of $f_{\rm l}({\bf v}_{k}|{\bf x}_{k})$ . We fit a Skew-Gaussian to the binned distribution in order to accurately measure the mean and dispersion and calculate the normalised ratio:

[TABLE]

A particle is considered a significant outlier if $\mathcal{L}>1$ .

Linking particles:

The next stage uses a phase-space algorithm to link particles. Particles $i$ & $j$ are linked iff

[TABLE]

where $V_{r}$ is the velocity ratio threshold, and $\cos\Theta_{\rm op}$ is the threshold on the cosine of the angle between the velocities.

The first criterion limits the linking to dynamically distinct particles. The second criterion is the standard FOF criterion with the linking length scaled by a factor $\alpha_{\rm x,S}$ . The next two criteria ensure that the particles have similar velocities. The reason we do not use a simple 6DFOF, i.e., $({\bf v}_{i}-{\bf v}_{j})^{2}/\ell_{\rm v}^{2}<1$ , is that tidal streams may have large velocities and dispersions. Consequently, scaling an allowed velocity dispersion, $\ell_{\rm v}^{2}$ is non-trivial. In total, this FOF algorithm has $4$ parameters, $\mathcal{L}_{\rm th}$ , $\alpha_{\rm S}$ , $\mathcal{V}_{\rm r}$ and $\cos\Theta_{\rm op}$ .

As with all FOF algorithms, poor choice of linking parameters can produce spurious structures. A threshold of $\mathcal{L}_{\rm th}\approx 0$ includes all particles whereas $\mathcal{L}_{\rm th}\gg 1$ would ensure few contaminants. The speed ratio, $\mathcal{V}_{\rm r}$ , has two limiting cases: $\mathcal{V}_{\rm r}\approx 1$ is conservative; and $\mathcal{V}_{\rm r}\gg 1$ is relaxed. The related velocity parameter $\cos\Theta_{\rm op}$ has limits of $\cos\Theta_{\rm op}\approx 1$ (conservative) and $\cos\Theta_{\rm op}\approx-1$ (relaxed). This also applies to $\alpha_{\rm x,S}$ , with $\alpha_{\rm x,S}<1$ ( $\alpha_{\rm x,S}>1$ ) a conservative (relaxed) choice. Conservative choices would ensure high purity but possibly miss substructures, whereas more relaxed will recover more particles at the cost of a lower purity and the inclusion of spurious groups.

To alleviate the issue of either using conservative values and missing substructures or relaxed conditions that ensure maximum recovery but low purity, we also employ a two stage approach. First we use conservative values for the FOF parameters to find an initial set of candidate substructures. The FOF criteria are then relaxed to link previously untagged particles neighbouring currently tagged particles, thereby recovering the less dynamically distinct/more diffuse portions of substructures. The thresholds in Eq. (14) are changed to $\mathcal{L}_{\rm th}\rightarrow\mathcal{L}_{\rm th}/\gamma_{\mathcal{L}}$ , $\mathcal{V}_{\rm r}\rightarrow\gamma_{\mathcal{V}_{\rm r}}\mathcal{V}_{\rm r}$ , $\Theta_{\rm op}\rightarrow\gamma_{\Theta_{\rm op}}\Theta_{\rm op}$ , and linking lengths increased to $\gamma_{\rm x,S}\alpha_{\rm x,S}\ell_{\rm x}$ , where the $\gamma$ ’s are order unity and $\geq 1$ . To recover extended tidal features, $\gamma_{\rm x,S}=1/\alpha_{\rm x,S}$ , i.e., the linking length used to identify entire halos.

For guidance on the initial conservative parameters, we appeal to probabilistic or physical arguments. To minimise contamination, we start with $\mathcal{L}_{\rm th}\approx 2.5$ . The $\alpha_{\rm x,S}\ell_{\rm x}$ linking-length parameter can significantly influence the results and, in the form used, there is no specific value to appeal to without prior knowledge. We argue for $\alpha_{\rm x,S}\sim 1/2$ , picking out the densest regions of substructures. The speed ratio should be of order unity so values of $\sim 2$ are reasonable. For the opening angle we typically use $\Theta_{\rm op}=18^{\circ}$ . These specific values are based on tuning done in Elahi et al. (2011) to recover subhalos and tidal tails using idealised simulations, though similar values will yield similar results.

Note that using conservative criteria can artificially split substructures and relaxing the criteria can join groups, in some circumstances artificially. Therefore, as substructures are grown and new links identified, substructures are only joined if the number of new connections exceeds $f_{\rm merge,th}N_{\rm p,o}$ for either substructure, where $N_{p,o}$ is the original size of the substructure. The default fraction threshold is $f_{\rm merge,th}=0.25$ , though values close to unity are reasonable.

The FOF algorithm without criterion Eq. (14a) and some tuning is itself able to recover the central densest regions of subhalos with moderate purity but this criterion is critical to identify subhalos with high purity and robustly recover tidal debris.

Cleaning the catalogue:

As with all halo finders, the catalogue must be cleaned of spurious groups and links. A group’s average $\langle\mathcal{L}\rangle$ value is a natural measure of significance. Purely artificial groups resulting from linking unrelated particles that are outliers due to random fluctuations are likely to have $\langle\mathcal{L}\rangle$ within Poisson noise of the expected $\bar{\mathcal{L}}$ calculated using the background distribution and the threshold $\mathcal{L}_{\rm th}$ imposed. Thus, we require a group composed of $N$ particles have

[TABLE]

Here $\beta_{\mathcal{L}}$ is the required significance level, typically $\beta_{\mathcal{L}}\approx 1$ and

[TABLE]

Groups not satisfying this criterion have particles removed in order of smallest $\mathcal{L}$ value until Eq. (15) is satisfied.

Additionally, groups can be pruned by an unbinding process666The common terminology of “unbinding” is a bit misleading as discussed in Knebe et al. (2013b). The bound state is determined instantaneously, typically neglects the background potential by treating objects in isolation, and uses a somewhat arbitrary velocity reference frame. Loosely unbound particles at a given instant will not immediately leave their host but remain in similar orbits as their host, drifting away over a dynamical time., where by particles deemed too unbound are removed. We calculate the potential energy $W$ of particles using a tree algorithm with groups treated in isolation, that is neglecting the surrounding tidal field. The instantaneous kinetic energy $T$ is calculated relative to the group’s centre-of-mass velocity reference frame777By default, the code uses shrinking spheres to determine the centre-of-mass and uses the inner most $10\%$ of particles to determine the centre-of-mass velocity. This can significantly differ from the bulk velocity of a halo as discussed in Behroozi et al. (2013). VELOCIraptor can be configured to use either a bulk velocity or a centre-of-mass velocity when determining the boundness of particles..

In most halo finders, a strict binding energy is used, where particles with $T+W>0$ are removed and potentials and centre-of-mass velocity frames are recalculated with each removal. This strict unbinding process is only truly necessary for configuration-based finders such as subfind, where initial particle assignment to subhalos can be quite poor. Due to the initial step of identifying dynamically distinct particles, VELOCIraptor does not suffer from this issue, allowing the binding criterion to be greatly relaxed in order to identify tidal debris.

Therefore, to retain tidal debris if desired, we use a modified binding energy criterion, removing particles with

[TABLE]

in order of least bound. For self-bound subhalos, $\beta_{\rm E}\approx 0.95$ is ideal, retaining some loosely unbound particles that would not immediately drift away from their subhalo host888Consider particles orbiting inside an NFW potential representing the subhalo near the virial radius, where orbital time are $\gtrsim 1$ Gyr. Particles with kinetic energies of $T=W/\beta_{\rm E}$ compared to $T=W$ for $\beta_{\rm E}=0.95$ will have apocentres that are $\lesssim 10\%$ larger. These inflated radii are typically still within the tidal radius of a NFW subhalo orbiting inside a larger, less concentrated NFW halo, at least for orbital distances of $\gtrsim 0.5R_{200\rho_{c},{\rm host}}$ . Only once $\beta_{\rm E}\lesssim 0.9$ do apocentres increase significantly by $\gtrsim 50\%$ , with apocentres likely outside the tidal radius.. To retain tidal debris with high purity we find that $\beta_{\rm E}\gtrsim 0.2$ works well (based on tests presented in Elahi et al., 2013). One can also require that the group as a whole have some fraction of completely bound particles where $T+W\leq 0$ , $f_{\rm E}$ .

The total mass assigned to subhalos typically only changes by few percent for $0.95\lesssim\beta_{\rm E}\lesssim 1$ . This is well within the differences of $10-20\%$ observed between different (sub)halo finders Onions et al. (2012); Knebe et al. (2013b), which arise from subtle differences in the kinetic reference frame used and how potentials are calculated. We argue that unless one is interested in tidal debris, the binding criterion be set to $0.95\leq\beta_{\rm E}\leq 1$ , although one can always recover the formally self-bound mass in the output from the code for any $\beta_{\rm E}$ .

Finally, groups must be composed of $N\geq N_{\rm min}$ particles. Typically we set $N_{\rm min}=20$ .

2.3 Core Search & Major Mergers

Major mergers occur when two approximately equal mass objects (within a factor of a few) coalesce. These events present a uniquely difficult problem for many halo finders. Many configuration-space based finders will artificially shrink one of the objects, designating it a subhalo, while the other object will be artificially larger and be designated a host. The subhalo/halo designation and the mass can switch between objects. Phase-space based finders are in principle less prone to this swapping (see Behroozi et al., 2015 for a discussion of major mergers; see Muldrew et al., 2011 for examples of the short-comings of configuration-space halo finders).

During a major merger the “halo” consists of two (or more) overlapping distributions in phase-space containing similar amounts of mass. Our orbit clustering approach will not be able to disentangle the merging halos if the secondary halo is significantly larger than $f_{\rm cell}N_{\rm H}$ particles. In such an instance, the background will consist of the merging halo that we are trying to separate.

We disentangle mergers (both major and minor with mass ratios of $\gtrsim f_{\rm cell}$ ) by appealing to the properties of the dynamically cold, dense core of halos. An early version of this method was used in Behroozi et al. (2015). Here we describe in full this new addition to VELOCIraptor. We search background particles not belonging to any substructure for these cores using a iterative, conservative 6DFOF and then proceed to grow them to reconstruct the mass as shown in Fig. 3, taking inspiration from rockstar Behroozi et al. (2013).

Core Identification:

We begin by searching the “background” particles of a halo, those not in substructure, using a conservative 6DFOF for groups larger than than some fraction $f_{\rm C}$ of $N_{\rm H}$ the number of particles in the halo. The linking lengths, $\ell_{\rm x}$ & $\ell_{\rm v}$ here are based on the original halo linking length and the halo velocity dispersion respectively. This search is repeated with configuration- & velocity-space linking lengths iteratively shrunk and the “background” particles list updated for each loop:

[TABLE]

where $\alpha_{\rm x,C},\alpha_{\rm v,C}<1$ and $l$ is the loop number.

The “background” for each successive iteration is defined as the largest 6DFOF group identified in the previous iteration, the so-called “primary core”. If at any point, more than a single group is identified, all but the largest are stored as candidate “cores”. We loop until no groups are found (no background to search) up to a maximum desired number of iterations, $\Delta_{\rm C}$ . The code can also alter the minimum number of particles a group must contain at a given iteration $l$ to $N_{\rm min,C}=\alpha_{\rm N,C}^{l}f_{\rm C}N_{\rm H}$ .

Core Growth & Mass Reconstruction:

If more than a single “core” has been identified, the next step is to assign all untagged halo particles to these candidate “cores” and the “primary core”. We start at the last iteration at which multiple groups were found, setting these “cores” and “primary core” as “active”. Phase-space dispersion tensors are calculated for these active cores:

[TABLE]

We then assign untagged particles that were searched at this iteration, “active background particles”, to the closest active core in phase-space. The distance used is:

[TABLE]

where here we show the distance of particle $k$ to a core $n$ and $\Sigma$ is the matrix representation of $\sigma_{\bf X_{i,j}}$ .

Once all active particles at the current level are assigned, we then move up to the previous iteration and assign particles. If cores are present at this iteration, they are added to the active core list and we proceed as outlined above. We repeat the process till all particles not associated with substructure have been assigned to a core.

This method is similar to assigning particles based on a Gaussian mixture model999A Gaussian mixture model is a probabilistic model that assumes data points are drawn from a mixture of a finite number of Gaussian distributions with unknown parameters. There are several techniques used to iteratively determine the number of Gaussians and their properties that describe the data using the Bayesian Evidence in some form. Data points can be assigned to the Gaussian with the highest probability of producing the data point, thereby classifying the data., but less time-consuming as we do not calculate full likelihoods. It also has the added advantage that we do not require each distribution to be characterised by a single global dispersion tensor.

The use of phase-space tensor based distance also has an advantage over algorithms that use a simple 6DFOF-like distance metric (see Eq. (2), e.g., rockstar) as it does not impose a spherical distribution, nor ignore covariance between positions and velocities. That is not to say that for moderately aspherical distributions typical of halos, using scalar dispersions performs poorly but that results can be improved using dispersion tensors.

We compared assigning particles using dispersion tensors to dispersion scalar using simple models composed of overlapping multivariate Gaussians. We draw particles from several $n$ -dimensional multivariate Gaussian distributions with means roughly separated by $\sim 1-3\sigma$ from each other, and with each sub-population containing similar numbers of members. Initial dispersion scalars and tensors are determine using 100 particles and then assign particle group membership using the relevant distance in single step. We find tensor-based distance assignment results in groups of higher purity, that is a higher fraction of correctly identified members. There is also a reduction in the group-to-group scatter in purity. The amount of improvement depends on the asphericity of the distributions, with increase of a few percent or more. More aspherical distributions have larger increases in purity as well as the fraction of the group recovered. Iterating this process improves the results.

For example, consider particles drawn from two Gaussian distributions, one spherical, the other quite aspherical (with minor axis ratio of $0.03$ ), separated by a phase-space tensor normalised distance of $\sim 2$ . Assignment using the dispersion scalar distances results in a purity of $0.76$ & $0.92$ for the spherical and aspherical populations respectively. Using tensor based distances improves the purity to $0.79$ & $0.93$ respectively. The recover fractions are similarly improved from $0.94$ & $0.70$ to $0.95$ & $0.76$ respectively.

Cleaning the Catalogue:

We clean the candidate core list of spurious objects prior to core growth by requiring that the distance of a core $n$ to the primary core $p$ identified at the same point to be significant,

[TABLE]

where the distance is based on the secondary core’s phase-space tensor using Eq. (21), and $\beta_{\rm C}$ is the significance. The substructures after core growth are then processed by the unbinding procedure (see Eq. (17)).

2.4 Substructure and Baryons

Assigning baryonic particles to substructure or identifying baryonic substructures depends on the mode of operation. We discuss the two principal modes here.

Substructure in DM+Baryons mode:

In this mode, baryons have already been assigned to a FOF envelop. For each FOF envelop, baryons are assigned to the group of the DM particle that is closest in phase-space using a simple phase-space metric

[TABLE]

where $\sigma_{v}$ is the typical velocity dispersion of structures found101010A more complex phase-space metric could be used, where the dispersion depends on the FOF halo being searched or even a full tensor but the extra computational cost does not drastically improve the initial particle assignment. This is particularly true when the initial baryonic assignment is processed by an unbinding routine..

Substructure in Galaxies+Baryons mode:

The process used to identify DM substructures is ill suited to separating interacting galaxies as stars are constantly being formed and there need not have a well defined background. Instead interacting galaxies are separated using the core search as outlined in §2.3 (see Cañas et al., 2018, for details). Once interacting galaxies have been separated, the same assignment scheme is used as in the DM+Baryons mode to assign other baryonic particles (gas and black hole particles) to the nearest star particle.

2.5 Halo Properties

The code calculates a large number of bulk properties for each object (see Table LABEL:tab:velociraptor:properties for an almost complete list; the exact number of properties calculated depending on input). Calculating properties is complicated by the presence of substructure. Should substructures be excluded or included? The answer depends on the scientific goal. For following the evolution of objects across cosmic time using halo merger trees for input into SAMs, ideal masses are likely that of particles belonging exclusively to the object, whether halo or subhalo. This avoids abrupt changes in mass as an object transitions from a halo to a subhalo. For lensing, one is likely interested in the total mass within some region.

VELOCIraptor allows some flexibility: masses can either be calculated using particles exclusive to the object, or, for halos one can include substructures. Inclusive halo masses, such as commonly used spherical overdensity halo masses111111An example would be $M_{200\rho_{c}}=4\pi\Delta\rho_{c}R_{\Delta\rho_{c}}/3$ , where $\rho_{c}$ is the critical density, and $R_{\Delta\rho_{c}}$ is the radius enclosing an average density of $\Delta\rho_{c}$ , where $\Delta=200$ can include particles belonging to substructures, the background and even neighbouring halos. Subhalos have exclusive masses, that is calculated using only particles belonging to the subhalo. Angular momentum, like mass can be calculated in a variety of was for halos. Other properties, such the maximum circular velocity, are by default are calculated using particles exclusively belonging to the object.

Another complication in bulk properties has to do with the phase-space position of a halo. The overall bulk motion of particles within the FOF envelop maybe offset from the motion of the central most bound regions particularly the motions of particles near the edge of the FOF envelop Behroozi et al. (2013). By default, centre-of-mass positions are calculated using shrinking spheres till the the last sphere encloses $\sim 10\%$ of the group’s particles and velocities are calculated using this inner most $10\%$ . These positions better characterise the orbital motion of halos, though it does not represent the overall bulk motion of mass.

VELOCIraptor also outputs all the particle IDs in each structure so users can post-process data to calculate desired properties.

2.6 Code Structure

VELOCIraptor is a c++ code that uses OpenMP+MPI APIs for parallelisation but can be compiled in serial mode, solely with OpenMP, or solely with MPI. The code requires a configuration file (example are provided with the repository), input data and an output file name.

The code has been designed to take the following types of N-body/Hydrodynamical input: HDF5121212Library can be found at https://www.hdfgroup.org/.; gadget binary format Springel et al. (2005); ramses binary format Teyssier (2002); and tipsy binary format N-Body Shop (2011). For all input save tipsy, VELOCIraptor extracts cosmological information and the spatial bounds for the particles. This information can be provided via the configuration file if not present in the input data.

The spatial extent of the particle distribution must be provided for MPI domain decomposition, even for non-periodic input. This information can be provided either via the input data itself or via the configuration file. Currently implemented MPI domain decomposition scheme is a Binary Tree like splitting131313A graph-partitioning scheme using the metis library http://glaros.dtc.umn.edu/gkhome/metis/metis/overview is in the works.

It produces the following types of output formats: ASCII; custom binary format; HDF5 (preferred); and ADIOS141414Library can be found at https://www.olcf.ornl.gov/center-projects/adios/. (alpha). The output files consist of two types: a collection of bulk properties for each group; and a list of the IDs of all particles belonging to each group. It can also produce a list of particle types and even information on the file and index each particle is located at, allowing for quick extraction of particle data for further follow-up analysis. We outline a sample of the bulk properties calculated in the appendix, Table LABEL:tab:velociraptor:properties.

There are a variety of configuration options available. We list the critical parameters in Table 1, providing a more complete list and the specific code parameter key words in the appendix, Table LABEL:tab:velociraptor:config. We note that for most users, these default parameters will produce standard halo catalogs and subhalo need no alteration. Most users will simply alter the minimum number of particles per halo. For identifying tidal debris, the key parameter to change is the unbinding parameter, $\beta_{\rm E}$ , which can be set to values of $\sim 0.1-0.5$ . We highlight parameters that are likely to be changed depending on the input simulations and the desired scientific outcome.

3 Results

Here we present how well halos/galaxies and substructures are identified. As input we primarily use a small cosmological N-Body simulation consisting of $512^{3}$ particles (from the SURFS suite Elahi et al., 2018). The simulation details are presented in Table 2.

For this analysis, we also make use a halo merger tree builder, TreeFrog Elahi et al. (2019). This related software is a so-called “Tree Builder’, software that takes as input halo catalogs across cosmic time and reconstructs the history of a halo, producing halo merger trees. Details of how TreeFrog reconstructs halo merger trees can be found in Elahi et al. (2019). Here we summarise the salient points: the code uses particle IDS and the group to which they belong to compare one snapshot to the next, identifying descendants by maximising a merit function that effectively links halos at a time $t_{1}$ to halos found a later time that share the largest number of most bound particles. We also compare results to three different (sub)halo finders: ahf Knollmann & Knebe (2009), a configuration-space based finder; rockstar Behroozi et al. (2013), a phase-space finder; and hbt+ Han et al. (2018) a 3DFOF tracker that uses 3DFOF halos found across all snapshots and tracks particles assigned to 3DFOF halos as they enter larger 3DFOF halos to build a halo merger tree as well as a subhalo hierarchy.

We start by looking at the identification and decomposition of individual objects and then look at the statistical properties of the ensemble population extracted from our simulations. We use a particle limit of $N_{\rm min=20}$ and focus on self-bound objects, that is we use $\beta_{\rm E}=0.95$ (see Eq. (17)).

3.1 Individual Halo

Figure 4 shows a 3DFOF halo extracted from the L40N512 simulation and how each step in VELOCIraptor decomposes the candidate/parent object. In this example, we use a large halo composed of $\approx 10^{6}$ particles identified at $z=0$ with a 3DFOF mass of $4.2\times 10^{14}h^{-1}{\rm M}_{\odot}$ and a mass $M_{\Delta\rho_{c}}=2.7\times 10^{14}h^{-1}{\rm M}_{\odot}$ , where $M_{\Delta\rho_{c}}=4\pi\Delta\rho_{c}R_{\Delta\rho_{c}}/3$ , $\rho_{c}$ is the critical density, and $R_{\Delta\rho_{c}}$ is the radius enclosing an average density of $\Delta\rho_{c}$ , where $\Delta=200$ , commonly referred to as the virial mass. This 3DFOF object was identified using the standard linking length of $\ell_{x}=0.2L_{\rm box}/N_{p}$ , where $L_{\rm box}/N_{p}$ is the inter-particle spacing.

The initial 3DFOF halo clearly consists of several large density peaks, some of which are well outside the virial radius centred on the largest density peak. All the density peaks would be considered subhalos of the FOF envelop, save for the peak that has the largest mass associated with it, which is considered the parent halo. This subhalo/halo definition is less than ideal as some of the larger peaks are well outside the virial radius. Moreover, some of these peaks are spuriously linked to the primary via a thin particle bridge by the FOF algorithm. This example illustrates the need for more sophisticated algorithms.

Applying the 6DFOF algorithm separates the initial 3DFOF candidate into 75 (bound) groups, 3 of which are composed of $\gtrsim 10\%$ of the original 3DFOF object’s particles. Approximately $87\%$ of the original 3DFOF object’s particles are still within a group, with the largest object containing $68\%$ and having approximately the same virial mass as the original 3DFOF. The 6DFOF algorithm produces a better mapping of a FOF object to the physical definition of a halo, that of a virialised overdensity.

The largest 6DFOF halo itself appears to contain at least 4 large density peaks and numerous smaller ones. If we search for substructure by identifying locally dynamically distinct particles and linking them with a phase-space algorithm (method outlined in §2.2) we find 217 groups, containing $\approx 9\%$ of the mass of the halo, the few largest of which each contain $\approx 1\%$ of the total halo’s mass.

The largest density peaks within the 6DFOF are separated into 3 groups plus the main halo by the core search (see §2.3). These objects, remnants of minor/major mergers, contains $21\%$ of the initial host halo’s mass, with the smallest containing $3\%$ and the largest $9\%$ . The second largest merger remnant happens to be close to the main halo, making particle assignment during the core growth phase non-trivial, particularly for the outer regions that overlap in phase-space with the host. The sharp boundary between the object and the main halo is a result of a compromise between computational efficiency and rigour as we use few steps to grow cores and a global phase-space tensor to assign particles based on their distance to the cores’s centre-of-masses. Finer steps would reduce the sharpness of this transition but as it effects small amounts of mass, extra steps are unnecessary.

For comparison, other methods find similar amounts of mass, though there are some differences. hbt+, which tracks halos, assigns the least amount of mass to the most distant object. ahf underestimates the mass of the most central object relative to all other finders, expected given its configuration-space approach. rockstar, which has a similar approach to that outlined in §2.3, returns similar, if typically larger masses. Both VELOCIraptor and rockstar also give similar results to a full Gaussian mixture model using centre-of-mass of the cores as initial guess161616We use an implementation in SciKit Python package that uses variational inference which maximizes a lower bound on model evidence (including priors) instead of data likelihood..

The phase-space distribution of these objects within of the parent halo is presented in Fig. 5. Here we focus on the objects found within the 6DFOF envelop and use the total mass exclusively assigned to an object, $M_{\rm tot}$ 171717VELOCIraptor does calculate overdensity masses such as $M_{200\rho_{c}}$ for subhalos. However, these masses are calculated treating the object in isolation unlike the calculation for field halos as using all particles within a spherical region is not as physically meaningful for an object that itself resides in an overdensity.. The relative velocities and radial distances of the subhalos are scaled by maximum circular velocity of the host and its virial radius. We also show the largest halos that were separated by the 6DFOF from the initial 3DFOF envelop.

The radial motions (as well as the total relative velocity) of all subhalos belonging to the 6DFOF envelop are well within the escape velocity envelop. For this particular halo, the parent 3DFOF halo would have linked together several objects that are on first infall and lie outside the virial radius, again pointing to a better mapping between a 6DFOF object and a virialised overdensity, a.k.a, a halo. For example, the typical apocentre for particles orbiting a halo is $\sim 1.6-1.9R_{200{\rm crit}}$ (though the exact value depends on the mass accretion rate of a halo and the rarity of the halo, for $75\%$ of a halo’s particles apocentres are $\approx 1.0-1.2R_{200\rho_{m}}$ , where $R_{200\rho_{m}}\approx 1.6R_{200\rho_{c}}$ , see Diemer et al., 2017). The two largest objects separated by the 6DFOF algorithm are well outside the virial radius at similar distances of $\approx 2R_{200\rho{c,{\rm H}}}$ . However, they have large infalling radial velocities of $\approx-0.9V_{\rm max,H}$ , significantly different from most particles that have completed at least one orbit of their host halo. Following their history using a halo merger tree (built with TreeFrog Elahi et al., 2019), we see that they are on first infall, as are most of the halos within the surrounding environment181818The reason for the large number of background gray points is that there are a large number of loosely bound, poorly resolved 6DFOF halos around the main 6DFOF halo and the three infalling halos are quite rich, containing lots of substructure. (as seen by the gray diamonds with negative velocities in Fig. 5).

The inner most subhalos highlight the benefit of a phase-space finder. As an example, we focus on the large infalling subhalo located at $\approx 0.2R_{200\rho_{c},{\rm H}}$ and its surroundings, presented in Fig. 6. In configuration space, the subhalo has a similar density to the background halo. It is only in velocity space that the subhalo becomes readily apparent. The object is a local velocity outlier as it lies outside the local velocity dispersion. The extent to which this object centre-of-mass motion $V_{\rm S}$ relative to the local surroundings is an outlier is given by

[TABLE]

where $\bar{\bf v}_{\rm bg}$ and $\Sigma_{v,{\rm bg}}$ are the local mean velocity and velocity dispersion tensor. We find that its centre-of-mass velocity is a $\gtrsim 3\sigma$ outlier of the local velocity distribution. Moreover, the particles belong to the object are far more clustered about its velocity than the expectation, with the ratios of the dispersion tensors giving $2\times 10^{-6}$ .

To compare the particles belonging to the substructure to the background, we randomly sample the background particles 1000 times using the same number of particles belonging to this subhalo in a region centred on the subhalo within a radius of $1.5R_{\rm 200\rho_{c}}$ . We find velocity differences of $\sigma_{V,{\rm outlier}}=3.27\pm 0.18$ , dispersion ratios of $|\sigma_{\rm S}|/|\Sigma_{\rm bg}|=(1.6\pm 0.2)\times 10^{-6}$ and density ratios of $\left\langle{\log\rho_{\rm S}}\right\rangle/\left\langle{\log\rho_{\rm bg}}\right\rangle=1.02\pm 0.02$ . The object’s mean density is similar to the background yet the subhalo’s centre-of-mass velocity is a significant outlier and its velocity dispersion is much colder.

The low density contrast does not necessarily mean that this object cannot be recovered by configuration-space finders. For instance, AHF does recover this object, however, the object proceeds to shrink as it enters deep within the host. Moreover, the initial collection of particles within the density peak will contain both particles belonging to the subhalo and those of the background, which must be carefully pruned by an unbinding process. By using velocity information, the particles belonging to the object can be robustly separated from the background, particularly the more underdense outer regions, minimising the amount of cleaning that must be done.

The low density contrast might also suggests that this object is perhaps artificial, despite being identified by VELOCIraptor, AHF, and rockstar. To verify its physical origin, we must examine is history. We find that it is present in hbt+ catalogue and thus must have originated from a 3DFOF halo. We show the mass accretion history as reconstructed by TreeFrog Elahi et al. (2019) from the VELOCIraptor catalogue along with its the radial motion, radial and tangential velocities and maximum circular velocity in Fig. 7, highlighting how well VELOICraptor works (see Fig. 15 in §A for more examples).

At $z=0$ , this subhalo is found on a primarily radial orbit deep within the host. This object’s first progenitor formed at a redshift of $z_{\rm form}=5.1$ with 32 particles and gradually moves closer to the main branch of the host halo. On its way, it experiences a significant merger event at high redshift, i.e., it contains a subhalo that has a mass ratio of $\geq 1:10$ as indicated by the open diamond and open stars surrounding its track. This event also corresponds to when it experiences significant fluctuations in mass & $V_{\rm max}$ . The fluctuations are quite severe, changing masses by factors of $\sim 2$ , as the object is not well resolved at this time, composed of $\sim 200$ particles. The fluctuations in mass are also partially due to the fact that masses for subhalos are exclusive, whereas for field halos, the mass includes substructure. At these high redshifts, the main branch also experiences several major mergers, giving rise to mass fluctuations and changes in the relative motion of the subhalo to the host.

Prior to its accretion, the object contains a single large substructure containing $\sim 25\%$ of its total mass. The sudden drop in mass upon accretion is due to subhalo masses being exclusive in VELOCIraptor. Critically, the mass evolution after accretion is physically reasonable. Little mass is lost till pericentric passage, at which the system is shocked, increasing its $V_{\rm max}$ (and $R\left({V_{\rm max}}\right)$ ). After this impulsive heating, the halo begins to lose mass, the rate of mass loss decreasing as it reaches apocentre, which lies outside the halo at $2R_{200\rho_{c}}$ . The object then plunges radially through the halo. The slight kinks in the radial and tangential velocities during this radial infall here are due to the host halo merging with a subhalo with a mass ratio of 1:6. The central regions of the main halo consist of two overlapping phase-space distributions with slightly different velocities that are rapidly phase-mixing. VELOCIraptor is no longer able to disentangle these objects, causing a small amount of jitter in the centre-of-mass velocity of the host.

For comparison, we examine the counterpart identified by ahf, a configuration-based finder, which identifies a subhalo despite the low density contrast. The object is similar if lower mass at the last snapshot. As the orbital reconstructed orbital motion is similar, we focus on mass and maximum circular velocity evolution in Fig. 8, highlighting where the object is a subhalo and has itself significant substructure. We also show the evolution of the VELOCIraptor object and highlight with shaded regions where the object contains significant substructure or is a subhalo. This figure shows that both codes recover similar mass evolution save for two key differences. The AHF subhalo experiences a rapid mass fluctuation in mass, dropping by an order of magnitude as the object approaches pericentre where density contrasts are low. The object also forms much later when composed of $\sim 200$ particles, during a period where the object is undergoing a major merger. In the ahf catalogue, the object is lost for a few snapshots, truncating the halo merger tree. This figure indicates that in general both configuration-space and phase-space finders perform well and it is only during pericentric passages and major mergers that phase-space based finders outperform configuration-space based ones.

The other instance where a phase-space finder like VELOCIraptor outperforms configuration-space based ones is in recovering tidal debris. Tidal debris is not spatially overdense and requires measurement of the local velocity density. By using the local velocity density, VELOCIraptor naturally identifies a continuum of substructures from bound subhalos to unbound dynamically cold streams. This initial catalogue is cleaned by invoking an unbinding process. If we relax the unbinding criterion and also use the two stage iterative procedure described in §2.2 to retain tidal features and debris, we have the structures shown in the bottom-right sub-panel of Fig. 4, where we have limited the groups to those that have at most $50\%$ of their particle’s bound. These objects consist of tidal shells originating from the larger merging subhalos and subhalos with large, extended tidal tails. For a thorough discussion of tidal debris, see Elahi et al. (2013). Here we will focus on the recovered subhalo distribution.

3.2 Population

3.2.1 Halos

We examine the impact of and the results from each stage of the algorithm using default parameters. We start by looking at halos identified with 3DFOF versus a 6DFOF. Using a 6DFOF does not significantly alter the input 3DFOF population as, on average, 3DFOF halos contain a 6DFOF object with $0.82^{+0.07}_{-0.10}$ of the mass of the original FOF object, independent of the number of particles in the 3DFOF as seen in Fig. 9. Consequently, the 6DFOF mass function should show a small suppression in mass relative to the 3DFOF mass function. The number of 6DFOF objects per 3DFOF group increases with increasing number of particles in the 3DFOF group as seen in Fig. 9.

Due to resolution limits, not all 3DFOF objects contain a viable 6DFOF halo, particularly close to the imposed particle threshold of 20, where only $50\%$ of 3DFOF objects have a 6DFOF halo above this threshold. The absence of a 6DFOF halo in a 3DFOF object drops to $\lesssim 1\%$ for 3DFOF objects composed of $\sim 100$ particles. These objects are typically highly unrelaxed, unbound 3DFOF objects, i.e., spurious 3DFOF objects.

The resulting halo mass from the different FOF algorithms and N-Body simulation are shown in Fig. 10. For FOF masses, the 6DFOF mass function is effectively the 3DFOF mass function shifted to the left by $\approx 0.1$ dex (as on average 6DFOF halos contain $80\%$ of the original 3DFOF halo’s particles). The virial mass remains unchanged when comparing the 3DFOF halo to the largest 6DFOF object within the 3DFOF halo, with small mass differences due to small differences in the centre-of-mass. However, as there are on average 1.3 6DFOF groups per 3DFOF halo, the 6DFOF virial mass function has more halos per mass bin. The peak the virial mass distribution at low masses arises from loosely bound, poorly resolved halos with low overdensities.

The residuals show that the 6DFOF mass function has fewer objects than the 3DFOF one at a given $M_{\rm FOF}$ , as expected. We also compare the 6DFOF algorithm to three models, Sheth et al. (2001), Tinker et al. (2010) and Watson et al. (2013). These models span a range of algorithms used to find halos and the type of mass recorded. Sheth et al. (2001) & Watson et al. (2013) use 3DFOF algorithms, whereas Tinker et al. (2010) used to a spherical overdensity finder. Watson et al. (2013) uses $M_{\rm FOF}$ , whereas the other two use $M_{\rm 200\rho_{c}}$ . The 6DFOF relative to the models has fewer objects per mass bin. The systematic shift is of the same size as going from 3DFOF to 6DFOF. This is partially due to the 6DFOF naturally decomposing 3DFOF objects into dynamically distinct halos, although there are other systematic effects between the simulation and the theoretical curves arising from the finite volume of the box191919The models are calibrated using larger volumes. The finite volume introduces systematic biases in mass functions, suppressing growth. Cosmic variance present in larger volumes is also absent.. We also compare our reference simulation to our larger volume, lower mass resolution simulation, L210N1536. The simulations agree to within $\lesssim 5\%$ for well resolved halo masses of $\gtrsim 5\times 10^{9}h^{-1}{\rm M}_{\odot}$ , though the larger volume simulation contains slightly fewer halos with $M_{200\rho_{c}}\lesssim 10^{10.5}h^{-1}{\rm M}_{\odot}$ .

The velocity function, not presented here for brevity, is effectively unchanged save for the fact that the 6DFOF is able to decompose 3DFOF objects into multiple halos, increasing the amplitude of the 6DFOF relative to the 3DFOF for well resolved halos with $V_{\rm max}\gtrsim 30$ km/s.

3.2.2 Subhalos

We next examine the results of subhalo/core search for our example halo and the population as a whole. To determine the average subhalo mass function we stack all halos composed of $\geq 50000$ particles, i.e., all halos that at least probe the subhalo mass function down to masses fractions of $f_{M}\geq 5\times 10^{-4}$ , with halo masses of $M_{200\rho_{c}}\gtrsim 2\times 10^{12}h^{-1}{\rm M}_{\odot}$ . There are $128$ such halos in our reference simulation. We focus on overdensity mass ratios $f_{M_{200\rho_{c}}}\equiv M_{200\rho_{c}{\rm S}}/M_{200\rho_{c}{\rm H}}$ presented in Fig. 11 (although using the total mass dynamically associated to a substructure relative to the dynamical mass exclusively associated with the parent subhalo, $f_{M_{\rm tot}}\equiv M_{\rm tot,S}/M_{\rm tot,H}$ does not significantly change the resulting mass function). For consistency across catalogs, we identify all objects with the virial radius of the host as subhalos. Here we classify substructures based on the specific method used to identify them to highlight any differences in the distribution arising from the methods: objects identified by the phase-space FOF algorithm on dynamically distinct particles (§2.2) are here referred to as subhalos; objects identified by searching for phase-space dense cores (§2.3) in the parent halo are classified as mergers (containing both major and minor merger events with mass ratios of $\gtrsim 0.05$ ). This categorisation does not imply a hard physical difference between objects, it is simply to highlight any algorithmic differences. We also classify objects identified within substructures, so-called subsubstructures as subhalos.

On average, halos contain a total of $208_{-130}^{+22}$ subhalos with overdensity masses of $f_{M_{200\rho_{c}}}\gtrsim 7\times 10^{-5}$ (with the numbers increasing if looking at total dynamical masses with the same limit of $f_{M_{\rm tot}}\gtrsim 7\times 10^{-5}$ to $272^{+61}_{-177}$ ). Halos contain at least 1 subhalo with a mass of $f_{M_{200\rho_{c}}}\sim 10^{-2}$ . Significant merger events are not uncommon, with an average number per halo of $1.7\pm 1.6$ . The example halo contains subhalos with $10^{-5}\lesssim f_{M_{200\rho_{c}}}\lesssim 10^{-2}$ and contains three large merger remnants with mass fractions of $f_{M_{200\rho_{c}}}\gtrsim 4\times 10^{-2}$ . The fiducial halo has more substructure than the average (it lies close to the $+1\sigma$ envelop), which is to not unexpected given the number of merger remnants it contains and the fact that this 6DFOF halo lies at the nexus of three large merging halos (see Fig. 4).

The median and halo-to-halo scatter seen in our small volume simulation is in agreement with that seen in our large volume, lower-mass resolution simulation, L210N1536, when applying the same particle number threshold (for clarity we only show the median for this simulation). The median distribution from L210N1536 is based on $3000$ halos with a higher median masses of $M_{200\rho_{c}}\approx 3\times 10^{13}h^{-1}{\rm M}_{\odot}$ . The agreement between different host halo masses indicates a mostly scale-free subhalo mass function.

For comparison, we also show the results from ahf, a configuration-space based halo finder, rockstar, a phase-space halo finder, and hbt+, a 3DFOF tracker. These mass functions agree within the scatter modulo differences in the definition of subhalo masses, which vary from halo finder to halo finder (see Knebe et al., 2013b, for a discussion of mass definitions). VELOCIraptor can report a variety of different masses: bound mass, total dynamical mass, overdensity masses, etc. The first two masses are calculated using a exclusive particle list. For halos, it calculates inclusive spherical overdensity masses. For subhalos, all these masses are calculated based on a list of particles belonging exclusively to the object, neglecting the background host and internal substructures. rockstar also calculates masses for subhalos in a similar fashion using exclusive particle lists. ahf calculates inclusive spherical overdensity masses defined by a saddle point and processed through an unbinding procedure, so most resembles the spherical overdensity masses of VELOCIraptor and rockstar. hbt+ returns bound masses based on the initial FOF envelop and does not allow subhalos to accrete mass from their surrounding host halo, though they can accrete material from subsubhalos, those objects that were subhalos when the object itself was a FOF halo. This mass best corresponds to the total bound mass calculated by VELOCIraptor.

Although the mass functions agree, there are systematic differences in the number of subhalos per halo found by each finder. Given the high cadence of the input 3DFOF catalogue202020With high cadence, a 3DFOF tracker is unlikely to miss the formation of a halo., hbt+ is a useful reference catalogue. VELOCIraptor finds similar numbers of objects composed of $\geq 20$ particles within $R_{200\rho_{c}}$ of large halos as hbt+, identifying $98\pm 7\%$ of all 3DFOF halos tracked, some of the variation due to differences in the centre-of-mass. ahf finds a slightly smaller percentage of $84\pm 10\%$ , the lower number arising from small, low-density subhalos. The outlier is rockstar, which identifies a factor of $1.85^{+0.15}_{-0.2}$ more objects, though a significant fraction appear to be diffuse, possibly spurious, phase-space structures with low $M_{200\rho_{c}}$ , with some never reaching overdensities of $200\rho_{c}$ . Removing these low density objects from the halo catalogue places it more in line with the other codes, though it still identifies a factor of $1.05^{+0.1}_{-0.05}$ more objects than hbt+.

The average subhalo distribution is well-characterised by a power-law with an exponential dampening at the high mass. We fit the average mass function using emceeForeman-Mackey et al. (2013) with

[TABLE]

focusing on subhalos explicitly (that is those objects identified by the method outlined in §2.2 with typical mass ratios of $f_{M}\lesssim 10^{-2}$ ), and ignore minor/major merger remnants (objects identified by the method outlined in §2.3 with typical mass ratios of $f_{M}\gtrsim 10^{-2}$ ). We find $\log A=-1.7_{-1.0}^{+0.7}$ , $\alpha=1.85_{-0.18}^{+0.16}$ , $\log f_{o}=-1.33\pm 0.9$ , $\beta=3.2\pm 1.9$ for $M_{200\rho_{c}}$ , though the fit does not vary drastically if we use total masses. The amplitude and power-law are consistent with previous studies (e.g. Madau et al., 2008; Springel et al., 2008; Stadel et al., 2009; Gao et al., 2012; Onions et al., 2012; Rodríguez-Puebla et al., 2016; van den Bosch & Jiang, 2016; Han et al., 2018). The scale of the exponential dampening occurs at $f_{M}\approx 0.05$ , in agreement with recent studies (van den Bosch & Jiang, 2016; Han et al., 2018, e.g.,).

Large mass ratio objects, that is minor and major mergers, appear to be characterised by a skewed-Gaussian-like distribution. The fact that mergers follow a different distribution than subhalos is not surprising as once objects are large enough, they become less prone to tidal stripping and more affected by dynamical friction. Given number of merger remnants in this data set, we refrain from fitting the distribution, though the average of $\log f_{M_{200\rho_{c}},{\rm mergers}}=-1.2\pm 0.8$ is in agreement with Elahi et al. (2018), who used a larger data set to fit results. We find that the total subhalo mass function also agrees with the double power-law fit in Han et al. (2018), although the the second power-law describing the high mass end is poorly constrained with values of $1.1-1.5$ (for completeness we show the double power-law from Han et al. (2018) in the figure).

The fact that the total subhalo mass function (subhalos+mergers) is not characterised by a single power-law is also seen by Han et al. (2018), (see also hbt+ in Fig. 11). They argued for characterising the subhalo mass function by a double Schetcher function with a steep power-law for low mass fractions and a flatter that dominates at high mass fractions. Given the small number of large subhalos, which also span a very small range in $f_{M}$ , it is difficult to differentiate between either model with the number of host halos in this sample and the halo-to-halo scatter.

The radial distribution of subhalos in the form of the differential number density $dn/dV$ normalised by the number of objects at the virial radius is shown in Fig. 12. We limit our sample to halos composed of $\geq 10^{5}$ , as these halos contain significant amounts of substructure and have density profile converged to radii of $\approx 10^{-2}R_{200\rho_{c}}$ Power et al. (2003).

We fit a generalized NFW-like profile to the distribution:

[TABLE]

where $r_{s}$ is the scale radius, and $\alpha$ and $\beta$ represent the inner and outer slopes. This fit is motivated by the fact that halo dark matter density profiles follow this profile with $\alpha=1$ and $\beta=2$ . Subhalos should be radially distributed in a similar to the smoothly accreted dark matter. We find an optimal fit of $r_{s}=0.4\pm 0.1R_{200\rho_{c}}$ , $\alpha=0.10\pm 0.23$ and $\beta=3.85^{+0.11}_{-0.23}$ . This profile that is flatter than a halo density profile in the inner regions, in agreement with previous studies (see for instance Han et al., 2016, where they find an inner slope of $\sim 0.3$ for a very well resolved $10^{12}h^{-1}{\rm M}_{\odot}$ halo, a fit that well describes halos over a wide range in masses.), although our halos are not well resolved enough precisely constrain the exact slope of the inner profile. Only very high resolution zoom simulations, such as Aquarius Springel et al. (2008), contain enough subhalos to properly constrain the inner slope and even then, since subhalos spend most of their time at apocentre and not pericentre, few subhalos are present in the very central regions for long.

The second power-law index implies that the logarithmic slope $d\ln n/d\ln r=-\alpha-\beta\left(r/r_{s}\right)/\left(1+r/r_{s}\right)$ is steeper than a NFW profile and even our fit does not capture the steep slope of the subhalo distribution. However, we stress that at the virial radius, both the subhalo radial distribution and the halo density profiles have similar logarithmic slopes of $\sim-2.8$ . Only at even larger radii do subalos drop off faster than an NFW profile212121It should be noted that average density profiles of cluster mass halos also fall off faster than an NFW profile for $1\lesssim R_{200\rho_{c}}\lesssim 1.6$ before becoming flatter than an NFW profile at larger radii (e.g. Diemer & Kravtsov, 2014)..

Comparing results, we find that the median distribution from the larger volume, lower-mass resolution simulation agrees with our L40N512. The fact that host halos in the L210N1536 sample are $\sim 10$ times more massive argues in favour of a scale-free radial distribution. The ahf radial distribution is biased to larger radii and contains fewer subhalos deep within the host. The lack of subhalos within $0.1R_{200\rho_{c}}$ has to do with the configuration-based nature of ahf. Density contrasts between subhalo and host are small, making it more difficult to separate subhalos from the background. Both hbt+ and rockstar agree within the halo-to-halo scatter.

We now focus on mass of subhalos as a function of radius, where here we identify all objects within the virial radius of the host. The average substructure radial-mass dependence is shown in Fig. 13, where we again stack all well-resolved halos, scaling subhalo masses and radial distances by virial masses and radius of the parent halo. The mass distribution for most radial bins, both in the median and the scatter, shows little radial dependence. The total population shows a very weak correlation with the Pearson covariance coefficient of $0.1\pm 0.3$ , which is consistent with no dependence.

Only the inner radii, typically within the scale radius of the host parent, do subhalo masses strongly depend on radii. The median subhalo mass markedly increases in the central regions. There are even subhalos with mass ratios as large as $f_{M_{200\rho_{c}}}\sim 0.2$ found within $0.27R_{200\rho_{c}}$ . This radial-mass dependence is also present in our larger volume, lower mass resolution run. The reason for this trend is two-fold: 1) large subhalos are strongly affected by dynamical friction, pulling both their pericentres and apocenters inward; 2) large subhalos are also less prone to tidal disruption. Thus we should expect the inner regions to be dominated by large subhalos.

This trend is also seen in hbt+. By tracking 3DFOF halos, Han et al. (2018) found the inner regions of halos contain large subhalos that remain trapped due to dynamical friction. rockstar, another phase-space finder also reproduces the general trend 222222As mentioned previously, the rockstar catalogue contains low physical density objects with $M_{2}00\rho_{c}$ masses below the particle number threshold used, with some having objects having densities below $200\rho_{c}$ , i.e., $M_{200\rho_{c}}=0$ . Here we limit the catalogue to objects with $M_{200\rho_{c}}\geq 20m_{p}$ , where $m_{p}$ is the particle mass.. In contrast, configuration-space based finders like ahf shows a bias in the opposite direction in the very inner regions, and has no subhalos with $f_{M_{200\rho_{c}}}\sim 0.2$ within $\gtrsim 0.4R_{200\rho_{c}}$ .

To further investigate differences between codes, we compare the normalised cumulative radial distribution of subhalos in Fig. 14 to further examine this apparent radial-mass dependence, focusing on low and high mass subhalos. Our lower mass bin samples halos composed of $\sim 100-1000$ particles for the smallest halos in this sample, well above the particle number threshold used to identify structures. Our upper mass bin effectively chooses major mergers. We find little difference between codes, with the inner most objects found well within the scale radius of the host halo. There is greater disagreement for large subhalos, in part owing to how the centre of a halo is defined (most bound particle, shrinking spheres estimate of mass, total bulk centre of mass). Nevertheless, we see that ahf is noticeably more biased to identifying large subhalos at larger radii that the other codes, a consequence of its configuration-space based approach.

4 Discussion and Conclusion

We have presented VELOCIraptor, a novel code designed to identify halos, subhalos, tidal debris and galaxies in both N-body and full hydrodynamical simulations using phase-space information. We have demonstrated that the code robustly identify (sub)halos, particularly cases that are typically notoriously difficult for such codes, namely the mass reconstruction of subhalos deep inside their host halo and major mergers. We summarise key features/results below.

VELOCIraptor identifies structures in a multi-step process. For N-body simulations, it first identifies field halos using a 3DFOF followed by a 6DFOF algorithm. The next step identifies substructure in each halo in two stages. The first stage uses the previously developed algorithm described in Elahi et al. (2011), finding velocity outliers (so-called peaks above the Maxwellian sea) and linking particles using a phase-space FOF. The next stage is to find any remaining large minor/major mergers using an iterative search for dense phase-space cores that are then grown in an iterative fashion using phase-space tensors.

We find that 6DFOF objects are more representative of dark matter halos than 3DFOF objects as 3DFOF objects can link separate virialised overdensities together via particle bridges. The 6DFOF step separates early stage accretion/merger events, with the average number of 1.3 6DFOF objects per 3DFOF objects. The 6DFOF also removes outer unbound particles from the 3DFOF candidate, with FOF masses changing by $M_{\rm 6DFOF}=0.82M_{\rm 3DFOF}$ while leaving spherical overdensity masses, particularly $200\rho_{c}$ , unchanged.

The substructure algorithm (tested in Elahi et al., 2011, 2013, and shown to identify both subhalos and tidal debris) has the advantage over other algorithms of being able to identify subhalos deep within a host halo, where density contrasts relative to background are negligible. We highlighted a particular example where the average logarithmic density contrast between the subhalo and the host halo are $\sim 1$ , yet its particles are very distinct in velocity space. This subhalo does not undergo rapid artificial decrease in mass that affects most subhalo configuration-space based finders.

The merger algorithm, a new addition to the code, is fully described here. This algorithm uses full phase-space tensors to assign particles to any phase-space dense cores that are not already tagged as substructure. This technique, inspired by rockstar Behroozi et al. (2013) and Gaussian mixture models, can separate substructures from the main halo deep within the host (at least up to the scale radius of a host halo). The use of phase-space tensors allows for the mass assignment scheme to asymmetric tidal features associated with an object, unlike rockstar, which uses a scalar dispersion to assign particles. The iterative growth is also more physical than assigning particles using Gaussian mixture models, which assume a global dispersion tensor. This method does not necessarily artificially shrink halos as they move towards pericentre, as seen in the example figures in the appendix, though the scheme can occasionally lose halos or result in mass fluctuations of a few when objects overlap significantly. This can be alleviated somewhat by using a finer steps when searching for cores and assigning mass.

The resulting subhalo mass function reproduces the mass and radial distribution seen in codes that track particles, such as hbt+. Like this FOF tracker, the subhalo mass function can be decomposed into a distribution for low and high mass ratios. The low mass ratio end is described by a power-law with an exponential cut-off, with an index of $\alpha=1.85_{-0.18}^{+0.16}$ and a cut-off mass ratio scale of $f_{o}\sim 0.05$ . Our simulation does not have enough halos to well constrain the high mass end it can either be characterised by a power-law with a much flatter slope or possibly a lognormal distribution in mass ratio.

Critically, VELOCIraptor can recover the radial-mass distribution seen in tracking codes like hbt+, with larger subhalos found at smaller radii, without the need of tracking. The central regions within the scale radius of a halo are dominated by large subhalos and merger remnants. Although our fiducial simulation only contains a small sample of $\sim 50$ well resolved halos composed of $\gtrsim 10^{5}$ particles, which is not enough to rigorously constrain the inner radial distribution, these halos are resolved enough for this trend to be observed by hbt+ and recovered by VELOCIraptor. This is in contrast to the distribution recovered by configuration-space based finders. The code also does not introduce possibly spurious phase-space structures like rockstar, which also recovers the radial-mass dependence.

This radial-mass dependence is seen in our larger volume simulation, which contains $\sim 1500$ well-resolved halos, including $\sim 50$ halos composed of $10^{6}$ particles. As we do not analyse this simulation with hbt+, we cannot definitively say that the observed trend is that recovered by tracking, though given the results from our fiducial simulation, it is likely in agreement.

The code is in active development. New input interfaces for hydrodynamical simulations are being developed (e.g. Cañas et al., 2018) and it is being incorporated into the swift Hydrodynamical N-Body code (www.swiftsim.com Schaller et al., 2016). Additional libraries are being integrated to improve the parallel efficiency, such as the ADIOS library, designed for parallel IO at the $\sim 10^{4}$ node scale, and METIS for efficient MPI decomposition. The output produced also lends itself to large-scale processing as it produces compressed, self-describing binary HDF5 data.

Finally VELOCIraptor is not limited to analysing cosmological simulations. The primary substructure algorithm is suited to finding clustering in a variety of data. One novel application could be to decompose data from GAIA Lindegren et al. (2018), which contains 5-dimensional phase-space information for 1.3 billions stars, and full 6D phase-space information for 7 million in the Milky Way. Early analysis shows the mean velocity structure of the Milky Way disk is complex, with features indicative of substructure in the solar neighbourhood Gaia Collaboration et al. (2018). This data set is only just beginning to be mined for kinematic structures (e.g. Hawkins & Wyse, 2018; Price-Whelan & Bonaca, 2018; Marchetti et al., 2018). For instance, Castro-Ginard et al. (2018) used clustering algorithms and artificial neural networks to identify open clusters in the GAIA data set. This method essentially looks for full phase-space (configuration and velocity) clustering akin to a 6DFOF algorithm, as such is tailored to identifying open clusters. The nature of the substructure algorithm in VELOCIraptor makes it well suited for identifying open clusters and other substructures and even be extended to use other information, such as metallicity, making analysing this data with the code an interesting exercise.

Acknowledgements.

The authors would like to thank the anonymous referee for insightful comments that helped improve the clarity of the text. RC is supported by the SIRF awarded by the University of Western Australia Scholarships Committee, and the Consejo Nacional de Ciencia y Tecnología (CONACyT) scholarship No. 438594 and the MERAC Foundation. RP is supported by a University of Western Australia Scholarship. Parts of this research were conducted by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013. CL is funded by a Discovery Early Career Researcher Award DE150100618. CL also thanks the MERAC Foundation for a Postdoctoral Research Award. The authors contributed to this paper in the following ways: PJE ran simulations and analysed the data, made the plots and wrote the bulk of the paper. PJE is the primary developer of both VELOCIraptor. RC, RT & JW designed and developed of various aspects of the code: RC developed the core search; RT developed the compilation infrastructure; and JW bug tested and developed the interface with swiftsim. RP, CL, CP, & AR assisted in the design of various aspects of the code. All authors have read and commented on the paper.

Facilities

Magnus (Pawsey Supercomputing Centre)

Software

•

VELOCIraptor https://github.com/pelahi/VELOCIraptor-STF

•

TreeFrog https://github.com/pelahi/TreeFrog

•

NBodylib https://github.com/pelahi/NBodylib

•

VELOCIraptor_Python_Tools https://github.com/pelahi/VELOCIraptor_Python_Tools

•

MergerTreeDendograms https://github.com/rhyspoulton/MergerTree-Dendograms

•

ahf http://popia.ft.uam.es/AHF/Download.html

•

rockstar https://bitbucket.org/gfcstanford/rockstar

•

hbt+ https://github.com/Kambrian/HBTplus

Additional Software

Python, Matplotlib Hunter (2007), Scipy Jones et al. (01), emcee Foreman-Mackey et al. (2013), SciKit Pedregosa et al. (2011), Gadget Springel (2005)

Appendix A Orbits

We show the orbits of a low mass subhalo accreted at high redshift and large subhalo accreted at late times in Fig. 15. The poorly resolved subhalo is still recovered when deep inside the host halo even when composed of $\sim 30$ particles. There are gaps in the subhalo’s orbit where it is momentarily lost. The large subhalo is accreted at late times and is still approaching pericentre. It does lose an appreciable amount of mass as it approaches pericentre, decreasing in mass by $\sim 20\%$ over the last two snapshots as it moves from $r/R_{200\rho_{c}}=0.65$ to its current position of $r/R_{200\rho_{c}}=0.41$ . For comparison, the configuration-space based finder ahf shrinks the object by $\sim 2$ over the same period.

Appendix B Tables

We list the complete set of configuration options along with a list of properties calculated by VELOCIraptor.

Appendix C Associated Tools

VELOCIraptor comes with a Python-2/3 tool-kit, specifically routines to manipulate the output data produced by the various codes. Typically, these produce dict containing numpy arrays, allowing for quick analysis and plotting. The repositories also come with examples of producing metric plots. The codes are Python-3 (compatible with Python-2) and make use of numpy, h5py, scipy, matplotlib, and scikit.learn.

Bibliography66

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Allgood et al. (2006) Allgood B., Flores R. A., Primack J. R., Kravtsov A. V., Wechsler R. H., Faltenbacher A., Bullock J. S., 2006, MNRAS , 367, 1781 · doi ↗
2Appel (1985) Appel A., 1985, SIAM J. Sci. Stat. Comput., 6, 85
3Arthur et al. (2017) Arthur J., et al., 2017, MNRAS , 464, 2027 · doi ↗
4Barnes & Hut (1986) Barnes J., Hut P., 1986, Nature , 324, 446 · doi ↗
5Behroozi et al. (2013) Behroozi P. S., Wechsler R. H., Wu H.-Y., 2013, Ap J , 762, 109 · doi ↗
6Behroozi et al. (2015) Behroozi P., et al., 2015, MNRAS , 454, 3020 · doi ↗
7Bryan & Norman (1998) Bryan G. L., Norman M. L., 1998, Ap J , 495, 80 · doi ↗
8Bullock et al. (2001) Bullock J. S., Kolatt T. S., Sigad Y., Somerville R. S., Kravtsov A. V., Klypin A. A., Primack J. R., Dekel A., 2001, MNRAS, 321, 559