OpenRBC: A Fast Simulator of Red Blood Cells at Protein Resolution
Yu-Hang Tang, Lu Lu, He Li, Constantinos Evangelinos, Leopold, Grinberg, Vipin Sachdeva, George Em Karniadakis

TL;DR
OpenRBC is a highly efficient, parallelized molecular dynamics simulator capable of modeling entire mammalian red blood cells at protein resolution on standard workstations, enabling advanced biomechanical studies.
Contribution
The paper introduces an adaptive Voronoi-based spatial-searching algorithm and a scalable, high-performance code for simulating large-scale red blood cell models at unprecedented detail.
Findings
Outperforms legacy simulators by nearly tenfold in speed.
Enables simulation of 4 million particles representing an entire red blood cell.
Scales efficiently to hundreds of CPU threads.
Abstract
We present OpenRBC, a coarse-grained molecular dynamics code, which is capable of performing an unprecedented in silico experiment --- simulating an entire mammal red blood cell lipid bilayer and cytoskeleton as modeled by 4 million mesoscopic particles --- using a single shared memory commodity workstation. To achieve this, we invented an adaptive spatial-searching algorithm to accelerate the computation of short-range pairwise interactions in an extremely sparse 3D space. The algorithm is based on a Voronoi partitioning of the point cloud of coarse-grained particles, and is continuously updated over the course of the simulation. The algorithm enables the construction of the key spatial searching data structure in our code, i.e. a lattice-free cell list, with a time and space cost linearly proportional to the number of particles in the system. The position and shape of the cells also…
| Capability & Design | Performance - time steps / day | ||||||||
| OpenRBC | Legacy | Improvement | Cores | Particles | OpenRBC | Legacy | Speedup | ||
| Max system size (#particles) | 40 times | 20 | - | - | |||||
| Line of code | 4,677 | 7,424 | 37% less | 4 | 9.2 | ||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
OpenRBC: A Fast Simulator of Red Blood Cells at Protein Resolution
Yu-Hang Tang
Lu Lu
He Li
Division of Applied Mathematics, Brown University, Rhode Island, USA 02912
Constantinos Evangelinos
International Business Machines Corporation
Leopold Grinberg
International Business Machines Corporation
Vipin Sachdeva
International Business Machines Corporation
George Em Karniadakis
Division of Applied Mathematics, Brown University, Rhode Island, USA 02912
Abstract
We present OpenRBC 111The source code is available at https://github.com/yhtang/OpenRBC ., a coarse-grained molecular dynamics code, which is capable of performing an unprecedented in silico experiment — simulating an entire mammal red blood cell lipid bilayer and cytoskeleton as modeled by 4 million mesoscopic particles — using a single shared memory commodity workstation. To achieve this, we invented an adaptive spatial-searching algorithm to accelerate the computation of short-range pairwise interactions in an extremely sparse 3D space. The algorithm is based on a Voronoi partitioning of the point cloud of coarse-grained particles, and is continuously updated over the course of the simulation. The algorithm enables the construction of the key spatial searching data structure in our code, i.e. a lattice-free cell list, with a time and space cost linearly proportional to the number of particles in the system. The position and shape of the cells also adapt automatically to the local density and curvature. The code implements OpenMP parallelization and scales to hundreds of hardware threads. It outperforms a legacy simulator by almost an order of magnitude in time-to-solution and more than 40 times in problem size, thus providing a new platform for probing the biomechanics of red blood cells.
Keywords: coarse-grained molecular dynamics, bilayer, cytoskeleton, membrane fluctuation, vesiculation, high-performance computing
Introduction
The red blood cell (RBC) is one of the simplest, yet most important cells in the circulatory system due to their indispensable role in oxygen transport. An average RBC assumes a biconcave shape with a diameter of 8 m and a thickness of 2 m. Without any intracellular organelles, it is supported by a cytoskeleton of a triangular spectrin network anchored by junctions on the inner side of the membrane. Therefore, the mechanical properties of an RBC can be strongly influenced by molecular level structural details that alter the cytoskeleton and lipid bilayer properties.
Both continuum models [1, 2, 3, 4] and particle-based models [5, 6, 7, 8] have been developed with the aim to help uncover the correlation between RBC membrane structure and property. Continuum models are computationally efficient, but require a priori knowledge of cellular mechanical properties such as bending and shear modulus. Particle models are useful for extracting RBC properties from low-level descriptions of the membrane structure and defects. However, it is computational demanding, if not prohibitive, to simulate the large number of particles required for modeling the membrane of an entire RBC. To the best of our knowledge, a bottom-up simulation of the RBC membrane at the cellular scale using particle methods remains absent.
Recently, a two-component coarse-grained molecular dynamics (CGMD) RBC membrane model which explicitly accounts for both the cytoskeleton and the lipid bilayer was proposed [9]. The model could potentially be used for particle-based whole-cell RBC modeling because its coarse-grained nature can drastically reduce computational workload while still preserving necessary details from the molecular level. However, due to the orders of magnitude difference in the length scale between a cell and a single protein, a total of 4 million particles are still needed to represent an entire RBC. In addition, the implicit treatment of the plasma in this model eliminates the overhead for tracking the solvent particles, but also exposes a notable spatial density heterogeneity because all CG particles are exclusively located on the surface of a biconcave shell. The inside and outside of the RBC are empty. This density imbalance imposes a serious challenge on the efficient evaluation of the pairwise force using conventional algorithms and data structures, such as the cell list and the Verlet list, which typically assumes a uniform spatial density and a bounded rectilinear simulation box.
In this paper we present OpenRBC, a new software tailored for simulations of an entire RBC using the two-component CGMD model on multicore CPUs. As illustrated in Figure 1, the simulator can take as input a triangular mesh of the cytoskeleton of a RBC and reconstruct a CGMD model at protein resolution with explicit representations of both the cytoskeleton and the lipid bilayer. This type of whole cell simulation of RBCs can thus realize an array of in silico measurements and explorations of:
RBC shear and bending modulus,
membrane loss through vesiculation in spherocytosis and elliptocytosis [10],
anormalous diffusion of membrane proteins [11],
interaction between sickle hemoglobin fibers and RBC membrane in sickle cell disease [12, 13],
uncoupling between the lipid bilayer and cytoskeleton [14],
adenosine triphosphate (ATP) release due to deformation [15],
nitric oxide (NO) modulated mechanical property change [16],
cellular uptake of elastic nanoparticles [17].
Software Overview
OpenRBC is written in C++ using features from the C++11 standard. To maximize portability and allow easy integration into other software systems[18], the project is organized as a header-only library with no external dependencies. The software implements SIMD vectorization [19] and OpenMP shared memory parallelization, and was specifically optimized toward making efficient use of large numbers of simultaneous hardware threads.
As shown in Figure 2A, the main body of the simulator is a time stepping loop, where the force and torque acting on each particle is solved for and used for the iterative updating of the position and orientation according to the equation of motion. The time distribution of each task in a typical simulation is given in Figure 2B. The majority of time is spent in force evaluation, which is compute-bound. This makes the code highly efficient in utilizing the high thread count of modern CPUs with the shared memory programming paradigm.
Initial structure generation
As shown in Figure 1, a two-component CGMD RBC system can be generated from a triangular mesh which resembles the biconcave shape of a RBC at equilibrium. Note that the geometry may be alternatively sourced from experimental data using techniques such as optical image reconstruction because the algorithm itself is general enough to adapt to an arbitrary triangular mesh. This feature can be useful for simulating RBCs with morphological anomalies. Actin and glycophorin protein particles are placed on the vertices of the mesh, while spectrin and immobile band-3 particles are generated along the edges. The band-3–spectrin connections and actin–spectrin connections can be modified to simulate RBCs with structural defects. Lipid and mobile band-3 particles are randomly placed on each triangular face by uniformly sampling each triangle defined by the three vertices [20]. A minimum inter-particle distance is enforced to prevent clutter between protein and lipid particles. The system is then optimized using a velocity quenching algorithm to remove collision between the particles.
Spatial Searching Algorithm
Pairwise force evaluation accounts for more than 70% of the computation time in OpenRBC as well as other molecular dynamics softwares[21, 22]. To efficiently simulate the reconstructed RBC model, we invented a lattice-free spatial partitioning algorithm that is inspired by the concept of Voronoi diagram. The algorith, at the high level, can be described as
As illustrated in Figure 3, the algorithm adaptively partitions a particle system into a number of Voronoi cells that are approximately equally populated. In contrast, a lattice-based cell list leaves many cells vacant due to the density heterogeneity. Thus, the algorithm can provide very good performance in partitioning the system, maintaining data locality and searhcing for pairwise neighbors in a sparse 3D space. It is implemented in our software using a k-means clustering algorithm, which is, in turn, enabled by a highly optimized implementation of the k-d tree searching algorithm, as explained below.
A Voronoi tessellation [23] is a partitioning of a n-dimensional space into regions based on distance to a set of points called the centroids. Each point in the space is attributed to the closest centroid (usually in the L2 norm sense). An example of a Voronoi Diagram generated by 12 centroids on a 2D rectangle is given in Figure 4A.
The k-means clustering [24] is a method of data partitioning that aims to divide a given set of n vectors into k clusters in which each vector belongs to the cluster whose center is closest to it. The result is a partition of the vector space into a Voronoi tessellation generated by the cluster centers as shown in Figure 4B. Searching for the optimal clustering which minimizes the within-cluster sum of square distance is NP-hard, but efficient iterative heuristics based on e.g. the expectation-maximization algorithm [25] can be used to quickly find a local minimum.
A k-d tree is a spatial partitioning data structure for organizing points in a k-dimensional space [26]. It is essentially a binary tree that recursively bisects the points owned by each node into two disjoint sets as separated by a axial-parallel hyperplane. It can be used for the efficient searching of the nearest neighbors of a given query point in time, where is the total number of points, by pruning out a large portion of the search space using cheap overlap checking between bounding boxes.
The k-means/Voronoi partitioning of a point cloud adapts automatically to the local density and curvature of the points. As such, we exploit this property to create a generalization of the cell list algorithm using the Voronoi diagram. The algorithm can be described as a two-step procedure: 1) clustering all the particles in the system using k-means, followed by an online Expectation-Maximization algorithm that continuously updates the system’s Voronoi cells centroid location and particle ownership; 2) sorting the centroids and particles with a two-level data reordering scheme , where we first order the Voronoi centroids along a space filling curve (a Morton curve, specifically) and then reorder the particles according to the Voronoi cell that they belong to. The pseudocode for the algorithm can be found in SI Algorithm LABEL:SI-alg:voronoi-cell-list. The reordering step in updating the Voronoi cells ensures that neighboring particles in the physical space are also statistically close to each other in the physical space. This locality can speed up the k-d tree nearest-neighbor search by using the closest centroid of the last particle as the initial guess for the current particle. The heuristic helps to further prune out most of the k-d tree search space and essentially reduces the complexity of a nearest-neighbor query from to . In practice, this brings about 100 times acceleration when searching through 200,000 centroids. As shown by Figure 4C, the Voronoi cells generated from a k-means clustering of the CG particles are uniformly distributed on the surface of the lipid membrane.
Force Evaluation
Lipid particles accounts for 80% of the population in the whole-cell CGMD system. The Voronoi cells can be used directly for efficient pairwise force computation between lipids with a quad loop that ranges over all Voronoi cell , all neighboring cells of , all particles in , and all particles in as shown in SI Algorithm LABEL:SI-alg:pairwise. Since the cytoskeleton of a healthy RBC is always attached to the lipid bilayer, its protein particles are also distributed following the local curvature of the lipid particles. This means that we can reuse the Voronoi cells of the lipid particles, but with a wider searching cutoff, to compute both the lipid-protein and protein-protein pairwise interactions. For diseased RBCs with a fully or partially detached cytoskeleton, a separate set of Voronoi cells can be set up for the cytoskeleton proteins to compute the force. A list of bonds between proteins is maintained and used for computing the forces between proteins that are physically linked to each other.
A commonly used technique in serial programs to speed up the force computation is to take advantage of the Newton’s 3rd law of action and reaction. Thus, the force between each pair of interacting particles is only computed once and added to both particles. However, this generates a race condition in a parallel context because two threads may end up simultaneously computing the force on a particle shared by two or more pairwise interactions.
Our solution takes advantage of the strong spatial locality of the particles as maintained by the two-level reordering algorithm, and decomposes the workload both spatially and linearly-in-memory into patches by splitting the linear range of cells indices among OpenMP threads. Each thread will be calculating the forces acting on the particles within its own patch. As shown in Figure 5, force accumulation without triggering racing condition can be realized by only exploiting the Newton’s 3rd law on pairwise interactions where both Voronoi cells belong to a thread’s own patch. Interactions involving a pair of particles from different patches are calculated twice (once for each particle) by each thread. The strong particle locality minimizes the shared contour length between two patches and hence also minimizes the amount of inter-patch interactions.
Validation and Benchmark
In this section we present validation of our software by comparing simulation and experimental data. We also compare the program performance against that of a legacy CGMD RBC simulator used in Ref [9]. The legacy simulator, which performs reasonably well for a small number of particles in a periodic rectangular box, was written in C and parallelized with the message passing interface (MPI) using a rectilinear domain decomposition scheme and a distributed memory model. Two computer systems were used in the benchmark each equipped with a different mainstream CPU microarchitecture, i.e. the AMD Piledriver, and the IBM Power8 [27]. The specification of the machines are given in Table 1.
To compare performance between OpenRBC and the legacy simulator, the membrane vesiculation process of a miniaturized RBC-like sphere with a surface area of of was simulated. The evolution of the dynamic process is visualized from the simulation trajectory and shown in Figure 6A. OpenRBC achieves almost an order of magnitude speedup over the legacy solver in this case on all three computer systems as shown in Table 1.
Furthermore, OpenRBC can efficiently simulate an entire RBC modeled by 3.2 million particles and correctly reproduce the fluctuation and stiffness of the membrane as shown in Figure 6B. The legacy solver was not able to launch the simulation due to memory constraint. The simulation was carried out by implementing the experimental protocol of Ref. [29] that measures the instantaneous vertical fluctuation along the upper rim of a fixed RBC. In addition, a harmonic volume constraint is applied to maintain the correct surface-to-volume ratio of the RBC. We measured a membrane root-mean-square displacement of 33.5 nm, while previous experimental observations and simulation results range between 23.6 nm to 58.8 nm [28, 29, 30, 31].
Scaling benchmark for the whole cell simulation on the three computer systems is given in Figure 7. It can be seen that compute-bound tasks such as pairwise force evaluation can scale linearly across physical cores. Memory-bound tasks benefit less from hardware threading as expected, however thanks to thread pinning and a consistent workload decomposition between threads there is no performance degradation due to side effects such as cache and bandwidth contention.
It is also worth noting that Fu et al. recently published an implementation of a related RBC model in LAMMPS [32], which can simulate particles for time steps on 864 CPU cores in 2761 seconds. However, the use of explicit solvent particles in their model generates difficulty to establish a fair performance comparison between their implementation and OpenRBC. Nevertheless, as a rough estimate and assuming perfect scaling, their timing result can be translated into simulating particles for time steps per day on 864 cores. OpenRBC can perform roughly the same amount of time steps on 20 CPU cores. We do recognized that the explicit solvent model carrys more computational workload, and that implementing non-rectilinear partitioning schemes may not be straightforward within the current software framework of LAMMPS. This comparison serves more as a demonstration of the usage of the shared-memory programming paradigm on fat compute nodes with large amounts of strong cores and memory.
Summary
We presented a from-scratch development of a coarse-grained molecular dynamics software, OpenRBC, which exhibits exceptional efficiency when simulating systems of large density discrepancy. This capability is supported by a key algorithm innovation for computing an adaptive partitioning of the particles using a Voronoi diagram. The program is parallelized with OpenMP and SIMD vector instructions, and implements threading affinity control, consistency loop partitioning, kernel fusion, and atomics-free pairwise force evaluation to increase the utilization of simultaneous hardware threads and to maximize memory performance across multiple NUMA domains. The software achieves an order of magnitude speedup in terms of time-to-solution over a legacy simulator, and can handle systems that are almost two orders of magnitude larger in particle count. The software enables, for the first time ever, simulations of an entire RBC with a resolution down to single proteins, and opens up the possibility for conducting many in silico experiments concerning the RBC cytomechanics and related blood disorders [33]. 222The source code is available at https://github.com/yhtang/OpenRBC .
Author Contribution
YHT designed algorithm. YHT and LL implemented the software. YHT and CE carried out performance benchmark. HL developed the CGMD model. HL and YHT performed validation and verification. YHT, LL and HL wrote the manuscript. LG, CE, and VS provided algorithm consultation and technical support. GK supervised the work.
Acknowledgment
This work was supported by National Institutes of Health (NIH) grants U01HL114476 and U01HL116323 and partially by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). YHT acknowledges partial financial support from an IBM Ph.D. Scholarship Award. Part of the simulations were carried out at the Oak Ridge Leadership Computing Facility through the Innovative and Novel Computational Impact on Theory and Experiment program at Oak Ridge National Laboratory under project BIP118.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] E. A. Evans. Bending resistance and chemically induced moments in membrane bilayers. Biophysical journal , 14(12):923, 1974.
- 2[2] F. Feng and W. S. Klug. Finite element modeling of lipid bilayer membranes. Journal of Computational Physics , 220(1):394–408, 2006.
- 3[3] T. R. Powers, G. Huber, and R. E. Goldstein. Fluid-membrane tethers: minimal surfaces and elastic boundary layers. Physical Review E , 65(4):041901, 2002.
- 4[4] W. Helfrich. Elastic properties of lipid bilayers: theory and possible experiments. Zeitschrift für Naturforschung C , 28(11-12):693–703, 1973.
- 5[5] S. E. Feller. Molecular dynamics simulations of lipid bilayers. Current opinion in colloid & interface science , 5(3):217–223, 2000.
- 6[6] L. Saiz, S. Bandyopadhyay, and M. L. Klein. Towards an understanding of complex biological membranes from atomistic molecular dynamics simulations. Bioscience reports , 22(2):151–173, 2002.
- 7[7] D. P. Tieleman, S.-J. Marrink, and H. J. Berendsen. A computer perspective of membranes: molecular dynamics studies of lipid bilayer systems. Biochimica et Biophysica Acta (BBA)-Reviews on Biomembranes , 1331(3):235–270, 1997.
- 8[8] K. Tu, M. L. Klein, and D. J. Tobias. Constant-pressure molecular dynamics investigation of cholesterol effects in a dipalmitoylphosphatidylcholine bilayer. Biophysical journal , 75(5):2147–2156, 1998.
