Optimizing metachronal paddling with reinforcement learning at low Reynolds number

Alana A. Bailey; Robert D. Guy

PMC · DOI: 10.1140/epje/s10189-025-00511-5 · August 8, 2025


Open Access

TL;DR

This paper uses reinforcement learning to study how a model swimmer at zero Reynolds number might optimize its limb coordination, and whether metachronal paddling emerges.

Contribution

The novel contribution is applying reinforcement learning to discover optimal limb coordination patterns in a zero Reynolds number swimmer model.

Findings

1. At tight paddle spacings, a back-to-front metachronal wave-like stroke emerges, resembling biological rhythms.
2. At wide spacings, different limb coordination patterns are selected by the reinforcement learning algorithm.
3. The most efficient stroke is consistently a back-to-front wave-like stroke, regardless of paddle count.

Abstract

Metachronal paddling is a swimming strategy in which an organism oscillates sets of adjacent limbs with a constant phase lag, propagating a metachronal wave through its limbs and propelling it forward. This limb coordination strategy is used by swimmers across a wide range of Reynolds numbers, which suggests that the metachronal rhythm was selected for its swimming performance. In this study, we apply reinforcement learning to a swimmer at zero Reynolds number and investigate whether the learning algorithm selects this metachronal rhythm, or if other coordination patterns emerge. We design the swimmer agent with an elongated body and pairs of straight, inflexible paddles placed along the body at various fixed paddle spacings. Based on paddle spacing, the swimmer agent learns qualitatively different coordination patterns. At tight spacings, a back-to-front metachronal…


Figures


Different swimming gaits produced at paddle spacing 2 for different training parameters. The front-to-back strokes in the top row are slightly faster than the back-to-front strokes in the bottom row

The top panel shows swimming speed vs. time for different $\varepsilon$-greedy policies, and the bottom panel shows swimming speed vs. time for different learning rates. Error bars depict 95% confidence intervals

For the two strokes pictured in Fig. 2, the top row shows the mean paddle positions (solid line) and the range of paddle motion (shaded region). The bottom row shows the angular displacement from the mean for all paddles over the entire stroke. a The left column depicts the back-to-front stroke at paddle spacing 1. We observe a large range of motion of both paddle sets with the means tilted slightly away from each other in the top figure, and the bottom shows a $25\%$ phase lag with the back paddles leading (power strokes are shown as negatively sloped lines). b The right column depicts the front-to-back stroke at paddle spacing 4.

Images of the paddle configuration at different time points for two different strokes for the three-paddle swimmer. a The paddler swims with the back-to-front stroke with paddles spaced 1 unit apart. b The paddler swims with the front-to-back stroke with paddles spaced 3.5 units apart

For the two strokes pictured in Fig. 4, the top row shows the mean paddle positions (solid line) and the range of paddle motion (shaded region). The bottom row shows the angular displacement from the mean for all paddles over the entire stroke. a The left column depicts the back-to-front stroke at paddle spacing 1. We observe a large range of motion of each paddle set with the means tilted slightly away from each other in the top figure, and the bottom shows the back paddles leading and the phase lag between sets of paddles. b The right column depicts the front-to-back stroke at paddl

Images of the paddle configuration at different time points for two different strokes for the four-paddle swimmer. a The paddler swims with the back-to-front stroke with paddles spaced 1 unit apart. b The paddler swims with the pair-wise front-to-back stroke with paddles spaced 3.25 units apart

Images of the paddle configuration at different time points for two different strokes for the two-paddle swimmer. a The paddler swims with the back-to-front stroke with paddles spaced 1 unit apart. b The paddler swims with the front-to-back stroke with paddles spaced 4 units apart

For the two strokes pictured in Fig. 6, the top row shows the mean paddle positions (solid line) and the range of paddle motion (shaded region). The bottom row shows the angular displacement from the mean for all paddles over the entire stroke. a The left column depicts the back-to-front stroke at paddle spacing 1. We observe a large range of motion of each paddle set with the means tilted slightly away from center in the top figure, and the bottom shows the back paddles leading and the phase lag between sets of paddles. b The right column depicts the stroke with paddle’s pairing of

The top row of plots shows swimming speed vs. paddle spacing and the bottom row shows plots of stroke efficiency vs. paddle spacing. The blue circles indicate a front-to-back stroke, magenta squares indicate a back-to-front stroke, and green triangles indicate one of the other 3-paddle strokes

Histograms showing the frequency out of 100 trials of the stroke type found by the 2-paddle swimmer at paddle spacing 2 for various combinations of learning parameters. The numbers above each bar are the mean swimming speed for each stroke



Full text

Introduction

At low Reynolds number, many commonly observed macroscopic swimming strategies are ineffective because inertial forces are negligible and the fluid environment is highly viscous. The time-reversibility of Stokes flow means that time-symmetric (reciprocal) swimming strokes are not viable: the Purcell scallop theorem asserts that any reciprocal motion, in which the swimmer passes through the same sequence of shapes forward and then backward, results in no net displacement [1]. Thus, microswimmers have evolved a variety of time-irreversible deformations to swim effectively through viscous fluids, using appendages such as cilia and flagella [2].

Ciliated microswimmers, such as Paramecium and Volvox, coordinate the oscillating motion of their cilia so that successive power strokes of adjacent limbs create a time-asymmetric motion, allowing a net displacement of the swimmer [3]. This technique is referred to as metachronal paddling, a propulsion strategy in which a metachronal wave propagates through a swimmer’s limbs and drives the organism forward. In a one-dimensional array of paddles, the metachronal wave is either symplectic or antiplectic, meaning that it propagates parallel or antiparallel, respectively, to the direction of a paddle’s power stroke [4]. Antiplectic metachrony, in which the power stroke sequence starts with the limbs at the back of the organism and the metachronal wave propagates forward across the array of limbs, is the more commonly observed form. Antiplectic metachrony has been shown to generate more net fluid flow and to minimize drag [5]; however, symplectic metachrony is also observed in nature, for example, in the microswimmer Opalina [6].
Furthermore, this swimming strategy is not unique to microswimmers; metachronal paddling is observed in organisms across a wide range of Reynolds numbers, including Paramecium [2] (Re $< 1$), ctenophores [3] (Re $\sim 10$ to $100$), Antarctic krill [7] (Re $\sim 1000$), and mantis shrimp [8] (Re $\sim 10000$), all of which use antiplectic metachrony.

The robustness of the stroke across viscous and inertial flow regimes suggests its potential use in designing swimming robots for applications such as drug delivery at microscales [9, 10] or underwater target tracking at larger scales [11]. Within the viscous flow regime, studies have investigated the construction and actuation of artificial microswimmers, taking design inspiration from ciliated and flagellated microorganisms and using magnetism or light for actuation [9, 12]. Studies have also investigated optimal cilia beating patterns [13] and rigid paddle swimming strategies [14], both of which demonstrate high swimming efficiency with metachronal limb coordination. Because swimming robots operate under energy constraints, swimming efficiently is critical [10], so efficient limb coordination should be a priority. Furthermore, it may not be feasible to equip a robot with many sets of appendages, so each paddle must do a substantial share of the propulsive work. Thus, in this study, we consider paddlers with as few as two sets of limbs and examine their optimal propulsion strategies.

We leverage a reinforcement learning approach to study the optimal limb coordination of a microswimmer with sets of rigid paddles placed along an elongated body. Our rigid paddles are unjointed and inflexible, so propulsion is driven solely by time-asymmetries in limb coordination. The simplicity of our model makes it well suited for reinforcement learning, as we can check the optimality of the solution selected by the learning algorithm against known propulsion strategies. Reinforcement learning has been applied successfully to various locomotion optimization problems in fluid mechanics [15], including the three-sphere swimmer at low Reynolds number [16] and the multi-link swimmer at low Reynolds number [17] and in potential flow [18]. These learning agents are designed with few degrees of freedom so as to focus on the basic principles of low Reynolds number propulsion. Placed in a fluid environment, the agents discover effective locomotion patterns by balancing exploration and exploitation [19]. Endowing swimming microrobots with the ability to learn to swim on their own in challenging environments is a growing area of research in machine learning, as microrobots capable of adapting to their surroundings are highly beneficial in applications [20]. By approaching metachronal paddling at low Reynolds number in the framework of reinforcement learning, we aim to identify optimal limb coordination patterns and compare them with the swimming strokes commonly observed in nature.

Methods

Model

The model swimmer is two-dimensional, with an elongated body 10 units long aligned in the horizontal direction and rounded semicircular ends of radius one. Sets of equally spaced paddles of length 3 are placed symmetrically along the top and bottom of the body. Paddles on the top and bottom beat as mirror images of each other, so the body does not rotate and is displaced only in the horizontal direction. The swimmer moves through a fluid at zero Reynolds number through coordinated motion of its paddles; this coordination is explored with reinforcement learning, as described in the next section.

For a swimmer with $n$ pairs of paddles, the configuration of the paddles is described by the angles $\theta_{j}$ for $j = 1, \ldots, n$, which are related to the angles $\psi_{j}$ from the horizontal of the $j$th pair of paddles by

$$\psi_{j}^{\text{bottom}} = -\frac{\pi}{2} + \theta_{j}, \tag{1}$$

$$\psi_{j}^{\text{top}} = \frac{\pi}{2} - \theta_{j}. \tag{2}$$

A value of $\theta = 0$ corresponds to the paddle pair being perpendicular to the body (see Fig. 1).

Fig. 1. a The model paddler has an elongated body of length 10 and pairs of straight, inflexible paddles of length 3 equally spaced along the body. b Each paddle can be in one of the above states. States $-5$ and $5$ correspond to a maximum tilt of $\frac{\pi}{4}$ from the center position 0
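The angle conventions in (1) and (2) can be checked with a short geometric sketch. The Python below is a minimal illustration, not the authors' code: it maps pair angles $\theta_j$ to top and bottom paddle angles $\psi_j$ and discretizes each paddle into points spaced 0.1 apart (the spacing given in the Fluid mechanics section). The text does not say where along the body the paddle bases sit, so the layout used here (bases at $y = \pm 1$, centered about $x = 0$, ordered back to front) and the helper names `paddle_angles` and `paddle_points` are hypothetical.

```python
import numpy as np

BODY_HALF_WIDTH = 1.0  # radius of the semicircular end caps (from the text)
PADDLE_LENGTH = 3.0    # paddle length (from the text)
DS = 0.1               # point spacing used to discretize the paddles (from the text)


def paddle_angles(theta):
    """Map pair angles theta_j to (psi_top, psi_bottom), measured from horizontal."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    return np.pi / 2 - theta, -np.pi / 2 + theta


def paddle_points(theta, spacing, length=PADDLE_LENGTH, ds=DS):
    """Discretize every top and bottom paddle into points spaced ds apart.

    Hypothetical layout: paddle bases sit on the body surface at y = +/-1 and
    are centred about x = 0, ordered back (smallest x) to front. Returns an
    array of shape (2 * n * m, 2): for each pair, m top points then m bottom.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    n = theta.size
    x_base = (np.arange(n) - (n - 1) / 2) * spacing
    psi_top, psi_bot = paddle_angles(theta)
    # arc-length samples; the base point is skipped (assumed to be a body point)
    s = np.arange(1, int(round(length / ds)) + 1) * ds
    pts = []
    for xb, pt, pb in zip(x_base, psi_top, psi_bot):
        for y0, psi in ((BODY_HALF_WIDTH, pt), (-BODY_HALF_WIDTH, pb)):
            pts.append(np.column_stack([xb + s * np.cos(psi), y0 + s * np.sin(psi)]))
    return np.vstack(pts)
```

Since $\cos(\pi/2 - \theta) = \cos(-\pi/2 + \theta) = \sin\theta$, the top and bottom paddles of a pair are mirror images across the body axis, and a positive $\theta$ tilts both toward $+x$; decreasing $\theta$ sweeps the tips toward $-x$, the power-stroke direction defined in the Results.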

Reinforcement learning

The paddler agent learns to swim on its own by continually updating its estimates of how its actions move it through the fluid environment. To set up the reinforcement learning problem, we must define a state space, an action space, and a reward function.

The angle of each paddle set, $\theta$, is restricted to the interval $[-\pi/4, \pi/4]$, which is discretized into 11 equally spaced angles separated by $\pi/20$. For each paddle, the state is given by the integer $s = -5, -4, \ldots, 5$, corresponding to $\theta = s\pi/20$ (see Fig. 1). For a paddler agent with $n$ limbs on each side, we define the states to be $n$-tuples representing the configurations of the paddles at a given learning step.

At each learning step, each paddle set can move one state left, move one state right, or not move at all. Hence, the actions are also $n$-tuples describing the paddles’ movements, with each component $a \in \{-1, 0, 1\}$: $-1$ corresponds to moving left, $1$ to moving right, and $0$ to not moving. Each action takes one time unit, so the paddles move with constant angular velocity $\dot{\psi} = \pm\pi/20$ or $0$.
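One way to enumerate these spaces is sketched below in Python (illustrative only, not the authors' code). It builds every $n$-tuple of states $s_j \in \{-5, \ldots, 5\}$ and every action tuple in $\{-1, 0, 1\}^n$, dropping the all-zero action because the Methods require at least one paddle to move at every step. The bounds check `is_valid` is a hypothetical helper: it only keeps each paddle inside $[-5, 5]$ and does not model the further restrictions that the text says arise at tight paddle spacings.

```python
import itertools
import numpy as np

S_MAX = 5              # states s = -5, ..., 5 (11 levels)
D_THETA = np.pi / 20   # angle increment between neighbouring states


def state_space(n):
    """All n-tuples of paddle states."""
    return list(itertools.product(range(-S_MAX, S_MAX + 1), repeat=n))


def action_space(n):
    """All n-tuples in {-1, 0, 1}^n except the one where no paddle moves."""
    return [a for a in itertools.product((-1, 0, 1), repeat=n) if any(a)]


def is_valid(state, action):
    """Hypothetical bounds check: every paddle must stay within [-5, 5].

    Spacing-dependent exclusions mentioned in the text are not modelled.
    """
    return all(abs(s + a) <= S_MAX for s, a in zip(state, action))


def state_to_theta(state):
    """Convert a state tuple to pair angles theta_j = s_j * pi / 20."""
    return np.array(state) * D_THETA


n = 2
pairs = [(s, a) for s in state_space(n) for a in action_space(n) if is_valid(s, a)]
print(len(state_space(n)), len(action_space(n)), len(pairs))  # 121 8 840
```

For two paddles this gives $11^2 = 121$ states and $3^2 - 1 = 8$ actions; the bounds check alone already removes the pairs that would push a paddle past $\pm 5$, leaving 840 of the 968 combinations.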

Let $x_{0}$ denote the horizontal position of a reference point on the swimmer body. Then $\mathrm{d}x_{0}/\mathrm{d}t = u_{0}$ is the swimming speed of the paddler. The reward function is defined for each state-action pair as the net displacement of the paddler during the action,

$$r(s,a) = \int_{0}^{1} u_{0}\,\mathrm{d}t = x_{0}(1) - x_{0}(0), \tag{3}$$

with forward motion being a positive reward and backward motion being negative.
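In practice the reward integral above must be evaluated numerically; the Fluid mechanics section states that the authors use 3-point Gaussian quadrature over the unit action interval. A minimal Python sketch of that rule follows. The callable `swim_speed` is a hypothetical stand-in for the instantaneous speed $u_0(t)$ that the Stokes solve would supply.

```python
import numpy as np

# 3-point Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, 1]
_x, _w = np.polynomial.legendre.leggauss(3)
NODES = 0.5 * (_x + 1.0)
WEIGHTS = 0.5 * _w


def reward(swim_speed):
    """Approximate r = integral_0^1 u0(t) dt with 3-point Gauss-Legendre quadrature.

    swim_speed is a hypothetical callable returning u0(t) for t in [0, 1];
    positive values mean forward (rightward) motion.
    """
    return float(sum(w * swim_speed(t) for t, w in zip(NODES, WEIGHTS)))
```

A 3-point Gauss-Legendre rule integrates polynomials up to degree 5 exactly, so it is accurate when $u_0(t)$ varies smoothly over one action, and it needs only three Stokes solves per learning step.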

We use tabular Q-learning as our reinforcement learning algorithm because of its simplicity and low computational cost on low-dimensional problems. The table of estimated qualities of the state-action pairs (the Q-table) and the initial state of the paddler are initialized randomly. The paddler then enters a training loop during which it explores the fluid environment using an $\varepsilon$-greedy strategy: with probability $\varepsilon$ it takes a random action, and otherwise it takes the action with the highest estimated quality, so it acts mostly randomly at first and increasingly favors optimal actions as training progresses. The estimated quality of each state-action pair is updated at each learning step via the Bellman equation [19]:

$$\begin{aligned} Q(s_{n},a_{n}) &= (1-\alpha)\,Q(s_{n},a_{n}) \\ &\quad + \alpha\left[r_{n} + \gamma \max_{a_{n+1}} Q(s_{n+1},a_{n+1})\right]. \end{aligned} \tag{4}$$
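To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python for a two-paddle agent, with an $\varepsilon$-greedy policy and geometric per-episode decay of $\alpha$ and $\varepsilon$. It is not the authors' implementation. Running a Stokes solver is out of scope here, so the reward is a hypothetical surrogate, `toy_reward`, equal to twice the signed area swept in the $(s_1, s_2)$ state plane. It was chosen only because, like a Stokes swimmer, it gives zero net reward for a reciprocal out-and-back motion and nonzero reward for a phase-lagged loop; it says nothing about which stroke the real swimmer prefers. The discount factor 0.9 and the short training run are also illustrative choices, not the paper's settings.

```python
import itertools
import numpy as np

S_MAX = 5
STATES = list(itertools.product(range(-S_MAX, S_MAX + 1), repeat=2))
ACTIONS = [a for a in itertools.product((-1, 0, 1), repeat=2) if any(a)]
INDEX = {s: i for i, s in enumerate(STATES)}
# admissible action indices per state (each paddle must stay in [-5, 5])
VALID = [[k for k, a in enumerate(ACTIONS)
          if all(abs(si + ai) <= S_MAX for si, ai in zip(s, a))]
         for s in STATES]


def toy_reward(s, a):
    """Hypothetical surrogate for the displacement r(s, a), NOT a Stokes solve.

    s[0]*a[1] - s[1]*a[0] is twice the signed area swept in (s1, s2) space.
    """
    return s[0] * a[1] - s[1] * a[0]


def q_update(Q, i, k, r, j, alpha, gamma):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')]."""
    Q[i, k] = (1 - alpha) * Q[i, k] + alpha * (r + gamma * Q[j].max())


def train(episodes=20, steps=5000, gamma=0.9, decay=0.99, seed=0):
    rng = np.random.default_rng(seed)
    Q = rng.random((len(STATES), len(ACTIONS)))   # random initial Q-table
    for i, valid in enumerate(VALID):              # inadmissible actions never win
        mask = np.ones(len(ACTIONS), dtype=bool)
        mask[valid] = False
        Q[i, mask] = -np.inf
    s = STATES[rng.integers(len(STATES))]          # random initial state
    alpha = eps = 1.0
    for _ in range(episodes):
        for _ in range(steps):
            i = INDEX[s]
            if rng.random() < eps:                 # explore: random admissible action
                k = VALID[i][rng.integers(len(VALID[i]))]
            else:                                  # exploit: best current estimate
                k = int(np.argmax(Q[i]))
            a = ACTIONS[k]
            s_next = (s[0] + a[0], s[1] + a[1])
            q_update(Q, i, k, toy_reward(s, a), INDEX[s_next], alpha, gamma)
            s = s_next
        alpha *= decay                             # geometric decay once per episode
        eps *= decay
    return Q


def greedy_rollout(Q, s=(0, 0), steps=200):
    """Follow the learned greedy policy with learning disabled; return total reward."""
    total = 0
    for _ in range(steps):
        a = ACTIONS[int(np.argmax(Q[INDEX[s]]))]
        total += toy_reward(s, a)
        s = (s[0] + a[0], s[1] + a[1])
    return total
```

Because Q-learning is off-policy, the greedy policy read from the table can be near-optimal even while $\varepsilon$ is still large. On a finite, deterministic state space the greedy rollout eventually repeats, so with this surrogate it traces a closed loop in state space, the discrete analogue of a phase-lagged stroke.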

Fig. 2. Images of the paddle configuration at different time points for two different strokes for the two-paddle swimmer. a The paddler swims with the back-to-front stroke with paddles spaced 1 unit apart. b The paddler swims with the front-to-back stroke with paddles spaced 4 units apart

After the training finishes, we disable any further learning and let the paddler demonstrate the optimal stroke it found. We implement this method on paddlers with 2, 3, and 4 paddles at several different fixed spacings between paddles.

To find optimal strokes via Q-learning, we specify the learning parameters as follows. The learning rate $\alpha$ and the exploration rate $\varepsilon$ are both initially set to 1 and decay geometrically, by a factor of 0.99 per learning episode. We specify a slow decay rate to allow for a large amount of exploration as the paddler begins interacting with the environment [21]. In the two-paddle case, the discount factor $\gamma$ is set to 0.99, and the training loop is run for 50 episodes with 50,000 learning steps per episode. The two-paddle swimmer explores at most 968 state-action pairs, formed from $11^{2} = 121$ possible paddle configurations and $3^{2} - 1 = 8$ possible actions (there are 3 possible actions per paddle, and we require that at least one paddle moves at every step). Note that with tight paddle spacings, many state-action pairs are not available, so the explored space is often smaller than in the maximal case.
In the three- and four-paddle cases, we set $\gamma$ to 0.999 and run the training loop for 500 episodes with 500,000 learning steps per episode. Each time we add a set of paddles, the number of possible paddle configurations grows by a factor of 11, and the number of possible actions scales as $3^{n} - 1$, where $n$ is the number of paddles. We choose the number of episodes and learning steps to give the paddler ample time to explore the environment and converge to an optimal stroke [19]. These choices of parameters are explored in more detail in Sect. 3.5.
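The table sizes and schedules quoted above follow from simple arithmetic, sketched below in Python. The helper names are hypothetical, and the sketch assumes that $\alpha$ and $\varepsilon$ are multiplied by 0.99 once per episode and held fixed within an episode; the text does not say whether the decay is applied at any finer granularity.

```python
def max_state_action_pairs(n):
    """Upper bound 11**n * (3**n - 1) on Q-table entries for n paddle pairs."""
    return 11 ** n * (3 ** n - 1)


def decayed(value0, rate, episodes):
    """Value of alpha or epsilon after geometric per-episode decay."""
    return value0 * rate ** episodes


# (paddle pairs, training episodes) as stated in the Methods
for n, episodes in [(2, 50), (3, 500), (4, 500)]:
    print(n, max_state_action_pairs(n), round(decayed(1.0, 0.99, episodes), 5))
```

Under that reading, $\alpha$ and $\varepsilon$ are still about 0.605 at the end of the 50-episode two-paddle run but fall to about 0.0066 after 500 episodes, while the maximal table grows from 968 entries for two paddles to 34,606 for three and over a million for four.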

Fluid mechanics

The fluid motion and the resulting translation of the body are determined by solving the Stokes equations

$$\mu \boldsymbol{\nabla}^{2}\boldsymbol{u} - \boldsymbol{\nabla} p + \boldsymbol{F} = \boldsymbol{0}, \tag{5}$$

$$\boldsymbol{\nabla} \cdot \boldsymbol{u} = 0, \tag{6}$$

where $\boldsymbol{F}$ is the force on the fluid from the swimmer. We set $\mu = 1$, noting that $\mu$ scales the forces but does not affect the velocity for problems involving prescribed kinematics. We use the method of regularized Stokeslets [22], a numerical method based on a regularized Green’s function, to solve for the coupled fluid/body motion. The body and paddles are discretized into points equally spaced by 0.1. We use the regularization from [22] with regularization parameter $\epsilon = 0.05$ for our simulations.
The velocity $\boldsymbol{U}(\boldsymbol{X}_{i})$ at discrete point $\boldsymbol{X}_{i}$ is related to the forces at all of the discrete points by

$$\boldsymbol{U}(\boldsymbol{X}_{i}) = \sum_{j} \mathcal{S}_{\epsilon}(\boldsymbol{X}_{i},\boldsymbol{X}_{j})\,\boldsymbol{F}_{j}, \tag{7}$$

where $\mathcal{S}_{\epsilon}(\boldsymbol{X}_{i},\boldsymbol{X}_{j})$ is the regularized Stokeslet tensor. We represent this equation as

$$\mathcal{M}\boldsymbol{F} = \boldsymbol{U}, \tag{8}$$

where $\boldsymbol{U}$ and $\boldsymbol{F}$ represent the collection of velocities and forces, respectively, at all discrete points.
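The sum over regularized Stokeslets becomes the matrix $\mathcal{M}$ once all point velocities and forces are stacked into single vectors. The Python sketch below assembles such a matrix. It is illustrative only: the interleaved $(x, y)$ ordering and the name `mobility_matrix` are hypothetical, and the kernel shown is the 2D regularized Stokeslet that Cortez's original method-of-regularized-Stokeslets paper derives for one particular blob function, which may differ from the regularization the authors actually used.

```python
import numpy as np


def mobility_matrix(X, eps=0.05, mu=1.0):
    """Assemble M (2N x 2N) with U = M F for points X of shape (N, 2).

    Unknowns are interleaved as [F1x, F1y, F2x, F2y, ...]. Assumed kernel:
    the 2D regularized Stokeslet for the blob
    phi_eps(r) = 3 eps^3 / (2 pi (r^2 + eps^2)^(5/2)).
    """
    X = np.asarray(X, dtype=float)
    N = len(X)
    d = X[:, None, :] - X[None, :, :]               # (N, N, 2) separations
    R = np.sqrt((d ** 2).sum(axis=-1) + eps ** 2)   # regularized distance
    h1 = np.log(R + eps) - eps * (R + 2 * eps) / ((R + eps) * R)
    h2 = (R + 2 * eps) / ((R + eps) ** 2 * R)
    blocks = (-h1[..., None, None] * np.eye(2)
              + h2[..., None, None] * d[..., :, None] * d[..., None, :])
    # reorder (i, j, a, b) -> (i, a, j, b) so that M[2i + a, 2j + b] = blocks[i, j, a, b]
    return blocks.transpose(0, 2, 1, 3).reshape(2 * N, 2 * N) / (4 * np.pi * mu)
```

As $\epsilon \to 0$ the off-diagonal blocks approach the singular 2D Stokeslet $(-\ln r\,\mathbf{I} + \mathbf{x}\mathbf{x}^{T}/r^{2})/(4\pi\mu)$, while the diagonal blocks stay finite, which is what allows each point's self-interaction to be included in the sum.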

In a given time step, the state-action pair defines the body configuration, $\boldsymbol{X}$, and the velocity of deformation, $\dot{\boldsymbol{X}}$, in a fixed body frame over the unit time interval. Accounting for the motion of the fluid, the velocity of each point on the swimmer is the sum of the known prescribed velocity $\dot{\boldsymbol{X}} = \boldsymbol{U}_{P}$ and the unknown translational (swimming) velocity $\boldsymbol{U}_{0}$; i.e.,

$$\boldsymbol{U} = \boldsymbol{U}_{P} + \boldsymbol{U}_{0}. \tag{9}$$

The translational velocity is determined by the constraint that the net force on the swimmer body be zero. Combining (8) and (9) with this net-force constraint gives the system

$$\mathcal{M}\boldsymbol{F} - \boldsymbol{U}_{0} = \boldsymbol{U}_{P}, \tag{10}$$

$$\sum_{j}\boldsymbol{F}_{j} = \boldsymbol{0}, \tag{11}$$

which is solved at each time instant to determine the forces on each point and the swimming velocity. Given the symmetry of paddling, the swimming velocity is in the $x$-direction, and the instantaneous swimming speed is denoted by $\mathrm{d}x_{0}/\mathrm{d}t$. The reward function, defined by (3), is computed by integrating the instantaneous swimming speed over the time interval required to change the paddle configuration, using 3-point Gaussian quadrature.
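As an illustration of how this system can be solved at a single time instant, the sketch below assembles the saddle-point matrix for the unknowns $\boldsymbol{F}$ and $\boldsymbol{U}_{0}$ and integrates a per-move speed with the 3-point Gauss–Legendre rule mapped to the unit move interval. It is a minimal sketch under stated assumptions, not the authors' code: the mobility matrix $\mathcal{M}$ is taken as given, the unknowns are ordered $(x_1, y_1, x_2, y_2, \dots)$, the function names are illustrative, and `toy_mobility` is a hypothetical Gaussian-kernel stand-in used only so the example runs; it is not a Stokes-flow kernel.

```python
import numpy as np

# 3-point Gauss-Legendre rule mapped from [-1, 1] to the unit move interval [0, 1]
GAUSS_T = np.array([0.5 - 0.5 * np.sqrt(0.6), 0.5, 0.5 + 0.5 * np.sqrt(0.6)])
GAUSS_W = np.array([5.0 / 18.0, 4.0 / 9.0, 5.0 / 18.0])


def solve_force_free(M, U_P):
    """Solve  M F - U0 = U_P  together with  sum_j F_j = 0.

    M   : (2n, 2n) mobility matrix, unknowns ordered (x_1, y_1, x_2, y_2, ...)
    U_P : (n, 2) prescribed deformation velocities in the body frame
    Returns the point forces F, shape (n, 2), and the translational velocity U0, shape (2,).
    """
    n = U_P.shape[0]
    B = np.tile(np.eye(2), (n, 1))      # copies one 2D velocity to every point
    A = np.zeros((2 * n + 2, 2 * n + 2))
    A[: 2 * n, : 2 * n] = M
    A[: 2 * n, 2 * n:] = -B             # the "- U0" term at every point
    A[2 * n:, : 2 * n] = B.T            # net force (sum over points) must vanish
    rhs = np.concatenate([U_P.reshape(-1), np.zeros(2)])
    sol = np.linalg.solve(A, rhs)
    return sol[: 2 * n].reshape(n, 2), sol[2 * n:]


def move_reward(speed_at):
    """Integrate a swimming speed dx0/dt, given as a function of t, over one unit-time move."""
    return float(sum(w * speed_at(t) for t, w in zip(GAUSS_T, GAUSS_W)))


def toy_mobility(points, length=1.0, self_term=1.0):
    """Hypothetical symmetric positive-definite stand-in for M (Gaussian kernel), NOT Stokes flow."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / length**2) + self_term * np.eye(len(points))
    return np.kron(K, np.eye(2))        # same scalar coupling for x and y components
```

A quick sanity check of the sign convention: for a spatially uniform prescribed velocity, the solve returns zero forces and $\boldsymbol{U}_{0} = -\boldsymbol{U}_{P}$, and the 3-point rule integrates polynomials up to degree 5 exactly.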

Results

We implement the Q-learning algorithm on paddlers with 2, 3, and 4 sets of limbs, varying the spacing between paddles from 0.5 up to 5, or as far apart as the paddler’s body size allows. Depending on paddle spacing, the paddler agent selects different optimal gaits, which we characterize by the timing of the paddles’ power and return strokes and the mean positions of the paddles. Since the paddler’s goal is to move to the right, we define a power stroke as the motion of sweeping through a decreasing sequence of states, that is, moving a paddle through an arc to the left, which results in net motion to the right. The return stroke is then the motion of sweeping through an increasing sequence of states, that is, moving a paddle through an arc to the right. Metachronal paddling emerges as a wave of power strokes propagating through the limbs of the paddler.
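The power/return-stroke convention above can be applied mechanically to a recorded stroke. The sketch below assumes, hypothetically, that each paddle set's discrete angle index is available at every move of one periodic cycle, with smaller indices corresponding to positions further to the left; moves that leave the index unchanged are labeled pauses. The function names are illustrative only.

```python
def classify_moves(states):
    """Label each move of one paddle set over a periodic stroke cycle.

    states[k] is the discrete angle index before move k; the cycle is closed,
    so the move out of states[-1] returns the paddle to states[0].
    """
    n = len(states)
    labels = []
    for k in range(n):
        step = states[(k + 1) % n] - states[k]
        if step < 0:
            labels.append("power")    # decreasing state: arc to the left
        elif step > 0:
            labels.append("return")   # increasing state: arc to the right
        else:
            labels.append("pause")    # paddle held still during this move
    return labels


def power_stroke_onsets(states):
    """Move indices at which a power stroke begins (previous move, cyclically, was not a power move)."""
    labels = classify_moves(states)
    return [k for k in range(len(labels))
            if labels[k] == "power" and labels[k - 1] != "power"]
```

For example, the cycle `[3, 2, 1, 0, 0, 1, 2, 3]` is a three-move power stroke, a pause, a three-move return stroke, and a pause, with its power stroke starting at move 0.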

Two paddles

We begin our learning simulations with the simplest paddler capable of swimming at low Reynolds number: a swimmer with two sets of paddles, or a two-paddle swimmer. Depending on the paddle spacing, two distinct strokes emerge: an antiplectic, tilted-out stroke for tight spacings and a symplectic, tilted-in stroke for wide spacings (see the supplemental videos of the paddler swimming with each stroke). Specifically, the back-to-front stroke is optimal for paddle spacings $<2$, and the front-to-back stroke is selected for spacings $\ge 2$ (see Fig. 2 for a time sequence of the two strokes).

Fig. 3. For the two strokes pictured in Fig. 2, the top row shows the mean paddle positions (solid line) and the range of paddle motion (shaded region). The bottom row shows the angular displacement from the mean for all paddles over the entire stroke. a The left column depicts the back-to-front stroke at paddle spacing 1. The top figure shows a large range of motion for both paddle sets, with the means tilted slightly away from each other, and the bottom figure shows a $25\%$ phase lag with the back paddles leading (power strokes appear as negatively sloped lines). b The right column depicts the front-to-back stroke at paddle spacing 4. The top figure shows the inward tilt the paddles maintain throughout the stroke and the small range of motion used. The bottom figure shows a $25\%$ phase lag with the front paddles leading.

Qualitatively, the two strokes appear starkly different. To characterize the traits that set them apart, we compute several stroke metrics, including stroke length, phase lag, and paddle amplitude. Stroke length, denoted $N$, is the number of moves the paddler makes to complete one cycle. For two adjacent paddle sets $j$ and $j+1$, where set $j$ lies to the left of set $j+1$, we define the phase lag as the difference in the timing of their power strokes normalized by the stroke length,

$$\Delta \phi = \frac{T_{j} - T_{j+1}}{N},$$

where $T_{j}$ and $T_{j+1}$ are the times at which paddle sets $j$ and $j+1$ begin their power strokes, respectively. We report the phase lag in the interval $[-0.5, 0.5]$ to capture the shortest time between power strokes, so a negative phase lag (set $j$ begins its power stroke first) indicates that the back paddles lead the power-stroke sequence. The amplitude of a paddle set is simply the range of motion used during the stroke; because of our discretization, the amplitude is always a multiple of $\pi/20$.
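These two metrics can be computed directly from power-stroke onset times and discrete state indices. In this sketch (function names hypothetical), the lag is wrapped with a modulo operation into the half-open interval $[-0.5, 0.5)$, so a lag of exactly half a cycle is reported as $-0.5$; the closed interval in the text leaves that tie ambiguous. Sets are indexed so that set $j$ is behind (to the left of) set $j+1$.

```python
import math

STATE_SPACING = math.pi / 20  # angular resolution of the discretized paddle states


def phase_lag(T_j, T_j1, N):
    """Phase lag between set j (behind) and set j+1 (ahead), wrapped into [-0.5, 0.5).

    Negative values mean set j starts its power stroke first (back paddles lead).
    """
    lag = (T_j - T_j1) / N
    return (lag + 0.5) % 1.0 - 0.5   # keep the shortest separation between power strokes


def amplitude(states):
    """Range of motion of one paddle set, in radians, from its discrete state indices."""
    return (max(states) - min(states)) * STATE_SPACING
```

With hypothetical onset times five moves apart in a 20-move cycle, `phase_lag(0, 5, 20)` gives $-0.25$, the value reported for the two-paddle back-to-front stroke, while onsets 15 moves apart wrap to $+0.25$.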

Broadly, these metrics show that the back-to-front stroke has a longer stroke length and larger paddle amplitudes, while the front-to-back stroke uses a much smaller range of motion and a shorter stroke length. For closer inspection, we pick one stroke of each type: a back-to-front stroke at paddle spacing 1 and a front-to-back stroke at paddle spacing 4 (see Fig. 3 for a visualization of the strokes and Table 1 for a side-by-side comparison). For both strokes, the phase lag magnitude is $25\%$, but opposite paddles lead the power strokes. Furthermore, the front-to-back stroke uses only $30\%$ of its paddle range, while the back-to-front stroke uses 70–80%. In both stroke types, there is some symmetry within the individual paddle strokes: the leading paddles pause at the end of their power stroke, and the trailing paddles pause at the end of their return stroke.

Both strokes in the two-paddle case appear metachronal wave-like, with the learning algorithm selecting either a front-to-back or a back-to-front stroke depending on paddle spacing. However, the front-to-back stroke selected at wide spacings is significantly faster: at paddle spacing 4 it reaches a swimming speed of 0.0605, whereas the back-to-front stroke at paddle spacing 1 achieves only 0.0342.

Table 1. Table comparing the two strokes that emerge from a 2-paddle swimmer at paddle spacings 1 and 4

| Stroke metric | Back-to-front | Front-to-back |
|---|---|---|
| Paddle spacing | 1 | 4 |
| Stroke length | 20 | 8 |
| Swimming speed | 0.0342 | 0.0605 |
| Phase lag | $-0.25$ | $0.25$ |
| Range of back | $7\pi/20$ | $3\pi/20$ |
| Range of front | $8\pi/20$ | $3\pi/20$ |

Fig. 4. Images of the paddle configuration at different time points for two different strokes for the three-paddle swimmer. a The paddler swims with the back-to-front stroke with paddles spaced 1 unit apart. b The paddler swims with the front-to-back stroke with paddles spaced 3.5 units apart

Three paddles

With our three-paddle swimmer, we observe similar stroke trends as we vary the paddle spacing. For wider paddle spacings, the paddler performs front-to-back strokes with the outer paddles tilting in toward the middle, and for tighter spacings, a back-to-front stroke with the outer paddles tilting outward emerges (see the supplemental videos of the paddler swimming with each stroke). The back-to-front stroke is optimal for spacings $\le 2$, and the front-to-back stroke emerges for spacings between 2 and 4 (see Fig. 4 for a time series of the strokes). For spacings larger than 4, however, two other, non-wave-like strokes emerge that are more challenging to characterize; we examine these strokes in more detail in Appendix A.

In general terms, the back-to-front, tilted-out stroke resembles that of the two-paddle case: the back paddles lead the power stroke, followed by the middle and front sets, generally with increasing phase lags across successive pairs of paddle sets. The front-to-back stroke performs its power strokes in the reverse order. In this stroke, the middle paddle sweeps through a large range of motion, while the outer paddles tilt inward toward the center and perform smaller movements.

Examining the cases with paddle spacings 1 and 3.5 (Fig. 5 and Table 2), we see that the two stroke types have comparable stroke lengths, unlike in the two-paddle case. Next, we compare phase lags across pairs of paddle sets in the order in which they perform their power strokes: corresponding pairs in the two strokes have phase lags of comparable magnitude, and in both strokes the magnitude increases from the leading to the trailing paddles. In terms of paddle symmetry, we again observe in both cases that the leading paddle pauses after its power stroke and the last paddle pauses after its return stroke. As for amplitude, the back-to-front stroke uses a larger range of motion than the front-to-back stroke for each of the paddle sets.

Again, both strokes that emerge are wave-like, with the back-to-front stroke being more metachronal in the symmetry of its individual paddle strokes. Unlike in the two-paddle case, here the back-to-front stroke is the faster one, with a swimming speed of 0.0548 compared with 0.0440 for the front-to-back stroke.

Four paddles

With four sets of paddles, the paddler performs strokes that resemble those of the two-paddle case. For tight spacings, we see the familiar back-to-front, tilted-out stroke, and for wide spacings, a paired-off front-to-back stroke emerges in which adjacent paddle sets tilt toward one another (see the supplemental videos of the paddler swimming with each stroke). The paired-off paddle sets perform nearly identical movements with a restricted range of motion, as in the two-paddle case, while the back-to-front stroke continues the trend of large paddle amplitudes (see Fig. 6 for a time sequence of the strokes).

Fig. 5. For the two strokes pictured in Fig. 4, the top row shows the mean paddle positions (solid line) and the range of paddle motion (shaded region). The bottom row shows the angular displacement from the mean for all paddles over the entire stroke. a The left column depicts the back-to-front stroke at paddle spacing 1. The top figure shows a large range of motion for each paddle set, with the means tilted slightly away from each other, and the bottom figure shows the back paddles leading and the phase lag between sets of paddles. b The right column depicts the front-to-back stroke at paddle spacing 3.5. The top figure shows the inward tilt of the two outer paddles and their smaller range of motion compared with the middle paddle. The bottom figure shows the front paddles leading and the phase lags across sets of paddles.

Comparing the strokes at paddle spacings 1 and 3.25 (visualized in Fig. 7 and Table 3), we see that the stroke length is again longer for the back-to-front stroke and shorter for the pair-wise front-to-back stroke. In the back-to-front stroke, the magnitude of the phase lag increases between successive pairs of paddles starting from the leading set, while the pair-wise front-to-back stroke has a phase lag of constant magnitude, 0.22, between successive paddle sets. As before, we observe symmetry in the individual paddle motions. In the back-to-front stroke, the back paddle pauses after its power stroke for the same length of time as the front paddle pauses after its return stroke. The two middle paddles share the same symmetry: the back-middle paddle pauses after its power stroke for as long as the front-middle paddle pauses after its return stroke. In the pair-wise front-to-back stroke, the paddles do not pause; instead, they perform essentially mirrored versions of the same paddle stroke.

In this case, the back-to-front stroke is metachronal wave-like; the front-to-back stroke, however, is overall not wave-like. Moreover, the front-to-back stroke is again the faster of the two, with a speed of 0.0818 compared with 0.0680 for the back-to-front stroke.

Table 2. Table comparing the two strokes that emerge from a 3-paddle paddler at paddle spacings 1 and 3.5. We compare the phase lag of pairs of paddles in the order in which they perform their power strokes, so “Phase 1st/2nd” means back/middle in the back-to-front case and front/middle in the front-to-back case

| Stroke metric | Back-to-front | Front-to-back |
|---|---|---|
| Paddle spacing | 1 | 3.5 |
| Stroke length | 21 | 19 |
| Swimming speed | 0.0548 | 0.0440 |
| Phase 1st/2nd | $-0.19$ | $0.16$ |
| Phase 2nd/3rd | $-0.33$ | $0.32$ |
| Range of back | $7\pi/20$ | $5\pi/20$ |
| Range of mid | $10\pi/20$ | $8\pi/20$ |
| Range of front | $6\pi/20$ | $5\pi/20$ |

Fig. 6. Images of the paddle configuration at different time points for two different strokes for the four-paddle swimmer. a The paddler swims with the back-to-front stroke with paddles spaced 1 unit apart. b The paddler swims with the pair-wise front-to-back stroke with paddles spaced 3.25 units apart

Stroke performance

We now evaluate the performance of the machine-learned strokes across the full range of paddle spacings, from 0.5 up to 5, or as far apart as the paddler’s body allows. Our performance metrics are swimming speed and swimming efficiency, which we compare across paddle spacings and stroke types (see Fig. 8). For each paddle spacing, we run the Q-learning algorithm five times with learning parameters that allow for slight stroke variations, and we plot the swimming speeds and efficiencies of the resulting strokes.

The swimming speed of a stroke is given by the net displacement of the paddler, or the total reward collected by the paddler agent, normalized by the stroke length,

$$U = \frac{1}{N} \sum_{n=1}^{N} r(s_{n}, a_{n}).$$
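Because each move lasts one unit of time, $U$ is the mean per-move reward over one cycle. The sketch below, assuming a deterministic environment and a fixed (for example, greedy) policy, rolls the policy forward until a configuration repeats and averages the rewards over the resulting cycle, whose length is the stroke length $N$. `policy` and `step` are hypothetical callables, and states must be hashable (for instance, tuples of paddle angle indices).

```python
def extract_stroke(policy, step, s0, max_moves=10_000):
    """Follow a deterministic policy from s0 until a state repeats; return the periodic cycle.

    policy(s) -> a and step(s, a) -> (s_next, reward) are hypothetical callables.
    Returns (cycle_states, cycle_rewards); any transient before the cycle is dropped.
    """
    seen = {}                 # state -> move index at which it was first visited
    states, rewards = [], []
    s = s0
    for k in range(max_moves):
        if s in seen:         # configuration repeats: moves since then form one stroke
            start = seen[s]
            return states[start:], rewards[start:]
        seen[s] = k
        states.append(s)
        s, r = step(s, policy(s))
        rewards.append(r)
    raise RuntimeError("no repeated configuration within max_moves")


def stroke_speed(cycle_rewards):
    """U = (1/N) * (sum of rewards over one cycle of N unit-time moves)."""
    return sum(cycle_rewards) / len(cycle_rewards)
```

Discarding the transient matters: the learned stroke is the repeating cycle, and averaging over moves that lead into it would bias $U$.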

We quantify the swimming efficiency, $\eta$, as the ratio of the power required to tow the swimmer at the steady swimming speed to the average power expended by the paddler over a period:

$$\eta = \frac{\zeta U^{2}}{P}, \qquad P = \frac{1}{N}\int_{0}^{N} \boldsymbol{F}(t) \cdot \boldsymbol{U}(t)\, \mathrm{d}t,$$

where $\zeta$ is the drag coefficient of the paddler, and $\boldsymbol{F}(t)$ and $\boldsymbol{U}(t)$ are the hydrodynamic forces and velocities computed for each move in the sequence. This efficiency metric was originally introduced by Lighthill [23] and is commonly used for zero Reynolds number swimming. More information on the calculation of the drag coefficients is given in Appendix B.

Fig. 7. For the two strokes pictured in Fig. 6, the top row shows the mean paddle positions (solid line) and the range of paddle motion (shaded region). The bottom row shows the angular displacement from the mean for all paddles over the entire stroke. a The left column depicts the back-to-front stroke at paddle spacing 1. The top figure shows a large range of motion for each paddle set, with the means tilted slightly away from center, and the bottom figure shows the back paddles leading and the phase lag between sets of paddles. b The right column depicts the stroke with the paddles pairing off at paddle spacing 3.25. The top figure shows the inward tilt the paddle pairs maintain throughout the stroke and the small range of motion used. The bottom figure shows the $22\%$ phase lag between the pairs of tilted paddles, with the front paddles leading.

Table 3. Table comparing the two strokes that emerge from a 4-paddle paddler at spacings 1 and 3.25. We again compare phase lag across pairs of paddles ordered by their power stroke sequence

| Stroke metric | Back-to-front | Front-to-back |
|---|---|---|
| Paddle spacing | 1 | 3.25 |
| Stroke length | 22 | 9 |
| Swimming speed | 0.0680 | 0.0818 |
| Phase 1st/2nd | $-0.14$ | $0.22$ |
| Phase 2nd/3rd | $-0.23$ | $-0.22$ |
| Phase 3rd/4th | $-0.27$ | $0.22$ |
| Range of 1st | $6\pi/20$ | $4\pi/20$ |
| Range of 2nd | $9\pi/20$ | $4\pi/20$ |
| Range of 3rd | $9\pi/20$ | $4\pi/20$ |
| Range of 4th | $6\pi/20$ | $4\pi/20$ |
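The efficiency $\eta$ defined earlier in this section can be evaluated with the same 3-point rule used for the reward. This sketch assumes, hypothetically, that forces and velocities are stored at the three Gauss nodes of each unit-time move in an array of shape `(N, 3, n_points, 2)`, that $\boldsymbol{F}\cdot\boldsymbol{U}$ means a sum over all points and both components, and that the sign convention makes $P$ positive. The drag coefficient $\zeta$ is supplied externally, since its calculation is described in Appendix B.

```python
import numpy as np

# 3-point Gauss-Legendre rule on the unit move interval [0, 1]
GL_W = np.array([5.0 / 18.0, 4.0 / 9.0, 5.0 / 18.0])


def lighthill_efficiency(U, zeta, F_samples, U_samples):
    """eta = zeta * U**2 / P, with P the cycle-averaged power.

    U          : mean swimming speed over the stroke
    zeta       : drag coefficient of the paddler (assumed known)
    F_samples  : (N, 3, n_points, 2) forces at the Gauss nodes of each move
    U_samples  : (N, 3, n_points, 2) velocities at the same nodes
    """
    N = F_samples.shape[0]
    # instantaneous power at each node: sum over points and components of F . U
    power = np.einsum("mqpd,mqpd->mq", F_samples, U_samples)
    # integrate each unit-time move with the quadrature weights, sum moves, divide by N
    P = (power @ GL_W).sum() / N
    return zeta * U**2 / P
```

Because $P$ sits in the denominator, doubling all point velocities at fixed forces halves $\eta$, which is a simple way to check the array bookkeeping.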

Comparing across the three cases, we see a general trend of the maximum swimming speed increasing as paddles are added. However, the strokes that achieve the highest speeds vary with the number of paddles. With only two paddles, the front-to-back, tilted-in stroke at paddle spacing 3.75 gives the fastest swimming, beating the fastest back-to-front stroke by a factor of $\sim 1.59$. The speed of the back-to-front stroke increases as the spacing between the paddles decreases, whereas the speed of the front-to-back stroke decreases as the spacing grows beyond 3.75.

Around paddle spacing 2, we see a transition in stroke type. We investigate both stroke types in detail to determine what drives this switch, with details provided in Appendix C. The back-to-front strokes selected at tight spacings are nearly identical, with only slight variation in phase lag and mean paddle angle. As the paddles are spaced farther apart, this stroke becomes less hydrodynamically effective and yields swimming speeds lower than those of the front-to-back stroke. In the front-to-back stroke, the paddles tilt toward each other and operate near a point of collision; as the space between the paddles widens, their mean positions tilt further inward, and for spacings that are too narrow, this front-to-back stroke is not viable.

Fig. 8. The top row of plots shows swimming speed vs. paddle spacing, and the bottom row shows stroke efficiency vs. paddle spacing. Blue circles indicate a front-to-back stroke, magenta squares indicate a back-to-front stroke, and green triangles indicate one of the other 3-paddle strokes

In the three-paddle case, the fastest stroke is the back-to-front, tilted-out stroke at the tightest paddle spacing, 0.5. Although the front-to-back stroke is dominant for paddle spacings greater than 2, it is never faster than the back-to-front stroke with three paddles. Finally, with four paddles, the front-to-back, tilted-in stroke is again the fastest stroke, at paddle spacing 3.25. However, it is only 1.02 times faster than the back-to-front stroke at spacing 0.5. The maximum overall swimming speed is achieved by the 4-paddle swimmer performing the front-to-back stroke at paddle spacing 3.25.

Turning to stroke efficiency, we see a common trend across the three cases: the back-to-front stroke is generally more efficient than the front-to-back stroke. This is especially true in the two- and four-paddle cases, where the maximum front-to-back efficiency is comparable to the minimum back-to-front efficiency. In the three-paddle case, there is a regime in which both strokes have similar efficiencies. The maximum overall efficiency is achieved by the 3-paddle swimmer performing the back-to-front stroke at paddle spacing 0.75.

Recall that the paddler agent optimizes swimming speed, so the swimming speeds vary little across the five runs. Stroke efficiency, on the other hand, is not explicitly considered by the paddler agent, so the efficiency values vary more.

Fig. 9. Histograms showing the frequency, out of 100 trials, of the stroke type found by the 2-paddle swimmer at paddle spacing 2 for various combinations of learning parameters. The numbers above each bar are the mean swimming speed for each stroke

Reinforcement learning parameters

We explore how the learning parameters affect the robustness of the optimal gait produced by the Q-learning algorithm. We first examine the effects of the discount factor and the training loop length on our 2-paddle swimmer with paddle spacing 2, the spacing at which we observe the switch from a back-to-front to a front-to-back optimal stroke. We then examine the effects of the exploration rate, $\varepsilon$, and the learning rate, $\alpha$, on our 3-paddle swimmer with paddle spacing 2. The 3-paddle swimmer exhibited the most variation in resulting performance and required careful selection of learning parameters.
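For reference, these parameters enter the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$, with $\varepsilon$-greedy action selection. The sketch below is a generic textbook version, not necessarily the authors' exact implementation: `step(s, a)` is a hypothetical deterministic environment returning the next state index and the reward, and the outer and inner loops mirror the "episodes" and "learning steps" of the training loop.

```python
import numpy as np


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One tabular Q-learning update, applied in place."""
    target = r + gamma * np.max(Q[s_next])   # bootstrapped estimate of future reward
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def epsilon_greedy(Q, s, epsilon, rng):
    """Random action with probability epsilon, otherwise the current greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))


def train(step, n_states, n_actions, n_episodes, n_steps,
          alpha=0.5, gamma=0.99, epsilon=0.1, seed=0):
    """Train a Q-table on a hypothetical environment step(s, a) -> (s_next, reward)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        s = int(rng.integers(n_states))      # random initial configuration each episode
        for _ in range(n_steps):
            a = epsilon_greedy(Q, s, epsilon, rng)
            s_next, r = step(s, a)
            q_update(Q, s, a, r, s_next, alpha, gamma)
            s = s_next
    return Q
```

In this form, $\gamma$ sets how strongly future rewards are weighted, $\alpha$ sets how far each update moves toward its target, and $\varepsilon$ sets how often the agent tries non-greedy moves.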

Training loop parameters

First, we examine the effect of varying the length of the training loop. We run the learning algorithm 100 times with $\gamma = 0.99$ and classify the resulting stroke type for each of the following combinations of training loop parameters: 10 episodes with 5,000 learning steps, 10 episodes with 10,000 learning steps, and 50 episodes with 50,000 learning steps.

The first row of Fig. 9 shows how the training loop parameters affect the type of stroke that emerges. With the longest training loop, the paddler converges to a front-to-back stroke every time, but with fewer episodes and learning steps the paddler can learn either a front-to-back or a back-to-front stroke, depending on the randomness of the algorithm (see Fig. 10 for some example strokes).

Fig. 10. Different swimming gaits produced at paddle spacing 2 for different training parameters. The top row of strokes are front-to-back and are slightly faster than the bottom row of back-to-front strokes

With $\gamma$ reduced to 0.98 (second row of Fig. 9), the longest training loop results in the paddler converging to a suboptimal back-to-front stroke every time. For shorter training loops we again obtain both stroke types, but for some initializations the paddler does not learn an effective propulsion strategy. We classify a propulsion strategy as ineffective if it results in a swimming speed of less than 0.01. For the shortest training loop with $\gamma = 0.98$, a majority of the runs resulted in ineffective swimming. We repeat this process once more for $\gamma = 0.97$ and observe that, for all training loops, the paddler learns only propulsion strategies that we deem ineffective (third row of Fig. 9).
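A common rule of thumb for reading these discount factors (an interpretation, not an analysis from this study) is that a reward $k$ steps ahead is weighted by $\gamma^k$, giving an effective planning horizon of roughly $1/(1-\gamma)$ learning steps. The sketch below tabulates both quantities for the values of $\gamma$ tested; how many learning steps one paddle stroke spans is not stated in this section, so the link between horizon and stroke length is left open.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb planning horizon 1 / (1 - gamma), in learning steps."""
    return 1.0 / (1.0 - gamma)


def weight_at(gamma: float, k: int) -> float:
    """Discount weight gamma**k applied to a reward k steps in the future."""
    return gamma ** k


for gamma in (0.97, 0.98, 0.99, 0.999):
    print(f"gamma = {gamma}: horizon ~ {effective_horizon(gamma):.0f} steps, "
          f"weight of a reward 100 steps ahead = {weight_at(gamma, 100):.3f}")
```

For $\gamma = 0.97$, 0.98, 0.99, and 0.999 the horizons are about 33, 50, 100, and 1000 steps, so small changes in $\gamma$ near 1 change how far ahead the learner effectively looks by large factors.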

The transition between the two stroke types occurs around paddle spacing 2 for all numbers of limbs. At this spacing both stroke types emerged, depending on the choice of learning parameters. Other paddle spacings exhibited small variations in the resulting strokes but did not produce both front-to-back and back-to-front strokes. For example, at paddle spacing 1 there is slight variation in how long the paddles remain stationary within a back-to-front stroke, but the paddler does not learn a front-to-back stroke for any parameter combination we tested. Similar trends in stroke variation are observed for the 3- and 4-paddle cases, but because of the larger state space it becomes computationally infeasible to perform large numbers of trials. Based on this parameter study, we set the discount factor close to 1 to ensure that our swimmer is sufficiently farsighted, namely $\gamma = 0.99$ for the two-paddle swimmer and $\gamma = 0.999$ for the three- and four-paddle swimmers. These values of $\gamma$ are consistent with discount factors used in previous studies of reinforcement learning applied to swimming problems [17, 18, 24].

Fig. 11. The top figure shows plots of swimming speed vs. time for different epsilon-greedy policies, with error bars depicting a 95% confidence interval. The bottom figure shows plots of swimming speed vs. time for different learning rates, with error bars depicting a 95% confidence interval.

Exploration and learning rate

Next, we experiment with the epsilon-greedy policy and the learning rate for our 3-paddle swimmer at paddle spacing 2, the optimization landscape that we found most challenging in our simulations. Both the exploration rate and the learning rate decay geometrically in time. It is known that the learning rate $\alpha$ should decay in time for convergence [19]; similarly, decaying the exploration rate $\varepsilon$ provides better results in terms of stroke optimality [21].

First, we focus on the epsilon-greedy policy and fix the learning rate strategy. The learning rate is initialized at $\alpha = 1$ and decays geometrically with rate 0.99 per episode (i.e., it is reduced by 1% per episode). We test the effectiveness of geometrically decaying epsilon-greedy policies against policies with fixed exploration rates. The policies we test include two geometric decay strategies, with rates 0.75 and 0.99, and three fixed epsilon strategies with $\varepsilon = 0.1$, 0.5, and 1. We run a learning simulation with 300 episodes of 300,000 learning steps each and discount factor $\gamma = 0.999$. This simulation is run ten times for each epsilon-greedy policy, and the swimming speeds from the resulting strokes are displayed and compared in Fig. 11.
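To make the two decay rates concrete: under geometric decay applied once per episode, a quantity starting at $x_0$ equals $x_0 r^k$ after $k$ episodes. The sketch assumes a starting value of 1, as stated for $\alpha$; the initial $\varepsilon$ for the decaying policies is not given in this section, so $\varepsilon_0 = 1$ is an assumption. It evaluates each schedule at the end of a 300-episode run and counts the episodes needed to fall below an illustrative threshold of 0.01.

```python
import math


def geometric_schedule(x0: float, rate: float, episode: int) -> float:
    """Value after `episode` multiplicative decays by `rate`."""
    return x0 * rate ** episode


def episodes_to_reach(x0: float, rate: float, threshold: float) -> int:
    """Smallest episode index k with x0 * rate**k <= threshold (0 < rate < 1)."""
    return math.ceil(math.log(threshold / x0) / math.log(rate))


for rate in (0.75, 0.99):
    print(f"rate {rate}: value at episode 300 = {geometric_schedule(1.0, rate, 300):.3g}, "
          f"below 0.01 from episode {episodes_to_reach(1.0, rate, 0.01)}")
```

With rate 0.75 the schedule drops below 0.01 after 17 episodes and is of order $10^{-38}$ by episode 300, whereas with rate 0.99 it is still about 0.049 at episode 300. The two "geometric" strategies therefore spend very different fractions of the run at non-negligible values.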

We find that policies with decaying exploration rates result in higher swimming speeds that improve over episodes, and the slower decay (rate 0.99) yields the best performance. With fixed exploration rates, the swimming speed does not increase with the number of episodes, and the resulting speed is below those obtained with decaying exploration rates.

We next explore strategies for updating the learning rate, $\alpha$, over the course of a simulation. We fix the epsilon-greedy policy to geometric decay with rate 0.99, which performed best in the previous experiment, and compare different strategies for the learning rate. The results for different learning rates are similar to those for different exploration rates: the slowest decay rate, 0.99, results in the best overall performance, though only marginally better than a rate of 0.75. Both decay strategies result in swimming speeds greater than 0.04, while fixed learning rates show no improvement in swimming speed over time.
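The qualitative difference between fixed and decaying learning rates can be illustrated on a toy problem unrelated to the swimmer: estimating the mean of a noisy target with the incremental update $x \leftarrow x + \alpha\,(\text{target} - x)$, which has the same form as the Q-learning update. The target mean, noise level, number of updates, and trial count below are hypothetical; the decaying schedule uses rate 0.99 per update starting from $\alpha = 1$.

```python
import numpy as np


def run(alpha_schedule, n_updates=300, mean=1.0, noise=0.5, n_trials=2000, seed=0):
    """Final estimates of `mean` from the incremental update, one per trial."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    for k in range(n_updates):
        targets = mean + noise * rng.standard_normal(n_trials)  # noisy samples
        x += alpha_schedule(k) * (targets - x)
    return x


fixed = run(lambda k: 1.0)           # fixed alpha = 1: estimate is the last sample
fixed_half = run(lambda k: 0.5)      # fixed alpha = 0.5: still noise-dominated
decaying = run(lambda k: 0.99 ** k)  # geometric decay from alpha = 1

for name, est in [("fixed 1.0", fixed), ("fixed 0.5", fixed_half),
                  ("decay 0.99", decaying)]:
    print(f"{name}: mean {est.mean():.3f}, spread (std) {est.std():.3f}")
```

A fixed $\alpha$ keeps weighting recent noisy samples heavily, so the spread of the final estimate never shrinks, while the decaying schedule averages the noise down. Note that a geometric schedule has a finite sum $\sum_k \alpha_k$, so it does not satisfy the classical Robbins–Monro step-size conditions and eventually freezes the estimate; whether that matters over a finite run is problem-dependent.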

Discussion

Through a reinforcement learning approach, we identified propulsion strategies for a paddler with 2, 3, and 4 sets of paddles swimming at low Reynolds number. For microswimmers with arrays of appendages, effective propulsion mechanisms often take the form of metachronal paddling, with the limbs oscillating in a time-delayed sequence to propel the swimmer forward [3]. With the paddles tightly spaced along the body, our 2-, 3-, and 4-paddle swimmers all self-learned a stroke that resembles an antiplectic wave-like stroke. Of the strokes identified by machine learning, this back-to-front stroke was the most efficient for the 2-, 3-, and 4-paddle swimmers across all spacings; however, it was not always the fastest.
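A back-to-front metachronal wave means that the rearmost paddle leads in phase and each paddle ahead of it lags by a constant phase. The sketch below uses an illustrative kinematic model, not the learned strokes: paddle $i$ ($i = 0$ at the front) follows the hypothetical $\theta_i(t) = A \sin(\omega t + i\varphi)$, so for $\varphi > 0$ the rear paddle peaks first. It then recovers the wave direction from each paddle's fundamental Fourier phase. Amplitude, frequency, sampling, and lag values are assumptions.

```python
import numpy as np


def paddle_angles(n_paddles, phase_lag, amplitude=0.5, omega=2 * np.pi, n_t=400):
    """Hypothetical metachronal kinematics over one period.

    Paddle i (0 = front) has theta_i(t) = amplitude * sin(omega * t + i * phase_lag).
    """
    t = np.linspace(0.0, 2 * np.pi / omega, n_t, endpoint=False)
    i = np.arange(n_paddles)[:, None]
    return t, amplitude * np.sin(omega * t[None, :] + i * phase_lag)


def wave_direction(t, theta, omega=2 * np.pi):
    """Classify the wave from the phase differences between adjacent paddles."""
    s = (theta * np.sin(omega * t)).sum(axis=1)
    c = (theta * np.cos(omega * t)).sum(axis=1)
    psi = np.arctan2(c, s)                      # fundamental phase of each paddle
    lags = np.angle(np.exp(1j * np.diff(psi)))  # wrap differences to (-pi, pi]
    mean_lag = lags.mean()
    if abs(mean_lag) < 1e-6:
        return "synchronous"
    # Positive lag: paddles further back are phase-advanced, so the wave
    # starts at the rear and travels toward the front.
    return "back-to-front" if mean_lag > 0 else "front-to-back"


t, theta = paddle_angles(3, phase_lag=np.pi / 4)
print(wave_direction(t, theta))
```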

In the 2- and 4-paddle cases, our paddler self-learned a front-to-back stroke in which pairs of paddles tilt in toward each other. Because of this inward tilt, the stroke is only practical with sufficient space between paddles; consequently, the reinforcement learning algorithm converged to this stroke only for paddle spacings of 2 or larger. Although this stroke is less efficient, it is often faster, particularly in the two-paddle case.

A natural question arising from these results is whether the same antiplectic wave-like stroke would emerge if we instead optimized for swimming efficiency. However, optimizing swimming efficiency is not as simple as replacing the reward metric with the efficiency of each state-action pair [25]. Two approaches to optimizing the swimming efficiency of a stroke are modifying the action space to include longer sequences of actions [25] and modifying the reward function to positively weight swimming speed while negatively weighting work [26]. These approaches could be adapted to this problem but introduce additional parameters and complexity beyond the scope of the current study.
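The second approach, rewarding speed while penalizing work, can be written as a reward of the form $r = U - \lambda W$. The sketch below is a hypothetical form only: the weight $\lambda$, and how speed $U$ and work $W$ would be computed from the swimmer's Stokes-flow solution, are assumptions not specified here, and the two (speed, work) pairs are made up. It shows how the extra parameter $\lambda$ decides which stroke the reward prefers.

```python
def speed_work_reward(speed: float, work: float, lam: float) -> float:
    """Hypothetical reward: favor speed, penalize mechanical work with weight lam."""
    return speed - lam * work


def preferred(strokes: dict, lam: float) -> str:
    """Name of the candidate stroke with the highest weighted reward."""
    return max(strokes, key=lambda name: speed_work_reward(*strokes[name], lam))


# Made-up (speed, work) pairs for two candidate strokes; not data from the study.
strokes = {"faster": (0.05, 0.20), "more efficient": (0.04, 0.10)}

for lam in (0.0, 0.2):
    print(f"lambda = {lam}: preferred stroke is '{preferred(strokes, lam)}'")
```

With these made-up numbers the preference flips at $\lambda = 0.1$, which illustrates why such a reward introduces a parameter that would itself need tuning.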

Any reciprocal motion results in no net displacement at low Reynolds number, so drag-based propulsion strategies rely on asymmetric motion in the form of time or space asymmetries. Our paddling study concentrates on propulsion mechanisms that are strictly driven by the beating rhythm of the paddles. We designed our swimmer with rigid paddles so that it cannot create asymmetries by changing the shape of its limbs mid-stroke. By allowing only one degree of freedom per paddle in the timing of its movements, we isolate the rhythm-based aspect of low Reynolds number metachronal paddling.
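A standard way to see both statements is a toy "geometric" model in which, as in Stokes flow, the swimming velocity is linear in the rates of shape change, $U = a_1(\theta)\dot\theta_1 + a_2(\theta)\dot\theta_2$, so the net displacement over a cycle depends only on the path traced in shape space, not on how fast it is traced. The coefficients below ($a_1 = \theta_2$, $a_2 = 0$) are hypothetical and do not model the paddler's hydrodynamics. The sketch integrates $U$ over one period for an in-phase (reciprocal) stroke, a reciprocal stroke with uneven forward and backward timing, and a stroke in which the second paddle lags the first by a quarter period.

```python
import math


def net_displacement(stroke, n=2000):
    """Toy net displacement over one period t in [0, 2*pi).

    `stroke(t)` returns (theta1, theta2, dtheta1/dt, dtheta2/dt). The velocity
    U = a1 * dtheta1/dt + a2 * dtheta2/dt uses hypothetical coefficients
    a1 = theta2, a2 = 0. The rectangle rule used here is very accurate for
    smooth periodic integrands.
    """
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        th1, th2, d1, d2 = stroke(k * dt)
        a1, a2 = th2, 0.0
        total += (a1 * d1 + a2 * d2) * dt
    return total


def in_phase(t):
    # Reciprocal: both paddles move together and retrace the same path.
    return math.cos(t), math.cos(t), -math.sin(t), -math.sin(t)


def in_phase_uneven(t):
    # Still reciprocal, but with uneven forward/backward timing via s(t).
    s = t + 0.5 * math.sin(t)
    ds = 1 + 0.5 * math.cos(t)
    return math.cos(s), math.cos(s), -math.sin(s) * ds, -math.sin(s) * ds


def lagged(t):
    # Paddle 2 lags paddle 1 by a quarter period: the shape path encloses area.
    return math.cos(t), math.sin(t), -math.sin(t), math.cos(t)


for name, stroke in [("in phase", in_phase), ("uneven timing", in_phase_uneven),
                     ("quarter-period lag", lagged)]:
    print(f"{name}: net displacement {net_displacement(stroke):+.6f}")
```

The first two strokes give zero net displacement (up to rounding), because they retrace the same path; the lagged stroke traces a closed loop and gives $-\pi$ with these coefficients, the sign set by the direction of traversal. Uneven timing of a reciprocal stroke does not help, while a phase lag between paddles does.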

Our reinforcement learning results exhibit the types of asymmetries that can drive forward motion in ciliated microswimmers. The antiplectic wave-like strokes demonstrate variation in the phase lag between paddle sets in both the power and return stroke timing. The strokes with pairs of tilted paddles illustrate another effective type of asymmetry in the mean angle, which has not been observed before, to our knowledge.

The antiplectic metachronal stroke commonly observed in nature emerged as the most efficient stroke overall and as the fastest stroke for tightly spaced limbs. It is known that in antiplectic metachrony, the swimming efficiency increases with the number of limbs [27]. Furthermore, microswimmers in nature deform their cilia to maximize fluid interaction during the power stroke and minimize drag during the return stroke [28]. Investigating these more complex motions stemming from paddles with bending capabilities, or simply more sets of paddles, will require more sophisticated reinforcement learning methods to handle the rapidly growing computational cost.

With our current discretization, the size of the Q-table scales as $33^{n}$, where $n$ is the number of limbs. Consequently, with just six sets of limbs, the Q-table has $1.3 \times 10^{9}$ entries. Similarly, if we add hinges to our paddles to capture bending mechanics, we are limited in the same way by the size of the Q-table. A natural extension of the learning algorithm to large state spaces is deep Q-learning, which replaces the Q-table with a neural network [29]. Deep Q-learning has demonstrated success with complex swimming tasks, such as learning optimal gaits for navigation with a jellyfish-like swimmer [30] and learning efficient collective swimming strategies for undulatory swimmers [31], but it is limited to discrete action spaces. Actor-critic methods are more sophisticated reinforcement learning algorithms that allow for continuous action spaces and more precise movements of a learning agent. Actor-critic methods are frequently used in control problems for biological systems [32], and they have been applied to problems of low Reynolds number locomotion involving navigation [24, 26, 33].
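The growth of the Q-table is easy to check. The sketch simply evaluates the quoted $33^{n}$ scaling for the limb counts discussed and, under the added assumption that each entry is stored as one 8-byte double-precision float in a dense table, the memory that table would need. The breakdown of the factor 33 into states and actions is not restated in this section, so the code does not assume one.

```python
def q_table_entries(n_limbs: int, base: int = 33) -> int:
    """Entries in a dense Q-table that scales as base**n_limbs (base 33 as quoted)."""
    return base ** n_limbs


BYTES_PER_ENTRY = 8  # assumption: one float64 per entry, dense storage

for n in range(2, 7):
    entries = q_table_entries(n)
    gib = entries * BYTES_PER_ENTRY / 2**30
    print(f"n = {n}: {entries:,} entries, about {gib:.3g} GiB")
```

For $n = 6$ this gives 1,291,467,969 entries, matching the quoted $1.3 \times 10^{9}$, or roughly 9.6 GiB of doubles; the table already needs about 1.2 million entries at $n = 4$.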

Supplementary Information

Electronic supplementary material: Supplementary file 1 (mp4, 471 KB); Supplementary file 2 (mp4, 899 KB); Supplementary file 3 (mp4, 331 KB); Supplementary file 4 (mp4, 612 KB); Supplementary file 5 (mp4, 807 KB).

Bibliography


[1] J. Elgeti, R.G. Winkler, G. Gompper, Physics of microswimmers—single particle motion and collective behavior: a review. Rep. Prog. Phys. 78(5), 056601 (2015). doi:10.1088/0034-4885/78/5/056601, PMID: 25919479
[2] P. Garnier, J. Viquerat, J. Rabault, A. Larcher, A. Kuhnle, E. Hachem, A review on deep reinforcement learning for fluid mechanics. Comput. Fluids 225, 104973 (2021)
[3] I. Jebellat, E. Jebellat, A. Amiri-Margavi, A. Vahidi-Moghaddam, H.N. Pishkenari, A reinforcement learning approach to find optimal propulsion strategy for microrobots swimming at low Reynolds number. Robot. Auton. Syst. 175, 104659 (2024)
[4] J. Zhang, L. Zhou, B. Cao, Learning swimming via deep reinforcement learning. arXiv preprint arXiv:2209.10935 (2022)
[5] Y. Lai, S. Heydari, O.S. Pak, Y. Man, Navigation of a three-link microswimmer via deep reinforcement learning. arXiv preprint arXiv:2506.00084 (2025)
[6] Y. Li, Deep reinforcement learning: an overview. arXiv preprint arXiv:1701.07274 (2017)
[7] Y. Chen, Y. Yang, Deep reinforcement learning for tracking a moving target in jellyfish-like swimming. arXiv preprint arXiv:2409.08815 (2024)
[8] Y. Jiao, F. Ling, S. Heydari, N. Heess, J. Merel, E. Kanso, Deep dive into model-free reinforcement learning for biological and robotic systems: theory and practice. arXiv preprint arXiv:2405.11457 (2024)