Model reconstruction from temporal data for coupled oscillator networks

Mark J Panaggio; Maria-Veronica Ciocanel; Lauren Lazarus; Chad M; Topaz; Bin Xu

arXiv:1905.01408·nlin.AO·July 25, 2022

Model reconstruction from temporal data for coupled oscillator networks

Mark J Panaggio, Maria-Veronica Ciocanel, Lauren Lazarus, Chad M, Topaz, Bin Xu

PDF

TL;DR

This paper explores how to reconstruct the underlying interaction network of coupled oscillators from observed transient dynamics using machine learning, enabling simultaneous identification of network structure and oscillator parameters.

Contribution

It introduces a method to infer network topology and oscillator parameters from transient data, advancing understanding of complex coupled systems.

Findings

01

Network topology can be reconstructed from transient dynamics.

02

Machine learning effectively identifies oscillator parameters.

03

Method works for arbitrary coupled oscillator networks.

Abstract

In a complex system, the interactions between individual agents often lead to emergent collective behavior like spontaneous synchronization, swarming, and pattern formation. The topology of the network of interactions can have a dramatic influence over those dynamics. In many studies, researchers start with a specific model for both the intrinsic dynamics of each agent and the interaction network, and attempt to learn about the dynamics that can be observed in the model. Here we consider the inverse problem: given the dynamics of a system, can one learn about the underlying network? We investigate arbitrary networks of coupled phase-oscillators whose dynamics are characterized by synchronization. We demonstrate that, given sufficient observational data on the transient evolution of each oscillator, one can use machine learning methods to reconstruct the interaction network and…

Tables17

Table 1. Table 1: Summary of network model parameters, with the default values and value ranges considered in the parameter sweep experiments of section 3 .

Parameter	Description	Value	Sweep values
$N$	number of oscillators	10	${5, 10, 20, 40}$
$μ$	average intrinsic frequency	1.0	N/A
$σ$	standard deviation of intrinsic frequencies	0.5	${0.01, 0.1, 1.0}$
$p$	network connection probability	0.5	${0.1, 0.2, \dots, 0.9}$
$Γ (x)$	coupling function	$\sin (x)$	See A
dyn_noise	noise in system dynamics	0	${0, 10^{- 5}, 10^{- 4}, \dots, 10^{0}}$

Table 2. Table 2: Summary of algorithm parameters, with the default values and value ranges considered in the parameter sweep experiments of section 3 .

Parameter	Description	Value	Sweep values
$t_{m a x}$	duration of each transient	20	${2, 5, 10, 20, 50}$
$Δ t$	sampling time step	0.1	N/A
noise	observation noise	0	${0, 10^{- 5}, 10^{- 4}, \dots, 10^{0}}$
$N_{r e s}$	number of transients observed	10	${1, 2, 5, 10, 20, 40}$

Table 3. Table 3: Default values and summary of method parameters for the results in section 4 .

Parameter	Description	Value
n_epochs	iterations through the training data	300
batch size	time-steps of data per gradient descent batch	100
$m$	number of inferred Fourier coefficients	5
$λ_{Γ}$	weight of penalty on non-sparse coupling	0.0001
$λ_{A}$	weight of penalty on network connectivity	$10^{- 6}$
$λ_{b d}$	weight of penalty on $A_{k j} \notin [0, 1]$	$10^{5}$

Table 4. Table 4: Coupling functions

Name	Function	Reference
Kuramoto	$\sin x$	[10]
Kuramoto-Sakaguchi	$\sin (x - 0.1)$	[31]
Hodgkin-Huxley	$\begin{matrix} 0.383 & + 1.379 \sin (x + 3.93) \\ + 0.568 \sin (2 x + 0.11) + 0.154 \sin (3 x + 2.387) \end{matrix}$	[32]
Square Wave	$\frac{\sin (x - π / 4)}{\| \sin (x - π / 4) \|}$	N/A

Table 5. Table 5: Sweep through the coupling function Γ Γ \Gamma

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

Γ

Kuramoto

0.0175

\pm

0.0075

0.004

\pm

0.001

0.0

\pm

0.0

1.0

\pm

0.0

0.4338

Γ

Kuramoto-Sakaguchi

0.0169

\pm

0.0054

0.005

\pm

0.002

0.0

\pm

0.0

1.0

\pm

0.0

0.3924

Γ

Hodgkin-Huxley

0.0343

\pm

0.0159

0.011

\pm

0.004

0.0

\pm

0.0

1.0

\pm

0.0

0.7054

Γ

Square wave

0.1518

\pm

0.0044

0.006

\pm

0.002

0.0

\pm

0.0

1.0

\pm

0.0

0.4059

Table 6. Table 6: Sweep of the number of oscillators N 𝑁 N

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N

5

0.026

\pm

0.0125

0.0052

\pm

0.003

0.0

\pm

0.0

1.0

\pm

0.0

0.1193

N

10

0.0164

\pm

0.007

0.0047

\pm

0.0031

0.0

\pm

0.0

1.0

\pm

0.0

0.4057

N

20

0.015

\pm

0.0057

0.0046

\pm

0.0016

0.140

\pm

0.601

0.9997

\pm

0.0016

0.3041

N

40

0.0189

\pm

0.0104

0.0045

\pm

0.0011

1.231

\pm

1.018

0.9946

\pm

0.0053

0.223

Table 7. Table 7: Sweep of the standard deviation of the oscillator frequencies σ 𝜎 \sigma

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

σ

0.01

0.0198

\pm

0.0052

0.0022

\pm

0.0008

0.0

\pm

0.0

1.0

\pm

0.0

0.8142

σ

0.1

0.0175

\pm

0.0057

0.004

\pm

0.0014

0.0

\pm

0.0

1.0

\pm

0.0

0.8095

σ

1

0.0161

\pm

0.0049

0.0042

\pm

0.0021

0.0

\pm

0.0

1.0

\pm

0.0

0.7975

Table 8. Table 8: Sweep of the level of observation noise

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

noise

0

0.0193

\pm

0.0098

0.0048

\pm

0.002

0.0

\pm

0.0

1.0

\pm

0.0

0.4204

noise

1e-05

0.0202

\pm

0.0108

0.0053

\pm

0.0024

0.0

\pm

0.0

1.0

\pm

0.0

0.3845

noise

0.0001

0.018

\pm

0.0076

0.0048

\pm

0.0023

0.0

\pm

0.0

1.0

\pm

0.0

0.4365

noise

0.001

0.0186

\pm

0.0081

0.0053

\pm

0.002

0.0

\pm

0.0

1.0

\pm

0.0

0.3956

noise

0.01

0.0186

\pm

0.0079

0.0052

\pm

0.0025

0.0

\pm

0.0

1.0

\pm

0.0

0.4135

noise

0.1

0.0385

\pm

0.0138

0.0086

\pm

0.0021

0.0

\pm

0.0

1.0

\pm

0.0

0.3544

noise

1

0.831

\pm

0.1343

0.118

\pm

0.0629

46.296

\pm

8.579

0.5122

\pm

0.0741

0.0472

Table 9. Table 9: Sweep of the level of noise in system dynamics

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

dyn_noise

0

0.0211

\pm

0.0067

0.0055

\pm

0.0018

0.0

\pm

0.0

1.0

\pm

0.0

0.3814

dyn_noise

1e-05

0.0117

\pm

0.0069

0.0032

\pm

0.0023

0.0

\pm

0.0

1.0

\pm

0.0

0.4048

dyn_noise

0.0001

0.0146

\pm

0.0049

0.0032

\pm

0.0013

0.0

\pm

0.0

1.0

\pm

0.0

0.411

dyn_noise

0.001

0.0241

\pm

0.0056

0.0049

\pm

0.0017

0.0

\pm

0.0

1.0

\pm

0.0

0.4517

dyn_noise

0.01

0.0633

\pm

0.0189

0.0128

\pm

0.0034

0.370

\pm

1.025

0.9982

\pm

0.0072

0.3282

dyn_noise

0.1

0.2136

\pm

0.065

0.0312

\pm

0.0055

14.148

\pm

5.003

0.8903

\pm

0.0496

0.2224

dyn_noise

1

0.9716

\pm

0.0399

0.1552

\pm

0.0971

42.444

\pm

8.953

0.5177

\pm

0.1004

0.0406

Table 10. Table 10: Sweep of the network connectivity parameter p 𝑝 p

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

p

0.1

0.1199

\pm

0.2314

0.0054

\pm

0.0047

3.111

\pm

14.97

0.9688

\pm

0.1548

0.3387

p

0.2

0.023

\pm

0.0098

0.004

\pm

0.0022

0.074

\pm

0.41

0.9999

\pm

0.0006

0.5222

p

0.3

0.0186

\pm

0.0073

0.0047

\pm

0.0025

0.0

\pm

0.0

1.0

\pm

0.0

0.5175

p

0.4

0.0207

\pm

0.0089

0.0047

\pm

0.0022

0.0

\pm

0.0

1.0

\pm

0.0

0.5167

p

0.5

0.0189

\pm

0.0097

0.0051

\pm

0.0024

0.0

\pm

0.0

1.0

\pm

0.0

0.4669

p

0.6

0.0183

\pm

0.0072

0.006

\pm

0.0019

0.0

\pm

0.0

1.0

\pm

0.0

0.349

p

0.7

0.0169

\pm

0.0071

0.005

\pm

0.0025

0.0

\pm

0.0

1.0

\pm

0.0

0.3227

p

0.8

0.0194

\pm

0.0077

0.0057

\pm

0.0026

0.0

\pm

0.0

1.0

\pm

0.0

0.2725

p

0.9

0.0289

\pm

0.0271

0.0077

\pm

0.0056

0.0

\pm

0.0

1.0

\pm

0.0

0.2739

Table 11. Table 11: Sweep of the simulation time t max subscript 𝑡 t_{\max} (duration of each transient)

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

t_{\max}

2

0.0204

\pm

0.011

0.0042

\pm

0.002

0.0

\pm

0.0

1.0

\pm

0.0

0.8225

t_{\max}

5

0.0185

\pm

0.0064

0.0047

\pm

0.0022

0.0

\pm

0.0

1.0

\pm

0.0

0.8186

t_{\max}

10

0.0193

\pm

0.0066

0.004

\pm

0.0013

0.0

\pm

0.0

1.0

\pm

0.0

0.8175

t_{\max}

20

0.0192

\pm

0.0053

0.0041

\pm

0.0013

0.0

\pm

0.0

1.0

\pm

0.0

0.565

t_{\max}

50

0.0191

\pm

0.0093

0.0064

\pm

0.0038

0.0

\pm

0.0

1.0

\pm

0.0

0.2964

Table 12. Table 12: Sweep through the number of transients observed for a simulation that re-initializes all oscillator phases from 𝒰 [ 0 , 2 π ] 𝒰 0 2 𝜋 \mathcal{U}[0,2\pi] .

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N_{r ​ e ​ s}

1

0.0513

\pm

0.0266

0.0237

\pm

0.0127

3.111

\pm

3.531

0.9815

\pm

0.027

0.447

N_{r ​ e ​ s}

2

0.0343

\pm

0.0222

0.0082

\pm

0.007

0.148

\pm

0.564

0.9986

\pm

0.0067

0.6943

N_{r ​ e ​ s}

5

0.019

\pm

0.0094

0.0039

\pm

0.002

0.0

\pm

0.0

1.0

\pm

0.0

0.806

N_{r ​ e ​ s}

10

0.0201

\pm

0.0084

0.0042

\pm

0.0028

0.0

\pm

0.0

1.0

\pm

0.0

0.808

N_{r ​ e ​ s}

20

0.018

\pm

0.0033

0.0042

\pm

0.0017

0.0

\pm

0.0

1.0

\pm

0.0

0.8295

N_{r ​ e ​ s}

40

0.0192

\pm

0.0055

0.0041

\pm

0.0016

0.0

\pm

0.0

1.0

\pm

0.0

0.8178

Table 13. Table 13: Sweep through the number of transients observed for a simulation that selects a fixed oscillator and adds a random Gaussian perturbation to its phase with standard deviation 0.01 0.01 0.01 .

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N_{p ​ e ​ r ​ t}

1

0.9318

\pm

0.082

0.118

\pm

0.0719

48.667

\pm

7.459

0.2934

\pm

0.0999

0.0252

N_{p ​ e ​ r ​ t}

2

0.9054

\pm

0.1176

0.3284

\pm

0.0747

49.778

\pm

6.920

0.2904

\pm

0.0783

0.0306

N_{p ​ e ​ r ​ t}

5

0.7928

\pm

0.1568

0.3853

\pm

0.1243

51.852

\pm

8.702

0.3281

\pm

0.0895

0.0688

N_{p ​ e ​ r ​ t}

10

0.7828

\pm

0.1698

0.2566

\pm

0.1579

50.593

\pm

5.110

0.3843

\pm

0.0697

0.1403

N_{p ​ e ​ r ​ t}

20

0.9818

\pm

0.0083

0.0024

\pm

0.0011

43.630

\pm

9.383

0.5153

\pm

0.0959

0.1026

N_{p ​ e ​ r ​ t}

40

0.931

\pm

0.0234

0.0019

\pm

0.0007

47.556

\pm

9.403

0.4909

\pm

0.0935

0.0931

Table 14. Table 14: Sweep through the number of transients observed for a simulation that selects a random oscillator and adds a random Gaussian perturbation to its phase with standard deviation 10 10 10 .

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N_{p ​ e ​ r ​ t}

1

0.9193

\pm

0.1111

0.1161

\pm

0.0643

49.778

\pm

7.042

0.3189

\pm

0.1051

0.0298

N_{p ​ e ​ r ​ t}

2

0.7919

\pm

0.2112

0.343

\pm

0.0724

49.852

\pm

6.331

0.279

\pm

0.0949

0.0409

N_{p ​ e ​ r ​ t}

5

0.4298

\pm

0.1535

0.3328

\pm

0.0691

39.778

\pm

11.032

0.5271

\pm

0.1261

0.2041

N_{p ​ e ​ r ​ t}

10

0.3211

\pm

0.1187

0.1697

\pm

0.0769

22.741

\pm

10.975

0.7388

\pm

0.0974

0.2111

N_{p ​ e ​ r ​ t}

20

0.1311

\pm

0.0931

0.0288

\pm

0.0302

4.667

\pm

5.296

0.9536

\pm

0.0601

0.3584

N_{p ​ e ​ r ​ t}

40

0.0447

\pm

0.0311

0.0036

\pm

0.0026

0.222

\pm

0.679

0.9988

\pm

0.0057

0.4824

Table 15. Table 15: Sweep through the number of transients observed for a simulation that selects 3 fixed oscillators and adds a random Gaussian perturbation to their phases with standard deviation 10 10 10 .

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N_{p ​ e ​ r ​ t}

1

0.8851

\pm

0.1332

0.1203

\pm

0.0716

50.0

\pm

5.277

0.316

\pm

0.1035

0.0293

N_{p ​ e ​ r ​ t}

2

0.7678

\pm

0.215

0.2857

\pm

0.0916

50.963

\pm

6.879

0.315

\pm

0.0915

0.0326

N_{p ​ e ​ r ​ t}

5

0.286

\pm

0.0821

0.1973

\pm

0.0625

30.963

\pm

9.096

0.6136

\pm

0.1118

0.1767

N_{p ​ e ​ r ​ t}

10

0.1567

\pm

0.0624

0.0788

\pm

0.0609

15.704

\pm

4.246

0.7659

\pm

0.0863

0.3186

N_{p ​ e ​ r ​ t}

20

0.0783

\pm

0.056

0.0268

\pm

0.06

10.815

\pm

4.943

0.8699

\pm

0.0777

0.4238

N_{p ​ e ​ r ​ t}

40

0.0262

\pm

0.0191

0.0041

\pm

0.0078

7.185

\pm

4.837

0.9391

\pm

0.0565

0.5419

Table 16. Table 16: Sweep through the number of transients observed for a simulation that selects 3 random oscillators and resets their phases to 0 0 .

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N_{p ​ e ​ r ​ t}

1

0.891

\pm

0.1231

0.1058

\pm

0.0706

46.889

\pm

8.444

0.3337

\pm

0.1156

0.0299

N_{p ​ e ​ r ​ t}

2

0.9214

\pm

0.088

0.3684

\pm

0.0897

50.444

\pm

7.031

0.2938

\pm

0.0783

0.0474

N_{p ​ e ​ r ​ t}

5

0.2075

\pm

0.0319

0.0313

\pm

0.0179

23.704

\pm

10.379

0.7769

\pm

0.0874

0.2803

N_{p ​ e ​ r ​ t}

10

0.0294

\pm

0.0061

0.0033

\pm

0.0014

1.926

\pm

2.592

0.9894

\pm

0.0157

0.3855

N_{p ​ e ​ r ​ t}

20

0.0184

\pm

0.003

0.0015

\pm

0.0007

0.074

\pm

0.406

0.9999

\pm

0.0008

0.522

N_{p ​ e ​ r ​ t}

40

0.0116

\pm

0.0028

0.0015

\pm

0.0005

0.0

\pm

0.0

1.0

\pm

0.0

0.6206

Table 17. Table 17: Sweep through the number of transients observed for a simulation that carries out the optimization for a linear system as proposed in [ 28 ] .

Param

Value

Normalized

difference in area

Mean absolute

deviation

Error rate

Area under

ROC curve

Interval

width

(> 90 %)

N_{r ​ e ​ s}

1

0.8188

\pm

0.1236

8633.13

\pm

25993.8

45.852

\pm

8.159

0.5239

\pm

0.0886

2012.5215

N_{r ​ e ​ s}

2

0.8895

\pm

0.2663

3481.08

\pm

9497.15

23.037

\pm

11.635

0.7415

\pm

0.1298

1.6593

N_{r ​ e ​ s}

5

0.2267

\pm

0.3482

2.5746

\pm

11.3136

2.148

\pm

7.063

0.9756

\pm

0.0733

0.75

N_{r ​ e ​ s}

10

0.0238

\pm

0.0242

0.0061

\pm

0.0078

0.074

\pm

0.406

0.9979

\pm

0.0114

0.8436

N_{r ​ e ​ s}

20

0.0099

\pm

0.0037

0.0024

\pm

0.001

0.0

\pm

0.0

1.0

\pm

0.0

0.9131

N_{r ​ e ​ s}

40

0.0077

\pm

0.0033

0.0023

\pm

0.001

0.0

\pm

0.0

1.0

\pm

0.0

0.9329

Equations22

\dot{θ}_{k} = ω_{k} + j = 1 \sum N σ_{k j} A_{k j} Γ (θ_{j} - θ_{k}) .

\dot{θ}_{k} = ω_{k} + j = 1 \sum N σ_{k j} A_{k j} Γ (θ_{j} - θ_{k}) .

Γ (θ) = a_{0} + n = 1 \sum \infty a_{n} cos (n x) + b_{n} sin (n x),

Γ (θ) = a_{0} + n = 1 \sum \infty a_{n} cos (n x) + b_{n} sin (n x),

Γ (θ) = a_{0} + n = 1 \sum m [a_{n} cos (n θ) + b_{n} sin (n θ)]

Γ (θ) = a_{0} + n = 1 \sum m [a_{n} cos (n θ) + b_{n} sin (n θ)]

(v_{j})_{k} = ω_{k} + \frac{K}{N} j = 1 \sum N A_{k j} Γ (θ_{j} - θ_{k}),

(v_{j})_{k} = ω_{k} + \frac{K}{N} j = 1 \sum N A_{k j} Γ (θ_{j} - θ_{k}),

L (θ_{j}) x = v_{j} .

L (θ_{j}) x = v_{j} .

E (ω, A) = \frac{1}{T} j = 1 \sum T ∥ v_{j} - \hat{v}_{j} (ω, A) ∥_{2}^{2}

E (ω, A) = \frac{1}{T} j = 1 \sum T ∥ v_{j} - \hat{v}_{j} (ω, A) ∥_{2}^{2}

Γ_{k j} (θ) = n = 1 \sum m a_{k j n} cos (n θ) + b_{k j n} sin (n θ)

Γ_{k j} (θ) = n = 1 \sum m a_{k j n} cos (n θ) + b_{k j n} sin (n θ)

E (ω, A, a, b)

E (ω, A, a, b)

Area ratio = \frac{\int _{0}^{2 π} ∣Γ ( θ ) - Γ ^ ( θ ) ∣ d θ}{\int _{0}^{2 π} ∣Γ ( θ ) ∣ d θ} .

Area ratio = \frac{\int _{0}^{2 π} ∣Γ ( θ ) - Γ ^ ( θ ) ∣ d θ}{\int _{0}^{2 π} ∣Γ ( θ ) ∣ d θ} .

Mean absolute deviation = \frac{1}{N} k = 1 \sum N ∣ ω_{k} - \overset{ω}{^}_{k} ∣.

Mean absolute deviation = \frac{1}{N} k = 1 \sum N ∣ ω_{k} - \overset{ω}{^}_{k} ∣.

(\frac{precision ^{- 1} + recall ^{- 1}}{2})^{- 1} = 2 \times \frac{precision \times recall}{precision + recall} .

(\frac{precision ^{- 1} + recall ^{- 1}}{2})^{- 1} = 2 \times \frac{precision \times recall}{precision + recall} .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Model reconstruction from temporal data for coupled oscillator networks

Mark J. Panaggio1, Maria-Veronica Ciocanel2, Lauren Lazarus3, Chad M. Topaz4, Bin Xu5

1Department of Mathematics, Hillsdale College, Hillsdale, MI 49242, USA

2Mathematical Biosciences Institute, The Ohio State University, Columbus, OH 43210, USA

3Department of Mathematics, Trinity College, Hartford, CT 06106, USA

4Department of Mathematics and Statistics, Williams College, Williamstown, MA 01267, USA

5Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA

[email protected]

Abstract

In a complex system, the interactions between individual agents often lead to emergent collective behavior like spontaneous synchronization, swarming, and pattern formation. The topology of the network of interactions can have a dramatic influence over those dynamics. In many studies, researchers start with a specific model for both the intrinsic dynamics of each agent and the interaction network, and attempt to learn about the dynamics that can be observed in the model. Here we consider the inverse problem: given the dynamics of a system, can one learn about the underlying network? We investigate arbitrary networks of coupled phase-oscillators whose dynamics are characterized by synchronization. We demonstrate that, given sufficient observational data on the transient evolution of each oscillator, one can use machine learning methods to reconstruct the interaction network and simultaneously identify the parameters of a model for the intrinsic dynamics of the oscillators and their coupling.

\floatsetup

[figure]style=plain,subcapbesideposition=top

Keywords: nonlinear dynamics, phase oscillators, Kuramoto oscillators, network reconstruction, network topology, machine learning, computational methods

1 Introduction

Nature and society brim with systems of coupled oscillators, including pacemaker cells in the heart, insulin-secreting cells in the pancreas, neural networks in the brain, fireflies that synchronize their flashing, chemical reactions, Josephson junctions, power grids, metronomes, and applause in human crowds, to name merely a few [1, 2, 3, 4, 5, 6, 7, 8, 9]. The dynamics of coupled oscillators in complex networks have been studied extensively. In particular, networks of interacting oscillators governed by the seminal Kuramoto model [10, 11, 12] or its variants are known to exhibit a rich variety of behaviors including spontaneous synchronization, phase transitions, and pattern formation [13, 14]. The connections in the network can play a pivotal role in determining these dynamics.

Given oscillators where the governing equations and network topology are known, it is fairly straightforward to explore the dynamics of the system using numerical methods and, in special cases, analytical methods. Unfortunately, in many systems, the topology of the interaction network and the intrinsic oscillator properties can be difficult or impossible to observe directly. Imagine, for example, experimental results that track neuronal cell network or gene regulatory network activity for a large number of oscillators, giving rise to time series data. One might like to infer model information from these time series.

It is crucial to distinguish between functional network connections and structural network connections. Functional connectivity refers to the temporal correlation of the oscillators and is often directly observable. For instance, neurons that fire synchronously are functionally connected. In contrast, structural connectivity, also called network topology, refers to the underlying connections present in the network. For instance, neurons that are connected by synapses or neurotransmitters are structurally connected. In many systems these structural connections can be difficult or impossible to observe.

Network reconstruction is an active area of research [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]. In section 2.2, we review the literature most closely related to our approach. In our work, we solve an inverse problem using observed time series data of the coupled oscillators to reconstruct both the network topology, i.e. the structural network connections, and the intrinsic oscillator dynamics. For the remainder of this paper, when we refer to inferring or reconstructing the model, we mean both of the aforementioned components: network connections and intrinsic oscillator properties.

In this paper, our primary contribution is an algorithm that addresses certain challenges of existing methods mentioned above. One such challenge is that existing methods set up the inverse problem as a linear system that involves many unknown parameters. Consequently, a large amount of time series data is necessary. In contrast, our algorithm results in a system of equations that is smaller, and is nonlinear. We solve these using optimization tools designed for neural networks. As a result, we are able to infer the model with a much smaller amount of data. A second challenge, as mentioned in [21, 17, 28], is that it is typically quite difficult to infer the model for networks that are synchronized. For such networks, we explore what perturbations of the synchronized state are needed to enable accurate inference of the model.

The mathematical setting of our study is oscillator networks composed of limit-cycle oscillators. These are also known as phase oscillators, meaning they are characterized by a single phase variable defined on the circle. We explore four different choices of oscillator models, all of which fall into the general framework for uniformly coupled phase oscillators developed in [30]: the classic Kuramoto model [10]; a Kuramoto-like model with square-wave coupling function; the Kuramoto-Sakaguchi model [31]; and a phase-field reduction of weakly-coupled Hodgkin-Huxley oscillators [32] (see A). We take all networks to be Erdös-Rényi random graphs [33]. We attempt to infer the network topology, the intrinsic frequency of each model oscillator, and the so-called coupling function that specifies the influence of one oscillator on others.

To carry out the model reconstruction, we begin with simulated time series for the oscillator phases and estimated phase velocities. We set up an optimization problem involving the mean squared error for the predicted phase velocities generated by a system of nonlinear differential equations, which can be represented as a convolutional neural network. We then use computational methods designed for neural networks to estimate the optimal parameters, thereby inferring the adjacency matrix for network connectivity, the oscillator coupling strength, the oscillator frequencies, and the Fourier coefficients of the coupling function. This approach is effective for a variety of different networks and model parameters. More specifically, we find that:

•

Accurate inference of the coupling network, frequencies and coupling functions is possible independent of the model when the system does not synchronize or when sufficient perturbations from synchronization are permitted.

•

Model reconstruction is possible with a smaller amount of data than previous approaches that set up the problem as a large system of linear equations.

•

Computational methods designed for neural networks, such as mini-batch gradient-descent implemented in TensorFlow [34], can be used to solve the optimization problem associated with model reconstruction.

•

Synchronizing networks can be reconstructed using random phase resets or sufficiently large phase perturbations to a small randomly-chosen set of oscillators.

•

Synchronizing networks can also be reconstructed using perturbations to a sufficiently large fixed subset of oscillators.

The rest of this paper is organized as follows. In section 2 we introduce the Kuramoto model (a standard model for the dynamics of coupled oscillators), set up the inverse problem for reconstruction, and discuss approaches for solving this inverse problem. In section 3 we describe a series of experiments used to test the effectiveness of these reconstruction techniques over a wide range of model parameters. In section 4, we present the results of the aforementioned experiments and discuss perturbation strategies for improving the reconstruction when the system synchronizes. Finally, in section 5, we summarize our primary findings and discuss extensions to our methodology.

2 Models and Methods

The Kuramoto model is a standard model for the dynamics of coupled oscillators. We consider $N$ oscillators with phases $\theta_{k}\in[\,0,2\pi)$ for $k=1,2,\ldots,N$ , each with an intrinsic frequency $\omega_{k}$ . These oscillators are coupled through an interaction network with adjacency matrix $A$ , with the entry $A_{kj}\in\{0,1\}$ determining whether oscillators $k$ and $j$ are coupled. The dynamics of $\theta_{k}$ are governed by the equation

[TABLE]

Here $\sigma_{kj}$ represents the strength of the coupling between oscillators $j$ and $k$ , and $\Gamma(\theta)$ represents the coupling function. If the coupling function $\Gamma(\theta)$ is continuous and $2\pi-$ periodic, and $\Gamma^{\prime}(\theta)$ is piecewise continuous, then it can be represented using a uniformly convergent Fourier series

[TABLE]

where the coefficients satisfy $a_{n},b_{n}\leq M/n^{2}$ for some $M$ . Therefore, these coefficients decay to [math] for large $n$ and this function can be approximated to arbitrary accuracy using a truncated Fourier series

[TABLE]

with sufficiently large $m$ . Note that within the model, one can set $a_{0}=0$ without loss of generality, using the change of variables $\omega_{k}+\sum_{j=1}^{N}\sigma_{kj}A_{kj}a_{0}\rightarrow\omega_{k}$ .

In Kuramoto’s original formulation [10], the oscillators were globally coupled ( $A_{kj}=1$ ) with coupling strengths $\sigma_{kj}=K/N$ where $K$ scales the global coupling strength. He used the coupling function $\Gamma(\theta)=\sin(\theta)$ , and showed that, for $K>K_{C}$ where $K_{C}$ is a critical value related to the width of the distribution of intrinsic frequencies, the oscillators begin to synchronize, achieving identical phase velocities with phases distributed around the population mean. Analogous results were obtained later for systems with periodic coupling functions possessing only odd harmonics [30, 11] and arbitrary complex networks [35].

2.1 Simulated data generation

We investigate a method for inferring the parameters of the Kuramoto model for phase oscillators on random graphs with uniform coupling strengths $\sigma_{kj}=\frac{K}{N}$ . In our experiments, we focus on undirected Erdős-Rényi graphs, where each possible edge in the network is present with probability of $p$ . It is straightforward to extend our approach to arbitrary networks with non-uniform coupling strengths, though this would likely require a larger amount of time series data. The details of our method appear in sections 2.2 and 2.3.

In order to test our method, we generate data according to the following procedure:

Generate an Erdős-Rényi network for a fixed connection probability $p$ where the nodes represent individual oscillators with natural frequencies sampled from a normal distribution with mean $\mu$ and standard deviation $\sigma$ . 2. 2.

Use numerical integration to generate a time series for the evolution of this system for $t\in[0,t_{max}]$ , starting from initial phases drawn from a uniform distribution on $[0,2\pi]$ . For numerical integration, we use an explicit Runge-Kutta method of order 5(4) [36] or a stochastic Runge-Kutta method of order 2 [37]. 3. 3.

Compute the phases $\theta_{k}(t_{n})$ at times $t_{n}=n\Delta t$ with timestep $\Delta t$ and for $n=0,1,\ldots T$ where $T=t_{max}/\Delta t$ to obtain observations that are evenly spaced in time. 4. 4.

Estimate the phase velocities $v_{k}(t_{n})=\dot{\theta}_{k}(t_{n})$ using central differencing in time with Savitsky-Golay filtering [38]. We used a window length of $5$ with first degree polynomials. 5. 5.

Repeat steps (ii-iv) $N_{res}$ times with different uniform random initial conditions to obtain sufficient data during the transient evolution of the system.

See tables 1 and 2 for the network parameters and numerical solution parameters used in this procedure.

The aforementioned process is intended to produce simulated data mimicking that which experimentalists might collect when observing real world networks. Note however, that it may not be possible to control the initial phases in an experiment. We therefore evaluate the reconstruction methods for varying $N_{res}$ , as well as for cases in which small perturbations are used instead of different initial conditions (see section 4.1).

2.2 Inverse problem formulation

We will formulate a set of equations whose least squares solution can be used to estimate the natural frequencies $\omega_{k}$ of each oscillator, the adjacency matrix $A_{kj}$ of the coupling network, the coupling strength $K$ , and the coupling function $\Gamma(\theta)$ , from observed phases $\theta_{k}(t_{n})$ and phase velocities $v_{k}(t_{n})$ generated using the method outlined above. This work builds on prior work by Shandilya and Timme [17], where the adjacency matrix for a network of oscillators is estimated from a time series. We summarize their approach below.

Define vectors $\bm{\theta}_{j}=[\theta_{1}(t_{j}),\theta_{2}(t_{j}),\ldots,\theta_{N}(t_{j})]^{T}$ and $\bm{v}_{j}=\left[\dot{\theta}_{1}(t_{j}),\dot{\theta}_{2}(t_{j}),\ldots,\dot{\theta}_{N}(t_{j})\right]^{T}$ consisting of the oscillator phases and phase velocities. When the coupling strength $K$ and coupling function $\Gamma$ are known, the phase velocity for oscillator $k$ ,

[TABLE]

is linear in the unknown parameters $\omega_{k}$ and $A_{kj}$ . Therefore, one can infer both the natural frequencies and adjacency matrix by solving a linear system of $N\cdot T$ equations, where $T$ is the number of time steps and each timestep $j$ provides $N$ equations of the form

[TABLE]

Here $L(\bm{\theta}_{j})$ is an $N\times(N+N^{2})$ matrix where each row is determined from (4) for a particular oscillator, and $\bm{x}$ is an $(N+N^{2})\times 1$ vector with the unknown natural frequencies $\bm{\omega}$ and the entries in the adjacency matrix $A$ . The number of unknowns in $A$ can be reduced to $N+N(N-1)/2$ if one assumes $A_{jj}=0$ and $A_{kj}=A_{jk}$ .

Since the number of equations in the linear system is determined by the number of observed timesteps $T$ and the number of oscillators $N$ , one can ensure that the system is over-determined by collecting enough observations so that $T>1+N$ . One can then estimate the parameters by minimizing a loss function such as the mean squared error,

[TABLE]

where $\bm{v}_{j}$ denotes the observed value and $\hat{\bm{v}}_{j}$ denotes the predicted value of the velocity.

As long as the system remains far from synchronization, increasing the number of observations $T$ will provide additional linearly independent equations to aid with reconstruction. However, as networks synchronize, i.e. $(\bm{v}_{j})_{k}=\dot{\theta}_{k}\rightarrow\dot{\theta}$ , additional observations become (nearly) linearly dependent. This causes the system of equations to become highly ill-conditioned and makes numerical solutions sensitive to noise and rounding errors. As a result, it is often necessary to perturb the system away from equilibrium to ensure that a sufficient number of linearly independent observations can be collected.

Although the approach of [17] outlined above is effective, it is limited by the requirement that the coupling function be known a priori. In [28], Pikovsky addresses this limitation by expressing $\Gamma$ as a Fourier series so that it, too, can be estimated by solving an analogous optimization problem. However, a challenge remains. If one represents the coupling function as in (3), then the system of equations is no longer linear as terms of the form $A_{kj}a_{n}$ and $A_{kj}b_{n}$ appear. Pikovsky [28] circumvents this issue by defining distinct coupling functions

[TABLE]

for each pair of oscillators and by setting $A_{kj}=1$ . In this formulation, if oscillators $k$ and $j$ are uncoupled, then $a_{kjn}=b_{kjn}=0$ for all $n$ . This modification preserves the linearity of the system of equations and allows for the description of a more general class of networks with distinct coupling functions for each pair of oscillators. Unfortunately, it comes at a cost: the number of unknown parameters increases dramatically from $N+N^{2}$ to $N+2mN^{2}$ . Therefore, even longer observation times are required. Additionally, as with the method of [17], if the system synchronizes, there may be numerical difficulties when inferring parameters. Finally, the large number of free parameters makes this model particularly prone to overfitting. As before, one could assume that connections in the network are symmetric, i.e., that $\Gamma_{kj}(\theta)=\Gamma_{jk}(\theta)$ and that self edges are not included $\Gamma_{jj}(\theta)=0$ , allowing for a reduction in the number of parameters to $N+mN(N-1)$ .

We propose an alternative approach. We use a single coupling function represented by a Fourier series as described in (3) for all coupling terms and attempt to infer the $2m$ Fourier coefficients $\bm{a}=[a_{1},a_{2},\ldots,a_{m}]^{T}$ and $\bm{b}=[b_{1},b_{2},\ldots,b_{m}]^{T}$ along with the intrinsic frequencies $\bm{\omega}$ , adjacency matrix $A$ , and global coupling strength $K$ . This leads to a total $2m+N+N^{2}+1$ inferred parameters or $2m+N+N(N-1)/2+1$ with symmetry constraints and no self-connections. This is therefore a modest increase of $2m+1$ parameters over the case considered by Shandilya and Timme [17] without requiring prior knowledge of the coupling function.

As mentioned previously, this leads to a set of nonlinear equations due to the appearance of terms of the form $KA_{kj}a_{n}$ and $KA_{kj}b_{n}$ in (4). Since $K$ always appears in products involving $a_{n}$ and $b_{n}$ , these parameters are not structurally identifiable. One could set $K=1$ without loss of generality. Instead, we include $K$ as an inferred parameter and introduce penalty terms to the objective function we seek to minimize:

[TABLE]

The inclusion of a penalty function ensures the existence of a non-degenerate local minimum. In this objective function, the first term above represents the mean squared error. The remaining terms are penalty terms with weights $\lambda_{\Gamma}=0.0001$ , $\lambda_{A}=10^{-6}$ , and $\lambda_{bd}=10^{5}$ . These hyper-parameter values were tuned initially to provide satisfactory reconstruction performance, and then kept constant in all subsequent experiments (see also table 3). Additional tuning of these parameters for specific networks could further improve the accuracy of the model reconstruction. The second term with $\lambda_{\Gamma}$ introduces $L_{2}$ regularization which favors smaller estimates of the parameter values $a_{n}$ and $b_{n}$ . This is useful for combating overfitting and promoting sparse representations, and is analogous to using a Bayesian prior centered at 0 for the Fourier coefficients [39]. The last term with $\lambda_{A}$ and $\lambda_{bd}$ penalizes adjacency values that are far from 0 and 1 as well as those that are negative or greater than 1 to ensure that the estimates fall within the desired range of $[0,1]$ . We do not penalize $K$ and instead allow it to be as large as necessary to counteract the parameter shrinkage caused by the penalty terms.

2.3 Inverse problem solution

Minimizing (2.2) poses a number of challenges. First of all, due to the nonlinearity of (4), this function may not be convex and there are no guarantees that local optimization methods will converge to a global minimum. Secondly, when the number of observations $T$ is large, this function may be costly to compute, and minimizing the number of function evaluations is paramount.

Fortunately, the theoretical challenge of nonconvexity does not prevent our method from obtaining consistently reliable model reconstruction, as we show in section 4. The computational difficulties can be addressed by using tools designed for neural networks and implemented in TensorFlow to efficiently compute gradients and perform minimization with a variant of mini-batch gradient descent [40].

To elucidate the connection with neural networks, we comment that (3) can be viewed as a 2-d convolutional neural network with a convolution of size $1\times 1$ and stride 1 applied to the $T\times N\times N$ tensor consisting of phase differences $\theta_{k}-\theta_{j}$ . The hidden layer consists of the $2m$ harmonics of the form $\cos n\theta$ and $\sin n\theta$ applied to each entry in this tensor. To avoid redundant weights, we fix the weights in the first hidden layer to be 1 and the biases to be 0. In the second hidden layer, each harmonic is assigned a fixed bias of 0 and a variable weight $a_{n}$ or $b_{n}$ corresponding to the Fourier coefficients. Figure 1 provides a schematic of this neural network. Once the coupling terms have been computed, the rest of (4) and the resulting loss function (2.2) are straightforward to compute using vectorized operations.

We initialize the inferred parameters as follows: the adjacency matrix $A$ has initial entries drawn from $\mathcal{N}(0.5,1/N)$ , the frequencies $\omega_{k}$ are initialized from $\mathcal{N}(0,1/N)$ , the coupling strength $K$ is drawn from $\mathcal{N}(1,1/N)$ , and the Fourier coefficients $\bm{a},\bm{b}$ of the coupling function are initialized at [math].

TensorFlow maintains a computational graph for these operations which allows one to automatically compute gradients. One can therefore use gradient descent methods to compute the optimal estimates ( $\hat{a}_{n}$ , $\hat{b}_{n}$ , $\hat{\omega}_{k}$ , $\hat{A}_{kj}$ , $\hat{K}$ ) for the parameters. We used a mini-batch gradient descent method with batch size of $100$ , i.e., we randomly assign the data to batches of 100 time-steps. For each batch we calculate the gradient of the loss function and update the estimated parameters by taking a small step in a direction determined from the gradient. Once this has been repeated for all batches (one epoch), we pass through the data again for a total of $200$ or more epochs. We determined the number of epochs experimentally by iterating until the mean squared error no longer decreased. Typically, 300 epochs were sufficient. However for certain sweeps examining the role of $t_{max}$ , $N_{res}$ , and $\sigma$ , the number of epochs required for convergence was as high as 10000.

By using small random batches of time steps, the gradient can be estimated quickly. This has the added benefit of introducing stochasticity into gradients which makes the algorithm less susceptible to getting caught in local minima. We use AdamOptimizer, which is a gradient descent method with momentum and an adaptive learning rate [41], with default parameter values for optimization in TensorFlow.

There is minor variability in the success of the algorithm due to the random initialization of the inferred parameters and to the batching of the observed data. In order to ensure an accurate network reconstruction, we can retrain the model with new initial values several times and choose the result which has the smallest mean squared error for the velocity predictions on the validation set, which was not considered during the training process. For any individual model network, this validation error is correlated with the accuracy of the inferred parameters. In all examples below, we attempt to reconstruct each network five times before choosing the best reconstruction.

2.4 Post-processing

The method described in section 2.3 produces continuous estimates for the parameters ( $\hat{a}_{n}$ , $\hat{b}_{n}$ , $\hat{\omega}_{k}$ , $\hat{A}_{kj}$ , $\hat{K}$ ) which could be used to predict the dynamics of the system under a variety of initial conditions. However, we are interested in evaluating whether these parameter estimates are an accurate reconstruction of the original model. Before these values can be compared to the model parameters used to generate the data, additional post-processing is needed. We therefore redefine our parameter estimates via the transformations $\hat{K}\hat{A}_{kj}\rightarrow\hat{A}_{kj}$ , $c_{0}+c_{1}\hat{\Gamma}\rightarrow\hat{\Gamma}$ where $c_{0}$ and $c_{1}$ are selected to minimize $\displaystyle\int_{0}^{2\pi}|\Gamma(\theta)-\hat{\Gamma}(\theta)|d\theta$ , and $\hat{\omega}_{k}-K{c_{0}}/{N}\sum_{j=1}^{N}A_{kj}\rightarrow\hat{\omega}_{k}.$ These transformation are necessary to address the aforementioned identifiability issues with $K$ and to allow $\Gamma$ to have nonzero mean.

In the original model, the adjacency matrix entries are either zero or one. However, we treat the entries as continuous variables during optimization and then choose a threshold $\epsilon$ so that $A_{kj}<\epsilon$ is chosen to be 0 and $A_{kj}\geq\epsilon$ is chosen to be 1. In practice, one could fix $\epsilon$ or select $\epsilon$ so that the reconstructed model minimizes the mean squared error for the data set. However, we explore a range of threshold values using ROC curves and report the reconstruction error rates using the value of $\epsilon$ that yields the largest $F_{1}$ score for the adjacency matrix reconstruction (see B). These optimal values are robust and good performance is typically obtained over a wide range of thresholds (see section 4 for details).

3 Experimental design

In order to validate the robustness of our approach, we test the reconstruction method on data simulated with a variety of networks and parameter values. We explore this high-dimensional parameter space by fixing all of the model parameters except for one and then sweeping the remaining parameter over a wide range of values. See tables 1 and 2 for a list of default values as well as the ranges considered. For each set of parameter values, we compute 30 networks with random initial phases and intrinsic frequencies. We then attempt to reconstruct the underlying model for each network and compute various performance metrics for each reconstruction.

Coupling function — We evaluate the inferred coupling function $\hat{\Gamma}(\theta)$ by comparing it to the true coupling function $\Gamma(\theta)$ via the normalized difference in area defined as follows:

[TABLE]

This represents the area between the true and estimated coupling function curves, weighted by the area under the curve of the true function. This quantity serves as a measure of the error in the reconstruction. A perfect reconstruction would have a normalized difference in area of zero. In the Kuramoto case $\Gamma(\theta)=\sin(\theta)$ , the initial estimate $\hat{\Gamma}(\theta)=0$ corresponds to a value of 1. Therefore values significantly lower than 1 indicate progress towards a correct reconstruction.

Intrinsic Frequencies — We compare the inferred intrinsic frequencies $\hat{\omega}_{k}$ to the true intrinsic frequencies $\omega_{k}$ using the mean absolute deviation defined as follows:

[TABLE]

Values near zero indicate accurate reconstructions. We considered alternative metrics such as relative deviations and the correlation between true and inferred frequencies, but these were less informative because they tend to amplify errors when the the intrinsic frequencies are close to zero.

Adjacency matrix — We investigate several methods for evaluating the success of adjacency matrix reconstruction. The accuracy can be inspected visually by plotting the true and reconstructed matrices along with the absolute differences in a grid where values range from [math] (black) to $1$ (white); see figure 2(a-c). Since we are primarily interested in discrete adjacency values, we interpret reconstruction as a classification problem and compute three standard evaluation metrics: the $F_{1}$ * score*, the classification error rate for the connections, and the area under the ROC curve. See B for precise definitions of these metrics.

4 Results

Here we discuss the key results from the parameter sweeps outlined in section 3.

Our method can successfully reconstruct models with a variety of coupling functions. Figure 3 and table 5 show the results for four different coupling functions (see A for details), three of which come from popular coupled oscillator models: the Kuramoto model, the Kuramoto-Sakaguchi model, and a phase reduction of the Hodgkin-Huxley model, and a fourth, a square wave which represents a generic discontinuous function with high frequency Fourier harmonics. We note that for the square wave coupling function, the accuracy of the reconstruction suffers since the reconstruction includes only five Fourier harmonics and the true coupling function contains an infinite number of harmonics with amplitudes that decay slowly due to the discontinuities. Despite these limitations, the adjacency matrix was still estimated with a high degree of accuracy.

Counterintuitively, as the number of oscillators increases (figure 3 and table 6), we find that reconstruction accuracy improves. This can be explained by the observation that although increasing the number of oscillators increases the number of unknown parameters, it also provides a greater number of pairwise phase differences which can aid in reconstruction of the coupling function. The resulting modest improvements in the estimate of the coupling function can then allow for better inference of the adjacency matrix and intrinsic frequencies as well. This is illustrated by the normalized difference in area plot in figure 3.

In figure 4 and table 7, we consider simulations where the standard deviation $\sigma$ of the oscillator frequencies ranges from $0.01$ to $1$ . The normalized difference in area of the coupling function and the mean absolute deviation of the inferred frequencies both show that we can infer the network well for $\sigma\leq 1$ . For larger standard deviations, the quality of the reconstruction suffers slightly, since the large frequencies dominate the coupling terms within the phase velocity relationship.

We also explore the impact of noise on the model reconstruction since both observation noise and dynamic noise would be present in experiments. In table 8 and table 9, we demonstrate that our model reconstruction method is robust to moderate amounts of both types of noise. As in [17], we introduce dynamic noise using stochastic differential equations with Gaussian noise. Observation noise was also Gaussian and was added after integrating the governing differential equations. For both types of noise, we used mean [math] and standard deviations between [math] and $1$ . It is worth noting that dynamic noise with a standard deviation less than $10^{-3}$ helps with the reconstruction of both coupling function and frequencies. This can be explained by the fact that a small amount of noise keeps the system from reaching equilibrium, effectively increasing the duration of the transients that provide useful information about the structure of the network.

We carry out additional parameter sweeps with varying network connectivity $p$ (table 10), maximum simulation time $t_{\max}$ (table 11) and the number of simulation restarts (table 12) with all other parameters set using the default values given in tables 1, 2, and 3. In each case, we are consistently able to reconstruct the coupling function, the underlying network and the intrinsic frequencies model over a wide range of model parameters. The tables in C illustrate averaged numerical results for 30 trials of each of these parameter sweeps in terms of evaluation metrics such as normalized difference in area, mean absolute deviation, error rate, area under ROC curve, and a range of thresholds that yield $F_{1}$ scores within $90\%$ of the largest value.

4.1 Results for perturbations of synchronous dynamics

As both [17] and [28] point out, when a network remains synchronized, i.e. when $\dot{\theta}_{1}=\dot{\theta}_{2}=\ldots=\dot{\theta}_{N}$ , one cannot infer the model parameters due to the fact that the observed phases no longer provide linearly independent equations. Furthermore, even with precise knowledge of the adjacency matrix, one could not hope to reconstruct the coupling function $\Gamma(\theta_{k}-\theta_{j})$ without data over a wide range of phase differences $\theta_{k}-\theta_{j}$ .

As such we investigated a method for using small perturbations to introduce brief transients into the dynamics to allow for successful model reconstruction. To test this, we used nearly identical oscillators with frequency standard deviation $\sigma=0.0001$ , which causes the system to quickly converge to a synchronized state for almost all initial conditions. We then initialized $\bm{\theta}(0)=\bm{0}$ so that the system begins from perfect synchrony. Then, at times $kt_{max}/N_{pert}$ for $k=1,2,\ldots N_{pert}$ we added a phase perturbation to a subset of the oscillators. Each perturbation causes some of the oscillators to briefly become desynchronized. In this way, the total number of observed phases remains constant, but the fraction of those observations that occur during transient dynamics is proportional to the number of perturbations, $N_{pert}$ . The observed phases during these transients provide meaningful data about the structure of the network. Figure 6 demonstrates that, as expected, with too few or too small perturbations, model reconstruction is unsuccessful. However, as the number of perturbations $N_{pert}$ increases, the accuracy of the estimated coupling function, adjacency matrix, and the oscillator frequencies improves.

We explored two methods for selecting which oscillators to perturb: fixed subsets, in which the subset of perturbed oscillators was selected at the beginning of the experiment and these same oscillators were perturbed repeatedly, and random subsets, in which a random subset of the system was selected for each perturbation. Although a perturbation to the phase of a particular oscillator does propagate to the phases of neighboring oscillators through the coupling terms, those perturbations decay quickly and are therefore most effective for revealing the local structure of the network. As such, although perturbations to a fixed subset might be more practical in a physical experiment, one must typically perturb a larger fraction of the system to obtain comparable performance to that which is obtained with perturbations to random subsets.

We also considered two types of phase perturbations: phase resets, in which selected oscillators had their phases reset to 0, and phase shifts, in which selected oscillators had their phases modified by adding a random shift $\eta_{pert}\sim\mathcal{N}(0,\sigma^{2}_{pert})$ . Phase resets may be more feasible from an experimental perspective, but have the drawback of preserving the mutual synchrony of the subset of oscillators that are perturbed. As such one will typically need to use random subsets in tandem with phase resets in order to be able to resolve the connections between the perturbed oscillators.

In figure 6(a-c), we used phase shifts with $\sigma_{pert}=0.01$ to a fixed subset of size 1, i.e. a single oscillator out of $10$ . This gives poor reconstruction regardless of the number of perturbations due to the small size of the perturbation. On the other hand, in figure 6(d-f) we perturb a fixed subset (3 out of 10 oscillators) with random phase shifts with $\sigma_{pert}=10$ 111This is virtually indistinguishable from uniform perturbations $X\sim\mathcal{U}[-\pi,\pi]$ . In this case, the perturbations affect a sufficiently large proportion of the oscillators and performance begins to improve once there are 5 or more perturbations. The results in figure 6(g-i) illustrate that one only needs to perturb 1 out of 10 oscillators to obtain similar performance when the oscillators selected are chosen randomly. In figure 6(j-l), we show the results using phase resets to random subsets of 3 oscillators. Again, performance begins to improve dramatically once 5 or more perturbations are used. The tables summarizing the results of these perturbation strategies are provided in C.

4.2 Comparison with reference [28]

The amount of transient data necessary for reconstruction is also useful in comparing our method to previous approaches. As discussed in section 2.2, Pikovsky proposes an alternative method where distinct coupling functions are considered for the interaction of each pair of oscillators [28]. This ensures that the system of equations in the optimization is linear, however it also increases the number of unknown coefficients significantly. Our approach relies on the assumption that the same coupling function $\Gamma$ is used for all pairs of oscillators. This is a reasonable approximation for physical systems when the physical mechanisms governing the interactions between oscillators are the same. Pikovsky [28] uses a more general model in which the coupling functions may be different for each pair of oscillators.

Figure 7 shows that when the coupling functions are identical, our method provides more accurate reconstructions of both the network topology and the coupling function with smaller amounts of transient data. As the amount of transient data increases, both methods ultimately achieve perfect network reconstruction while the approach in [28] ultimately obtains slightly better estimates for the coupling function. Therefore, our approach would be preferred under circumstances where the amount of data is limited and when the coupling functions are nearly identical.

5 Discussion and conclusions

In this paper, we have designed a method for reconstructing models of coupled oscillator networks including both the network connections and the intrinsic oscillator properties. We hope analysts might investigate (non)convexity of our penalty function and properties of minimizers. These issues aside, after testing our method with many different parameters, we conclude that our algorithm can successfully infer the model.

A common challenge that our method and many related ones encounter is that the procedure fails if the system synchronizes too quickly. One remedy is to have a sufficient number of observations during the desynchronized transients. As we demonstrate, one can ensure that this data is available by repeating experiments with multiple initial conditions or by adjusting the model parameters to inhibit synchronization by instituting large frequency variability ( $\sigma$ ) or dynamic noise. An alternative remedy is to move the system away from synchrony using perturbations, preferably ones that are physically realizable. The introduction of these perturbations provides useful transient data. By perturbing a sufficiently large subset of oscillators with a large enough change, we are able to infer the model accurately. Our hope is that this method will be adopted by experimentalists and used with experimental data to aid with the construction of interpretable models for the dynamics of networks of coupled oscillators.

Although the numerical experiments outlined here involve Erdös-Rényi networks, the method can be applied more generally to a broad class of networks. Preliminary testing suggests that similar results can be obtained for other topologies such as star, small-world, scale-free networks, and clique networks.

It is also straightforward to extend this approach to other models for coupled phase oscillators such as the Winfree model [42] or even more general oscillator models such as the Stuart-Landau model [43] in which both phase and amplitude variations are permitted. Indeed, any system where the unknown functions are periodic can be represented using our technique.

For models containing unknown functions that are not periodic such as the Hodgkin-Huxley model [44], a Fourier series representation is not possible. In these cases, one could represent the coupling functions using feed-forward neural networks, which are capable of representing continuous functions to arbitrary accuracy using a finite number of parameters [45]. Given a sufficiently rich data set, one could still use our approach with back-propagation to learn the structure of the neural network approximation for the coupling function.

This material is based upon work supported by the Mathematics Research Communities of the American Mathematical Society, under National Science Foundation grant DMS-1321794. The project was initiated during the Mathematics Research Community (MRC) on Agent-based Modeling in Biological and Social Systems (2018). A follow-up visit was also supported by a collaboration travel grant from the AMS MRC program. MVC was supported by The Ohio State University President’s Postdoctoral Scholars Program and by the Mathematical Biosciences Institute at The Ohio State University. MJP is supported by Hillsdale College. CMT is supported by National Science Foundation grant DMS-1813752 and by Williams College. BX is supported by the Robert and Sara Lumpkins Endowment for Postdoctoral Fellows in Applied and Computational Math and Statistics at the University of Notre Dame. We are grateful to Henry Adams, Kelsey Houston-Edwards, and Lori Ziegelmeier for contributions during the formative stages of this work.

Appendix A Coupling functions

The coupling functions investigated are described in table 4. The first three classes of functions were selected due to their use in popular coupled oscillator models. The last was an example of a generic coupling function with higher order harmonics that was tested to verify that reconstruction of the adjacency matrix is possible even with an imperfect approximation to the coupling function.

Appendix B Evaluation metrics for classification

The $F_{1}$ * score* is defined as

[TABLE]

Here, $\mathrm{precision}$ is the fraction of true positives among all inferred positives, while $\mathrm{recall}$ is the fraction of true positives among all positives. The $F_{1}$ score is a value between [math] and $1$ ; when reporting the error rate for the reconstructed adjacency matrix, we use the threshold $\epsilon=\epsilon_{max}$ that corresponds to the largest $F_{1}$ score. We also determine the interval of thresholds that yield $F_{1}$ scores within 90% of this largest value. The width of this interval is reported in the tables in C as “Interval width $(>90\%)$ ”.

The error rate is defined as the percentage of entries in $A$ that are incorrectly classified. We report the error rate for the optimal threshold $\epsilon$ discussed in section 2.4.

The ROC curve, or receiver operating characteristic, is a parametric curve that uses the classification threshold as a parameter and uses the true positive rate and the false positive rate as the the variables [46]. The area under the ROC curve measures the quality of a classifier independent any particular threshold. A value of $1/2$ is consistent with blind guesses, and a value of $1$ indicates a perfect classifier since it implies the existence of a threshold for which the rate of true positives is $1$ and the rate of false positives is [math]. Our ROC curves were generated using the scikit-learn package in Python [47].

Appendix C Tables of results

Here we include several tables which report the averaged numerical results of the experimental parameter sweeps described in section 3 and reported on in section 4. The performance metrics used are defined in B. Values given in the tables represent the mean plus or minus one standard deviation of the corresponding performance metric over the 30 trials run at that parameter value. We note that some of the distributions, such as the error rate (which is nonnegative by definition), are skewed.

Bibliography47

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Trautwein W and Kassebaum D G 1961 J. Gen. Phys. 45 317–330
2[2] Stagner J I, Samols E and Weir G C 1980 J. Clin. Invest. 65 939–942
3[3] Buzsaki G and Draguhn A 2004 Science 304 1926–1929
4[4] Buck J 1988 Q. Rev. Biol. 63 265–289
5[5] Tinsley M R, Nkomo S and Showalter K 2012 Nat. Phys. 8 662–665
6[6] Hadley P, Beasley M and Wiesenfeld K 1988 Phys. Rev. B 38 8712–8719
7[7] Motter A E, Myers S A, Anghel M and Nishikawa T 2013 Nat. Phys. 9 191–197
8[8] Pantaleone J 2002 Am. J. Phys. 70 992–1000