A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation

Kun Wang; WaiChing Sun; Qiang Du

arXiv:1903.04307·cs.LG·April 15, 2020


TL;DR

This paper presents a multi-agent AI framework that automates the discovery and optimization of elasto-plastic material models through graph-based modeling, reinforcement learning, and AI-guided experimentation, emulating scientific collaboration.

Contribution

It introduces a novel graph-theoretic approach combined with reinforcement learning for automated model selection and experiment design in material science.

Findings

01 Successful automatic generation of constitutive models

02 Effective optimization of experimental design

03 Demonstration of AI-guided collaboration in modeling


Tables

Table 1: Five classes of the constitutive models generated during the deep reinforcement learning.

| Model Class | Number of Models | Mean Score | Standard Deviation | Generalized Plasticity 'GP' | Critical State 'CS' | Classical Pressure-Dependent Elasto-Plasticity 'DP' | Others 'O' |
|---|---|---|---|---|---|---|---|
| 1 | 22 | 0.603 | 0.054 | ✓ | ✓ | | |
| 2 | 25 | 0.565 | 0.051 | ✓ | | | |
| 3 | 13 | 0.295 | 0.028 | | ✓ | ✓ | |
| 4 | 19 | 0.450 | 0.086 | | | ✓ | |
| 5 | 33 | 0.163 | 0.063 | | | | ✓ |

Equations

T_{n}

T_{t}

\overline{\Delta}

\overline{T}(\overline{\Delta})

\phi^{f} = \phi_{o}^{f} (1 + \Delta_{n} \Delta_{t})

\mathbb{V}

\mathbb{E}

\mathbb{E}_{1}

\mathbb{E}_{2}

\mathbb{E}_{3}

\mathbb{L_{V}}

\mathbb{L_{E}}

\underset{l_{i}}{\text{maximize}}

f_{i} (i_{i}) = 0, \quad i = 1, \dots, m .

r(s) = \begin{cases} 1, & \text{if } T_{c} \in T_{c}^{max} \text{ and } COST(T_{c}) \leq COST(\forall T_{c}^{i} \in T_{c}^{max}) \\ 0, & \text{otherwise}, \end{cases}

\dot{\boldsymbol{\epsilon}} = \begin{bmatrix} \dot{\epsilon}_{11} & \dot{\epsilon}_{12} & \dot{\epsilon}_{13} \\ & \dot{\epsilon}_{22} & \dot{\epsilon}_{23} \\ \text{sym} & & \dot{\epsilon}_{33} \end{bmatrix}, \quad \dot{\boldsymbol{\sigma}} = \begin{bmatrix} \dot{\sigma}_{11} & \dot{\sigma}_{12} & \dot{\sigma}_{13} \\ & \dot{\sigma}_{22} & \dot{\sigma}_{23} \\ \text{sym} & & \dot{\sigma}_{33} \end{bmatrix} .

\Delta \boldsymbol{\sigma}_{n+1}

\Delta \boldsymbol{\epsilon}_{n+1}^{e}

\Delta \boldsymbol{\epsilon}_{n+1}^{p}

\Delta \lambda_{n+1}

\begin{cases} \Delta \boldsymbol{\sigma}_{n+1}^{e} : \boldsymbol{n}_{n}^{load} \neq 0 & \to \text{plastic loading} \\ \Delta \boldsymbol{\sigma}_{n+1}^{e} : \boldsymbol{n}_{n}^{load} = 0 & \to \text{elastic loading}, \end{cases}

\begin{cases} f(\boldsymbol{\sigma}_{n} + \Delta \boldsymbol{\sigma}_{n+1}^{e}, q_{n}^{piv}(\xi_{n}^{piv})) > 0 & \to \text{plastic loading} \\ f(\boldsymbol{\sigma}_{n} + \Delta \boldsymbol{\sigma}_{n+1}^{e}, q_{n}^{piv}(\xi_{n}^{piv})) \leq 0 & \to \text{elastic loading}, \end{cases}

⑤ \begin{cases} \lambda_{n} = \int_{0}^{t_{n}} \dot{\lambda} \, dt \\ \bar{\epsilon}_{n}^{p} = \int_{0}^{t_{n}} || \dot{\boldsymbol{\epsilon}}^{p} || \, dt \\ \bar{\epsilon}_{v_{n}}^{p} = \int_{0}^{t_{n}} \text{tr}(\dot{\boldsymbol{\epsilon}}^{p}) \, dt \\ \bar{\epsilon}_{s_{n}}^{p} = \int_{0}^{t_{n}} || \dot{\boldsymbol{\epsilon}}^{p} - \frac{1}{3} \text{tr}(\dot{\boldsymbol{\epsilon}}^{p}) || \, dt \\ e_{n} = e_{0} + \int_{0}^{t_{n}} \dot{e} \, dt = e_{0} + \int_{0}^{t_{n}} (1 + e) \dot{\epsilon}_{v} \, dt, \end{cases}

⑥ \begin{cases} p_{n} = \frac{\text{tr}(\boldsymbol{\sigma}_{n})}{3} \\ q_{n} = \sqrt{3 J_{2}} = \sqrt{\frac{3}{2}} || \boldsymbol{s}_{n} || \\ \theta_{n} = \frac{1}{3} \sin^{-1} \left( - \frac{3 \sqrt{3}}{2} \frac{J_{3}}{J_{2}^{3/2}} \right), \quad - \frac{\pi}{6} \leq \theta \leq \frac{\pi}{6} \end{cases}

\boldsymbol{n}^{load} = \frac{\partial f}{\partial \boldsymbol{\sigma}} \left|\left| \frac{\partial f}{\partial \boldsymbol{\sigma}} \right|\right|^{-1},

\boldsymbol{n}^{load} = n_{v}^{load} \boldsymbol{n}_{v} + n_{s}^{load} \boldsymbol{n}_{s} .

\begin{cases} \boldsymbol{n}_{v} = \frac{\partial p}{\partial \boldsymbol{\sigma}} = \frac{1}{3} \boldsymbol{I} \\ \boldsymbol{n}_{s} = \frac{\partial q}{\partial \boldsymbol{\sigma}} = \frac{\sqrt{3}}{2 \sqrt{J_{2}}} \boldsymbol{S} . \end{cases}

\boldsymbol{m}^{flow} = \frac{\partial g}{\partial \boldsymbol{\sigma}} \left|\left| \frac{\partial g}{\partial \boldsymbol{\sigma}} \right|\right|^{-1} .

\boldsymbol{m}^{flow} = m_{v}^{flow} \boldsymbol{n}_{v} + m_{s}^{flow} \boldsymbol{n}_{s} .

SCORE = F(A_{1}, A_{2}, A_{3}, ..., A_{n}),

SCORE = \left( \prod_{j=1}^{n_{crit}} A_{j}^{crit} \right) \cdot \left( \sum_{i=1}^{n_{pfm}} w_{i} A_{i}^{pfm} \right),

MSE_{i} = \frac{1}{N_{feature}} \sum_{j=1}^{N_{feature}} \left[ S_{j}(Y_{i_{j}}^{data}) - S_{j}(Y_{i_{j}}^{model}) \right]^{2},

F_{N}(MSE) = \begin{cases} 0, & MSE < MSE_{1}, \\ \frac{r}{N}, & MSE_{r} \leq MSE < MSE_{r+1}, \quad r = 1, ..., N - 1, \\ 1, & MSE_{N} \leq MSE, \end{cases}
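As a hedged numerical illustration of the stress invariants listed above (mean stress $p$, deviatoric stress measure $q=\sqrt{3J_{2}}$ and Lode angle $\theta$), the following sketch evaluates them with NumPy. The function name `stress_invariants` and the convention of returning $\theta=0$ for a purely hydrostatic state are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def stress_invariants(sigma):
    """Return (p, q, theta) for a symmetric 3x3 stress tensor (tension positive)."""
    p = np.trace(sigma) / 3.0                 # mean stress p = tr(sigma) / 3
    s = sigma - p * np.eye(3)                 # deviatoric stress
    J2 = 0.5 * np.tensordot(s, s)             # J2 = (1/2) s : s
    J3 = np.linalg.det(s)                     # J3 = det(s)
    q = np.sqrt(3.0 * J2)                     # q = sqrt(3 J2) = sqrt(3/2) ||s||
    if J2 == 0.0:
        # Assumption: the Lode angle is undefined here, so 0 is returned by convention.
        return p, q, 0.0
    arg = -1.5 * np.sqrt(3.0) * J3 / J2**1.5
    # Clip guards against round-off pushing the argument slightly outside [-1, 1].
    theta = np.arcsin(np.clip(arg, -1.0, 1.0)) / 3.0
    return p, q, theta

# Uniaxial tension and pure shear as two simple checks.
p_u, q_u, theta_u = stress_invariants(np.diag([1.0, 0.0, 0.0]))
p_s, q_s, theta_s = stress_invariants(np.array([[0.0, 1.0, 0.0],
                                                [1.0, 0.0, 0.0],
                                                [0.0, 0.0, 0.0]]))
```

For unit uniaxial tension the sketch gives $q=1$ and a Lode angle at the lower bound $-\pi/6$, while pure shear gives $\theta=0$, both inside the admissible range $-\pi/6\leq\theta\leq\pi/6$.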


Full text


Kun Wang Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027. [email protected]

WaiChing Sun Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027. [email protected] (corresponding author)

Qiang Du Department of Applied Physics and Applied Mathematics, and Data Science Institute, Columbia University, New York, NY 10027. [email protected]

Abstract

We introduce a multi-agent meta-modeling game to generate data, knowledge, and models that make predictions on constitutive responses of elasto-plastic materials. We introduce a new concept from graph theory where a modeler agent is tasked with evaluating all the modeling options recast as a directed multigraph and finding the optimal path that links the source of the directed graph (e.g. strain history) to the target (e.g. stress) measured by an objective function. Meanwhile, the data agent, which is tasked with generating data from real or virtual experiments (e.g. molecular dynamics, discrete element simulations), interacts with the modeling agent sequentially and uses reinforcement learning to design new experiments to optimize the prediction capacity. Consequently, this treatment enables us to emulate an idealized scientific collaboration as selections of the optimal choices in a decision tree search done automatically via deep reinforcement learning.

1 Introduction

In single-physics solid mechanics problems, the balance of linear momentum is often used to provide constraints for the motion of a body in the space-time continuum, while a constitutive law is often supplied to replicate constitutive responses at a selected material point of the body. Many successful commercial and open-source codes now introduce mechanisms or gateways that simplify the incorporation of material point constitutive models into predefined solid mechanics solvers (e.g. UMAT in ABAQUS) (Hibbitt et al., 2001; Rutqvist et al., 2011; Sun et al., 2013b; Sun, 2015; Salinger et al., 2016; Wollny et al., 2017; Na and Sun, 2018; Choo and Sun, 2018b, a). Once a constitutive law is formulated, algorithms are then designed to approximate the mathematical model such that a computer can be used to run simulations. The algorithms that approximate or enforce the constitutive laws are then verified, validated and eventually used in engineering practice (Kirchdoerfer and Ortiz, 2016; Ibañez et al., 2018; Wang and Sun, 2018).

Conventionally, a constitutive model that replicates the relation between the kinetic and kinematic quantities is derived from a finite set of fundamental principles, assumptions and phenomenological equations (Truesdell, 1959; Truesdell and Noll, 2004). For instance, the laws of thermodynamics, material frame indifference, and balance laws are universal principles that are widely believed to be true for all materials under common circumstances. After enforcing those universal principles, there often remains a finite set of choices a modeler can make to construct a constitutive model. In particular, different types of experiments are designed such that a proper set of additional constraints can be generated. These constraints may not be fully explained by universal principles but are added to ensure the compatibility between observed and simulated mechanical responses. In reality, the universal principles alone are insufficient to complete most of the constitutive laws, regardless of the spatial scales they are designed for. As a result, phenomenological relations are introduced such that all the constraints imposed by principles and observations can be enforced.

1.1 Rationales of phenomenological relations

Even though phenomenological relations cannot be fully justified via universal-principle arguments, proposers of these relations often seek justification by introducing new theories or incorporating microstructural information as constraints. For instance, the most commonly used family of soil models, critical-state plasticity, relies on the existence of a critical state line in the state path (i.e. the plot of specific volume against the natural logarithm of effective mean pressure) such that soil in the numerical simulations may develop plastic shear strain without volumetric deformation when reaching the critical state and exhibit plastic dilatancy or contraction at different void ratios and overconsolidation ratios (Schofield and Wroth, 1968; Casagrande, 1976; Been et al., 1991; Ling and Yang, 2006; Sun, 2013; Na and Sun, 2017). Experimental evidence is then sought to either justify the claim (cf. Wood (1990)) or redefine the applicability of the theory in light of new evidence (cf. Mooney et al. (1998); Li and Dafalias (2011); Zhao and Guo (2013)). The incorporation of fabric tensors in critical state plasticity is another example where sub-scale information is incorporated to enhance forward prediction quality (Li and Dafalias, 2011; Wang and Sun, 2016). Other types of information incorporated into the constitutive law may come from microstructural attributes or the kinematics of microstructures. A classical example is crystal plasticity, where the kinematics of the plastic flow is restricted by the orientations of the slip systems (Asaro, 1983; Miehe and Schröder, 2001; Borja, 2013; Na and Sun, 2018).

Finally, for practical reasons or due to a lack of sufficient experimental evidence to prove otherwise, assumptions are sometimes made to interpret a phenomenological relation. A classical example of this type of phenomenological approach is the effective displacement theory commonly used in traction-separation models, where one assumes that a scalar kinematic measure, often a weighted norm of normal and tangential displacements, can be used to determine a scalar traction measure that leads to the traction vector (Ortiz and Pandolfi, 1999; Park and Paulino, 2011). Nevertheless, the distinction between phenomenology that only enhances curve-fitting in calibration and the counterpart that leads to more accurate, robust and reliable predictions is often a blurred line and might be subject to debate (Truesdell and Noll, 2004; Wang et al., 2016; Wang and Sun, 2019b, a). Furthermore, the popularity of a model in the short term is not necessarily based purely on the prediction quality, but is also tied to the difficulty in calibration and interpretation of the model (Lange, 2012), the demand for experimental data (Wang et al., 2016; Olivier and Smyth, 2018), as well as social, cultural and personal influences (cf. Malmgren et al. (2010)), among other factors. In the case where a limited subset of data is chosen to make a constitutive law or theory sound plausible or consistent with a physical phenomenon, the true forward prediction quality of the model might suffer while the apparent capacity of the model could be exaggerated (Munafò et al., 2017). The underlying problem is that this issue is very difficult to detect unless all the models are compared objectively in the same benchmark study and subjected to a universally agreed validation metric (Boyce et al., 2014; Pack et al., 2014). Hence, a validation procedure that employs blind predictions is critical, regardless of the type of models used for predictions.

1.2 Data-driven approaches as alternatives

An alternative to the conventional modeling approach is data-driven modeling, in which constitutive responses are predicted primarily based on the available data, either by black-box neural networks (Furukawa and Yagawa, 1998; Ghaboussi et al., 1998; Lefik and Schrefler, 2003; Wang and Sun, 2017, 2018) or via minimization problems in the phase space (Kirchdoerfer and Ortiz, 2016). While the latter approach, as outlined in Kirchdoerfer and Ortiz (2016) and Kirchdoerfer and Ortiz (2017), has shown great promise in handling hyperelasticity problems, the extension to plasticity problems likely requires either imposing further constraints (e.g. perfect plasticity, Ibañez et al. (2018)) or creating a sufficiently large database to capture the phase space of the history-dependent responses. On the other hand, Lefik and Schrefler (2003) have demonstrated that a neural network can generate cyclic elasto-plastic responses with some level of success. Nevertheless, despite the fact that a multi-layer neural network can be considered a universal approximator, as pointed out in Hornik et al. (1989), this does not imply that the training of the neural network is always successful. In fact, failure to complete the training is quite common and might be caused by, for example, (1) a higher demand for data in neural network training compared to the material parameter identification in conventional modeling, (2) the curse of high dimensionality that leads to inconsistency between calibration and forward prediction performance, (3) issues related to under-fitting and over-fitting, and (4) the vanishing gradient issues that make the algorithm unable to locate the global minimizer of the loss function (Wang and Sun, 2018). Furthermore, without special treatment to extend the database for training the neural network, the resultant models often exhibit dependence on coordinate systems. Even though this issue has been addressed recently using the spectral decomposition of tensorial inputs and outputs in recurrent neural networks (Wang and Sun, 2018), this lack of consistency with theory indicates that domain expertise remains critical for evaluating the quality of the machine learning model and finding remedies for issues not immediately apparent to nonspecialists.

In the aforementioned data-driven approach, the demand for big data remains an ongoing challenge (Smith et al., 2016; Tang et al., 2018; Liu et al., 2018). In particular, machine learning models, especially those in the most generic forms (i.e. model-free approaches), may suffer from a lack of constraints imposed by material theory, thus increasing the demand for data to generate the constraints. Hence, it is important for modelers to be able to estimate the least amount of data required to complete the training of a specific model. The introduction of the two-player cooperative game in this paper can provide a practical solution to find the required amount and type of data for path-dependent materials.

1.3 The hybridized theoretical/data-driven approach

In this paper, our goal is to (1) introduce a meta-modeling method to generate algorithms that hybridize theory, phenomenological relations, and universal principles to automatically generate constitutive laws that fulfill a specific objective defined by the loss (objective function) in a quantitatively optimal manner and (2) incorporate the reinforcement learning technique to select experiments that lead to improvement in prediction capacity. We do not limit ourselves to the approach in which the neural network model is either used to replace the entire constitutive law or not used at all (cf. Ghaboussi et al., 1991; Kirchdoerfer and Ortiz, 2016; Wang and Sun, 2018). Instead, our goal is to find the optimal way, out of all the possible choices, to construct a constitutive law for a given set of material data.

To reach our goal, we employ two techniques of discrete mathematics that are less commonly used in computational mechanics, the directed multigraph and decision tree learning. First, the directed multigraph is used to recast the available choices of constitutive laws as a family of possible ways to configure a graph of information flow from the upstream (the source or input, such as the relative displacement or strain) to the downstream (the target or output, such as the traction or stress). A model is a path (in the terminology of graph theory) of this directed multigraph that optimizes an objective function. As such, a model is associated with a collection of physical quantities (vertices in the directed graph) linked by either mathematical expressions or machine learning models that connect the upstream to the downstream (edges in the directed graph) (cf. Wang and Sun (2018)).

Within our framework, a black-box neural network model, for instance, is simply a model in which there are no human-interpretable quantities connecting the input and output. Many classical neural network models, such as Ghaboussi et al. (1991), Lefik and Schrefler (2003) and Wang and Sun (2018), belong to this category, as neurons are the only media that propagate the information flow. Meanwhile, a classical theory-based constitutive law can be viewed as a directed graph (or a particular path of the directed multigraph) in which all the edges are mappings that can be written as mathematical expressions formulated by humans. On the other hand, a hybridized model could have a subset of neural network edges while the remaining edges are theory-based.

Since the optimal configuration of the directed graph for a given problem and the corresponding objective function is not known a priori, we introduce mechanisms to hierarchically explore the possible modeling choices using a decision tree. A decision tree is simply an explicit representation of all possible scenarios such that the sequence of decisions (in our case the modeling choices and data explorations) is evaluated by an agent who then takes account of the possible observations (e.g. experimental observations), and state changes (e.g. the changes of validation metrics or loss function values) to estimate the best choices.

In this work, our major contributions are threefold. First, we introduce the concept of the directed multigraph to enable the hybridization of theory-based and data-driven models to yield optimal forward predictions. Second, we create a model to emulate the process of formulating constitutive laws as an optimization problem for modeling choices, rather than performances. This treatment gives us hierarchical information that helps understand the causal connection among events and mechanisms. The importance of the multigraph is that it enables us to form complex ideas, knowledge, predictions, inferences and responses with a rather small set of simple elements. This kind of application of the principle of combinatorial generalization has long been regarded as the key signature of intelligence (Chomsky, 1965; Humboldt, 1999; Battaglia et al., 2018). Third, we introduce a cooperative mechanism to integrate the data exploration into the modeling process. In this way, the framework can not only generate constitutive models that make the best predictions from the limited data, but also estimate the most efficient way to select experiments such that the most needed information is included to generate the knowledge closure.

1.4 Content organization

The rest of the paper is organized as follows. We first introduce the meta-modeling cooperative game, including the method to recast a model as a directed multigraph and the generation of the decision tree (Section 2). Following this, we introduce the detailed design of the data collection/meta-modeling game for modeling the collaboration of the AI data agent and the AI modeler agent (Section 3). In Section 4, we then review the multi-agent reinforcement learning algorithms that enable us to find the optimal decision for constitutive models, as well as the corresponding optimal actions the data agent takes to maximize the prediction quality of the AI-generated model. We then present numerical experiments to assess the accuracy and robustness of the blind predictions of the model generated via our meta-modeling algorithm operated on the directed multigraph. To check whether our approach is able to deal with a wide spectrum of situations and can be generalized for different materials, the multigraph meta-modeling algorithm is tested with distinctive types of data (e.g. synthetic data from elasto-plastic models and discrete element simulations). To aid the reproducibility of our numerical experiments by third parties, these data will be open source upon the publication of this article.

1.5 Notations and terminologies

For convenience, we provide a minimal review of the essential terminologies and concepts from graph theory that are used throughout this paper. Their definitions can be found in, for instance, Graham et al. (1989); West et al. (2001); Bang-Jensen and Gutin (2008).

Definition 1.

An $\boldsymbol{n}$-tuple is a sequence or ordered list that consists of $n$ elements, where $n$ is a non-negative integer, and that (unlike a set) may contain multiple instances of the same element.

Definition 2.

A directed graph (digraph) is an ordered pair (2-tuple) $\mathbb{G}=(\mathbb{V},\mathbb{E})$ where $\mathbb{V}$ is a nonempty set of vertices and $\mathbb{E}$ is a set of ordered pairs of vertices (directed edges) where each edge in $\mathbb{E}$ connects a pair of source (beginning) and target (end) vertices in a specific direction. Both vertices connected by an edge in $\mathbb{E}$ must be elements of $\mathbb{V}$ and the edge connecting them must be unique.

Definition 3.

A directed acyclic graph is a directed graph where edges do not form any directed cycle. In a directed acyclic graph, there is no path that can start from a vertex and eventually loop back to the same vertex.
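As a minimal sketch of Definition 3 (an illustration added here, not a procedure from the paper), acyclicity of a digraph can be checked with Kahn's topological sorting algorithm. The function name `is_acyclic`, the edge-list representation and the vertex names are assumptions chosen for this example.

```python
from collections import defaultdict, deque

def is_acyclic(vertices, edges):
    """Return True if the digraph (vertices, edges) has no directed cycle.

    Kahn's algorithm: repeatedly remove vertices that have no incoming edges.
    `edges` is an iterable of (source, target) pairs over `vertices`.
    """
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for s, t in edges:
        successors[s].append(t)
        indegree[t] += 1
    queue = deque(v for v in indegree if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # All vertices can be removed only if no directed cycle exists.
    return removed == len(indegree)

# Hypothetical information flow from strain to stress (acyclic).
dag_ok = is_acyclic(
    ["strain", "porosity", "stress"],
    [("strain", "porosity"), ("porosity", "stress"), ("strain", "stress")],
)
# Two vertices pointing at each other form a directed cycle.
has_loop = not is_acyclic(["a", "b"], [("a", "b"), ("b", "a")])
```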

Definition 4.

A directed multigraph with a distinctive edge identity (also called multi digraph) is an ordered 4-tuple $\mathbb{G}=(\mathbb{V},\mathbb{E},\boldsymbol{s},\boldsymbol{t})$ where $\mathbb{V}$ is a set of vertices, $\mathbb{E}$ is a set of edges that connect source and target vertices, $\boldsymbol{s}:\mathbb{E}\rightarrow\mathbb{V}$ is a mapping that maps each edge to its source node, and $\boldsymbol{t}:\mathbb{E}\rightarrow\mathbb{V}$ is a mapping that maps each edge to its target node.
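The 4-tuple in Definition 4 can be illustrated with a small data structure in which every edge carries its own identity, so that two vertices may be linked by several parallel edges. This is only a sketch under stated assumptions: the class name `DirectedMultigraph`, its methods, and the edge identities `kozeny_carman` and `neural_network` (standing for two competing porosity-permeability relations) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class DirectedMultigraph:
    """G = (V, E, s, t) with distinct edge identities (parallel edges allowed)."""
    vertices: set = field(default_factory=set)    # V
    edges: set = field(default_factory=set)       # E (edge identities)
    source: dict = field(default_factory=dict)    # s: E -> V
    target: dict = field(default_factory=dict)    # t: E -> V

    def add_edge(self, edge_id, src, dst):
        """Register a new edge identity and its source and target vertices."""
        if edge_id in self.edges:
            raise ValueError(f"edge identity {edge_id!r} already used")
        self.vertices.update((src, dst))
        self.edges.add(edge_id)
        self.source[edge_id] = src
        self.target[edge_id] = dst

    def parallel_edges(self, src, dst):
        """Return all edge identities that map src to dst, in sorted order."""
        return sorted(e for e in self.edges
                      if self.source[e] == src and self.target[e] == dst)

g = DirectedMultigraph()
g.add_edge("kozeny_carman", "porosity", "permeability")
g.add_edge("neural_network", "porosity", "permeability")
alternatives = g.parallel_edges("porosity", "permeability")
```

Storing $\boldsymbol{s}$ and $\boldsymbol{t}$ as separate mappings, rather than storing edges as vertex pairs, is what allows two edges with the same end points to remain distinguishable.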

Definition 5.

An underlying graph $\mathbb{U}$ of a directed multigraph $\mathbb{G}$ is a multigraph whose edges are without directions.

Definition 6.

A subgraph $\mathbb{G}^{\prime}$ of a directed multigraph $\mathbb{G}$ is a directed multigraph whose vertex set $\mathbb{V}^{\prime}$ is a subset of $\mathbb{V}$ ( $\mathbb{V}^{\prime}\subseteq\mathbb{V}$ ), and whose edge set $\mathbb{E}^{\prime}$ is a subset of $\mathbb{E}$ ( $\mathbb{E}^{\prime}\subseteq\mathbb{E}$ ).

Definition 7.

A labeled directed multigraph is a directed multigraph with labeled vertices and edges which can be mathematically expressed as an 8-tuple $\mathbb{G}=(\mathbb{L_{V}},\mathbb{L_{E}},\mathbb{V},\mathbb{E},\boldsymbol{s},\boldsymbol{t},\boldsymbol{n_{V}},\boldsymbol{n_{E}})$ where $\mathbb{V}$ and $\mathbb{E}$ are the sets of vertices and edges, $\mathbb{L_{V}}$ and $\mathbb{L_{E}}$ are the sets of labels for the vertices and edges, $s:\mathbb{E}\rightarrow\mathbb{V}$ and $t:\mathbb{E}\rightarrow\mathbb{V}$ are the mappings that map the edges to the source and target vertices, and $n_{V}:\mathbb{V}\rightarrow\mathbb{L_{V}}$ and $n_{E}:\mathbb{E}\rightarrow\mathbb{L_{E}}$ are the mappings that give the vertices and edges the corresponding labels in $\mathbb{L_{V}}$ and $\mathbb{L_{E}}$, respectively.

As for notations and symbols, bold-faced letters denote tensors (including vectors, which are rank-one tensors); the symbol ‘$\cdot$’ denotes a single contraction of adjacent indices of two tensors (e.g. $\boldsymbol{a}\cdot\boldsymbol{b}=a_{i}b_{i}$ or $(\boldsymbol{c}\cdot\boldsymbol{d})_{ik}=c_{ij}d_{jk}$); the symbol ‘:’ denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g. $(\boldsymbol{C}:\boldsymbol{\epsilon^{e}})_{ij}=C_{ijkl}\epsilon_{kl}^{e}$); the symbol ‘$\otimes$’ denotes a juxtaposition of two vectors (e.g. $(\boldsymbol{a}\otimes\boldsymbol{b})_{ij}=a_{i}b_{j}$) or two symmetric second order tensors (e.g. $(\boldsymbol{\alpha}\otimes\boldsymbol{\beta})_{ijkl}=\alpha_{ij}\beta_{kl}$). Moreover, $(\boldsymbol{\alpha}\oplus\boldsymbol{\beta})_{ijkl}=\alpha_{jl}\beta_{ik}$ and $(\boldsymbol{\alpha}\ominus\boldsymbol{\beta})_{ijkl}=\alpha_{il}\beta_{jk}$. We also define identity tensors $(\boldsymbol{I})_{ij}=\delta_{ij}$, $(\boldsymbol{I}^{4})_{ijkl}=\delta_{ik}\delta_{jl}$, and $(\boldsymbol{I}^{4}_{\text{sym}})_{ijkl}=\frac{1}{2}(\delta_{ik}\delta_{jl}+\delta_{il}\delta_{kj})$, where $\delta_{ij}$ is the Kronecker delta. As for sign conventions, unless specified otherwise, we consider the direction of the tensile stress and dilative pressure as positive.
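The index conventions above map directly onto `numpy.einsum` subscript strings. The following sketch, with randomly generated arrays and variable names that are assumptions of this example, writes out each product and the two fourth-order identity tensors.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.random(3), rng.random(3)              # vectors
c, d = rng.random((3, 3)), rng.random((3, 3))    # second-order tensors
C = rng.random((3, 3, 3, 3))                     # fourth-order tensor
eps = rng.random((3, 3))                         # strain-like second-order tensor

single_vec = np.einsum("i,i->", a, b)            # a . b = a_i b_i
single_ten = np.einsum("ij,jk->ik", c, d)        # (c . d)_ik = c_ij d_jk
double = np.einsum("ijkl,kl->ij", C, eps)        # (C : eps)_ij = C_ijkl eps_kl
dyad_vec = np.einsum("i,j->ij", a, b)            # (a (x) b)_ij = a_i b_j
dyad_ten = np.einsum("ij,kl->ijkl", c, d)        # (c (x) d)_ijkl = c_ij d_kl
oplus = np.einsum("jl,ik->ijkl", c, d)           # (c (+) d)_ijkl = c_jl d_ik
ominus = np.einsum("il,jk->ijkl", c, d)          # (c (-) d)_ijkl = c_il d_jk

I2 = np.eye(3)                                   # (I)_ij = delta_ij
I4 = np.einsum("ik,jl->ijkl", I2, I2)            # delta_ik delta_jl
I4_sym = 0.5 * (I4 + np.einsum("il,kj->ijkl", I2, I2))
```

Double contraction with `I4` returns a second-order tensor unchanged, whereas `I4_sym` returns its symmetric part, which is a quick consistency check on the index placement.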

2 Meta-modeling: deriving material laws from a directed multigraph

In this section, we describe the concepts behind the proposed automated meta-modeling procedure and the mechanism of the learning process. The key departures of the proposed method from the neural network approaches for constitutive laws (e.g. Ghaboussi et al. (1991, 1998); Lefik and Schrefler (2002, 2003); Wang and Sun (2018)) are the introduction of (1) the labeled directed multigraph that represents all possible theories under consideration for modeling a physical process, (2) the directed acyclic graph that represents the most plausible knowledge on the relationships among physical quantities, and (3) the data agent, which enables users to estimate the amount of data required to reach the point where additional information no longer enhances prediction capacity for a given action space. In this paper, our focus is limited to the class of materials that exhibit elasto-plastic responses and for which damage can be neglected. We assume that the deformation is infinitesimal and the material is under isothermal conditions. The proposed methodology, however, can be extended to other more complex materials.

2.1 Material modeling algorithm as a directed multigraph

The architecture of an algorithm is often considered as a directed multigraph (Dabrowski et al., 2011). In essence, a material model can be thought of as a procedure that employs organized knowledge to make predictions, such that the relationships among components and the universally accepted principles govern the outcomes of predictions. For instance, we may consider a traction-separation model as an information flow in a directed graph where physical attributes, such as porosity, plastic flow and permeability, are considered as vertices and their relationships are considered as edges (Wang and Sun, 2018). The input and output of the models (e.g. relative displacement history and traction) are then considered as the sources and targets of the directed graph.

However, in some circumstances, a physical relation can be modeled by more than one method, theory or constitutive relation. To reflect the availability of options, a generalized representation of the thought process is needed when we try to use an artificial intelligence algorithm in place of a human to write constitutive models. This generalized thought process, which we refer to as meta-modeling (i.e. modeling the process of writing a model), can be recast as a labeled directed multigraph. In such a multigraph, a pair of connected vertices is not necessarily connected by one edge but possibly by multiple edges, each of which represents a specific model that connects two physical quantities (e.g. a porosity-permeability relationship). A formal statement can be written as follows:

Possible configurations of constitutive laws as a labeled directed multigraph. Consider a data set that measures a set of physical quantities $\mathbb{V}$ with a corresponding set of labels $\mathbb{L_{V}}$, where $\boldsymbol{n}_{\mathbb{V}}:\mathbb{V}\rightarrow\mathbb{L_{V}}$ is a bijective mapping that maps the vertices to the labels. Let $\mathbb{V}_{R}\subset\mathbb{V}$ and $\mathbb{V}_{L}\subset\mathbb{V}$ be the source(s) and target(s) of the directed multigraph. All possible ways to write constitutive laws that map the input $\mathbb{V}_{R}$ (e.g. strain history) to the output $\mathbb{V}_{L}$ (e.g. stress) as information flow can be defined by the set of directed edges $\mathbb{E}$, where each edge links two physical quantities, the mappings $\boldsymbol{s}:\mathbb{E}\rightarrow\mathbb{V}$ and $\boldsymbol{t}:\mathbb{E}\rightarrow\mathbb{V}$ that provide the direction of the information flow, and the surjective mapping $\boldsymbol{n}_{\mathbb{E}}:\mathbb{E}\rightarrow\mathbb{L_{E}}$ that assigns the edge labels (names) to the edges.

Example 1.

Traction-separation Law. Given a pre-defined objective function, assume that the only known theoretical traction-separation models incorporated in the labeled directed multigraph are the Tvergaard model (cf. Tvergaard (1990)) and the Ortiz-Pandolfi model (cf. Pandolfi et al. (1999)). In addition, we also consider using a neural network that incorporates porosity to predict traction-separation relations. Define the labeled directed multigraph that provides all the options available.

First, we convert the traction-separation laws into directed graphs where the relative displacement vector is the input and the traction is the output. Notice that both Tvergaard (1990) and Pandolfi et al. (1999) are effective displacement models where an effective displacement $\overline{\Delta}$ is used as additional input to predict the traction. In Tvergaard (1990),

[TABLE]

and the effective displacement and effective traction are scalars defined as,

[TABLE]

where $\delta_{n}$ and $\delta_{t}$ are the characteristic lengths corresponding to the fracture energy and cohesive strength of the normal and tangential opening modes, and $\alpha$ is a non-dimensional material parameter. As pointed out in Park and Paulino (2011), the traction-separation model in Pandolfi et al. (1999) can be expressed in the forms of Eq. (1) and (2) with the alternative definition of effective displacement and traction-separation law, i.e.,

[TABLE]

where $k$ is typically negative and $c$ is the effective cohesive strength. Finally, we consider a neural network model in which the traction depends on the porosity $\phi^{f}$ (Coussy, 2004; Sun et al., 2013a; Wang and Sun, 2016), i.e.,

[TABLE]

where the exact expressions of the functions $f^{\text{LSTM}}$ and $g^{\text{LSTM}}$ are determined by adjusting the weights of the neurons in the recurrent neural network (Koeppe et al., 2017; Wang and Sun, 2018). Assuming that the solid constituent is incompressible, the porosity reads,

[TABLE]

The multigraph that combines all the possible choices of the three traction-separation laws can therefore be defined through the multigraph statement with the following sets,

[TABLE]

Since $\boldsymbol{n}_{\mathbb{V}}$ is a bijective mapping, the labeling of the vertices is trivial. The rest of the mappings, i.e. $\boldsymbol{s}$ , $\boldsymbol{t}$ and $\boldsymbol{n}_{\mathbb{E}}$ can be visualized in a labeled directed multigraph as shown in Figure 1. Essentially, the process of creating the directed multigraph is to mathematically represent all the possible options modelers can have when they are tasked to create a constitutive model for a data set. ∎
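To make the structure of such a labeled directed multigraph concrete, the sketch below stores each edge as a (source, target, label) triple, so that the mappings $\boldsymbol{s}$, $\boldsymbol{t}$ and $\boldsymbol{n}_{\mathbb{E}}$ become simple field accessors. The vertex names, edge labels and connectivity are hypothetical placeholders chosen for illustration; they are not the sets of the example above or the content of Figure 1.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str  # value of the mapping s
    target: str  # value of the mapping t
    label: str   # value of the mapping n_E


# Hypothetical vertex labels (the vertex labeling is bijective, so a
# vertex is identified with its label here).
VERTICES = {"displacement jump", "effective displacement", "porosity", "traction"}

# Hypothetical edge set: parallel edges share source and target but
# carry different labels, one per candidate model.
EDGES = [
    Edge("displacement jump", "effective displacement", "Tvergaard definition"),
    Edge("displacement jump", "effective displacement", "Ortiz-Pandolfi definition"),
    Edge("effective displacement", "traction", "Tvergaard law"),
    Edge("effective displacement", "traction", "Ortiz-Pandolfi law"),
    Edge("displacement jump", "porosity", "porosity update"),
    Edge("porosity", "traction", "LSTM"),
    Edge("displacement jump", "traction", "LSTM"),
]


def parallel_edges(u: str, v: str) -> list[str]:
    """Labels of all edges pointing from vertex u to vertex v."""
    return [e.label for e in EDGES if e.source == u and e.target == v]


# Surjectivity of n_E: two different edges may share the same label.
EDGE_LABELS = {e.label for e in EDGES}
```

In this toy graph, `parallel_edges("effective displacement", "traction")` returns two labels, which is exactly the situation a simple directed graph cannot represent.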

2.2 Recasting the process of writing constitutive laws as selecting subgraphs in a directed multigraph

In the first meta-modeling game introduced in this work, we consider a scenario where a set of experimental data is given. These experimental data include measurements of different physical quantities, but the inherent relationships are unknown to the modeler. Furthermore, in the process of writing the constitutive law, the modeler must follow a set of rules coined as universal principles (e.g. thermodynamic principles, material frame indifference) (Kirchdoerfer and Ortiz, 2016; Wang and Sun, 2018). Here, we first assume that the objective of writing the constitutive model is well defined and hence a score system is available for the deep Q-learning. We then idealize the process of writing a constitutive law with a fixed set of data as a two-step process. First, we consider all the possible ways to write a constitutive law and represent all these possibilities in a labeled directed multigraph. This labeled directed multigraph defines the action space of the meta-modeling game. Second, among all the possible ways to write a constitutive law, i.e., on the labeled directed multigraph, we seek the optimal configuration that will lead to the best outcome measured by an objective function. If the total number of possible configurations is sufficiently small, then the optimal configuration can be sought by building all the possible configurations and comparing their performance afterward. However, this brute force approach becomes infeasible when the total number of configurations is very large, as in the games of chess and Go (Silver et al., 2017a, c). As a result, the proposed procedure of finding the optimal configuration of a constitutive law is given as follows.

Instances of constitutive laws are considered as directed graphs. Given a dataset that contains the time history of $n$ types of measurable physical quantities stored in the vertices labeled by the vertex labels $l_{i}\in\mathbb{L_{V}}$, the labeled directed multigraph defined by the 8-tuple $\mathbb{G}=(\mathbb{L_{V}},\mathbb{L_{E}},\mathbb{V},\mathbb{E},\boldsymbol{s},\boldsymbol{t},\boldsymbol{n_{V}},\boldsymbol{n_{E}})$ , an objective function SCORE, and constraints that enforce universal principles, find a subgraph $\mathbb{G}^{\prime}$ of $\mathbb{G}$ consisting of vertices $\boldsymbol{V}\in\mathbb{V}^{s}\subseteq\mathbb{V}$ and edges $\boldsymbol{E}\in\mathbb{E}^{s}\subseteq\mathbb{E}$ such that 1) $\mathbb{G}^{\prime}$ is a directed acyclic graph, and 2) the score metric is maximized under a set of $m$ constraints $f_{i}(l_{1},l_{2},\ldots,l_{n})=0,\ i=1,\ldots,m$ , i.e.,

[TABLE]

Example 2.

Game Action for traction-separation Laws. Given an 8-tuple $\mathbb{G}=(\mathbb{L_{V}},\mathbb{L_{E}},\mathbb{V},\mathbb{E},\boldsymbol{s},\boldsymbol{t},\boldsymbol{n_{V}},\boldsymbol{n_{E}})$ with elements defined in (10), (11), (16), (15). Find the subgraph $\mathbb{G}^{\prime}$ of $\mathbb{G}$ such that this subgraph becomes the directed acyclic graph that maximizes the blind prediction accuracy defined by an objective function. ∎
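When the multigraph is small, the brute-force alternative mentioned in this section can be sketched directly: enumerate edge subsets, discard those that are cyclic or do not connect the source to the target, and keep the subset with the highest score. The edge list and the `TOY_QUALITY` table below are hypothetical; in particular, `toy_score` is only a stand-in for SCORE, which in the proposed framework results from calibration and blind prediction. The restriction to at most one edge per ordered vertex pair is also an assumption made here so that the result is a directed graph rather than a multigraph.

```python
from itertools import combinations

# Hypothetical (source, target, label) edges of a small multigraph.
EDGES = [
    ("jump", "eff_jump", "Tvergaard definition"),
    ("jump", "eff_jump", "Ortiz-Pandolfi definition"),
    ("eff_jump", "traction", "Tvergaard law"),
    ("eff_jump", "traction", "Ortiz-Pandolfi law"),
    ("jump", "traction", "LSTM"),
]
# Made-up per-edge qualities used only by the toy score below.
TOY_QUALITY = {
    "Tvergaard definition": 0.6, "Ortiz-Pandolfi definition": 0.8,
    "Tvergaard law": 0.7, "Ortiz-Pandolfi law": 0.9, "LSTM": 0.5,
}


def is_simple(edges):
    """At most one edge per ordered vertex pair (assumption)."""
    pairs = [(s, t) for s, t, _ in edges]
    return len(pairs) == len(set(pairs))


def is_acyclic(edges):
    """Kahn's algorithm: every vertex can be removed in topological order."""
    nodes = {v for s, t, _ in edges for v in (s, t)}
    indegree = {v: 0 for v in nodes}
    for _, t, _ in edges:
        indegree[t] += 1
    queue = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for s, t, _ in edges:
            if s == v:
                indegree[t] -= 1
                if indegree[t] == 0:
                    queue.append(t)
    return removed == len(nodes)


def reaches(edges, source, target):
    """Depth-first search for a directed path from source to target."""
    frontier, seen = [source], {source}
    while frontier:
        v = frontier.pop()
        if v == target:
            return True
        for s, t, _ in edges:
            if s == v and t not in seen:
                seen.add(t)
                frontier.append(t)
    return False


def toy_score(edges):
    """Placeholder objective: mean of the made-up edge qualities."""
    return sum(TOY_QUALITY[label] for _, _, label in edges) / len(edges)


def best_subgraph(edges, source, target, score):
    """Exhaustive search over all admissible edge subsets."""
    best, best_value = None, float("-inf")
    for r in range(1, len(edges) + 1):
        for subset in combinations(edges, r):
            if is_simple(subset) and is_acyclic(subset) and reaches(subset, source, target):
                value = score(subset)
                if value > best_value:
                    best, best_value = subset, value
    return best, best_value


best, best_value = best_subgraph(EDGES, "jump", "traction", toy_score)
```

The number of subsets grows as $2^{|\mathbb{E}|}$, which is why this enumeration is only viable for small action spaces and motivates the reinforcement learning treatment.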

3 Two-player meta-modeling game for the discovery of elasto-plastic models through modeling and automated experiments

In this work, we conceptualize the process of writing, calibrating and validating constitutive laws as a cooperative two-player game played by one modeler and one experimentalist (data) agent. These two agents, in theory, can be played by either a human or an artificial intelligence (AI) machine. To simplify the problem, we consider only virtual experiments such as discrete element simulations (Sun et al., 2013a; Zohdi, 2013; Liu et al., 2016b; Xin et al., 2017; Ulven and Sun, 2018; Wang and Sun, 2018) and assume that the agents are not constrained by the number of virtual experiments they may conduct. The control of the experimental cost and the ability to automate the execution of experiments are important topics but are both out of the scope of this work.

As such, the two AI agents must be able to cooperate such that they can (1) find the hierarchical relationships among available data and (2) come up with the experiment plan that helps improve the performance of the blind predictions made by the directed graph model, as shown in Figure 2. This leads to a multi-agent multi-objective problem that can be solved by deep reinforcement learning (Tan, 1993; Raileanu et al., 2018).

3.1 Data collection game for experimentalist agent

This section presents a design of the data collection game involving the common decision-making activities of experimentalists in testing the mechanical properties of a material. The goal of this game is for the experimentalist agent to find the optimal subset of tests for model generation and parameter calibration within a set of candidate tests on the material. The key ingredients of the game are detailed as follows.

3.1.1 Game Board for Experimentalist

Consider a set of possible mechanical experiments on a material $\textbf{T}=\{T_{1},T_{2},T_{3},...,T_{n}\}$ . The experiments can be divided into two types: (1) a subset $\textbf{T}_{c}$ of calibration experiments for material parameter identification in a constitutive model, (2) a subset $\textbf{T}_{v}$ of validation experiments for testing the forward prediction accuracy of the constitutive model. $\textbf{T}=\textbf{T}_{c}\cup\textbf{T}_{v}$ , $\textbf{T}_{c}\cap\textbf{T}_{v}=\emptyset$ , $\textbf{T}_{c}\neq\emptyset$ and $\textbf{T}_{v}\neq\emptyset$ . Suppose the experimentalist has a priori preselected the elements in both categories: $\textbf{T}_{c}=\textbf{T}_{c}^{0}=\{T_{c1},T_{c2},T_{c3},...,T_{cn}\}$ and $\textbf{T}_{v}=\textbf{T}_{v}^{0}=\{T_{v1},T_{v2},T_{v3},...,T_{vn}\}$ . This selection could be based on the availability of laboratory equipment, i.e., $\textbf{T}_{c}^{0}$ includes all tests that the experimentalist can perform in the laboratory, while $\textbf{T}_{v}^{0}$ includes other tests that can only be acquired from literature or third-party laboratories. The experimentalist then chooses the final set of experiments $\textbf{T}_{c}\subset\textbf{T}_{c}^{0}$ which could generate necessary and sufficient data for the modeler agent to develop and calibrate a constitutive model with the highest model score. The final validation set $\textbf{T}_{v}$ contains both experiments in $\textbf{T}_{v}^{0}$ and those not selected in $\textbf{T}_{c}$ , i.e., $\textbf{T}_{v}=\textbf{T}_{v}^{0}\cup(\textbf{T}_{c}^{0}\setminus\textbf{T}_{c})$ . Hence the set $\textbf{T}_{c}^{0}$ constitutes the ”game board” for the experimentalist agent to play the data collection game.

3.1.2 Game State for Experimentalist

The mathematical description of the current state of the game board is a list of binary indicators $s=[i_{c1},i_{c2},i_{c3},...,i_{cn},i_{terminate}]$ representing whether a test $T_{ci}\in\textbf{T}_{c}^{0}$ is selected to be one of the calibration tests, and also whether the game is terminated. If $T_{ci}\in\textbf{T}_{c}$ , the corresponding indicator is $i_{ci}=1$ ; if $T_{ci}\notin\textbf{T}_{c}$ , then $i_{ci}=0$ . If $i_{terminate}=1$ , the game reaches the end; otherwise the experimentalist can continue. The initial state of the game is $i_{ci}=0,\ \forall T_{ci}\in\textbf{T}_{c}^{0}$ and $i_{terminate}=0$ . A special final state in which $i_{ci}=0,\ \forall T_{ci}\in\textbf{T}_{c}^{0}$ and $i_{terminate}=1$ indicates that no data is available for model generation and calibration, hence the reward for this state is set to [math].

3.1.3 Game Action for Experimentalist

At each state $s$ , the experimentalist can select the next calibration test $T_{ci}\in\textbf{T}_{c}$ , by changing the indicator $i_{ci}$ from 0 to 1, or decide to stop the selection immediately, by changing $i_{terminate}$ from 0 to 1.

3.1.4 Game Rule for Experimentalist

Generally, there are no specific rules constraining the selection of experiments for model parameter calibration. But the game designer could always customize certain rules that prohibit the coexistence of certain experiments in $\textbf{T}_{c}$ . The game rule can be reflected by a list of binaries $LegalActions(s)=[ii_{c1},ii_{c2},ii_{c3},...,ii_{cn},ii_{terminate}]$ , indicating whether an element $i_{ci}$ of the state $s$ can be changed in the next action step.

$\bullet$ If $i_{ci}=0$ in the current state $s$ , then $ii_{ci}=1$ in $LegalActions(s)$ .

$\bullet$ If $i_{ci}=1$ , then $ii_{ci}=0$ .

$\bullet$ If $i_{terminate}=0$ , then $ii_{terminate}=1$ .

We enforce a game rule that requires the two tests $T_{ci}$ and $T_{cj}$ to be mutually exclusive in $\textbf{T}_{c}$ .

$\bullet$ If $i_{ci}=1$ , then $ii_{cj}=0$ , and vice versa.

The initial legal actions are $ii_{ci}=1,\ \forall T_{ci}\in\textbf{T}_{c}^{0}$ and $ii_{terminate}=1$ .
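The state, action and rule definitions of the data collection game can be sketched in a few lines. In the sketch below the function names and the `exclusive_pairs` argument are illustrative, and the choice that no action is legal once $i_{terminate}=1$ is an assumption, since the rules above only cover non-terminated states.

```python
def legal_actions(state, exclusive_pairs=()):
    """Return the binary list LegalActions(s) for a state [i_c1, ..., i_cn, i_terminate]."""
    *selected, terminated = state
    if terminated:
        # Assumption: a terminated game admits no further action.
        return [0] * len(state)
    # An unselected test may be selected; a selected test cannot be changed.
    legal = [1 - i for i in selected]
    # Mutually exclusive tests: selecting one forbids the other.
    for i, j in exclusive_pairs:
        if selected[i]:
            legal[j] = 0
        if selected[j]:
            legal[i] = 0
    # Terminating is always legal while the game is running.
    return legal + [1]


def step(state, action, exclusive_pairs=()):
    """Flip the indicator addressed by `action` from 0 to 1 if the move is legal."""
    if not legal_actions(state, exclusive_pairs)[action]:
        raise ValueError("illegal action")
    new_state = list(state)
    new_state[action] = 1
    return new_state


# Three candidate calibration tests; tests 0 and 1 are mutually exclusive.
pairs = [(0, 1)]
s0 = [0, 0, 0, 0]
s1 = step(s0, 0, pairs)   # select test 0
s2 = step(s1, 3, pairs)   # terminate the selection
```

Here `legal_actions(s1, pairs)` evaluates to `[0, 0, 1, 1]`: test 0 is already selected, test 1 is excluded by the customized rule, and the agent may still select test 2 or terminate.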

3.1.5 Game Reward for Experimentalist

The reward from the game environment to the experimentalist agent should consider the scores of the constitutive models generated by the modeler, given the calibration data and validation data prepared by the experimentalist. For each result of the data collection game $\textbf{T}_{c}$ (hence its pair $\textbf{T}_{v}=\textbf{T}\setminus\textbf{T}_{c}$ ), the modeler could generate a number of different constitutive models with scores $[\text{SCORE}_{i,\ i=1,2,3,...}]_{\textbf{T}_{c}}$ . The reward should also consider the total cost of the calibration tests $\textbf{T}_{c}$ . This can be measured by a weighted sum $\text{COST}(\textbf{T}_{c})=\sum^{\textbf{T}_{c}^{0}}w^{cost}_{ci}*i_{ci}$ , where $w^{cost}_{ci}$ is the normalized cost for test $T_{ci}\in\textbf{T}_{c}^{0}$ , $\sum^{\textbf{T}_{c}^{0}}w^{cost}_{ci}=1$ , $w^{cost}_{ci}\in[0,1]$ .

If the experimentalist and the modeler are fully cooperative on generating the constitutive model with the highest score, the reward $r$ is a function of the maximum model scores for all possible $\textbf{T}_{c}\subset\textbf{T}_{c}^{0}$ and the total experimental cost of $\textbf{T}_{c}$ . Suppose that since the beginning of the two-player cooperative game (Figure 2), the experimentalist has experienced a number of calibration test sets $\textbf{T}_{c}$ (they constitute a set $\mathbb{T}_{c}^{\text{history}}$ ), and the modeler has generated constitutive models and evaluated their scores for these calibration test sets ( $[\text{SCORE}_{i,\ i=1,2,3,...}]_{\textbf{T}_{c}},\ \forall\textbf{T}_{c}\in\mathbb{T}_{c}^{\text{history}}$ ). Then both agents have the knowledge of the highest model score for each $\textbf{T}_{c}$ : $\text{SCORE}_{\textbf{T}_{c}}^{\max}=\max([\text{SCORE}_{i,\ i=1,2,3,...}]_{\textbf{T}_{c}})$ . Thus they know the highest model score in the history of self-played games: $\text{SCORE}^{\max}=\max(\text{SCORE}_{\textbf{T}_{c}}^{\max}),\ \forall\textbf{T}_{c}\in\mathbb{T}_{c}^{\text{history}}$ . Then the agents can identify a set $\mathbb{T}_{c}^{\text{max}}\subset\mathbb{T}_{c}^{\text{history}}$ whose elements are all the calibration test sets that lead to maximum scores comparable to the highest score, i.e., $\textbf{T}_{c}\in\mathbb{T}_{c}^{\text{max}}$ if $|\text{SCORE}_{\textbf{T}_{c}}^{\max}-\text{SCORE}^{\max}|\leq\text{TOL}$ , where TOL is a small tolerance.

From the perspective of the experimentalist agent, for a fully cooperative game, $\textbf{T}_{c}$ (represented by the state $s$ ) is winning the data collection game if it is an element of the set $\mathbb{T}_{c}^{\text{max}}$ , and if its total cost is the lowest among all elements in $\mathbb{T}_{c}^{\text{max}}$ . Hence the reward is designed as

[TABLE]
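The winning criterion described in this section can be sketched as follows, assuming the game history is stored as a mapping from each explored calibration set (a set of test indices) to the list of model scores obtained with it. The function only identifies the winning calibration sets; the numerical reward values assigned to winning and losing states are not reproduced here. All names and numbers are illustrative.

```python
def winning_calibration_sets(history, weights, tol=1e-3):
    """Return the lowest-cost calibration sets among those whose best score is within tol of the overall best.

    history: dict mapping frozenset of test indices -> list of model scores
    weights: normalized cost w_ci of each candidate test (sums to 1)
    """
    # Highest model score obtained for each explored calibration set.
    best_per_set = {tc: max(scores) for tc, scores in history.items() if scores}
    # Highest model score in the whole history of played games.
    score_max = max(best_per_set.values())
    # Calibration sets whose best score is comparable to the highest score.
    t_max = [tc for tc, s in best_per_set.items() if abs(s - score_max) <= tol]

    def cost(tc):
        # Weighted sum of the selected tests, COST(T_c).
        return sum(weights[i] for i in tc)

    lowest = min(cost(tc) for tc in t_max)
    return [tc for tc in t_max if cost(tc) == lowest]


# Hypothetical history with three candidate tests.
history = {
    frozenset({0}): [0.50, 0.60],
    frozenset({0, 1}): [0.90],
    frozenset({0, 1, 2}): [0.90, 0.85],
}
weights = [0.2, 0.3, 0.5]
winners = winning_calibration_sets(history, weights)
```

In this example both $\{T_{c1},T_{c2}\}$ and $\{T_{c1},T_{c2},T_{c3}\}$ reach the highest score, but only the former wins because it reaches that score at a lower cost.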

3.1.6 Game Choices for Experimentalist

The elements in the set $\textbf{T}=\{T_{1},T_{2},T_{3},...,T_{n}\}$ could be all possible mechanical experiments on a material. For example, for granular materials, the candidates can include the following common types of tests in soil laboratories:

1. Drained conventional triaxial test ( $\dot{\epsilon}_{11}\neq 0$ , $\dot{\sigma}_{22}=\dot{\sigma}_{33}=\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ ).

2. Drained true triaxial test ( $\dot{\epsilon}_{11}\neq 0$ , $b=\frac{\sigma_{22}-\sigma_{33}}{\sigma_{11}-\sigma_{33}}$ , $\dot{\sigma}_{33}=\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ ).

3. Undrained triaxial test ( $\dot{\epsilon}_{11}\neq 0$ , $\dot{\epsilon}_{11}+\dot{\epsilon}_{22}+\dot{\epsilon}_{33}=0$ , $\dot{\sigma}_{22}=\dot{\sigma}_{33}$ , $\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ ).

4. One-dimensional test ( $\dot{\epsilon}_{11}\neq 0$ , $\dot{\epsilon}_{22}=\dot{\epsilon}_{33}=\dot{\epsilon}_{12}=\dot{\epsilon}_{23}=\dot{\epsilon}_{13}=0$ ).

5. Simple shear test ( $\dot{\epsilon}_{12}>0$ , $\dot{\sigma}_{11}=\dot{\sigma}_{22}=\dot{\epsilon}_{33}=\dot{\epsilon}_{23}=\dot{\epsilon}_{13}=0$ ).

The loading conditions are represented by constraints on the components of the stress rate and strain rate tensors

[TABLE]

Remarks on implementation. In the numerical testing of the constitutive models, the above material test conditions are applied via a linearized integration technique for loading constraints of laboratory experiments, $\boldsymbol{S}d\boldsymbol{\sigma}+\boldsymbol{E}d\boldsymbol{\epsilon}=d\boldsymbol{Y}$ , combined with the incremental constitutive equations, as proposed in Bardet and Choucair (1991).
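As an illustration of how the constraint $\boldsymbol{S}d\boldsymbol{\sigma}+\boldsymbol{E}d\boldsymbol{\epsilon}=d\boldsymbol{Y}$ combines with an incremental constitutive equation $d\boldsymbol{\sigma}=\boldsymbol{C}d\boldsymbol{\epsilon}$, substituting the latter into the former gives the linear system $(\boldsymbol{S}\boldsymbol{C}+\boldsymbol{E})d\boldsymbol{\epsilon}=d\boldsymbol{Y}$. The sketch below solves this system for one increment of a drained conventional triaxial test. It assumes Voigt ordering $[11,22,33,12,23,13]$ with engineering shear strains, a linear isotropic elastic tangent in place of the elasto-plastic one, and arbitrary moduli; it is not the implementation used in this work.

```python
import numpy as np


def isotropic_stiffness(K, G):
    """6x6 isotropic elastic stiffness in Voigt notation (engineering shear strains)."""
    lam = K - 2.0 * G / 3.0
    C = np.zeros((6, 6))
    C[:3, :3] = lam
    C[:3, :3] += 2.0 * G * np.eye(3)
    C[3:, 3:] = G * np.eye(3)
    return C


def constrained_increment(C, S, E, dY):
    """Solve S dsigma + E depsilon = dY together with dsigma = C depsilon."""
    deps = np.linalg.solve(S @ C + E, dY)
    return C @ deps, deps


# Drained conventional triaxial test: the axial strain increment is
# prescribed and all other stress increments vanish.
S = np.diag([0.0, 1.0, 1.0, 1.0, 1.0, 1.0])   # rows that constrain stress
E = np.diag([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # rows that constrain strain
dY = np.array([-1.0e-3, 0.0, 0.0, 0.0, 0.0, 0.0])

K, G = 100.0, 60.0                             # arbitrary moduli
dsig, deps = constrained_increment(isotropic_stiffness(K, G), S, E, dY)

# For linear elasticity the axial response reduces to Young's modulus.
young = 9.0 * K * G / (3.0 * K + G)
poisson = (3.0 * K - 2.0 * G) / (2.0 * (3.0 * K + G))
```

The other test types in the list above only change the rows of $\boldsymbol{S}$, $\boldsymbol{E}$ and $d\boldsymbol{Y}$; for a nonlinear model the tangent $\boldsymbol{C}$ would be updated at every increment.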

3.2 Meta-modeling game for modeler agent

This section presents a design of the constitutive modeling game involving the common decision-making activities of modelers in developing models to approximate the mechanical properties of a material. The goal of this game is for the modeler agent to find the optimal configuration of the directed graph from a predefined directed multigraph (Section 2) with its structure inherited from the graphs of the classical infinitesimal strain elasto-plasticity models. The key ingredients of the meta-modeling game are the game agents, game board, game state, game actions, game rules, game reward and game choices, which together constitute an agent-environment interactive system (Bonabeau, 2002; Wang and Sun, 2019a) and are detailed as follows.

3.2.1 Game Board for Modeler

A constitutive model in the generalized elasto-plasticity framework (Pastor et al., 1990; Zienkiewicz et al., 1999) requires four essential components of ”phenomenological relations”: (1) elasticity law, (2) loading direction, (3) plastic flow direction, and (4) hardening modulus. The process of obtaining a directed graph (the final state of the game) from the game board, i.e., the directed multigraph of the proposed framework, is presented in Figure 3. The quantities are presented in the incremental form at discrete time steps. A quantity $a$ at the current time step $t_{n}$ is denoted as $a_{n}=a(t_{n})$ . The next time step is $t_{n+1}$ with the time increment $\Delta t=t_{n+1}-t_{n}$ . Then the increment of the quantity $a$ within $\Delta t$ is denoted as $\Delta a_{n+1}=a_{n+1}-a_{n}$ . The essential ”definition” edges in the directed multigraph are written as

[TABLE]

where $\Delta\lambda_{n+1}$ is the plastic multiplier and $H_{n}$ is the generalized plastic modulus.

The ”elastic loading” and ”plastic loading” states are determined via the projection of the trial elastic stress increment $\Delta\boldsymbol{\sigma}^{e}_{n+1}=\boldsymbol{C}^{e}_{n}:\Delta\boldsymbol{\epsilon}_{n+1}$ on the loading direction $\boldsymbol{n}^{load}_{n}$ . If there is no assumed yield surface, then

[TABLE]

or if there exists a yield surface $f(\boldsymbol{\sigma},\boldsymbol{q}^{piv}_{n}(\boldsymbol{\xi}^{piv}_{n}))$ , then

[TABLE]

where $\boldsymbol{\xi}^{piv}_{n}$ is a vector of strain-like plastic internal variables and $\boldsymbol{q}^{piv}_{n}$ is a vector of stress-like plastic internal variables conjugate to $\boldsymbol{\xi}^{piv}_{n}$ . $\boldsymbol{\xi}^{piv}_{n}$ may include the following internal state variables accumulated during the deformations from the initial time $t_{0}$ to the current time $t_{n}$ ,

[TABLE]

where $\bar{\epsilon}^{p}$ , $\bar{\epsilon}^{p}_{v}$ and $\bar{\epsilon}^{p}_{s}$ are the accumulated total, volumetric and deviatoric plastic strains, respectively, and $e$ is the void ratio for granular materials, defined as the ratio between the volume of the void and that of the solid constituent. We assume that the yield function is isotropic and therefore can be expressed in terms of stress invariants (Borja, 2013). As a result, the phenomenological relations can be represented as functions of the stress invariants $\boldsymbol{\sigma}^{ivr}_{n}$ , which may include

[TABLE]

where $J_{2}=\frac{1}{2}\text{trace}(\boldsymbol{s}_{n}^{2})$ , $J_{3}=\frac{1}{3}\text{trace}(\boldsymbol{s}_{n}^{3})$ , $\boldsymbol{s}_{n}=\boldsymbol{\sigma}_{n}-p_{n}\boldsymbol{I}$ and $\theta_{n}$ is the Lode’s angle, the smallest angle between the line of pure shear and the projection of stress tensor in the deviatoric plane (Malcher et al., 2009). The constitutive relation between the loading direction $\boldsymbol{n}^{load}$ and the state variables $\boldsymbol{\xi}^{piv}_{n}$ , $\boldsymbol{\sigma}^{ivr}_{n}$ can be defined either by formulating a yield surface $f$ such that,
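The invariants entering $\boldsymbol{\sigma}^{ivr}_{n}$ can be evaluated directly from the stress tensor. The sketch below follows the definitions of $J_{2}$ and $J_{3}$ given above and assumes $p=\frac{1}{3}\text{trace}(\boldsymbol{\sigma})$ (so that $\boldsymbol{s}$ is deviatoric) and $q=\sqrt{3J_{2}}$; the Lode's angle is omitted because its formula depends on the adopted convention. The function name is illustrative.

```python
import numpy as np


def stress_invariants(sigma):
    """Return p, q, J2 and J3 of a symmetric 3x3 stress tensor."""
    p = np.trace(sigma) / 3.0              # mean stress (assumed definition)
    s = sigma - p * np.eye(3)              # deviatoric stress s = sigma - p I
    J2 = 0.5 * np.trace(s @ s)             # J2 = 1/2 trace(s^2)
    J3 = np.trace(s @ s @ s) / 3.0         # J3 = 1/3 trace(s^3)
    q = np.sqrt(3.0 * J2)                  # deviatoric stress measure (assumed definition)
    return {"p": p, "q": q, "J2": J2, "J3": J3}


# Uniaxial tension of magnitude 3 along the first axis.
inv = stress_invariants(np.diag([3.0, 0.0, 0.0]))
```

For the uniaxial state above, $\boldsymbol{s}=\text{diag}(2,-1,-1)$, so that $p=1$, $J_{2}=3$, $J_{3}=2$ and $q=3$, i.e. $q$ recovers the magnitude of the uniaxial stress.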

[TABLE]

or, in the case where the yield surface is absent, directly inferred from observations as those in the generalized plasticity framework (cf. Lubliner and Auricchio (1996); Pastor et al. (1990); Ling and Yang (2006)),

[TABLE]

where

[TABLE]

Similarly, the constitutive relation between the plastic flow direction $\boldsymbol{m}^{flow}$ and the state variables $\boldsymbol{\xi}^{piv}_{n}$ , $\boldsymbol{\sigma}^{ivr}_{n}$ can be defined either by formulating a plastic potential surface $g$ such that,

[TABLE]

or directly inferred from observations as those in the generalized plasticity framework (cf. Lubliner and Auricchio (1996); Pastor et al. (1990); Ling and Yang (2006))

[TABLE]

3.2.2 Game State for Modeler

The mathematical description of the current state of the game board is a list of binary indicators $s=[i_{e1},i_{e2},i_{e3},...,i_{en}]$ representing whether a labeled edge $E_{ei}$ in the labeled edge set $\mathbb{L_{E}}$ of the directed multigraph $\mathbb{G}$ is selected in the final generated directed graph $\mathbb{G}^{\prime}$ . If $E_{ei}$ is included in $\mathbb{G}^{\prime}$ , the corresponding indicator $i_{ei}=1$ , otherwise $i_{ei}=0$ . The initial state of the game is $i_{ei}=0,\ \forall E_{ei}\in\mathbb{L_{E}}$ .

3.2.3 Game Action for Modeler

At each state $s$ , the modeler can select the next labeled edge $E_{ei}\in\mathbb{L_{E}}$ , by changing the indicator $i_{ei}$ from 0 to 1.

3.2.4 Game Rule for Modeler

The modeling choices for the four essential components in an elasto-plasticity model are not fully compatible with each other. For example, a J2 yield surface only has the yield stress as the stress-like plastic internal variable, while a strain hardening law for a Drucker–Prager yield surface has both frictional and cohesion hardening laws. These restrictions on compatible edge choices are specified by a list of binaries $LegalActions(s)=[ii_{1},ii_{2},ii_{3},...,ii_{n}]$ of legal choices for each state. Another set of game rules consists of universal principles on the constitutive models. For example, thermodynamic consistency states that the rate of mechanical dissipation must be non-negative, i.e., for an isothermal process, $\mathcal{D}=\boldsymbol{\sigma}:\dot{\boldsymbol{\epsilon}}-\frac{d\psi}{dt}\geq 0$ . This game rule is incorporated in the game through the definition of the model score. If the final model in an episode violates this rule, the final model score is set to 0. This low score is then used as training material for the mastermind modeler agent such that it reduces the policy probabilities of the choices that violate universal principles, as shown in Figure 4. As the training of the constitutive law can only be completed if the score of the best candidate model is sufficiently high, this prevents the meta-modeling algorithm from generating any model that violates the first principles.

3.2.5 Game Reward for Modeler

A score system must be introduced to evaluate the generated directed graphs for constitutive models such that the accuracy and credibility in replicating the mechanical behavior of real-world materials can be assessed. This score system may also serve as the objective function that defines the rewards for the deep reinforcement learning agent to improve the generated digraphs and resultant constitutive laws. In this work, we define the score as a real-valued function with range $[0,1]$ which depends on the measures $A_{i}$ $(i=1,2,3,...,n)$ of $n$ important features of a constitutive model,

[TABLE]

where $0\leq A_{i}\leq 1$ . Some features are introduced to measure the performance of a model such as the accuracy and computation speed. Other features are introduced to enforce constraints to ensure the admissibility of a constitutive model, such as the frame indifference and the thermodynamic consistency. Suppose there are $n_{\text{pfm}}$ measures of performance features $A^{\text{pfm}}_{i}$ and $n_{\text{crit}}$ measures of critical features $A^{\text{crit}}_{i}$ in the measure system of constitutive models, the score takes the form,

[TABLE]

where $w_{i}\in[0,1]$ is the weight associated with the measure $A^{\text{pfm}}_{i}$ , and $\sum_{i=1}^{n_{\text{pfm}}}w_{i}=1$ .

For example, for measures of accuracy $A_{\text{accuracy}}$ of calibrations and forward predictions, we introduce a cross-validation procedure in which the dataset used for training the models (e.g. identifying material parameters (e.g. Wang et al. (2016); Liu et al. (2016a)) or adjusting weights of neurons in recurrent neural networks (e.g. Lefik and Schrefler (2002); Wang and Sun (2018))) is mutually exclusive to the testing dataset used to evaluate the quality of blind predictions. Both calibration and blind prediction results are compared against the target data. The mean squared error (MSE), commonly used in statistics and also as an objective function in machine learning, is chosen as the error measure for each data sample $i$ in this study, i.e.,

[TABLE]

where $Y_{i_{j}}^{\text{data}}$ and $Y_{i_{j}}^{\text{model}}$ are the values of the $j$ th feature of the $i$ th data sample, from target data value and predictions from constitutive models, respectively. $N_{\text{feature}}$ is the number of output features. $\mathcal{S}_{j}$ is a scaling operator (standardization, min-max scaling, …) for the output feature $\{Y_{i_{j}}\},\ i\in[1,N_{\text{data}}]$ .

The empirical cumulative distribution functions (eCDFs) are computed for MSE of the entire dataset $\{\text{MSE}_{i}\},\ i\in[1,N_{\text{data}}]$ , for MSE of the training dataset $\{\text{MSE}_{i}\},\ i\in[1,N_{\text{traindata}}]$ and for MSE of the test dataset $\{\text{MSE}_{i}\},\ i\in[1,N_{\text{testdata}}]$ , with the eCDF defined as (Kendall et al., 1946),

[TABLE]

where $N=N_{\text{data}}$ , or $N_{\text{traindata}}$ , or $N_{\text{testdata}}$ , and all $\{\text{MSE}_{i}\}$ are arranged in increasing order. A measure of accuracy is proposed based on the above statistics,

[TABLE]

where $\varepsilon_{P\%}$ is the $P$ th percentile (the MSE value corresponding to $P\%$ in the eCDF plot) of the eCDF on the entire, training or test dataset. $\varepsilon_{\text{crit}}\ll 1$ is the critical MSE chosen by users such that a model can be considered as ”satisfactorily accurate” when $\varepsilon_{P\%}\leq\varepsilon_{\text{crit}}$ .
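The statistics underlying this accuracy measure can be sketched as follows: a scaled MSE for each data sample, the eCDF of the resulting MSE values, and the percentile $\varepsilon_{P\%}$ read from the eCDF. The sketch assumes min-max scaling for $\mathcal{S}_{j}$ with bounds taken from the target data and arrays of shape $(N_{\text{data}},N_{\text{feature}})$; it stops at $\varepsilon_{P\%}$ and does not reproduce the accuracy measure itself. The function names are illustrative.

```python
import numpy as np


def sample_mse(y_data, y_model):
    """Scaled MSE of each sample; inputs have shape (N_data, N_feature)."""
    lo, hi = y_data.min(axis=0), y_data.max(axis=0)   # assumes hi > lo per feature

    def scale(y):
        # Min-max scaling of each output feature (assumed choice of S_j).
        return (y - lo) / (hi - lo)

    return np.mean((scale(y_data) - scale(y_model)) ** 2, axis=1)


def ecdf(mse, x):
    """Fraction of samples whose MSE is less than or equal to x."""
    mse_sorted = np.sort(mse)
    return np.searchsorted(mse_sorted, x, side="right") / mse_sorted.size


def percentile_mse(mse, P):
    """Smallest MSE value at which the eCDF reaches P percent."""
    mse_sorted = np.sort(mse)
    rank = max(int(np.ceil(P / 100.0 * mse_sorted.size)), 1)
    return mse_sorted[rank - 1]


mse = np.array([0.04, 0.01, 0.09, 0.16])
eps_50 = percentile_mse(mse, 50)
eps_90 = percentile_mse(mse, 90)
```

A model would then be regarded as satisfactorily accurate on a given dataset when the chosen percentile, e.g. `eps_90`, does not exceed the user-defined $\varepsilon_{\text{crit}}$.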

Once a complete constitutive model is generated, the model score is evaluated. The final reward is defined as: if the current score is higher than the average score of models from a group of already played games by the agent, then the current game wins and $r_{T}=1$ , otherwise, the current game loses and $r_{T}=-1$ . The average score can be initialized to 0 for the first game.
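The win/lose rule for the final reward amounts to a comparison against a running average, as in the minimal sketch below. The function name and the storage of past scores in a plain list are illustrative choices.

```python
def final_reward(score, past_scores):
    """Return +1 if the score beats the average of previously played games, else -1."""
    # The average is initialized to 0 for the first game.
    average = sum(past_scores) / len(past_scores) if past_scores else 0.0
    return 1 if score > average else -1


past = [0.3, 0.7]                  # hypothetical scores of earlier games
r_win = final_reward(0.6, past)    # above the average of 0.5
r_lose = final_reward(0.4, past)   # below the average of 0.5
```

Note that a model scored 0, e.g. because it violates thermodynamic consistency, loses even in the first game, since its score is not higher than the initial average of 0.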

3.2.6 Game Choices for Modeler

This section specifies the candidate edges in the directed multi-graph of elasto-plasticity models (Fig. 3) for the modeler agent to choose during deep reinforcement learning. The edges are categorized into four groups representing the four essential constitutive relations in the model. The edges $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow\boldsymbol{C}^{e}_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow\boldsymbol{C}^{e}_{n}$ represent the elasticity law. The edges $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow\boldsymbol{n}^{load}_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow\boldsymbol{n}^{load}_{n}$ represent the definition of the loading direction. The edges $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow\boldsymbol{m}^{flow}_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow\boldsymbol{m}^{flow}_{n}$ represent the definition of the plastic flow direction. The edges $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow H_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow H_{n}$ represent the hardening law. Each edge allows multiple choices extracted from the phenomenological relations developed in the computational plasticity literature. In this paper, for simplicity of illustration of the meta-modeling game framework, the edge choices are not exhaustive. The following lists only contain common representative choices for geomaterials. But the designer of the meta-modeling game is always free to add more edge choices to expand the action space.

The edges for elasticity law ( $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow\boldsymbol{C}^{e}_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow\boldsymbol{C}^{e}_{n}$ ) represent the definition and evolution of the elastic stiffness tensor

[TABLE]

where $K$ is the elastic bulk modulus and $G$ is the elastic shear modulus.

Three common formulations of the elastic stiffness tensor for granular materials are available for model choice:

(E1)

Linear elasticity

[TABLE]

where $K_{0}$ and $G_{0}$ are constants.

(E2)

Nonlinear elasticity with dependence on the mean pressure $p$ (Manzari and Dafalias, 1997)

[TABLE]

where $p_{at}$ is the atmospheric pressure ( $\approx$ -100 kPa) and $a$ is a material constant.

(E3)

Nonlinear elasticity with dependence on the mean pressure $p$ and the void ratio $e$ (Dafalias and Manzari, 2004)

[TABLE]

where $\nu$ is the constant Poisson’s ratio.

The edges ( $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow\boldsymbol{n}^{load}_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow\boldsymbol{n}^{load}_{n}$ ) represent the definition and evolution of the loading direction. $\boldsymbol{n}^{load}_{n}$ can be either derived from an assumed yield surface $f\leq 0$ or defined explicitly in the space of stress invariants $\boldsymbol{\sigma}^{ivr}_{n}$ .

The following common formulations of loading direction for granular materials are considered for model choices:

(L1)

Yield surface of J2 plasticity $f=q-\sigma_{y}$ and linear hardening law

[TABLE]

where $\sigma_{y0},H_{0}$ are material parameters.

(L2)

Yield surface of J2 plasticity $f=q-\sigma_{y}$ and $\sigma_{y}$ is the solution of the power law equation

[TABLE]

where $\sigma_{y0},n$ are material parameters, $G$ is the elastic shear modulus.

(L3)

Yield surface of J2 plasticity $f=q-\sigma_{y}$ and Voce hardening law

[TABLE]

where $\sigma_{y0},H_{0},H_{\infty},b$ are material parameters.

(L4)

Yield surface of Drucker–Prager plasticity $f=q+\alpha p$ and $\alpha$ evolves according to

[TABLE]

where $a_{0},a_{1},a_{2},a_{3}$ are material parameters (Tu et al., 2009).

(L5)

Yield surface of Drucker–Prager plasticity $f=q+\alpha p$ and $\alpha$ evolves according to

[TABLE]

where $a_{0},a_{1},k$ are material parameters (Borja, 2013).

(L6)

Yield surface of three-invariant Matsuoka–Nakai model (Borja et al., 2003)

[TABLE]

where $c_{0},a_{1},a_{2},a_{3},m$ are material parameters.

(L7)

Yield surface of Nor-Sand (Jefferies, 1993; Andrade and Borja, 2006)

[TABLE]

where $\rho,N,\bar{N},M,h,e_{c0},\tilde{\lambda},a$ are material parameters.

(L8)

Yield surface in the shape of a small cone (Dafalias and Manzari, 2004)

[TABLE]

where $\rho,m,M,n^{b},h,e_{c0},\tilde{\lambda},a$ are material parameters.

(L9)

Loading direction defined as (Pastor et al., 1990; Zienkiewicz et al., 1999)

[TABLE]

where $\alpha,M_{f}$ are material parameters.

(L10)

Loading direction defined as (Ling and Yang, 2006)

[TABLE]

where $\alpha,M_{f},m_{f}$ are material parameters.

(L11)

Loading direction given by a neural network trained with data inversely computed from experimental data (described later in the definition of plastic modulus edges).

The edges ( $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow\boldsymbol{m}^{flow}_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow\boldsymbol{m}^{flow}_{n}$ ) represent the definition and evolution of the plastic flow direction. $\boldsymbol{m}^{flow}_{n}$ can be either derived from an assumed plastic potential surface $g=0$ or defined explicitly in the space of stress invariants $\boldsymbol{\sigma}^{ivr}_{n}$ .

The following common formulations of the plastic flow direction for granular materials are considered for model choices:

(P1)

Plastic potential surface of J2 plasticity $g=q-c_{g}$ and $c_{g}$ is a parameter to ensure that the stress point is on the potential surface when the plastic deformation occurs.

(P2)

Plastic potential surface of Drucker–Prager plasticity $g=q+\beta p-c_{g}$ and $\beta=\alpha-\beta_{0}$ , where $\alpha$ can be defined through Eq. (42) or (43), and $\beta_{0}$ is an additional material parameter.

(P3)

Plastic potential surface of three-invariant Matsuoka–Nakai model (Borja et al., 2003)

[TABLE]

where $\kappa_{1}$ can be defined through Eq. (44) and $\beta_{0}$ is an additional material parameter.

(P4)

Plastic potential surface of Nor-Sand (Jefferies, 1993; Andrade and Borja, 2006)

[TABLE]

where $\bar{\rho},\bar{N},M$ are material parameters and $\bar{p}_{i}$ is a free parameter to ensure $g=0$ when the material is undergoing plastic deformation.

(P5)

Plastic flow direction defined as (Dafalias and Manzari, 2004)

[TABLE]

where $\rho,m,M,n^{d},A_{d},e_{c0},\tilde{\lambda},a$ are material parameters.

(P6)

Plastic flow direction defined as (Pastor et al., 1990; Zienkiewicz et al., 1999)

[TABLE]

where $\alpha,M_{g}$ are material parameters.

(P7)

Plastic flow direction defined as (Ling and Yang, 2006)

[TABLE]

where $\alpha,M_{g},m_{g},e_{c0},\tilde{\lambda},a$ are material parameters.

(P8)

Plastic flow direction given by a neural network trained with data inversely computed from experimental data (described later in the definition of plastic modulus edges).

The edges ( $\boldsymbol{\sigma}^{ivr}_{n}\rightarrow H_{n}$ and $\boldsymbol{\xi}^{piv}_{n}\rightarrow H_{n}$ ) represent the definition of the generalized hardening modulus. $H_{n}$ can be either derived from an assumed yield surface $f\leq 0$ or defined explicitly.

The following common formulations of hardening modulus for granular materials are considered for model choices:

(H1)

Hardening modulus derived from classical yield surface $f(\boldsymbol{\sigma},\boldsymbol{\epsilon}^{p})$ and a chosen $\boldsymbol{m}^{flow}$ .

[TABLE]

(H2)

Hardening modulus defined as (Pastor et al., 1990; Zienkiewicz et al., 1999)

[TABLE]

where $\alpha_{f},M_{f},H_{0},e_{c0},M_{g},\beta_{0},\beta_{1}$ are material parameters.

(H3)

Hardening modulus defined as (Ling and Yang, 2006)

[TABLE]

where $\alpha_{f},M_{f},H_{L0},m_{0},M_{g},m_{b},e_{c0},\tilde{\lambda},a$ are material parameters.

(H4)

Hardening modulus given by a neural network trained with data inversely computed from experimental data.

The stress increment at each time step is known from the experimental data $\Delta\boldsymbol{\sigma}^{data}_{n+1}=\boldsymbol{\sigma}^{data}_{n+1}-\boldsymbol{\sigma}^{data}_{n}$ . For a chosen elasticity law $\boldsymbol{C}^{e}_{n}(\boldsymbol{\sigma}^{ivr}_{n},\boldsymbol{\xi}^{piv}_{n})$ , the data of incremental plastic strain at each time step is given by (using Eq. (20))

[TABLE]

Then the incremental plastic multiplier is $\Delta\lambda_{n+1}=||\Delta\boldsymbol{\epsilon}^{p}_{n+1}||$ and the plastic flow direction is obtained by $\boldsymbol{m}^{flow}_{n}=\Delta\boldsymbol{\epsilon}^{p}_{n+1}/\Delta\lambda_{n+1}$ . Assuming an associative flow rule, $\boldsymbol{n}^{load}_{n}=\boldsymbol{m}^{flow}_{n}$ . In this way, the plastic modulus can be uniquely computed by inversion as

[TABLE]
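The inverse computation described above can be sketched as follows. This is a simplified illustration under stated assumptions: stresses and strains are represented by their three principal values, the elasticity law is isotropic with constant $K$ and $G$, and the plastic modulus is recovered from the generalized plasticity relation $\Delta\lambda=\boldsymbol{n}^{load}:\Delta\boldsymbol{\sigma}/H$, which is assumed here rather than taken from the equations of this paper. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def elastic_strain_increment(d_sigma: Sequence[float], K: float, G: float) -> List[float]:
    """Isotropic elastic strain increment for principal stress increments."""
    d_p = sum(d_sigma) / 3.0  # mean stress increment
    return [(s - d_p) / (2.0 * G) + d_p / (3.0 * K) for s in d_sigma]


def invert_plastic_quantities(
    d_eps: Sequence[float], d_sigma: Sequence[float], K: float, G: float
) -> Tuple[float, List[float], float]:
    """Recover (d_lambda, m_flow, H) from one measured strain/stress increment."""
    d_eps_e = elastic_strain_increment(d_sigma, K, G)
    d_eps_p = [e - ee for e, ee in zip(d_eps, d_eps_e)]  # plastic part
    d_lambda = math.sqrt(sum(x * x for x in d_eps_p))    # norm of plastic increment
    if d_lambda == 0.0:
        raise ValueError("purely elastic step: flow direction is undefined")
    m_flow = [x / d_lambda for x in d_eps_p]
    n_load = list(m_flow)  # associative flow rule
    # Assumed relation: d_lambda = (n_load . d_sigma) / H
    H = sum(n * s for n, s in zip(n_load, d_sigma)) / d_lambda
    return d_lambda, m_flow, H
```

Each loading step of the experimental data yields one training sample (state, $\boldsymbol{m}^{flow}_{n}$, $H_{n}$) for the neural network edges; purely elastic steps carry no information about the flow direction and are rejected explicitly.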

3.2.7 Game Choice alternatives: training neural network edges

In addition to the mathematical edges described above, we also consider the possibility of replacing any part of the elasto-plastic model with machine learning edges. In this framework, the machine learning models are not used to directly map strain history to stress, but are used for each individual edge in the directed graph to map the input vertices to the output vertices. For instance, the mapping of variables in the generalized plasticity framework can be obtained by training a recurrent neural network that represents the path-dependent constitutive relation between the history of input vertices of $\boldsymbol{\sigma}^{ivr}_{n}$ ( $p,q,\theta$ ) and $\boldsymbol{\xi}^{piv}_{n}$ ( $\bar{\epsilon}^{p},\bar{\epsilon}_{v}^{p},\bar{\epsilon}_{s}^{p},e$ ) and the output vertices of $\boldsymbol{n}^{load}_{n}$ , $\boldsymbol{m}^{flow}_{n}$ and $H_{n}$ . The details of training data preparation, network design, training and testing are specified in the previous work on the meta-modeling framework for traction-separation models with data of microstructural features (Wang and Sun, 2019a). In this framework, all neural network edges are generated using the same neural network architecture, i.e., two hidden layers of 64 GRU (gated recurrent unit) neurons each, followed by a dense output layer with a linear activation function. All input and output data are pre-processed by standard scaling using mean values and standard deviations. Each input feature considers its current value and 19 history values prior to the current loading step. Each neural network is trained for 1000 epochs using the Adam optimization algorithm, with a batch size of 256. Finally, it should be noted that one can further generalize the meta-modeling game by considering multiple neural network architectures as possible edges. This generalization will be considered in the future but is outside the scope of the current study.
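The pre-processing described above (standard scaling, then a window of the current value plus 19 history values) can be sketched in plain Python. The function names are hypothetical, and padding the first loading steps with the initial value is an assumption; the text does not state how incomplete histories are handled.

```python
from statistics import mean, pstdev
from typing import List, Sequence

HISTORY = 19  # number of past values kept in addition to the current one


def standard_scale(series: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0.0:
        sigma = 1.0  # guard for constant features
    return [(x - mu) / sigma for x in series]


def history_windows(series: Sequence[float], history: int = HISTORY) -> List[List[float]]:
    """Window t holds [x_{t-history}, ..., x_t]; early steps are padded with x_0."""
    padded = [series[0]] * history + list(series)
    return [padded[t:t + history + 1] for t in range(len(series))]


# Hypothetical feature history over 25 loading steps.
feature = [float(i) for i in range(25)]
windows = history_windows(standard_scale(feature))
```

Each window of length 20 is one time sequence fed to the recurrent layers, so a feature sampled over 25 loading steps produces 25 such sequences.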

Remarks on implementation. An elasto-plasticity model, once generated by the AI, needs to be numerically integrated to compute the predicted stresses under different types of tests. Since the loading directions, plastic flow directions and hardening modulus can have a large number of options and may be exceedingly complex, we adopt a general-purpose explicit integration algorithm for all AI-generated models, instead of using the different implicit integration techniques necessary for different models. This algorithm is a combination of (1) explicit integration with sub-stepping and automatic error control (Sloan, 1987; Sloan et al., 2001), (2) explicit integration of (potentially non-smooth) hardening laws (Tu et al., 2009), (3) integration of generalized plasticity models (de Borst and Heeres, 2002; Mira et al., 2009), and (4) linearized integration for loading constraints (Bardet and Choucair, 1991). The algorithm is detailed in Algorithm 1. This explicit scheme is versatile and stable, but not as accurate as fully implicit return mapping algorithms; hence, small time steps are required for the numerical integration when evaluating model accuracy scores.
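To illustrate the idea of explicit integration with sub-stepping, the sketch below integrates a scalar generalized plasticity model with a forward-Euler update. It is a deliberately reduced example and not Algorithm 1: it uses uniform sub-steps instead of automatic error control, fixes $n^{load}=m^{flow}=1$, and assumes $H+E>0$. The function name and parameter values are hypothetical.

```python
from typing import Callable, List, Sequence


def integrate_explicit(
    strain_path: Sequence[float],
    E: float,
    hardening: Callable[[float, float], float],
    n_sub: int = 10,
) -> List[float]:
    """Forward-Euler stress update with uniform sub-stepping (scalar sketch)."""
    sigma, eps_p = 0.0, 0.0
    stresses = [sigma]
    for eps_old, eps_new in zip(strain_path[:-1], strain_path[1:]):
        d_eps = (eps_new - eps_old) / n_sub  # split the increment into sub-steps
        for _ in range(n_sub):
            H = hardening(sigma, eps_p)  # hardening modulus at the current state
            # Plastic multiplier for n_load = m_flow = 1; zero when unloading.
            d_lambda = max(E * d_eps / (H + E), 0.0)
            sigma += E * (d_eps - d_lambda)
            eps_p += d_lambda
        stresses.append(sigma)
    return stresses


# Constant hardening modulus: load in two steps, then partially unload.
stresses = integrate_explicit(
    [0.0, 0.01, 0.02, 0.015], E=200.0, hardening=lambda sigma, eps_p: 50.0
)
```

With a constant hardening modulus the loading branch follows the tangent $EH/(E+H)$ exactly, so sub-stepping only matters once $H$ depends on the state, which is the case for every hardening law listed in Section 3.2.6.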

4 Deep reinforcement learning for the two-player meta-modeling game

With the two-player game completely defined in the previous section, a deep reinforcement learning (DRL) algorithm is employed to guide the actions of both the experimentalist and the modeler in the game so as to maximize the final model score (Figure 4). The learning is completely free of human intervention once the game settings are fixed. This tactic is considered one of the key ideas leading to the major breakthrough in AI playing the game of Go (AlphaGo Zero) (Silver et al., 2017a), chess and shogi (AlphaZero) (Silver et al., 2017b) and many other games. In Wang and Sun (2019a), the key ingredients (Policy/Value network, confidence bound for Q-value, Monte Carlo Tree Search) of the DRL technique are detailed and applied to a meta-modeling game for the modeler agent only, focusing on finding the optimal topology of physical relations from fixed training/testing datasets. In this work, the game design is further extended such that (1) the modeling game also involves the “component selection” from a set of candidate edge choices having the same source and target nodes (deriving a directed graph from a directed multigraph) and (2) the choice of training dataset is carried out by an additional experimentalist agent. Since DRL needs to figure out the optimal strategies for two agents, the algorithm is extended to multi-agent multi-objective DRL (Tan, 1993; Foerster et al., 2016; Tampuu et al., 2017). The AIs for the experimentalist and modeler agents are separate: each has its own Policy/Value network and decision tree search. However, their intelligence is improved simultaneously during the self-plays of the entire Data collection/Meta-modeling game, according to the individual rewards they receive from the game environment and the communications between themselves (Figure 4). The strategies of both agents can be cooperative or competitive to different degrees, depending on the design of the game reward system (for example, the video game of Pong in Tampuu et al. (2017)). In this work, we consider only the learning of fully cooperative strategies, as shown in the game reward system designed in Sections 3.1 and 3.2.

The pseudocode of the reinforcement learning algorithm to play the two-player meta-modeling game is presented in Algorithm 2. This is an extension of the algorithm in Wang and Sun (2019a). As demonstrated in Algorithm 2, each complete DRL procedure involves $numIters$ training iterations and one final iteration for generating the final converged digraph model. Each iteration involves $numEpisodes$ game episodes that construct the training example set $trainExamples$ for the training of the policy/value network $f_{\theta}$ . For decision making in each game episode, the action probabilities are estimated from $numMCTSSims$ MCTS simulations.
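The outer loop structure described above can be sketched as follows. This is only a skeleton under stated assumptions, not Algorithm 2 itself: `play_episode` and `train_network` are hypothetical callables standing in for the MCTS-guided self-play and the policy/value network update, and the training examples are assumed to accumulate across iterations. The stub functions exist only to make the sketch runnable.

```python
from typing import Callable, List, Tuple

Example = Tuple[object, object, int]  # (state, action probabilities, reward)


def run_drl(
    num_iters: int,
    num_episodes: int,
    num_mcts_sims: int,
    play_episode: Callable[[int, bool], List[Example]],
    train_network: Callable[[List[Example]], None],
) -> List[List[Example]]:
    """Training iterations followed by one final iteration without training."""
    train_examples: List[Example] = []
    for _ in range(num_iters):
        for _ in range(num_episodes):
            # Each episode returns its play steps tagged with the final reward.
            train_examples.extend(play_episode(num_mcts_sims, True))
        train_network(train_examples)
    # One extra iteration produces the final games with the trained network.
    return [play_episode(num_mcts_sims, False) for _ in range(num_episodes)]


calls = {"train": 0, "final": 0, "fit": 0}


def play_episode_stub(num_mcts_sims: int, training: bool) -> List[Example]:
    calls["train" if training else "final"] += 1
    return [("state", "pi", 1)]  # placeholder play step


def train_network_stub(examples: List[Example]) -> None:
    calls["fit"] += 1


final_games = run_drl(10, 30, 30, play_episode_stub, train_network_stub)
```

In the two-player setting, each agent would own a separate network and search tree, so `play_episode` would alternate between the experimentalist and the modeler before the shared model score is converted into rewards.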

Remark: Non-cooperative meta-modeling game and Nash equilibrium. In the case of the cooperative game where both agents share the same goal or score system, there is no need to determine the Nash equilibrium, as the joint actions of the experimentalist/modeler group yield a collective payoff. However, in many realistic situations in modern-day research, the data and modeler agents may have different or even conflicting goals, and hence finding the best strategies for the two agents is equivalent to finding the Nash equilibrium. The meta-modeling game, in this case, is not only helpful for generating models but also helps us understand the relationships among the objectives of the data and modeler agents, the resultant actions taken by both players, and the outcomes, assuming each player acts rationally.

5 Numerical Experiments

In this section, we present two cooperative modeling games with different data to demonstrate the intelligence, robustness and efficiency of the deep reinforcement learning algorithm on improving the accuracy and consistency of the generated elasto-plasticity models through self-plays. In the first example, synthetic data computed from selected J2, Drucker-Prager and Matsuoka–Nakai plasticity are used to train the data and model agents, to validate the meta-modeling framework and show that the AI has the ability to react appropriately such that the correct interpretation (i.e. the model itself if the data is generated from that model) can be recovered from the data. In the second example, sub-scale discrete element simulations (DEM) are used to generate synthetic benchmark data for model calibrations and blind prediction evaluations to mimic data from real-world granular materials.

5.1 Numerical Experiment 1: Testing the ability of AI for reverse engineering constitutive laws

The correctness of the proposed meta-modeling framework is first verified through a series of tests on “virtual materials” having exact elasto-plastic constitutive behaviors. The goal of this example is to show that the framework can exactly recover all edge components of a pre-selected directed graph for an elasto-plastic constitutive model, based on the data from AI-selected experiments. In other words, this numerical experiment is a verification exercise that tests whether both agents can automatically derive the right strategies to recover the models from data without human intervention during the training, under the idealized condition that the data do not contain any noise. The test models for the AI to recover are, respectively,

1. J2 model with von Mises yield function and isotropic hardening with a power law.

2. Drucker–Prager model with frictional hardening.

3. Three-invariant Matsuoka–Nakai model.

In the reverse engineering numerical experiments, we first implement three implicit return mapping algorithms for the aforementioned models. The experimentalist agent is then given the executable files of these return mapping algorithms and runs them to generate data. Meanwhile, the modeler agent uses the data generated by the experimentalist agent as input. In each iteration of a training session, the experimentalist agent can decide to terminate the numerical tests at any time. The experimental test choices available for the experimentalist agent consist of

T1:

One-dimensional extension test ( $\dot{\epsilon}_{11}>0$ , $\dot{\epsilon}_{22}=\dot{\epsilon}_{33}=\dot{\epsilon}_{12}=\dot{\epsilon}_{23}=\dot{\epsilon}_{13}=0$ , $p_{0}=-200$ kPa).

T2:

One-dimensional compression test ( $\dot{\epsilon}_{11}<0$ , $\dot{\epsilon}_{22}=\dot{\epsilon}_{33}=\dot{\epsilon}_{12}=\dot{\epsilon}_{23}=\dot{\epsilon}_{13}=0$ , $p_{0}=-200$ kPa).

T3:

Drained triaxial extension test ( $\dot{\epsilon}_{11}>0$ , $\dot{\sigma}_{22}=\dot{\sigma}_{33}=\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ , $p_{0}=-200$ kPa).

T4:

Drained triaxial compression test ( $\dot{\epsilon}_{11}<0$ , $\dot{\sigma}_{22}=\dot{\sigma}_{33}=\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ , $p_{0}=-200$ kPa).

T5:

Undrained triaxial extension test ( $\dot{\epsilon}_{11}>0$ , $\dot{\epsilon}_{11}+\dot{\epsilon}_{22}+\dot{\epsilon}_{33}=0$ , $\dot{\sigma}_{22}=\dot{\sigma}_{33}$ , $\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ , $p_{0}=-200$ kPa).

T6:

Undrained triaxial compression test ( $\dot{\epsilon}_{11}<0$ , $\dot{\epsilon}_{11}+\dot{\epsilon}_{22}+\dot{\epsilon}_{33}=0$ , $\dot{\sigma}_{22}=\dot{\sigma}_{33}$ , $\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ , $p_{0}=-200$ kPa).

T7:

Simple shear test ( $\dot{\epsilon}_{12}>0$ , $\dot{\sigma}_{11}=\dot{\sigma}_{22}=\dot{\epsilon}_{33}=\dot{\epsilon}_{23}=\dot{\epsilon}_{13}=0$ , $p_{0}=-200$ kPa).

The modeling choices available for the modeler agent are specified in Section 3.2.6 (Game Choices for Modeler). The model score is defined as:

[TABLE]

where $P\%=80\%$ and $\varepsilon_{\text{crit}}=10^{-5}$ for Eq. (34) of accuracy evaluations. The DRL meta-modeling procedure (Algorithm 2) contains $numIters=10$ training iterations of “exploration and exploitation” of game strategies, with the temperature parameter $\tau$ set to 1. Then an iteration of “competitive gameplay” ( $\tau=0.01$ ) is conducted to showcase the performance of the final trained AI agent. Each iteration consists of $numEpisodes=30$ self-play episodes of the game. Hence one execution of the entire DRL procedure contains $numIters\times numEpisodes=10\times 30=300$ game episodes for training the policy/value neural network. Each game starts with a randomly initialized neural network for the policy/value predictions, and each play step requires $numMCTSSims=30$ MCTS simulations. The play steps and corresponding final game rewards are then appended to the set of training examples for the training of the policy/value network.
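The role of the temperature parameter can be illustrated with the rule commonly used in AlphaZero-style algorithms, in which the probability of an action is proportional to its MCTS visit count raised to the power $1/\tau$. The text above does not spell out this formula, so its use here is an assumption, and the visit counts are hypothetical.

```python
import math
from typing import List, Sequence


def action_probabilities(visit_counts: Sequence[int], tau: float) -> List[float]:
    """pi(a) proportional to N(a)**(1/tau), evaluated in log space for stability."""
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    if sum(visit_counts) == 0:
        raise ValueError("at least one action must have been visited")
    logs = [math.log(n) / tau if n > 0 else float("-inf") for n in visit_counts]
    peak = max(logs)
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


visits = [15, 10, 5]  # hypothetical counts from numMCTSSims = 30 simulations
exploring = action_probabilities(visits, tau=1.0)     # training iterations
competitive = action_probabilities(visits, tau=0.01)  # final iteration
training_episodes = 10 * 30  # numIters * numEpisodes
```

With $\tau=1$ the agent samples actions in proportion to the visit counts, whereas $\tau=0.01$ concentrates almost all probability on the most visited action, which is why the final iteration behaves nearly deterministically.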

Figures 5, 6 and 7 present the example model predictions and calibration tests during the DRL improvement of the experimentalist and modeler agents. Both agents try out different combinations of calibration data and model choices, and evaluate their model scores and individual game rewards. The agents learn from all the gameplay results that they have experienced and converge their individual strategies to the optimal ones that eventually generate the optimal set of experiment tests for model calibration and exactly recover the plasticity model used to generate the synthetic data. The “cooperative” convergence of the strategies of both agents is of crucial importance, since the calibration dataset and the selected model must be simultaneously optimal for the final model score to be maximum. Although the gameplays could be different in each separate run of the two-player DRL algorithm due to the randomness in initial Policy/Value networks and the action possibilities involved in Monte Carlo Tree Search, the optimal strategies are always recovered if the exploration is sufficient. This is confirmed in Figures 8 and 9, in which the statistics of the gameplay scores of 20 separate executions of the two-player DRL algorithm are analyzed.

The fact that the two-player meta-modeling game is able to reach a perfect score in blind prediction indicates that it has successfully reverse engineered the constitutive law. The ability to automatically reverse-engineer a constitutive model could be of commercial value, as it allows one to understand attributes of legacy or proprietary software even when only the executable is available. Even when reverse engineering fails to recover the constitutive responses perfectly, the score indicates how closely the DRL-generated model replicates the constitutive law in the legacy or proprietary code.

Furthermore, the fact that the training is able to recover the model also enables us to use a different architecture for computational mechanics software in which the material model library does not necessarily contain multiple constitutive laws categorized by labels or model names. Instead, any new model in the literature that contains a new “action” not available in the previous constitutive laws can be decomposed into directed graphs and subsequently merged with the existing pool of actions, such that the modeler agent has more tools to generate new models that optimize objective functions. Since (1) new actions that complete the model will only be picked by the modeler agent if they help it achieve a higher score, and (2) should this happen, the improvement in prediction quality is quantified by the increase in the score, the meta-modeling game can be used as a tool to evaluate the true benefit of any new action that departs from the state-of-the-art.

5.2 Numerical Experiments 2: Testing the ability of AI for forward predictions

In this numerical experiment, we examine the ability of the proposed meta-modeling agents to (1) generate the knowledge and model represented by the directed graph from given data, (2) decide the set of experiments that aids data-driven discovery and (3) terminate the learning process when further experiments no longer benefit predictions.

In this test, we consider an idealized situation in which the data are generated from discrete element simulations for granular materials (Cundall and Strack, 1979; Kuhn et al., 2015; Wang and Sun, 2019b). While the constitutive responses from the discrete element simulations may contain fluctuations, we do not deliberately add noise to test how the meta-modeling procedure might be affected by it. While noise could be addressed using dropout layers as shown in Wang and Sun (2018), a comprehensive study on learning with noisy data is outside the scope of this study but will be considered in the future. The data for calibration and evaluation of prediction accuracy of the deep-reinforcement-learned constitutive models are generated by numerical simulations on a representative volume element (RVE) of densely-packed spherical DEM particles. The open-source discrete element simulation software YADE is used by the experimentalist agent to generate data, including the homogenized stress and strain measures and the geometrical and microstructural attributes such as coordination number, fabric tensor, and porosity (Šmilauer et al., 2010; Sun et al., 2014). The discrete element particles in the RVE have radii uniformly distributed within $1\pm 0.3$ mm. Cundall’s elastic-frictional contact model (Cundall and Strack, 1979) is used for the inter-particle constitutive law. The material parameters are: interparticle elastic modulus $E_{eq}=0.5$ GPa, ratio between shear and normal stiffness $k_{s}/k_{n}=0.3$ , frictional angle $\varphi=$ [math], density $\rho=2600$ $kg/m^{3}$ , Cundall damping coefficient $\alpha_{damp}=0.6$ .

The test data consist of triaxial tests on DEM samples with different initial confining pressures and void ratios. In all tests, $\dot{\sigma}_{33}=\dot{\sigma}_{12}=\dot{\sigma}_{23}=\dot{\sigma}_{13}=0$ and $b=\frac{\sigma_{22}-\sigma_{33}}{\sigma_{11}-\sigma_{33}}$ .

T1:

$\dot{\epsilon}_{11}<0$ , $b=0$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T2:

$\dot{\epsilon}_{11}<0$ , $b=0$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T3:

$\dot{\epsilon}_{11}<0$ , $b=0$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T4:

$\dot{\epsilon}_{11}>0$ , $b=0$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T5:

$\dot{\epsilon}_{11}>0$ , $b=0$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T6:

$\dot{\epsilon}_{11}>0$ , $b=0$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T7:

$\dot{\epsilon}_{11}<0$ , $b=0.5$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T8:

$\dot{\epsilon}_{11}<0$ , $b=0.5$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T9:

$\dot{\epsilon}_{11}<0$ , $b=0.5$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T10:

$\dot{\epsilon}_{11}>0$ , $b=0.5$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T11:

$\dot{\epsilon}_{11}>0$ , $b=0.5$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T12:

$\dot{\epsilon}_{11}>0$ , $b=0.5$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T13:

$\dot{\epsilon}_{11}<0$ , $b=0.1$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T14:

$\dot{\epsilon}_{11}<0$ , $b=0.1$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T15:

$\dot{\epsilon}_{11}<0$ , $b=0.1$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T16:

$\dot{\epsilon}_{11}>0$ , $b=0.1$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T17:

$\dot{\epsilon}_{11}>0$ , $b=0.1$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T18:

$\dot{\epsilon}_{11}>0$ , $b=0.1$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T19:

$\dot{\epsilon}_{11}<0$ , $b=0.25$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T20:

$\dot{\epsilon}_{11}<0$ , $b=0.25$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T21:

$\dot{\epsilon}_{11}<0$ , $b=0.25$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T22:

$\dot{\epsilon}_{11}>0$ , $b=0.25$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T23:

$\dot{\epsilon}_{11}>0$ , $b=0.25$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T24:

$\dot{\epsilon}_{11}>0$ , $b=0.25$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T25:

$\dot{\epsilon}_{11}<0$ , $b=0.75$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T26:

$\dot{\epsilon}_{11}<0$ , $b=0.75$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T27:

$\dot{\epsilon}_{11}<0$ , $b=0.75$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

T28:

$\dot{\epsilon}_{11}>0$ , $b=0.75$ , $p_{0}=-300kPa$ , $e_{0}=0.539$ .

T29:

$\dot{\epsilon}_{11}>0$ , $b=0.75$ , $p_{0}=-400kPa$ , $e_{0}=0.536$ .

T30:

$\dot{\epsilon}_{11}>0$ , $b=0.75$ , $p_{0}=-500kPa$ , $e_{0}=0.534$ .

The candidate tests for the calibration data generation include $\textbf{T}_{c}^{0}=\{T1,T2,T3,...,T11,T12\}$ and the validation tests are $\textbf{T}_{v}^{0}=\{T13,T14,T15,...,T29,T30\}$ . As explained in Section 3.1, the tests not selected in the final calibration set by the experimentalist agent are moved to the final validation set to evaluate the blind prediction performance. The parameters for the DRL procedure are identical to the settings in Numerical Experiment 1. The statistics of the gameplay results from 5 separate runs of the DRL procedure are presented in Figure 10. We observe efficient improvements in the generated elasto-plastic models over the DRL training iterations with the discrete element simulation data.
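The test matrix T1 to T30 listed above follows a regular pattern (five values of $b$, two loading directions, three initial pressures), so it can be enumerated programmatically. The sketch below reproduces that pattern and the calibration/validation split; the dictionary keys, function names, and the example selection are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

B_VALUES = [0.0, 0.5, 0.1, 0.25, 0.75]               # order used in T1-T30
DIRECTIONS = ["compression", "extension"]            # axial strain rate < 0, > 0
PRESSURES = {-300: 0.539, -400: 0.536, -500: 0.534}  # p0 [kPa] -> e0


def build_tests() -> Dict[str, dict]:
    """Enumerate T1-T30; the initial pressure varies fastest, then direction, then b."""
    tests = {}
    combos = product(B_VALUES, DIRECTIONS, PRESSURES)
    for idx, (b, direction, p0) in enumerate(combos, start=1):
        tests[f"T{idx}"] = {
            "b": b, "direction": direction, "p0_kPa": p0, "e0": PRESSURES[p0],
        }
    return tests


def split_tests(tests: Dict[str, dict], selected: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Selected tests calibrate the model; every other test is used for validation."""
    chosen = set(selected)
    calibration = [name for name in tests if name in chosen]
    validation = [name for name in tests if name not in chosen]
    return calibration, validation


tests = build_tests()
candidates = [f"T{i}" for i in range(1, 13)]  # T1-T12 may be chosen for calibration
calibration, validation = split_tests(tests, ["T1", "T4"])  # hypothetical selection
```

Any candidate test that the experimentalist agent leaves unselected joins T13 to T30 in the validation list, so a smaller calibration set automatically enlarges the blind prediction set.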

Figure 11 presents the example model predictions and calibration tests during the DRL improvement of the experimentalist and modeler agents. The final converged calibration test set chosen by the AI experimentalist after the DRL procedure consists of the triaxial extension and compression tests with $b=0$ and $b=0.5$ under initial pressures of -300 kPa and -500 kPa. Accordingly, the final converged elasto-plastic model generated by the AI modeler after the DRL procedure is composed of the non-linear elasticity of Eq. (37), the loading direction defined as Eq. (48), the plastic flow direction defined as Eq. (53), and the hardening modulus defined as Eq. (55). The resultant model is a generalized plasticity model (without explicitly defined yield surface and plastic potential) combined with the critical state plasticity theory (dependence on the $p,q,\theta$ stress invariants and the void ratio $e$ ). Figure 12 presents five representative examples of blind predictions of this selected model and the selected calibration data. This optimal model for the given action space is generated from data obtained from 9 experiments in the following order: [T1, T3, T4, T5, T7, T9, T10, T11, T12].

One interesting aspect revealed in this numerical experiment is the potential of using the meta-modeling game as a tool to evaluate and analyze the relative policy values of the ingredients of constitutive laws in a prediction task. For instance, this numerical experiment reveals that the optimal configuration of the constitutive model for predicting the behavior of monotonic loading triaxial compression tests should not contain any neural network edge (Eqs. (57) and (58) in Section 3.2). This could be attributed to the facts that the training data of the loading directions, plastic flow directions and hardening moduli from the DEM experimental data contain high-frequency fluctuations and that our testing data, which are used to evaluate the forward prediction performance, contain only monotonic stress paths. Since the high-frequency fluctuations make the neural network prone to overfitting, and the relatively simple stress paths make it less advantageous to use a high-dimensional universal approximator like a neural network in any component of the constitutive models, the edges that map the input vertices to the output vertices through mathematical expressions are revealed to have higher policy values as the game progresses and ultimately become the selected models.

Note that this result is in sharp contrast with the meta-modeling game results of the traction-separation law, in which the neural network edges become dominant and yield consistently good forward predictions (Wang and Sun, 2018, 2019a). Comparing the choices the agents made in the two games reveals that the autonomous agents are capable of adjusting their decisions based on the availability of the data and the type of the forward prediction tasks. In other words, the agents are able to make judgments such that they employ edges that contain low-dimensional mathematical expressions when regularization (avoiding the curse of high dimensionality) is more critical than the high dimensionality afforded by the large numbers of neural network nodes (in this case), but are also able to select the high-dimensional neural network options when the advantages of these options outweigh the drawbacks (in the traction-separation law game in Wang and Sun (2019a)). Note that the optimal configuration sought by the meta-modeling game is sensitive to the available actions. For instance, the neural network edges could be improved by introducing techniques to filter out the noisy data and by employing advanced neural networks with noise-resistant architectures (Sun et al., 2018). These changes can alter the policy values and therefore affect the optimal configuration. The incorporation of de-noising mechanisms and the investigation of the influence of data quality on the meta-modeling game framework will be conducted in a future study.

This automated strategy change by the AI agents is significant as it demonstrates that the agent system is able to adapt to the environment (availability of calibration data and the types of testing data) to make rational choices like a human modeler should when given different prediction tasks of different complexities.

Another important implication of the meta-modeling game is its ability to quantitatively analyze the performance of families of models currently (or historically, if they can be inferred from reverse engineering) available in the literature for an intended prediction task. Table 1 shows the post-game analysis of the performance of the 112 models automatically generated from the two-player game. The resultant models are grouped into five different classes based on the types of the edges used in the game. The interesting aspect of the data in Table 1 is that it provides users with a quantitative measure showing that configurations based on generalized plasticity and critical state outperform all the other 90 configurations. This result is consistent with the conventional understanding from soil mechanics, in which the classical critical state plasticity theory and the resultant plastic dilatancy/contraction predictions are regarded as the key ingredient for predictive constitutive laws. Examination of the models in Class 1 also reveals that three-invariant generalized plasticity with critical state performs the best in the blind predictions, especially when the material states of the granular materials in the calibration tests (e.g. confining pressure, initial void ratio, stress path) are significantly different from the ones in blind predictions.

However, comparisons of the results in Classes 1, 2, 3 and 4 shown in Figure 13 reveal a somewhat surprising conclusion: generalized plasticity seems to be consistently a more important ingredient than the critical state theory for yielding predictive models. Although it is important to stress that this conclusion must be interpreted with respect to the types and amount of data available and the intended prediction task, this result does provide further evidence to support the speculation that generalized plasticity, if calibrated properly, is likely to improve the accuracy of blind predictions of the responses of granular materials in monotonic triaxial compression tests (Zienkiewicz and Mroz, 1984; Pastor et al., 1990; Ling and Liu, 2003; Sánchez et al., 2005).
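
The kind of post-game grouping summarized in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_by_class`, the class labels, and every score below are hypothetical placeholders, not values or definitions taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_by_class(records):
    """Group (model_class, score) pairs and return per-class statistics.

    The class label of each model is assumed to have been assigned
    beforehand from the edge types it uses (hypothetical labelling).
    """
    groups = defaultdict(list)
    for model_class, score in records:
        groups[model_class].append(score)
    return {
        cls: {
            "count": len(scores),
            "mean": mean(scores),
            # Population standard deviation; a one-model class gives 0.0.
            "std": pstdev(scores),
        }
        for cls, scores in groups.items()
    }


# Placeholder records for illustration only, NOT results from the paper.
records = [
    ("GP+CS", 0.9), ("GP+CS", 0.7),
    ("GP", 0.6),
    ("CS", 0.5), ("CS", 0.3),
    ("O", 0.2),
]
summary = summarize_by_class(records)

for cls, stats in summary.items():
    print(f"{cls}: n={stats['count']}, "
          f"mean={stats['mean']:.3f}, std={stats['std']:.3f}")
```

Ranking the classes by mean score and inspecting the spread within each class is then a matter of sorting this dictionary; whether a population or sample standard deviation is appropriate depends on how the generated models are interpreted.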

In conclusion, this numerical experiment shows that the meta-modeling game can provide three important types of knowledge: the hierarchy of information flow, an estimate of the amount of data required to reach state-of-the-art performance for a given action space and specified objective, and the relative values, benefits, and importance of each model, theoretical, or data-driven component, as revealed in the post-game analysis.

Remark. Note that applying the meta-modeling game to predicting the responses of granular materials under different water drainage conditions is likely to yield a very different conclusion, in which machine learning edges could be more widely used in the optimal configuration. This is because of the lack of a constitutive model that can quantitatively capture the constitutive responses of granular materials in drained and undrained conditions (Gens and Potts, 1988; Manzari and Dafalias, 1997; Zienkiewicz et al., 1999; Pestana et al., 2002; Sun, 2013). The creation of models for more generic purposes and the estimation of the trusted range of application are both important issues, which will be considered in future studies but are outside the scope of this paper.

5.3 Numerical Experiment 3: AI-generated material models in finite element simulations

To demonstrate the applicability of the AI-generated models from the plays of the data collection/meta-modeling game presented in Numerical Experiment 2, we conduct finite element simulations of a plane strain compression test on a rectangular specimen. The geometry, mesh and boundary conditions of the simulations are given in Figure 14. The specimen is initially consolidated to an isotropic pressure of $p_{0} = -400$ kPa and has a uniform initial void ratio of 0.536. The specimen is compressed from the top surface, while the constant pressure $p_{0}$ is maintained on the lateral surfaces. A slight imperfection is introduced at the middle of the specimen to trigger heterogeneous deformation and shear bands. Three simulations are performed with the material properties given by three example models generated during the DRL in Numerical Experiment 2 (the 1st, 3rd and 4th digraphs in Figure 11).

The finite element implementation of the AI-generated digraph-based model is simple and convenient. All modeling choices (Section 2) and the general-purpose integration scheme (Algorithm 2) are already implemented in a single material model class. The FEM program is free to switch to other models simply by loading the digraphs and the corresponding calibrated parameters from the gameplay into this material class. The local distributions of the deviatoric strain $\epsilon_{s}$ and the volumetric strain $\epsilon_{v}$ in the specimen from the three models are compared in Figure 15 and Figure 16, respectively. The global differential stress versus axial strain and volumetric strain versus axial strain curves are compared in Figure 17. The results demonstrate that all the local constitutive models, regardless of their quality, can be implemented in the finite element solver. As mentioned previously, this meta-modeling game can be easily incorporated in a new finite element solver architecture in which the material library commonly used in the current paradigm is replaced by one single labeled directed multigraph, and the conventional material identification process is replaced by the meta-modeling game, such that both the optimal combination of model components and the material parameters are selected simultaneously. Furthermore, the results also indicate that the quality of the constitutive laws is continuously improved in each iteration of the meta-modeling game. In particular, we see that the correct type of shear band for dense granular assemblies (dilatant shear band) is reproduced in the numerical specimens after the 5th iteration (cf. Aydin et al. (2006); Sun (2013)), and the shear band mode converges in the 8th iteration.
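
The idea of a single material class that switches models by loading a digraph and its calibrated parameters can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `DigraphMaterial`, the JSON layout, the scalar toy edges, and the parameter values are hypothetical and do not reproduce the authors' implementation or Algorithm 2.

```python
import json

# Hypothetical edge library: label -> callable(state, params) -> value.
# Both edges are scalar toy relations, not models from the paper.
EDGE_LIBRARY = {
    "linear_elastic": lambda s, p: p["E"] * s["strain"],
    "saturating": lambda s, p: p["E"] * s["strain"]
    / (1.0 + abs(s["strain"]) / p["eps_ref"]),
}


class DigraphMaterial:
    """One material class that can represent many constitutive models."""

    def __init__(self, edge_library):
        self.edge_library = edge_library
        self.edges = []   # ordered (target_vertex, edge_label) pairs
        self.params = {}

    def load(self, config_text):
        """Replace the active model with the digraph stored in a JSON string."""
        config = json.loads(config_text)
        unknown = [e["label"] for e in config["edges"]
                   if e["label"] not in self.edge_library]
        if unknown:
            raise KeyError(f"edges not in library: {unknown}")
        self.edges = [(e["target"], e["label"]) for e in config["edges"]]
        self.params = config["params"]

    def evaluate(self, state):
        """Traverse the edges in order, filling in each target vertex."""
        state = dict(state)  # do not mutate the caller's state
        for target, label in self.edges:
            state[target] = self.edge_library[label](state, self.params)
        return state


# Two hypothetical configurations, as they might be exported after a game.
CONFIG_A = ('{"edges": [{"target": "stress", "label": "linear_elastic"}],'
            ' "params": {"E": 100.0}}')
CONFIG_B = ('{"edges": [{"target": "stress", "label": "saturating"}],'
            ' "params": {"E": 100.0, "eps_ref": 0.01}}')

material = DigraphMaterial(EDGE_LIBRARY)
material.load(CONFIG_A)
stress_a = material.evaluate({"strain": 0.01})["stress"]
material.load(CONFIG_B)  # switch models without touching the solver code
stress_b = material.evaluate({"strain": 0.01})["stress"]
print(stress_a, stress_b)
```

The design point this sketch tries to capture is that the solver only ever talks to one class; changing the constitutive model is a data operation (loading a different digraph and parameter set) rather than a code change.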

6 Conclusion and future Perspectives

We introduce a new multi-agent meta-modeling game in which the experimental task, i.e. the generation of data, and the modeling task, i.e. the interpretation of data, are handled by two artificial intelligence agents. Mimicking the collaboration of an experimentalist and a modeler who work together to derive, implement, calibrate and validate a model that explains a path-dependent process, these two agents interact with each other sequentially and exchange information until either the model and data reach the objectives or further action does not generate further reward. The major contribution of this research is as follows. First, we introduce the idea of using a labeled directed multigraph to mathematically represent the action space of the modeling agent. This action space can be expanded by adding plausible actions invented by previous human modelers or by generating new actions from deep neural networks or other machine learning methods. This invention therefore enables us to idealize the process of writing constitutive models as a continuous decision-making process in an action space of very high dimension, such that a pre-defined objective function can be maximized. As shown previously in work such as AlphaGo (Silver et al., 2017c, b), using deep neural networks in a deep reinforcement learning framework to search for proper actions among a very large number of possible moves can achieve superior performance. To the best knowledge of the authors, this is the first time that deep reinforcement learning has been applied to generating the knowledge graph and constitutive laws for history-dependent responses of materials.

The introduction of the graph, the directed graph and the labeled directed multigraph enables us to derive a meta-modeling game that more closely resembles a human-like iterative, cyclical scientific process through which information is continuously gathered, hypotheses are continuously tested and the plausible understanding is continually revised. The major elements of the scientific method used by humans, including characterization (observation and measurement stored in vertices, definition stored in edges), hypotheses (selection of a particular form of edges and edge sets), predictions (the information flow from root to leaf of the directed graph obtained from the meta-modeling game) and experiments, are all incorporated and automated. This new approach produces a forecast engine that can make predictions but, more importantly, has the ability to generate human-interpretable knowledge on the relationships among different measurable physical quantities. This feature distinguishes it from other neural network approaches, which often produce black-box models with no easy way to interpret the rationale of the predictions. It should be pointed out that models generated from the meta-modeling game do not discriminate among the types of edges used. An edge can be any operator that links the input to the output, including but not limited to regression, support vector machine, neural network, mathematical expression or a bootstrapped version of them. These edges are only formed by the AI when they are estimated to have a higher policy value according to a specific objective function.
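
To make the role of the labeled directed multigraph concrete, the following Python sketch enumerates the candidate information flows from a source vertex to a target vertex when parallel edges with different labels are allowed. It is a simplified illustration under stated assumptions: the vertex names, edge labels, and the function `labeled_paths` are hypothetical, the graph is assumed small, and the configurations in the paper are subgraphs that may be more general than the simple paths listed here.

```python
def labeled_paths(edges, source, target):
    """Yield every labeled simple path from source to target.

    `edges` is a list of (tail, head, label) triples. Two triples with the
    same tail and head but different labels are parallel edges, i.e.
    alternative operators linking the same pair of quantities.
    """
    outgoing = {}
    for tail, head, label in edges:
        outgoing.setdefault(tail, []).append((head, label))

    def walk(vertex, path, visited):
        if vertex == target:
            yield tuple(path)
            return
        for head, label in outgoing.get(vertex, []):
            if head not in visited:  # keep paths simple (no revisits)
                yield from walk(head,
                                path + [(vertex, head, label)],
                                visited | {head})

    yield from walk(source, [], {source})


# Hypothetical toy multigraph: physical quantities as vertices,
# alternative modeling choices as labeled edges.
EDGES = [
    ("strain", "elastic_strain", "additive_split"),
    ("elastic_strain", "stress", "linear_elasticity"),
    ("elastic_strain", "stress", "neural_network"),  # parallel edge
    ("strain", "stress", "neural_network"),          # direct black-box edge
]

candidates = list(labeled_paths(EDGES, "strain", "stress"))
for path in candidates:
    print(" -> ".join(f"{tail}[{label}]" for tail, _, label in path)
          + " -> stress")
```

In this toy setting each enumerated path is one candidate model; a game-playing agent would score such candidates with an objective function instead of enumerating them exhaustively, which quickly becomes infeasible as the edge set grows.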

The framework provides a gateway to merge existing and new models and introduces a seamless integration of data generation and data-driven discovery. Since the meta-modeling game stops when further action does not yield reward, this framework enables one to determine the best model configuration one can possibly obtain within a well-defined action space for a given set of data. As shown in Section 5.2, this ability is not only important for making better predictions, but also helps us identify the limitations of the action space. Given that modern constitutive laws have become increasingly complex and are often combined products of multiple material theories, concepts and assumptions created by different schools or theoretical backgrounds, the quantifiable policy values of the edges in the edge label set, if used properly, may enable us to pin down the relative value of each component of the constitutive laws while avoiding any potential implicit bias. As a result, the cooperative game enables us to make forward predictions while controlling the cost of generating the experimental data. Unlike other AI fields, which are largely driven by the exponential growth of available data, extracting an adequate amount of reliable experimental data remains a challenging task for the field of mechanics. The cooperative game designed in this paper not only provides a tool to optimize the collaboration of the AI agents, but also sheds light on how to make productive scientific discoveries by emulating the research process in a setting where data generation can be costly.

In this work, we assume that the data obtained from experiments are perfect and without any significant noise. Furthermore, the meta-modeling game is operated in a setting where the vertex set and the corresponding labels are fixed. Future work will consider how to introduce quantifiable assurance into the meta-modeling game, incorporate sensitivity analysis in the validation and predictions, and quantify different types of uncertainties. For instance, one may train Bayesian neural networks to generate edges that not only deliver deterministic predictions but also perform variational inference. By quantifying the sensitivity of the predictions, one may explore the weaknesses of the existing action space for both the modeler and experimentalist agents and use this knowledge to generate new actions. Work in this area is currently in progress.

7 Acknowledgments

The work of KW and WCS is supported by the Earth Materials and Processes program from the US Army Research Office under grant contract W911NF-18-2-0306, the Dynamic Materials and Interactions Program from the Air Force Office of Scientific Research under grant contract FA9550-17-1-0169, the Nuclear Energy University Program from the Department of Energy under grant contract DE-NE0008534, the Mechanics of Material program at the National Science Foundation under grant contract CMMI-1462760, and the Columbia SEAS Interdisciplinary Research Seed Grant. The work of QD is supported in part by NSF CCF-1704833, DMS-1719699, DMR-1534910, and ARO MURI W911NF-15-1-0562. This support is gratefully acknowledged. The views and conclusions contained in this document are those of the authors, and should not be interpreted as representing the official policies, either expressed or implied, of the sponsors, including the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Bibliography


Andrade and Borja [2006] José E Andrade and Ronaldo I Borja. Capturing strain localization in dense sands with random density. International Journal for Numerical Methods in Engineering, 67(11):1531–1564, 2006.
Asaro [1983] Robert J Asaro. Crystal plasticity. Journal of Applied Mechanics, 50(4b):921–934, 1983.
Aydin et al. [2006] Atilla Aydin, Ronaldo I Borja, and Peter Eichhubl. Geological and mathematical framework for failure modes in granular rock. Journal of Structural Geology, 28(1):83–98, 2006.
Bang-Jensen and Gutin [2008] Jørgen Bang-Jensen and Gregory Z Gutin. Digraphs: theory, algorithms and applications. Springer Science & Business Media, 2008.
Bardet and Choucair [1991] JP Bardet and W Choucair. A linearized integration technique for incremental constitutive equations. International Journal for Numerical and Analytical Methods in Geomechanics, 15(1):1–19, 1991.
Battaglia et al. [2018] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
Been et al. [1991] K Been, MG Jefferies, and J Hachey. Critical state of sands. Geotechnique, 41(3):365–381, 1991.
Bonabeau [2002] Eric Bonabeau. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99(suppl 3):7280–7287, 2002.
