Learning Agents in Black-Scholes Financial Markets: Consensus Dynamics and Volatility Smiles
Tushar Vaidya, Carlos Murguia, Georgios Piliouras

TL;DR
This paper models how traders learn implied volatility in Black-Scholes markets through opinion dynamics, proving convergence and bridging the gap between theoretical assumptions and market realities.
Contribution
It introduces novel learning agent models for volatility estimation and proves their convergence using control theory, addressing a key gap in financial market modeling.
Findings
Opinion dynamics converge under specified models
Models explain the emergence of volatility smiles
Bridges theory and market practice in volatility estimation
Abstract
Black-Scholes (BS) is the standard mathematical model for option pricing in financial markets. Option prices are calculated using an analytical formula whose main inputs are strike (at which price to exercise) and volatility. The BS framework assumes that volatility remains constant across all strikes; in practice, however, it varies. How do traders come to learn these parameters? We introduce natural models of learning agents, in which traders update their beliefs about the true implied volatility based on the opinions of other traders. We prove convergence of these opinion dynamics using techniques from control theory and leader-follower models, thus providing a resolution between theory and market practices. We allow for two different models, one with feedback and one with an unknown leader.
Taxonomy
Topics: Opinion Dynamics and Social Influence · Complex Systems and Time Series Analysis · Game Theory and Applications
Learning Agents in Black-Scholes Financial Markets
Tushar Vaidya
SUTD
Carlos Murguia
TU/e
Georgios Piliouras
SUTD
Abstract
Black-Scholes (BS) is the standard mathematical model for European option pricing in financial markets. Option prices are calculated using an analytical formula whose main inputs are strike (at which price to exercise) and volatility. The BS framework assumes that volatility remains constant across all strikes; in practice, however, it varies. How do traders come to learn these parameters?
We introduce natural agent-based models, in which traders update their beliefs about the true implied volatility based on the opinions of other traders. We prove exponentially fast convergence of these opinion dynamics using techniques from control theory and leader-follower models, thus providing a resolution between theory and market practices. We allow for two different models, one with feedback and one with an unknown leader.
1 Introduction
Econophysics divides into two paradigms. Statistical econophysics relies on data, fitting certain power laws to existing asset prices at various time scales [76, 16]. In statistical econophysics, zero-intelligence agents have random interactions. Agents are homogenous and have no learning ability. The central object of study is historical price data. The viewpoint is that interacting zero-intelligence traders’ actions are already incorporated into price fluctuations. The focus is on the macroscopic aggregation in the form of available data. While this is an important area of research, agent-based models offer the opportunity to study the microscopic interactions in more detail. Here agents are heterogeneous.
Our objective is to offer a cogent and clear motivation for agent-based econophysics in the context of option volatilities, whereby learning and interaction are made explicit. To an outsider it may seem that financial assets are observed at one price, decided by the market. In reality, prices fluctuate throughout the day and there is no equilibrium price: it is always in flux. Interaction between strategic traders and other players is embedded in all transactions and informational channels. Interaction is vital to understanding markets. This paper was inspired by the works of Kirman and Föllmer [51, 36]. Rather than develop a full-blown game theoretic or mean-field model, we advocate something in between, where interaction of traders is intrinsic. We aim to take a more nuanced view of agent-based econophysics as espoused by Chakraborti et al. [17].
Most trading is done electronically. To be dominant, firms now invest huge sums in technology to get an edge. For futures trading, speed is vital to profits. Trading complex derivatives requires not only speed but also heavy investment in quantitative models. This in turn feeds the need for mathematicians, computer scientists and engineers. Over the last two decades the way trading is conducted has also changed drastically. Electronification of the markets has affected instruments traded both on and off exchange. Algorithmic trading drives not only plain vanilla instruments like stocks and futures but also plays a crucial role in derivatives trading [6, 38, 87]. Furthermore, the distinction between stock exchanges and over-the-counter (OTC) markets is not as clear as it once was [59, 24, 80]. In OTC markets, trading is between two counterparties and there is no centralized marketplace. Over the last decade there has been a regulatory push to make OTC markets more exchange-like. In OTC markets, participants may see what their competitors are quoting for a particular security, but the volume and the actual price transacted remain private to the two counterparties. In some quarters, OTC markets are referred to as quote-driven or truly dark markets [32]. Regulation in the United States and European Union has resulted in fragmented exchange-based trading but centralization of opaque OTC markets.
1.1 Options Markets
Derivative contracts are actively traded across the world’s financial markets, with an estimated total worth in the trillions of dollars. To get an intuitive understanding of the setting and the issues at hand, let’s consider the prototypical example of European options.
A European option is the right to buy or sell an underlying asset at some point in the future at a fixed price, also known as the strike. A call option gives the right to buy an asset and a put option gives the right to sell an asset at the agreed price. On the opposite side of the buyer is the seller, who has relinquished control of exercise. Buyers of puts and calls can exercise the right to buy or sell. Sellers of options have to fulfil their obligations when exercised against. The payoff to the buyer of a call option with stock price $S_T$ at expiry time $T$ and exercise price $K$ is $\max(S_T - K, 0)$, whereas for a put option it is $\max(K - S_T, 0)$.
To get a price we input the current stock price, the strike (e.g. $K = \$90$), the expiry $T$ (e.g. three months from today) and the volatility $\sigma$ into the Black-Scholes (BS) formula, and out comes the answer, the quoted price of the instrument [21, 68, 47].
[TABLE]
Volatility, which captures the beliefs about how turbulent the stock price will be, is left up to the market. This parameter is so important that in practice the market trades European calls and puts by quoting volatilities. (Using the Black-Scholes formula with a particular implied volatility, traders obtain a dollar value price.)
Options can be struck at different strike prices on the same asset (e.g. $K = \$90, \$75, \$60$). If the underlying asset and the expiry $T$ (e.g. 3 months) are the same, one would expect the volatility to be the same at different strikes. In practice, however, the market after the 1987 crash has evolved to exhibit different volatilities. This rather strange phenomenon is referred to as the smile, or smirk (see figure 1). Depending on the market, these smirks can be more or less pronounced. For instance, equity markets display a strong skew or smirk. A symmetric smile is more common in foreign exchange options markets. An excellent introduction to volatility smiles is given in [29].
How the market decides what the quoted volatility should be (e.g. for a stock index in 3 months from now) is a critical, but not well understood, question. This is exactly what we aim to study by introducing models of learning agents who update their beliefs about the volatility. Agent-based models of volatility-smile interaction and formation have not been thoroughly addressed in finance or econophysics. They remain a challenge [83]. Previous attempts have been made, but the focus has never been on the mathematical or specific nature of interaction [86, 57]. Furthermore, our work takes into account the physicality of how trading occurs. An alternative perspective is offered in [56, 71], though again the nature of interaction is missing. Nevertheless, these early attempts offer a good indication that the problem has garnered significant interest in different disciplines.
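The link between a quoted volatility and a dollar price can be illustrated with a minimal sketch. It assumes the textbook Black-Scholes formula for a European call on a non-dividend-paying stock; the function names (`bs_call`, `bs_put`, `implied_vol`) and the numerical inputs are illustrative choices, not taken from the paper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, expiry: float, vol: float, rate: float = 0.0) -> float:
    """Textbook Black-Scholes price of a European call (no dividends)."""
    sqrt_t = math.sqrt(expiry)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * expiry) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * expiry) * norm_cdf(d2)


def bs_put(spot: float, strike: float, expiry: float, vol: float, rate: float = 0.0) -> float:
    """European put obtained from the call via put-call parity."""
    return bs_call(spot, strike, expiry, vol, rate) - spot + strike * math.exp(-rate * expiry)


def implied_vol(price: float, spot: float, strike: float, expiry: float,
                rate: float = 0.0, lo: float = 1e-6, hi: float = 5.0) -> float:
    """Invert bs_call in the volatility by bisection (the price is increasing in vol)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, expiry, mid, rate) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical quote: spot 100, strike 90, three months to expiry, 20% volatility.
price = bs_call(100.0, 90.0, 0.25, 0.20)
recovered = implied_vol(price, 100.0, 90.0, 0.25)
```

Because the call price is strictly increasing in the volatility, quoting a volatility and quoting a dollar price carry the same information, which is why bisection is enough to recover the implied volatility.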
1.2 Econophysics
The challenge for physicists is not to force existing physics-based models on human behaviour but rather to develop new models [18, 46, 81]. To go from local microscopic interactions to global macroscopic behaviour is not an easy task [84, 77]. In fact, the choice of models seems infinite. There is a plethora of agent-based models [17, 81, 15]. Which one is correct? And moreover, which type of social learning is representative of trading in financial markets? Barron provides an early guide [54]. Agent-based models were proclaimed as the future for econophysics [35, 74]. While development in this area has been steady, the problem of the emergence of volatility smiles remains unresolved. The volatility smile is an active and vigorous area of research in the mathematical finance community. Many models postulate a stochastic process for the underlying stock and volatility combined.
1.2.1 Knightian Uncertainty
Risk and uncertainty are two different concepts [34, 52, 75]. Risky assets are those on which the probabilities of random events are well-defined and known. For instance, suppose we observe historical data of a stock price. Are we confident to say we know the distribution of the stock’s returns? If we are, then the stock is considered risky. Its risk is quantifiable. However, if we were unsure of even the correct probability measure, then we would be faced with uncertainty. In a sense, this captures the essence of financial markets. Traders and players use different probability measures. No such probability measure dominates. In incomplete markets, the choice of a correct probability measure such that a derivative contract is priced correctly is a subjective and quantitative exercise. In any case, no correct model exists [31, 50, 65, 20, 3]. As a result, participants in financial markets are free to choose whichever probability model they calibrate to market data [25, 22, 13].
The problem with economics-based models and those in the mathematical finance literature is that the analysis is often centred on a representative agent. In the case of risk and uncertainty, pricing a derivative contract boils down to choosing a correct equivalent martingale measure under which a derivative claim is replicable. For market-makers and dealers, the choice of models is vast. Each player has to make a choice and inevitably no two institutions will use the same models with the same parameters. It is therefore remarkable that the market aggregates the diverse beliefs to arrive at a consensus smile. At the microscopic level, though, the dealers are observing each other’s updates. Hence, our model can be seen as a meta opinion dynamics framework built upon the individual choices of the dealers.
1.2.2 Financial markets: non-Bayesian
In financial markets, updating occurs at high frequency across geographic locations [88, 12]. Agents move simultaneously: cancellations are the norm [42, 89, 33]. In practical terms, sequential Bayesian learning models don’t seem appropriate [44, 63]. Bayesian observational learning examples include [7, 9] and [82]. These models are sequential in nature. They study herd behaviour. As time passes, each player in turn observes the actions of previous agents and receives a private signal. Each agent has a one-off decision when she updates her posterior probability and takes an action. In some instances, the $n$-th agent may reach the truth as $n \to \infty$.
In DeGroot learning, myopic updating occurs in each iteration. Agents in our setup have fixed weights but update their responses until consensus is reached. Recently there have been some experimental papers on the evidence of DeGroot updating [19, 8]. Repeated averaging models are our base precisely because they capture the nature of interaction and learning in financial markets so compactly. Players can observe previous choices but not the payoffs of their competitors. A more in-depth discussion of learning in games would take us further away from our goal of studying the mathematical nature of interaction. The reader can consult [37, 48] for a game theoretic perspective.
Our contribution. We introduce two different classes of learning models that converge to a consensus. Our interest is not in equilibrium itself but in the processes that lead to it [69, 70, 58]. The first introduces a feedback mechanism (Section 3.1, Theorem 3.1) where agents who are off the true “hidden” volatility parameter feel a slight (even infinitesimally so) pull towards it, along with all the other “random” chatter of the market. This model captures the setting where traders have access to an alternative trading venue or an information source provided by brokers and private message boards. The second model incorporates a market leader (e.g. Goldman Sachs) that is confident in its own internal metrics or is privy to client flow (private information) and does not give any weight to outside opinions (Section 3.3, Theorem 3.4). Proving the convergence results (as well as establishing the exponentially fast convergence rates) requires tools from discrete dynamical systems. We complement our theoretical results with experiments (e.g. Figures 2.a-2.d), which show, for example, that if we move away from our models convergence is no longer guaranteed.
We formalize the multi-dimensional analogues of our two models above using Kronecker products (Section 4, Theorems 4.1 and 4.3). Thus our models show how a volatility curve could function as a global attractor given adaptive agents. We conclude the paper by discussing future work on identifying necessary structural conditions on the shape of arbitrage free volatility curves.
2 Model description
In mathematical opinion dynamic models, agents take views of other agents into account before arriving at their own updated estimate. Agents can observe other agents’ previous signals.
DeGroot [27] was one of the early developers of such observational learning dynamics. While simple, these models allow us to examine convergence to consensus. These types of models are called naive models, as agents can recall perfectly what the other players submitted in the previous round. See the survey papers [60, 4, 41, 67].
2.1 Volatility Basics
Investors have an initial opinion of the implied volatility, which subsequently gets updated after taking into account volatilities of other agents. A feedback mechanism aids the agents in arriving at the true volatility parameter.
At all times the focus is on a static picture of the volatility smile. Within this static framework agents are updating their opinion of the true implied volatility. This updating occurs in a high-frequency sense. In an exchange setting, one can think of all bids and offers as visible to agents. The agents initially are unsure of the true value of the implied volatility, but by learning - and feedback - get to the true parameter. Our first attempt is a naive learning model common in social networks. Learning occurs between trading times. Thus our implicit assumption is that no transactions occur while traders are adjusting and learning each others quotes.
This rather peculiar feature is market practice. Trading happens at longer intervals than quote updating. This is as true for high frequency trading of stocks as it is for options markets. Quotes and prices - or rather vols - are changing more frequently than actual transactions.
Each dollar value of an option corresponds to an implied volatility parameter that depends on strike and expiry. Implied volatility is quoted in percentage terms.
Assumption 2.1.
We have three types of players: agents/traders, brokers and leaders. Brokers give feedback to the traders. The ability of agents to determine this feedback is their learning ability. Leaders are unknown and don’t give feedback but their quotes are visible.
Each agent takes a weighted average of all the agents’ estimates of volatility at a particular strike and expiry.
2.2 Naive Opinion Dynamics
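A first approach towards opinion dynamics is to assume each agent takes a weighted average of other agents’ opinions and updates his own estimate of the volatility parameter for the next period, i.e., at time $t$, the opinion of the $i$-th agent is given by
$$x^i_t = \sum_{j=1}^{n} a_{ij}\, x^j_{t-1}, \qquad (1)$$
where $x^i_t$ is the opinion of agent $i$ at time $t$ and $a_{ij}$ denotes the opinion weights for the $n$ investors with $a_{ij} \geq 0$ and $\sum_{j=1}^{n} a_{ij} = 1$ for all $i$. Define $X_t := (x^1_t, \ldots, x^n_t)^\top \in \mathbb{R}^n$; then, the opinion dynamics of the $n$ agents can be written in matrix form as follows
$$X_t = A X_{t-1}, \qquad (2)$$
where $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ is a row-stochastic matrix.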
Definition 2.2 (consensus).
The agents (2) are said to reach consensus if for any fixed initial condition $X_0$, $|x^i_t - x^j_t| \to 0$ as $t \to \infty$ for all $i, j$.
Definition 2.3 (consensus to a point).
The agents (2) are said to reach consensus to a point if for any initial condition $X_0$, $\lim_{t \to \infty} X_t = c\,\mathbf{1}_n$, where $\mathbf{1}_n$ denotes the vector composed of only ones and $c \in \mathbb{R}$. The constant $c$ is often referred to as the consensus value.
For the opinion dynamics (2), we introduce the following result by [27] (see also [73] for definitions).
Proposition 2.4.
Consider the opinion dynamics in equation (2). If $A$ is aperiodic and irreducible, then for any initial condition $X_0$ consensus to a point is reached. The consensus value $c$ depends on both the matrix $A$ and the initial condition $X_0$.
Remark 2.5.
Proposition 2.4 implies that if the row-stochastic opinion matrix $A$ is aperiodic and irreducible, then all the agents converge to some consensus value $c$. However, since $c$ depends on the unknown initial opinion $X_0$, the consensus value is unknown and, in general, different from the true volatility $\bar{\sigma}$. We wish to alleviate this and thus introduce two novel models.
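The point of the remark can be seen in a small sketch of repeated averaging. The 3-agent weight matrix and the starting opinions below are made-up numbers chosen only so that the matrix is row-stochastic with strictly positive entries (hence aperiodic and irreducible); they are not taken from the paper.

```python
def degroot_step(weights, opinions):
    """One round of naive averaging: each agent takes a weighted mean of all opinions."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def iterate(weights, opinions, steps):
    """Apply the averaging step repeatedly and return the final opinions."""
    for _ in range(steps):
        opinions = degroot_step(weights, opinions)
    return opinions


# Hypothetical row-stochastic trust matrix (each row sums to one).
A = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

# Two runs that differ only in the first agent's initial volatility estimate.
x_a = iterate(A, [0.18, 0.22, 0.25], 200)
x_b = iterate(A, [0.30, 0.22, 0.25], 200)
```

Both runs end in agreement among the agents, but on different values, so plain averaging gives no guarantee of landing on the true volatility.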
3 Consensus (scalar agent dynamics)
In this section, we assume that the agents are able to learn how far off they are from the true volatility by informational channels in the marketplace. There are many avenues, platforms and private online chat rooms that provide quotes for option prices; some of these are stale and some are fresh. The agents’ learning ability determines the quality of the feedback from all these sources. In reality, options are not traded on one exchange or platform. There are multiple venues and though there might be a dominant marketplace, the same instruments can be traded across different venues and locations. We aggregate all of this information in the form of feedback with learning ability. If agents are fast learners, they adjust their volatility estimates quickly.
3.1 Consensus with Feedback
We model this feedback by introducing an extra driving term into the opinion dynamics (1). An early model developed by Mizuno et al. [62] shares some similarities with ours. Traders use feedback from past behaviour. Our model is a discrete autoregressive process but the focus is on learning in high-frequency time [61]. Furthermore, our model formalizes this in a more social and dynamical setup. In particular, we feed back the difference between the agents’ opinion and the true volatility $\bar{\sigma}$, scaled by a learning coefficient $\epsilon_i$. We assume that $\bar{\sigma}$ is invariant, i.e., it is a fixed constant for some fixed strike $K$ and maturity $T$. Then, the new model is written as follows
$$x^i_t = \sum_{j=1}^{n} a_{ij}\, x^j_{t-1} + \epsilon_i \left(\bar{\sigma} - x^i_{t-1}\right), \qquad (3)$$
or in matrix form
$$X_t = (A - \mathcal{E})\, X_{t-1} + \bar{\sigma}\, \mathcal{E}\, \mathbf{1}_n, \qquad (4)$$
where $\mathcal{E} := \mathrm{diag}(\epsilon_1, \ldots, \epsilon_n)$. Then, we have the following result.
Theorem 3.1.
Consider the opinion dynamics (4) and assume that $a_{ii} > \epsilon_i > 0$, $i = 1, \ldots, n$; then, consensus to $\bar{\sigma}$ is reached, i.e., $\lim_{t \to \infty} X_t = \bar{\sigma}\,\mathbf{1}_n$.
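Proof.
It is easy to verify that the solution $X_t$ of the difference equation (4) is given by
$$X_t = (A - \mathcal{E})^{t} X_0 + \sum_{k=0}^{t-1} (A - \mathcal{E})^{k}\, \bar{\sigma}\, \mathcal{E}\, \mathbf{1}_n.$$
By Gershgorin circle theorem, the spectral radius $\rho(A - \mathcal{E}) < 1$ for all $a_{ii} > \epsilon_i > 0$, $i = 1, \ldots, n$. It follows that $\sum_{k=0}^{\infty} (A - \mathcal{E})^{k} = (I_n - A + \mathcal{E})^{-1}$, where $I_n$ denotes the identity matrix of dimension $n$, and $\lim_{t \to \infty} (A - \mathcal{E})^{t} = 0$, see [45]. The matrix $A$ is row stochastic; then, $(I_n - A)\mathbf{1}_n = \mathbf{0}_n$, where $\mathbf{0}_n$ denotes the vector composed of only zeros. Hence, we can write $(I_n - A + \mathcal{E})\mathbf{1}_n = \mathcal{E}\mathbf{1}_n$; and consequently $(I_n - A + \mathcal{E})^{-1}\mathcal{E}\mathbf{1}_n = \mathbf{1}_n$. It follows that
$$\lim_{t \to \infty} X_t = \bar{\sigma}\,(I_n - A + \mathcal{E})^{-1}\mathcal{E}\,\mathbf{1}_n = \bar{\sigma}\,\mathbf{1}_n,$$
and the assertion follows. ∎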
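Corollary 3.2.
Consensus to $\bar{\sigma}$ is reached exponentially with convergence rate $\lVert A - \mathcal{E} \rVert_{\infty}$, i.e., $\lVert X_t - \bar{\sigma}\mathbf{1}_n \rVert_{\infty} \leq \lVert A - \mathcal{E} \rVert_{\infty}^{t}\, \lVert X_0 - \bar{\sigma}\mathbf{1}_n \rVert_{\infty}$, $t \geq 0$, where $\lVert \cdot \rVert_{\infty}$ denotes the matrix norm induced by the vector infinity norm.
Proof.
Define the error sequence $E_{t-1} := (X_{t-1} - \bar{\sigma}\mathbf{1}_n) \in \mathbb{R}^{n}$. Then, from (4), the following is satisfied:
$$E_t = (A - \mathcal{E})\,X_{t-1} + \bar{\sigma}\,\mathcal{E}\,\mathbf{1}_n - \bar{\sigma}\,\mathbf{1}_n = (A - \mathcal{E})\,E_{t-1} + \bar{\sigma}\,(A - I_n)\mathbf{1}_n = (A - \mathcal{E})\,E_{t-1}.$$
The last equality in the above expression follows from the fact that $(A - I_n)\mathbf{1}_n = \mathbf{0}_n$, because $A$ is a stochastic matrix. The solution $E_t$ of the above difference equation is given by $E_t = (A - \mathcal{E})^{t} E_0$, where $E_0 = X_0 - \bar{\sigma}\mathbf{1}_n$ denotes the initial error. Let $\lVert E_t \rVert_{\infty} := \max_i |e^i_t|$, $i = 1, \ldots, n$, where $E_t = (e^1_t, \ldots, e^n_t)^\top$. Note that exponential convergence of $\lVert E_t \rVert_{\infty}$ implies exponential convergence of $E_t$ itself. Using the solution $E_t = (A - \mathcal{E})^{t} E_0$, the following can be written:
$$\lVert E_t \rVert_{\infty} = \lVert (A - \mathcal{E})^{t} E_0 \rVert_{\infty} \leq \lVert A - \mathcal{E} \rVert_{\infty}^{t}\, \lVert E_0 \rVert_{\infty},$$
where $\lVert A - \mathcal{E} \rVert_{\infty}$ denotes the matrix norm of $A - \mathcal{E}$ induced by the vector infinity norm [45]. The inequality implies exponential convergence if $\lVert A - \mathcal{E} \rVert_{\infty} < 1$. Because $A = (a_{ij})$ and $\mathcal{E} = \mathrm{diag}(\epsilon_1, \ldots, \epsilon_n)$, we can compute $\lVert A - \mathcal{E} \rVert_{\infty} = \max_{i}\big(\sum_{j=1, j \neq i}^{n} |a_{ij}| + |a_{ii} - \epsilon_i|\big)$, $i = 1, \ldots, n$. The matrix $A$ is stochastic, which implies $a_{ij} \geq 0$ and $\sum_{j=1}^{n} a_{ij} = 1$; therefore, under the conditions of Theorem 3.1 (i.e., $a_{ii} > \epsilon_i > 0$), each row sum equals $1 - \epsilon_i$, so $\lVert A - \mathcal{E} \rVert_{\infty} = \max_i (1 - \epsilon_i) < 1$, and hence exponential convergence of the consensus error can be concluded with convergence rate given by $\lVert A - \mathcal{E} \rVert_{\infty}$. ∎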
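The feedback dynamics and the exponential rate can be checked numerically with a minimal sketch. It takes the condition of Theorem 3.1 to be that each learning coefficient is positive and smaller than the agent's own weight, in which case the infinity-norm rate reduces to the largest value of one minus the learning coefficient. The weight matrix, learning coefficients and true volatility are illustrative assumptions.

```python
def feedback_step(weights, eps, sigma_bar, opinions):
    """Averaging step plus a pull of strength eps[i] towards the true volatility."""
    averaged = [sum(w * x for w, x in zip(row, opinions)) for row in weights]
    return [avg + e * (sigma_bar - x) for avg, e, x in zip(averaged, eps, opinions)]


def max_error(opinions, sigma_bar):
    """Infinity norm of the consensus error."""
    return max(abs(x - sigma_bar) for x in opinions)


# Hypothetical data: row-stochastic weights with self-weights 0.5, 0.6, 0.6.
A = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
eps = [0.10, 0.20, 0.05]          # each below the corresponding self-weight
sigma_bar = 0.20                  # assumed "true" implied volatility
rate = max(1.0 - e for e in eps)  # infinity norm of A - diag(eps) under the assumption above

x = [0.18, 0.22, 0.25]
errors = [max_error(x, sigma_bar)]
for _ in range(400):
    x = feedback_step(A, eps, sigma_bar, x)
    errors.append(max_error(x, sigma_bar))
```

The slowest learner sets the rate: with these numbers the error shrinks by at least a factor of 0.95 per round, so even a weak pull towards the true value is enough for convergence.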
3.2 Random case
Under suitable random conditions for the trust matrix $A$ and the learning rates $\mathcal{E}$, we can still have consensus. In this case, the learning rates and weights are drawn independently and identically at each iteration. However, we need a condition to ensure convergence, namely that on average the learning rates are less than self-belief. Since this condition holds only in expectation, a probabilistic statement, there is some leeway: the learning rates need not be strictly less than self-belief at every time $t$.
Theorem 3.3.
Consider the opinion dynamics
[TABLE]
where and are independent and identically distributed (iid). Furthermore suppose
[TABLE]
then, consensus to is reached, i.e., .
Proof.
We rewrite the above iteration, by subtracting , from both sides, dropping the one vector notation as the context is clear
[TABLE]
where and . We want to show . To this end, iterating the above recursion we arrive at
[TABLE]
Taking norms on the above equation, gives us the following inequalities, understanding that we mean the norm:
[TABLE]
The first inequality follows by sub-multiplicative property of matrix norms. Moreover, by the law of large numbers , which is negative by assumption. So the exponent ensures that, as the initial opinion is finite,
[TABLE]
Consequently, and every agent reaches consensus.
∎
Note we don’t require the stronger condition that for all . Unlike the deterministic case, the random case allows considerable flexibility. Neither self-belief nor positive learning is required for all times. However, there must be some interaction and learning for beliefs to converge. As matrix products don’t commute, if we were to follow the full recursion in any of our dynamics the result would be long matrix products. Random matrix products and dynamics are an active area of research not only in mathematics but also in physics [30, 23, 11, 39]. While the random case is certainly interesting, in this article our focus is on the first steps of modelling interaction and learning dynamics.
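A small Monte Carlo sketch can illustrate the random case. It assumes the convergence condition takes the form of a negative expected log-norm of the random update matrix, as the law-of-large-numbers step in the proof suggests; that reading, the sampling scheme, the helper names and all numbers are assumptions made for illustration only.

```python
import math
import random


def random_row_stochastic(rng, n, self_weight=0.5):
    """Random row-stochastic matrix whose diagonal entries are at least self_weight."""
    rows = []
    for i in range(n):
        raw = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(raw)
        row = [(1.0 - self_weight) * r / total for r in raw]
        row[i] += self_weight
        rows.append(row)
    return rows


def inf_norm_minus_eps(weights, eps):
    """Infinity norm of A - diag(eps)."""
    return max(
        sum(abs(w - (eps[i] if j == i else 0.0)) for j, w in enumerate(row))
        for i, row in enumerate(weights)
    )


rng = random.Random(0)
n, sigma_bar = 3, 0.20
x = [0.18, 0.22, 0.25]
initial_error = max(abs(v - sigma_bar) for v in x)
log_norms = []
for _ in range(2000):
    A_t = random_row_stochastic(rng, n)
    # Learning rates are occasionally negative, so the norm can exceed one in some rounds.
    eps_t = [rng.uniform(-0.1, 0.4) for _ in range(n)]
    log_norms.append(math.log(inf_norm_minus_eps(A_t, eps_t)))
    averaged = [sum(w * v for w, v in zip(row, x)) for row in A_t]
    x = [avg + e * (sigma_bar - v) for avg, e, v in zip(averaged, eps_t, x)]

final_error = max(abs(v - sigma_bar) for v in x)
mean_log_norm = sum(log_norms) / len(log_norms)
```

Individual rounds may push opinions away from the true value, yet the product of norms still decays because the average log-norm is negative, which is the mechanism the proof relies on.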
3.3 Consensus with an unknown leader
One criticism of model (4) is that feedback, even if it is not perfect, has to be learned. In practice, there might not be a helpful mechanism that provides feedback. An alternative is to have an unknown leader embedded in the set of traders. The agents are unsure who the leader is, but by taking averages over the other traders they all arrive at the opinion of the leader. In Markov chain theory, the corresponding state is called an absorbing state. The leader guides the system to the true value. We assume that the identity of the leader is unknown to all agents.
Without loss of generality, we assume that the first agent (with corresponding opinion ) is the leader; it follows that , , , and . Then, in this configuration, the opinion dynamics is given by
[TABLE]
with , , for all , and for at least one , .
Theorem 3.4.
Consider the opinion dynamics (7) and assume that the matrix is substochastic and irreducible. It holds that , i.e., consensus to is reached.
Proof.
Define the invertible matrix
[TABLE]
Introduce the set of coordinates . Note that , . Hence, if the error vector , then consensus to is reached. Note that
[TABLE]
where denotes the zero vector of appropriate dimensions and as defined in (7). By construction, ; hence, the consensus error satisfies the following difference equation
[TABLE]
and the solution of is then given by .
Because for at least one , and is substochastic and irreducible, the spectral radius , see Lemma 6.28 in [73]; it follows that . Therefore, and the assertion follows. ∎
Corollary 3.5.
Let denote some matrix norm such that (such a norm always exists because under the conditions of Theorem 3.4). Then, consensus to is reached exponentially with the convergence rate given by , i.e. , for and some positive constant $C \in \mathbb{R}_{>0}$.
Proof.
See Lemma 5.6.10 in [45] on how to construct such a . Now consider the consensus error defined in the proof of Theorem 3.4, which evolves according to the difference equation (8). It follows that , where denotes the initial consensus error. Under the assumptions of Theorem 3.4, . By Lemma 5.6.10 in [45], implies that there exists some matrix norm, say , such that . We restate the error with norms and obtain . Because all norms are equivalent in finite dimensional vector spaces (see Chapter 5 in [45]), for some positive constant $C \in \mathbb{R}_{>0}$. As , the norm of the consensus error converges to zero exponentially with rate . ∎
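The leader mechanism can be sketched in a few lines. The first agent below puts all weight on itself and starts at the true volatility, while the followers simply average; the 4-agent weight matrix is an illustrative assumption in which only one follower listens to the leader directly, and the follower block is substochastic and irreducible.

```python
def averaging_step(weights, opinions):
    """Plain weighted averaging; the leader's row makes it ignore everyone else."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


sigma_bar = 0.20  # assumed true volatility, known only to the leader

# Hypothetical weights: row 0 is the leader; only agent 1 gives the leader any weight.
A = [[1.0, 0.0, 0.0, 0.0],
     [0.1, 0.5, 0.2, 0.2],
     [0.0, 0.3, 0.4, 0.3],
     [0.0, 0.2, 0.3, 0.5]]

x = [sigma_bar, 0.30, 0.15, 0.25]
for _ in range(2000):
    x = averaging_step(A, x)
```

Even though two of the three followers never observe the leader directly, the leader's opinion leaks through the network and every follower ends up at the leader's value. Convergence is slower than in the feedback model here because the only link to the leader carries a weight of 0.1.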
4 Consensus (vectored agent dynamics)
In this section, we suppose that agents have beliefs over a range of strikes. Thus, each agent’s opinion of the volatility curve is a vector with each entry corresponding to a particular strike. Typically, in markets, options are quoted for at-the-money (atm) and for two further strikes left of and right of the atm level. Here, we examine the case of strikes and agents, i.e., each agent now has quotes for different moneyness levels. In this configuration, the true volatility is . See figure 1 (b).
4.1 Consensus with Feedback
Again, we assume that each agent takes a weighted average of other agents’ opinions and updates its volatility estimate vector for the next period, i.e., at time , the opinion of the -th agent is given by
[TABLE]
where denotes the learning coefficient of agent , is the opinion of agent at time , and denotes the opinion weights for the investors with and for all . In this case, the stacked vector of opinions is , . The opinion dynamics of the agents can then be written in matrix form as follows
[TABLE]
where is a row-stochastic matrix, , and denotes Kronecker product. We have the following result.
Theorem 4.1.
Consider the opinion dynamics in (10) and assume that , ; then, consensus to (with ) is reached, i.e., .
Proof.
Define the error sequence . Note that implies that consensus to is reached. Given the opinion dynamics (10), the evolution of the error satisfies the following difference equation
[TABLE]
It is easy to verify that, because is stochastic, . Then, the error dynamics simplifies to
[TABLE]
and consequently, the solution of (11) is given by . By properties of the Kronecker product and Gershgorin’s circle theorem, the spectral radius for . It follows that , see [45]. Therefore, and the assertion follows. ∎
Corollary 4.2.
Consensus to is reached exponentially with the convergence rate given by , i.e., .
The proof of the above result is very similar to previous corollaries and is omitted.
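The vectored feedback model can be sketched with an explicit Kronecker product. The sketch assumes the stacked update takes the form of the scalar feedback matrix lifted by an identity of size equal to the number of strikes, plus a driving term built from the learning coefficients and the true volatility curve; this form, the helper names (`kron`, `matvec`) and all numbers are assumptions for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


n, k = 3, 3  # three agents, three strikes
A = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
eps = [0.10, 0.20, 0.05]
sigma_bar = [0.25, 0.20, 0.23]  # hypothetical smile across the three strikes

identity_k = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
a_minus_e = [[A[i][j] - (eps[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
M = kron(a_minus_e, identity_k)                                    # (A - E) lifted to all strikes
drive = [eps[i] * sigma_bar[s] for i in range(n) for s in range(k)]  # stacked eps_i * sigma_bar

# Stacked opinions: agent 0's curve, then agent 1's, then agent 2's.
X = [0.30, 0.22, 0.26,
     0.20, 0.18, 0.21,
     0.27, 0.25, 0.24]
for _ in range(400):
    X = [mx + d for mx, d in zip(matvec(M, X), drive)]
```

Lifting by an identity means each strike evolves as an independent copy of the scalar model, so every agent's whole curve converges to the same target curve at the scalar rate.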
4.2 Consensus with an unknown leader
Similarly to the scalar case; here, we assume that there is a leader driving all the other agents through the opinion matrix . Again, without loss of generality, we assume that the first agent (with corresponding opinion ) is the leader, , , , and . Then, in this configuration, the opinion dynamics is given by
[TABLE]
with , , for all , and for at least one , .
Theorem 4.3.
Consider the opinion dynamics (12) and assume that the matrix is substochastic and irreducible; then, consensus to is reached, i.e., .
The proof of Theorem 4.3 follows the same line as the proof of Theorem 3.4 and it is omitted here.
Corollary 4.4.
Let denote some matrix norm such that , then consensus to is reached exponentially with convergence rate , i.e. , for some positive constant $C \in \mathbb{R}_{>0}$.
5 Numerical Simulations
Consider the opinion dynamics with feedback (4) with ten agents (i.e., ) , and initial condition
[TABLE]
In both exchange-based and OTC markets it is easy to ascertain who the main market-makers are for options on single stock or commodity [43, 10]. Option market-makers are usually investment banks and big trading houses. In this sense, the number of players is not large and thus the models developed always have a finite number of agents, .
Figure 2 depicts the obtained simulation results for different values of the learning parameters , . Specifically, Figure 2 (a) shows results without learning, i.e, (here there is no consensus to ), Figure 2 (b) depicts the results for . As stated in Theorem 3.1, consensus to is reached. Figure 2(c) shows results for with and otherwise, . Note that, in this case, the value of violates the condition of Theorem 3.1 (i.e., ) and, as expected, consensus is not reached. Next, consider the opinion dynamics with leader (7) with and initial condition
[TABLE]
For the leader case, the opinion weights matrix is constructed by replacing the first row of by . The corresponding matrix (defined in 7) is substochastic and irreducible, and , . Hence, all the conditions of Theorem 3.4 are satisfied and consensus to is expected. Figure 2(d) shows the corresponding simulation results. Finally, Figure 3 shows the evolution of the vectored opinion dynamics (10) with and (i.e., ten three dimensional agents), matrix as in the case with feedback, (vectored) volatility , learning parameters for as in , and initial condition with as in the first experiment above.
6 Arbitrage Bounds
We have taken the true volatility parameter as exogenous to our models. Our only requirement is that there is no static arbitrage, by which we mean that all the quotes in volatility which translate to option prices are such that one cannot trade in the different strikes to create a profit. Checking whether a volatility surface is indeed arbitrage free is non-trivial, nevertheless some sufficient conditions are well known [14, 40, 85]. As long as the volatility surface satisfies them our analysis implies global stability towards an arbitrage free smile.
We parameterize the volatility function (assuming expiry are fixed) and denote the option price as
[TABLE]
Our attention is on varying , to ensure no static arbitrage. We assume that the translates into unique call option dollar prices, which follows from the strictly positive first derivative of the option price with respect to .
- Condition 1: (Call Spread) For , we have
- Condition 2: (Butterfly Spread) For ,
Deriving these arbitrage-free volatility curve conditions is not an easy task; see the accounts in [72, 55]. Delving into this topic would take us further into stochastic analysis and away from the focus of this paper.
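A discrete check of the two spread conditions can be sketched as follows. It assumes the commonly used forms: call prices should not increase with the strike (call spread), and the price at a middle strike should lie on or below the chord joining its neighbours (butterfly). These forms, the function name and the sample prices are assumptions for illustration and may differ in detail from the conditions intended above.

```python
def static_arbitrage_violations(strikes, calls, tol=1e-12):
    """Return a list of call-spread and butterfly violations for increasing strikes."""
    violations = []
    # Call spread: a lower strike must not be cheaper than a higher strike.
    for i in range(len(strikes) - 1):
        if calls[i] < calls[i + 1] - tol:
            violations.append(("call spread", strikes[i], strikes[i + 1]))
    # Butterfly: the middle price must not exceed the linear interpolation of its neighbours.
    for i in range(1, len(strikes) - 1):
        k1, k2, k3 = strikes[i - 1], strikes[i], strikes[i + 1]
        c1, c2, c3 = calls[i - 1], calls[i], calls[i + 1]
        chord = ((k3 - k2) * c1 + (k2 - k1) * c3) / (k3 - k1)
        if c2 > chord + tol:
            violations.append(("butterfly", k1, k2, k3))
    return violations


strikes = [80, 90, 100, 110, 120]
clean = static_arbitrage_violations(strikes, [25, 16, 9, 4, 1])    # decreasing and convex
bumped = static_arbitrage_violations(strikes, [25, 16, 12, 4, 1])  # 100-strike call too rich
```

In practice the call prices would come from converting each quoted volatility into a dollar price, after which the same check applies to the resulting price curve.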
7 Connections and Conclusion
Recently, there has been some interesting work at the intersection of computer science and option pricing. DeMarzo et al. [28] showed how to use efficient online trading algorithms to price the current value of financial instruments, deriving both upper and lower bounds. Moreover, Abernethy et al. [2, 1] formulated Black-Scholes pricing as a sequential two-player zero-sum game. Whilst these papers made an excellent start at bridging the gap between two different academic communities (mainly mathematical finance and theoretical computer science), they do not address the reality of volatility smiles and trading. Our contribution can be viewed as making these connections more concrete. The smile itself is a conundrum, and there have even been articles questioning whether it can be solved [5]. The traditional approach, built from the ground up, is to develop a stochastic process for the volatility and asset price, possibly introducing jumps or more diffusions through uncertainty [49, 53]. Such models have been successfully developed, but the time is ripe to incorporate multi-agent models with arbitrage-free curves.
Combining learning agents with stochastic differential equation models [78], such as the Black-Scholes model, is an exciting proposition. Moreover, opinion dynamics as a subject in its own right has been studied quite extensively; recent references that present an expansive discussion in computer science are [64, 58]. Econophysics is the right community to develop new models. After all, there is no attachment to utilities of players or to the stochastic volatility models so beloved in the mathematical finance community. Free from these shackles, researchers can use a range of tools and techniques to build more sophisticated models. Moreover, there is no restriction or debate on continuous or discrete time. While our framework is discrete, a continuous-time formulation could perhaps show a way forward to incorporate models from mathematical finance and financial economics [66, 26, 79]. The technical issues in random matrix products, briefly discussed in this paper, indicate that much more work needs to be done on the modelling and mathematical front. For example, the matrices and can be dependent with correlation decreasing in time. The contraction in the random case would still hold.
In this paper, we introduced models of learning agents in the context of option trading. A key open question in this setting is how the market comes to a consensus about market volatility, which is reflected in derivative pricing through the Black-Scholes formula. The framework we have established allows us to explore other areas. Thus far, we have taken the smile as an exogenous object, proving convergence to equilibrium beliefs. A natural step forward would be to look at the beliefs as probability measures, where each measure corresponds to a different option pricing model. Our learning models focus on interaction between agents. In fact, agents can be interpreted as algorithms, each corresponding to a particular belief about a pricing model.
Acknowledgements
The authors would like to thank Elchanan Mossel, Ioannis Panageas, Ionel Popescu and JM Schumacher for fruitful discussions. Tushar Vaidya would like to acknowledge a SUTD Presidential fellowship. Carlos Murguia would like to acknowledge the National Research Foundation (NRF), Prime Minister’s Office, Singapore, under its National Cybersecurity R&D Programme (Award No. NRF2014NCR-NCR001-40) and administered by the National Cybersecurity R&D Directorate. Georgios Piliouras would like to acknowledge SUTD grant SRG ESD 2015 097 and MOE AcRF Tier 2 Grant 2016-T2-1-170.
References
- [1] J. Abernethy, P. L. Bartlett, R. Frongillo, and A. Wibisono, How to hedge an option against an adversary: Black-Scholes pricing is minimax optimal, in Advances in Neural Information Processing Systems, 2013, pp. 2346–2354.
- [2] J. Abernethy, R. M. Frongillo, and A. Wibisono, Minimax option pricing meets Black-Scholes in the limit, in Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, ACM, 2012, pp. 1029–1040.
- [3] B. Acciaio, M. Beiglböck, F. Penkner, and W. Schachermayer, A model-free version of the fundamental theorem of asset pricing and the super-replication theorem, Mathematical Finance, 26 (2016), pp. 233–251.
- [4] D. Acemoglu and A. Ozdaglar, Opinion dynamics and learning in social networks, Dynamic Games and Applications, 1 (2011), pp. 3–49.
- [5] E. Ayache, P. Henrotte, S. Nassar, and X. Wang, Can anyone solve the smile problem?, The Best of Wilmott, (2004), p. 229.
- [6] V. Bacoyannis, V. Glukhov, T. Jin, J. Kochems, and D. R. Song, Idiosyncrasies and challenges of data driven learning in electronic trading, NIPS Workshop 2018: Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy, (2018).
- [7] A. V. Banerjee, A simple model of herd behavior, The Quarterly Journal of Economics, 107 (1992), pp. 797–817.
- [8] J. Becker, D. Brackbill, and D. Centola, Network dynamics of social influence in the wisdom of crowds, Proceedings of the National Academy of Sciences, 114 (2017), pp. E5070–E5076.
