Learning from the past in an irreversible investment problem
Topias Tolonen-Weckström

TL;DR
This paper models an irreversible investment problem where learning from past information influences the timing of investments, using a recursive stopping problem approach with explicit boundaries.
Contribution
It introduces a novel recursive framework for investment decisions involving learning from past information, with semi-explicit solutions for optimal stopping boundaries.
Findings
Existence of one-sided stopping boundaries at each recursion step
Optimal investment strategy characterized by a sequence of semi-explicit boundaries
Numerical solutions and comparative statistics validate the approach
Abstract
We consider an irreversible investment problem under incomplete information, where the investor decides whether and when to make investments in a project. Upon investment, the investor acquires previously hidden information from the project's past ("learning from the past"), and so the learning rate of the problem is controlled by investing. We set up this original problem as a recursively defined stopping problem, where the learning rate is accelerated after each recursion step. To solve the problem, we show that at each step, there indeed exists a one-sided stopping boundary under general conditions. We proceed to present the optimal investment strategy as a sequence of semi-explicit stopping boundaries derived from smooth fit conditions. Feasibility of our approach is then demonstrated by solving boundaries numerically and by illustrating comparative statistics.
Topias Tolonen-Weckström Department of Mathematics, Uppsala University. Box 256, 75105 Uppsala, Sweden. Email address:[email protected].
(September 30, 2025)
Keywords: irreversible investment, incomplete information, recursive optimal stopping, free-boundary problem, control of learning rate, project acquisition.
Mathematics Subject Classification 2020: 60G40, 93E11, 91G99.
1 Introduction
Consider a Bayesian decision-maker (investor) whose objective is to decide whether and when to make irreversible investments in a project. The decision-maker makes noisy observations of the project value under incomplete information and knows that after each investment, they will learn more about the project. We call this notion learning from the past and it is the main point of interest in our study.
We model such a problem as
[TABLE]
where the process represents the observation process with , a standard Brownian motion , and an unknown project value assuming a Bernoulli distribution with two possible values . In (1), induces the learning-from-the-past effect, is a positive constant denoting amount of learning per unit of investment, and is an increasing control process with and . The objective of the investor is then to control to maximize
[TABLE]
where is a known discount rate.
Intuitively, such a problem is solved by finding a suitable stopping boundary. However, the above formulation is problematic: the information available to the investor depends on the control process , which in turn should depend on the available information, creating a circular feedback between the observation process and the admissible controls adapted to it. The continuous formulation is also challenging, as it leads to rather involved regularity considerations along the curved boundary. Rather than formulating this problem precisely, we study a discrete version of it, which gains mathematical tractability and lets us focus on the learning-from-the-past effect.
In particular, we restrict the possible investment levels to only attain discrete values so that with decreasing, , and . Here the index in indicates that there are remaining investment possibilities, so is the initial level of investment and is the investment level after the last possible investment. Under such restriction, the control problem described in (1)–(2) collapses into a stopping problem. The investor seeks a sequence of investment times in order to optimize
[TABLE]
where . We set up this problem properly in Section 2.
The learning-from-the-past effect arises naturally in applications of irreversible investment problems. In particular, it is a natural model for project acquisition, where the investor, upon acquisition, learns about hidden intangible assets of the project (for example, a company) that are not accounted for in its public financial statements. When accessing the project as an insider, the investor gains insider information about such assets, which can include previously unaccounted and publicly hidden goodwill, intellectual property, human economic value (human resources), and organizational culture.
Our primary contribution is to set up and study an original problem of this type, deriving semi-explicit solutions for the optimal stopping problem. In our recursive problem formulation under incomplete information and additional learning, we show that at each investment step where the future values of subsequent investments are enveloped, there exists a one-sided stopping boundary. To make the standard methods of optimal stopping go through, we provide a careful analysis of the properties of the different payoff functionals. We show that the boundaries characterizing the optimal investment times exist and are well-defined, and we provide equations from which the boundaries can be solved numerically. Moreover, we demonstrate that the solution concept is feasible by providing numerical examples and comparative statistics.
1.1 Related literature
Our model of learning-from-the-past is original to our article. The model provides a new way of controlling the learning rate in an optimal stopping problem in an irreversible investment setting under incomplete information.
Investment and utility maximization problems under incomplete information are well studied in the field of stochastic control. An early contemporary reference that combines stochastic control with incomplete information is [16]. A key study in early investment problems within the field is [6], where an investment timing problem under incomplete information with respect to an option payoff functional is studied. General investment timing problems are examined, for example, in [4], [5], and [18], while [11] examines an irreversible investment problem by characterizing the free boundary as the unique solution of an integral equation. Investment problems with a Bayesian setting under incomplete information are discussed for example in [14] and [24]. In particular, [24] discusses the relationship between belief of a favorable market and investment timing, closely resembling our set-up. In a recent effort, [12] studies an irreversible investment problem under incomplete information, where the investment is modeled as a geometric Brownian motion.
The main point of interest in our model, rarely discussed in optimal control research, is that the decision-maker controls the learning rate. Such stochastic control problems have been discussed only recently. Statistical problems of this type are considered in [3] with a problem of quickest detection with reversible controls, [7] with an estimation problem with costly observations, and [8] with a detection problem with irreversible controls and a linear cost to increase the observation rate. [10] incorporates an irreversible investment problem, where an investment directly affects the drift coefficient of the observation process.
We construct a recursively defined stopping problem but initially motivate our model as a multiple stopping problem. The connection between the two is discussed in [1] under relatively general assumptions. In addition, [2] discusses multiple optimal stopping for American swing options, largely resembling our set-up.
The set-up in [10] also models a learning feature in an irreversible investment problem. In their set-up, they consider an example of project expansion. Upon investing, the investor begins to learn at an accelerated rate due to gaining more capacity for observing the development of the market, product testing, and realized demand or production costs, for example by setting up a new production unit. There is a crucial difference between their learning and our learning-from-the-past effect. In our model, the key application considers learning by acquiring already established units. Moreover, as opposed to their set-up, the investor cannot directly affect the diffusion coefficient of the output process by increasing their investment level. Instead, the additional learning is modeled by speeding up the observation process upon investment. Realization of this accelerated process then reveals previously hidden information, inducing the learning-from-the-past effect. Despite these differences, both models study problems of irreversible investment under incomplete information, where the amount of learning is both controlled and monotone upon investment: investing more yields more information to the investor.
1.2 Structure of the article
The remainder of this article is organized as follows. In Section 2 we set up the model and define the key concepts we use to build it. More specifically, we introduce a recursive stopping problem and introduce the learning-from-the-past effect as an accelerated learning rate, which the investor uses to evaluate the value of subsequent steps in the recursion. In Section 3, a candidate solution is characterized in terms of an optimal investment strategy and the corresponding stopping boundaries. In Section 4, we show that each investment step induces a one-sided stopping boundary. The main results are found in Section 5, where a verification result for each step is provided together with a main theorem which shows that the individual verification results go through when combined recursively, resulting in an optimal investment strategy for our problem. Finally, we illustrate our theoretical results with key numerical examples in Section 6.
2 Problem set-up
Let be a diffusion process with dynamics
[TABLE]
where is a known constant and is a standard Brownian motion defined on a probability space .
We consider an investor who is facing an optimization problem of investing in a project. The project value takes possible values and with . We model incomplete information, i.e. the lack of the investor’s information on , by letting the investor only observe realizations of . Let be the completion of the -algebra . Then, based on , the investor is interested in determining optimal investment times to maximize their value based on the unknown project value . We assume the admissible investment levels to be of the form
[TABLE]
and that upon each investment, the investment level is raised from to . That is, the possible levels of investment form a decreasing sequence with and .
To characterize the investor’s learning about by observing realizations of , we define the belief process of the decision-maker as the conditional probability
[TABLE]
From standard literature on filter theory, see for example [17], we find that the dynamics of the process can be described as
[TABLE]
where
[TABLE]
is the so-called innovations process (an Brownian motion), is the signal-to-noise ratio, and is a known constant representing the investor’s prior information that (i.e. the probability ).
It is well-known that is a strong Markov process as it solves (7) (see, for example, [19]), and so we may embed the problem into a Markovian setting, and additionally optimization over stopping times coincides with optimization over stopping times ( being the completion of ).
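As a concrete illustration of the belief process, the following minimal sketch computes the posterior probability of the favorable project value directly from Bayes' rule. It assumes the observation process has constant drift equal to the unknown project value and constant volatility, and starts at zero; the names `mu0`, `mu1`, `sigma`, and `prior` are illustrative choices, not notation taken from the article.

```python
import math

def belief(x_t, t, prior, mu0, mu1, sigma):
    """Posterior probability that the drift equals mu1, given X_t = x_t and X_0 = 0.

    Assumes dX = mu dt + sigma dW with mu in {mu0, mu1} and prior P(mu = mu1) = prior.
    """
    # Log-likelihood ratio of drift mu1 against drift mu0 for the observed path;
    # for constant coefficients it depends on the path only through (t, X_t).
    log_lr = (mu1 - mu0) * (x_t - 0.5 * (mu0 + mu1) * t) / sigma**2
    # Bayes' rule on the log-odds scale.
    log_odds = math.log(prior / (1.0 - prior)) + log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For instance, an observation exactly halfway between the two candidate trends, `x_t = 0.5 * (mu0 + mu1) * t`, carries no information and returns the prior, while larger observations push the belief toward one.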
Let be fixed so that for all . Then, for any stopping time , it follows from the tower property of conditional expectation and the Markov property of that
[TABLE]
where . (Observe that optimizing over the left-hand side and the right-hand side of (9) coincide).
Now consider the stopping problem (3) presented in Section 1. Upon the last possible investment, the investor wants to find an stopping time to solve
[TABLE]
Then, upon the previous investment, it is intuitively clear (see [1] for a general reduction of a multiple stopping problem) that the investor optimizes over a discounted payoff together with an expected value of the remaining investment. That is, the investor solves
[TABLE]
where denotes the additional units of observing the process .
These steps can be propagated up to steps, and so solving for (3) reduces into solving a recursively defined stopping problem
[TABLE]
for , where
[TABLE]
[TABLE]
and is an stopping time.
In Sections 3–5, we treat the problem (10).
Remark 2.1.
In (12), denotes a conditional expectation of evaluated over a strong Markov process starting from the value and diffusing for units ( is analogous to in (1) by letting .). The conditional expectation is a function of and it denotes an expectation of the value function over the diffusion process
[TABLE]
for some starting point , and so it models the learning-from-the-past effect as delayed information after stopping (see, for example, [20] for treatment of an optimal stopping problem with delayed information).
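The conditional expectation described in Remark 2.1 can be approximated without simulation. The sketch below is one possible approach, not the method used in the article: it assumes the belief follows the standard two-state filter with signal-to-noise ratio `omega`, so that the log-odds after `delta` units of observation are Gaussian conditionally on the true state, and it integrates a payoff `f` with Gauss-Hermite quadrature. The function name and all parameter names are hypothetical.

```python
import numpy as np

def expected_after_learning(f, p, omega, delta, n_nodes=64):
    """Approximate E[f(Pi_delta) | Pi_0 = p] for dPi = omega * Pi * (1 - Pi) dW."""
    # Nodes and weights for the weight function exp(-x^2 / 2),
    # rescaled so that the weights integrate against a N(0, 1) density.
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    z0 = np.log(p / (1.0 - p))              # current log-odds
    shift = 0.5 * omega**2 * delta          # log-odds drift; its sign depends on the true state
    spread = omega * np.sqrt(delta) * nodes
    p_good = 1.0 / (1.0 + np.exp(-(z0 + shift + spread)))  # belief if the project is favorable
    p_bad = 1.0 / (1.0 + np.exp(-(z0 - shift + spread)))   # belief if the project is unfavorable
    # Mix the two conditional laws with the current belief p.
    return p * np.dot(weights, f(p_good)) + (1.0 - p) * np.dot(weights, f(p_bad))
```

Applied to the identity function, the routine should return the starting belief up to quadrature error, reflecting the martingale property of the belief; applied to a convex payoff, it should return at least the payoff at the starting belief, in line with the Jensen argument used in Lemma 3.1.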
3 Finding a candidate solution
We first have the following result.
Lemma 3.1.
Let , and be as in (10)–(12). Then, for all , the following hold:
- (i) , , and are convex functions,
- (ii) .
Proof.
We note that is convex. By arguments for preservation of convexity for martingale diffusion processes in [15], an expected value is convex in for every fixed time-point provided that is a convex function. Moreover, by a Bermudan approximation argument (see [9]), preservation of convexity extends to the corresponding stopping problem, so is convex. Then, by Jensen’s inequality, , and clearly . Convexity of follows from convexity of .
Next, assume that , and are convex. It follows that is also convex, and repeating the preservation of convexity and Bermudan approximation arguments yields that is convex, and so the convexity of follows. It follows that Jensen’s inequality asserts . That is, by induction, we have that , , and are convex for all , and .
Moreover, since , we have
[TABLE]
Assuming that
[TABLE]
one sees that and , and then . The second statement thus follows by induction. ∎
Remark 3.2.
Note that it follows from the bounds presented in Lemma 3.1 (ii) that and . Moreover, Lemma 3.1 implies that the first derivative of is bounded.
For an illustration of the relationship between and presented in Lemma 3.1, see Figure 1.
Remark 3.3.
The function in Figure 1 is produced numerically. For a short discussion on numerical methods used in this article, see Remark 6.1.
Case
Consider
[TABLE]
as given in equation (10). Since this is a value function of a call option type, we expect the optimal strategy to be given by a stopping time
[TABLE]
for some boundary . By standard methods in optimal stopping theory and dynamic programming (see, for example, [23]), one expects to solve the corresponding free-boundary problem:
[TABLE]
where the differential operator is given by
[TABLE]
The general solution to the ODE in (13) is of type
[TABLE]
for constants and , where and are the positive and the negative solutions to the quadratic equation . From the boundary condition at we see that . We denote
[TABLE]
for which
[TABLE]
Using this notation, the value function assumes the form
[TABLE]
in the continuation region. Plugging in the boundary conditions at yields
[TABLE]
By noting that
[TABLE]
one uses the smooth fit equations (17) to derive
[TABLE]
This corresponds to the candidate value function assuming the form
[TABLE]
Verifying that is straightforward, see Proposition 5.1 in Section 5.
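To make the smooth-fit construction for the last investment step tangible, here is a minimal sketch under two assumptions that are not taken from the article: the generator of the belief process is the standard one, so the ODE reads (omega^2/2) p^2 (1-p)^2 u'' = r u, and the payoff is the hypothetical linear function g(p) = alpha * p - beta with 0 < beta < alpha. Under these assumptions the solution that stays bounded near zero is p^gamma (1-p)^(1-gamma), with gamma the positive root of gamma^2 - gamma - 2r/omega^2 = 0, and value matching plus smooth fit give a closed-form boundary.

```python
import math

def gamma_plus(r, omega):
    """Positive root of gamma^2 - gamma - 2 r / omega^2 = 0 (exceeds 1 when r > 0)."""
    return 0.5 + math.sqrt(0.25 + 2.0 * r / omega**2)

def psi(p, gam):
    """Solution of (omega^2 / 2) p^2 (1 - p)^2 u'' = r u that vanishes as p -> 0."""
    return p**gam * (1.0 - p) ** (1.0 - gam)

def last_step_boundary(alpha, beta, r, omega):
    """Smooth-fit boundary for the assumed payoff g(p) = alpha * p - beta."""
    gam = gamma_plus(r, omega)
    # Obtained from g'(b) * b * (1 - b) = (gam - b) * g(b), which is linear in b here.
    return beta * gam / (alpha * (gam - 1.0) + beta)

def last_step_value(p, alpha, beta, r, omega):
    """Candidate value: payoff above the boundary, scaled psi below it."""
    gam = gamma_plus(r, omega)
    b = last_step_boundary(alpha, beta, r, omega)
    if p >= b:
        return alpha * p - beta
    return (alpha * b - beta) * psi(p, gam) / psi(b, gam)
```

With these assumed inputs the boundary lies strictly between the zero of the payoff, beta / alpha, and one, so the investor waits until the belief is well above the break-even level before investing.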
Case
Now, for a general , consider
[TABLE]
as given in (10). Similarly as above, for each , we expect the optimal stopping time to be of the form
[TABLE]
for some boundary . In particular, we expect the value function to solve the free boundary problem
[TABLE]
As in the case above, the smooth-fit guess gives us
[TABLE]
for a constant and as in (15). This yields an equation
[TABLE]
If the smooth fit equation (22) admits a unique solution , we then define a candidate value function as
[TABLE]
The relationship between , , , and in the case is illustrated in Figure 2. From the figure it can also be seen how the value functions coincide with respective payoff functions at the respective boundary points for .
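When the payoff at a given step is only available numerically, the smooth-fit condition can be solved by root finding. The sketch below is an illustration under assumptions, not the article's procedure: it supposes the continuation value is a multiple of p^gam (1-p)^(1-gam), so that value matching and smooth fit combine into the single equation (gam - b) g(b) = b (1 - b) g'(b), and it locates the root by bisection. The callables `g` and `dg` (payoff and its derivative) and the bracket `[lo, hi]` are supplied by the user and are hypothetical names.

```python
def smooth_fit_root(g, dg, gam, lo, hi, tol=1e-12):
    """Bisection for F(b) = (gam - b) * g(b) - b * (1 - b) * dg(b) = 0 on [lo, hi]."""
    def F(b):
        return (gam - b) * g(b) - b * (1.0 - b) * dg(b)

    f_lo, f_hi = F(lo), F(hi)
    if f_lo * f_hi > 0:
        raise ValueError("F does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = F(mid)
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in the left half
        else:
            lo, f_lo = mid, f_mid     # root lies in the right half
    return 0.5 * (lo + hi)
```

Bisection is chosen here because it only needs a sign change and no derivative of F; it relies on the kind of monotonicity and sign information that Section 4 establishes for the smooth-fit equations to guarantee that the bracket contains exactly one root.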
4 Study of the smooth fit equations
We proceed by studying the solvability of equation (22). In order to do so, we first need some technical results.
Lemma 4.1.
Let for some function that has a bounded first derivative.
[TABLE]
and
[TABLE]
Proof.
For the first claim, let so that . Since and the first derivative of is bounded, is a supermartingale, and is decreasing in . Therefore, , and consequently
[TABLE]
For the second claim, by Itô’s formula we get
[TABLE]
Differentiating with respect to yields
[TABLE]
where the notation is used to indicate that the starting point of is . Using a non-crossing property of paths (see [22, Chapter IX.3]) and monotonicity of , we find that . Consequently
[TABLE]
that is, is decreasing in . ∎
Remark 4.2.
In the following, we apply Lemma 4.1 to the function which is not in but merely in . However, a closer inspection of the proof of the lemma shows that the conclusion also holds in this case.
For the remainder of Section 4, we work under the following assumption.
Assumption 4.3.
For a given , we assume that and is decreasing.
That is, we proceed to show that under Assumption 4.3, the stopping problem with remaining investments is solved by a one-sided boundary determined by smooth fit. We then use induction to show that the recursive stopping problem is indeed solved for all .
Lemma 4.1 implies the following result, for which recall that
[TABLE]
Proposition 4.4.
Assume that Assumption 4.3 holds. Then is strictly decreasing. Moreover, there exist unique solutions and of and , respectively. Furthermore, and .
Proof.
By Lemma 4.1, if is decreasing then is decreasing in , and thus
[TABLE]
is strictly decreasing. In addition, if , we have
[TABLE]
Moreover, by (27) we have that, at ,
[TABLE]
That is, is strictly decreasing, and it satisfies for small . Moreover, at it is non-positive, which shows that a unique solution to exists, and also that .
To show the remaining claim, it suffices to note that
[TABLE]
so . ∎
To show that the equation (22) indeed has a unique solution, we define
[TABLE]
for . Recall that we expect that the boundary solving the th free-boundary problem (21) is a solution to the equation .
Proposition 4.5.
Assume that Assumption 4.3 holds. Then, there exists a unique solution to the equation . Moreover, .
Proof.
Since for all , we have for , where is the solution to .
Using , we have
[TABLE]
where the last inequality pair comes directly from Proposition 4.4. The sign of
[TABLE]
coincides with the sign of . Consequently, is decreasing on and increasing on , so there exists at most one solution of , and for such a solution we must have . We next show that , which then finishes the proof.
To see that , select a constant such that . Lemma 4.1, together with properties of and (see Lemma 3.1 and equation (16), respectively), yields
[TABLE]
By the maximum principle, it follows that for , so
[TABLE]
Consequently, , so
[TABLE]
where the last equality comes from noting that is solved by by (19). This completes the proof. ∎
5 Main results
We start by verifying the candidate value function .
Proposition 5.1.
Let be as in (10) and as in (20). Then for all .
Proof.
When , we have directly by (20). By the convexity of , it follows that
[TABLE]
also for . Since , when we have
[TABLE]
Similarly, if , by (13), holds. Therefore, by a standard verification argument (see, for example, [21] for theory and several examples), we indeed have . ∎
Next, we provide a verification result for for an .
Proposition 5.2.
Assume that Assumption 4.3 holds. Let be as in (10) and as in (23). Then for all .
Proof.
Similarly as in the proof of Proposition 5.1, for the verification argument we need
[TABLE]
and
[TABLE]
For the condition (29), we note that for we automatically have
[TABLE]
On the other hand, when , we have
[TABLE]
since (see Propositions 4.4 and 4.5).
For the condition (30) we note that if , then by construction we have . For we argue as follows.
First, we claim that on . In fact, if this was not the case, then there exists with . Since on , the maximum principle yields on . On the other hand, , which implies that in a left neighborhood of , which is a contradiction. It follows that on .
Second, we show that on .
Since , and on , the maximum principle gives also on .
Since conditions (29) and (30) hold, by standard verification arguments (see note in the proof of Proposition 5.1) we have . ∎
To combine the verification results of Propositions 5.1 and 5.2 into our main result, we need the following simple proposition.
Proposition 5.3.
Assume that Assumption 4.3 holds. Then, also is decreasing.
Proof.
By the construction of the candidate value function in (23) and Proposition 5.2, we have for , and for . Therefore, by Proposition 4.4, is decreasing. ∎
Using an induction argument, the following theorem contains the main result of our article.
Theorem 5.4.
For , let and be as given in equation (10), and and be as given in equations (20) and (23), respectively. Then
[TABLE]
for all and for all . Moreover, for each , the optimal investment strategy is to invest at random times
[TABLE]
characterized by a sequence of stopping boundaries , where is given by the equation (19), and for each , the boundary is the unique solution of , where is given in equation (28).
Proof.
Claim comes directly from Proposition 5.1. It follows that
[TABLE]
is decreasing and so by Proposition 5.3, is also decreasing. Then, for any given , assume that is decreasing. This assumption together with Proposition 5.3 implies that also is decreasing. That is, by induction is decreasing for all . In addition, Proposition 5.2 then asserts that for all with . The optimality of for all follows by construction.
∎
An example of the optimal strategy characterized as boundaries is illustrated in the following figure.
We finish the study of the learning-from-the-past effect by discussing some properties of boundaries with numerical methods.
6 Comparative statistics
We conduct a numerical study on the behavior of boundary with respect to changes in the model parameters. To highlight some of the results, we show that a case with is possible (see Figure 4), which shows that it may be optimal to invest in a project with negative expected value (similar observations have been made in [10]). Similarly, we show that is not monotone with respect to (see Figure 7), demonstrating a mixed effect between and signal-to-noise ratio . We then conclude with an observation that increasing (and so decreasing learning per individual investment) decreases (see Figure 8).
Remark 6.1.
Recall that boundary is given by solving (22). To solve for the boundaries, functions are solved using a finite differences method. We consider a second-order partial differential equation
[TABLE]
with . In particular, . Moreover, we establish boundary values of at and using the results in Lemma 3.1. The boundaries are solved on a discrete grid but the values plotted and used in recursion are taken as weighted averages between the grid points. In all figures, the numerically solved boundaries are produced using the same set of parameters unless otherwise mentioned. Boundaries are solved numerically and they are verified with the explicit values given in (19).
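The finite-difference step mentioned in Remark 6.1 can be sketched as follows. This is an illustrative scheme under assumptions, not the article's implementation: it supposes the PDE is u_t = (omega^2/2) p^2 (1-p)^2 u_pp with initial data `f_vals` on a uniform grid over [0, 1], uses backward Euler in time, and keeps the endpoint values fixed because the diffusion coefficient vanishes there. The function name, the grid, and the step count are hypothetical choices.

```python
import numpy as np

def diffuse_payoff(f_vals, grid, omega, delta, n_steps=200):
    """Backward-Euler finite differences for u_t = 0.5 * omega^2 * p^2 * (1 - p)^2 * u_pp."""
    h = grid[1] - grid[0]                     # uniform grid spacing (assumed)
    dt = delta / n_steps
    coef = 0.5 * omega**2 * grid**2 * (1.0 - grid) ** 2 * dt / h**2
    n = grid.size
    A = np.zeros((n, n))
    A[0, 0] = 1.0                             # p = 0 is absorbing: value stays fixed
    A[-1, -1] = 1.0                           # p = 1 is absorbing: value stays fixed
    for i in range(1, n - 1):
        A[i, i - 1] = -coef[i]
        A[i, i] = 1.0 + 2.0 * coef[i]
        A[i, i + 1] = -coef[i]
    u = np.array(f_vals, dtype=float)
    for _ in range(n_steps):
        u = np.linalg.solve(A, u)             # one implicit time step
    return u
```

The implicit scheme is used here because it is stable for any step size and leaves linear functions unchanged, so affine initial data should be reproduced up to rounding while convex initial data can only increase, mirroring the martingale and Jensen properties of the belief process. A dense solve is adequate for a grid of a few hundred points; a tridiagonal solver would be the natural refinement.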
In Figure 4, we plot the boundaries for two different levels of . Note how the boundary is not dependent on the chosen level of maximum learning, and for some , for a higher amount of total learning. This suggests a tradeoff between learning and earning, a prevalent topic in many studies of incomplete information (see, for example [13] for a classical study under a Bayesian setting).
Next, we compare the boundaries for different levels of (the diffusion coefficient of the observation process (4)), which affects dynamics of the process (described in (7)) via the signal-to-noise ratio . It is expected that the boundary is increasing with respect to and so decreasing with respect to : A higher signal-to-noise ratio implies that the investor learns more only by observing , whereas a lower signal-to-noise ratio makes the investor more eager to invest to attain additional learning. This phenomenon is confirmed in Figure 5.
A similar comparison can be made for the discount rate . Intuitively, a high discount rate penalizes waiting, which leads to earlier investment times. This is confirmed in Figure 6, where we compare boundaries for different levels of .
It is also worthwhile to compare boundaries for different project values and . Recall that in Section 2 we motivated the solution concept to our problem by defining
[TABLE]
that is, comparing different project values and reduces to comparing different values for and . We expect that increasing pushes up the boundary, and with Figure 5 we argued that the boundary is also monotone with respect to . Moreover, both and are monotone with respect to . However, and have different monotonicities in , leading to a mixed effect. This is demonstrated in Figure 7, where there is no monotonicity of with respect to .
Increasing the number of possible investments should also affect the boundary. Such a comparison is possible by fixing , i.e. by scaling the amount of additional learning inversely with . One expects increasing to decrease the boundary , reducing the effect of an individual investment. This is confirmed in Figure 8.
We finish with the following remark on possible further extensions to our research.
Remark 6.2.
We note that solving the multiple stopping problem (10) is indeed a discrete analogue of solving the continuous problem (1)–(2). By taking the limit and by detaching from the discrete grid in investment levels and optimal stopping times, we expect the problem to collapse into a continuous stochastic control problem, where one attains a continuous boundary corresponding to optimal control of the process , the formal definition of which we leave for further research. Indeed, defining the admissible class of controls and characterizing the boundary through suitable boundary conditions remain an ample opportunity for future research.
Acknowledgement.
We sincerely thank Erik Ekström for his patient, kind, and useful guidance and countless discussions which were invaluable in shaping this article to its current form.
References
[1] R. Carmona and S. Dayanik. Optimal multiple stopping of linear diffusions. Math. Oper. Res., 33(2):446–460, 2008.
[2] R. Carmona and N. Touzi. Optimal multiple stopping and valuation of swing options. Mathematical Finance, 18(2):239–268, 2008.
[3] R. C. Dalang and A. N. Shiryaev. A quickest detection problem with an observation cost. Ann. Appl. Probab., 25(3):1475–1512, 2015.
[4] J.-P. Décamps, T. Mariotti, and S. Villeneuve. Irreversible investment in alternative projects. Econom. Theory, 28(2):425–448, 2006.
[5] A. K. Dixit and R. S. Pindyck. Investment under Uncertainty. Princeton University Press, 1994.
[6] J.-P. Décamps, T. Mariotti, and S. Villeneuve. Investment timing under incomplete information. Mathematics of Operations Research, 30(2):472–500, 2005.
[7] E. Ekström and I. Karatzas. A sequential estimation problem with control and discretionary stopping. Probab. Uncertain. Quant. Risk, 7(3):151–168, 2022.
[8] E. Ekström and A. Milazzo. A detection problem with a monotone observation rate. Stochastic Process. Appl., 172:Paper No. 104337, 19, 2024.
