On the Value of Penalties in Time-Inconsistent Planning
Susanne Albers, Dennis Kraft

TL;DR
This paper explores the use of penalties as a more effective and computationally feasible alternative to prohibitions in designing commitment devices for time-inconsistent planning, demonstrating theoretical advantages and approximation algorithms.
Contribution
It introduces a penalty-based commitment device that improves upon prohibition methods, providing a 2-approximation algorithm and analyzing the computational hardness of optimal penalty design.
Findings
Penalties can be up to 1/β times more efficient than prohibitions.
A 2-approximation algorithm for penalty allocation is presented.
Optimal penalties are NP-hard to approximate within a ratio of 1.08192.
Susanne Albers, Department of Computer Science, Technical University of Munich, 85748 Garching, Germany. Work supported by the European Research Council, Grant Agreement No. 691672.
Dennis Kraft, Department of Computer Science, Technical University of Munich, 85748 Garching, Germany.
Abstract
People tend to behave inconsistently over time due to an inherent present bias. As this may impair performance, social and economic settings need to be adapted accordingly. Common tools to reduce the impact of time-inconsistent behavior are penalties and prohibition. Such tools are called commitment devices. In recent work Kleinberg and Oren [5] connect the design of prohibition-based commitment devices to a combinatorial problem in which edges are removed from a task graph $G$ with $n$ nodes. However, this problem is NP-hard to approximate within a ratio less than $\sqrt{n}/3$ [2]. To address this issue, we propose a penalty-based commitment device that does not delete edges but raises their cost. The benefits of our approach are twofold. On the conceptual side, we show that penalties are up to $1/\beta$ times more efficient than prohibition, where $\beta \in (0, 1]$ parameterizes the present bias. On the computational side, we significantly improve approximability by presenting a 2-approximation algorithm for allocating the penalties. To complement this result, we prove that optimal penalties are NP-hard to approximate within a ratio of 1.08192.
1 Introduction
Most people make long term plans. They intend to eat healthy, save money, prepare for exams, exercise regularly and so on. Curiously, the same people often change their plans at a later point in time. They indulge in fast food, squander their money, fail to study and skip workouts. Although change may be necessary due to unforeseen events, people often change their plans even if the circumstances stay the same. This type of time-inconsistent behavior is a well-known phenomenon in behavioral economics and might impair a person’s performance in social or economic domains [1, 8].
A sensible explanation for time-inconsistent behavior is that people are present biased and assign disproportionately greater value to the present than to the future. Consider, for instance, a scenario in which a student named Alice attends a course over several weeks. To pass the course, Alice either needs to solve a homework exercise each week or give a presentation once. The presentation incurs a onetime effort of , whereas each homework exercise incurs an effort of . Assume that she automatically fails the course if she misses a homework assignment before she has given a presentation. If the course lasts for more than weeks, she clearly minimizes her effort by giving a presentation in the first week. Paradoxically, if Alice is present biased, she might solve all homework exercises instead. The reason for this is the following:
Suppose Alice perceives present effort accurately, but discounts future effort by a factor of . In the first week Alice must decide between solving the homework exercise or giving a presentation. Clearly, the homework incurs less immediate effort than the presentation. Furthermore, Alice can still give a presentation next week. Her perceived effort for doing the homework this week and giving the presentation the week after is . To Alice this plan appears more convenient than giving the presentation right away. Consequently, she does the homework. However, come next week she changes this plan and postpones the presentation once more. Her reasoning is the same as in the first week. Due to her time-inconsistent behavior, Alice continues to postpone the presentation and ends up doing all the homework assignments.
Previous Work
Time-inconsistent behavior has been studied extensively in behavioral economics. For an introduction to the topic refer for example to [1]. Alice’s scenario demonstrates how time-inconsistency arises whenever people are present biased. Alice evaluates her preferences based on a well-established discounting model called quasi-hyperbolic-discounting [7]. As her story shows, quasi-hyperbolic-discounting tempts people to make poor decisions. To prevent poor decisions, social and economic settings need to be adapted accordingly. Depending on the domain, such adaptations might be implemented by governments, companies, teachers or people themselves. We call these entities designers and their motivation can be benevolent or self-serving. In either case, the designer’s objective is to commit people to a certain goal. Their tools are called commitment devices and may include rewards, penalty fees and strict prohibition [3, 9].
Until recently, the study of time-inconsistent behavior lacked a unifying and expressive framework. However, groundbreaking work by Kleinberg and Oren closed this gap by reducing the behavior of a quasi-hyperbolic-discounting person to a simple planning problem in task graphs [5]. Their framework has helped to identify various structural properties of social and economic settings that affect the performance of present biased individuals [5, 10]. It has also been extended to people whose present bias varies over time [4] as well as people who are aware of their present bias and act accordingly [6]. We will formally introduce the framework in Section 2. A significant part of Kleinberg and Oren’s work is concerned with the study of a simple yet powerful commitment device based on prohibition [5]. In particular, they demonstrate how performance can be improved by removing a strategically chosen set of edges from the task graph. The drawback of their approach is its computational complexity. As it turns out, an optimal commitment device is NP-hard to approximate within a ratio less than $\sqrt{n}/3$, where $n$ denotes the number of nodes in the task graph [2]. Currently, the best known polynomial-time approximation achieves a ratio of $1 + \sqrt{n}$ [2]. It should be mentioned that Kleinberg and Oren’s framework has also been used to analyze reward-based commitment devices [2, 10]. Unfortunately, their computational complexity does not permit a polynomial-time approximation within a finite ratio unless $\mathrm{P} = \mathrm{NP}$ [2].
Our Contribution
To circumvent the theoretical bottleneck mentioned above, we propose a natural generalization of Kleinberg and Oren’s commitment device. Instead of prohibition, our commitment device is based on penalty fees, a standard tool in the economic literature [3, 9]. This means that the designer is free to raise the cost of arbitrary edges in the task graph. We call such an assignment of penalties a cost configuration. The designer’s objective is to construct cost configurations that are as efficient as possible.
In Section 3 we conduct a quantitative comparison between the efficiency of prohibition-based and penalty-based commitment devices. Assuming that optimal solutions are known, we show that penalties are strictly more powerful than prohibitions. In particular, we show that penalties may outperform prohibitions by a factor of $1/\beta$, where $\beta \in (0, 1]$ parameterizes the present bias. This result is tight. In Section 4 we investigate the computational complexity of our commitment device. Using a reduction from 3-SAT, we argue that the construction of an efficient cost configuration is NP-hard when posed as a decision problem. A generalization of this reduction proves NP-hardness for approximations within a ratio of 1.08192. Unless $\mathrm{P} = \mathrm{NP}$, this dismisses the existence of a polynomial-time approximation scheme. While analyzing the complexity of our commitment device we also point to a remarkable structural property. More specifically, we show that every cost configuration admits another cost configuration of comparable efficiency that assigns its cost entirely along a single path. Assuming that the path is known in advance, we provide an algorithm for constructing such a cost configuration in polynomial-time. This result is important for the design of exact algorithms as it reduces the search space to the set of paths through the task graph. Finally, Section 5 introduces a 2-approximation algorithm for our commitment device. This is the main result of our work and a considerable improvement to the complexity theoretic barrier of $\sqrt{n}/3$ for approximating prohibition-based commitment devices [2].
2 The Formal Framework
In the following, we introduce Kleinberg and Oren’s framework [5]. Let $G = (V, E)$ be a directed acyclic graph with $n$ nodes that models a given long-term project. The edges of $G$ correspond to the tasks of the project and the nodes represent the states. In particular, there exists a start state $s$ and a target state $t$. Each path from $s$ to $t$ corresponds to a valid sequence of tasks to complete the project. The effort of a specific task is captured by a non-negative cost $c(e)$ assigned to the associated edge $e$.
To complete the project, an agent with a present bias of $\beta \in (0, 1]$ incrementally constructs a path from $s$ to $t$ as follows: At any node $v$ different from $t$, the agent evaluates her lowest perceived cost. For this purpose she considers all paths $P$ leading from $v$ to $t$. However, she only anticipates the cost of the first edge of $P$ correctly; all other edges of $P$ are discounted by her present bias. More formally, let $d(w)$ denote the cost of a cheapest path from node $w$ to $t$. The agent’s lowest perceived cost at $v$ is defined as $\zeta(v) = \min\{c(v, w) + \beta d(w) \mid (v, w) \in E\}$. We assume that she only traverses edges $(v, w)$ that minimize her anticipated cost, i.e. edges for which $c(v, w) + \beta d(w) = \zeta(v)$. Ties are broken arbitrarily. For convenience, we define the perceived cost of $(v, w)$ as $\eta(v, w) = c(v, w) + \beta d(w)$. The agent is motivated by an intrinsic or extrinsic reward $r$ collected at $t$. As she receives this reward in the future, she perceives its value as $\beta r$ at each node different from $t$. When located at $v$, she compares her lowest perceived cost to the anticipated reward and continues moving if and only if $\zeta(v) \le \beta r$. Otherwise, if $\zeta(v) > \beta r$, we assume she abandons the project. We call $G$ motivating if she does not abandon while constructing her path from $s$ to $t$. Note that in some graphs the agent can take several paths from $s$ to $t$ due to ties between incident edges. In this case, $G$ is considered motivating if she does not abandon on any of these paths.
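The traversal rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is a dictionary mapping edges `(v, w)` to costs, every node lies on a path to `t`, and ties are resolved by taking the first minimum found, so the sketch follows a single path rather than checking all tied paths. The names `cheapest_costs` and `walk` are hypothetical and do not come from the paper.

```python
def cheapest_costs(edges, t):
    """Return d(v), the cost of a cheapest path from v to t, for a DAG {(v, w): cost}."""
    succ = {}
    for (v, w), c in edges.items():
        succ.setdefault(v, []).append((w, c))
    memo = {t: 0.0}

    def d(v):
        if v not in memo:
            memo[v] = min((c + d(w) for w, c in succ.get(v, [])), default=float("inf"))
        return memo[v]

    for v, w in edges:
        d(v)
        d(w)
    return memo


def walk(edges, s, t, beta, reward):
    """Simulate the agent; return (visited nodes, True if she reaches t)."""
    d = cheapest_costs(edges, t)
    path, v = [s], s
    while v != t:
        # Perceived cost of an outgoing edge: first edge in full, remainder discounted.
        # Assumes v has at least one outgoing edge (every node lies on a path to t).
        perceived = {w: c + beta * d[w] for (u, w), c in edges.items() if u == v}
        w_best = min(perceived, key=perceived.get)  # ties: first minimum found
        if perceived[w_best] > beta * reward:
            return path, False  # lowest perceived cost exceeds the perceived reward
        v = w_best
        path.append(v)
    return path, True
```

Calling `walk(edges, "s", "t", beta, reward)` on a small graph shows both outcomes of the model: either the full path to `t` is returned, or the prefix up to the node where the agent abandons.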
For the sake of a clear presentation, we will assume throughout this work that each node of $G$ is located on a path from $s$ to $t$. This assumption is sensible for the following reason: Clearly, the agent can only visit nodes that are reachable from $s$. Furthermore, she is not willing to enter nodes that do not lead to the reward. Consequently, only nodes that are on a path from $s$ to $t$ are relevant to her behavior. Note that all nodes that do not satisfy this property can be removed from $G$ in a simple preprocessing step.
To illustrate the model, we revisit Alice’s scenario from Section 1. Assume that the course takes weeks. We represent each week by a distinct node and set . Furthermore, we introduce a target node that marks the passing of the course. Each week Alice can either give a presentation or proceed with the homework. We model the first case by an edge of cost and the latter case by an edge of cost . In the last week, i.e. , Alice’s only sensible choice is to do the homework. Therefore, edge is of cost . Recall that Alice’s present bias is . Moreover, assume that her intrinsic reward for passing is . For her perceived cost of the edges is . As this is less than her perceived reward, which is , she is never motivated to give a presentation right away. However, her perceived cost of the edges is at most . This matches her perceived reward. As a result, she walks from to along the edge . Once she reaches she traverses the only remaining edge for a perceived cost of and passes the course. This matches our analysis from Section 1.
3 Prohibition versus Penalty
In this section we demonstrate how the designer can modify a given project to help the agent reach $t$. For this purpose, the designer may have several commitment devices at her disposal. A straightforward approach is to increase the reward that the agent collects at $t$. Although this may keep the agent from abandoning the project prematurely, it has no influence on the path taken by the agent. Furthermore, increasing the reward may be costly for the designer. As a result, the designer has two conflicting objectives. On the one hand, she must ensure that the agent reaches $t$. On the other hand, she needs to minimize the resources spent. To deal with this dilemma, Kleinberg and Oren allow the designer to prohibit a strategically chosen set of tasks [5]. This commitment device is readily implemented in their framework. In fact, it is sufficient to remove all edges of prohibited tasks. The result is a subgraph $G'$ that may significantly reduce the reward required to motivate the agent. Unfortunately, an optimal subgraph is NP-hard to approximate within a ratio less than $\sqrt{n}/3$ [2].
To circumvent this theoretical bottleneck, we propose a different approach. Instead of prohibiting certain tasks we allow the designer to charge penalty fees. Such fees could be implemented in several ways; for instance in the form of donations to charity. Our only assumption is that the designer does not benefit from the fees, i.e. there is no incentive to maximize the fees paid by the agent. Similar to commitment devices based on prohibition, our commitment device is readily implemented in Kleinberg and Oren’s framework. The designer simply assigns a positive extra cost $\tilde{c}(e)$ to the desired edges $e$. The new cost of $e$ is equal to $c(e) + \tilde{c}(e)$. We call $\tilde{c}$ a cost configuration. Applying a cost configuration to $G$ yields a new task graph with increased edge cost. All concepts of the original framework carry over immediately. Sometimes it will be necessary to compare different commitment devices with each other. To clarify which commitment device we are talking about, we use the following notation whenever necessary: If we consider a subgraph $G'$, we write $d_{G'}$, $\zeta_{G'}$ and $\eta_{G'}$. Similarly, if we consider a cost configuration $\tilde{c}$, we write $d_{\tilde{c}}$, $\zeta_{\tilde{c}}$ and $\eta_{\tilde{c}}$. Moreover, the trivial cost configuration is the one that assigns no extra cost; quantities written without a subscript refer to it.
It is interesting to think of penalty fees as a natural generalization of prohibition. This becomes particularly apparent in the context of Kleinberg and Oren’s framework as we can recreate the properties of any subgraph $G'$ by a cost configuration $\tilde{c}$. For this purpose, it is sufficient to assign a prohibitively large extra cost, i.e. one that exceeds the reward $r$, to any edge $e$ not contained in $G'$. As a result, the agent’s perceived cost of paths along $e$ certainly exceeds her perceived reward. However, this means that $e$ is irrelevant to the agent’s planning and could be deleted from $G$ altogether. Consequently, penalties are at least as powerful as prohibitions. But how much more efficient are penalties in the best case? As the following theorem suggests, cost configurations may outperform subgraphs by a factor of almost $1/\beta$.
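The emulation of a subgraph by penalties can be checked on small instances with a sketch like the following. The penalty value `reward + 1` is an assumption chosen for illustration (any extra cost exceeding the reward serves the same purpose), and the function names are hypothetical. Graphs are again dictionaries from edges `(v, w)` to costs.

```python
def lowest_perceived_costs(edges, t, beta):
    """zeta(v) = min over edges (v, w) of c(v, w) + beta * d(w), for a DAG {(v, w): cost}."""
    succ = {}
    for (v, w), c in edges.items():
        succ.setdefault(v, []).append((w, c))
    memo = {t: 0.0}

    def d(v):
        # Cost of a cheapest path from v to t, computed by memoised recursion.
        if v not in memo:
            memo[v] = min((c + d(w) for w, c in succ.get(v, [])), default=float("inf"))
        return memo[v]

    return {v: min(c + beta * d(w) for w, c in out) for v, out in succ.items()}


def apply_cost_configuration(edges, extra):
    """New cost of each edge is its original cost plus its extra cost (0 if none)."""
    return {e: c + extra.get(e, 0.0) for e, c in edges.items()}


def prohibition_as_penalties(edges, kept_edges, reward):
    """Assumed penalty: reward + 1 on every edge that the subgraph would delete."""
    return {e: reward + 1.0 for e in edges if e not in kept_edges}
```

On nodes of the subgraph whose lowest perceived cost does not exceed the perceived reward, `lowest_perceived_costs` returns the same value for the pruned graph and for the penalised graph, since every path through a penalised edge is perceived as more expensive than the reward.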
Theorem 1.
The ratio between the minimum reward that admits a motivating subgraph and the reward of a motivating cost configuration is at most $1/\beta$. This bound is tight.
Proof.
To see that the ratio is at most $1/\beta$, let $G$ be an arbitrary task graph and consider a subgraph $G'$ whose only edges are those of a cheapest path $P$ from $s$ to $t$. Recall that $d(s)$ denotes the cost of $P$. In $G'$ the agent’s only choice is to follow $P$. Because her perceived cost is a discounted version of the actual cost, she never perceives a cost greater than $d(s)$ in $G'$. Consequently, $d(s)/\beta$ is an upper bound on the minimum reward that admits a motivating subgraph. Next, consider an arbitrary cost configuration $\tilde{c}$. As $\tilde{c}$ only increases edge cost, the agent’s lowest perceived cost at $s$ is at least $\beta d(s)$. We conclude that the reward must be at least $d(s)$ for $\tilde{c}$ to be motivating. This yields the desired ratio.
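The two bounds from this part of the proof can be written as one short chain. The names $r_{\mathrm{sub}}$ (minimum reward admitting a motivating subgraph) and $r_{\mathrm{pen}}$ (reward of a motivating cost configuration $\tilde{c}$) are hypothetical shorthand introduced here, and the last line assumes $d(s) > 0$.

```latex
\begin{align*}
r_{\mathrm{sub}} &\le \frac{d(s)}{\beta}, \\
\beta\, r_{\mathrm{pen}} \;\ge\; \zeta_{\tilde{c}}(s)
  &\ge \min_{(s,w) \in E} \bigl( c(s,w) + \beta\, d(w) \bigr)
   \;\ge\; \beta \min_{(s,w) \in E} \bigl( c(s,w) + d(w) \bigr)
   \;=\; \beta\, d(s), \\
\frac{r_{\mathrm{sub}}}{r_{\mathrm{pen}}} &\le \frac{d(s)/\beta}{d(s)} \;=\; \frac{1}{\beta}.
\end{align*}
```

The middle line uses $\beta \le 1$, so that $c(s,w) \ge \beta\, c(s,w)$, together with the fact that extra cost can only increase perceived cost.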
It remains to show the tightness of the result. For this purpose, we construct a task graph such that: (a) The minimum reward that admits a motivating subgraph is . (b) There exists a cost configuration that is motivating for a reward of , where is a positive value strictly less than . Our construction is a modified version of Alice’s task graph. Let and assume that contains a path whose edges are all of cost . We call this the main path and set and . In addition to the main path, each with has a shortcut to via a common node . The edges are free, whereas is of cost . Figure 1 illustrates the structure of . Note that the drawing merges some of the edges for a concise representation.
We proceed to argue that satisfies (a). For the sake of contradiction, assume the existence of a subgraph that is motivating for a reward . In this case the agent must not take shortcuts as her perceived cost at exceeds her perceived reward. Therefore, she must follow the main path. In particular, she must visit each node on the first half of the path, i.e. . At each of these nodes, her lowest perceived cost is realized along the edge . Essentially, there are two ways she can come up with this cost. First, she might plan to take a shortcut at a later point in time. As a result, we get . Secondly, she might plan to stay on the main path. In this case she must traverse at least edges, each of which contributes or more to . Consequently, we get . Either way her perceived cost for taking the main path is at least . As this tempts her to take the shortcut at , all of the first shortcuts must be interrupted in . This means she must walk along at least edges of the main path before taking the first shortcut. As a result, her lowest perceived cost at is at least . This is a contradiction to the assumption that is motivating.
Next we show how to construct a cost configuration that satisfies (b). For this purpose it is sufficient to add an extra cost of to all edges . To upper bound the agent’s perceived cost of , assume she plans to take a shortcut in the next step, i.e. at . For we get . In the special case of , the inequality is still satisfied, this time via the direct edge . In contrast, the agent’s perceived cost of an immediate shortcut is for all . Therefore, she is never tempted to divert from the main path. Furthermore, a reward of is sufficient to keep her motivated. ∎
4 Computing Motivating Cost Configurations
We now turn our attention to the computational aspects of designing efficient penalty fees. In this section, we assume that the agent’s reward is fixed to some value $r$. Our goal is to compute cost configurations that are motivating for $r$ whenever they exist. Similar to prohibition-based commitment devices [2], this task is NP-hard whenever the agent is present biased, i.e. $\beta < 1$. We will prove this claim at the end of the section. But first, assume that we already have partial knowledge of the solution. More precisely, assume we know one of the paths the agent might take in a motivating cost configuration provided a motivating cost configuration exists. We call this path $P$. Based on $P$, Algorithm 1 constructs a cost configuration that is motivating for a slightly larger reward.
The basic idea of Algorithm 1 is simple. Starting with , it considers all nodes of in reverse order. For each it assigns an extra cost of to the edges that leave , i.e. edges different from . As a result, the agent’s perceived cost of is greater than that of by at least . Consequently, she has no incentive to divert from at . Since the algorithm runs in reverse order, extra cost assigned in iteration has no effect on the agent’s behavior at later nodes, i.e. nodes with . Figuratively speaking, the algorithm builds a fence of penalty fees along preventing the agent from leaving . For this reason, we call the algorithm PathAndFence. As the next proposition suggests, cost configurations of this particular fence structure can achieve almost the same efficiency as any other cost configuration. Due to space constraints, refer to the Appendix for a proof.
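The fence idea can be illustrated with the following sketch. It is not a transcription of Algorithm 1: the exact extra cost assigned there is not reproduced here, so the sketch assumes one plausible rule, namely raising each edge that leaves the path just enough that its perceived cost exceeds that of the on-path edge by a margin `eps`. The names `path_and_fence` and `cheapest_costs` are hypothetical.

```python
def cheapest_costs(edges, t):
    """d(v) for every node of a DAG {(v, w): cost}, by memoised recursion."""
    succ = {}
    for (v, w), c in edges.items():
        succ.setdefault(v, []).append((w, c))
    memo = {t: 0.0}

    def d(v):
        if v not in memo:
            memo[v] = min((c + d(w) for w, c in succ.get(v, [])), default=float("inf"))
        return memo[v]

    for v, w in edges:
        d(v)
        d(w)
    return memo


def path_and_fence(edges, path, beta, eps):
    """Return extra costs that keep the agent on `path` (assumed penalty rule)."""
    t = path[-1]
    cost = dict(edges)  # working copy: original cost plus penalties assigned so far
    extra = {}
    # Visit the nodes of the path in reverse order, skipping the target itself.
    for i in range(len(path) - 2, -1, -1):
        v, nxt = path[i], path[i + 1]
        d = cheapest_costs(cost, t)  # distances under the penalties assigned so far
        on_path = cost[(v, nxt)] + beta * d[nxt]
        for (u, w), c in list(cost.items()):
            if u == v and w != nxt:
                # Raise the edge until it is perceived as eps more expensive.
                gap = on_path + eps - (c + beta * d[w])
                if gap > 0:
                    extra[(u, w)] = gap
                    cost[(u, w)] = c + gap
    return extra
```

Because the graph is acyclic, penalties placed on edges leaving an earlier node of the path cannot change the distances seen from later nodes, which is why a single backward pass suffices in this sketch. No edge of `path` itself ever receives extra cost.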
Proposition 1.
Let be the agent’s path from to with respect to a cost configuration that is motivating for a reward . PathAndFence constructs a cost configuration that is motivating for a reward of , where is an arbitrary small but positive quantity.
Proposition 1 has some interesting implications. The first one is of conceptual nature. Note that PathAndFence constructs a cost configuration that never actually charges the agent any extra cost. This suggests the existence of efficient penalty-based commitment devices that do not require the designer to enforce penalties. The mere threat of repercussions appears to be sufficient. The second implication is computational. Clearly, PathAndFence runs in polynomial-time with respect to the size of $G$. In particular, the number of iterations does not depend on the choice of $\epsilon$. Consequently, PathAndFence can be combined with an exhaustive search algorithm that considers all paths from $s$ to $t$ to search for a motivating cost configuration. Although the number of such paths can be exponential in $n$, this approach still reduces the size of the search space considerably. Finally, it should be noted that a similar result for commitment devices based on prohibition is unlikely to exist. The reason is that subgraphs remain hard to approximate even if the agent’s optimal path is known [2], indicating a favorable computational complexity for the design of penalty fees. Of course there is another potential source of hardness: the computation of $P$. To prove that this is a limiting factor, we introduce the decision problem MOTIVATING COST CONFIGURATION:
Definition 1 (MCC).
Given a task graph $G$, a reward $r$ and a present bias $\beta \in (0, 1]$, decide the existence of a motivating cost configuration.
We propose a reduction from 3-SAT to show that MCC is NP-complete for arbitrary $\beta \in (0, 1)$. At a later point we will use the same reduction to establish a hardness of approximation result.
Theorem 2.
MCC is NP-complete for any present bias $\beta \in (0, 1)$.
Proof.
According to [2], whether or not a given task graph is motivating for a fixed reward can be verified in polynomial-time. Of course, this remains valid if the edges are assigned extra cost. Consequently, any motivating cost configuration is a suitable certificate for a "yes"-instance of MCC. We conclude that MCC is in NP. In the following, we present a reduction from 3-SAT to show that MCC is also NP-hard. This establishes the theorem.
Let be an arbitrary instance of -SAT consisting of clauses over variables . We construct a MCC instance such that its task graph admits a motivating cost configuration for a reward of if and only if has a satisfying variable assignment. Figure 2 depicts for a small sample instance of . In general, consists of a source , a target and five nodes . Depending on , also contains some extra nodes. For each variable , there are two variable nodes and . The idea is to interpret as true whenever the agent visits and as false whenever she visits . As a result, the agent’s walk through yields a variable assignment . Furthermore, for each clause there is a literal node corresponding to the -th literal of . Our goal is to construct in such a way that every motivating cost configuration guides the agent along literal nodes that are satisfied with respect to .
All nodes and are connected via so-called forward edges. More specifically, for all and there is a forward edge from to . Similarly, there is a forward edge from to for all and . We also have forward edges from to each , from each to , from to each and from each to . For the sake of readability, some forward edges are merged in Figure 2. The price of each forward edge is , where the encoding length of is assumed to be polynomial in . Furthermore, denotes a small but positive quantity such that
[TABLE]
In addition to the forward edges, there are three types of shortcuts. The first type, which is depicted as dashed edges in Figure 2, connects each literal node to a distinct variable node via a single edge of cost . If the -th literal of is equal to , the shortcut goes to . Otherwise, if the literal is negated, i.e. , the shortcut goes to . The second type of shortcut goes from to along a single edge of cost . For clear representation, this shortcut is omitted in Figure 2. The third type of shortcut connects each variable node to via a distinct intermediate node. The first edge is free while the second costs . Again, shortcuts of this type are omitted in Figure 2 to keep the drawing simple. Finally, there are four more edges , , and of cost , , and respectively. Note that is acyclic and its encoding length is polynomial in .
To establish the theorem, we must show that has a satisfying variable assignment if and only if has a motivating cost configuration. A detailed argument is described in the Appendix. At this point we only sketch the main ideas. For this purpose let be a cost configuration that is motivating for a reward of and let be the agent’s path through with respect to . Note that cannot contain shortcuts of the second or third type as their edges are too expensive. Furthermore, cannot contain a shortcut of the first type because the agent either perceives it as too expensive or is tempted to enter a shortcut of the third type immediately afterwards. As a result, contains exactly one of the two nodes and for each variable . Let be the corresponding variable assignment. To keep the agent on , must assign extra cost to all shortcuts that start at a variable node satisfied by . However, this raises the perceived cost of all paths via literal nodes not satisfied by to values that are not motivating. Consequently, cannot contain such literal nodes. But must contain exactly one literal node of each clause because takes no shortcuts. This means that satisfies at least one literal in each clause and is therefore a feasible solution of . Conversely, whenever has a feasible solution , we can construct a motivating cost configuration as follows: First, assign an appropriate extra cost, e.g. , to the shortcuts of type three starting at the variable nodes . Secondly, block the forward edges into the variable nodes with high extra cost of e.g. . ∎
5 Approximating Motivating Cost Configurations
The previous section showed that optimal penalty-based commitment devices are NP-hard to design. This section therefore focuses on an optimization version of the problem. Our goal is to construct cost configurations that require the designer to raise the reward at as little as possible. However, before we provide a formal definition of the problem we should consider a curious technical detail; namely, not all task graphs admit an optimal cost configuration.
Consider, for instance, the task graph in Figure 3. At the agent is indifferent between the edges and . In both cases her perceived cost is . If she chooses , she faces a perceived cost of at . Conversely, if she chooses , she perceives a cost of at , and . Assuming that , is the better choice. To break the tie between and we must place a positive extra cost of onto the upper path. However, when located at the agent’s perceived cost of the upper path is . In contrast, her perceived cost of the lower path is . Assuming that , she prefers the upper path. Consequently, we can construct a cost configuration that is motivating for a reward arbitrarily close to , but no cost configuration is motivating for a reward of exactly . To account for the potential lack of an optimal solution, we compare our results to the infimum of all rewards that admit a motivating cost configuration. The optimization problem MCC-OPT is defined accordingly:
Definition 2 (MCC-OPT).
Given a task graph $G$ and a present bias $\beta \in (0, 1]$, determine the infimum of all rewards for which a motivating cost configuration exists.
We are now ready to introduce Algorithm 2. This algorithm enables us to construct cost configurations that approximate MCC-OPT within a factor of 2. At a high level, the algorithm proceeds in two phases. First, it computes a value $\varrho$ such that $\varrho/\beta$ is a lower bound for any reward that admits a motivating cost configuration. Secondly, it constructs a cost configuration $\tilde{c}$ that is motivating for a reward of $2\varrho/\beta$. This yields the promised approximation ratio of 2.
For a more detailed discussion of Algorithm 2 assume that each edge $e$ is labeled with its perceived cost $\eta(e)$. Furthermore, let $\tilde{c}$ be an arbitrary cost configuration and $P$ the agent’s corresponding path from $s$ to $t$. Our goal is to lower bound the minimum reward that is motivating for $\tilde{c}$ by some value $\varrho/\beta$. For this purpose, it is instructive to observe that any motivating reward must be at least $\max\{\eta(e) \mid e \in P\}/\beta$. Since $P$ can be an arbitrary path from $s$ to $t$, we set
$$\varrho = \min\bigl\{\max\{\eta(e) \mid e \in P\} \;\bigm|\; P \text{ is a path from } s \text{ to } t\bigr\}.$$
In other words, $\varrho$ is the maximum edge cost of a minmax path from $s$ to $t$ with respect to $\eta$. Note that $\varrho$ can be computed in polynomial-time by adding the edges of $G$ in non-decreasing order of perceived cost to an initially empty set $E'$ until $s$ and $t$ become connected for the first time. Any path $P$ from $s$ to $t$ that only uses edges of $E'$ is a suitable minmax path.
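The incremental procedure for the minmax value can be sketched as follows. The sketch assumes the graph is a dictionary from edges `(v, w)` to costs and re-runs a plain depth-first search after each insertion, which is simple rather than fast. The function names are hypothetical.

```python
def cheapest_costs(edges, t):
    """d(v) for every node of a DAG {(v, w): cost}, by memoised recursion."""
    succ = {}
    for (v, w), c in edges.items():
        succ.setdefault(v, []).append((w, c))
    memo = {t: 0.0}

    def d(v):
        if v not in memo:
            memo[v] = min((c + d(w) for w, c in succ.get(v, [])), default=float("inf"))
        return memo[v]

    for v, w in edges:
        d(v)
        d(w)
    return memo


def reachable(adj, s, t):
    """Depth-first search: is t reachable from s along directed edges in adj?"""
    stack, seen = [s], {s}
    while stack:
        v = stack.pop()
        if v == t:
            return True
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def minmax_bound(edges, s, t, beta):
    """Largest perceived edge cost on a minmax path from s to t."""
    d = cheapest_costs(edges, t)
    perceived = {(v, w): c + beta * d[w] for (v, w), c in edges.items()}
    added = {}
    # Insert edges in non-decreasing order of perceived cost.
    for (v, w) in sorted(perceived, key=perceived.get):
        added.setdefault(v, []).append(w)
        if reachable(added, s, t):
            return perceived[(v, w)]  # the edge that first connects s and t
    return float("inf")  # t is not reachable from s at all
```

Dividing the returned value by `beta` gives the lower bound on the reward discussed above.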
We continue with the construction of $\tilde{c}$. To facilitate this task, Algorithm 2 sets up a cheapest path successor relation $\sigma$. More precisely, it assigns a distinct successor node $\sigma(v)$ to each $v \in V \setminus \{t\}$. The successor is chosen in such a way that $(v, \sigma(v))$ is the initial edge of a cheapest path from $v$ to $t$. Since we may assume that $t$ is reachable from each node of $G$, all $v \in V \setminus \{t\}$ must have at least one suitable successor. By construction of $\sigma$, any path $P' = v, \sigma(v), \sigma(\sigma(v)), \ldots, t$ is a cheapest path from $v$ to $t$. We call $P'$ the $\sigma$-path of $v$.
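A successor relation of this kind can be derived directly from the cheapest-path costs, as in the sketch below. It assumes the same dictionary representation of the graph as a map from edges `(v, w)` to costs, picks the first suitable successor it encounters when several exist, and uses hypothetical function names.

```python
def cheapest_costs(edges, t):
    """d(v) for every node of a DAG {(v, w): cost}, by memoised recursion."""
    succ = {}
    for (v, w), c in edges.items():
        succ.setdefault(v, []).append((w, c))
    memo = {t: 0.0}

    def d(v):
        if v not in memo:
            memo[v] = min((c + d(w) for w, c in succ.get(v, [])), default=float("inf"))
        return memo[v]

    for v, w in edges:
        d(v)
        d(w)
    return memo


def cheapest_path_successors(edges, t):
    """Map each node v != t to one successor that starts a cheapest path from v to t."""
    d = cheapest_costs(edges, t)
    sigma = {}
    for (v, w), c in edges.items():
        # (v, w) starts a cheapest path exactly when c(v, w) + d(w) equals d(v).
        if v not in sigma and abs(c + d[w] - d[v]) <= 1e-12:
            sigma[v] = w
    return sigma


def successor_path(sigma, v, t):
    """Follow the successor relation from v until t is reached."""
    path = [v]
    while path[-1] != t:
        path.append(sigma[path[-1]])
    return path
```

Every path produced by `successor_path` is a cheapest path, because each step moves along an edge that realises the cheapest-path cost of its start node.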
Once has been created, Algorithm 2 starts to assign an appropriate extra cost to all edges of . The idea behind this assignment is to either keep the agent on or guide her along a suitable -path. For this reason, we also call the algorithm MinMaxPathApprox. While iterating through the edges of the algorithm distinguishes between three types of edges: First, might be an edge of or an edge of a -path. In the latter case must not be a node of . Any that satisfies these requirements is an edge we want the agent to traverse or use in her plans. Consequently, is assigned no extra cost. Secondly, might neither be an edge of nor of a -path. Since we do not want the agent to traverse or plan along such an edge, the algorithm assigns an extra cost of to . This is sufficiently expensive for the agent to lose interest in provided that the reward is . Thirdly, might not be an edge of but of a -path such that is a node of . This is the most involved case. To find an appropriate cost for , the algorithm considers the -path of . Let be the first common node between and that is different from . Because and both end in , such a node must exist. Moreover, let be the most expensive edge of between and . The algorithm assigns an extra cost of to . As we will show in Theorem 3, this cost is either high enough to keep the agent on or she travels to along without encountering edges that are too expensive.
Clearly, Algorithm 2 can be implemented to run in polynomial time with respect to the size of . It remains to show that the algorithm returns a cost configuration that approximates MCC-OPT within a factor of 2.
**Theorem 3.**
*MinMaxPathApprox has an approximation ratio of 2.*
Proof.
Recall that denotes the maximum perceived edge cost along the minmax path . From the above description of MinMaxPathApprox, it should be evident that is a lower bound on the minimum motivating reward of any cost configuration. To prove the theorem, we need to show that the algorithm returns a cost configuration that is motivating for a reward of .
As our first step we argue that the cost of a cheapest path from any node to with respect to is at most twice the cost of a cheapest path with respect to . More formally, we prove that . For this purpose let be the -path of . By construction of , is a cheapest path from to . It is crucial to observe that MinMaxPathApprox only assigns extra cost to an edge of if is located on . Consequently, there is at most one edge with extra cost between any two consecutive intersections of and . Furthermore, this extra cost is equal to the cost of an edge on between and the next intersection of and . Therefore, each edge of can contribute at most once to the total extra cost assigned to . This means that the price of with respect to is at most twice the price of with respect to . Because the price of is an upper bound for , we have shown that .
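The counting argument of this paragraph can be condensed into one chain of inequalities. The notation is assumed for illustration and may differ from the authors': $c$ denotes the original edge cost, $\tilde{c}$ the extra cost returned by the algorithm, $P'$ the successor path of $v$, and $d_c(v)$ and $d_{c+\tilde{c}}(v)$ the cost of a cheapest path from $v$ to the target without and with extra cost.

```latex
\begin{align*}
\tilde{c}(P') &= \sum_{e \in P'} \tilde{c}(e) \;\le\; \sum_{e \in P'} c(e) \;=\; c(P')
  && \text{(each edge of $P'$ is charged at most once)} \\
d_{c+\tilde{c}}(v) &\le c(P') + \tilde{c}(P') \;\le\; 2\,c(P') \;=\; 2\,d_c(v)
  && \text{($P'$ is a cheapest path under $c$)}
\end{align*}
```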
We proceed to investigate the agent’s walk through . Our goal is to show that her lowest perceived cost is at most at every node on her way. This establishes the theorem. Our analysis is based on the following case distinction: First, assume that is located on . The immediate successor of on is denoted by . Remember that assigns no extra cost to . Using the result from the previous paragraph, we get
[TABLE]
The last inequality is valid by definition of .
Secondly, assume that is not located on and consider the last node on the agent visited before . Because she traversed to get to , we know that and . We also know that she faces an extra cost of whenever she tries to leave the -path of before the next intersection of and . Since she is not willing to pay this much, must be located on . In particular, all paths from to either visit or cross an edge that charges an extra cost of . Consequently, a cheapest path from to with respect to costs at least . As , this implies that . Our proof is almost complete. For the final part, recall that is located on between and the next intersection of and . By construction of we have . Furthermore, has no extra cost. Putting all the pieces together we get
[TABLE]
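On small instances, the walk analyzed in this proof can be checked by direct simulation. The sketch below assumes the naive present-biased agent of Kleinberg and Oren's framework [5]: at each node she evaluates an outgoing edge as its immediate cost (original plus extra) plus `beta` times the cheapest remaining cost, takes a minimizing edge, and gives up once that value exceeds `beta * reward`. Ties are broken by Python's tuple ordering, which is an arbitrary choice, and all identifiers are hypothetical.

```python
import heapq
from collections import defaultdict


def cheapest_costs(total, t):
    """Cheapest cost to t for every node that can reach t (backward Dijkstra)."""
    rev = defaultdict(list)
    for (u, v), c in total.items():
        rev[v].append((u, c))
    dist = {t: 0.0}
    heap = [(0.0, t)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, c in rev[v]:
            if d + c < dist.get(u, float("inf")):
                dist[u] = d + c
                heapq.heappush(heap, (d + c, u))
    return dist


def simulate_walk(edges, extra, s, t, beta, reward):
    """Return the nodes visited; the list ends before t if the agent gives up."""
    total = {e: c + extra.get(e, 0.0) for e, c in edges.items()}
    dist = cheapest_costs(total, t)
    out = defaultdict(list)
    for (u, v) in total:
        out[u].append(v)
    walk = [s]
    while walk[-1] != t:
        v = walk[-1]
        options = [(total[(v, w)] + beta * dist[w], w)
                   for w in out[v] if w in dist]
        if not options:
            break
        perceived, w = min(options)
        if perceived > beta * reward:
            break  # perceived cost exceeds perceived reward
        walk.append(w)
    return walk
```

For the four-edge graph with costs `s→a: 1`, `a→t: 5`, `s→b: 3`, `b→t: 2`, `beta = 0.5` and a reward of 8, the agent without extra cost moves to `a` and then abandons the task, whereas an extra cost of 1 on `s→a` steers her along `s, b, t`.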
To complement this result and emphasize the quality of our approximation given the theoretical limitations, we argue that MCC-OPT is NP-hard to approximate within any ratio of 1.08192 or less. In particular, assuming that P ≠ NP, this rules out the existence of a polynomial-time approximation scheme.
**Theorem 4.**
*MCC-OPT is NP-hard to approximate within a ratio less than or equal to 1.08192.*
Proof.
To establish the theorem, a reduction similar to the one from Theorem 2 can be used. In fact, given a -SAT instance we can construct the corresponding MCC-OPT instance the same way as in the proof of Theorem 2. The only difference is that our choice of is slightly more restrictive as we require
[TABLE]
The proof can be structured around the following properties of : (a) If has a solution, admits a motivating cost configuration for a reward of . (b) If has no solution, admits no motivating cost configuration for a reward of or less. Consequently, any algorithm that approximates MCC-OPT within a ratio of or less must also solve . To maximize this ratio we choose and obtain the desired approximability bound, namely . All that remains to show is that indeed satisfies (a) and (b). The correctness of (a) is an immediate consequence of the proof of Theorem 2. A detailed proof of (b) can be found in the Appendix. ∎
6 Conclusion
In this work we have used Kleinberg and Oren’s graph theoretic framework [5] to provide a systematic analysis of penalty-based commitment devices. We have shown that penalty fees are strictly more powerful than commitment devices based on prohibition. In particular, we have shown that penalties may outperform prohibitions by a factor of up to 1/β. We have also been able to obtain some of the first positive computational results for the algorithmic design of commitment devices. We have given a polynomial-time algorithm that constructs penalty fees approximating an optimal solution within a factor of 2. This is significant progress when compared to prohibition-based commitment devices whose approximation is known to be NP-hard within a factor less than [2]. Due to their versatility, expressiveness and favorable computational properties, we believe that penalty-based commitment devices will prove to be a valuable tool for the targeted improvement of complex social and economic settings in the context of time-inconsistent behavior.
Appendix A
Proof of Proposition 1.
Let and assume that and . From the description of PathAndFence it should be clear that whenever the algorithm assigns extra cost to an edge , the perceived cost of that edge exceeds the perceived cost of by at least . Furthermore, since is acyclic, the extra cost of does not affect the agent’s perceived cost of any previously processed edge with . We conclude that PathAndFence returns a cost configuration for which . Consequently, the agent has no incentive to divert from .
In the remainder, we bound the perceived cost of each by . Together with the observations from the previous paragraph, this concludes the proof. Our argument is based on an induction on . The main induction hypothesis is . Clearly, this also implies that . To simplify matters, we introduce as an auxiliary induction hypothesis.
We start the induction at the last edge of , i.e. . Our goal is to show that and . Recall that minimizes the agent’s perceived cost with respect to . By definition of we also know that minimizes her perceived cost with respect to . Consequently, we have and . Since is the last edge of , we conclude that
[TABLE]
as well as
[TABLE]
Moreover, assigns no extra cost to . Therefore, holds true. Combining the last three inequalities concludes the basis of our induction.
For the inductive step, assume that and are valid for all such that . We proceed to argue that both of these inequalities are also valid for . We start with the first inequality. By construction of we have . Consequently, we can bound the perceived cost of by . The auxiliary induction hypothesis now implies the desired result
[TABLE]
The proof of the second inequality, i.e. , is a bit more involved. Let be the initial edge of a cheapest path from to with respect to . In a first step, we show that . For this purpose let be the node of smallest index different from that is located at an intersection between and . Because and both end in , such a node must exist. Recall that only assigns extra cost to edges that leave a node of . By definition of , no edge in between and can be such an edge. Let and denote the cost of a cheapest path from to with respect to and . According to our considerations, holds true. If , this immediately implies . Otherwise, if , we can apply the auxiliary induction hypothesis to obtain the desired result
[TABLE]
Finally, we take a closer look at the initial edge of . We distinguish between two scenarios. First, assume that assigns no extra cost to . In this case, we have . Together with the inequality from the previous paragraph, this immediately concludes the inductive step
[TABLE]
Secondly, assume that assigns positive cost to . In this case, the perceived cost of with respect to is just slightly greater than that of . More formally, it holds true that . This follows from the construction of . Therefore, we can upper bound by
[TABLE]
Recall that . In combination with our upper bound on , we obtain
[TABLE]
Because minimizes the perceived cost at with respect to , we may assume that and obtain
[TABLE]
Proof of Theorem 2 (continued).
It remains to show that has a satisfying variable assignment if and only if has a cost configuration that is motivating for a reward of . (⇒) We start by constructing from an assignment of truth values that satisfies each clause of . For this purpose we assign an extra cost of to the first edge of all shortcuts that start at variable nodes . Furthermore, we assign an extra cost of to all forward edges ending in a variable node of the form . To show that is indeed motivating, we divide the agent’s walk into two separate parts.
The first part contains the literal nodes from to . When located at a specific node , the agent has two options: either she takes the shortcut or she follows a forward edge. In the first case, she ends up at a variable node. By construction of , the cost of a cheapest path from any variable node to is at least . This holds true regardless of extra cost. As a result, her perceived cost for taking the shortcut at is at least . Her other option is to take a forward edge. Assuming that , let be the index of a literal in that evaluates to true with respect to . Because is a satisfying variable assignment, such a literal must exist. The agent’s perceived cost for traversing and then taking the two direct shortcuts to is . In the special case that we obtain the same perceived cost along the path . Consequently, the agent always prefers at least one forward edge to the current shortcut. A similar argument shows that there is one forward edge with a perceived cost of out of . Furthermore, there are no immediate shortcuts at . Finally, when located at , the agent has no choice but to traverse . At this point her perceived cost of the path is . Considering that her perceived value of the reward is , we conclude that she follows the forward edges until she successfully completes the first part of her walk.
The second part of the agent’s walk contains the variable nodes from to . At the agent has three options. First, she can follow the shortcut of type two. This has a cost of and is clearly not motivating. Secondly, she can traverse the forward edge to . By construction of , this edge has a cost greater than . Again, this is not motivating. Thirdly, she can traverse the forward edge to . If she plans to take the shortcut to immediately afterwards, the perceived cost is . Therefore, this is her preferred choice. Because it is also a motivating choice, she moves to where she faces the same three options. The only difference is that this time the first option is a shortcut of the third type and has a perceived cost of . Repeating the argument shows that the agent travels from one variable node to the next until she gets to . At this point the only path to is along the nodes , and . Since the agent’s lowest perceived cost at all three nodes is , she remains motivated and eventually reaches . We conclude that is motivating.
(⇐) Next, assume that has a solution, i.e. there exists a cost configuration that is motivating for a reward of . We proceed to show how to obtain a variable assignment that satisfies each clause of . For this purpose we make the following two observations: First, no motivating cost configuration can guide the agent onto a shortcut of type two or three. This is because these shortcuts have an edge of cost and are too expensive to traverse for the given reward. Secondly, the agent cannot enter a shortcut of the first type either. To understand this, assume for a moment that she does take such a shortcut. The shortcut takes her from some literal node to a variable node . Her perceived cost of can be at most , otherwise the shortcut would not be motivating. By construction of there is exactly one cheapest path from to , namely the direct shortcut to . As the total cost of this shortcut is , the only way to achieve a perceived cost of for is along this very shortcut. In particular, no extra cost can be assigned to this shortcut. However, once the agent has reached , her perceived cost for taking the direct shortcut to is as no extra cost is placed on this path. Conversely, her perceived cost of any forward edge at is at least , even if we neglect extra cost. By choice of , it holds true that
[TABLE]
Consequently, the agent prefers the shortcut to any of the forward edges. This contradicts our previous observation that she does not take shortcuts of type three.
Because no motivating cost configuration can guide the agent onto a shortcut, we conclude that her walk from to must contain exactly one literal node and one variable node for each clause and variable of . Let be one of possibly several paths the agent can walk from to . Based on , we construct a suitable variable assignment as follows: If she visits along , we set . Otherwise, if she visits , we set . To conclude the proof, we argue that satisfies all clauses of .
Consider an arbitrary clause and let be the corresponding literal node in . Furthermore, let be the literal node preceding in . If , let . We denote the agent’s planned path to when located at by . Clearly, the first edge of must be as this edge is also on . For the next edge of we have two options: The first is another forward edge. As a result, there must be some additional edge of cost in . This can either be a subsequent shortcut of type one or . Moreover, must include the edges and or a shortcut of type two or three. In all cases, the total cost of these edges is at least . Therefore, the agent’s perceived cost of the first option sums up to a value greater than or equal to
[TABLE]
The inequality is valid by choice of . Clearly, this is not motivating. The second option is that the next edge of is the shortcut from to the corresponding variable node . Again, we can distinguish between two cases. First, might include a forward edge from to a subsequent variable node , or to if . However, similar calculations to the one above indicate that this is not motivating. The only remaining option is that contains the shortcut from to . In this case, the perceived cost of is at least . This leaves an extra cost of at most to place onto the shortcut from to .
Now assume that includes , i.e. the agent visits at a later point. Recall that her perceived cost for taking a forward edge at is at least . By choice of , an extra cost of is not sufficient to prevent her from entering the shortcut at as
[TABLE]
Of course, this contradicts the fact that the agent cannot take shortcuts. Consequently, the agent cannot visit but must visit instead. By construction of , this implies that satisfies the -th literal of . Because this holds true for all clauses of , must be a satisfying variable assignment. ∎
Proof of Theorem 4 (continued).
In the following, we prove that satisfies (b). For the sake of contradiction assume that there exists a cost configuration that is motivating for a reward of at most , but has no solution. Let be a path that corresponds to the agent’s walk from to .
Similar to the proof of Theorem 2 we first argue that cannot include shortcuts. Recall that shortcuts of the second and third type have an edge of cost . However, the agent’s perceived reward is at most . Because , she has no incentive to traverse such an edge. It remains to show that she does not take a shortcut of the first type either. For this purpose assume she travels from a literal node to some variable node via a shortcut of type one. Let be her planned path when located at . We distinguish between two scenarios. First, might include a forward edge after . Even if we neglect extra cost, her perceived cost of is at least
[TABLE]
The inequality is valid by choice of . Because her perceived cost of exceeds her perceived reward, this scenario is not possible. Secondly, might contain the shortcut from to . In this case, the agent’s perceived cost of is at least . Consequently, may assign an extra cost of no more than to the edges of . This holds particularly true for edges of the shortcut from to . Therefore, her perceived cost for taking the shortcut at is at most . Conversely, even without extra cost, her cost for taking a forward edge at is at least . By choice of , she prefers the shortcut
[TABLE]
This contradicts the fact that she does not enter a shortcut of type three.
Because does not guide the agent onto a shortcut, we conclude that must contain exactly one literal node and one variable node for each clause and each variable of . Similar to the proof of Theorem 2 we use to construct a variable assignment in the following way: If the agent visits along , we set . Otherwise, if she visits , we set . To conclude the proof we argue that satisfies all clauses of . This is a contradiction to our initial assumption that has no solution.
Consider an arbitrary clause and let be the corresponding literal node in . Furthermore, let be the literal node that precedes in . If , let . The agent’s planned path from to is denoted by . Clearly, the first edge of must be . In the next step two directions are possible. The first one is another forward edge. As argued in the proof of Theorem 2 the perceived cost of is at least
[TABLE]
By choice of , this is not motivating
[TABLE]
The second direction is along the shortcut from to some variable node . Again, we can distinguish between two cases. First, might contain a forward edge from to some variable node or if . However, a calculation similar to the one above indicates that this is not motivating. The only remaining possibility is that also contains the shortcut from to . In this case, the perceived cost of is at least . This leaves an extra cost of no more than to place onto the shortcut from to .
Now assume that also includes . The agent’s perceived cost for taking a forward edge from is at least . By choice of , we conclude that an extra cost of is not sufficient to prevent the agent from entering the shortcut as
[TABLE]
Of course, this contradicts the fact that the agent cannot take shortcuts. Consequently, the agent cannot visit but must visit instead. By construction of , this implies that satisfies the -th literal of clause . Because this holds true for all clauses of , must be a satisfying variable assignment. ∎
- [1] George A Akerlof. Procrastination and obedience. The American Economic Review, 81(2):1–19, 1991.
- [2] Susanne Albers and Dennis Kraft. Motivating time-inconsistent agents: A computational approach. In Proceedings of the 12th Conference on Web and Internet Economics, pages 309–323. Springer, 2016.
- [3] Gharad Bryan, Dean Karlan, and Scott Nelson. Commitment devices. Annual Review of Economics, 2:671–698, 2010.
- [4] Nick Gravin, Nicole Immorlica, Brendan Lucier, and Emmanouil Pountourakis. Procrastination with variable present bias. In Proceedings of the 17th ACM Conference on Economics and Computation, pages 361–361, New York, NY, USA, 2016. ACM.
- [5] Jon Kleinberg and Sigal Oren. Time-inconsistent planning: A computational problem in behavioral economics. In Proceedings of the 15th ACM Conference on Economics and Computation, pages 547–564, New York, NY, USA, 2014. ACM.
- [6] Jon Kleinberg, Sigal Oren, and Manish Raghavan. Planning problems for sophisticated agents with present bias. In Proceedings of the 17th ACM Conference on Economics and Computation, pages 343–360, New York, NY, USA, 2016. ACM.
- [7] David Laibson. Golden eggs and hyperbolic discounting. The Quarterly Journal of Economics, pages 443–477, 1997.
- [8] Ted O’Donoghue and Matthew Rabin. Doing it now or later. The American Economic Review, 89:103–124, 1999.
