NOCTA: Non-Greedy Objective Cost-Tradeoff Acquisition for Longitudinal Data
Dzung Dinh, Boqi Chen, Yunni Qu, Marc Niethammer, Junier Oliva

TL;DR
NOCTA introduces a novel framework for sequentially selecting informative features over time in cost-sensitive settings, improving prediction accuracy while reducing acquisition costs in longitudinal data analysis.
Contribution
It proposes the NOCT objective and two estimators, NOCT-Contrastive and NOCT-Amortized, to optimize feature acquisition considering temporal dynamics and costs.
Findings
Outperforms existing baselines on synthetic and medical datasets.
Achieves higher accuracy with lower feature acquisition costs.
Effectively models future trajectories for cost-effective predictions.
Abstract
In many critical domains, features are not freely available at inference time: each measurement may come with a cost of time, money, and risk. Longitudinal prediction further complicates this setting because both features and labels evolve over time, and missing measurements at earlier timepoints may become permanently unavailable. We propose NOCTA, a Non-Greedy Objective Cost-Tradeoff Acquisition framework that sequentially acquires the most informative features at inference time while accounting for both temporal dynamics and acquisition cost. NOCTA is driven by a novel objective, NOCT, which evaluates a candidate set of future feature-time acquisitions by its expected predictive loss together with its acquisition cost. Since NOCT depends on unobserved future trajectories at inference time, we develop two complementary estimators: (i) NOCT-Contrastive, which learns an embedding of…
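To make the trade-off in the abstract concrete, here is a minimal sketch of scoring candidate acquisition plans by estimated loss plus cost. It is not the authors' implementation: the additive form with a weight `alpha`, the per-feature `cost` table, and the placeholder `toy_expected_loss` (standing in for a learned estimator such as NOCT-Contrastive or NOCT-Amortized) are all assumptions made for illustration.

```python
from itertools import combinations


def noct_score(plan, expected_loss, cost, alpha):
    """Estimated predictive loss of a plan plus its weighted acquisition cost.

    `plan` is a set of (feature, time) pairs. The additive form and the
    trade-off weight `alpha` are illustrative assumptions.
    """
    return expected_loss(plan) + alpha * sum(cost[f] for f, _ in plan)


def best_plan(candidates, expected_loss, cost, alpha):
    """Return the candidate plan with the lowest score."""
    return min(candidates, key=lambda p: noct_score(p, expected_loss, cost, alpha))


# Toy setup (all values hypothetical): 2 features, 2 future timepoints.
cost = {0: 1.0, 1: 3.0}
pairs = [(f, t) for f in cost for t in (1, 2)]
candidates = [
    frozenset(c) for r in range(len(pairs) + 1) for c in combinations(pairs, r)
]


def toy_expected_loss(plan):
    # Placeholder for a learned estimator: loss shrinks with each acquisition.
    return 1.0 / (1 + len(plan))


chosen = best_plan(candidates, toy_expected_loss, cost, alpha=0.1)
print(sorted(chosen))
```

In this toy setting the cheap feature is acquired at both timepoints and the expensive one is skipped, because its extra cost outweighs the placeholder loss reduction.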
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
Strengths of the paper:
1. It addresses an important and well-motivated problem, Longitudinal Active Feature Acquisition (AFA) under temporal constraints, especially in healthcare and other resource-constrained applications.
2. The formulation of the objective (Eq. 1) elegantly captures the joint utility of future acquisitions rather than myopic per-feature gains. In addition, the iterative reassessment (Algorithm 1) after each acquisition is practical
Main weaknesses of the paper:
1. The procedure seems computationally intensive.
   - The action space is exponential, $2^{ML}$, and optimizing the subset selection at each step seems computationally hard.
   - Empirically, the paper addresses this with uniform random sampling of only 1000 candidates. No justification is given for why this heuristic should find near-optimal solutions in all settings.
2. Empirical analysis needs to be expanded to check the scalability and robustne
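The scale concern raised in this review can be illustrated with a short sketch that draws candidate plans uniformly at random and compares the sample size with the size of the full space. This is an assumption-laden illustration, not the paper's sampler: the function names, the bitmask encoding, and the example sizes are hypothetical, and the count of future timepoints is passed in directly.

```python
import random


def search_space_size(num_features, num_future_steps):
    """Number of distinct plans: every (feature, time) pair is in or out."""
    return 2 ** (num_features * num_future_steps)


def sample_plans(num_features, num_future_steps, num_candidates, seed=0):
    """Draw plans uniformly at random (with replacement) as sets of
    (feature, time) pairs. A uniform random bitmask over all pairs gives a
    uniform distribution over subsets."""
    rng = random.Random(seed)
    n_bits = num_features * num_future_steps
    plans = []
    for _ in range(num_candidates):
        mask = rng.getrandbits(n_bits) if n_bits > 0 else 0
        plan = frozenset(
            (i // num_future_steps, i % num_future_steps)
            for i in range(n_bits)
            if (mask >> i) & 1
        )
        plans.append(plan)
    return plans


# Hypothetical sizes: 10 features and 5 remaining timepoints.
plans = sample_plans(num_features=10, num_future_steps=5, num_candidates=1000)
total = search_space_size(10, 5)
print(f"{len(set(plans))} distinct plans sampled out of {total}")
```

With these hypothetical sizes the full space has $2^{50}$ plans, so 1000 samples cover a vanishing fraction of it. That gap is the one the reviewer asks the authors to justify.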
- I think the idea is quite relevant, since many recent works consider either non-temporal AFA or constrained AFA. Allowing tests to be selected again reflects how tests are done in practice.
- The objective in (1) and the RL-free optimization are sound.
Weaknesses/Questions:
- The claim that $|\mathcal{O}| = 1000$ is sufficient, with diminishing returns beyond that, is a bit flawed in my opinion. In particular, the combinatorial space of candidate plans grows as $2^{M(L-t)}$, and the ablation with $M=2$ is insufficient. It would be great if the authors could compare with growing feature sizes (maybe working only with subsets of OAI?).
- It is hard to interpret the Pareto-optimal curves, as I would be more interested to see where the decision-making ch
The paper considers an important problem that is common in clinical settings: when should we observe a patient, what should we observe when we do, and when should we stop observing? The proposed method, which is discussed in detail, shows better experimental performance compared to baselines on four datasets (three with real data). The approach differs from existing works that use RL, which is not well suited to the challenges of longitudinal AFA.
The details of the paper were at times difficult to follow. I understand that the nature of the problem necessitates many variables and subscripts, but there were parts I was unable to understand. I filled in the paper summary based on my best understanding, but I had trouble understanding details of the paper, especially in sections 3.3 and 3.4. For me there was a disconnect between the problem setup in the introduction, which made sense, and the details in the methodology section, which I foun
- The paper addresses a challenging and high-impact problem.
- The proposed non-RL objective (Eq. 1) offers an elegant and intuitive formulation that directly optimizes the trade-off between predictive accuracy and acquisition cost, presenting a compelling alternative to complex RL-based approaches.
A key benefit of greedy objectives, typically formulated as the immediate gain from acquiring the next feature, is their ability to drastically reduce the search space to $2^M$ possible subsets (i.e., a manageable sequential selection process at each step). Although greedy methods can produce myopic decisions, prior work (e.g., Qin et al., 2024) has addressed this limitation by incorporating discounted future rewards, allowing the model to anticipate the downstream effects of current actions and
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
