The Complexity of Definability by Open First-Order Formulas
Carlos Areces, Miguel Campercholi, Daniel Penazzi, Pablo Ventura

TL;DR
This paper studies the computational complexity of determining whether a relation can be defined by open first-order formulas in finite structures, establishing coNP-completeness and parameterized complexity results.
Contribution
It formally defines the open definability problem and proves its coNP-completeness, along with its parameterized complexity classification as coW[1]-complete.
Findings
Open definability is coNP-complete.
The problem is coW[1]-complete when parameterized by size and arity of the target relation.
Complexity results hold for any vocabulary with at least one binary relation.
Abstract
In this article we formally define and investigate the computational complexity of the Definability Problem for open first-order formulas (i.e., quantifier free first-order formulas) with equality. Given a logic , the -Definability Problem for finite structures takes as input a finite structure and a target relation over the domain of , and determines whether there is a formula of whose interpretation in coincides with . We show that the complexity of this problem for open first-order formulas (open definability, for short) is coNP-complete. We also investigate the parametric complexity of the problem, and prove that if the size and the arity of the target relation are taken as parameters then open definability is -complete for every vocabulary with…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
The Complexity of Definability
by Open First-Order Formulas
Carlos Areces Miguel Campercholi
[email protected] [email protected]
Daniel Penazzi Pablo Ventura
[email protected] [email protected]
Universidad Nacional de Córdoba and CONICET
Córdoba, Argentina
Abstract
In this article we formally define and investigate the computational complexity of the Definability Problem for open first-order formulas (i.e., quantifier free first-order formulas) with equality. Given a logic , the -Definability Problem for finite structures takes as input a finite structure and a target relation over the domain of , and determines whether there is a formula of whose interpretation in coincides with . We show that the complexity of this problem for open first-order formulas (open definability, for short) is coNP-complete. We also investigate the parametric complexity of the problem, and prove that if the size and the arity of the target relation are taken as parameters then open definability is -complete for every vocabulary with at least one, at least binary, relation.
1 Introduction
Arguably, any attempt to provide a logic with a formal semantics starts with the definition of a function that, given a suitable structure for and a formula in , returns the extension of in . Usually, this extension is a set of tuples built from elements in . These extensions, also called definable sets, are the elements that will be referred by the formulas of in a given structure, and in that sense, define the expressivity of . The definable sets of are the only objects that can see. For that reason, definable sets are one of the central objects studied by Model Theory. It is usually an interesting question to investigate, given a logic , which are the definable sets of over a given structure , or, more concretely, whether a particular set of tuples is a definable set of over . This is what we call the Definability Problem for over .
In this article we investigate the computational complexity of the definability problem for open first-order formulas –i.e., quantifier free first-order formulas– with equality over a relational vocabulary (open-definability, for short).
One of the main goals of Computational Logic is to understand the computational complexity of different problems for different logics. Classically, one of the most investigated inference problems is Satisfiability (SAT, for short): given a formula from a given logic decide whether there exists a structure that makes true. In recent years, and motivated by concrete applications, other reasoning problems have sparkled interest. A well known example is the Model Checking Problem (MC, for short) used in software verification to check that a given property (expressed as a formula in the verification language) holds in a given formal representation of the system (see, e.g., [7, 5]). From a more general perspective, MC can be defined as follows: given a structure , and a formula decide which is the extension of in . From that perspective, the definability problem can be understood as the inverse problem of MC: given a structure and a target set it asks whether there is a formula whose extension is . A further example of a reasoning task related to definability comes from a seemingly unrelated field: computational linguistics, more specifically, in the subarea of automated language generation called Generation of Referring Expressions (GRE). The GRE problem can be intuitively understood as follows: given a context and an target object in , generate a grammatically correct description (in some natural language) that represents , differentiating it from other objects in , or report failure if such a description does not exist (see [17] for a survey on GRE). Most of the work in this area is focused on the content determination problem (i.e., finding the properties that singles out the target object) and leaves the actual realization (i.e., expressing this content as a grammatically correct expression) to standard techniques. As it is discussed in [2, 1] the content realization part of the GRE problem can be understood as the task that, given a structure that represents the context , and an object in the domain of returns a formula in a suitable logic whose extension in coincides with . Of course, this will be possible only if is definable for over .
The complexity of the definability problem for a number of logics has already been investigated. Let FO be first-order logic with equality in a vocabulary without constant symbols. The computational complexity of FO-definability was already discussed in 1978 [18, 4], when a semantic characterization of the problem, based on automorphisms, placed FO-definability within coNP. Much more recently, in [3], a polynomial-time algorithm for FO-definability was given, which uses calls to a graph-isomorphism subroutine as an oracle. As a consequence, FO-definability is shown to be inside GI (defined as the set of all languages that are polynomial-time Turing reducible to the graph isomorphism problem). The authors also show that the problem is GI-hard and, hence, GI-complete. Interestingly, Willard showed in [19], that the complexity of the definability problem for the fragment of FO restricted to conjunctive queries (i.e., formulas of the form , where each conjunct is atomic) was coNEXPTIME-complete. The complexity upper bound followed from a semantic characterization of CQ-definability in terms of polymorphisms given in [16], while the lower bound is proved by an encoding of a suitable tiling problem. The complexity of definability has been investigated also for some modal languages: [2] shows that for the basic modal logic , the definability problem is tractable (i.e., in P); in [1] the result is extended to some fragments of known as and . [12] discusses the length of the shortest formula required to define a given target set, proving that for , the lower bound for the length of a definition is exponential in the size of the input structure. More precisely, it is shown that there are structures , , …such that for every , the size of is linear in but the size of the shortest definition for some element in is bounded from below by a function which is exponential on .
The article is structured as follows. After introducing basic notations and definitions in Section 2, we show that open-definability is coNP-complete in Section 3. Section 4 discusses the parameterized complexity of the problem. Finally, in Section 5 we show that the length of the shortest open formula that might be required in a definition cannot be bounded by a polynomial: in some cases, definitions by open formulas need to be exponentially long.
2 Preliminaries
In this section we provide some basic definitions and fix notation. We assume basic knowledge of first-order logic. For a detailed account see, e.g., [11].
We focus on definability by open first-order formulas in a purely relational first-order vocabulary, i.e., without function or constant symbols. For a relation symbol in a vocabulary , let denote the arity of . In what follows, all vocabularies are assumed to be finite and purely relational. We assume that the language contains variables from a countable, infinite set . Variables from VAR are the only terms in the language. Atomic formulas are either of the form or , where , is a sequence of variables in VAR of length and is a relation symbol of arity . Open formulas are Boolean combinations of atomic formulas. We shall often write just formula instead of open formula. We write for an open formula whose variables are included in .
Let be a vocabulary. A -structure (or model) is a pair where is a non-empty set (the domain or universe), and is an interpretation function that assigns to each -ary relation symbol in a subset of . If is a structure we write for its domain and for its interpretation function. Given a formula , and a sequence of elements we write if is true in under an assignment that maps to .
We say that a subset is open-definable in if there is an open first-order formula in the vocabulary of such that
[TABLE]
In this article we study the following computational decision problem:
[TABLE]
Let be a function. Given , we say that preserves if for all we have . The function is a subisomorphism (subiso for short) of provided that is injective, and both and preserve for each . (Note that a subiso of is exactly an isomorphism between two substructures of .) We denote the set of all subisomorphisms of by .
The following semantic characterization of open-definability is central to our study.
Theorem 1** ([6, Thm 3.1]).**
Let be a finite relational structure and . The following are equivalent:
* is open-definable in .* 2. 2.
* is preserved by all subisomorphisms of .* 3. 3.
* is preserved by all subisomorphisms of with .*
Proof.
The equivalence of (1) and (2) is proved in [6, Thm 3.1]. Certainly (2) implies (3), so we show that (3) implies (2). Let be a subiso of and let . Note that the restriction is a subiso of . Thus, . ∎
2.1 Encodings and Sizes
As is customary when considering complexity questions, the size of an object is the length of a string over a finite alphabet encoding the object. We assume fixed encodings for vocabularies, relations, structures and formulas, and define the size of these objects according to these encodings. For a set , let be the number of elements in , and for a relational vocabulary , let be the number of relational symbols in .
We write to denote the size of an object . Even though we do not specify the encodings we assume the following equalities throughout this note. Let be a relational vocabulary, a -structure, and a first-order formula.
- •
,111When an expression involving does note make sense, read it as .
- •
,
- •
,
- •
.
Here [] stands for the number of occurrences of relation symbols [variables] in , and is the number of different variables occurring in . Since a formula is a -formula for every containing the relation symbols in , the encoding of (and thus its size) depend on which vocabulary we have in mind for . Another assumption we make on the encodings is that determining whether can be computed in time .
3 Classical Complexity of Open-Definability
In what follows a graph is a model of the vocabulary , with binary, and such that is symmetric and irreflexive. We provide a reduction from the following problem to prove our hardness result.
[TABLE]
is known to be -complete (see, e.g., [15]).
Theorem 2**.**
* is -complete.*
Proof.
We first prove hardness. Fix an input graph and a positive integer . We may assume that is disjoint with the set of integers. Suppose first that . Let be the graph with universe and with
[TABLE]
That is, is the disjoint union of and a path of length . Define
[TABLE]
Now, observe that by Theorem 1 we have that returns on input if and only if returns on input . The case where is odd is analogous.
Showing that is in is a straightforward application of Theorem 1. Given a finite relational structure and , the fact that returns on input is witnessed by a bijection between subsets of satisfying conditions easily checked in poly-time with respect to the size of . ∎
Given a relational signature let be the restriction of to input structures of signature . In view of the proof of Theorem 2 we have the following.
Corollary 3**.**
* is -complete.*
4 Parameterized Complexity of
Parameterized complexity is a mathematical framework that allows for a more fine-grained analysis of the computational costs of a problem than classical complexity. In a *parameterization *of a (classical) problem we single out a specific part of the input of the problem to try and understand how this part affects the computational cost. For example, a parameterization of propositional (i.e., the satisfiability problem for Propositional Logic) could be the number of variables in the input formula.
We begin with the basic definitions involved. There are slight discrepancies for these definitions in the literature, but our results remain valid regardless of which of the versions is used. We follow the account in [14]. As is customary, (classical) decision problems are formalized as languages over finite nonempty alphabets. Let be a finite alphabet.
- •
A parameterization of is a mapping that is polynomial time computable.
- •
A parameterized (or parametric) problem (over ) is a pair consisting of a set of strings over and a parameterization of .
We consider the following parameterization222This parameterized problem (and others appearing below) is presented in an informal way, but it should be clear how to cast them in the form of our formal definition. of :
[TABLE]
For a positive integer , the -th slice of a parameterized problem is the restriction of the problem to all instances such that . Let denote the -th slice of .
Proposition 4**.**
* is computable in time where is the size of the input and a polynomial.*
Proof.
Fix a finite relational structure and . Let be the size of and . Given a positive integer let
[TABLE]
Note that Corollary 1 implies that is open-definable in if and only if every preserves . We show that the right-hand side of this equivalence can be checked in polynomial time. Let
[TABLE]
Observe that
[TABLE]
since there are bijections between any two subsets of size of , and so
[TABLE]
Now for each we have to check:
if , 2. 2.
and in that case, if preserves .
Clearly both these tasks can be carried out in time bounded by a polynomial in . If is such a polynomial, then checking if every member of preserves takes at most steps. Thus, the computation for on input can be done in at most
[TABLE]
steps. ∎
A parameterized problem over the alphabet is fixed parameter tractable (FPT) if there is an algorithm together with a polynomial and a computable function such that decides if in time for a all . The class of all fixed parameter tractable problems plays the role plays in classical complexity.
Even though each slice of can be computed in polynomial time, the bound given by Proposition 4 does not imply , since the parameter appears as an exponent of the size of the input. In fact, as we shall see below it is unlikely that is FPT, since it is hard for the class ; a class of parameterized problems believed to be strictly larger than . But before we can discuss hardness of parameterized problems we need an adequate notion of reduction.
Definition 5**.**
Let and be parameterized problems over the alphabets and , respectively. An fpt-reduction from to is a mapping such that:
For all we have . 2. 2.
There is a computable function and a polynomial such that is computable in time . 3. 3.
There is a computable function such that for all .
If and are parameterized problems, we write if there is an fpt-reduction from to , and write if there are fpt-reductions in both directions.
There are other notions of fpt-reduction, such as Turing fpt-reductions [14], involving oracles. All fpt-reductions in this note satisfy Definition 5, and thus we simply use the name fpt-reduction for them.
To analyze the complexity of parametric problems which appear not to be tractable, Downey and Fellows introduced the hierarchy [8]. The classes
[TABLE]
in this hierarchy are closed under fpt-reductions and are believed to be all different. They have many natural complete problems (see, e.g., [10, 14]). One can think of this hierarchy as analogous to the polynomial hierarchy in classical complexity. We only consider the classes and in the sequel. Their formal definitions are somewhat involved and not needed in our arguments, so we do not include them here. (The interested reader can find them in [14].) We do need the following characterization of , analogous to the characterization of in terms of “certificates”.
Lemma 6** ([14, Lem 3.8]).**
A parameterized problem over the alphabet is in if and only if there are computable functions , a polynomial , and a such that:
For all it is decidable in time whether . 2. 2.
For all , if then . 3. 3.
For every we have iff there exists such that .
The complement of a parametric problem is the parametric problem . It follows directly from the definitions that iff . For a class of parametric problems, let denote the class off all parametric problems whose complement is in .
Proposition 7**.**
.
Proof.
By Theorem 1 we know that fails to be open-definable in if and only if there is a bijection between subsets of of cardinality at most , satisfying that is a subisomorphism of and does not preserve . Such a bijection can be encoded as a binary string of length , and we can compute in time polynomial in ) if is a subisomorphism of not preserving . (Note that this is a refinement of the argument we used in Theorem 2 to prove .) ∎
Next we establish a lower bound for the complexity of by a reduction from the following parameterized version of the Clique problem.
[TABLE]
It is proved in [9, Cor 3.2] that is complete (under fpt-reductions) for the class .
Lemma 8**.**
; hence is hard for .
Proof.
The idea is the same as in the proof of Theorem 2. Given an input for the reduction computes the input for , where is the disjoint union of with the complete graph on the vertices , and . It is easy to see that this is an fpt-reduction, and that has a clique of size iff is not open-definable in . (Note that this is *not *a polynomial reduction.) ∎
In contrast with our analysis of the classical complexity of , we were not able to show that is in . However when we fix the vocabulary we can establish a sharp upper bound. For a vocabulary let denote the restriction of to input structures with vocabulary , and let denote the complement of this problem. Before we start on the upper bounds, we have the following consequence of Theorem 8.
Corollary 9**.**
For every vocabulary with at least one at least binary relation we have ; and hence is hard for .
Proof.
If has at least one at least binary relation, it is easy to see that
[TABLE]
From the proof of Lemma 8 it follows that . ∎
We turn now to establishing an upper bound for . Recall that a sentence is existential if it has the form where is open. Let be the set of all existential sentences over a vocabulary . For a given vocabulary , consider the following parameterized model checking problem:
[TABLE]
Recall that . It is proved in [13] that is in for all .
Theorem 10**.**
For every vocabulary , . Thus, .
Proof.
Fix a -structure and . For each tuple , and each let be the the conjunction of the following set of atomic formulas
[TABLE]
where is the arity of . Observe that
[TABLE]
for a suitable polynomial . Also observe that can be computed in time , so there is a polynomial such that the computation of can be done in at most steps. Next, define
[TABLE]
Let be the greatest among the arities of the relations in . Then, , and is computable in time bounded by . Note that characterizes the isomorphism type of in , i.e., for all we have
[TABLE]
Now, let
[TABLE]
and take as the sentence
[TABLE]
where is the number of tuples in . It is straightforward to check that there are polynomials and such that:
- (1)
can be computed in time ,
and
- (2)
Next, note that if and only if has the same isomorphism type as some tuple in ; thus asserts that there are distinct -tuples such that each one has the same isomorphism type as some tuple in . So, if and only if there are and such that is a subisomorphism of . By Theorem 1, this says that:
- (3)
iff is not open-definable in .
To conclude, observe that (1-3) guarantee that the transformation is an fpt-reduction from to . (Notably, it is also a polynomial many-one reduction.) ∎
It is worth to note that, by Corollary, Theorem 10 actually holds with in place of . That is,
[TABLE]
for every vocabulary with at least one at least binary relation.
Combining the upper an lower bounds found above we obtain the main result of this section.
Theorem 11**.**
* is -complete for every vocabulary with at least one at least binary relation. In particular, is -complete.*
Proof.
Combine Corollary 9 and Theorem 10. ∎
5 The Length of Open Formulas
In previous sections we show that the definability problem for open formulas is not tractable. We now discuss the size of formulas required in definitions. In particular, we show that it is not possible to polynomially bound the size of an open formula required in the definition of a relation in a given structure. More formally, we construct a sequence whose size grows polynomially in , and such that the smallest definition of in is exponentially large in . This does not come as a surprise though; a polynomial bound on the size of defining open formulas would entail that is in and, since we know is -complete (Theorem 2), we would have .
Theorem 12**.**
For each there are a finite relational structure and an -ary relation over such that:
- •
,
- •
* is open-definable in , and*
- •
every open formula defining in has at least literals.
Proof.
Fix a natural number and let be the union of four pairwise disjoint -element sets, say
[TABLE]
Let be the matrix such that
- •
the first column of is , and the rest of its entries are .
For each take to be the matrix such that
- •
the first column of is , the -th column of is , and the rest of its entries are .
Next, define
[TABLE]
In what follows we identify -tuples with matrices (in the obvious way). We adjust the indexes of our variables accordingly, i.e., we shall write
[TABLE]
instead of .
Let ; we prove that and satisfy the conditions of the theorem. First observe that, according to our definition of size (see Section 2.1), we have
[TABLE]
Next, define
[TABLE]
Suppose is an matrix with entries in such that . Then the first column of , say , and each of its rows must be in . As each must be the first coordinate of some tuple in , we have that . If , then , since for each there is exactly one tuple in with first coordinate . On the other hand, if , we have that for each the -th row of agrees with the -th row of for some .
Given let be the matrix whose -th row is the -th row of for . From the considerations in the last paragraph it follows that
[TABLE]
For each define
[TABLE]
and observe that . So, if we take
[TABLE]
it follows that . Thus is open-definable in .
To conclude, note that is the only atomic formula that distinguishes from . Thus, if is an open formula that defines , then must occur in for every . ∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] C. Areces, S. Figueira, and D. Gorín. Using logic in the generation of referring expressions. In S. Pogodalla and J. Prost, editors, Proceedings of the 6th International Conference on Logical Aspects of Computational Linguistics , volume 6736 of LNCS , pages 17–32. Springer, 2011.
- 2[2] C. Areces, A. Koller, and K. Striegnitz. Referring expressions as formulas of description logic. In Proceedings of the Fifth International Natural Language Generation Conference , pages 42–49. ACL, 2008.
- 3[3] M. Arenas and G. Diaz. The exact complexity of the first-order logic definability problem. ACM Transactions on Database Systems , 41(2):13:1–13:14, 2016.
- 4[4] F. Bancilhon. On the completeness of query languages for relational data bases. In J. Winkowski, editor, Proceedings of the 7th Symposium of Mathematical Foundations of Computer Science , volume 64, pages 112–123. Springer, 1978.
- 5[5] B. Berard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and P. Schnoebelen. Systems and Software Verification: Model-Checking Techniques and Tools . Springer, 2010.
- 6[6] M. Campercholi and D. Vaggione. Semantical conditions for the definability of functions and relations. Algebra universalis , 76(1):71–98, 2016.
- 7[7] E. Clarke, O. Grumberg, and D. Peled. Model Checking . MIT Press, 1999.
- 8[8] R. Downey and M. Fellows. Fixed-parameter tractability and completeness I: Basic results. SIAM Journal on Computing , 24(4):873–921, 1995.
