Compression with wildcards: Abstract simplicial complexes
Marcel Wild

TL;DR
This paper introduces a new algorithm called Facets-To-Faces for efficiently compressing and representing abstract simplicial complexes, with applications in various computational fields.
Contribution
The paper presents a novel algorithm for compressing simplicial complexes using wildcards, improving efficiency over existing methods, and introduces a new way to compute face numbers from facets.
Findings
Facets-To-Faces outperforms Mathematica's BooleanConvert and Python BDDs in compression.
The algorithms can be parallelized for enhanced performance.
Applications include reliability analysis, combinatorial topology, and frequent set mining.
Abstract
Despite the more handy terminology of abstract simplicial complexes SC, in its core this article is about antitone Boolean functions. Given the maximal faces (=facets) of SC, our main algorithm, called Facets-To-Faces, outputs SC in a compressed format. The degree of compression of Facets-To-Faces, which is programmed in high-level Mathematica code, compares favorably to both the Mathematica command BooleanConvert, and to the BDD's provided by Python. A novel way to calculate the face-numbers from the facets is also presented. Both algorithms can be parallelized and are applicable (e.g.) to reliability analysis, combinatorial topology, and frequent set mining.
**Key words:** Abstract simplicial complex, face-numbers, antitone Boolean function, exclusive sum of products, binary decision diagram, compressed enumeration, wildcards, reliability polynomial, partitionability conjecture
1 Introduction
While the present article focuses on bare algorithmics, four application areas are outlined at the end of this introduction, in Subsection 1.3. We start with a broad (1.1), and then more detailed (1.2) outline of the article.
1.1 An abstract¹ simplicial complex (also called set ideal) based on a set is a family of subsets (called faces) such that from , , follows . Without further mention, in this article all structures will be finite. In particular all simplicial complexes contain maximal faces, called the facets of . Henceforth we mostly stick to . A face of cardinality is called a -face, and the set of all -faces is denoted as . The numbers are the face-numbers of the simplicial complex.
¹ The adjective ’abstract’ is sometimes added to make a distinction from the simplicial complexes considered in topological combinatorics. For the sake of brevity we henceforth drop ’abstract’.
The purpose of this article is to retrieve the following data from the facets:
an enumeration of ;
an enumeration of for one arbitrary ;
the cardinality ;
the face-numbers for all .
Although our four tasks can be phrased in terms of Boolean functions, speaking of simplicial complexes is, for the most part, more illuminating. While task matches , there is a mismatch between and . Here is why: If we change to the calculation of one , then this (essentially) is just as hard. Throughout the article the simplicial complex whose facets are
(1) ,
serves to illustrate our algorithms.
The theoretic complexity of at least three of the problems is well known. To witness, according to [V] it is #P-hard to calculate the number of models of a Boolean function given in DNF, even if is antitone². Since (C) can be modelled by such (see (3)), this implies the #P-hardness of and a fortiori . Like most (unfortunately not all) authors we take enumeration as a synonym for generation, thus not to be confused with mere counting. It might be counter-intuitive³ that enumerating should ever be more tractable than counting. Yet amounts to enumerating the models of a specific DNF, and enumerating the models of any DNF works in ’benign’ polynomial total time, whereas (C) is #P-hard. Perhaps the complexity of was known before but the author could not pinpoint a reference; the matter is settled anew in Theorem 2. Our main contributions are however on the practical side; when computational efficiency lacks a theoretic underpinning (which is to be expected in view of Valiant’s results) it will be evidenced by numerical experiments. The main effort will go into and . That is because we strive for a compressed enumeration in both cases.
² Strictly speaking that follows by de Morgan duality since Valiant only speaks about CNF’s and monotone Boolean functions. Recall that is monotone if , and antitone if .
³ More ’philosophy’ on this matter follows in Subsection 6.3.
1.1.1 Compression starts with the don’t-care symbol ’2’ (other authors write ) which, say in , signifies that both bitstrings (=01-rows) and are allowed. This leads to 012-rows. For instance, the modelset of a (Boolean) term like is the 012-row (assuming there are 7 Boolean variables altogether). Conversely, any 012-row of length yields a unique term with at most Boolean variables. As usual is isomorphic to the powerset of . Thus above can be viewed as a 16-element interval (also called ’cube’) of , with smallest element and largest element . Suppose a Boolean function has a DNF which is orthogonal in the sense that the conjunction of any two terms in it is unsatisfiable. Then the modelset is a disjoint union of the 012-rows . Although ’orthogonal DNF’ and ’exclusive sum of products (ESOP)’ are often used synonymously, in the present article ESOP always refers to a representation of as a disjoint union of 012-rows.
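The orthogonality condition can be checked row by row: two 012-rows are disjoint exactly when some position holds 0 in one row and 1 in the other. The following Python sketch illustrates this; the three rows and the helper names `disjoint` and `esop_model_count` are illustrative assumptions and do not come from the article.

```python
from itertools import combinations

def disjoint(r, s):
    """Two 012-rows (strings over '0', '1', '2') are disjoint iff
    at some position one row has 0 and the other has 1."""
    return any({a, b} == {"0", "1"} for a, b in zip(r, s))

def esop_model_count(rows):
    """Model count of an ESOP: every row contributes 2^(number of 2's).
    The rows must be pairwise disjoint for the sum to be correct."""
    assert all(disjoint(r, s) for r, s in combinations(rows, 2))
    return sum(2 ** r.count("2") for r in rows)

# Hypothetical orthogonal rows of length 4.
rows = ["0022", "1020", "1121"]
print(esop_model_count(rows))  # 4 + 2 + 2 = 8
```

Because the rows are disjoint, counting models needs no inclusion-exclusion; that is the practical payoff of an ESOP.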
Apart from ’2’ novel types of wildcards will be introduced. We mainly deal with 012e-rows but in the last Section glimpse at 012men-rows like (Table 11). Here respectively mean: at least one 1 here (so ); at least one 0 here; at least one 1 and one 0 here.
1.2 Here comes the Section break-up. Section 2 deals with (C). After dispensing with inclusion-exclusion we turn to so-called Binary Decision Diagrams (BDD’s), that will accompany us throughout the article. We use to illustrate the basic structure of BDD’s and how they solve (C). The third method handling (C) applies the e-algorithm of [W2], whose main features are quickly reviewed. Section 3 is dedicated to . Inclusion-exclusion can still be used but remains awfully slow. As to BDD’s, an elegant method of Knuth is mentioned. The third method (with the prosaic name e+rp+sub) again exploits the e-algorithm and adds another gadget. The core Section 4 deals with (E). We start with two naive (yet intriguing) methods solving (E). Then come binary decision diagrams, which offer some compression via 012-rows. Our Facets-To-Faces algorithm does better by employing 012e-rows and, as opposed to BDD’s, it has a theoretic backbone (Theorem 1). Connections to combinatorial topology and convex polytopes are pointed out. The numerical experiments in Section 5 show that Facets-To-Faces always compresses better than the Mathematica command BooleanConvert, and way better than BDD’s. Timewise Facets-To-Faces keeps BDD’s at bay but yields to BooleanConvert if instead of few large facets there are many small facets. The fact that Facets-To-Faces is programmed in high-level Mathematica, whereas BooleanConvert is ’hardwired’, admittedly does not fully account for this. But then again, it matters little since Facets-To-Faces is easy to parallelize. Section 6 offers two algorithms for . While polynomial total time can be proven for one, the other performs better in practice (due to compression).
The last two Sections can be viewed as ’side-shows’. Section 7 investigates what happens when instead of the facets the minimal non-faces of a simplicial complex are given. The four problems can then be handled in a more or less dual fashion. Section 8 harks back to Section 4 and makes first strides to lift Facets-To-Faces from antitone DNF’s (=simplicial complexes) to arbitrary DNF’s.
1.3 Here come four areas of application; the latter two are currently of a more tentative nature.
First Reliability Analysis. In this domain the usual name for ’simplicial complex’ is ’coherent system’ (or ’independence system’). The reliability polynomial of a coherent system is defined as where the ’s are the face-numbers of (see above). In several areas of engineering (e.g. network analysis or stack filters for nonlinear signal processing) it is important to calculate fast, and many methods have been proposed in the last six decades. Some of them (like our e+rp+sub) target the face-numbers. In another vein, a partitioning of into few intervals (=012-rows) would yield immediately. Such a partitioning was found in [BN] for matroid-complexes, i.e. simplicial complexes consisting of all independent sets of a matroid. Our Facets-To-Faces succeeds for every simplicial complex and uses more powerful 012e-rows.
This leads to Combinatorial Topology. Namely, the number of 012-rows used in [BN] is as small as it can possibly be; it equals the number of bases of the matroid. Generally a simplicial complex with facets is called partitionable if it can be represented as a disjoint union of many 012-rows. This is a popular concept in combinatorial topology. Many deep connections to other concepts have been established. For instance: -- and . The long conjectured implication - was falsified in [DKM]. A few ideas on how Facets-To-Faces and e+rp+sub may touch upon these matters follow in Section 4.4.
Third, consider the classic Inclusion-Exclusion formula with its exponentially many summands. It is vexing that many summands are often zero, but pleasant that the nonzero summands match a simplicial complex (aka ’nerve’). Isolation and compression of the nerve speed up classic inclusion-exclusion. See arXiv:1309.6927.
Last but hardly least, a prominent area of data mining is Frequent Set Mining. Specifically, Facets-To-Faces can compress all frequent sets from a knowledge of either the maximal frequent sets (i.e. the facets), or the minimal infrequent sets (Sec. 7). Many algorithms (e.g. the A priori method, listed in [WK]) have been proposed for these problems, all proceeding one-by-one. See arXiv:1910.14508, which also discusses how to get the maximal frequent sets in the first place.
2 Calculating the cardinality of from its facets
After inclusion-exclusion (2.1) and BDD’s (2.2), a novel method to solve (C) is introduced in 2.3.
2.1 Consider the simplicial complex whose facets are listed in (1). Using inclusion-exclusion one finds
Having complexity , this method is only efficient for small , but for such has the advantage that the cardinalities of the faces hardly matter, as opposed to competing methods.
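The inclusion-exclusion count can be sketched in a few lines of Python. Since the facets in (1) are not reproduced here, the four facets below are a hypothetical stand-in, and the function name `count_faces_incl_excl` is an assumption. Each nonempty group of facets contributes the powerset of its common intersection, with alternating sign.

```python
from itertools import combinations

def count_faces_incl_excl(facets):
    """|SC| as the alternating sum, over all nonempty groups of facets,
    of 2^(size of the group's common intersection)."""
    facets = [frozenset(f) for f in facets]
    total = 0
    for k in range(1, len(facets) + 1):
        sign = 1 if k % 2 else -1
        for group in combinations(facets, k):
            common = frozenset.intersection(*group)
            total += sign * 2 ** len(common)
    return total

# Hypothetical facets, not the ones from (1).
facets = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]
print(count_faces_incl_excl(facets))  # 20 - 10 + 4 - 1 = 13
```

The loop visits all nonempty subsets of the facet list, which is why the approach is only viable for few facets, while the sizes of the facets barely affect the running time.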
2.2 Another established method uses Binary Decision Diagrams (BDD’s); we recommend [K,Sec.7.1.4] as a general reference. To warm up with Boolean functions and to survey the essentials of BDD’s, consider this (antitone) Boolean function:
The models of (i.e. the bitstrings with ) match the faces of . For instance since . Accordingly
On the other hand, and accordingly
Whether or not a bitstring is a model of a Boolean function can (excluding trivial cases) be decided faster by feeding to the BDD than by evaluating a potentially large Boolean formula. The BDD of is rendered in Figure 1. If , then tells us that at the top node (=root) of the BDD we must take the dashed branch (it being labelled by 0). It leads us to one of the two sons of , i.e. the one labelled . Since , the dashed path leads us to a node labelled . Since we now take the solid path (it being labelled by 1), which brings us to the rightmost node labelled . Because , three dashed paths bring us to a node labelled . Because , the dashed path brings us to the leaf 1 (distinguished from ordinary nodes by a square frame). By construction of the BDD that signifies . (Notice that the values of were irrelevant.) One checks that indeed . If the value of had been 1 instead of 0, then we would have reached the leaf 0 (with square frame) at once, indicating that .
2.2.1 BDD’s allow one to determine the number of models fast. For this purpose we assign in a recursive manner a probability to each node. One starts by assigning probability 0 to the leaf 0, and probability 1 to the leaf 1. Working one’s way from bottom to top, if has sons with probabilities , assign to it probability . For in the end the root gets probability . Since the total number of length 9 bitstrings is , a moment’s thought shows that the cardinality of the model set is , which matches (2). The cost of calculating (C) this way is linear in the size of the BDD (=number of its nodes).
Figure 1: One (of many) BDD of in (3)
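The bottom-up probability computation of 2.2.1 can be sketched as follows. The BDD below is a small hypothetical one (it is not the BDD of Figure 1), and the sketch assumes that the rule for an inner node is the average of its two sons' probabilities, which is the standard rule for this counting method.

```python
from fractions import Fraction

# Hypothetical BDD of f = (not x3) or (not x1 and not x2), ordering x1, x2, x3.
# Each inner node maps to (low son, high son); the leaves are the integers 0 and 1.
# Node A tests x1, node B tests x2, node C tests x3 (shared by both branches).
BDD = {"A": ("B", "C"), "B": (1, "C"), "C": (1, 0)}

def probability(node, bdd, memo=None):
    """Probability that a uniformly random bitstring reaches the 1-leaf from node."""
    memo = {} if memo is None else memo
    if node in (0, 1):
        return Fraction(node)
    if node not in memo:
        low, high = bdd[node]
        memo[node] = (probability(low, bdd, memo) + probability(high, bdd, memo)) / 2
    return memo[node]

def model_count(root, bdd, n_vars):
    """Number of models = root probability times 2^(number of variables)."""
    return int(probability(root, bdd) * 2 ** n_vars)

print(probability("A", BDD))     # 5/8
print(model_count("A", BDD, 3))  # 5
```

Working with probabilities rather than raw counts means that variables skipped along an edge (here x2 on the edge from A to C) need no special treatment.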
2.3 The third way to settle (C) is based on a certain e-algorithm, which in turn is based on 012e-rows. Extending the concept of a 012-row (Introduction), by definition a 012e-row contains one or more wildcards of type , each one of which demanding ’at least one 1 here’. Thus the 012e-row is the set of bitstrings , where e.g. . If several wildcards occur, they are distinguished by subscripts. Calculating the number of bitstrings contained in a 012e-row is easy, say
Recall that a transversal of a hypergraph (=set system) is a subset such that for all . Let be the set of all transversals. The (transversal) e-algorithm, fully described in [W2], represents as a disjoint union of many 012e-rows in polynomial total time .
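The cardinality of a 012e-row, as described in 2.3, is the product of one factor 2 per don't-care symbol and one factor 2^m - 1 per e-wildcard of length m. A minimal Python sketch follows; the list encoding with labels such as `"e1"`, the example row, and the function names are assumptions made for illustration.

```python
from itertools import product

def row_cardinality(row):
    """Number of bitstrings in a 012e-row given as a list of symbols
    '0', '1', '2', or wildcard labels like 'e1', 'e2'."""
    lengths = {}
    for s in row:
        if s.startswith("e"):
            lengths[s] = lengths.get(s, 0) + 1
    n = 2 ** row.count("2")
    for m in lengths.values():
        n *= 2 ** m - 1          # 'at least one 1' excludes the all-zero pattern
    return n

def contains(row, bits):
    """Membership test: does the 0/1 tuple bits belong to the 012e-row?"""
    hit = {}
    for s, b in zip(row, bits):
        if s in ("0", "1") and int(s) != b:
            return False
        if s.startswith("e"):
            hit[s] = hit.get(s, False) or b == 1
    return all(hit.values())

row = ["e1", "e1", "0", "2", "e2", "e2", "e2", "1"]   # hypothetical row
print(row_cardinality(row))  # 2 * (2^2 - 1) * (2^3 - 1) = 42
```

The membership test is only there to allow a brute-force cross-check of the closed formula on small rows.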
2.3.1 Consider now any simplicial complex with facets and so on. Putting for any it holds for all that
(4) .
To fix ideas, take , whose five facets are listed in (1). If we apply the e-algorithm to then it outputs as a disjoint union of seven 012e-rows:
[TABLE]
Table 1: Compressing with the transversal e-algorithm
According to (4), coincides with the set filter . It follows that
(5)
,
which matches the number obtained in 2.1 and 2.2. As will be seen in 3.3.1, inclusion-exclusion stands no chance against the method of 2.3. The bottleneck in 2.2 is the calculation of the BDD itself. That’s because the expected⁴ size of the BDD of a Boolean function is , and hence the calculation of BDD’s cannot be done in polynomial total time; the numerical experiments in Section 5 will speak the same language.
⁴ To be fair, in many scenarios the occurring Boolean functions do not represent a random sample of all many Boolean functions, and the BDD-size can be moderate then.
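The link between faces and transversals used in 2.3.1 can be verified by brute force on a small instance. The sketch below assumes that the hypergraph in question consists of the complements of the facets: a set fails to be a face exactly when it meets every facet complement. The facets and all function names are hypothetical.

```python
from itertools import combinations

def subsets(w):
    """All subsets of {0, ..., w-1} as frozensets."""
    for k in range(w + 1):
        for c in combinations(range(w), k):
            yield frozenset(c)

def faces(facets, w):
    return {x for x in subsets(w) if any(x <= f for f in facets)}

def transversals(hypergraph, w):
    return {x for x in subsets(w) if all(x & h for h in hypergraph)}

w = 5
facets = [frozenset(f) for f in ({0, 1, 2}, {2, 3}, {3, 4}, {0, 4})]
complements = [frozenset(range(w)) - f for f in facets]

sc = faces(facets, w)
tr = transversals(complements, w)
# Faces and transversals partition the powerset of {0, ..., w-1}.
assert sc | tr == set(subsets(w)) and not (sc & tr)
print(len(sc), 2 ** w - len(tr))  # 13 13
```

In practice the transversals would come in compressed form from the e-algorithm; the exhaustive enumeration here serves only to make the identity tangible.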
3 Calculating the face-numbers of from its facets
Here we settle by refining the three methods of Section 2.
3.1 Generalizing (2), the principle of inclusion-exclusion also applies to the calculation of the face-numbers . Thus for we find
For say this gives
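The same alternating sum yields face-numbers once the powerset size of each intersection is replaced by a binomial coefficient. The Python sketch below uses hypothetical facets (not those of (1)) and an assumed function name.

```python
from itertools import combinations
from math import comb

def face_numbers_incl_excl(facets):
    """Face-numbers N_0, N_1, ... via inclusion-exclusion: a group of facets
    with common intersection of size m contributes comb(m, k) k-faces."""
    facets = [frozenset(f) for f in facets]
    top = max(len(f) for f in facets)
    numbers = [0] * (top + 1)
    for size in range(1, len(facets) + 1):
        sign = 1 if size % 2 else -1
        for group in combinations(facets, size):
            m = len(frozenset.intersection(*group))
            for k in range(top + 1):
                numbers[k] += sign * comb(m, k)   # comb(m, k) is 0 when k > m
    return numbers

facets = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]   # hypothetical facets
print(face_numbers_incl_excl(facets))  # [1, 5, 6, 1]
```

The four numbers add up to 13 faces, and the cost is again exponential in the number of facets.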
3.2 While calculating the number of models of a Boolean function from its⁵ BDD is well known, using BDD’s to calculate the number of models of fixed Hamming weight (here: faces of fixed cardinality) is less known. Somewhat streamlining the account of Knuth [K,p.260,Exercise 25], details can be found in the preprint arXiv:1703.08511v5. Suffice it to say that the sought numbers fall out as the coefficients of a certain polynomial that is calculated recursively by processing the BDD bottom-up, in much the same way as in 2.2.
⁵ We mention in passing that ’its’ is imprecise. A Boolean function has a unique BDD only once a linear ordering of the Boolean variables has been fixed. The BDD in Figure 1 is based on the (popular default) ordering .
3.3 One ingredient of the third method for will also exploit coefficients of polynomials, but they are different from Knuth’s polynomials. The main ingredient is, as in 2.3, the e-algorithm. Consider thus a generic 012e-row
(8)
It is easy to see that the number Card of -element sets in equals the coefficient of in the row-polynomial
(9)
Details on the complexity of calculating these coefficients can be found in [W2, Theorem 1]. Here we simply apply the Mathematica command Expand to the polynomial induced by in Table 1 and obtain
(10) .
Thus e.g. Card. Recall from 2.3.1 that . Let be the number of -element transversals of , i.e. the number of -element sets of . By the above, all numbers are readily calculated as
(11)
Hence the face-numbers of (or any simplicial complex) can be calculated with this ’subtraction trick’:
(12)
For instance , which matches (7). In view of the #P-hardness of and the costly calculation of BDD’s we consider our threefold approach
a nice way to get the face numbers from the facets. The e-algorithm is easy to parallelize (by the same reason as in [W4, sec.6.5]), and therefore also e+rp+sub. In contrast, the calculation of a BDD from a Boolean formula can hardly be parallelized.
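The row-polynomial and the subtraction trick can be sketched in Python. The sketch assumes that the row-polynomial has the usual generating-function form: a factor x for each 1, a factor (1+x) for each 2, and a factor (1+x)^m - 1 for each e-wildcard of length m. The example complex with facets {0,1} and {2}, whose facet complements have the single row (e,e,1) as transversal set, is hypothetical, as are the function names.

```python
from math import comb

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def row_polynomial(n_ones, n_twos, e_lengths):
    """Coefficient k counts the k-element sets in a 012e-row."""
    poly = [0] * n_ones + [1]                                   # x^(number of 1's)
    poly = poly_mul(poly, [comb(n_twos, j) for j in range(n_twos + 1)])
    for m in e_lengths:
        factor = [comb(m, j) for j in range(m + 1)]
        factor[0] = 0                                           # (1+x)^m - 1
        poly = poly_mul(poly, factor)
    return poly

# Hypothetical complex on W = {0,1,2} with facets {0,1} and {2}.
# Transversals of the complements {2} and {0,1} form the row (e, e, 1).
w = 3
tau = row_polynomial(n_ones=1, n_twos=0, e_lengths=[2])        # [0, 0, 2, 1]
tau += [0] * (w + 1 - len(tau))
face_numbers = [comb(w, k) - tau[k] for k in range(w + 1)]      # subtraction trick
print(face_numbers)  # [1, 3, 1, 0]
```

With several rows, the transversal counts are the coefficient-wise sums of the row-polynomials, since the rows are disjoint.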
3.3.1 In a previous version of the present article (arXiv:1302.1039v4) e+rp+sub was pitted against inclusion-exclusion on random simplicial complexes of type , i.e. the facets all had facet-size . Predictably inclusion-exclusion took time almost proportional to ; thus took 164 seconds and needed to seconds. In contrast, e+rp+sub took 1896 sec for the latter, and handled (which triggered about 2 billion 012e-rows) in 64606 seconds. The corresponding time for inclusion-exclusion measures in centuries.
4 The Facets-To-Faces algorithm
Here we tackle the main task (E), i.e. given the facets, enumerate (preferably compressed) all faces! Section 4.1 describes two naive algorithms. The first is everybody’s first temptation, but outputs the faces one-by-one. Although the second has the potential for compression (using 012-rows), it nevertheless can be inferior. Knowing a BDD of one can use 012-rows more efficiently for compression. Section 4.3 introduces the novel Facets-To-Faces algorithm which displays as a disjoint union of more powerful 012e-rows. Section 4.4 relates Facets-To-Faces to facets and faces of topological simplicial complexes and convex polytopes.
4.1 We begin with some definitions needed in 4.1.2. If , call a 012-row of length feasible if (which amounts to for some ). Further call final if (which amounts to for some ).
4.1.1 The First Naive Algorithm (FNA) for enumerates simply as . As to ’simply’, the trouble is that multiple occurrences of faces (such as ) need to be pruned. Specifically, by induction suppose that for any we have obtained such that for . Then only the members of distinct from all ’s are added to the list. Here comes a confession: This is how FNA de iure must be programmed. De facto we exploited two shortcuts provided by Mathematica. First, enumerating a powerset (such as ) is more subtle than it looks; see [K,sec. 7.2.1.1]. We circumvented that issue with the Mathematica command Subsets, which behaves as follows: Subsets directly outputs . Second, the command Union automatically prunes multiple occurrences (and orders the output); thus Union outputs . Therefore, if has been computed, we get a pruned listing of with Union.
4.1.2 The Second Naive Algorithm (SNA) uses variable-wise branching (a better name being pivotal decomposition, as argued in [W4, sec. 2.5]). Initially our Last-In-First-Out (LIFO) stack only contains the feasible row . Generally the top 012-row of the LIFO stack is always picked. The “first” occurring digit 2 (with respect to a fixed ordering of the index set ) is turned to 0 and 1 respectively. This yields 012-rows and . By induction was feasible. Since subsets of faces are faces, it follows that is feasible, but not necessarily . These one or two feasible 012-rows replace on the LIFO stack (except that final rows go to an initially empty ’final stack’). As soon as the LIFO stack is empty, the union of the 012-rows in the final stack is disjoint and equals . (Theorem 2 fine-tunes the above in a more sophisticated setting.)
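The branching scheme of SNA can be sketched in Python as follows. Rows are tuples over 0, 1, 2; a row is treated as feasible when its set of 1-positions is a face, and as final when its set of non-zero positions is a face. The facets, the natural variable ordering, and the function names are assumptions for illustration.

```python
from itertools import combinations

def sna(facets, w):
    """Second Naive Algorithm: split rows on their first 2 until every
    surviving row is final; returns disjoint 012-rows covering all faces."""
    facets = [frozenset(f) for f in facets]

    def is_face(s):
        return any(s <= f for f in facets)

    stack, final = [(2,) * w], []          # the all-2 row is feasible (empty set is a face)
    while stack:
        row = stack.pop()                  # LIFO: always take the top row
        ones = frozenset(i for i, s in enumerate(row) if s == 1)
        top = frozenset(i for i, s in enumerate(row) if s != 0)
        if is_face(top):                   # final: the whole row lies in the complex
            final.append(row)
            continue
        i = row.index(2)                   # exists, since the row is feasible but not final
        stack.append(row[:i] + (0,) + row[i + 1:])       # always feasible
        if is_face(ones | {i}):
            stack.append(row[:i] + (1,) + row[i + 1:])   # kept only if feasible
    return final

def expand(row):
    """All sets (as frozensets of positions) contained in a 012-row."""
    ones = frozenset(i for i, s in enumerate(row) if s == 1)
    twos = [i for i, s in enumerate(row) if s == 2]
    return {ones | frozenset(c) for k in range(len(twos) + 1)
            for c in combinations(twos, k)}

facets = [{0, 1, 2}, {2, 3}, {3, 4}, {0, 4}]   # hypothetical facets, w = 5
rows = sna(facets, 5)
print(sum(2 ** r.count(2) for r in rows))       # 13 faces in total
```

How many final rows result depends on the variable ordering, in line with the remarks in 4.1.4.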
4.1.3 Here comes an experimental comparison of the two naive algorithms. For various random instances (see 3.3.1) we recorded the times (rounded to full seconds) needed for FNA and SNA respectively to enumerate the ensuing simplicial complex . The number of final 012-rows produced by SNA is recorded as well. In contrast, FNA offers no compression but, recall, its advantage is that all faces contained in a facet , are ’instantaneously’ produced by Subsets[]. This advantage wins out in the instance. The -instance lets SNA catch up because it compresses on average roughly 16 faces per 012-row, whereas FNA outputs faces one-by-one and invests considerable time (despite the hardwired Union command!) to prune duplicated faces. These two trends increase in the -instance to the extent that SNA is more than twice as fast as FNA. The tables turn again in the extrapolated -instance because the compression of SNA is just too poor (about 2 faces per 012-row). Finally, the extrapolated -instance with its 109’437’738 faces⁶ provides a Pyrrhic victory for SNA: The FNA ran out of memory while executing Union.
⁶ Several of the methods discussed in Section 5 can get this number very fast.
[TABLE]
Table 2: Comparison of the two naive algorithms
4.1.4 For SNA the number of final 012-rows which are proper (i.e. not 01-rows) heavily depends on the particular ordering of the index set . For instance, using the natural ordering the SNA represents our 52-element example as a disjoint union of 19 rows. The minimum (=13) and maximum (=44) number of final 012-rows are obtained (e.g.) for the orderings and respectively.
4.2 Figure 1 shows the BDD of the Boolean function of (3). Recall from 2.2 that for , and so feeding to the BDD traced a path from the root to the 1-leaf. The fact that the values were irrelevant for reaching the 1-leaf shows that . Generally each path from to the 1-leaf yields a 012-row contained in , and distinct paths induce disjoint 012-rows (why?). Therefore, if there are such paths, then can be written as union of disjoint 012-rows. We call this the BDD-induced ESOP of (see 1.1.1). How to find these paths efficiently?
[TABLE]
Table 3: The ESOP induced by the BDD of
A look at Figure 1 shows that the only -models with are the ones in the 012-row of Table 3. All other -models must fit the pattern of . One can get rid of the first ’?’ in by splitting as . However, continuing in this manner can create many dead-end paths. It is better to embrace a bottom-up approach akin to 2.2.1. This would show that and . Hence . Adding up the cardinalities of the fourteen rows yields , as was to be expected.
4.2.1 The mere number of 012-rows in a BDD-induced ESOP can be predicted without having to calculate the ESOP. Namely, proceeding bottom-up, assign integers (instead of probabilities as in 2.2.1) to the BDD-nodes as follows. The 0-leaf and 1-leaf receive 0 and 1 respectively. If node has sons with assigned integers , assign to it . The last number equals . The reader is invited to verify that for the BDD in Figure 1 this procedure indeed yields .
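Both the path count of 4.2.1 and the induced ESOP itself can be sketched on a small hypothetical BDD (not the one of Figure 1). The sketch assumes that an inner node receives the sum of the integers of its two sons; the node encoding and function names are illustrative.

```python
# Hypothetical BDD of f = (not x3) or (not x1 and not x2), ordering x1, x2, x3.
# Each inner node maps to (variable index, low son, high son); leaves are 0 and 1.
BDD = {"A": (0, "B", "C"), "B": (1, 1, "C"), "C": (2, 1, 0)}

def count_paths(node, bdd, memo=None):
    """Number of root-to-1-leaf paths below node, computed bottom-up."""
    memo = {} if memo is None else memo
    if node in (0, 1):
        return node
    if node not in memo:
        _, low, high = bdd[node]
        memo[node] = count_paths(low, bdd, memo) + count_paths(high, bdd, memo)
    return memo[node]

def induced_rows(node, bdd, n_vars, prefix=None):
    """The 012-rows of the BDD-induced ESOP; untested variables become 2."""
    prefix = {} if prefix is None else prefix
    if node == 0:
        return []
    if node == 1:
        return ["".join(str(prefix.get(i, 2)) for i in range(n_vars))]
    var, low, high = bdd[node]
    return (induced_rows(low, bdd, n_vars, {**prefix, var: 0})
            + induced_rows(high, bdd, n_vars, {**prefix, var: 1}))

print(count_paths("A", BDD))       # 3
print(induced_rows("A", BDD, 3))   # ['002', '010', '120']
```

The count is linear in the number of nodes thanks to memoization, whereas listing the rows costs time proportional to their number.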
4.3 We now embark on the third method for solving (E), it being the core of our article. Suppose has facets to , and by induction we have obtained for some a representation
(13)
with -rows . If is the -row matching then evidently
(14) ,
and so the key problem is this: For a given -row and -row recompress the set difference as disjoint union of -rows. Let us do away with the two extreme cases first. First, iff thus iff either a 1 or -wildcard of falls into zeros. Second, iff , thus iff zeros. For instance .
[TABLE]
Table 4: Recompressing the type set difference
In all other cases we focus on the flexible (i.e. ) symbols of , thus for in Table 4 the symbols on the positions 1 to 11. The only way for to detach itself from (the ’plebs’ in) is to employ those flexible symbols of that are “above” a 0 of , in the sense that they occupy a position which in is occupied by 0. For the particular and in Table 4 a bitstring detaches itself from iff ones. Depending on whether the smallest element of ones belongs to (this partition is dictated by the wildcard pattern of ), the bitstring belongs to exactly one of the sons .
The powersets of the five facets of (see (1)) are listed as the first five -rows in Table 5. Applying detachment repeatedly yields:
\begin{array}{rll}
r_{1}\cup r_{2} &=& (r_{1}\setminus r_{2})\uplus r_{2}=:r_{6}\uplus r_{2}\\
(r_{6}\uplus r_{2})\cup r_{3} &=& (r_{6}\setminus r_{3})\uplus(r_{2}\setminus r_{3})\uplus r_{3}=:(r_{7}\uplus r_{8})\uplus r_{9}\uplus r_{3}\\
(r_{7}\uplus\cdots\uplus r_{3})\cup r_{4} &=& (r_{7}\setminus r_{4})\uplus(r_{8}\setminus r_{4})\uplus(r_{9}\setminus r_{4})\uplus(r_{3}\setminus r_{4})\uplus r_{4}\\
&=:& r_{7}\uplus r_{8}\uplus(r_{10}\uplus r_{11})\uplus r_{12}\uplus r_{4}\\
(r_{7}\uplus\cdots\uplus r_{4})\cup r_{5} &=& (r_{7}\setminus r_{5})\uplus(r_{8}\setminus r_{5})\uplus(r_{10}\setminus r_{5})\uplus(r_{11}\setminus r_{5})\uplus(r_{12}\setminus r_{5})\uplus(r_{4}\setminus r_{5})\uplus r_{5}\\
&=:& r_{7}\uplus r_{8}\uplus r_{10}\uplus r_{11}\uplus r_{13}\uplus r_{14}\uplus r_{5}.
\end{array}
From Table 5 follows, as it must, that
[TABLE]
We call this algorithm Facets-To-Faces.
[TABLE]
Table 5: Compressing with Facets-To-Faces
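The induction step from (13) to (14) can be sketched in Python. The splitting rule below is one valid way to write the difference of a 012e-row and a facet's powerset as disjoint 012e-rows; it is a simplified stand-in and not necessarily the scheme of Table 4. The row encoding (four components: zeros, ones, twos, tuple of e-wildcards), the facets, and all names are assumptions.

```python
from itertools import combinations

def detach(row, facet):
    """Disjoint 012e-rows whose union is row minus powerset(facet)."""
    zeros, ones, twos, es = row
    if ones - facet or any(not (e & facet) for e in es):
        return [row]                       # row is disjoint from powerset(facet)
    sons = []
    q2 = twos - facet                      # 2's of the row above a 0 of the facet row
    if q2:                                 # son: at least one 1 among these 2's
        sons.append((zeros, ones, twos - q2, es + (q2,)))
    zeros, twos, es = zeros | q2, twos - q2, list(es)
    for i in range(len(es)):
        a, b = es[i] - facet, es[i] & facet
        if not a:
            continue
        # son: earlier outside parts are 0, this wildcard has a 1 outside the facet
        sons.append((zeros, ones, twos | b, tuple(es[:i] + es[i + 1:]) + (a,)))
        zeros, es[i] = zeros | a, b        # from now on the outside part is all 0
    return sons

def facets_to_faces(facets, w):
    rows = []
    for f in map(frozenset, facets):
        rows = [s for r in rows for s in detach(r, f)]
        rows.append((frozenset(range(w)) - f, frozenset(), f, ()))   # powerset(f)
    return rows

def expand(row):
    """Brute-force listing of a row, for checking only."""
    zeros, ones, twos, es = row
    flex = sorted(twos.union(*es))
    sets = (ones | frozenset(c) for k in range(len(flex) + 1)
            for c in combinations(flex, k))
    return {x for x in sets if all(x & e for e in es)}

facets = [{0, 1, 2}, {2, 3}, {3, 4}, {0, 4}]     # hypothetical facets, w = 5
rows = facets_to_faces(facets, 5)
print(sum(len(expand(r)) for r in rows))          # 13 faces in total
```

A row contained in the facet's powerset produces no sons and simply disappears. The number of rows obtained depends on the order in which the facets are processed.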
Theorem 1: Let be the facets of a simplicial complex . Then Facets-To-Faces enumerates as a union⁷ of disjoint 012e-rows in time .
⁷ In view of the 012-rows entering the definition of ’ESOP’, one could call this kind of union a ’fancy ESOP’ for or, more precisely, for its underlying antitone Boolean function (such as (3) for ).
Proof. By induction assume that for some the decomposition (13) has been achieved. If some 012e-row is contained in then neither nor any of its sons and grandsons will survive in the long run. Thus is a dud, i.e. causing work without benefit. Moreover, unless is cancelled right away, it is impossible to predict the algorithm’s total time. Fortunately, letting be the unique largest set in (thus is obtained by setting all ’s and ’s to ), it holds that
[TABLE]
Testing for all whether with costs . In other words, that is the cost of pruning the righthand side of (13) from duds. What is the cost to proceed from a (pruned) representation (13) to a (not yet pruned) representation (14)? Because has at most sons (which is clear from Table 4), and ’writing down’ each son is obvious (i.e. costs ), the asked for cost is . Hence the overall cost is
[TABLE]
4.3.1 Suppose Facets-To-Faces has advanced to representing as a disjoint union of 012e-rows. At one’s discretion one can then embark on distributing the computation to satellite stations. Say and
where are approximately equal. Putting the control sends to satellite 1, and to satellite 2, and to satellite 3. After a while the control receives from satellite 1 some 012e-rows such that . Satellite 2 and satellite 3 send analogous rows and . (Note that may differ significantly in magnitude.) The control pools the received rows, adds row , and divides the rows in three approximately equal-sized parts. The three parts, each augmented by , are sent back to the satellites. And so forth.
4.4. This Subsection links the above to convex polytopes. We begin with the framework of -subsemilattices , i.e. . If the set of meet-irreducibles (or any -generating set) is known, then can be generated one-by-one in polynomial total time by a variety of algorithms. These algorithms e.g. are of interest in Formal Concept Analysis [GO]. Ganter’s NextClosure algorithm [GO,p.44] was the first and is still popular.
Consider now a convex polytope . Quoting from [FR,p.192]: …the combinatorial face enumeration problem (CFEP) is to enumerate all faces of in terms of their representations without duplications. What Fukuda and Rosta mean by the ’representation’ of a face is the set of facets in which is contained. Let be the set of vertices of , identify each face of with the set of vertices it contains, and let be the set of all faces. In this setting CFEP reduces to enumerating from the set of facets. (As to how the facets themselves can be found, see 4.4.2.) In [KP], which improves upon results in [FR], and which was inspired by NextClosure, not only the individual faces but all covering pairs of faces are enumerated from the facets in polynomial total time.
4.4.1 A convex polytope is a simplex if any subset of (the vertex set of) any face is (the vertex set of) a face. For instance, the simplices in are exactly the tetrahedrons. Gluing together simplices yields (topological) simplicial complexes⁸. As is to be expected, the [KP]-algorithm accelerates for simplicial complexes, yet still enumerates one-by-one. In [BM], which similarly caters for combinatorial topologists, the individual faces are organized in a tree-structure. This supports various combinatorial operations (such as contracting edges), but again offers no compression.
⁸ Since we are only concerned with abstract simplicial complexes (defined in 1.1) we can dispense with a formal definition of topological simplicial complexes. We mention in passing: Unlike convex polytopes, simplicial complexes which are not simplices have meet-irreducible faces which are not facets (which?). However this is irrelevant since is already determined by its facets.
Enter Facets-To-Faces. Apart from the practical aspects of compression, there is a connection to an important theoretical concept. Namely, in any disjoint representation of by 012e-rows each facet must be the largest member in the row it happens to belong to. In particular, if there are facets then any disjoint representation comprises at least many 012e-rows. In combinatorial topology a simplicial complex is called partitionable if one can do with many 012-rows (yet other terminology is used). The relevance of this concept has been indicated in 1.3. Here are two veins for further research. Can the methods in arXiv:1811.11689 (which concern shellability) be adapted to find necessary or sufficient conditions for the partitionability of a random simplicial complex given by its facets? Defining to be e-partitionable if is a disjoint union of many 012e-rows, is this notion strictly weaker than being partitionable?
4.4.2 Notice that -subsemilattices (e.g. arising from convex polytopes) are not easily compressed; it seems one needs an implicational base of the closure system , but calculating from is usually hard [W3, sec. 3.6]. Is there nevertheless a place for compression in the context of convex polytopes ? To answer, recall the two fundamental representations of . The H-representation views as an intersection of closed half-spaces, each one of which is given by an inequality . The V-representation gives the vertex set , viewing that is the convex hull of . Much research⁹ has been devoted to going from one kind of representation to the other. As to ’a place for compression’, given the H-representation of , there is hope to compress the set of interior 0,1-points (i.e. ) as a disjoint union of 012e-rows (work in progress).
⁹ In particular, various methods have been proposed to get the ’combinatorial’ facets (i.e. the sets of incident vertices) from either the H- or the V-representation.
5 Numerical experiments
Theorem 1 only implies (due to the disjointness of 012e-rows), yet the numerical experiments below show that often . The meaning of a random instance is as in 3.3.1 and 4.1.3. The number of final 012e-rows spawned by the Facets-To-Faces algorithm, and its running time in seconds are recorded. In our implementation of Facets-To-Faces the precaution to avoid duds (see the proof of Theorem 1) was omitted because for the instances in Table 7 the cost of its incorporation would outweigh the benefits. For instance the (50,240,20)-instance features 460631 final versus 13244 wasteful 012e-rows. In the other instances the proportion wasteful/final is even smaller. In all instances more than half of the final 012e-rows were proper, i.e. not 012-rows. In the (2000,70,192)-instance only 1157 out of 70551 many 012e-rows were improper.
After introducing the two competitors of Facets-To-Faces (5.1), we assess the three algorithms’ compression capabilities in 5.2. Then we compare with respect to CPU time (5.3), and finally with respect to memory requirements (5.4).
5.1 Mathematica uses BDD’s only behind the scenes, in particular for the command SatisfiabilityCount which outputs the number of models of any Boolean function fed to it. [Footnote 10: Once Facets-To-Faces terminates, the cardinality is easily determined (see 5.5), and it always coincided with the number produced by SatisfiabilityCount. Hence Facets-To-Faces very likely works correctly.] Therefore I am grateful to Maximilian Vides, who helped me to access BDD’s via Python. Specifically, for each instance the matching antitone Boolean function was translated from Mathematica notation to Python notation, then fed to the Python command expr2bdd which calculates a BDD, then was calculated as described in 4.2.1. The first competitor expr2bdd always uses the natural default ordering of variables. This may be part of the explanation why it is much slower than SatisfiabilityCount. In any case, even if the BDD underlying SatisfiabilityCount undergoes minimisation, it seems to induce an ESOP that compresses poorly. [Footnote 11: Much research has gone into optimizing the variable ordering in order to reduce (or even minimize) the size and hence the time to calculate it. However, there is little relation between and because it is the structure rather than the sheer number of nodes that determines . For instance for but for . It seems that no research has gone into optimizing the variable ordering to make small. It may well be that this is fruitless since competing methods will keep on compressing better. Likewise the compression achieved by Facets-To-Faces heavily depends on the order in which the facets are processed, and so far no research in this regard exists. We conclude: Facets-To-Faces is just as disadvantaged by the default ordering of facets as expr2bdd is by the default variable ordering.] Why else would Mathematica use a command which is not based on BDD’s (as confirmed by the author) to calculate an ESOP of a Boolean function?
This command is BooleanConvert (option ’ESOP’), and it is the second competitor of Facets-To-Faces.
5.2 In Table 6 the parameter is the number of 012e-rows produced by Facets-To-Faces, is the number of 012-rows in the ESOP calculated with BooleanConvert, and was defined above. One sees that for all instances it holds that . Let us fix and and observe what happens when increases. If say , then and and . Likewise fixing and taking gives a similar picture, except that expr2bdd couldn’t finish in reasonable time. In brief, all else being equal, all three suffer from increasing , expr2bdd more than BooleanConvert, and BooleanConvert more than Facets-To-Faces. We leave it to the reader to draw conclusions (although the data is sparse) from fixing and increasing , respectively fixing and increasing .
5.3 As far as CPU times are concerned, the state of affairs is not so clear-cut. In a nutshell, the Facets-To-Faces algorithm dislikes many short facets, but likes few large facets. As to few but large facets, in such situations it may best not only the time of BooleanConvert but even that of SatisfiabilityCount: It took Facets-To-Faces seconds to squeeze faces (contained in 70 facets of size 300) into a mere 707518 many 012e-rows, whereas SatisfiabilityCount (which we only asked to count the faces) was aborted after fourteen hours. When there are many small facets (such as for ) then is smaller than , but stays higher. The fact that Facets-To-Faces is implemented in high-level Mathematica code, whereas BooleanConvert is hardwired, is only part of the explanation. Fortunately according to 4.3.1 Facets-To-Faces can be parallelized. Thus, simply put, Facets-To-Faces can be accelerated by any factor , provided one is able to ’control’ a network of colleagues who lend their PC’s.
5.4 Not only is always larger than , the Mathematica command MemoryInUse (whatever its units) shows that BooleanConvert is also more memory-intensive than the Facets-To-Faces algorithm. For example, in a random instance of type the before/after measurements were and for Facets-To-Faces, but and for BooleanConvert. As to SatisfiabilityCount, in the (2000,70,192)-example it also started with a modest but ended with a hefty . This may be related to why and Timing[SatisfiabilityCount] were not reliable: In the instance, say, the claim was contradicted by a hand-stopped time of 410 seconds. In the instance the claim Timing[SatisfiabilityCount]=157 was contradicted by a hand-stopped time of 785 seconds. In contrast, for the Facets-To-Faces algorithm always matched the hand-stopped time.
5.5 From the fancy ESOP calculated by Facets-To-Faces (say in time ) one can compute all face-numbers in a fraction of . This method may even beat e+rp+sub. Which method excels depends on the number and structure of facets and needs further investigation. For instance, e+rp+sub was slightly faster on the (50,1000,10)-example (901 seconds) but much slower on the (2000,70,192)-example which was stopped after an hour.
[TABLE]
Table 6: Facets-To-Faces versus BooleanConvert and expr2bdd
6 Two ways to enumerate from the facets of
From among the four tasks listed in 1.1, only remains to be dealt with. We offer two methods. Method 1 (in 6.1) is faster, but Method 2 (in 6.2) boasts a theoretic assessment.
6.1 In what follows any representation of as disjoint union of 012e-rows can be used as prerequisite for a compressed enumeration of . For instance, the output of the Facets-To-Faces algorithm in Table 5 would do, but for variation we chose to illustrate (see Table 7) our method on the representation ; this equality can be verified ad hoc. [Footnote 12: This representation actually stems from a variant of Facets-To-Faces discussed in Section 8.] [Footnote 13: Note that as it must. It thus suffices to show that each is contained in , which is easy.]
In addition to the e-wildcard we now need the g-wildcard which means ’exactly many 1’s here’. Accordingly 01g-rows are defined, e.g. is . Distinct g-wildcards within a 01g-row are distinguished by subscripts. Note that must be strictly smaller than the number of symbols because is impossible, and instead of we stick to .
6.1.1 We now describe the g-algorithm which, given and , represents as a disjoint union of 01g-rows. To fix ideas, let us target . The subset of all -faces (=bitstrings of Hamming weight 3) can be written as (Table 7). Expressing similarly is a bit subtler. But writing one sees that the sets of 3-faces in the first and second part of are and respectively. Likewise one verifies that .
[TABLE]
Table 7: Compressing with the -algorithm
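The splitting step of 6.1.1 can be sketched in Python for the simplest case of a 012e-row with a single e-wildcard. This is a minimal sketch, not the author's Mathematica implementation: the function names and the encoding of a 01g-row as a pair (fixed ones, list of (block, t) pairs meaning 'exactly t ones in this block') are assumptions made for illustration, and the sketch does not normalize g-wildcards with t equal to 0 or to the full block length into plain 0's or 1's.

```python
from itertools import combinations, product

def expand_012e(ones, twos, e_block):
    """Brute force: all members of a 012e-row with one e-wildcard, as frozensets."""
    members = set()
    for r2 in range(len(twos) + 1):
        for free in combinations(twos, r2):            # 2 = 'don't care'
            for re in range(1, len(e_block) + 1):      # e = 'at least one 1 here'
                for hit in combinations(e_block, re):
                    members.add(frozenset(ones) | frozenset(free) | frozenset(hit))
    return members

def k_faces_as_g_rows(ones, twos, e_block, k):
    """Split the weight-k members of the row into disjoint 01g-rows.

    Hypothetical encoding: (fixed ones, ((block, t), ...)), where (block, t)
    means 'exactly t ones among the positions in block'.
    """
    need = k - len(ones)                 # ones still to be placed
    rows = []
    for j in range(1, len(e_block) + 1):  # j ones inside the e-block
        rest = need - j                   # remaining ones go to the 2-block
        if 0 <= rest <= len(twos):
            rows.append((frozenset(ones), ((tuple(e_block), j), (tuple(twos), rest))))
    return rows

def expand_g_row(row):
    """All bitstrings (as frozensets of 1-positions) of a 01g-row."""
    ones, blocks = row
    choices = [list(combinations(block, t)) for block, t in blocks]
    return {ones.union(*pick) for pick in product(*choices)}

rows = k_faces_as_g_rows({0}, (1, 2, 3), (4, 5), k=3)
for row in rows:
    print(row, len(expand_g_row(row)))
```

Different values of j give disjoint 01g-rows because they prescribe different numbers of ones inside the e-block. If no value of j is admissible the function returns an empty list, which is the situation discussed in 6.1.2.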
What happens if, other than in Table 7, the -rows that constitute feature several -wildcards per row? For instance if
(16)
how can one represent as disjoint union of preferably few -rows? Although even one-by-one enumeration of is non-trivial (this type of task is solvable in output-linear time [W2, sec. 3.2]), proceeding differently one can actually get a compressed enumeration. This is carried out (on a dual example) in a previous version of our article [arXiv:1812.02570v2, sec. 3.3.2], and it is fairly clear that matters generalize.
6.1.2 More important than giving a formal proof of ’fairly clear’ is another issue. Suppose our target had been , not . Then . Generally in the worst case a representation of by disjoint 012e-rows might be such that of all rows have . Although this empty-row-issue prevents a (neat) theoretic assessment of the -algorithm, this does not preclude a good performance in practice.
6.2 Here we fine-tune the Second Naive Algorithm of 4.1.2 from outputting all faces to outputting all -faces.
Theorem 2: Suppose the facets of the simplicial complex are given. Then for any fixed the many -faces can be enumerated in time .
Proof. Starting with one maintains an oscillating (LIFO) stack of -feasible -rows (i.e. ) until the stack is emptied. The topmost row of the stack is always processed as follows. Let and be the rows obtained from by turning its first 2 to 0 and 1 respectively. (Here ’first’ refers to some previously fixed linear ordering of the indices .) Row is -feasible iff for at least one facet one has
Likewise for . At least one of and is -feasible because is -feasible and . The feasible row(s) is (are) put back on the stack. That is unless (say) is a bitstring, i.e. twos. In this case we found a -face , which is output.
As to the cost, creating from and recycling at least one of them to the stack, costs . Each output -face has at most recycled ancestors. It follows that the overall cost is .
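The stack procedure of the proof can be sketched in Python as follows. This is a sketch under stated assumptions: a 012-row is encoded as a pair (set of positions fixed to 1, set of positions still carrying a 2), and the feasibility test used here (some facet contains all fixed ones and still meets at least k positions that are 1 or 2, while at most k ones are fixed) is one reading of the condition in the proof, not a quotation of it. The names `k_faces` and `feasible` are illustrative.

```python
from itertools import combinations

def k_faces(facets, n, k):
    """Enumerate all k-faces of the complex on {0,...,n-1} given by its facets."""
    facets = [frozenset(F) for F in facets]

    def feasible(ones, twos):
        # Assumed feasibility test: the row contains at least one k-face.
        return len(ones) <= k and any(
            ones <= F and len(F & (ones | twos)) >= k for F in facets
        )

    start = (frozenset(), frozenset(range(n)))   # the row (2,2,...,2)
    stack = [start] if feasible(*start) else []
    found = []
    while stack:
        ones, twos = stack.pop()                 # LIFO: process the topmost row
        if not twos:                             # a bitstring, hence a k-face
            found.append(ones)
            continue
        i = min(twos)                            # 'first' 2 in the fixed ordering
        rest = twos - {i}
        for child in ((ones | {i}, rest), (ones, rest)):
            if feasible(*child):                 # only feasible rows are recycled
                stack.append(child)
    return found

faces = k_faces([{0, 1, 2}, {2, 3}], n=4, k=2)
print(sorted(sorted(f) for f in faces))
```

Because only feasible rows are put back on the stack, every row that is popped eventually leads to at least one output face, which is what the cost argument relies on.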
6.3 Notwithstanding Theorem 2, in practice Method 2 (which like SNA in 4.1.3 suffers mediocre compression) is often inferior to Method 1 (which laughs away its empty-row-issue). At this point the author may be forgiven for reflecting more broadly about one-by-one enumeration, compression, and optimization. As mentioned in 4.1.1 enumeration of a powerset (one-by-one) is non-trivial. Even more so enumerating all -sets of a set [K, sec. 7.1.1.3]. Apart from the fun of it, arguably the only purpose of enumerating all objects of a given type, is to find the best object (e.g. one or all -minimal object(s) with respect to a target function ). Compression with multi-valued rows (be it 012e, 01g, or other kinds) serves that purpose better than one-by-one enumeration. If say , then the -minimal set within the 01g-row below is quickly found to be (for brevity ):
[TABLE]
More about the interplay of compression and optimization can be found in [W4].
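To illustrate why a 01g-row supports optimization without expansion, here is a minimal Python sketch. It assumes, purely for illustration, that the target function is additive, i.e. the value of a set is the sum of per-element weights; the weights, the row, and the function name `min_weight_member` are hypothetical and not taken from the paper. Under that assumption the blocks can be optimized independently by taking the t cheapest positions in each g-block.

```python
def min_weight_member(ones, g_blocks, w):
    """Cheapest member of a 01g-row under an additive weight function.

    ones     : positions fixed to 1
    g_blocks : list of (block, t) pairs, 'exactly t ones among block'
    w        : w[i] is the (assumed) weight of position i
    """
    chosen = set(ones)
    for block, t in g_blocks:
        # Blocks are disjoint, so each can be handled greedily on its own.
        chosen |= set(sorted(block, key=lambda i: w[i])[:t])
    return chosen, sum(w[i] for i in chosen)

weights = [5, 1, 4, 2, 9, 3]
best, value = min_weight_member({0}, [((1, 2, 3), 2), ((4, 5), 1)], weights)
print(sorted(best), value)
```

The row in this example has 3 * 2 = 6 members, yet only two short sorts are needed; for non-additive target functions this greedy shortcut need not apply.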
7 Assessing from its minimal non-faces
Our results on will be adapted (in that order) to the situation where not the faces but the minimal non-faces of a simplicial complex are given. For instance let be the family of all independent sets of a matroid. Then the facets are the bases and the minimal non-faces are the circuits of the matroid. Dual to the -wildcard the -wildcard means ’at least one 0 here’, and 012n-rows are defined dually to 012e-rows. Generally if is a -row with and with many -wildcards of length respectively, then
(18) .
If is any hypergraph, then is a -noncover if for all . The (noncover) n-algorithm of [W1] displays the set of all -noncovers as a disjoint union of 012n-rows. (More details about the -algorithm follow in the proof of Theorem 3.)
7.1 Here we settle (E). Suppose that was given not by its facets listed in (1), but by its minimal nonfaces, which are these:
(19) ,
,
.
For instance, is not a subset of any in (1), but each 2-element subset of is contained in some . Hence is a minimal non-face. Generally, let be given by its minimal nonfaces , and so forth. It then holds that
(20) .
From the first equivalence it follows that where . Applying the -algorithm to delivers as a disjoint union of the -rows in Table 8. (Incidentally only is a proper 012n-row.)
[TABLE]
Table 8: Compressing with the noncover -algorithm
Theorem 3: Assume the minimal non-faces of the simplicial complex are known. Then can be represented as a disjoint union of many -rows in polynomial total time .
Proof. The minimal non-faces in (19) suggest to view (or any ) as the model set of the Boolean function [Footnote 14: Because of we have , despite appearances.]
(21)
This is a Horn-CNF since each clause has at most one positive literal (in fact none). Generally, if is a Horn-CNF with clauses then the Horn--algorithm of [W1, Cor.6] enumerates as a union of many disjoint -rows in total polynomial time .
When the Horn-CNF has only negative clauses, the Horn -algorithm simplifies and was called ’noncover -algorithm’ in [W1]. The impression from (20) that the noncover n-algorithm is related to the transversal e-algorithm is justified; in fact a moment’s thought reveals that upon switching the roles of 0 and 1 the n-algorithm becomes the e-algorithm, and vice versa.
One application of Theorem 3 was alluded to in Section 1.3: From a knowledge of all minimal infrequent sets, one can compress the simplicial complex of all frequent sets.
7.1.2 As to problem , i.e. the enumeration of all -faces from the minimal non-faces, this can be handled by applying the (dual) -algorithm to the individual 012n-rows in Table 8. The trouble is that, as in 6.1, this does not yield a polynomial total time algorithm because of the empty-row-issue. It remains an open question whether the analogue of Theorem 2 holds. More precisely by ’analogue’ we mean the statement that ensues from Theorem 2 when the part ’the facets’ is replaced by ’the minimal non-faces’. The problem is that (17) does not translate smoothly from facets to minimal non-faces .
7.2 As to the counting problem (C), the cardinality of is readily obtained from Table 8:
(22)
As to problem , each face-number of can be calculated from Table 8 by matching each 012n-row with some auxiliary polynomial, akin to 3.3. We hence call this method n+rp+sub.
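The idea of matching each 012n-row with a polynomial can be sketched in Python as follows. This is a sketch in the spirit of 7.2, not the paper's n+rp+sub implementation: it assumes a row is summarized by its number of 1's, its number of 2's, and the lengths of its n-wildcards, and it uses the generating polynomial x^a (1+x)^c times the product of ((1+x)^e - x^e) over the n-wildcards, whose coefficient at x^k counts the members of the row with exactly k ones. Function names are illustrative.

```python
from itertools import product
from math import comb

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def row_polynomial(num_ones, num_twos, n_lengths):
    """Coefficient k = number of bitstrings of Hamming weight k in the 012n-row."""
    poly = [0] * num_ones + [1]                               # x^a for the fixed 1's
    poly = poly_mul(poly, [comb(num_twos, j) for j in range(num_twos + 1)])
    for eps in n_lengths:
        # n-wildcard of length eps: at least one 0, i.e. 0..eps-1 ones
        poly = poly_mul(poly, [comb(eps, j) for j in range(eps)])
    return poly

def brute_force(num_ones, num_twos, n_lengths):
    """Same counts by explicit enumeration (for checking small rows only)."""
    length = num_twos + sum(n_lengths)
    counts = [0] * (num_ones + length + 1)
    for bits in product((0, 1), repeat=length):
        pos, ok = num_twos, True
        for eps in n_lengths:
            if all(bits[pos:pos + eps]):                      # block is all 1's: forbidden
                ok = False
            pos += eps
        if ok:
            counts[num_ones + sum(bits)] += 1
    return counts

poly = row_polynomial(1, 2, [3])
print(poly, sum(poly))
```

Summing such polynomials over the disjoint rows of a representation gives all face-numbers at once, and evaluating at x = 1 recovers the cardinality 2^c times the product of (2^e - 1) for each row.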
7.3 Hypergraph Dualization (HD) is the task to calculate the set of all minimal transversals of a hypergraph . This has plenty of applications. As to HD in the present situation, let be a simplicial complex. Then by (20), the complements of its facets ’s are exactly the minimal transversals of its minimal non-faces ’s, and vice versa. Thus if HD were easy, one could switch back and forth between the ’s and ’s at one’s convenience; for instance discarding the seventeen ’s in (19) in favor of the five ’s in (1).
Unfortunately HD is hard. Despite partial successes it remains an open problem whether HD can be solved in polynomial total time. We stress that the e-algorithm computes the set of all transversals. Extra work is required to ’sieve’ from . [Footnote 15: This can actually be done, not in total polynomial time, but whilst maintaining compression to some degree; this is work in progress, arXiv:2008.08996. In one special case this worked particularly well: If is the set of all minimal cutsets of a graph , then is the set of all connected edge-sets. In arXiv:2002.09707 it is shown how the family of all trees can be compressed.]
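The duality stated in 7.3 (complements of facets are exactly the minimal transversals of the minimal non-faces) can be checked on a toy complex with the brute-force Python sketch below. The complex, the vertex numbering, and all function names are assumptions chosen for illustration; the exhaustive search is exponential and says nothing about how HD should be solved in practice.

```python
from itertools import combinations

def faces_from_facets(facets):
    """All faces (subsets of facets) as frozensets."""
    return {frozenset(c) for F in facets
            for r in range(len(F) + 1) for c in combinations(sorted(F), r)}

def minimal_non_faces(facets, n):
    """Non-faces all of whose maximal proper subsets are faces."""
    faces = faces_from_facets(facets)
    result = []
    for r in range(1, n + 1):
        for c in combinations(range(n), r):
            s = frozenset(c)
            if s not in faces and all(s - {x} in faces for x in s):
                result.append(s)
    return result

def minimal_transversals(hyperedges, n):
    """Inclusion-minimal sets meeting every hyperedge (exhaustive search)."""
    found = []
    for r in range(n + 1):                      # by increasing size
        for c in combinations(range(n), r):
            t = frozenset(c)
            if all(t & e for e in hyperedges) and not any(m <= t for m in found):
                found.append(t)
    return found

n = 4
facets = [frozenset({0, 1, 2}), frozenset({2, 3})]
non_faces = minimal_non_faces(facets, n)
transversals = minimal_transversals(non_faces, n)
complements = {frozenset(range(n)) - F for F in facets}
print(sorted(sorted(s) for s in non_faces))
print(set(transversals) == complements)
```

Scanning candidates by increasing size guarantees that any transversal not containing a previously found one is itself minimal.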
8 Can one go from simplicial complexes to general DNFs ?
Suppose has facets to , and by induction we have obtained for some a type (18) representation. In Section 8 we handle the newcomer -row in dual fashion:
(23)
We keep the notation for , and refer to Table 9 for the definition of . Furthermore, put say . Based on (23) our Tentative Facets-To-Faces algorithm proceeds as follows in our toy example :
\begin{array}{rll}
r_{1}\cup r_{2}=r_{1}\uplus(r_{2}\setminus r_{1}) &=:& r_{1}\uplus r^{\prime}_{6}\\[1ex]
r_{1}\uplus r^{\prime}_{6}\uplus(r_{3}\setminus(r_{1}\uplus r^{\prime}_{6}))=r_{1}\uplus r^{\prime}_{6}\uplus(r_{3}\setminus r_{1}\setminus r^{\prime}_{6}) &=:& r_{1}\uplus r^{\prime}_{6}\uplus r^{\prime}_{7}\\[1ex]
r_{1}\uplus r^{\prime}_{6}\uplus r^{\prime}_{7}\uplus(r_{4}\setminus r_{1}\setminus r^{\prime}_{6}\setminus r^{\prime}_{7}) &=:& r_{1}\uplus r^{\prime}_{6}\uplus r^{\prime}_{7}\uplus r^{\prime}_{8}\\[1ex]
r_{1}\uplus r^{\prime}_{6}\uplus r^{\prime}_{7}\uplus r^{\prime}_{8}\uplus(r_{5}\setminus r_{1}\setminus r^{\prime}_{6}\setminus r^{\prime}_{7}\setminus r^{\prime}_{8}) &=:& r_{1}\uplus r^{\prime}_{6}\uplus r^{\prime}_{7}\uplus r^{\prime}_{8}\uplus\rho^{\prime}_{1}\uplus\rho^{\prime}_{2}
\end{array}
Note that is disjoint from and , and hence . Likewise being disjoint from and implies . The detachment of from is of type as opposed to in Section 4. Before we look at type detachments more systematically we argue ad hoc as follows. Since are contained in , and are mutually disjoint, and their cardinalities sum up to , it follows that . One checks that
(24)
which matches the cardinality (which we previously derived in various ways).
[TABLE]
Table 9: Compressing with a Tentative Facets-To-Faces algorithm
8.1 We saw that initial detachments can quickly ’deteriorate’ to detachments such as . While was handled ad hoc, let us now dig deeper. Namely, by definition means ’at least one 1 and at least one 0 here’. Let and be as in Table 10. With our new wildcard the row difference can be neatly expressed as . Indeed, clearly . If there was with then leads to the contradiction .
Table 10: Using the wildcard to recompress
As appealing as this may look, the downside is that embracing -rows forces us to cope with detachments of type . Table 11 must suffice as indication that things do not get out of hand. The verification that indeed is left to the dedicated reader.
[TABLE]
Table 11: Recompression of a set difference of type
Once detachments are mastered, any DNF can be transformed to a fancy ESOP that uses -rows. Given a CNF instead of a DNF, a wholly different method to transform the CNF to a fancy ESOP (using 012n-rows) is presented in [W4].
References
[BM] J.D. Boissonnat, C. Maria, The simplex-tree: An efficient data structure for general simplicial complexes, Algorithmica 70 (2014) 406-427.
[BN] M.O. Ball, G.L. Nemhauser, Matroids and a reliability analysis problem, Math. of Oper. Res. 4 (1979) 132-143.
[DKM] A.M. Duval, J. Klivans, J.L. Martin, The partitionability conjecture, Notices of the AMS 64 (2017) 117-122.
[FR] K. Fukuda, V. Rosta, Combinatorial face enumeration in convex polytopes, Computational Geometry 4 (1994) 191-198.
[GO] B. Ganter, S. Obiedkov, Conceptual Exploration, Springer 2016.
[KP] V. Kaibel, M.E. Pfetsch, Computing the face lattice of a polytope from its vertex-facet incidences, Computational Geometry 23 (2002) 281-290.
[K] D. Knuth, The Art of Computer Programming, Volume 4A, Addison-Wesley 2011.
[V] L.G. Valiant, The complexity of enumeration and reliability problems, SIAM J. Comput. 8 (1979) 410-421.
[W1] M. Wild, Compactly generating all satisfying truth assignments of a Horn formula, J. Satisf. Boolean Model. Comput. 8 (2012) 63-82.
[W2] M. Wild, Counting or producing all fixed cardinality transversals, Algorithmica 69 (2014) 117-129.
[W3] M. Wild, The joy of implications, aka pure Horn formulas: Mainly a survey, Theoretical Computer Science 658 (2017) 264-292.
[W4] M. Wild, Compression with wildcards: From CNFs to orthogonal DNFs by imposing the clauses one-by-one, to appear in The Computer Journal.
[WK] X. Wu, V. Kumar, The top ten algorithms in data mining, Chapman and Hall 2009.
