Smallest and Largest Block Palindrome Factorizations
Daniel Gabric, Jeffrey Shallit

TL;DR
This paper investigates the properties of block palindrome factorizations in words, providing recurrence relations, expected value analysis, and extremal case studies, revealing structural insights and connections to borders in words.
Contribution
It introduces new recurrence formulas, analyzes the expected width of largest BP-factorizations, and explores the relationship between smallest and largest BP-factorizations and word borders.
Findings
Recurrence for the number of words with a given largest BP-factorization width.
Expected width of largest BP-factorization tends to a constant.
Connection between words with a unique border and coinciding smallest and largest BP-factorizations.
Table 1: Number of binary words of length $n$ whose largest BP-factorization has width $t$.

| $n$ | $t=1$ | $t=2$ | $t=3$ | $t=4$ | $t=5$ | $t=6$ | $t=7$ | $t=8$ | $t=9$ | $t=10$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 284 | 12 | 224 | 40 | 168 | 72 | 96 | 64 | 32 | 32 |
| 11 | 568 | 0 | 472 | 0 | 416 | 0 | 336 | 0 | 192 | 0 |
| 12 | 1116 | 20 | 856 | 88 | 656 | 176 | 448 | 224 | 224 | 160 |
| 13 | 2232 | 0 | 1752 | 0 | 1488 | 0 | 1248 | 0 | 896 | 0 |
| 14 | 4424 | 40 | 3328 | 176 | 2544 | 432 | 1856 | 640 | 1152 | 640 |
| 15 | 8848 | 0 | 6736 | 0 | 5440 | 0 | 4576 | 0 | 3584 | 0 |
| 16 | 17622 | 74 | 13100 | 372 | 9896 | 984 | 7408 | 1744 | 5088 | 2080 |
| 17 | 35244 | 0 | 26348 | 0 | 20536 | 0 | 16784 | 0 | 13664 | 0 |
| 18 | 70340 | 148 | 51936 | 760 | 38824 | 2248 | 29152 | 4416 | 21088 | 6240 |
| 19 | 140680 | 0 | 104168 | 0 | 79168 | 0 | 62800 | 0 | 51008 | 0 |
| 20 | 281076 | 284 | 206744 | 1592 | 153344 | 4992 | 114688 | 10912 | 84704 | 17312 |
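Table 2: Behaviour of the expected width of the largest BP-factorization as the alphabet size $k$ increases.

| $k$ | Expected width |
|---|---|
| 2 | 6.4686 |
| 3 | 2.5908 |
| 4 | 1.9080 |
| 5 | 1.6314 |
| 6 | 1.4827 |
| 7 | 1.3902 |
| 8 | 1.3272 |
| 9 | 1.2817 |
| 10 | 1.2472 |
| ⋮ | ⋮ |
| 100 | 1.0204 |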
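Table 3: Behaviour of the probability that a random word has a unique border as the alphabet size $k$ increases.

| $k$ | Probability of a unique border |
|---|---|
| 2 | 0.5155 |
| 3 | 0.3910 |
| 4 | 0.2922 |
| 5 | 0.2302 |
| 6 | 0.1890 |
| 7 | 0.1599 |
| 8 | 0.1384 |
| 9 | 0.1219 |
| 10 | 0.1089 |
| ⋮ | ⋮ |
| 100 | 0.0101 |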
Daniel Gabric (Department of Math/Stats, University of Winnipeg, Winnipeg, MB R3B 2E9, Canada) and Jeffrey Shallit (School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada)
Abstract
A palindrome is a word that reads the same forwards and backwards. A block palindrome factorization (or BP-factorization) is a factorization of a word into blocks that becomes a palindrome if each identical block is replaced by a distinct symbol. We call the number of blocks in a BP-factorization the width of the BP-factorization. The largest BP-factorization of a word $w$ is the BP-factorization of $w$ with the maximum width. We study words with certain BP-factorizations. First, we give a recurrence for the number of length-$n$ words with largest BP-factorization of width $t$. Second, we show that the expected width of the largest BP-factorization of a word tends to a constant. Third, we give some results on another extremal variation of BP-factorization, the smallest BP-factorization. A border of a word $w$ is a non-empty word that is both a proper prefix and a proper suffix of $w$. Finally, we show a connection between words with a unique border and words whose smallest and largest BP-factorizations coincide.
1 Introduction
Let $\Sigma_k$ denote the alphabet $\{0, 1, \ldots, k-1\}$. The length of a word $w$ is denoted by $|w|$. A border of a word $w$ is a non-empty word that is both a proper prefix and a proper suffix of $w$. A word is said to be bordered if it has a border; otherwise, it is said to be unbordered. For example, the French word entente is bordered and has two borders, namely ente and e.
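It is well known [1] that the number $u_k(n)$ of length-$n$ unbordered words over a $k$-letter alphabet satisfies
$$u_k(n) = \begin{cases} k, & \text{if } n = 1;\\ k\,u_k(n-1) - u_k(n/2), & \text{if } n \ge 2 \text{ is even};\\ k\,u_k(n-1), & \text{if } n \ge 3 \text{ is odd}. \end{cases} \tag{1}$$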
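Equation 1 is easy to check numerically. The Python sketch below is our own illustration, not code from the paper, and its function names are hypothetical. It evaluates Nielsen's recurrence in the standard form $u_k(1) = k$, $u_k(2m) = k\,u_k(2m-1) - u_k(m)$, $u_k(2m+1) = k\,u_k(2m)$ and compares it with brute-force enumeration for short lengths.

```python
from functools import lru_cache
from itertools import product


def is_bordered(w: str) -> bool:
    """True if some non-empty proper prefix of w is also a suffix of w."""
    return any(w[:length] == w[-length:] for length in range(1, len(w)))


@lru_cache(maxsize=None)
def unbordered_count(n: int, k: int) -> int:
    """u_k(n): number of unbordered words of length n >= 1 over a k-letter alphabet."""
    if n == 1:
        return k
    if n % 2 == 0:
        return k * unbordered_count(n - 1, k) - unbordered_count(n // 2, k)
    return k * unbordered_count(n - 1, k)


def unbordered_brute(n: int, k: int) -> int:
    """Same count obtained by checking all k**n words (small n and k <= 10 only)."""
    alphabet = "0123456789"[:k]
    return sum(not is_bordered("".join(p)) for p in product(alphabet, repeat=n))


print([unbordered_count(n, 2) for n in range(1, 11)])
# [2, 2, 4, 6, 12, 20, 40, 74, 148, 284]
```

The brute-force count is exponential in $n$ and is only meant for cross-checking small cases.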
A palindrome is a word that reads the same forwards as it does backwards. More formally, letting $w = a_1 a_2 \cdots a_n$, where $n \ge 0$ and all $a_i$ are symbols, $w$ is a palindrome if $a_i = a_{n+1-i}$ for all $i$ with $1 \le i \le n$. The definition of a palindrome is quite restrictive: the second half of a palindrome is fully determined by the first half. Thus, compared to all length-$n$ words, the number of length-$n$ palindromes is vanishingly small (over a $k$-letter alphabet there are $k^{\lceil n/2 \rceil}$ of them, out of $k^n$ words). But many words exhibit palindrome-like structure. For example, take the English word . It is clearly not a palindrome, but it comes close. Replacing the block with a single letter turns the word into a palindrome. In this paper, we consider a generalization of palindromes that incorporates this kind of palindromic structure.
The concept of a block palindrome factorization was first introduced in the 2015 British Informatics Olympiad [2]. Let $w$ be a non-empty word. A block palindrome factorization (or BP-factorization) of $w$ is a factorization $w = w_m \cdots w_2 w_1 w_0 w_1 w_2 \cdots w_m$ such that $w_0$ is a possibly empty word and every other factor $w_i$ is non-empty for all $i$ with $1 \le i \le m$. We say that a BP-factorization is of width $t$, where $t = 2m + 1$ if $w_0$ is non-empty and $t = 2m$ otherwise. In other words, the width of a BP-factorization is the number of non-empty blocks in the factorization. The largest BP-factorization [3] of a word is a BP-factorization of maximum width. (Largest BP-factorizations also appear in https://www.reddit.com/r/math/comments/ga2iyo/i_just_defined_the_palindromity_function_on/.) See [4, 5] for more on the topic of BP-factorizations and block reversals. Kolpakov and Kucherov [6] studied a special case of BP-factorizations, the gapped palindrome: if $w_0$ is non-empty and $|w_i| = 1$ for all $i$ with $1 \le i \le m$, then $w$ is said to be a gapped palindrome. Régnier [7] studied something similar to BP-factorizations, but in her paper she was concerned with borders of borders. See [8, 9] for results on factoring words into palindromes.
Example 1.
We use the centre dot to denote the separation between blocks in the BP-factorization of a word.
Consider the word abracadabra. It has the following BP-factorizations:

- abracadabra
- a·bracadabr·a
- abra·cad·abra
- a·br·acada·br·a
- a·br·a·cad·a·br·a

The last BP-factorization is of width 7, the greatest width; thus it is the largest BP-factorization of abracadabra.
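To make the definition concrete, here is a small recursive enumerator. It is our own sketch (the name `bp_factorizations` is hypothetical): a factorization either keeps the whole word as the central block $w_0$, or peels off a non-empty block $u$ with $w = u\,x\,u$ and recurses on $x$. Run on abracadabra, it finds five factorizations, the widest of width 7.

```python
def bp_factorizations(w: str) -> list[list[str]]:
    """Return every BP-factorization of w as a list of blocks, outermost first.

    Either w stays whole as the central block w_0, or a non-empty block u with
    w = u x u (so |u| <= |w|/2) is peeled off and x is factorized recursively.
    Exponential in the worst case; intended for short words only.
    """
    results = [[w]] if w else [[]]  # an empty centre contributes no block
    for length in range(1, len(w) // 2 + 1):
        u = w[:length]
        if w.endswith(u):
            for inner in bp_factorizations(w[length:len(w) - length]):
                results.append([u] + inner + [u])
    return results


for blocks in sorted(bp_factorizations("abracadabra"), key=len):
    print(f"width {len(blocks)}: " + "·".join(blocks))
```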
Let $w$ be a length-$n$ word, and suppose $w = w_m \cdots w_1 w_0 w_1 \cdots w_m$ is the largest BP-factorization of $w$. Goto et al. [3] showed that $w_i$ is the shortest border of $w_i \cdots w_1 w_0 w_1 \cdots w_i$ for all $i$ with $1 \le i \le m$. This means that we can compute the largest BP-factorization of $w$ by greedily "peeling off" the shortest border of the central factor until we reach an unbordered word or the empty word.
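The greedy procedure described above can be sketched in a few lines of Python. This is our own illustration based on the result of Goto et al., not their code; the names `shortest_border` and `largest_bp` are hypothetical, and the naive border test makes it quadratic per step rather than optimal.

```python
def shortest_border(w: str) -> int:
    """Length of the shortest border of w, or 0 if w is unbordered."""
    for length in range(1, len(w)):
        if w[:length] == w[-length:]:
            return length
    return 0


def largest_bp(w: str) -> list[str]:
    """Largest BP-factorization of w, obtained by repeatedly peeling off shortest borders."""
    outer = []
    core = w
    while core:
        length = shortest_border(core)
        if length == 0:  # core is unbordered: it becomes the central block w_0
            break
        outer.append(core[:length])
        # A shortest border never overlaps itself, so 2 * length <= len(core).
        core = core[length:len(core) - length]
    middle = [core] if core else []
    return outer + middle + outer[::-1]


print(" · ".join(largest_bp("abracadabra")))  # a · br · a · cad · a · br · a
```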
The rest of the paper is structured as follows. In Section 2 we give a recurrence for the number of length-$n$ words with largest BP-factorization of width $t$. In Section 3 we show that the expected width of the largest BP-factorization of a length-$n$ word tends to a constant. In Section 4 we consider smallest BP-factorizations, in the sense that one "peels off" the longest non-overlapping border. We say a border $u$ of a word $w$ is non-overlapping if $|u| \le |w|/2$; otherwise $u$ is overlapping. Finally, in Section 5 we present some results on words with a unique border and show that they are connected to words whose smallest and largest BP-factorizations are the same.
2 Counting largest BP-factorizations
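In this section, we prove a recurrence for the number $B_k(n,t)$ of length-$n$ words over a $k$-letter alphabet whose largest BP-factorization has width $t$. See Table 1 for sample values of $B_2(n,t)$ for small $n$ and $t$. For the following theorem, recall the definition of $u_k(n)$ from Equation 1.
Theorem 2.
Let , and be integers. Then
[TABLE]
where
[TABLE]
Proof.
Let $w$ be a length-$n$ word whose largest BP-factorization $w = w_m \cdots w_1 w_0 w_1 \cdots w_m$ is of width $t$. Clearly $t \le n$. Each block in a largest BP-factorization is unbordered: every $w_i$ with $i \ge 1$ is the shortest border of some central factor (and a shortest border has no border of its own), while $w_0$ is either empty or unbordered by construction. This immediately implies $B_k(n,1) = u_k(n)$ and, for even $n$, $B_k(n,2) = u_k(n/2)$.
Now we take care of the other cases.

- Suppose $n$ and $t$ are even, so that $w_0$ is empty. Let $i = |w_m|$. Then by removing both instances of $w_m$ from $w$, we get $w_{m-1} \cdots w_1 w_1 \cdots w_{m-1}$, which is a length-$(n-2i)$ word whose largest BP-factorization is of width $t-2$. This mapping is clearly reversible, since all blocks in a largest BP-factorization are unbordered, including $w_m$. Thus, summing over all possible $i$ and all length-$(n-2i)$ words with largest BP-factorization of width $t-2$, we have
  $$B_k(n,t) = \sum_{i=1}^{n/2} u_k(i)\, B_k(n-2i,\, t-2).$$
- Suppose $n$ is even and $t$ is odd. Let $i = |w_0|$. Then by removing $w_0$ from $w$, we get $w_m \cdots w_1 w_1 \cdots w_m$, which is a length-$(n-i)$ word whose largest BP-factorization is of width $t-1$. This mapping is reversible for the same reason as in the previous case. The word $w_m \cdots w_1 w_1 \cdots w_m$ is of even length, since each block $w_j$ with $1 \le j \le m$ occurs in it twice. Since $n$ is even and $n-i$ is even, $i$ must be even as well. Thus, summing over all possible even $i$ and all length-$(n-i)$ words with largest BP-factorization of width $t-1$, we have
  $$B_k(n,t) = \sum_{j=1}^{n/2} u_k(2j)\, B_k(n-2j,\, t-1).$$
- Suppose $n$ is odd and $t$ is even. Then $w_0$ is empty, so the length of $w = w_m \cdots w_1 w_1 \cdots w_m$ is $2(|w_1| + \cdots + |w_m|)$, which is even, a contradiction. Thus $B_k(n,t) = 0$.
- Suppose $n$ and $t$ are odd. Let $i = |w_0|$. Then by removing $w_0$ from $w$, we get $w_m \cdots w_1 w_1 \cdots w_m$, which is a length-$(n-i)$ word whose largest BP-factorization is of width $t-1$. This mapping is reversible for the same reasons as in the previous cases. Since $n$ is odd and $n-i$ is even (as shown in the previous cases), $i$ must be odd. Thus, summing over all possible odd $i$ and all length-$(n-i)$ words with largest BP-factorization of width $t-1$, we have
  $$B_k(n,t) = \sum_{j=0}^{(n-1)/2} u_k(2j+1)\, B_k(n-2j-1,\, t-1).$$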
∎
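The four cases of the proof combine into a memoized recurrence. The Python sketch below is our own reading of the proof, not the statement of Theorem 2 itself; the function names are hypothetical, and it assumes the convention that the empty word is the only word of width 0. For $k = 2$ it reproduces, for example, the rows $n = 10$, $11$ and $20$ of Table 1, and summing over $t$ recovers $k^n$, since every word contributes to exactly one width.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def unbordered(n: int, k: int) -> int:
    """u_k(n) via Nielsen's recurrence (n >= 1)."""
    if n == 1:
        return k
    if n % 2 == 0:
        return k * unbordered(n - 1, k) - unbordered(n // 2, k)
    return k * unbordered(n - 1, k)


@lru_cache(maxsize=None)
def width_count(n: int, t: int, k: int) -> int:
    """Length-n words over k letters whose largest BP-factorization has width t."""
    if t == 0:
        return 1 if n == 0 else 0  # assumed convention: only the empty word has width 0
    if t > n:
        return 0
    if t == 1:
        return unbordered(n, k)  # the whole word is a single unbordered block
    if t % 2 == 0:
        # Even width: strip both copies of the outermost block (unbordered, length i).
        # For odd n every term eventually reaches an odd-length word of width 0, giving 0.
        return sum(unbordered(i, k) * width_count(n - 2 * i, t - 2, k)
                   for i in range(1, n // 2 + 1))
    # Odd width t >= 3: strip the central block w_0, whose length i has the parity of n.
    return sum(unbordered(i, k) * width_count(n - i, t - 1, k)
               for i in range(2 - n % 2, n + 1, 2))


print([width_count(10, t, 2) for t in range(1, 11)])
```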
3 Expected width of largest BP-factorization
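In this section, we show that the expected width of the largest BP-factorization of a length-$n$ word over a $k$-letter alphabet tends to a constant as $n \to \infty$. Writing $E_k(n)$ for this expected width and $B_k(n,t)$ for the number of length-$n$ words over a $k$-letter alphabet whose largest BP-factorization has width $t$, the definition of expected value gives
$$E_k(n) = \frac{1}{k^n} \sum_{t=1}^{n} t \, B_k(n,t).$$
Table 2 shows the behaviour of this expected width as the alphabet size $k$ increases.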
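For small $n$, the expected width $E_k(n)$ of the largest BP-factorization of a length-$n$ word over a $k$-letter alphabet can be computed exactly by enumeration. This brute-force sketch is ours (hypothetical names), is exponential in $n$, and only illustrates the definition; it is not how the values in Table 2 were obtained.

```python
from fractions import Fraction
from itertools import product


def largest_bp_width(w: str) -> int:
    """Width of the largest BP-factorization, peeling off shortest borders greedily."""
    width = 0
    while w:
        border = next((i for i in range(1, len(w)) if w[:i] == w[-i:]), 0)
        if border == 0:
            return width + 1  # unbordered central block w_0
        width += 2
        w = w[border:len(w) - border]
    return width  # the central block is empty


def expected_width(n: int, k: int) -> Fraction:
    """Exact E_k(n), averaging the width over all k**n words (small n, k <= 10 only)."""
    alphabet = "0123456789"[:k]
    total = sum(largest_bp_width("".join(p)) for p in product(alphabet, repeat=n))
    return Fraction(total, k ** n)


for n in range(1, 13):
    print(n, float(expected_width(n, 2)))
```

Using `Fraction` keeps the averages exact, which makes comparisons with counts such as those in Table 1 straightforward.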
Lemma 3.
Let and be integers. Then
[TABLE]
Proof.
Let be a length- word whose largest BP-factorization is of width . Since is non-empty for every , we have that . So
[TABLE]
for all . ∎
Theorem 4.
The limit exists for all .
Proof.
Follows from the definition of , Lemma 3, and the direct comparison test for convergence. ∎
Interpreting as a power series in , we empirically observe that is approximately equal to
[TABLE]
We conjecture the following about .
Conjecture 5.
Let . Then
[TABLE]
where the sequence is A274199 in the On-Line Encyclopedia of Integer Sequences (OEIS) [10].
Cording et al. [11] proved that the expected length of the longest unbordered factor in a word is . Taking this into account, it is not surprising that the expected width of the largest BP-factorization of a word tends to a constant.
4 Smallest BP-factorization
A word $w$, seen as a single block, clearly satisfies the definition of a BP-factorization. Thus, taken literally, the smallest BP-factorization of every non-empty word is of width 1. But this is not very interesting, so we consider a different definition instead. A border $u$ of a word $w$ is non-overlapping if $|u| \le |w|/2$; otherwise $u$ is overlapping. We say that the smallest BP-factorization of a word $w$ is the BP-factorization $w = w_m \cdots w_1 w_0 w_1 \cdots w_m$ in which each $w_i$ with $1 \le i \le m$ is the longest non-overlapping border of $w_i \cdots w_1 w_0 w_1 \cdots w_i$, and $w_0$ is either empty or unbordered. For example, going back to Example 1, the smallest BP-factorization of abracadabra is abra·cad·abra and the smallest BP-factorization of is .
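Like the largest BP-factorization, the smallest one, as defined above, can be computed greedily, this time peeling off the longest non-overlapping border at each step. A minimal sketch (ours; `longest_nonoverlapping_border` and `smallest_bp` are hypothetical names):

```python
def longest_nonoverlapping_border(w: str) -> int:
    """Length of the longest border u of w with |u| <= |w|/2, or 0 if w is unbordered."""
    for length in range(len(w) // 2, 0, -1):
        if w[:length] == w[-length:]:
            return length
    return 0


def smallest_bp(w: str) -> list[str]:
    """Smallest BP-factorization of w: peel off longest non-overlapping borders."""
    outer = []
    core = w
    while core:
        length = longest_nonoverlapping_border(core)
        if length == 0:  # core is unbordered: it becomes the central block w_0
            break
        outer.append(core[:length])
        core = core[length:len(core) - length]
    return outer + ([core] if core else []) + outer[::-1]


print(" · ".join(smallest_bp("abracadabra")))  # abra · cad · abra
```

A bordered word always has at least one non-overlapping border (its shortest one), so the loop stops only at an unbordered or empty core.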
A natural question to ask is: what is the maximum possible width of the smallest BP-factorization of a length-$n$ word? Through empirical observation, we arrive at the following conjectures:

- We have for with and .
- We have for .
To calculate , two things are needed: an upper bound on , and words that witness the upper bound.
Theorem 6.
Let be an integer. Then for with and .
Proof.
Let be an integer. We start by proving lower bounds on . Suppose for some . Then the width of the smallest BP-factorization of
[TABLE]
is , so . To see this, notice that the smallest BP-factorization of is , and therefore is of width . Suppose for some with . Then one can take and insert either [math], , , , , , or to the middle of the word to get the desired length.
Now we prove upper bounds on . Let be a positive integer. Let be a length- word whose smallest BP-factorization is of width . One can readily verify that , , , , , and through exhaustive search of all binary words of length . Suppose , so . Then we can write where . It is easy to show that by checking that all binary words of length do not admit a smallest BP-factorization of width . In the worst case, we can peel off prefixes and suffixes of length while accounting for the blocks they add to the BP-factorization until we hit the middle core of length . Thus, we have where is the width of the smallest BP-factorization of the middle core, which is of length . We have already computed for , so the upper bounds follow. ∎
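The exhaustive searches used in this proof are easy to reproduce for small lengths. The sketch below is ours, not the authors' code, and its function names are hypothetical: for binary words it computes the maximum width of the smallest BP-factorization by trying all $2^n$ words, so it is exponential in $n$.

```python
from itertools import product


def smallest_bp_width(w: str) -> int:
    """Width of the smallest BP-factorization (longest non-overlapping borders first)."""
    width = 0
    while w:
        border = next((i for i in range(len(w) // 2, 0, -1) if w[:i] == w[-i:]), 0)
        if border == 0:
            return width + 1  # unbordered central block w_0
        width += 2
        w = w[border:len(w) - border]
    return width  # the central block is empty


def max_smallest_width(n: int, alphabet: str = "ab") -> int:
    """Maximum smallest-BP-factorization width over all length-n words (exhaustive)."""
    return max(smallest_bp_width("".join(p)) for p in product(alphabet, repeat=n))


print([max_smallest_width(n) for n in range(1, 13)])
```

For instance, abbaba peels off a and then b, leaving the unbordered core ba, which gives width 5 at length 6.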
Theorem 7.
Let and be integers. Then .
Proof.
Clearly . We prove . If is divisible by , then consider the word . If is not divisible by , then take and insert either [math], , , , or in the middle of the word. When calculating the smallest BP-factorization of the resulting words, it is easy to see that at each step we are removing a border of length . Thus, their smallest BP-factorization is of width . ∎
5 Equal smallest and largest BP-factorizations
Recall from Example 1 that abracadabra has distinct smallest and largest BP-factorizations, namely abra·cad·abra and a·br·a·cad·a·br·a. However, the word has the same smallest and largest BP-factorizations, namely . Under what conditions are the smallest and largest BP-factorizations of a word the same? Looking at unique borders seems like a good place to start, since the shortest border and the longest non-overlapping border coincide when a word has a unique border. However, the converse is not true; just consider the previous example . The shortest border and longest non-overlapping border are both , but is not a unique border of .
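Both halves of this observation can be checked mechanically. In the sketch below (ours; helper names hypothetical), the word ababa is a counterexample to the converse that we found by hand; it is not necessarily the example the authors had in mind.

```python
from itertools import product


def borders(w: str) -> list[int]:
    """Lengths of all borders of w, shortest first."""
    return [length for length in range(1, len(w)) if w[:length] == w[-length:]]


def shortest_is_longest_nonoverlapping(w: str) -> bool:
    """True if w is bordered and its shortest border is also its longest non-overlapping border."""
    lengths = borders(w)
    if not lengths:
        return False
    # The shortest border never overlaps itself, so this list is non-empty.
    nonoverlapping = [length for length in lengths if 2 * length <= len(w)]
    return lengths[0] == nonoverlapping[-1]


# A unique border is automatically both (checked here for binary words up to length 12).
for n in range(2, 13):
    for p in product("ab", repeat=n):
        w = "".join(p)
        if len(borders(w)) == 1:
            assert shortest_is_longest_nonoverlapping(w)

# The converse fails: ababa has the two borders a and aba, but a is both its
# shortest and its longest non-overlapping border.
print(borders("ababa"), shortest_is_longest_nonoverlapping("ababa"))  # [1, 3] True
```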
In Theorem 8 we characterize all words whose smallest and largest BP-factorizations coincide.
Theorem 8.
Let be integers. Let be a word with smallest BP-factorization and largest BP-factorization . Then and for all , if and only if for all , , we have that is the unique border of and for we have that either

1. is the unique border of , or
2. where is the unique border of .
Proof.
Let be an integer such that . Let . Since is both the shortest border and longest non-overlapping border of (i.e., ), we have that has exactly one border of length . Thus, either is the unique border of , or has a border of length . If is the unique border of , then we are done. So suppose that has a border of length . Let be the shortest such border. We have that is both a prefix and suffix of . In fact, must be the unique border of . Otherwise we contradict the minimality of , or the assumption that is both the shortest border and longest non-overlapping border of . Since is unbordered, it cannot overlap itself in and . So we can write for some word where , or such that . If , then we see that is a suffix of and is a prefix of , implying that is a new smaller border of . This either contradicts the assumption that is the shortest border of length , or the assumption that has exactly one border of length . Thus, we have that . The shortest border and longest non-overlapping border of must be , by assumption. Additionally, is unbordered, so is of width and . This implies that and .
Let be an integer such that . We omit the case when , since proving for all other is sufficient. Since is the unique border of , we have that the shortest border and longest non-overlapping border of is . In other words, we have that . Suppose and where is the unique border of . Since is the unique border of , it is also the shortest border of . Additionally, the next longest border of is , which is overlapping. So is also the longest non-overlapping border of . Thus . ∎
Based on this characterization alone, finding a recurrence for the number of words whose smallest and largest BP-factorizations coincide seems hard. So we turn to a different, related problem: counting the number of words with a unique border.
5.1 Unique borders
Harju and Nowotka [12] counted the number of length- words over with a unique border, and the number of length- words over with a length- unique border. However, through personal communication with the authors, a small error in one of the proofs leading up to their formula for was discovered. Thus, the formula for as stated in their paper is incorrect. In this section, we present the correct recurrence for the number of length- words with a length- unique border. We also show that the probability a length- word has a unique border tends to a constant. See A334600 in the OEIS [10] for the sequence .
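Suppose $w$ is a word with a unique border $u$. Then $u$ must be unbordered, and $|u|$ must not exceed $|w|/2$. If either of these were not true, then $w$ would have more than one border: a border of $u$ is also a border of $w$, and if $|u| > |w|/2$, the overlapping occurrences of $u$ as prefix and suffix give $w$ a shorter border. By combining these ideas, we get Theorem 9 and Theorem 10.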
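The two necessary conditions, and counts of words by the length of their unique border, can be explored by brute force. This is our own sketch (hypothetical names), useful for cross-checking small values of the recurrence in Theorem 9; it is exponential in $n$ and not the method of the paper.

```python
from itertools import product


def borders(w: str) -> list[int]:
    """Lengths of all borders of w, shortest first."""
    return [length for length in range(1, len(w)) if w[:length] == w[-length:]]


def unique_border_counts(n: int, k: int) -> dict[int, int]:
    """Map l -> number of length-n words over k letters whose only border has length l."""
    counts: dict[int, int] = {}
    for p in product("0123456789"[:k], repeat=n):
        lengths = borders("".join(p))
        if len(lengths) == 1:
            counts[lengths[0]] = counts.get(lengths[0], 0) + 1
    return counts


# For every binary word up to length 12 with a unique border u,
# u is unbordered and |u| <= |w|/2.
for n in range(2, 13):
    for p in product("ab", repeat=n):
        w = "".join(p)
        lengths = borders(w)
        if len(lengths) == 1:
            u = w[:lengths[0]]
            assert not borders(u) and 2 * len(u) <= n

print(unique_border_counts(4, 2))  # {1: 6, 2: 2}
```

When $|w| = 2\ell$, a word $uu$ has the unique border $u$ exactly when $u$ is unbordered, so for example the count for $n = 10$, $\ell = 5$ equals $u_2(5) = 12$.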
Theorem 9.
Let be integers. Then the number of length- words with a unique length- border satisfies the recurrence
[TABLE]
Proof.
Let be a length- word with a unique length- border . Since is the unique border of , it is unbordered. Thus, we can write for some (possibly empty) word . For , we have that since is unbordered and thus cannot overlap itself in .
Suppose . Let denote the number of length- words that have a length- unbordered border and have a border of length . Clearly . Suppose has another border of length . Furthermore, suppose that there is no other border with . Then is the unique border of . Since is the shortest border, we have . But we could possibly have . The only possible way for to exceed is if for some (possibly empty) word . But this is only possible if is even; otherwise we cannot place in the centre of . When is odd, we compute by summing over all possibilities for (i.e., ) and the middle part of (i.e., where ). This gives us the recurrence,
[TABLE]
When is even, we compute in the same fashion, except we also include the case where . This gives us the recurrence,
[TABLE]
∎
Theorem 10.
Let be an integer. Then the number of length- words with a unique border is
[TABLE]
5.2 Limiting values
We show that the probability that a random word of length $n$ has a unique border tends to a constant as $n \to \infty$. Table 3 shows the behaviour of this probability as the alphabet size $k$ increases.
Let be the probability that a random word of length has a unique border. Then
[TABLE]
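For small $n$, this probability can be computed exactly by enumeration, which is handy for sanity-checking values obtained from Theorem 10. A brute-force sketch (ours; names hypothetical), exponential in $n$:

```python
from fractions import Fraction
from itertools import product


def has_unique_border(w: str) -> bool:
    """True if w has exactly one border."""
    return sum(w[:i] == w[-i:] for i in range(1, len(w))) == 1


def unique_border_probability(n: int, k: int) -> Fraction:
    """Exact probability that a uniformly random length-n word over k letters has a unique border."""
    alphabet = "0123456789"[:k]
    hits = sum(has_unique_border("".join(p)) for p in product(alphabet, repeat=n))
    return Fraction(hits, k ** n)


for n in range(2, 13):
    print(n, float(unique_border_probability(n, 2)))
```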
Lemma 11.
Let and be integers. Then
[TABLE]
Proof.
Let be a length- word. Suppose has a unique border of length . Since , we can write for some words and where . But this means that , and the lemma follows. ∎
Theorem 12.
Let be an integer. Then the limit exists.
Proof.
Follows from the definition of , Lemma 11, and the direct comparison test for convergence. ∎
- [1] P. T. Nielsen. A note on bifix-free sequences. IEEE Trans. Inform. Theory, IT-19:704–706, 1973.
- [2] The 2015 British Informatics Olympiad (Round 1, Question 1). https://olympiad.org.uk/2015/index.html.
- [3] K. Goto, I. Tomohiro, H. Bannai, and S. Inenaga. Block palindromes: A new generalization of palindromes. In T. Gagie, A. Moffat, G. Navarro, and E. Cuadros-Vargas, editors, String Processing and Information Retrieval, volume 11147 of Lecture Notes in Computer Science, pages 183–190, Cham, 2018. Springer International Publishing.
- [4] K. Mahalingam, A. Maity, P. Pandoh, and R. Raghavan. Block reversal on finite words. Theoret. Comput. Sci., 894:135–151, 2021.
- [5] K. Mahalingam, A. Maity, and P. Pandoh. Rich words in the block reversal of a word, 2023. arXiv:2302.02109.
- [6] R. Kolpakov and G. Kucherov. Searching for gapped palindromes. Theoret. Comput. Sci., 410(51):5365–5373, 2009.
- [7] M. Régnier. Enumeration of bordered words, le langage de la vache-qui-rit. RAIRO-Theor. Inf. Appl., 26(4):303–317, 1992.
- [8] A. E. Frid, S. Puzynina, and L. Q. Zamboni. On palindromic factorization of words. Adv. in Appl. Math., 50(5):737–748, 2013.
