In Praise of Sequence (Co-)Algebra and its implementation in Haskell

Kieran Clenaghan

arXiv:1812.05878·math.CO·March 1, 2019

TL;DR

This paper introduces sequence algebra, demonstrating its foundational role across mathematics and computer science, and provides a comprehensive Haskell implementation to facilitate understanding and experimentation.

Contribution

It consolidates various sequence algebra concepts into one accessible overview and offers a complete Haskell implementation as a teaching and experimentation tool.

Findings

1. Sequence algebra is foundational across multiple disciplines.

2. A complete Haskell implementation is provided for experimentation.

3. The paper invites broader mathematical engagement with sequence algebra.

Abstract

What is Sequence Algebra? This is a question that any teacher or student of mathematics or computer science can engage with. Sequences are in Calculus, Combinatorics, Statistics and Computation. They are foundational, a step up from number arithmetic. Sequence operations are easy to implement from scratch (in Haskell) and afford a wide variety of testing and experimentation. When bits and pieces of sequence algebra are pulled together from the literature, there emerges a claim for status as a substantial pre-analysis topic. Here we set the stage by bringing together a variety of sequence algebra concepts for the first time in one paper. This provides a novel economical overview, intended to invite a broad mathematical audience to cast an eye over the subject. A complete, yet succinct, basic implementation of sequence operations is presented, ready to play with. The implementation also…

Tables (7)

Table 1: differentiation rules

\begin{array}{rlcll} 1. & D\,a & = & 0 & \text{constant} \\ 2. & D\,x & = & 1 & \text{variable} \\ 3. & D\,(f+g) & = & D\,f+D\,g & \text{sum} \\ 4. & D\,(f\,g) & = & (D\,f)\,g+f\,(D\,g) & \text{product} \\ 5. & D\,(f^{-1}) & = & (-D\,f)/f^{2} & \text{reciprocal} \\ 6. & D\,(f/g) & = & ((D\,f)\,g-f\,(D\,g))/g^{2} & \text{quotient} \\ 7. & D\,(f^{\circ}) & = & ((D\,f)\circ f^{\circ})^{-1} & \text{converse} \\ 8. & D\,(f^{n}) & = & n\,(D\,f)\,f^{n-1} & \text{power} \\ 9. & D\,(a_{n}x^{n}) & = & n\,a_{n}x^{n-1} & \text{monomial} \\ 10. & [x^{n}]\,D & = & (n+1)\,[x^{n+1}] & \text{power-series} \\ 11. & D\,(f\circ g) & = & ((D\,f)\circ g)\,(D\,g) & \text{composition} \\ 12. & [x^{n}]\,f & = & \frac{1}{n!}\,[x^{0}]\,D^{n}f & \text{Maclaurin} \end{array}

Table 2: integration rules

\begin{array}{rlcll} 1. & \int a_{n}x^{n} & = & \frac{a_{n}}{n+1}\,x^{n+1} & \text{monomial} \\ 2. & \int (a\,f+b\,g) & = & a\int f+b\int g & \text{linear} \\ 3. & (n+1)\,[x^{n+1}]\int & = & [x^{n}] & \text{power series} \\ 4. & \int ((D\,f)\circ g)\,a\,D\,g & = & a\,f\circ g+c & \text{composition} \\ 5. & \int f\,D\,g & = & f\,g-\int (D\,f)\,g+c & \text{product (1)} \\ 6. & \int f\,g & = & f\int g-\int ((D\,f)\int g) & \text{product (2)} \\ 7. & \int (D\,f)\int (D\,h)+\int D\,(f\,h) & = & h\int D\,f+f\int D\,h & \text{Baxter} \end{array}

Table 3: Rules relating to log and exp

\begin{array}{lll} \exp\circ(\log f)=f & \text{exp converse} & \\ \mbox{\rm lgn}\circ f=\mbox{\rm lgn}\circ g\ \Rightarrow\ f=g & \text{lgn cancellation} & \\ \exp\circ f=\exp\circ g\ \Rightarrow\ f=g & \text{exp cancellation} & \\ \log f=\log g\ \Rightarrow\ f=g & \text{log cancellation} & \\ D\,(\log g)=\frac{D\,g}{g} & \text{log derivative} & \\ \mbox{\rm lgn}\circ f=0\ \Leftrightarrow\ f=0 & \text{zero lgn} & \\ \log g=0\ \Leftrightarrow\ g=1 & \text{zero log} & \\ \log(f\,g)=\log f+\log g & \text{log product} & \\ \log f^{r}=r\log f & \text{log rational power} & (r\in\mathbb{Q}) \\ \exp\circ(f+g)=(\exp\circ f)\,(\exp\circ g) & \text{exp sum} & \\ \exp^{n}\circ f=\exp\circ n\,f & \text{power exp} & \\ f^{r}=\exp\circ(r\log f) & \text{general power defn} & (r\in F) \\ g^{r}\circ h=(g\circ h)^{r} & \text{general distributivity} & (r\in F) \\ f^{r}f^{s}=f^{r+s} & \text{law of exponents} & (r,s\in F) \\ D\,f^{r}=r\,f^{r-1}\,D\,f & \text{differential $F$-power} & (r\in F) \end{array}

Table 4: Some counting sequences

\begin{array}{lcl} \mbox{\sf emptySet} & = & 1 \\ \mbox{\sf singletonSet} & = & x \\ \mbox{\sf singletonList} & = & x \\ \mbox{\sf nonEmptyList} & = & \mbox{\sf list}-1 \\ \mbox{\sf pluralList} & = & \mbox{\sf list}-\mbox{\sf singletonList}-1 \\ \mbox{\sf ordPair} & = & x^{2} \\ \mbox{\sf fibonacci} & = & \mbox{\sf list}\circ(\mbox{\sf singletonList}+\mbox{\sf ordPair}) \\ \mbox{\sf cycle} & = & \log x^{*} \\ \mbox{\sf oneCycle} & = & x \\ \mbox{\sf oneOrTwoCycle} & = & \mbox{\sf oneCycle}+x^{2}/2 \\ \mbox{\sf involution} & = & \mbox{\sf set}\circ\mbox{\sf oneOrTwoCycle} \\ \mbox{\sf nonLoopCycle} & = & \mbox{\sf cycle}-\mbox{\sf singletonSet} \\ \mbox{\sf derangement} & = & \mbox{\sf set}\circ\mbox{\sf nonLoopCycle} \\ \mbox{\sf permutation} & = & \mbox{\sf derangement}*\mbox{\sf set} \\ \mbox{\sf nonEmptySet} & = & \mbox{\sf set}-\mbox{\sf empty} \\ \mbox{\sf pluralSet} & = & \mbox{\sf nonEmptySet}-\mbox{\sf singletonSet} \\ \mbox{\sf setPartition} & = & \mbox{\sf set}\circ\mbox{\sf nonEmptySet} \\ \mbox{\sf oddNumberOfParts} & = & \sinh\circ\mbox{\sf nonEmptySet} \\ \mbox{\sf evenSizedParts} & = & \mbox{\sf set}\circ(\cosh-1) \\ \mbox{\sf catalanTree} & = & x\,(\mbox{\sf list}\circ\mbox{\sf catalanTree}) \\ \mbox{\sf cayleyTree} & = & x\,(\mbox{\sf set}\circ\mbox{\sf cayleyTree}) \\ \mbox{\sf connectedAcyclicGraph} & = & \mbox{\sf cayleyTree}-\frac{1}{2}\,\mbox{\sf cayleyTree}^{2} \\ \mbox{\sf acyclicGraph} & = & \mbox{\sf set}\circ\mbox{\sf connectedAcyclicGraph} \\ \mbox{\sf motzkinTree} & = & x\,(1+\mbox{\sf motzkinTree}+\mbox{\sf motzkinTree}^{2}) \\ \mbox{\sf hipparchusSchroeder} & = & (1+x-\sqrt{1-6x+x^{2}})/4 \\ \mbox{\sf largeSchroeder} & = & 2*\mbox{\sf hipparchusSchroeder}/x-1 \\ \mbox{\sf connectedMapping} & = & \mbox{\sf cycle}\circ\mbox{\sf cayleyTree} \\ \mbox{\sf mapping} & = & \mbox{\sf set}\circ\mbox{\sf connectedMapping} \\ \mbox{\sf fixedPointFree} & = & \mbox{\sf set}\circ\mbox{\sf nonLoopCycle}\circ\mbox{\sf cayleyTree} \\ \mbox{\sf idempotent} & = & \mbox{\sf set}\circ(\mbox{\sf oneCycle}*\mbox{\sf set}) \\ \mbox{\sf partialMapping} & = & \mbox{\sf mapping}*(\mbox{\sf set}\circ\mbox{\sf cayleyTree}) \\ \mbox{\sf surjection} & = & \mbox{\sf list}\circ\mbox{\sf nonEmptySet} \\ \mbox{\sf zigzag} & = & 2\,(\tan+\sec) \\ \mbox{\sf bernoulli} & = & x/(\exp-1) \end{array}

Table 5: Some bivariate sequences

\begin{array}{lcl} \mbox{\sf pascal} & = & (u+z)^{*} \\ \mbox{\sf intComposition} & = & \mbox{\sf list}\circ(u*(\mbox{\sf nonEmptyList}\circ z)) \\ \mbox{\sf schroeder} & = & z+u*(\mbox{\sf pluralList}\circ\mbox{\sf schroeder}) \\ \mbox{\sf catalanLeaves} & = & u*z+z*(\mbox{\sf nonEmptyList}\circ\mbox{\sf catalanLeaves}) \\ \mbox{\sf cayleyLeaves} & = & u*z+z*(\mbox{\sf nonEmptySet}\circ\mbox{\sf cayleyLeaves}) \\ \mbox{\sf ebinom} & = & \mbox{\sf set}\circ(z+u\,z) \\ \mbox{\sf cycles} & = & \mbox{\sf set}\circ(u*(\mbox{\sf cycle}\circ z)) \\ \mbox{\sf parts} & = & \mbox{\sf set}\circ(u*(\mbox{\sf nonEmptySet}\circ z)) \\ \mbox{\sf permFixedPts} & = & (\mbox{\sf derangement}\circ z)*(\mbox{\sf set}\circ u\,z) \\ \mbox{\sf zigzags} & = & (\sin\circ u+\cos\circ u)/\cos\circ(u+z) \\ \mbox{\sf ascents} & = & \mbox{\sf list}\circ(z+(\mbox{\sf pluralSet}\circ(u\,z-z))/(u-1)) \\ \mbox{\sf valleys} & = & \frac{\sqrt{1-u}}{\sqrt{1-u}-\tanh\circ(z\sqrt{1-u})} \\ \mbox{\sf powerSums} & = & \frac{\exp\circ u\,z-1}{\exp\circ z-1} \\ \mbox{\sf bernoulliPoly} & = & \frac{z\,\exp\circ u\,z}{\exp\circ z-1} \\ \mbox{\sf legendre} & = & (1-2\,u\,z+z^{2})^{-1/2} \\ \mbox{\sf chebyshev} & = & \frac{1-u\,z}{z^{2}-2\,u\,z+1} \\ \mbox{\sf laguerre} & = & \frac{1}{1-z}\,\exp\circ\frac{-u\,z}{1-z} \\ \mbox{\sf hermite} & = & \exp\circ(2\,u\,z-z^{2}) \\ \mbox{\sf meixner} & = & (1+z^{2})^{-1/2}\,\exp\circ(u\,\arctan\circ z) \end{array}

Table 6: $\Delta$-$\Sigma$ rules

\begin{array}{rlcll} 1. & \Delta\,(s\odot t) & = & s\odot\Delta\,t+\Delta\,s\odot t^{\prime} & \text{$\Delta$-product (1)} \\ 2. & \Delta\,(s\odot t) & = & \Delta\,s\odot t+s\odot\Delta\,t+\Delta\,s\odot\Delta\,t & \text{$\Delta$-product (2)} \\ 3. & \Sigma\,(s\odot\Delta\,t) & = & s\odot t-\Sigma\,(\Delta\,s\odot t^{\prime})-(s\odot t)_{0}^{\bullet} & \text{$\Sigma$-product (1)} \\ 4. & \Sigma\,(s^{\prime}\odot v) & = & s\odot(\Sigma\,v)-\Sigma\,(\Delta\,s\odot\Sigma\,v) & \text{$\Sigma$-product (2)} \\ 5. & \Sigma\,\Delta\,s\odot\Sigma\,\Delta\,u+\Sigma\,\Delta\,(s\odot u) & = & (\Sigma\,\Delta\,s)\odot u+s\odot(\Sigma\,\Delta\,u) & \text{$\Sigma$-Baxter rule} \end{array}

Table 7: Some core sequences

\begin{array}{ll} \text{Haskell definition} & \text{defining differential equation} \\ \verb|lgnx  = integ (1/(1+x))| & D\,\mbox{\rm lgn}=1/(1+x);\ \mbox{\rm lgn}_{0}=0 \\ \verb|sinx  = integ cosx| & D\,\sin=\cos;\ \sin_{0}=0 \\ \verb|cosx  = 1-integ sinx| & D\,\cos=-\sin;\ \cos_{0}=1 \\ \verb|tanx  = integ (1+tanx^2)| & D\,\tan=1+\tan^{2};\ \tan_{0}=0 \\ \verb|secx  = 1+integ (secx * tanx)| & D\,\sec=\sec\,\tan;\ \sec_{0}=1 \\ \verb|coshx = 1+integ sinhx| & D\,\cosh=\sinh;\ \cosh_{0}=1 \\ \verb|sinhx = integ coshx| & D\,\sinh=\cosh;\ \sinh_{0}=0 \\ \verb|tanhx = integ (1-tanhx^2)| & D\,\tanh=1-\tanh^{2};\ \tanh_{0}=0 \\ \verb|gdx   = integ (1/coshx)| & D\,\mbox{\rm gd}=1/\cosh;\ \mbox{\rm gd}_{0}=0 \end{array}

Equations (197)

f=\sum_{i=0}^{\infty}f_{i}x^{i}=\sum_{i}f_{i}x^{i}=[f_{0},f_{1},f_{2},\dots]

[x^{n}](f+g)=[x^{n}]f+[x^{n}]g;\qquad[x^{n}]cf=c[x^{n}]f\qquad\mbox{($c$ is a constant)}

[x^{n}]fg=\sum_{k=0}^{n}f_{k}g_{n-k}

[x^{0}]xf=0;\qquad[x^{n}]xf=x_{1}f_{n-1}=f_{n-1},\ \mbox{for }n>0

fg=(f_{0}+xf^{\prime})(g_{0}+xg^{\prime})=f_{0}g_{0}+f_{0}xg^{\prime}+xf^{\prime}g=f_{0}g_{0}+x(f_{0}g^{\prime}+f^{\prime}g)

f=f_{0}+\int D\,f\qquad D\,(\int f)=f

D\,f\circ g

(D\,f\circ g)\,D\,g

[x^{n}](D\,f\circ g)\,D\,g

[x^{n+1}]\,\mbox{\rm lgn}

=

g^{r}\circ h

(\cos+i\sin)^{n}=(\exp\circ ix)^{n}=\exp\circ ix\circ nx=(\cos+i\sin)\circ nx=\cos\circ nx+i(\sin\circ nx)

[x^{n}]fg=\sum_{k=0}^{n}\frac{f_{k}g_{n-k}}{k!(n-k)!}=\sum_{k=0}^{n}\binom{n}{k}f_{k}g_{n-k}/n!

\mbox{\sf perm}=\mbox{\sf set}\circ\mbox{\sf cycle}

\mbox{\sf cycle}=\sum_{n>0}(n-1)!\frac{x^{n}}{n!}=\sum_{n>0}\frac{1}{n}x^{n}=-\mbox{\rm lgn}\circ(-x)=-\log(1-x)=\log x^{*}

\mbox{\sf schroeder}(z,u)=z+u*(\mbox{\sf pluralList}\circ\mbox{\sf schroeder}(z,u))

\begin{array}{rcl|l}(st)^{\prime}&=&s^{\prime}t+s_{0}t^{\prime}&(st)_{0}=s_{0}t_{0}\\ (s\otimes t)^{\prime}&=&s^{\prime}\otimes t+s\otimes t^{\prime}&(s\otimes t)_{0}=s_{0}t_{0}\end{array}

D^{n}(fg)=\sum_{k=0}^{n}\binom{n}{k}(D^{k}f)D^{n-k}g;\qquad E^{n}(s\otimes t)=\sum_{k=0}^{n}\binom{n}{k}E^{k}s\otimes E^{n-k}t

[x^{n}](s\otimes t)=\sum_{k=0}^{n}\binom{n}{k}([x^{0}]E^{k}s)\otimes([x^{0}]E^{n-k}t)=\sum_{k=0}^{n}\binom{n}{k}s_{k}t_{n-k}

0=1^{\prime}=(s\otimes s^{-1_{\otimes}})^{\prime}=s^{\prime}\otimes s^{-1_{\otimes}}+s\otimes(s^{-1_{\otimes}})^{\prime};\qquad(s^{-1_{\otimes}})^{\prime}=-s^{\prime}\otimes s^{-1_{\otimes}}\otimes s^{-1_{\otimes}}

[x^{n}]x^{\otimes}=[x^{n}]x\otimes x^{\otimes}=\sum_{k=0}^{n}\binom{n}{k}[x^{k}]x\,[x^{n-k}]x^{\otimes}=\binom{n}{1}[x^{n-1}]x^{\otimes}=n[x^{n-1}]x^{\otimes}

x^{\otimes}=1+x\,x^{\otimes}+x^{2}D\,x^{\otimes}

(x^{\otimes})^{\prime}=((1-x)^{-1_{\otimes}})^{\prime}=(1-x)^{-2_{\otimes}};\qquad x^{\otimes}=1+x(1-x)^{-2_{\otimes}}

[x^{k}](1+bx)^{r}=\frac{1}{k!}[x^{0}]D^{k}(1+bx)^{r}=\frac{1}{k!}[x^{0}]r^{\underline{k}}b^{k}(1+bx)^{r-k}=b^{k}\binom{r}{k}

[x^{n}]g

[x^{n}]c

[x^{n}]t

[x^{n}]g=\frac{1}{n}[x^{-1}]D\,h\,f^{-n};\qquad[x^{0}]g=h_{0}

\Delta\Sigma s=s\Leftrightarrow(\Sigma s)^{\prime}-\Sigma s=s\Leftrightarrow(\Sigma s)^{\prime}=\Sigma s+s\Leftrightarrow\Sigma s=0+x(\Sigma s+s)\Leftrightarrow\Sigma s=\frac{xs}{1-x}

[x^{n}]\Sigma\Delta s=[x^{n}]x\,x^{*}\Delta s=[x^{n-1}]x^{*}\Delta s=\sum_{i=0}^{n-1}([x^{i+1}]s-[x^{i}]s)=[x^{n}]s-[x^{0}]s

E^{n}=(1+\Delta)^{n}

[x^{n}]s=(E^{n}s)_{0}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsLogic, programming, and type systems · Computability, Logic, AI Algorithms · Numerical Methods and Algorithms

Full text

Abstract

What is Sequence Algebra? This is a question that any teacher or student of mathematics or computer science can engage with. Sequences are in Calculus, Combinatorics, Statistics and Computation. They are foundational, a step up from number arithmetic. Sequence operations are easy to implement from scratch (in Haskell) and afford a wide variety of testing and experimentation. When bits and pieces of sequence algebra are pulled together from the literature, there emerges a claim for status as a substantial pre-analysis topic. Here we set the stage by bringing together a variety of sequence algebra concepts for the first time in one paper. This provides a novel economical overview, intended to invite a broad mathematical audience to cast an eye over the subject. A complete, yet succinct, basic implementation of sequence operations is presented, ready to play with. The implementation also serves as a benchmark for introducing Haskell by mathematical example.

1 Introduction

Consider these titles: Formal Power Series [64], Power Series, Power Serious [57], A Coinductive Calculus of Streams [72], and Concrete Stream Calculus [39]. The mixing of the classical and the modern in these papers is stimulating and suggests a re-telling of the elementary theory and application of sequences. Casting our net wider than the citations in those four papers brings up a number of corroborating works, including [85, 8, 83, 84], in which the authors call attention to the intrinsic qualities and utility of an elementary calculus or algebra of sequences. We do more than re-advertise this work: we endeavour to tease out the common ground, emphasising economy of statement and notation, whilst embracing variety of approach.

Our aim is to attract those who are less well acquainted with sequence work, or those who are unfamiliar with Haskell, or both, so that others can enjoy “messing about”, as Hayman [34] might put it, with sequences and their implementation. Elementary sequence algebra provides a good answer to the question, ‘What is the smallest coherent chunk of mathematics to set undergraduates to implement, from scratch, so that they get the greatest reward?’.

First we must say that sequence algebra is an umbrella title for algebraic manipulations of finite and infinite sequences, $p=[p_{0},p_{1},\ldots,p_{n}],\ f=[f_{0},f_{1},\ldots]$ , over some given element set, $F$ . It encompasses, prominently, formal power series algebra, but is not restricted to it. A finite sequence, viewed as a sequence of coefficients for powers of $x$ , can be expressed as a formal polynomial. Thus $[0,1]=0x^{0}+1x^{1}=x$ , or we may say that $x$ is implemented by $[0,1]$ . Similarly, $3x^{2}-x+4=[4,-1,3]$ . A finite sequence can also be interpreted as an infinite sequence by appending an infinite sequence of zeros. An infinite sequence can be expressed as a formal power series:

$$f=\sum_{i=0}^{\infty}f_{i}x^{i}=\sum_{i}f_{i}x^{i}=[f_{0},f_{1},f_{2},\dots]$$

We write $f_{n}$ or $[x^{n}]f$ for the element at position $n$ , called the $n$ th term of $f$ ; it is the coefficient of $x^{n}$ in the power series view. The zeroth term can also be written $f(0)$ , but otherwise the notation $f(n)$ is reserved for function application: let $p=3x^{2}+4$ , then $p_{2}=3$ , but $p(2)=16$ ; contrast $p_{0}=p(0)=4$ . This example reveals that the symbol $p$ is overloaded: it stands for a function of type ${\mathbb{N}}\rightarrow F$ and also for another of type $F\rightarrow F$ , and we must be careful to distinguish them.
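The two readings of $p$ can be kept apart in code. The following is a minimal sketch, assuming a polynomial is stored as its coefficient list with the lowest power first; the names `coeff` and `apply` are illustrative and are not taken from the implementation in section 4.

```haskell
-- The polynomial 3x^2 + 4 as a coefficient list, lowest power first.
p :: [Integer]
p = [4, 0, 3]

-- Sequence view (N -> F): coefficient extraction [x^n] f.
-- Positions past the end of a finite list are read as zero.
coeff :: Int -> [Integer] -> Integer
coeff n f = case drop n f of
  (c:_) -> c
  []    -> 0

-- Function view (F -> F): evaluate f at a by Horner's rule.
apply :: [Integer] -> Integer -> Integer
apply f a = foldr (\c acc -> c + a * acc) 0 f
```

Here `coeff 2 p` is the term $p_{2}$ and `apply p 2` is the value $p(2)$ ; the two views agree only at zero.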

It seems that in spite of Niven’s paper [64] receiving the Lester R Ford award, the algebra of formal power series (FPS) has not entered the standard introductory university texts in any substantial way. Niven’s paper has, however, been a key reference for the first chapters of both [36, 31], neither of which is standard undergraduate fare. In general, we find that elements of FPS/sequence algebra appear throughout a vast literature, but the algebra itself is treated minimally, perfunctorily, or is taken for granted, as might be expected in research work [21] and specialist texts [61, 2]. It also comes under generatingfunctionology [86, 32]. The term “generating function” has proven to be a bit awkward, because of the “function” part. Notice how many times Wilf [86] and others have to remind readers when convergence is not an issue. So, where some might use “generating function”, we use the plain “sequence expression”. The term “generating function” is more applicable when an analytic function interpretation is intended [19, Ch. VII]. However, we freely use the standard names of core analytic functions for their Taylor sequences, but rather than, say, $\exp(x)$ , we use just $\exp$ for the sequence.

The extent of material that fits into elementary sequence algebra is perhaps under-appreciated. Our goal is to raise appreciation through a modest “survey” of examples, presented in section 3. Various notations and proof-styles appear in the literature, and an effort has been made to be inclusive and to harmonise. The one-word term “sequence” is often preferred to the three-word “formal power series”, but both have their merits, the latter being preferred in the multivariate case, and when formal variable substitution is involved.

The sequence algebra we exhibit is on the same elementary level as Niven’s paper [64]. Much detail has to be omitted so that we can cover more examples. Enough detail is included to convey the foundational concreteness of the topic, and omitted detail is in the literature. The Haskell implementation is given in full in section 4 so that the reader can be in no doubt about how succinct it is, and can type it all up (or download it), and “own” it. The more mathematically-inclined reader may dwell on section 3, and reflect on the potential of a sequence algebra topic in the mathematical curriculum. The more programming-inclined may dwell on sections 4, 5 and 6, and engage with the proposition that sequence algebra provides an excellent vehicle for exploring learning-mathematics-through-programming, or vice-versa. There is great scope for making a contribution to the consolidation, refinement, and application of sequence algebra as an introductory subject.

To help catch, at a glance, some of the things that come under sequence algebra, we have included a number of tables. Tables 1, 2 and 3 are instantly recognisable as belonging to a calculus text, but here, uncommonly, the objects being related are sequences, not analytic functions. Of course, they herald identities that hold for analytic functions, but in agreement with Niven [64] and Tutte [84], there is something to celebrate in the fact that the identities can be established on very elementary grounds. There is more to celebrate in tables 4 and 5, because many of the sequence expressions therein have a dual interpretation as a set-theoretic structure specification [25]. Yet more satisfaction is to be had from table 7, because the solutions to the defining differential equations for the core sequences transliterate trivially into Haskell definitions [58].

Therefore a grounding in sequence algebra and its implementation surely pays off. This claim is validated at least in the study of the two texts, Concrete Mathematics [32] and Analytic Combinatorics [25]. There we see numerous examples where sequence algebra is at play, and the implementation can be used for testing, for reinforcement, or just for fun. Some of these appear in the next section. For example, in item W we derive the bivariate power series $S(z,u)=\displaystyle\frac{\exp\circ uz-1}{\exp\circ z-1}$ that appears in [32, sect. 7.6]. This generates a sequence $\breve{S}$ such that $\breve{S}_{m}$ is a polynomial, and $\breve{S}_{m}(n)=\displaystyle\sum_{0\leq k<n}k^{m}$ . That is, $\breve{S}=\displaystyle[x,-\frac{1}{2}x+\frac{1}{2}x^{2},\frac{1}{6}x-\frac{1}{2}x^{2}+\frac{1}{3}x^{3},\ldots]$ . It is striking how accessible the mathematics behind $S(z,u)$ is, and how easy it is to define the infinite $S(z,u)$ and $\breve{S}$ in a program.
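As a quick sanity check of the three polynomials quoted for $\breve{S}$ , the sketch below compares each of them with the direct sum $\sum_{0\leq k<n}k^{m}$ . It does not derive $S(z,u)$ ; the names `evalPoly`, `sBreve`, `powerSum` and `agrees` are hypothetical helpers introduced only for this check.

```haskell
-- Evaluate a polynomial (coefficients, lowest power first) at a point.
evalPoly :: [Rational] -> Rational -> Rational
evalPoly cs a = foldr (\c acc -> c + a * acc) 0 cs

-- The first three terms of S-breve quoted in the text, as coefficient lists.
sBreve :: [[Rational]]
sBreve = [ [0, 1], [0, -1/2, 1/2], [0, 1/6, -1/2, 1/3] ]

-- Direct power sum: the sum of k^m for 0 <= k < n.
powerSum :: Integer -> Integer -> Rational
powerSum m n = fromInteger (sum [k ^ m | k <- [0 .. n - 1]])

-- True when the m-th polynomial agrees with the direct sum at n.
agrees :: Int -> Integer -> Bool
agrees m n = evalPoly (sBreve !! m) (fromInteger n) == powerSum (toInteger m) n
```

For instance, `agrees 2 3` compares the value of the third polynomial at 3 with $0^{2}+1^{2}+2^{2}$ .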

2 The basics

Two sequences $f$ and $g$ are equal if they are equal at all indices: $\forall n\geq 0.[x^{n}]f=[x^{n}]g$ . There are many instances when it is obvious that a statement is subject to universal quantification, and in such cases we leave the $\forall$ part to be inferred by the reader. Becoming fluent with the clumsy-looking coefficient extraction operator, $[x^{n}]$ , pays dividends [31, 51]. It obeys the precedence rules, $[x^{n}]f+g=([x^{n}]f)+g$ , $[x^{n}]fg=[x^{n}](fg)$ , and $[x^{n}]f\circ g=[x^{n}](f\circ g)$ . Moreover, it is a linear operator:

$$[x^{n}](f+g)=[x^{n}]f+[x^{n}]g;\qquad[x^{n}]cf=c[x^{n}]f\qquad\mbox{($c$ is a constant)}$$

The generalisation $[u^{m}z^{n}]$ will be used to identify a term in a bivariate sequence. Let $E$ be the tail or shift-left operator: $E[f_{0},f_{1},f_{2},\ldots]=[f_{1},f_{2},\ldots]$ . In formal power series language, $E\,f=\displaystyle\frac{1}{x}(f-f_{0})$ , and $f=f_{0}+xE\,f$ ; this is referred to as the head-tail property (in [72] it is the “fundamental theorem of stream calculus”, see section 3, item P). We also have $[x^{n+1}]f=[x^{n}]E\,f$ and the head-tail expansion rule, $[x^{n}]f=[x^{0}]E^{n}f$ . We are free to mix notations – here is the definition of convolution product, $f*g$ , in which, typically, the $*$ is suppressed:

$$[x^{n}]fg=\sum_{k=0}^{n}f_{k}g_{n-k}$$

Observe that, since $x_{1}=1$ is the only non-zero element of $x=[0,1]$ , we have

$$[x^{0}]xf=0;\qquad[x^{n}]xf=x_{1}f_{n-1}=f_{n-1},\ \mbox{for }n>0$$

Thus $x[f_{0},f_{1},f_{2},\ldots]=[0,1]*[f_{0},f_{1},f_{2},\ldots]=[0,f_{0},f_{1},f_{2},\ldots]$ , and $x$ can be viewed as a right-shift operator. The absorption law, $[x^{n}]x^{m}f=[x^{n-m}]f$ , is simple but effective. The product in both $f=f_{0}+xE\,f$ and $f=\sum_{i}f_{i}x^{i}$ is convolution product, $+$ is pointwise addition, the element $f_{i}$ is automatically identified with the singleton sequence, $[f_{i}]$ , as required, and $x=[0,1]$ . A recursive equation for product is easily derived (and just as easily translated into Haskell); $E\,f$ is abbreviated to $f^{\prime}$ :

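$$fg=(f_{0}+xf^{\prime})(g_{0}+xg^{\prime})=f_{0}g_{0}+f_{0}xg^{\prime}+xf^{\prime}g=f_{0}g_{0}+x(f_{0}g^{\prime}+f^{\prime}g)$$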
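To suggest how directly the head-tail recursion $fg=f_{0}g_{0}+x(f_{0}g^{\prime}+f^{\prime}g)$ carries over to lazy lists, here is a minimal sketch. It assumes a sequence is a Haskell list with a finite list read as padded by zeros; the names `add`, `scale` and `mul` are illustrative, and the paper's own definitions appear in section 4.

```haskell
-- Pointwise addition; a finite list is read as padded with zeros.
add :: Num a => [a] -> [a] -> [a]
add (a:as) (b:bs) = (a + b) : add as bs
add as     []     = as
add []     bs     = bs

-- Multiplication of every term by a constant.
scale :: Num a => a -> [a] -> [a]
scale c = map (c *)

-- Convolution product: head is f0 g0, tail is f0 g' + f' g.
mul :: Num a => [a] -> [a] -> [a]
mul (f0:f') g@(g0:g') = (f0 * g0) : add (scale f0 g') (mul f' g)
mul _       _         = []

-- The sequence x = [0,1]; multiplying by it shifts right.
x :: Num a => [a]
x = [0, 1]
```

Because the head of the product needs only the heads of its arguments, the same definition works unchanged on infinite lists, for example `take 5 (mul (repeat 1) (repeat 1))`.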

Sequences, ${\mathbb{S}}$ , over an integral domain or field $F$ (known from context) form an integral domain [64, 55] ( $F[[x]]$ is the standard notation for formal power series in $x$ over $F$ ). We will keep $F={\mathbb{Q}}$ in mind. The subsets ${\mathbb{S}}_{0},\ {\mathbb{S}}_{1},$ and ${\mathbb{S}}_{\neq\!0}$ comprise sequences with zeroth term 0, 1, and non-zero, respectively. A subset ${\mathbb{S}}_{C}$ of ${\mathbb{S}}_{0}$ comprises sequences $f$ in which $f_{1}\neq 0$ , that is $E\,f\in{\mathbb{S}}_{\neq\!0}$ . Unique square (and $n$ th) roots exist for sequences in ${\mathbb{S}}_{1}$ . Sequence composition $f\circ g=\sum_{k}f_{k}g^{k}$ is defined for $g\in{\mathbb{S}}_{0}$ . A unique compositional inverse, $f^{\circ}$ , called converse, exists for $f\in{\mathbb{S}}_{C}$ . The notation follows [6] and distinguishes converse $f\circ f^{\circ}=x$ from multiplicative inverse $f*f^{-1}=1$ . The latter exists for $f\in{\mathbb{S}}_{\neq\!0}$ . Differentiation, $D$ , is term-wise, as for formal polynomials: $[x^{n}]D\,f=(n+1)[x^{n+1}]f$ . A little induction gives the Maclaurin expansion rule: $[x^{n}]f=\displaystyle\frac{1}{n!}[x^{0}]D^{n}f$ . From the definition of $\int$ as a right inverse to $D$ , $[x^{n}]D\int f=[x^{n}]f$ , we deduce $[x^{n+1}]\int f=\displaystyle\frac{1}{n+1}[x^{n}]f$ . Setting $[x^{0}]\int f=0$ , the fundamental theorem of sequence calculus (FTC) is immediate:

$$f=f_{0}+\int D\,f\qquad D\,(\int f)=f$$
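The term-wise definitions of $D$ and $\int$ , and the two halves of the FTC, can be tried out on lists of rationals. This is a sketch under the assumption $F={\mathbb{Q}}$ ; `deriv`, `integ` and `sample` are names chosen here for illustration.

```haskell
-- Term-wise derivative: [x^n] D f = (n+1) [x^(n+1)] f.
deriv :: [Rational] -> [Rational]
deriv []     = []
deriv (_:f') = zipWith (*) [1 ..] f'

-- Integral with zero constant term: [x^(n+1)] (integ f) = [x^n] f / (n+1).
integ :: [Rational] -> [Rational]
integ f = 0 : zipWith (/) f [1 ..]

-- A sample infinite sequence, [1,2,3,...].
sample :: [Rational]
sample = [1 ..]
```

With these, `deriv (integ sample)` reproduces `sample`, and `integ (deriv sample)` reproduces `sample` except for its zeroth term, which is the content of $f=f_{0}+\int D\,f$ .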

The familiar rules in tables 1 and 2 have easy sequence-algebraic proofs (the third $\int$ -product rule is called the differential Baxter axiom in [10]). For example, here is a typical proof of the differential composition rule, or chain rule [36, Ch. 1]. A proof by coinduction is presented for contrast in item Q of the next section. Let $f\in{\mathbb{S}},\ g\in{\mathbb{S}}_{0}$ , then,

[TABLE]
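For orientation, one standard route to the chain rule is sketched below. It is not necessarily the argument intended at this point; it assumes linearity of $D$ over the sum defining composition (legitimate because $g\in{\mathbb{S}}_{0}$ makes each coefficient of $f\circ g$ a finite sum) and the power rule, rule 8 of table 1.

```latex
\begin{aligned}
D\,(f\circ g) &= D\sum_{k\ge 0} f_{k}\,g^{k}
               = \sum_{k\ge 1} k\,f_{k}\,g^{k-1}\,D\,g \\
              &= \Big(\sum_{k\ge 0}(k+1)\,f_{k+1}\,g^{k}\Big)\,D\,g
               = ((D\,f)\circ g)\,D\,g .
\end{aligned}
```

The last step uses $[x^{k}]D\,f=(k+1)[x^{k+1}]f$ , so the bracketed sum is exactly $(D\,f)\circ g$ .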

3 Sequence Algebra examples

The following itemization (A-Z) of snippets provides a brief survey of the character of sequence algebra. It is, of course, only a small fraction of the subject.

(A) Sequence algebra is foundational in the sense that it is a low-level concrete extension of arithmetic. To appreciate this, try making the following simpler. Define $\exp$ by the sequence differential equation $D\,\exp=\exp;\ \exp_{0}=1$ . Then, by the Maclaurin rule, $\exp=1+x/1!+x^{2}/2!+x^{3}/3!+\cdots=[1,1,1/2,1/3!,\ldots]$ . Define $x^{*}$ by the sequence difference equation $E\,x^{*}=x^{*};\ x^{*}_{0}=1$ ; then, by the head-tail property, $x^{*}=1+x+x^{2}+\cdots=[1,1,1,\ldots]=1/(1-x)$ ; the notation is based on the Kleene star [19]. Let $\log f=\mbox{\rm lgn}\circ(f-1)$ where lgn (for which there is no established name other than $\log(1+x)$ ) is the converse of $\exp-1$ ; that is, $\mbox{\rm lgn}=(\exp-1)^{\circ}$ . Observe that $(\exp-1)\circ\mbox{\rm lgn}=x$ implies $\exp\circ\mbox{\rm lgn}=1+x$ , so $D\,\mbox{\rm lgn}=D\,(\exp-1)^{\circ}=(\exp\circ\mbox{\rm lgn})^{-1}=1/(1+x)$ . The FTC gives $\mbox{\rm lgn}=\int 1/(1+x)$ , and the coefficients can be calculated:

[TABLE]
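The defining equations $D\,\exp=\exp;\ \exp_{0}=1$ and $D\,\mbox{\rm lgn}=1/(1+x);\ \mbox{\rm lgn}_{0}=0$ can be run as they stand on lazy lists. The sketch below assumes $F={\mathbb{Q}}$ and uses illustrative names (`recipS`, `integFrom`, `expS`, `lgn`); the reciprocal comes from $fg=1$ , which gives $g_{0}=1/f_{0}$ and $f_{0}g^{\prime}+f^{\prime}g=0$ .

```haskell
-- Pointwise addition; a finite list is read as padded with zeros.
add :: [Rational] -> [Rational] -> [Rational]
add (a:as) (b:bs) = (a + b) : add as bs
add as     []     = as
add []     bs     = bs

-- Convolution product in head-tail form.
mul :: [Rational] -> [Rational] -> [Rational]
mul (f0:f') g@(g0:g') = (f0 * g0) : add (map (f0 *) g') (mul f' g)
mul _       _         = []

-- Multiplicative inverse for f0 /= 0: g0 = 1/f0 and g' = -(f' g)/f0.
recipS :: [Rational] -> [Rational]
recipS []      = error "recipS: empty sequence"
recipS (f0:f') = g
  where g = (1 / f0) : map (negate . (/ f0)) (mul f' g)

-- Integral with constant term c.
integFrom :: Rational -> [Rational] -> [Rational]
integFrom c f = c : zipWith (/) f [1 ..]

-- D exp = exp; exp_0 = 1.
expS :: [Rational]
expS = integFrom 1 expS

-- D lgn = 1/(1+x); lgn_0 = 0.
lgn :: [Rational]
lgn = integFrom 0 (recipS [1, 1])
```

Since $1/(1+x)=[1,-1,1,-1,\ldots]$ , the integration rule $[x^{n+1}]\int f=\frac{1}{n+1}[x^{n}]f$ gives the coefficient of $x^{n+1}$ in lgn as $(-1)^{n}/(n+1)$ , which is what `lgn` produces.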

Some well-known rules relating to $\log$ and $\exp$ , derivable from the differential equation for $\exp$ using sequence algebra, appear in table 3 (preconditions are omitted to avoid clutter). Most of these are meticulously proven by Niven [64]. However, Niven does not use composition $(\circ)$ , but by using composition and its associativity and distributivity laws [36], his theorem 17 and proof can be rendered as follows: let $g=1+f,\ f,h\in{\mathbb{S}}_{0}$ ,

[TABLE]

Proof of Euler’s identity, $\exp\circ ix=\cos+i\sin$ , illustrates appeal to the uniqueness of solution to certain differential equations. Let $D\,\sin=\cos;\ \sin_{0}=0$ , $D\,\cos=-\sin;\ \cos_{0}=1$ , and $i^{2}=-1$ . Then both $\cos+i\sin$ and $\exp\circ ix$ satisfy $D\,g=i\,g;\ g_{0}=1$ , and therefore must be equal. De Moivre’s theorem follows:

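$$(\cos+i\sin)^{n}=(\exp\circ ix)^{n}=\exp\circ ix\circ nx=(\cos+i\sin)\circ nx=\cos\circ nx+i(\sin\circ nx)$$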

(B) A counting sequence, $c$ , is either ordinary, $c=[c_{0},c_{1},c_{2},c_{3}\ldots]$ or exponential, $c=[c_{0},c_{1}/1!,c_{2}/2!,c_{3}/3!,\ldots]$ . In either case, $c_{n}$ counts the number of objects of size $n$ generated by some structure specification, $C$ . In the former case $c_{n}=[x^{n}]c$ , and in the latter case $c_{n}=n![x^{n}]c$ . Roughly speaking, a structure is built from nodes, and its size is the number of nodes it has. The nodes may be labelled or unlabelled. Exponential sequences are used for labelled objects because, in that case, convolution product automatically counts all possible labellings in making an ordered product. Let $f$ and $g$ count labelled structures (generated by some $F$ and $G$ , respectively); then ordered pairs of such structures are counted by

$$[x^{n}]fg=\sum_{k=0}^{n}\frac{f_{k}g_{n-k}}{k!(n-k)!}=\sum_{k=0}^{n}\binom{n}{k}f_{k}g_{n-k}/n!$$

The ordered $k$ -fold product, $F^{k}$ , has counting sequence $f^{k}$ . Ordered lists of $F$ -objects are counted by $\mbox{\sf list\,}\circ f=x^{*}\circ f=f^{*}$ . If the order does not count, then we use $\mbox{\sf set\,}\circ f=\sum_{k}f^{k}/k!=\exp\circ f.$

The fact that permutations can be written as sets of cycles can be made explicit in the definition of the counting sequence for permutations [25, 11]:

$$\mbox{\sf perm}=\mbox{\sf set}\circ\mbox{\sf cycle}$$

Just put $\mbox{\sf set}=\exp$ and $\mbox{\sf cycle}=\log x^{*}$ . The sequence perm is $x^{*}$ regarded as an exponential sequence, $x^{*}=[1,1!/1!,2!/2!,3!/3!\ldots]$ . Removal of the factorial divisors is performed by $\Lambda$ , and $\Lambda\mbox{\sf perm}=[1,1,2,6,24,120,\ldots]$ , $[x^{n}]\Lambda\mbox{\sf perm}=n![x^{n}]\mbox{\sf perm}=n!$ , the number of permutations of $n$ symbols. There are $(n-1)!$ cyclic permutations of $n$ symbols, and the exponential counting sequence for these is:

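$$\mbox{\sf cycle}=\sum_{n>0}(n-1)!\frac{x^{n}}{n!}=\sum_{n>0}\frac{1}{n}x^{n}=-\mbox{\rm lgn}\circ(-x)=-\log(1-x)=\log x^{*}$$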
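The claim that sets of cycles are permutations can be checked numerically: composing $\exp$ with the cycle sequence $[0,1,1/2,1/3,\ldots]$ should give $x^{*}$ . The sketch below assumes $F={\mathbb{Q}}$ and uses illustrative names (`compose`, `setS`, `cycleS`, `lambda`). Composition uses $f=f_{0}+xf^{\prime}$ and $g=xg^{\prime}$ , so that $f\circ g=f_{0}+x(g^{\prime}*(f^{\prime}\circ g))$ .

```haskell
-- Pointwise addition; a finite list is read as padded with zeros.
add :: [Rational] -> [Rational] -> [Rational]
add (a:as) (b:bs) = (a + b) : add as bs
add as     []     = as
add []     bs     = bs

-- Convolution product in head-tail form.
mul :: [Rational] -> [Rational] -> [Rational]
mul (f0:f') g@(g0:g') = (f0 * g0) : add (map (f0 *) g') (mul f' g)
mul _       _         = []

-- Composition f . g, defined only when g has zero constant term.
compose :: [Rational] -> [Rational] -> [Rational]
compose []      _  = []
compose (f0:_)  [] = [f0]
compose (f0:f') g@(g0:g')
  | g0 == 0   = f0 : mul g' (compose f' g)
  | otherwise = error "compose: needs g0 = 0"

-- set = exp, from D exp = exp; exp_0 = 1.
setS :: [Rational]
setS = 1 : zipWith (/) setS [1 ..]

-- cycle = sum over n > 0 of x^n / n.
cycleS :: [Rational]
cycleS = 0 : map (1 /) [1 ..]

perm :: [Rational]
perm = compose setS cycleS

-- Lambda: multiply the n-th term by n! to remove the factorial divisors.
lambda :: [Rational] -> [Rational]
lambda f = zipWith (*) f (scanl (*) 1 [1 ..])
```

Applying `lambda` to `perm` and to `cycleS` recovers the counts $n!$ and $(n-1)!$ respectively.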

(C) The number of ways, $s_{n}$ , of inserting brackets into a list of $n$ symbols, subject to well-formedness, is counted by the Hipparchus-Schröder sequence [76, 79], $s=\displaystyle\frac{1}{4}(1+x-\sqrt{1-6x+x^{2}}\,)=[0,1,1,3,11,45,197,903,4279,20793,103049,\ldots]$ . This can be derived from the specification of a bivariate counting sequence for Schröder trees:

$$\mbox{\sf schroeder}(z,u)=z+u*(\mbox{\sf pluralList}\circ\mbox{\sf schroeder}(z,u))\qquad(2)$$

Here $\mbox{\sf pluralList}=\mbox{\sf list\,}-x-1$ . The coefficient of $u^{k}z^{n}$ in schroeder(z,u) gives the number of Schröder bracketings of $n$ symbols using $k$ pairs of brackets. Observe that equation (2) can also be read [25] as a set-theoretic specification of Schröder trees, with $z$ and $u$ naming different kinds of nodes. Using $\mbox{\sf pluralList}=x^{*}-x-1$ , the above definition of $s$ derives, via the quadratic formula, from the equation $s=\mbox{\sf schroeder(x,1)}=x+(x^{*}-x-1)\circ s=x+s^{*}-s-1=\displaystyle x+\frac{1}{1-s}-s-1$ . Here is a foretaste of computing the Schröder numbers in Haskell, which is explained in section 4:

schroeder   =  z + u*(pluralList `o` schroeder)
> takeBiv [1..6] schroeder
[[0],[1,0],[0,1,0],[0,1,2,0],[0,1,5,5,0],[0,1,9,21,14,0]]
> takeW 11 ((1+x-sqroot(1-6*x+x^2))/4)
[0,1,1,3,11,45,197,903,4279,20793,103049]

This reveals, for example, $[u^{2}z^{5}]\mbox{\sf schroeder}=9$ , that is 9 bracketings of 5 symbols with 2 pairs of brackets. One can see that the elements of the second sequence are totals of corresponding elements of the first. We remark that the sequence $r=2s/x-1$ is called the large Schröder sequence [25, p. 474] (it solves $r=1+xr+xr^{2}$ ).
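The relation between the two Schröder sequences can be checked without square roots. The sketch below is independent of the section 4 implementation: it assumes lists of integers and illustrative names, builds $r$ from $r=1+xr+xr^{2}$ (that is, head 1 and tail $r+r^{2}$ ), and recovers $s$ from $r=2s/x-1$ , that is $s=x(r+1)/2$ .

```haskell
-- Pointwise addition; a finite list is read as padded with zeros.
add :: [Integer] -> [Integer] -> [Integer]
add (a:as) (b:bs) = (a + b) : add as bs
add as     []     = as
add []     bs     = bs

-- Convolution product in head-tail form.
mul :: [Integer] -> [Integer] -> [Integer]
mul (f0:f') g@(g0:g') = (f0 * g0) : add (map (f0 *) g') (mul f' g)
mul _       _         = []

-- Large Schroeder sequence: r = 1 + x r + x r^2.
largeSchroeder :: [Integer]
largeSchroeder = 1 : add largeSchroeder (mul largeSchroeder largeSchroeder)

-- Hipparchus-Schroeder sequence: s = x (r + 1) / 2.
hipparchus :: [Integer]
hipparchus = 0 : map (`div` 2) (add [1] largeSchroeder)

-- Rows of the bivariate output quoted in the text.
bivRows :: [[Integer]]
bivRows = [[0],[1,0],[0,1,0],[0,1,2,0],[0,1,5,5,0],[0,1,9,21,14,0]]
```

Summing each row of `bivRows` gives the first six terms of `hipparchus`, which is the totals relationship noted above.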

(D) The dual interpretation of sequence expressions as set-theoretic structure specifications is exploited in [24, 25] (influenced by [44]). One might write the set-theoretic counterpart of say, $\mbox{\sf perm}=\mbox{\sf set}\circ\mbox{\sf cycle}$ , as $\mbox{\sf Perm}\cong\mbox{\sf Set}\circ\mbox{\sf Cycle}$ , indicating by the initial capital letters a set-theoretic interpretation. Essentially this is done in [25], but for brevity we only give the equations defining the counting sequences. Table 4 lists some univariate examples, and table 5 some bivariate ones. These can be typed more-or-less verbatim into Haskell, as illustrated in section 5. Many more could be lifted from chapters I-III of [25], from the appendices of [4], and from [15, 31, 79, 76].

Let us examine a less-than-obvious expression, ascents from table 5, the origin of which illustrates the principle of inclusion-exclusion [25, 90]. It counts permutations according to the number of ascents. For example, the permutation $|248|3679|5|1|$ of $\{1..9\}$ has 4 up-runs demarcated with vertical bars, 3 descents, and 5 ascents. If there are $k$ descents then there are $k+1$ up-runs. The reversal of a permutation with $k$ descents delivers a permutation with $k$ ascents. The count $n![u^{k}][z^{n}]\mbox{\sf ascents}$ is called an Eulerian number, and gives the number of permutations of $\{1..n\}$ with $k$ ascents [32]. To come up with the sequence expression, first note that an up-run with at least one ascent corresponds to a plural set. The counting sequence for such a set in which the $k$ of $u^{k}$ records the ascents is $(\mbox{\sf pluralSet}\circ uz)/u$ . Let us specify permutations in which some parts of up-runs are identified as sets, and other elements are undistinguished: $b(z,u)=\mbox{\sf list\,}\circ(z+(\mbox{\sf pluralSet}\circ uz)/u)$ . Now propose that $\mbox{\sf ascents}(z,u)$ is the exact counting sequence we are after, then the inclusion-exclusion principle says that $\mbox{\sf ascents}(z,u+1)=b(z,u)$ and $\mbox{\sf ascents}(z,u)=b(z,u-1)$ , which is cited in the table.
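For small $n$ the Eulerian numbers can be confirmed by exhaustive enumeration, independently of the sequence expression. This is a brute-force sketch, not the paper's method; the names ascentCount, descentCount and eulerianRow are introduced here for illustration.

```haskell
import Data.List (permutations)

-- Number of ascents: adjacent positions with p_i < p_(i+1).
ascentCount :: [Int] -> Int
ascentCount p = length (filter id (zipWith (<) p (drop 1 p)))

-- Number of descents: adjacent positions with p_i > p_(i+1).
descentCount :: [Int] -> Int
descentCount p = length (filter id (zipWith (>) p (drop 1 p)))

-- Row n of the Eulerian numbers: entry k counts the permutations of {1..n}
-- with exactly k ascents. Exhaustive, so only sensible for small n.
eulerianRow :: Int -> [Int]
eulerianRow n =
  [ length [ () | p <- permutations [1 .. n], ascentCount p == k ]
  | k <- [0 .. n - 1] ]
```

Applied to the example permutation, ascentCount [2,4,8,3,6,7,9,5,1] gives 5 and descentCount gives 3, while eulerianRow 4 gives [1,11,11,1].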

(E) The sequence defined by $D\,\tan=1+\tan*\tan;\ \tan_{0}=0$ is the Maclaurin expansion of the tangent function (the $*$ is explicit just for emphasis). The numbers in $t=\Lambda\tan=[0,1,0,2,0,16,0,272,0,7936,\ldots]$ are called tangent numbers. The tangent numbers count certain kinds of alternating permutations (or ordered binary trees) [78]. One can define $t$ also by $E\,t=1+t\otimes t$ , where $\otimes$ is shuffle (or Hurwitz [47]) product. Convolution product (1) and shuffle product can be defined in head-tail form ( $E\equiv\ ^{\prime}$ ):

[TABLE]

The rule $E(s\otimes t)=E\,s\otimes t+s\otimes E\,t$ matches $D(fg)=(D\,f)g+f\,D\,g$ and we have the Leibniz formulae

[TABLE]

Applying $[x^{0}]$ to the latter gives the pointwise definition of $\otimes$ (note that $a\otimes b=ab$ when $a$ and $b$ are scalars):

[TABLE]
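As a hedged illustration, the tangent numbers of item (E) can be generated from $E\,t=1+t\otimes t,\ t_{0}=0$ using a pointwise shuffle product. The sketch assumes the pointwise definition is the binomial convolution $(s\otimes t)_{n}=\sum_{k}\binom{n}{k}s_{k}t_{n-k}$ ; the names binom, shuffleAt and tangentNumbers are illustrative and not part of the paper's code.

```haskell
-- Binomial coefficient C(n,k) for 0 <= k <= n.
binom :: Int -> Int -> Integer
binom n k =
  product [fromIntegral (n - k + 1) .. fromIntegral n] `div` product [1 .. fromIntegral k]

-- n-th term of the shuffle product, assuming the binomial convolution
--   (s (x) t)_n = sum_k C(n,k) s_k t_(n-k).
shuffleAt :: [Integer] -> [Integer] -> Int -> Integer
shuffleAt s t n = sum [ binom n k * (s !! k) * (t !! (n - k)) | k <- [0 .. n] ]

-- E t = 1 + t (x) t with t_0 = 0, read off term by term:
--   t_(n+1) = [n == 0] + (t (x) t)_n.
tangentNumbers :: [Integer]
tangentNumbers =
  0 : [ (if n == 0 then 1 else 0) + shuffleAt tangentNumbers tangentNumbers n
      | n <- [0 ..] ]
```

Here take 10 tangentNumbers reproduces the list given for $t=\Lambda\tan$ . Indexing with !! is quadratic at best; it is chosen for transparency rather than speed.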

A shuffle inverse, $s^{-1_{\otimes}}$ , derives from the specification $s\otimes s^{-1_{\otimes}}=1$ together with the shuffle product rule (in exactly the same way that the rule for $D\,f^{-1}$ is derived):

[TABLE]

(F) Let ${\mathbb{S}}_{F}(*)$ denote the ring $({\mathbb{S}}_{F},+,*,0,1)$ , of sequences over some field $F$ (of characteristic 0) with the availability of inverses implied. Then $({\mathbb{S}}_{F}(*),D,\int)$ is an integro-differential algebra [10], and so too is $({\mathbb{S}}_{F}(\otimes),E,x)$ , where $x$ is the right-shift operator: $xf=x*f$ . The transform $(\Lambda,\Lambda^{-1})$ is an isomorphism between them, and is a formal Laplace transform [67, 26]. In the previous example, the equation defining $\tan$ is transformed by $\Lambda$ into the equation defining $t$ . The sequence of factorial numbers, $x^{\otimes}=\Lambda x^{*}$ , can be defined by applying $\Lambda$ to $x^{*}=1+xx^{*}$ to get $x^{\otimes}=1+x\otimes x^{\otimes}$ . From this we deduce $x^{\otimes}=(1-x)^{-1_{\otimes}}$ . Observe, for $n>0$ :

[TABLE]

Also, $[x^{n}]x^{\otimes}=n[x^{n-1}]x^{\otimes}=(n-1)[x^{n-1}]x^{\otimes}+[x^{n-1}]x^{\otimes}=[x^{n}]x^{2}D\,x^{\otimes}+[x^{n}]xx^{\otimes}$ leads to the differential equation for the factorials:

[TABLE]

Furthermore,

[TABLE]

(G) We recall a classic proof [64] of the binomial theorem. Let $r\in F$ , $r^{\underline{k}}=r(r-1)\cdots(r-k+1)$ (the falling factorial), and $r^{\overline{k}}=r(r+1)\cdots(r+k-1)$ (the rising factorial):

[TABLE]

A corollary is $[x^{k}](x^{*})^{r}=[x^{k}](1-x)^{-r}=(-1)^{k}(-r)^{\underline{k}}/k!=r^{\overline{k}}/k!=\left(\begin{array}[]{c}r+k-1\\ r-1\end{array}\right)$ . This result, and $[x^{m}]\exp^{n}=[x^{m}]\exp\circ nx=n^{m}/m!$ , are basic ingredients in the search for $n$ th term formulas. They are applied next.

(H) A Lagrange inversion formula [77, 59, 60, 29] gives an expression for the $n$ th term of the converse of a sequence. For example, let $g=x(r\circ g)$ , then $x=g/(r\circ g)=(x/r)\circ g$ , so $g$ is the converse of $x/r$ . Below is the Lagrange inversion formula for this case, followed by its application to the counting sequences for Catalan trees, $c=x(\mbox{\sf list}\circ c)=x(x^{*}\circ c)$ , and Cayley trees, $t=x(\mbox{\sf set}\circ t)=x(\exp\circ t)$ :

[TABLE]

For a history of the Catalan numbers, see [66]. Cayley trees are rooted versions of the connected acyclic graphs counted by Cayley in [13]. Cayley also counted the Catalan trees in [12], and the first part of Niven [64] sets out to legitimise the sequence algebra underlying Cayley’s proof (Niven cites [42], not Cayley; but Raney [71] cites both).

A slightly more general statement of Lagrange inversion is that it solves $h=g\circ f$ (equivalently $g=h\circ f^{\circ}$ ) for $g$ , where $h\in{\mathbb{S}},\ f\in{\mathbb{S}}_{C}$ . The theory of Lagrange inversion sometimes employs Laurent series – series with negative powers (or sequences with negative indices). In the following formula for the $n$ th term of $g=h\circ f^{\circ}$ , the coefficient of $x^{-1}$ (called the residue) is identified:

[TABLE]

Let $s=x(r\circ s)$ , $s=(x/r)^{\circ}$ , and $h=g\circ(x/r)$ ; then Lagrange inversion applied to $g=h\circ s$ , gives, for $h=x^{k}$ , the $n$ th term formula $[x^{n}]s^{k}=\displaystyle\frac{k}{n}[x^{n-k}]r^{n}$ . This result specialises, when $r=x^{*}$ , to a variant of the cycle lemma [17], called Raney’s lemma in [32], which has a history in statistics [69] related to Ballot numbers [25, p. 68]. There are various Lagrange inversion formulas and many proofs; [14] takes an approach that also facilitates proof of Faà di Bruno’s formula for $D^{n}(f\circ g)$ [43].

(I) The (forward) difference operator, $\Delta s=[E-1]s=s^{\prime}-s$ , produces the sequence of term-to-term differences, $[x^{n}]\Delta s=s_{n+1}-s_{n}$ . The definition of an anti-difference operator $\Sigma$ on sequences is calculated [37] as a right inverse to $\Delta$ , with $(\Sigma s)_{0}=0$ :

[TABLE]

Thus, $\Sigma s=xx^{*}s$ computes all the prefix sums of $s$ (including the empty one). Applied to $\Delta s$ we get:

[TABLE]

There follows the fundamental theorem of discrete calculus (FDC) on sequences: $s=s_{0}^{\bullet}+\Sigma\Delta s;\ s=\Delta\Sigma s$ , where $a^{\bullet}=ax^{*}$ is the sequence with $a$ everywhere.
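Both halves of the FDC are easy to test on list prefixes. The sketch below uses only Prelude functions; delta, sigma, fdcRight and the sample sequence are hypothetical names chosen here, with sigma realising $\Sigma s=xx^{*}s$ as a running sum that starts from the empty sum.

```haskell
-- Forward difference: (delta s)_n = s_(n+1) - s_n.
delta :: [Integer] -> [Integer]
delta s = zipWith (-) (drop 1 s) s

-- Anti-difference: all prefix sums, starting with the empty one (x x^* s).
sigma :: [Integer] -> [Integer]
sigma = scanl (+) 0

-- Right-hand side of s = s_0^bullet + Sigma (Delta s).
fdcRight :: [Integer] -> [Integer]
fdcRight []         = []
fdcRight s@(s0 : _) = zipWith (+) (repeat s0) (sigma (delta s))

-- A sample sequence with non-zero head: s_n = n^2 + 3.
sample :: [Integer]
sample = [ n * n + 3 | n <- [0 ..] ]
```

For instance, take 8 (fdcRight sample) and take 8 (delta (sigma sample)) both agree with take 8 sample.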

(J) Here are the $E$ to $\Delta$ translations extended to powers:

[TABLE]

The identity $\left(\begin{array}[]{c}n\\ k\end{array}\right)=[x^{n}]x^{k}/(1-x)^{k+1}$ turns equation (9) into the Euler expansion, $s=\displaystyle\sum_{k}\frac{(\Delta^{k}s)_{0}x^{k}}{(1-x)^{k+1}}$ . This expansion can also be derived from $s\otimes(-x)^{*}=\sum_{k}(\Delta^{k}s)_{0}x^{k}$ , plus the facts $(-x)^{*}\otimes x^{*}=1$ , and $x^{k}\otimes x^{*}=x^{k}/(1-x)^{k+1}$ (see [3] and items K and P):

[TABLE]

Let $g=\mbox{\rm lgn}\circ-x$ , whence $\mbox{\rm lgn}=g\circ-x$ , and apply Euler’s expansion to $g$ ,

[TABLE]

It is instructive to use this to approximate $\log(2)=\mbox{\rm lgn}(1)$ .

(K) The sequence ${\cal N}s=(-x)^{*}\otimes s=[s_{0},(\Delta s)_{0},(\Delta^{2}s)_{0},\ldots]$ is the sequence of Newton coefficients [3]. It may also be specified by $({\cal N}s)^{\prime}={\cal N}(\Delta s);\ ({\cal N}s)_{0}=s_{0}$ . The operator ${\cal N}=((-x)^{*}\otimes\_)$ , called the Newton transform in [3], has the converse ${\cal N}^{-1}=(x^{*}\otimes\_)$ , called the Binomial transform in [39]. The identity $x^{*}\otimes(-x)^{*}=1$ holds because the head of $x^{*}\otimes(-x)^{*}$ is 1, and the tail is

[TABLE]
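The Newton transform and its inverse can be sketched on lists as follows. This is an illustration under stated assumptions: newton reads off $(\Delta^{k}s)_{0}$ by iterating differences, and binomialTransform assumes the pointwise form $(x^{*}\otimes c)_{n}=\sum_{k}\binom{n}{k}c_{k}$ . All function names are hypothetical.

```haskell
-- Forward difference: (delta s)_n = s_(n+1) - s_n.
delta :: [Integer] -> [Integer]
delta s = zipWith (-) (drop 1 s) s

-- Newton coefficients [s_0, (Delta s)_0, (Delta^2 s)_0, ...] of an infinite sequence.
newton :: [Integer] -> [Integer]
newton s = [ d0 | (d0 : _) <- iterate delta s ]

-- Binomial coefficient C(n,k) for 0 <= k <= n.
binom :: Int -> Int -> Integer
binom n k =
  product [fromIntegral (n - k + 1) .. fromIntegral n] `div` product [1 .. fromIntegral k]

-- Binomial transform, assumed pointwise form: (x^* (x) c)_n = sum_k C(n,k) c_k.
binomialTransform :: [Integer] -> [Integer]
binomialTransform c =
  [ sum [ binom n k * ck | (k, ck) <- zip [0 .. n] c ] | n <- [0 ..] ]

squares :: [Integer]
squares = [ n * n | n <- [0 ..] ]
```

With these definitions take 6 (newton squares) is [0,1,2,0,0,0], and applying binomialTransform to newton squares returns the squares, in line with ${\cal N}^{-1}{\cal N}s=s$ .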

The following products introduce two new rings, the Hadamard ring $(S_{R},+,-,\odot,0,1^{\bullet})$ , and the infiltration ring $(S_{R},+,-,\uparrow,0,1)$ [3]:

[TABLE]

The rules in table 6 apply, and $({\cal N},{\cal N}^{-1})$ is an isomorphism between the Hadamard and infiltration rings. The following is a point-wise definition of $s\uparrow t$ :

[TABLE]

The proof that ${\cal N}$ is a morphism from $\odot$ to the new product $\uparrow$ can be re-imagined as a discovery of what the definition of $\uparrow$ should be. The morphism presumption is signalled on the right below.

[TABLE]

(L) The following defines permutation cycle numbers, $\left[\begin{array}[]{c}n\\ k\end{array}\right]=n![u^{k}z^{n}]\mbox{\sf cycles}$ , and set partition numbers $\left\{\begin{array}[]{c}n\\ k\end{array}\right\}=n![u^{k}z^{n}]\mbox{\sf parts}$ . These are also called Stirling numbers of the first and second kind, respectively.

[TABLE]

The well-known recurrences [32],

[TABLE]

translate, using $c=\mbox{\sf cycles}$ and $p=\mbox{\sf parts}$ , into

[TABLE]

where $D_{z}$ and $D_{u}$ are the partial differentiation operators with respect to $z$ and $u$ . To see this, note that $\left[\begin{array}[]{c}n+1\\ k\end{array}\right]=(n+1)![u^{k}z^{n+1}]c=n![u^{k}z^{n}]D_{z}c$ , $\left[\begin{array}[]{c}n\\ k-1\end{array}\right]=n![u^{k-1}z^{n}]c=n![u^{k}z^{n}]uc$ , and so on. The recurrences can be checked: first, $D_{z}c=D_{z}\exp\circ(u\log z^{*})=ucz^{*}=uc+uczz^{*}=uc+zD_{z}c$ ; and second, $up+uD_{u}p=up+u(\exp\circ z-1)p=pu\exp\circ z=D_{z}p$ . We may write $n![z^{n}]\exp\circ(u\,\log\circ z^{*})=n![z^{n}](1-z)^{-u}=u^{\overline{n}}$ . The cycles recurrence can also be written $[x^{k}]x^{\overline{n}}=[x^{k-1}]x^{\overline{n-1}}+(n-1)[x^{k}]x^{\overline{n-1}}$ , which follows from $x^{\overline{n}}=x^{\overline{n-1}}(x+n-1)=xx^{\overline{n-1}}+(n-1)x^{\overline{n-1}}$ .
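The last identity gives a direct way to tabulate cycle numbers: build the coefficient list of $x^{\overline{n}}$ from that of $x^{\overline{n-1}}$ . A minimal sketch, with risingFactorial and cycleNumber as illustrative names:

```haskell
-- Coefficients of the rising factorial x(x+1)...(x+n-1), lowest degree first,
-- built from x^(rising n) = x * x^(rising n-1) + (n-1) * x^(rising n-1).
risingFactorial :: Int -> [Integer]
risingFactorial n
  | n <= 0    = [1]
  | otherwise = zipWith (+) (0 : p) (map (fromIntegral (n - 1) *) p ++ [0])
  where
    p = risingFactorial (n - 1)

-- Cycle number: the coefficient of x^k in the rising factorial of order n.
cycleNumber :: Int -> Int -> Integer
cycleNumber n k
  | k < 0 || k > n = 0
  | otherwise      = risingFactorial n !! k
```

For example, risingFactorial 4 is [0,6,11,6,1], whose entries sum to $4!=24$ , the number of permutations of four elements.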

(M) A factorial polynomial uses falling factorials instead of powers. For example, let $p=1+2x+x^{2}$ , then the falling factorial counterpart is $\underline{p}=1+3x^{\underline{1}}+x^{\underline{2}}$ . Coefficients in $\underline{p}$ are identified by $[x^{\underline{k}}]\underline{p}$ , for example $[x^{\underline{1}}]\underline{p}=3$ .

The symbols $\Delta$ and $\Sigma$ are overloaded as operators on factorial polynomials and obey rules identical to those for $D$ and $\int$ , respectively, on polynomials: let $\underline{p}$ denote a polynomial in falling factorials, then $[x^{\underline{k}}]\Delta\underline{p}=(k+1)[x^{\underline{k+1}}]\underline{p}$ and $[x^{\underline{n+1}}]\Sigma\underline{p}=\displaystyle\frac{1}{n+1}[x^{\underline{n}}]\underline{p}$ . The fundamental theorem of the discrete calculus on factorial polynomials is immediate: $\underline{p}=\underline{p}_{0}+\Sigma\Delta\underline{p};\ \underline{p}=\Delta\Sigma\underline{p}$ . In [32], the theorem provides one of seven ways of deducing the polynomial for summing squares: given $\Delta\underline{p}=(1+x)^{2}=1+3x^{\underline{1}}+x^{\underline{2}}$ , apply $\Sigma$ to both sides,

[TABLE]

Note that if $p$ is a polynomial $n$ th term formula for sequence $s$ , $p(n)=s_{n}$ , then $(\Delta\underline{p})(n)=(\Delta s)_{n}$ and $(\Sigma\Delta\underline{p})(n)=p(n)-p(0)=s_{n}-s_{0}=(\Sigma\Delta s)_{n}$ .
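The coefficient rules for $\Delta$ and $\Sigma$ on factorial polynomials translate almost verbatim into list operations. In this sketch a factorial polynomial is represented, by assumption, as the list of its coefficients of $x^{\underline{0}},x^{\underline{1}},\ldots$ ; the names FacPoly, deltaFF, sigmaFF, fallingPowers and evalFF are not from the paper.

```haskell
-- Coefficients of x^(falling 0), x^(falling 1), ..., lowest order first.
type FacPoly = [Rational]

-- [x^(falling k)] Delta p = (k+1) [x^(falling k+1)] p.
deltaFF :: FacPoly -> FacPoly
deltaFF p = zipWith (*) [1 ..] (drop 1 p)

-- [x^(falling n+1)] Sigma p = [x^(falling n)] p / (n+1), with zero constant term.
sigmaFF :: FacPoly -> FacPoly
sigmaFF p = 0 : zipWith (/) p [1 ..]

-- The falling factorials 1, x, x(x-1), x(x-1)(x-2), ... at a given point.
fallingPowers :: Rational -> [Rational]
fallingPowers x = scanl (*) 1 [ x - fromInteger k | k <- [0 ..] ]

-- Evaluate a factorial polynomial at a point.
evalFF :: FacPoly -> Rational -> Rational
evalFF p x = sum (zipWith (*) p (fallingPowers x))
```

Taking $\Delta\underline{p}=1+3x^{\underline{1}}+x^{\underline{2}}$ as the list [1,3,1], the values of evalFF (sigmaFF [1,3,1]) at 0..5 are 0, 1, 5, 14, 30, 55, the sums of squares.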

(N) An analogue of the Maclaurin rule holds: $[x^{\underline{n}}]\underline{p}=\displaystyle\frac{1}{n!}[x^{\underline{0}}]\Delta^{n}\underline{p}=\frac{1}{n!}(\Delta^{n}p)(0)$ . The latter equality involves yet another interpretation of $\Delta$ : $(\Delta p)(n)=p(n+1)-p(n)$ . Gregory-Newton (interpolation) formulas for $p$ of degree $m$ follow; the second (see also (9)) uses $n^{\underline{k}}/k!=\left(\begin{array}[]{c}n\\ k\end{array}\right)$ :

[TABLE]

Let $s=[0,1,5,14,30,55,\ldots]$ be the sequence $s_{n}=1^{2}+2^{2}+\cdots+n^{2}$ , for which we seek the polynomial $p$ such that $p(n)=s_{n}$ . Then the above expansions produce the polynomial(s) in (17). By contrast, the Euler expansion (16) produces the sequence expression $s=(x+x^{2})/(1-x)^{4}$ .

(O) We have seen $[x^{k}]x^{\overline{n}}=\left[\begin{array}[]{c}n\\ k\end{array}\right]$ , so we can express the polynomial for $x^{\overline{n}}$ in terms of cycle numbers, and by change of signs, also the polynomial for $x^{\underline{n}}$ :

[TABLE]

This shows how to translate falling factorials into powers. The converse is

[TABLE]

and for a proof see [9, p. 343] and [32, p. 262].

(P) Infinite sequences are called streams when they are identified with the final object in a category of head-tail coalgebras [41, 72]. This accounts for the name “stream” in the following:

[TABLE]

The co-algebraic stream calculus [72] introduces a proof principle called coinduction. For example, the identity $x^{k}\otimes x^{*}=x^{k}/(1-x)^{k+1}=x^{*}(xx^{*})^{k}$ used in the proof of (16) can be proved using coinduction (it can also be proved from the point-wise definition of $\otimes$ and the binomial theorem). Here is the gist of the coinductive proof. Propose the relation $x^{k}\otimes x^{*}\sim x^{*}(xx^{*})^{k}$ . This is used as a coinductive hypothesis. Head-equality holds, $(x^{k}\otimes x^{*})_{0}=(x^{*}(xx^{*})^{k})_{0}$ . The proof is completed by showing that the tails are equal under the hypothesis, signified below by the use of $(\sim)$ :

[TABLE]

The bracketed (co-) in the paper’s title indicates that we only touch lightly on co-algebraic concepts. There is more to coinduction than the above example suggests, and we refer to the survey [33] for background.

(Q) One can check from the pointwise definitions of $D$ and $\otimes$ that $D\,f=(x\otimes f^{\prime})^{\prime}$ . Alternatively, equality can be proved by showing that these expressions satisfy the same head-tail equations. The head-tail equation for $D\,f$ is calculated:

[TABLE]

Hence, $(D\,f)_{0}=f_{1}$ and $(D\,f)^{\prime}=f^{\prime\prime}+D\,f^{\prime}$ . Now let $F\,f=(x\otimes f^{\prime})^{\prime}$ . We find $(F\,f)_{0}=f_{1}$ , and $(F\,f)^{\prime}=(f^{\prime}+(x\otimes f^{\prime\prime}))^{\prime}=f^{\prime\prime}+F\,f^{\prime}$ . Thus, $D\,f=F\,f$ since they satisfy the same head-tail equations.

To give a little more feeling for the coinduction game, let us reveal the machine-level minutiae that prove $D(f\circ g)=(D\,f\circ g)D\,g$ . We use head-tail properties such as $(f\circ g)_{0}=f_{0};\ (f\circ g)^{\prime}=(f^{\prime}\circ g)g^{\prime}$ (see section 4, equation (33)). We will also make use of $(Dh)_{0}=h^{\prime}_{0}$ . Head equality is confirmed:

[TABLE]

Tails are proved equal under the coinductive hypothesis, $D(f\circ g)\sim(Df\circ g)Dg$ :

[TABLE]

(R) Coinduction is also used in [73] to show how continued fractions can be obtained from combinatorially-inspired automata. For example, the tangent sequence can be defined by $t=xu_{1}$ , where $u_{k}=1/(1-k(k+1)x^{2}u_{k+1})$ . Thus $t$ can be displayed as a continued fraction:

[TABLE]

More combinatorially-inspired continued fraction expressions for sequences appear in [23, 31, 73]. Below is one for $\Lambda_{z}\mbox{\sf cycles}=(1-z)^{-u_{\otimes}}$ ( $\Lambda_{z}$ removes the factorial divisors of powers of $z$ ).

[TABLE]

Setting $u=1$ and $z=x$ gives a continued fraction for the factorials, $(1-x)^{-1_{\otimes}}=x^{\otimes}$ .

(S) A $k$ th-order linear ordinary homogeneous differential equation,

[TABLE]

can be written $b(D)f=0$ . Similarly, a difference equation (also called a recurrence equation) can be written $b(E)s=0$ . Let $\grave{b\,}=b_{k}+b_{k-1}x^{1}+\cdots+b_{1}x^{k-1}+b_{0}x^{k}$ be the reverse of $b=b_{0}+b_{1}x+b_{2}x^{2}+\cdots+b_{k}x^{k}$ . Klarner [48] presents this fact: the solution $s$ to $b(E)s=0$ is

[TABLE]

where inits $=s_{0}+s_{1}x+\cdots+s_{k-1}x^{k-1}=s[0..k-1]$ are the initial $k$ elements of $s$ . Also

[TABLE]

For example, $E^{2}s+s=0,\ s_{0}=0,\ s_{1}=1$ has solution $x/(1+x^{2})$ , and $\Lambda^{-1}s=\sin$ , the Maclaurin expansion for $\sin$ , that is, the solution to $D^{2}s+s=0;\ s_{0}=0,\ s_{1}=1$ . Another example is $[z^{n}]C-2u[z^{n-1}]C+[z^{n-2}]C=0,\ C_{0}=1,\ C_{1}=u$ , where $[z^{n}]C$ is a Chebyshev polynomial [9] in $u$ . Then, $b=1-2uz+z^{2}=\grave{b\,}$ , and

[TABLE]

By the translation rules of item J, difference equations can be written using either $E$ or $\Delta$ . The equation $b(E)s=0$ transforms into $b(1+\Delta)s=0$ , or $\hat{b}(\Delta)s=0$ , where $\hat{b}=b\circ(1+x)$ . The converse is $b=\hat{b}\circ(x-1)$ , reflecting $\hat{b}(\Delta)=\hat{b}(E-1)$ . Clearly, $b=b\circ(1+x)\circ(x-1)$ .

(T) A sequence $s$ is called rational if it is the quotient, $s=a/b$ , of polynomials; it is called a LODE solution, written LODE( $s$ ), if it is a solution of a linear ordinary homogeneous difference equation; and it is called recognizable if it is the behaviour of a finite automaton. Then,

[TABLE]

Following [19], a finite automaton can be modelled as a system of linear equations over sequences, $E\,S=AS;\ S(0)=v$ . Here, matrix $A$ records transition labels connecting pairs of states, and $S$ is a vector of sequences, one for each state, with initial values $S_{i}(0)=v_{i}$ . The solution is $S=(Ax)^{*}v$ where

[TABLE]

is the Kleene star. Notice that $(Ax)^{*}$ can be viewed as a matrix of sequences or as a sequence of matrices. Using $(Ax)^{*}=(I-Ax)^{-1}$ , we get $(I-Ax)S=v$ and Cramer’s rule applies: $S_{i}=\displaystyle\frac{\det(I-Ax)[i\leftarrow v]}{\det(I-Ax)}$ (column $i$ replaced by $v$ ). Thus $S_{i}$ is rational. Justification of the above equivalences is completed by noting that a LODE can be transformed into a system of linear equations. We remark that a quotient of polynomials, $a/b$ , can also be written as the solution to a system of linear equations [72].

(U) Let $b=\det(xI-A)=b_{0}+b_{1}x^{1}+\cdots+b_{n-1}x^{n-1}+x^{n}$ be the characteristic polynomial of matrix $A$ . The Cayley-Hamilton theorem [19, 49] can be stated as $b(A)=0$ , or as $b(E)(Ax)^{*}=0$ . This will hold if, taking the sequence of matrix powers, $(Ax)^{*}$ , now as a matrix of sequences, we have $b(E)((Ax)^{*})_{ij}=0$ , which in turn holds if $((Ax)^{*})_{ij}=\displaystyle\frac{a}{\grave{b\,}}$ . We have $\grave{b\,}=x^{n}b(1/x)=\det(I-xA)$ . So we are done if we come up with an $a$ such that

[TABLE]

Let $M=Ax$ , and $J=M^{*}\underline{j}=\underline{j}+MJ$ , where $\underline{j}$ is the vector with 1 at position $j$ and zero elsewhere. Then, $(I-M)J=\underline{j}$ and Cramer’s rule delivers the $a$ we are looking for:

[TABLE]

(V) Consider the matrix exponential, $\exp\circ Ax=\Lambda^{-1}(Ax)^{*}$ as a matrix of sequences. We know that $(Ax)^{*}$ solves $E\,S=AS,\ S(0)=I$ , and $(Ax)^{*}[0..k-1]=[I,A,A^{2},\ldots,A^{k-1}]$ . Cayley-Hamilton says that $b(E)(Ax)^{*}=0$ , where $b$ is the characteristic polynomial of $A$ , of degree $k$ , say. By uniqueness of solution, we have [54, 56, 53] $\phi=(Ax)^{*}$ if $b(E)\phi=0$ and $\phi[0..k-1]=[I,A,A^{2},\ldots,A^{k-1}]$ . Let $S$ be a vector of sequences such that $b(E)S_{i}=0$ and $S_{i}[0..k-1]=x^{i}$ . Set

[TABLE]

Clearly $\phi[0..k-1]=[I,A,A^{2},\ldots,A^{k-1}]$ , and $b(E)\phi=0$ because

[TABLE]

(W) Let $B=x/(\exp-1)$ . The elements of the sequence $\Lambda B=[1,-1/2,1/6,0,-1/30,0,1/42,0,\ldots]$ are called Bernoulli numbers. The corresponding recurrence is calculated from $B\,\exp=B+x$ :

[TABLE]

Bernoulli numbers are used by Graham et al [32] for the most impressive of their deductions of the polynomial that sums squares – impressive because it defines the formulas for all powers at once. Let $S_{(n)}$ be the sequence such that $m![x^{m}]S_{(n)}$ is the sum of the $m$ th powers of the naturals to $n-1$ . Then

[TABLE]

Now replace $n$ by $u$ and $x$ by $z$ to get the expression advertised in the introduction:

[TABLE]

Then $m![z^{m}]S$ is a polynomial of degree $m+1$ in $u$ , and $(m![z^{m}]S)(n)=m![x^{m}]S_{(n)}$ .
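A hedged computational check of this item: the sketch below generates Bernoulli numbers from the recurrence $\sum_{k=0}^{n}\binom{n+1}{k}b_{k}=[n=0]$ , which is what $B\,\exp=B+x$ yields for $b_{k}=k![x^{k}]B$ , and then evaluates the power sums with the standard closed form $\sum_{k=0}^{n-1}k^{m}=\frac{1}{m+1}\sum_{j=0}^{m}\binom{m+1}{j}b_{j}n^{m+1-j}$ . The names binom, bernoulli and powerSum are illustrative, and the closed form is quoted as a known result rather than taken from the tables above.

```haskell
-- Binomial coefficient C(n,k) for 0 <= k <= n.
binom :: Integer -> Integer -> Integer
binom n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- Bernoulli numbers b_0, b_1, ... (with b_1 = -1/2), from
--   sum_{k=0..n} C(n+1,k) b_k = 0  for n > 0,  b_0 = 1.
bernoulli :: [Rational]
bernoulli = map b [0 ..]
  where
    b :: Integer -> Rational
    b 0 = 1
    b n = negate (sum [ fromInteger (binom (n + 1) k) * bernoulli !! fromInteger k
                      | k <- [0 .. n - 1] ])
            / fromInteger (n + 1)

-- Sum of the m-th powers of 0, 1, ..., n-1.
powerSum :: Integer -> Integer -> Rational
powerSum m n =
  sum [ fromInteger (binom (m + 1) j) * bernoulli !! fromInteger j * fromInteger (n ^ (m + 1 - j))
      | j <- [0 .. m] ]
    / fromInteger (m + 1)
```

For example, powerSum 2 applied to 0..5 gives 0, 0, 1, 5, 14, 30, and powerSum 3 5 gives 100.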

(X) Observe that $B_{1}=-1/2$ is non-zero whilst all the other odd-degree coefficients of $B$ appear to be zero. Perhaps if we make $B_{1}$ zero then we will have a sequence which can be proved to be even (i.e. with zeros at odd positions). Adding $\displaystyle\frac{1}{2}x$ to $B$ cancels $B_{1}$ :

[TABLE]

Recall $\displaystyle\coth=\frac{\exp+\exp\circ-x}{\exp-\exp\circ-x}$ , so $C=\displaystyle\frac{x}{2}(\coth\circ\frac{x}{2})$ , from which evenness, $C\circ-x=C$ , can be deduced. Thus,

[TABLE]

Now $C\circ 2x=x\coth$ , and, using $x\cot=ix(\coth\circ ix)$ and $\tan=\cot-2\cot\circ 2x$ , we get

[TABLE]

With a bit of analysis (reals, $\cot(x)$ an analytic function with period $\pi$ , and uniqueness of series expansion), one can deduce another series for $x\cot$ , due to Euler. The omitted analysis [50, 1] is hidden in the first equals sign:

[TABLE]

Equating coefficients with those in the expansion (29) yields, for $k>0$ ,

[TABLE]

Therefore, the values of Riemann’s $\zeta(s)=\displaystyle\sum_{n\geq 1}\frac{1}{n^{s}}$ at even positive integers are given by

[TABLE]

(Y) The Formal Taylor Theorem may be expressed:

[TABLE]

Write $f\circ(u+z)=g_{0}+g_{1}z+g_{2}z^{2}+g_{3}z^{3}+\cdots$ . Let $z=0$ , then $g_{0}=f(u)$ . Differentiate with respect to $z$ : $D_{z}(f\circ(u+z))=g_{1}+2g_{2}z+3g_{3}z^{2}+\cdots$ . Note that $D_{z}(f\circ(u+z))=f_{1}+2f_{2}(u+z)+3f_{3}(u+z)^{2}+\cdots=(D\,f)\circ(u+z)$ . Let $z=0$ , then $g_{1}=(D\,f)\circ u$ . Differentiate again: $D^{2}(f\circ(u+z))=2g_{2}+3!g_{3}z+\cdots$ . Let $z=0$ , then $g_{2}=((D^{2}f)\circ u)/2$ . And so on. The Maclaurin expansion is the special case with $u=0$ , and the Taylor expansion of $x^{n}\circ(u+z)$ is an instance of the binomial theorem. Lipson [55] uses the theorem in the application of Newton’s iterative root-finding algorithm to polynomial equations over sequences (see also [70]).

(Z) The following manipulations, originating with Lagrange, have a captivating charm (even if they lack rigour). We adapt them from [32] to show that elementary sequence algebra plays a role through to the final chapter of that book (where, however, things become more demanding). In the Formal Taylor theorem, let $f$ be a polynomial, $z=1$ , $u=x$ , and employ an operator style:

[TABLE]

Putting $\Delta=E-1=\exp(D)-1$ together with $\Delta\Sigma f=f$ , suggests $\Sigma=(\exp(D)-1)^{-1}$ . Then $B=x/(\exp-1)$ applied to $D$ is $D\Sigma$ , so $\Sigma=D^{-1}B(D)$ . Expanding this, and writing $\int$ for the first term $D^{-1}$ (since $B_{0}=1$ ), gives a “template” version of the Euler-Maclaurin summation formula [32, 9].

[TABLE]

Now introduce limits:

[TABLE]

An application to $x^{2}$ gives yet another derivation [32] of the sum-of-squares formula:

[TABLE]

The way the limits appear on the summation sign has significance: $\textstyle\sum_{0}^{n}\,x^{2}=\displaystyle\sum_{x=0}^{n-1}x^{2}$ . The definite summation symbol follows the pattern of definite integration:

[TABLE]

Let’s add $f(b)$ to both sides of (31) and separate out $B_{1}=-1/2$ :

[TABLE]

The Euler-Maclaurin formula can also be applied to non-polynomial functions. Let us illustrate this, without justification. To compute $\displaystyle\sum_{x=1}^{\infty}1/x^{2}$ , set $S_{9}=\displaystyle\sum_{x=1}^{9}\frac{1}{x^{2}}=1.5397677310$ and then apply (32) to $g=1/(x+10)^{2}$ :

[TABLE]

Applying the formula up to $B_{4}$ gives $\zeta(2)=\pi^{2}/6\approx 1.64493407$ .
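The numerical claim can be reproduced in floating point. The sketch assumes the usual tail form of the Euler-Maclaurin formula, $\sum_{x\geq a}f(x)\approx\int_{a}^{\infty}f+\frac{1}{2}f(a)-\frac{B_{2}}{2!}f^{\prime}(a)-\frac{B_{4}}{4!}f^{\prime\prime\prime}(a)$ with Bernoulli numbers $1/6$ and $-1/30$ , applied to $f=1/x^{2}$ at $a=10$ ; this is equivalent to summing $g=1/(x+10)^{2}$ from 0. The names partialSum9, tailEstimate and zeta2Estimate are hypothetical.

```haskell
-- S_9 = sum_{x=1..9} 1/x^2, computed directly.
partialSum9 :: Double
partialSum9 = sum [ 1 / (k * k) | k <- [1 .. 9] ]

-- Euler-Maclaurin estimate of sum_{x >= a} 1/x^2, using terms up to B_4:
--   integral  = 1/a
--   f(a)/2    = 1/(2 a^2)
--   -(B_2/2!) f'(a)   with f'   = -2/x^3,  B_2 = 1/6
--   -(B_4/4!) f'''(a) with f''' = -24/x^5, B_4 = -1/30
tailEstimate :: Double -> Double
tailEstimate a =
    1 / a
  + 1 / (2 * a ^ (2 :: Int))
  + (1 / 6) / 2 * (2 / a ^ (3 :: Int))
  + ((-1) / 30) / 24 * (24 / a ^ (5 :: Int))

-- Estimate of zeta(2) = pi^2/6.
zeta2Estimate :: Double
zeta2Estimate = partialSum9 + tailEstimate 10
```

The value of zeta2Estimate agrees with pi * pi / 6 to within $10^{-8}$ ; the first omitted term, involving $B_{6}$ , is of order $10^{-9}$ .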

4 A programming delight

McIlroy [57, 58], influenced by [45] and others, has gifted us some “tiny gems” of program definitions for implementing sequence manipulations. The definitions are written in Haskell, and are effortlessly derived by mathematical reasoning. A textbook introduction appears in [18]. We want to entice the reader to type up and experiment with the Haskell code (but a down-loadable file is available). The code has been tested in the Haskell GHCi system, and also in the Hugs98 system (an older system, but well-suited to beginners). Both GHCi and Hugs98 are freely available on the web at www.Haskell.org.

In this section we present all of the definitions, thus duplicating some of the contents of [57, 58]; however, there are modifications and additions. The fact that the definitions have no pre-requisites, other than the standard Haskell Prelude, means that one can take “deep” ownership, building things from the ground up. This contrasts to using a sophisticated computer algebra system – something perhaps for the newcomer to move on to with greater appreciation.

Haskell [68] has evolved to be a fairly large and sophisticated language, but we shall stick to a modest subset. It is expected that the reader can comprehend Haskell from examples. The language gives types to objects and variables, and within context, the most general type is used. Haskell’s lists are used to represent sequences (some may prefer to introduce a new type for sequences, but that introduces an overhead which we want to avoid). In Haskell, head-tail decomposition $s_{0}+xs^{\prime}$ becomes s0:s’. Here are some list-processing functions:

take n _ | n<=0     = []
take _ []           = []
take n (s0:s')      = s0: take (n-1) s'
map f []            = []
map f (s0:s')       = f s0 : map f s'
iterate f z         = z: iterate f (f z)
foldr f z []        = z
foldr f z (s0:s')   = f s0 (foldr f z s')
scanl op q s        = q: (case s of
                            []    -> []
                            s0:s' -> scanl op (op q s0) s')
zip (s0:s') (t0:t') = (s0, t0): zip s' t'
zip _ _             = []
zipWith op s t      = [op sn tn | (sn,tn) <- zip s t]

These definitions implement the following functions which feature in the algebra of program calculation [7, 6]:

[TABLE]

The types deduced are

take    :: Int -> [a] -> [a]
map     :: (a -> b) -> [a] -> [b]
iterate :: (a -> a) -> a -> [a]
foldr   :: (a -> b -> b) -> b -> [a] -> b
scanl   :: (a -> b -> a) -> a -> [b] -> [a]
zip     :: [a] -> [b] -> [(a,b)]
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]

Type expressions are built from type names such as Int, type variables such as a, and type constructors such as -> (which associates to the right). Two more examples of type constructors are: [a] is the type for lists of objects of type a, and [(a,b)] is the type for lists of pairs of objects. Clearly, functions can take functions as arguments, in which case they are called higher-order. Lazy evaluation is used, so that for example, scanl will produce the first element, q, of the result without needing to know anything about its list argument, s. The definition of zipWith illustrates the so-called list comprehension. An alternative definition uses map and uncurry:

uncurry        :: (a -> b -> c) -> (a,b) -> c
uncurry op p   = op (fst p) (snd p)
zipWith op s t = map (uncurry op) (zip s t)

The partner to uncurry is curry f x y = f (x,y). An alternative definition illustrates use of a lambda expression: curry f = \x y -> f (x,y). The reader may like to supply the type. These two functions are so-named because Haskell Curry was an early advocate of the associated equivalence [40]. The Haskell Standard Prelude defines zipWith without using zip, and then defines zip = zipWith (,).

The following specifies that a type a is classified as Num if it has the operations (or methods) listed here:

class Num a where
  (+), (-), (*)       :: a -> a -> a
  negate, abs, signum :: a -> a
  fromInteger         :: Integer -> a
  x - y               = x + negate y
  negate x            = 0 - x

All of the foregoing definitions are in the Standard Prelude. From here on, the code needs to be supplied. The program file starts with a few specified lines: the first line hides the Prelude definition of cycle because we are going to re-define it for other purposes; the second line says we need rational numbers; the third line gives the order in which to resolve ambiguity in numerical data.

import Prelude hiding (cycle)
import Data.Ratio
default (Integer, Rational, Double)

We start by declaring how sequences, [a], become an instance of Num. A prerequisite is that a is an instance of Eq and Num, indicated by (Eq a, Num a) =>. The definition of (-) is derived from negate.

instance (Eq a, Num a) => Num [a] where
  negate            = map negate
  f+[]              = f
  []+g              = g
  (f0:f')+(g0:g')   = f0+g0 : f' + g'
  []*_              = []
  (0:f')*g          = 0 : f'*g
  _*[]              = []
  (f0:f')*g@(g0:g') = f0*g0 : (f0*|g' + f'*g)
  fromInteger c     = [fromInteger c]
  abs _             = error "abs not defined on sequences"
  signum _          = error "signum not defined on sequences"

Observe that addition is not defined by f+g = zipWith (+) f g (why?). Convolution product is derived from (1), but there are some things to note. Firstly, if $f_{0}=0$ then the zeroth term of the result is 0 and is delivered immediately. This may be regarded as a controversial quirk, but it enables certain equations to be used directly, as in the following (Catalan) example – in examples, definitions (which are placed in a program file) are interspersed with interactive requests for expression evaluation, indicated by the prompt “> ”.

x :: Num a => [a]
x = [0,1]
b = 1 + x*b^2
> take 8 b
[1,1,2,5,14,42,132,429]

Secondly, the notation g@ is read as “g as”. Thirdly, there are clauses for finite sequences – the empty list behaves here like zero (but note that 0 is embedded as [0]). Fourthly, the term $f_{0}g^{\prime}=[f_{0}]g^{\prime}$ becomes an explicit scalar product using *|, which is defined as an infix operator with precedence 7 (the same as *, and higher than +). The definition contains (a*), illustrating the creation of a function by partial application of an operator (called sectioning).

infix 7 *|
(*|) :: Num a => a -> [a] -> [a]
a *| f = map (a*) f

A function like map is said to be polymorphic because any type can be assigned to its type variables (subject to consistency). By contrast, scalar multiplication (*|) has a qualified (constrained, or ad hoc) polymorphic type: its type variable can range over only instances of class Num. The type stated could be omitted because it can be inferred due to the presence of *. On the other hand, if the explicit type given to x above were omitted, then Haskell would infer x::[Integer] and this is a monomorphic type which would restrict the use of x. With the qualified polymorphic type, x can appear in an expression where a sequence of elements of type N is expected, as long as N is an instance of Num. Any instance N of Num must provide a fromInteger method that shows how to embed integers into N, so x would be interpreted as [N.fromInteger 0, N.fromInteger 1].
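For readers who want to try the points above in one file, here is a self-contained sketch that repeats the Num instance with the scalar product and adds a few experiments. The experiment names (catalan, sumKeepsTail, sumTruncated) are introduced here; the instance itself follows the definitions given above.

```haskell
infix 7 *|
(*|) :: Num a => a -> [a] -> [a]
a *| f = map (a *) f

instance (Eq a, Num a) => Num [a] where
  negate              = map negate
  f + []              = f
  [] + g              = g
  (f0:f') + (g0:g')   = f0 + g0 : f' + g'
  [] * _              = []
  (0:f') * g          = 0 : f' * g        -- head delivered without inspecting g
  _ * []              = []
  (f0:f') * g@(g0:g') = f0 * g0 : (f0 *| g' + f' * g)
  fromInteger c       = [fromInteger c]
  abs _               = error "abs not defined on sequences"
  signum _            = error "signum not defined on sequences"

x :: Num a => [a]
x = [0, 1]

-- The self-referential definition works because x has head 0.
catalan :: [Integer]
catalan = 1 + x * catalan ^ (2 :: Int)

-- Addition keeps the tail of the longer operand ...
sumKeepsTail :: [Integer]
sumKeepsTail = [1, 2, 3] + [10]

-- ... whereas zipWith (+) would truncate to the shorter one.
sumTruncated :: [Integer]
sumTruncated = zipWith (+) [1, 2, 3] [10]
```

Here sumKeepsTail is [11,2,3] while sumTruncated is [11], which suggests one answer to the “why?” above: with zipWith, the sum 1 + x would collapse to [1].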

The Num class invites comparison with the specification of the signature of a ring. Likewise, Haskell’s Fractional class may be compared to a ring-with-division, because a (partial) division operator (/), or a multiplicative inverse (recip), is required. Rational numbers form the archetypal instance of Fractional, and any instance F must show how to embed the rationals in F by defining fromRational :: Rational -> F. Division on sequences, $f/g$ , requires calculating the quotient $q$ satisfying $f=qg$ :

[TABLE]

Now we can say, at least approximately, how sequences become a ring-with-division:

instance (Eq a, Fractional a) => Fractional [a] where
  recip f           = 1/f
  _/[]              = error "divide by zero."
  []/_              = []
  (0:f')/(0:g')     = f'/g'
  (_:f')/(0:g')     = error "divide by zero"
  (f0:f')/g@(g0:g') = let q0=f0/g0 in q0:((f' - q0*|g')/g)
  fromRational c    = [fromRational c]

These simple definitions confront us with some of the difficulties in coding a satisfactory division operation that works for both finite and infinite sequences. One should investigate questions like: are $f/g=f*(1/g)$ and $f/f=1$ faithfully implemented? To keep things simple, compromises have to be made.
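The questions just raised can be explored with a small self-contained file. The sketch repeats both instances from above (with ASCII primes) and adds three hypothetical test sequences: geometric, fibonacci and selfQuotient.

```haskell
infix 7 *|
(*|) :: Num a => a -> [a] -> [a]
a *| f = map (a *) f

instance (Eq a, Num a) => Num [a] where
  negate              = map negate
  f + []              = f
  [] + g              = g
  (f0:f') + (g0:g')   = f0 + g0 : f' + g'
  [] * _              = []
  (0:f') * g          = 0 : f' * g
  _ * []              = []
  (f0:f') * g@(g0:g') = f0 * g0 : (f0 *| g' + f' * g)
  fromInteger c       = [fromInteger c]
  abs _               = error "abs not defined on sequences"
  signum _            = error "signum not defined on sequences"

instance (Eq a, Fractional a) => Fractional [a] where
  recip f             = 1 / f
  _ / []              = error "divide by zero"
  [] / _              = []
  (0:f') / (0:g')     = f' / g'
  (_:_) / (0:_)       = error "divide by zero"
  (f0:f') / g@(g0:g') = let q0 = f0 / g0 in q0 : ((f' - q0 *| g') / g)
  fromRational c      = [fromRational c]

x :: Num a => [a]
x = [0, 1]

-- 1/(1-x): the all-ones sequence x^*.
geometric :: [Rational]
geometric = 1 / (1 - x)

-- x/(1-x-x^2): the Fibonacci numbers.
fibonacci :: [Rational]
fibonacci = x / (1 - x - x ^ (2 :: Int))

-- (1-x)/(1-x): equal to 1 as a sequence, but delivered as an infinite list.
selfQuotient :: [Rational]
selfQuotient = (1 - x) / (1 - x)
```

On this evidence $f/g=f*(1/g)$ holds for the Fibonacci example on the first eight terms, while selfQuotient comes out as 1 followed by an unending run of zeros rather than the finite list [1], one of the compromises alluded to.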

Arithmetic and the convolution product rule are used to calculate a head-tail definition for square root, $\sqrt{f}$ . The starting point is $f=\sqrt{f}\sqrt{f}$ , and we calculate $(\sqrt{f})_{0}$ and $\sqrt{f}^{\prime}$ :

[TABLE]

We shall trivialise $\sqrt{f_{0}}$ and restrict square root to fractional sequences with constant term 1. In the following code, the first clause is suggested by the identity $\sqrt{x^{2}f}=x\sqrt{f}$ , and the more general $\sqrt{x^{2n}f}=x^{n}\sqrt{f}$ is handled by recursion. An alternative definition of square root is derived in [58] by differentiating $r^{2}=f$ , rearranging and then integrating (which the reader may like to try).

sqroot (0:0:f'') = 0:sqroot f''
sqroot f@(1:f')  = 1:(f'/(1+sqroot f))

Sequence composition, $f\circ g=\displaystyle\sum_{n}f_{n}g^{n}$ , is expanded thus:

[TABLE]
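The display is missing from this copy. A head-tail expansion that agrees with the general clause of the code further on, using $x\circ g=g$ and the distribution of composition through sum and product, can be sketched as:

```latex
\begin{aligned}
f\circ g &= (f_{0}+x f^{\prime})\circ g\\
 &= f_{0} + g\,(f^{\prime}\circ g)\\
 &= f_{0} + g_{0}\,(f^{\prime}\circ g) + x\,g^{\prime}(f^{\prime}\circ g).
\end{aligned}
```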

When $f$ is infinite, $g_{0}(f^{\prime}\circ g)$ is not computable unless $g_{0}=0$ . When $g_{0}=0$ we get

[TABLE]

However, $f\circ g$ is computable for $g_{0}\neq 0$ when $f$ is finite ( $p\circ[a]=[p(a)]$ , $p$ a polynomial). So we admit a potentially non-terminating clause and it is up to us to use it with care:

[] `o` _              = []
(f0:f') `o` g@(0:g')  = f0: g'*(f' `o` g)
(f0:f') `o` g@(g0:g') = [f0] + (g0*|(f' `o` g))+
                         (0:g'*(f' `o` g))

The definition $f\circ g=\sum_{n}f_{n}g^{n}$ reveals $x$ to be a left and right identity of composition. Composition distributes leftwards through sum, product, and quotient.

To calculate the converse, $g=f^{\circ}$ , expand the composition $f\circ g=x$ :

[TABLE]

Hence, $g^{\prime}(f^{\prime}\circ g)=1$ , and

[TABLE]
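The two displays are missing from this copy. Taking $f_{0}=g_{0}=0$, as the code that follows does, the calculation can be reconstructed from the head-tail expansion of composition:

```latex
\begin{aligned}
x \;=\; f\circ g &= f_{0} + g_{0}\,(f^{\prime}\circ g) + x\,g^{\prime}(f^{\prime}\circ g)\\
 &= x\,g^{\prime}(f^{\prime}\circ g) \qquad (f_{0}=g_{0}=0),\\
g_{0} &= 0,\qquad g^{\prime} = 1/(f^{\prime}\circ g).
\end{aligned}
```

As a small check of the statement, $f=x/(1-x)$ has converse $g=x/(1+x)$, since $g/(1-g)=x$.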

The program code is:

converse (0:f') = g where g = 0: 1/(f' `o` g)

For the reciprocal of $f^{\prime}\circ g$ to be defined, it is necessary that $f^{\prime}_{0}$ is invertible, which entails that $f^{\prime}$ is invertible. The set of such “conversible” sequences forms a group, $({\mathbb{S}}_{C},\circ,()^{\circ},x)$ .

The transforms $\Lambda$ and $\Lambda^{-1}$ are given names e2o and o2e, respectively [18]. Here they are, together with some other useful sequences:

e2o f    = zipWith (*) f facs
o2e f    = zipWith (/) f facs
from     :: Num a=>a->[a]
from     = iterate (+1)
nats, pos, zeros, facs :: Num a=>[a]
nats     = from 0
pos      = from 1
zeros    = 0:zeros
facs     = scanl (*) 1 pos

Differentiation and integration enjoy the appropriately succinct definitions,

deriv f = zipWith (*) pos (tail f)
integ f = 0:zipWith (/) f pos

The definitions $D\ \exp=\exp;\ \exp_{0}=1$ and $x^{*}=1+xx^{*}$ have the following solutions in Haskell. An x is affixed to prevent name clashes with existing names (for example exp in Haskell implements the function $e^{x}$ ). One can test $x^{*}=\Lambda\exp$ by checking the first few terms of their difference.

expx  :: (Eq a,Fractional a) => [a]
expx  = 1 + integ expx
starx :: (Eq a, Num a) => [a]
starx = 1 : starx
> takeW 6 (starx - e2o expx)
[0,0,0,0,0,0]
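The same check can be tried in isolation. The following is a minimal sketch over lists of rationals that does not rely on the list instances of Num and Fractional; expS is a hypothetical name, and deriv uses `drop 1` rather than `tail` so that it is total on the empty list, which is a deviation from the definition above.

```haskell
-- Standalone sketch of deriv, integ, e2o and exp on lists of rationals.
type Seq = [Rational]

pos :: Seq
pos = map fromInteger [1 ..]

facs :: Seq
facs = scanl (*) 1 pos

deriv, integ, e2o :: Seq -> Seq
deriv f = zipWith (*) pos (drop 1 f)  -- (D f)_n = (n+1) f_(n+1)
integ f = 0 : zipWith (/) f pos       -- zero constant term, then f_n/(n+1)
e2o   f = zipWith (*) f facs          -- multiply f_n by n!

-- exp solves D exp = exp with exp_0 = 1, that is, exp = 1 + integ exp.
-- The constant term of integ is 0, so adding 1 just replaces the head.
expS :: Seq
expS = 1 : drop 1 (integ expS)
```

Here `take 6 (e2o expS)` consists of ones, matching the first terms of $x^{*}$, and `take 6 (deriv expS)` equals `take 6 expS`.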

A rational $a/b$ is presented in Haskell as a%b. The elements of expx are rationals, and e2o removes the factorial divisors, yielding [1%1,1%1, …]. The following defines takeW n which is take n preceded by the conversion of whole rationals into integers (the (.) is function composition, and properFraction is in the Prelude).

makeWhole r  = case properFraction r of
                (n,0)         -> n
                otherwise     -> error "not whole"
makeAllWhole = map makeWhole
takeW n      = take n . makeAllWhole

Table 7 contains further core sequences defined by differential equations. All should be given the type (Eq a, Fractional a) => [a], like expx above. The core sequence $x^{*}$ , which we defined earlier, could be defined by starx = 1+integ (starx^2), since $D\,x^{*}=(x^{*})^{2};\ x^{*}_{0}=1$ (but its elements would then be fractional). For two more examples, let us calculate definitions for $\arctan$ and $\arcsin$ .

[TABLE]
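The display is missing from this copy. A sketch of the calculation, using the converse rule (rule 7 of Table 1) and assuming that $\tan$ and $\sin$ are specified by $D\tan=1+\tan^{2}$ and $D\sin=\cos$ (the exact entries of Table 7 are not visible here), is:

```latex
\begin{aligned}
D\,\arctan &= D\,(\tan^{\circ}) = \big((D\tan)\circ\tan^{\circ}\big)^{-1}
            = \big((1+\tan^{2})\circ\tan^{\circ}\big)^{-1} = 1/(1+x^{2}),\\
D\,\arcsin &= D\,(\sin^{\circ}) = \big((D\sin)\circ\sin^{\circ}\big)^{-1}
            = \big(\cos\circ\sin^{\circ}\big)^{-1}
            = \big(\sqrt{1-\sin^{2}}\circ\sin^{\circ}\big)^{-1} = 1/\sqrt{1-x^{2}} .
\end{aligned}
```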

The latter uses the Pythagorean identity, $\sin^{2}+\cos^{2}=1$ , which follows from the defining differential equations. Taking initial values into consideration, the solutions rendered in Haskell are immediate:

[TABLE]

Here are checks of $\exp=(\sec+\tan)\circ\mbox{\rm gd}$ (gd is the Gudermannian function) and $D\,\sin^{\circ}=1/\sqrt{1-x^{2}}$.

> takeW 6 (expx - ((secx + tanx) `o` gdx))
[0,0,0,0,0,0]
> takeW 6 (deriv (converse sinx) - (1/(sqroot (1-x^2))))
[0,0,0,0,0,0]

A bivariate sequence, $b(z,u)$ , may be regarded as a (potentially doubly-infinite) matrix, $t=(b_{i,j})$ , of coefficients of $z^{i}u^{j}$ . It is implemented as a univariate sequence, $s$ , of (homogeneous) polynomials such that $s_{n}$ is the diagonal, $[b_{0,n},\ b_{1,n-1},\ldots b_{n,0}]$ of $t$ . Thus, $b_{0,n}z^{0}u^{n}+b_{1,n-1}z^{1}u^{n-1}+\cdots+b_{n,0}z^{n}u^{0}$ is represented by $s_{n}=b_{0,n}x^{0}+b_{1,n-1}x^{1}+\cdots+b_{n,0}x^{n}$ ( $z$ becomes $x$ , and $u$ is redundant). So, $[u^{n-k}z^{k}]b=[x^{k}][x^{n}]s$ ; equivalently $[u^{k}z^{n}]b=[x^{n}][x^{n+k}]s$ . The following depicts the $t$ to $s$ map on a portion of $t$ :

[TABLE]

The ring-with-division and square root operations pertaining to $b(z,u)$ are isomorphically transferred to the diagonal representation $s$ . The representations for $u$ and $z$ are, $u=0u^{0}z^{0}+(1u^{1}z^{0}+0u^{0}z^{1})\cong[0x^{0},1x^{0}+0x^{1}]=[[0],[1,0]]$ and $z=0+(0u+1z)\cong[0,0+1x]=[[0],[0,1]]$ . Here is $(u+z)^{*}$ represented by $s$ =pascal in Haskell,

u,z, pascal :: (Eq a, Num a) => [[a]]
u      = [[0],[1,0]]
z      = [[0],[0,1]]
pascal = starx `o` (u+z)
> take 6 pascal
[[1],[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1],[1,5,10,10,5,1]]

This displays $[u^{n-k}z^{k}](u+z)^{*}=[x^{k}][x^{n}]s=\binom{n}{k}$. Commonly, we want to show a portion of $t$, where $[x^{k}][x^{n}]t=b_{n,k}=[x^{n}][x^{n+k}]s$, or perhaps more commonly, such that $[x^{k}][x^{n}]t=n![u^{k}z^{n}]b=n![x^{n}][x^{n+k}]s=[x^{n}]\Lambda[x^{n+k}]s$. For example, let

[TABLE]

Then $n![u^{k}z^{n}]b=\binom{n}{k}$. The functions unDiag and unDiage2o transpose the $s$-representation into the desired $t$-representation (the reverse of the $t\mapsto s$ map depicted above). The function unDiage2o first removes factorial divisors associated with $z$. The functions select and selectW take an argument $[n_{0},n_{1},\ldots,n_{m}]$ saying what length to take from rows 0 to $m$ of $t$. The version selectW converts whole rationals to integers. Bivariate counting sequences, $b_{n,k}$, are typically zero for $k>n$, and from these we select a lower triangular section. The schroeder sequence from section 3, item C, is an example which is ordinary in $u$ and $z$, whilst ebinom below is exponential in $z$. The sequence powerSums of polynomials for summing powers has a polynomial of order $n+1$ at position $n$, so a lower trapezium section $[2..4]$ is selected.

list, pluralList :: (Eq a, Num a) => [a]
list       = starx
pluralList = list - x - 1
schroeder  =  z + u*(pluralList `o` schroeder)
> select [1..6] (unDiag schroeder)
[[0],[1,0],[0,1,0],[0,1,2,0],[0,1,5,5,0],[0,1,9,21,14,0]]
ebinom =  expx `o` (z+u*z)
> selectW [1..6] (unDiage2o ebinom)
[[1],[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1],[1,5,10,10,5,1]]
powerSums  = ((expx `o` (u*z))-1)/((expx `o` z)-1)
> select [2..4] (unDiage2o powerSums)
[[0 % 1,1 % 1],[0 % 1,(-1) % 2,1 % 2],[0 % 1,1 % 6,(-1) % 2,1 % 3]]

A number of supporting functions are needed. The unDiag function expects its argument to be perfectly triangular, so padTri is first applied to fill out the triangle with zeros if necessary. Then transpose m detaches the heads of the rows of m, which make up the first column, c, and this becomes the first row of the result. A recursive invocation transposes the remaining sub-matrix, m'.

select s t   = zipWith take s t
selectW s t  = zipWith takeW s t
unDiag       :: Num a=> [[a]]->[[a]]
unDiag       = transpose . padTri
unDiage2o    :: Fractional a=> [[a]]->[[a]]
unDiage2o    = unDiag . (map e2o)
padTri t     = zipWith padRight t [1..]
padRight r k = r++(take (k-(length r)) zeros)

transpose []   = []
transpose m    = c:transpose m'
  where (c,m') = foldr detachHead ([],[]) m
        detachHead   ([r0]) b  = (r0:fst b,snd b)
        detachHead   (r0:r') b = (r0:fst b,r':snd b)
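For a finite portion of a diagonal representation, the effect of padding and transposing can be illustrated with the library function `Data.List.transpose` in place of the hand-written transpose above. This is only a sketch: unDiagFinite is a hypothetical name, and it is intended for finite lists of rows.

```haskell
import Data.List (transpose)

-- Pad row k (counting from 1) on the right with zeros up to length k.
padTri :: Num a => [[a]] -> [[a]]
padTri t = zipWith padRight t [1 ..]
  where padRight r k = r ++ replicate (k - length r) 0

-- Finite-portion analogue of unDiag: diagonal s_n becomes column n,
-- so row k of the result collects the k-th entries of the diagonals.
unDiagFinite :: Num a => [[a]] -> [[a]]
unDiagFinite = transpose . padTri
```

For example, applying unDiagFinite to the first four diagonals of pascal, `[[1],[1,1],[1,2,1],[1,3,3,1]]`, gives `[[1,1,1,1],[1,2,3],[1,3],[1]]`.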

There is plenty of room for adding functions to taste. Perhaps the main difficulty is deciding on an effective naming convention. Here are some examples.

takeEBivW r = (selectW r) . unDiage2o
takeEBiv  r = (select r)  . unDiage2o
takeBivW  r = (selectW r) . unDiag
takeBiv   r = (select r)  . unDiag

Differentiation with respect to $z$ (or $u$ ) is performed by dz (or du). Below are the definitions, plus a test based on the set partitions recurrence from section 3, item L. Instead of using allZeros [1..6], the reader may wish to simply use selectW [1..6] and view the zeros (incidentally, one would then observe that the diagonal representation is not always a perfect triangle).

dz s = map deriv (tail s)
du s = map (reverse . deriv . reverse) (tail (padTri s))
set, nonEmptySet, emptySet :: (Eq a, Fractional a) => [a]
set          = expx
emptySet     = 1
nonEmptySet  = set - emptySet
parts        = set `o` (u*(nonEmptySet `o` z))

allEq c r    = foldr (\a b-> (a==c) && b) True r
allZeros s t = allEq True (map (allEq 0) (select s t))
> allZeros [1..6] ((dz parts) - (u*parts +u*du parts))
True

5 Exercising the implementation

As it stands, the implementation facilitates a great range of experimentation. We will demonstrate a few concrete examples. It will be seen that the implementation is a valuable assistant in the study of otherwise theoretical material.

The definitions in tables 4 and 5 transliterate into Haskell. Here are some examples (see items B and D). One has to be vigilant about when to use e2o, take, takeW, select, selectW, unDiag, unDiage2o, etc. It is not necessary to give type declarations to all definitions, but it will be necessary for some. We leave that as a trial-and-error exercise.

cycle, perm :: (Eq a, Fractional a) => [a]
perm  = starx
lg g  = lgnx `o` (g-1)
cycle = lg starx
> takeW 6 (perm - (set `o` cycle))
[0,0,0,0,0,0]
cayleyTree            = x*(set `o` cayleyTree)
connectedAcyclicGraph = cayleyTree - cayleyTree^2 / 2
> takeW 8 (e2o connectedAcyclicGraph)
[0,1,1,3,16,125,1296,16807]

infix 7 |^
(|^)      :: (Eq a, Fractional a) => [a] -> a -> [a]
f |^ r    = expx `o` (r *| lg f)
legendre  = (1-2*u*z+z^2) |^ [-1%2]
> select [1..4] (unDiag legendre)
[[1 % 1],[0 % 1,1 % 1],[(-1) % 2,0 % 1,3 % 2],
[0 % 1,(-3) % 2,0 % 1,5 % 2]]
hermite   = expx `o` (2*u*z-z^2)
> select [1..4] (unDiage2o hermite)
[[1 % 1],[0 % 1,2 % 1],[(-2) % 1,0 % 1,4 % 1],
[0 % 1,(-12) % 1,0 % 1,8 % 1]]

No attention has been paid to efficiency or prettiness of results – all the computations are expected to work on small examples, resulting in small results. Endless examples could be given related to items A-Z. We must choose only a few, and we aim for variety.

The factorials are defined in the previous section using scanl; below they are generated directly from their differential equation, by a continued fraction recurrence, and by shuffle inverse (see items E and P). Shuffle product can also be defined by $f\otimes g=\Lambda(\Lambda^{-1}f*\Lambda^{-1}g)$ (but that involves rationals, even when $f$ and $g$ are integer sequences).

fac = 1+x*fac + x^2*(deriv fac)
> take 6 fac
[1,1,2,6,24,120]
cf_fac  = 1/(cfdenom 1)
 where cfdenom n = 1 - x*(2*n-1)-x^2*n^2*(1/(cfdenom (n+1)))
> takeW 6 cf_fac
[1,1,2,6,24,120]
infix 7 |><|    -- shuffle product
f@(f0:f') |><| g@(g0:g') = (f0 * g0): ((f' |><| g)+(f |><| g'))
_ |><| []                = []
[] |><| _                = []
shInv f@(f0:f') = (1/f0): (-f' |><| ((shInv f) |><| (shInv f)))
> takeW 6 (shInv (1-x))
[1,1,2,6,24,120]
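The shuffle inverse can also be exercised on its own. The sketch below restates the shuffle product and inverse as ordinary functions on lists of rationals; shuf and addS are hypothetical names, and the result is shared through a local binding r rather than recomputed by calling shInv recursively, which is a design choice of this sketch. The inverse clause encodes $r_{0}=1/f_{0}$ and $r^{\prime}=-(f^{\prime}\otimes r\otimes r)$, which follows from applying the tail (a derivation for shuffle) to $r\otimes f=1$.

```haskell
-- Standalone sketch of shuffle product and shuffle inverse.
type Seq = [Rational]

-- Termwise sum; the shorter argument is implicitly padded with zeros.
addS :: Seq -> Seq -> Seq
addS (x:xs) (y:ys) = (x + y) : addS xs ys
addS xs     []     = xs
addS []     ys     = ys

-- Shuffle product: head f0*g0, tail (f' shuffle g) + (f shuffle g').
shuf :: Seq -> Seq -> Seq
shuf f@(f0:f') g@(g0:g') = (f0 * g0) : addS (shuf f' g) (shuf f g')
shuf _         _         = []

-- Shuffle inverse r of f: r0 = 1/f0 and r' = -(f' shuffle r shuffle r).
shInv :: Seq -> Seq
shInv []      = error "shInv: empty sequence"
shInv (f0:f') = r
  where r = recip f0 : map negate (shuf f' (shuf r r))
```

With these definitions `take 6 (shInv [1,-1])` reproduces the factorials 1, 1, 2, 6, 24, 120. The cost of shuf grows quickly with the index, so this is suitable only for short prefixes.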

The Newton transform, $({\cal N},{\cal N}^{-1})$, is an isomorphism between the Hadamard and infiltration rings (see item K). It is implemented here by (h2i, i2h), with a recursive variant, rh2i. We also translate the definitions of $\Delta$ and $\Sigma$ directly to delta and sigma. Later, we shall define another version of $\Sigma$, named prefixSums, which produces a finite result on a finite sequence. Let’s throw into our test the ubiquitous Fibonacci sequence, defined by $f_{n+2}-f_{n+1}-f_{n}=0;\ f_{0}=f_{1}=1$. The first part can be re-expressed $b(E)f=0$, where $b=x^{2}-x-1$, with solution $f=\displaystyle\frac{(\grave{b\,}*[1,1])[0..1]}{\grave{b\,}}=1/(1-x-x^{2})$ (see item S). For illustration, we let Haskell take the last step.

h2i s          = (1/(1+x)) |><| s
i2h s          = (1/(1-x)) |><| s
rh2i s@(s0:s') = s0: rh2i (delta s)
delta s        = (tail s) - s
sigma s        = x*starx*s
recur          :: (Eq a, Num a) => a -> [a]
recur a        = a:recur a
fib            = (take 2 (rb*[1,1]))/rb where rb = reverse (x^2-x-1)
> takeW 10 (recur 1 + sigma (delta fib))
[1,1,2,3,5,8,13,21,34,55]
> takeW 10  (i2h (h2i fib))
[1,1,2,3,5,8,13,21,34,55]

The Hadamard product, |*|, and the infiltration product, |^| (and infProd) are now introduced, and testing them is left as an exercise.

infix 7 |*|
f |*| g = zipWith (*) f g

infix 7 |^|
f@(f0:f') |^| g@(g0:g') = (f0*g0): ((f'|^|g)+(f|^|g'))+(f'|^|g')
_ |^| []                = []
[] |^| _                = []

infProd f g = h2i (i2h f |*| i2h g)

Translation to and from falling factorial polynomials can be exercised as follows. There are, of course, more efficient ways of generating the data used here (cycles, parts), but we stick to a simple translation of the mathematical definitions. In the first test, we use the fact (item N) that $[0,1,5,14,30,55,\ldots]$ is representable by a polynomial of degree 3. The second test compares falling factorials and cycle numbers.

cycles        = set `o` (u* (cycle `o` z))
fall n m      = product [n-i | i<-take m nats]
alt           :: Num a => a->[a]
alt r         = r:alt (-r)
altMat m      = zipWith op (alt 1) m
                where op sign r = zipWith (*) (alt sign) r
monom2FacPoly = takeEBiv [1..] parts
facMonom2Poly = altMat (takeEBiv [1..] cycles)

toFacPoly p   = sum (zipWith (*|) p monom2FacPoly)
fromFacPoly p = sum (zipWith (*|) p facMonom2Poly)
squaresFacPoly= o2e (take 4 (h2i [0,1,5,14,30,55]))
> fromFacPoly squaresFacPoly
[0 % 1,1 % 6,1 % 2,1 % 3]
> [fall x i | i<- [0..5]] == take 6 facMonom2Poly
True

The Maclaurin and Taylor expansions (item Y) can be coded and tested:

maclaurin f = o2e (map head (iterate deriv f))
taylor f    = map o2e (zp (map (`o` u) (iterate deriv f)))
              where zp (g0:g') = g0 + z* (zp g')

bsinx, tsinx :: [[Rational]]
bsinx = sinx `o` (u+z)
tsinx = taylor sinx
> select [1..8] bsinx == select [1..8] tsinx
True

Let us move beyond the A-Z items and look at some other examples. The Logan polynomials [32, sect. 6.5], have the tangent numbers as constant terms. Here they are, defined by a closed expression, $(\sin\circ z+u\cos\circ z)/(\cos\circ z-u\sin\circ z)$ , and by an iteration.

logan = (((sinx `o` z)+u*(cosx `o` z))/
        ((cosx `o` z)-u*(sinx `o` z)))
> takeEBivW [2..5] logan
[[0,1],[1,0,1],[0,2,0,2],[2,0,8,0,6]]
loganPolys = iterate (\p -> (1+x^2) * deriv p) x
> take 4 loganPolys
[[0,1],[1,0,1],[0,2,0,2],[2,0,8,0,6]]

The Entringer triangle, $E$ [81, 78], has the tangent numbers on the first column (disregarding the first element), and the secant numbers on the diagonal. Below, the triangle is generated first by a backwards and forwards (boustrophedonic) computation of partial sums, then as the diagonal (homogeneous) presentation of coefficients of $A=(\sin\circ u+\cos\circ u)/(\cos\circ(u+z))$ (named zigzags in table 5). The bivariate $A$ is exponential in both $u$ and $z$, and $E_{n,k}=(n-k)!k![u^{n-k}z^{k}]A$ can be shown [32, ex. 6.75]. Forward partial sums are prefix sums. Earlier, in item I, we met $\Sigma s=xx^{*}s$ for computing them, and we used this above to define sigma. But that operator always results in an infinite sequence, even when applied to a finite one. So here we use a different definition.

prefixSums      = scanl (+) 0
suffixSums      = reverse.prefixSums.reverse
alternate f g a = a:alternate g f (f a)
entringer       = alternate suffixSums prefixSums [1]
> take 7 entringer
[[1],[1,0],[0,1,1],[2,2,1,0],[0,2,4,5,5],[16,16,14,10,5,0],
[0,16,32,46,56,61,61]]
zigzags =   (((sinx `o` u) + (cosx `o` u))/(cosx `o` (u+z)))
ue2o   :: Fractional a => [a]->[a]
ue2o   = reverse.e2o.reverse
> selectW [1..7] (map e2o (map ue2o zigzags))
[[1],[1,0],[0,1,1],[2,2,1,0],[0,2,4,5,5],[16,16,14,10,5,0],
[0,16,32,46,56,61,61]]

The Moessner sieve generates the sequence $M_{r}=[1^{r},2^{r},3^{r},4^{r},\ldots]$ , given a positive integer $r$ . Kozen and Silva [52] cite a variety of proofs including sequence-calculational [38] and coinduction methods [63]. Let us see how easily we can implement some of the computations in [52]. There it is shown that the Moessner procedure can be described as the computation of a succession of bivariate sequences, $b_{n}(z,u)$ , usefully represented in diagonal (homogeneous) form, $s_{n}$ , and that $[u^{0}z^{r}]b_{n}=[x^{r}][x^{r}]s_{n}=n^{r}$ . The sequence of triangles begins with Pascal’s triangle, $b_{0}=p=(u+z)^{*}$ , represented in diagonal form by $s_{0}$ . Then the triangle-to-triangle step, $s_{n}\mapsto s_{n+1}$ , is: take the row $\rho=[x^{r}]s_{n}=[\rho_{0},\rho_{1},\ldots,\rho_{r}]$ , representing $h_{n}(z,u)=\rho_{0}u^{r}z^{0}+\rho_{1}u^{r-1}z^{1}+\cdots+\rho_{r}u^{0}z^{r}$ , and compute $s_{n+1}$ representing $b_{n+1}=h_{n}(z,1)*p=(\rho_{0}z^{0}+\rho_{1}z^{1}+\cdots+\rho_{r}z^{r})*p$ . Here is the computation of the first three triangles, followed by the selection of $n^{r}=[x^{r}][x^{r}]s_{n}=$ sn!!r!!r from the first 5 triangles. The function x2z converts $\rho=[x^{r}]s_{n}=\rho_{0}+\rho_{1}x+\cdots+\rho_{r}x^{r}$ to $h_{n}(z,1)$ in bivariate diagonal form (there are many ways of doing this).

x2z rho     = sum (zipWith (\c zn->[[c]]*zn) rho zPowers)
              where zPowers = (iterate (*z) 1)
moessnerT r = iterate (\s -> (x2z (s!!r))*pascal) pascal
> select [5,5,5] (moessnerT 4)
[[[1],[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1]],
[[1],[1,5],[1,6,11],[1,7,17,15],[1,8,24,32,16]],
[[1],[1,9],[1,10,33],[1,11,43,65],[1,12,54,108,81]]]
nats2Power r = [sn!!r!!r | sn <- moessnerT r]
> take 5 (nats2Power 4)
[1,16,81,256,625]
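As an independent cross-check on nats2Power, the classical strike-out-and-sum description of the Moessner sieve can be coded directly. This is a different technique from the bivariate formulation of [52] used above, not a restatement of it; moessnerSieve and dropEvery are hypothetical names, and the sketch assumes $r\geq 1$.

```haskell
-- Delete every k-th element (positions k, 2k, 3k, ... counting from 1).
dropEvery :: Int -> [Integer] -> [Integer]
dropEvery k xs = [x | (i, x) <- zip [1 ..] xs, i `mod` k /= 0]

-- Classical Moessner sieve for r >= 1: starting from the positive
-- integers, strike out every r-th element and take running sums, then
-- every (r-1)-th, and so on down to every 2nd.
moessnerSieve :: Int -> [Integer]
moessnerSieve r = go r [1 ..]
  where
    go 1 xs = xs
    go k xs = go (k - 1) (scanl1 (+) (dropEvery k xs))
```

The first five terms of `moessnerSieve 4` agree with `take 5 (nats2Power 4)` shown above.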

The function moessnerT is easily changed to one which produces the $n$ -indexed sequence $[x^{r}]s_{n}$ representing $h_{n}(z,1)$ , and this makes way for a generalisation. The iteration step is $[x^{r}]s_{n}\mapsto[x^{r}]s_{n+1}$ , and the iteration starts with $[x^{r}]s_{0}=1$ , representing $h_{0}(z,1)$ .

moessnerH r = iterate (\rho -> ((x2z rho)*pascal)!!r) 1
> select [5,5,5,5] (moessnerH 4)
[[1],[1,4,6,4,1],[1,8,24,32,16],[1,12,54,108,81]]

Kozen and Silva generalise Moessner’s theorem to encompass theorems by Long and Paasche [52]. In the generalised implementation, there are two new parameters, $h_{0}(z,1)$ (regarded as univariate, to be converted to bivariate by x2z), and $d$ , a sequence $[d_{0},d_{1},\ldots]$ of non-negative integers. The iteration step implements the following recurrence, in which the final subscript indicates the selection of the homogeneous polynomial of degree $(\mbox{\rm deg\ }h_{n})+d_{n}$ :

[TABLE]

Thus, rather than a simple iteration, we scan along $d$ , because step $n$ requires $d_{n}$ . Let $c_{n},\ n>0$ , be the leading coefficient of $h_{n}(z,1)$ (which we extract using last, since the highest-order coefficient is at the end). The generalised theorem entails: (a) when $h_{0}(z,1)=1$ and $d=[r,0,0,\ldots]$ we get Moessner’s result, $c_{n}=n^{r}$ ; (b) when $h_{0}(z,1)=b+(a-b)z$ and $d=[r,0,0,\ldots]$ , we get Long’s result, $c_{n}=(a+(n-1)b)n^{r}$ ; and (c) when $h_{0}(z,1)=1$ and $d=[d_{0},d_{1},\ldots]$ , we get Paasche’s result, $c_{n}=\displaystyle\prod_{i=0}^{n-1}(n-i)^{d_{i}}$ . Here is a rather succinct implementation.

ksmlp h0 d = map last (scanl step h0 d)
 where step hn dn = ((x2z hn)*pascal)!!((length hn - 1)+dn)

moessner r = ksmlp 1 (r:zeros)
long a b r = ksmlp [b,a-b] (r:zeros)
paascheFac = ksmlp 1 [1,1..]
superFac   = ksmlp 1 [1,2..]

The above computations convey some Haskell by example, and demonstrate a wealth of experimentation assisted by sequence operations. The core set of definitions is kept to a minimum, so that it is manageable in one file and should not daunt beginners. In keeping to this principle, we have not implemented an equivalent of formal Laurent series, so we cannot accommodate a sequence for $\cot$ (and $\csc$, and so on). However, we can define $x\cot=(x\cos)/\sin$. Then, with reference to item X, let $c(r)=rx(\coth\circ rx)$, and test $x\coth=c(1/2)\circ(2x)$, $c(1/2)\circ(-x)=c(1/2)$ and $c(i)=x\cot$ (Gaussian rationals, introducing $i$, are in the next section):

xcotx, xcothx :: (Eq a, Fractional a) => [a]
xcotx   = (x*cosx)/sinx
xcothx  = (x*coshx)/sinhx
xcth r  = [r]*(x*(coshx `o` ([r]*x)))/(sinhx `o` ([r]*x))
> take 10 xcothx == take 10 ((xcth (1%2)) `o` (2*x))
True
> take 10 (xcth (1%2)) == take 10 ((xcth (1%2)) `o` (-x))
True
> take 10 xcotx == take 10 (xcth i)
True

Sometimes there is simply extra work to be done to convert a sequence expression into a form acceptable by our definitions. For example, the following expression for counting permutations by number of valleys is derived in [22]:

[TABLE]

This fails to compute in our implementation for three reasons (can you spot them?). But, by using the double-angle identity for $\tan$ , and $i\tanh=\tan\circ ix$ , it can be manipulated into the following form [20], which does compute:

[TABLE]

valleys = r/(r - (tanhx `o` (z*r))) where r = sqroot (1-u)
> takeEBivW [1..6] valleys
[[1],[1,0],[2,0,0],[4,2,0,0],[8,16,0,0,0],[16,88,16,0,0,0]]

The question of how to circumscribe a minimal core set of definitions, one that perhaps manifests a timeless quality, is a challenging one. It seems only too easy to keep adding stuff, as the next section testifies.

6 Building on the implementation

Implementing sequence algebra is an example of mathematics-programming synergy, as found, for example, in [62, 55, 80, 74, 65, 87, 82, 18]. One should note the chronology of language use: [62] uses Fortran, [55] uses pseudo-Algol, [80] uses Pascal, [74] uses Standard ML, [87] uses Maple, [82] uses Scheme, and [65, 18] use Haskell. Haskell is one of the most recent and ambitious in the evolution of programming languages. The story of its development [40] is an informative account of collaborative design in a scientific context. It clearly reveals the tensions between the pursuit of elegant tried-and-tested universal concepts, and pragmatically-motivated more complex and speculative features.

One has to face the fact that Haskell presents the casual newcomer with subtleties, some of which cause bafflement. This slightly detracts from our goal, but also means that the implementation of sequence algebra is a fine benchmark test: Haskell ought to host it well for relative beginners. There are two prominent sources of subtleties: lazy evaluation and type classes. The former might be discovered in working with infinite matrices; for example, try rewriting transpose. The latter is likely to cause the most frustration. One could write an elucidation of potential “surprises” centred around implementing sequence algebra. That is beyond our scope, but we draw attention to the fact that some type declarations can be omitted, and some not. To take just one example, the final test of the previous section, take 10 xcotx == take 10 (xcth i), does not go through if the type declaration for xcotx is omitted (then the system doesn’t know to translate the rationals in xcotx to Gaussian rationals for comparison). On the other hand, we may omit an explicit type for xcotx and use the test

makeReal (r:&0)  = r
makeReal _       = error "not real"
makeAllReal g    = map makeReal g
> take 10 xcotx == makeAllReal (take 10 (xcth i))
True

However, if the g is omitted from the definition of makeAllReal, then makeAllReal is given a different type and the test fails to type-check. Of course, such things have interesting explanations, but they are potentially off-putting for beginners.

These remarks notwithstanding, one cannot resist adding to the implementation in a myriad of ways. Here are a few next-steps, which the author has already taken, and which are left as fruitful exercises.

• Translate [87], and elements of [80], to use Haskell.

• Introduce Gaussian rationals as an instance of Num and Fractional, and test De Moivre’s theorem (item A). Here is part of a definition and a test of Euler’s identity:

infix  6  :&
data Gaussian a  = a :& a deriving (Eq, Read, Show)
i    :: (Eq a, Num a) => Gaussian a
i    = 0:&1
ix   :: (Eq a, Num a) => [Gaussian a]
ix = [0,i]

instance  (Num a) => Num (Gaussian a)  where
  -- define negate, +, abs, signum, fromInteger
  (x:&y) * (x':&y') =(x*x'-y*y') :& (x*y'+y*x')

instance  (Fractional a) => Fractional (Gaussian a)  where
  (x:&y) / (x':&y') =  (x*x'+y*y') / d :& (y*x'-x*y') / d
                       where d  = x'*x' + y'*y'
  fromRational a    = fromRational a :& 0

> take 10 (cosx + [i]*sinx) == take 10 (expx `o` ix)
True

• Introduce an instance, Shuffle a, of class Num, so that one can write s |><| t as S s * S t, and shuffle power $s^{n_{\otimes}}$ as (S s)^n. Extend the following code, making Shuffle a an instance of Fractional. The first test below illustrates shuffle power. The second test involves the secant numbers, $s=\Lambda\sec$. These can be defined (see items E and F) by applying $\Lambda$ to the differential equation for $\sec$ to give $s^{\prime}=s\otimes(\Lambda\tan)$, and since $\sec_{0}=1$ we get the Haskell secNums = 1:secNums |><| tanNums (where tanNums=e2o tanx). Contrast this to the use of S:

newtype Shuffle a = S [a] deriving (Eq, Read, Show)
unS (S s) = s

instance (Eq a, Num a) => Num (Shuffle a) where
  negate (S s)  = S (negate s)
  (S s) + (S t) = S (s+t)
  (S s) * (S t) = S (s |><| t)
  fromInteger n = S [fromInteger n]
  abs _         = error "abs undefined on Shuffle"
  signum _      = error "signum undefined on Shuffle"

> takeW 6 (unS ((S starx)^2))
[1,2,4,8,16,32]
tanNums = e2o tanx
secNums = 1:unS (S secNums * (S tanNums))
> takeW 10 secNums
[1,0,1,0,5,0,61,0,1385,0]

• Introduce matrix computations. To keep the definitions simple, use the type [[a]] for a matrix, and presume, controversially, that it is used responsibly, in the sense that a matrix is presented as a list of rows of agreed length. Transpose is already defined in our implementation (written to work also for infinite matrices). Operations to define include determinant, characteristic polynomial, adjugate, Gaussian elimination, and different methods of inversion. Then one can test computations in proofs of the Cayley-Hamilton theorem, and experiment with bivariate Lagrange inversion (using $2\times 2$ Jacobians).

• Sequences of sequences can become confused with matrices, so it is instructive to define:

data Matrix a    = M [[a]] | D a deriving (Eq, Read, Show)

instance (Eq a, Num a) => Num (Matrix a) where
  negate (M m)   = M (map (map negate) m)
  negate (D r)   = D (negate r)
  (M a) + (M b)  = M ...
  ... clauses for + and *

  fromInteger n = D (fromInteger n)

The idea is that if s is a square matrix of type [[a]], then we can have M s. Definitions of addition, M s + M t, and multiplication, M s * M t, can (with dereliction of duty) assume that s and t are square of the same dimension. The element D r stands for the square diagonal matrix (of any dimension) with r along the diagonal. The instance definitions of addition and multiplication each require four clauses (MM, DM, MD, DD), while negate has two clauses (M, D).

• Rewrite part III of [55] to use Haskell, making good use of classes and instances to reflect the algebraic structure. At one level this can be approached as a program translation exercise, and is rewarding in demonstrating Haskell to be a good host language. At other levels it invites study of a good bit of theory (Euclidean domains, finite fields, Chinese Remainder Theorem, interpolation, homomorphic image schemes, Fast Fourier Transform, and Newton’s algorithm applied to power series).

Further to these tried-and-tested steps, there is, of course, unlimited scope for add-ons. Related software can be found in the Hackage repository of the Haskell website (www.Haskell.org).

7 Concluding remarks

It is clear that sequence algebra serves calculus: many sequence identities foretell relationships between analytic functions; it serves combinatorics: many counting sequences for discrete structures can be derived by sequence algebra; and it serves computation: it expresses the behaviour of certain kinds of automata; it leads to interpolation methods and summation formulae, and supports program calculation. The theory could hardly be more foundational, and constructing an implementation from scratch emphasises its concreteness, and has the potential to reinforce understanding.

We have exercised the implementation on examples from [32, 25], demonstrating that it makes a valuable companion to those texts. It could be applied to other texts, for example [15, 31, 4, 79, 77]. It can also serve as a centre-piece in a course on functional programming in mathematics. And, indeed, the experience of typing up and experimenting with the code confronts one with intriguing issues in programming language design. There is zero-testing on sequence elements, which could be used to open a discussion on computability.

The On-Line Encyclopedia of Integer Sequences (OEIS) [75] has hundreds of thousands of sequences. The sequences we have mentioned can be found using the OEIS search facility. It will be noticed that many of the sequences are accompanied by generating code written in various languages, including Haskell. One may like to investigate how many OEIS entries can be expressed in the “language” of tables 4 and 5. A Haskell interface to the OEIS is reported in [88].

Needless to say, to elaborate the topic more fully, with proof details and examples, one needs a book-sized exposition (draft portions of a book may be requested from the author). Beyond that, the obvious question is how to make a seamless progression. A few programming-oriented suggestions are in section 6. On the theory side, we must acknowledge that sequence algebra is so low in the mathematical hierarchy, that it doesn’t determine a narrow range of follow-up topics. Nevertheless, we mention a few. One is the classification of sequences, taking a lead from [33] and [76, Ch. 6]. Related to this is the computer algebra work done under the heading “generating functions” or “holonomic functions” [35, 46]. It remains to construct a bridge from the elementary level of the present paper to the use of a computer algebra package.

Established results on differential equations, including computer-algebraic, may be revisited with an eye to drawing out those which become particularly accessible when specialised to sequences. One suggestion is to bring the method of characteristics as used, for example in [22], into common parlance for sequence work. Another is to find a smooth passage from the level of the present paper to results obtained using the language of Species, for example those in [4, 70] (an introduction to Species for Haskell programmers is [89]).

Various multivariate directions beckon, including formal languages [5, 3] and multivariate Lagrange inversion [30]. We have also arrived at the threshold of analysis but we have not crossed it, except for bringing $\pi$ into item X. It is natural to ask whether fluency in infinite sequences, as promoted here, has any bearing on how students approach Cauchy sequences and analytic functions. Related to this is the progression from chapter 1 to chapter 2 in [36] (and chapter VII of [19]). On another tack, one may use sequence algebra to motivate abstract algebra. For example, Eilenberg [19, ch. XVI, sect. 10] gives a proof of the Cayley-Hamilton theorem using module concepts, and module concepts are used in [26, 27, 28], papers whose titles echo [48, 49] but which involve a quantum leap in mathematical sophistication. As a final remark, we note that the eponymous Haskell B. Curry also abstracted from concrete operations on formal power series [16].

Acknowledgements

This work originated (some years ago) when I was an occasional visitor at the University of York. I am greatly indebted to Colin Runciman for providing that opportunity, and to Colin, Jeremy Jacob, and Detlef Plump for encouragement. Special thanks are due to Daniel Siemssen, Patrik Jansson, Tim Sears and Peter Thiemann for comments on work related to this paper. (Also, if you are an anonymous JFP reviewer of an earlier related paper, then my thanks to you too!) Tim Sears has placed a version of the Haskell code on www.GitHub.com (under TimSears/SequenceAlgebra).

Bibliography


[1] M. Aigner and G.M. Ziegler. Proofs from THE BOOK, 3rd edn. Springer, 2004.
[2] W. Balser. Formal Power Series and Linear Systems of Meromorphic Differential Equations. Springer, 2000.
[3] H. Basold, H. Hansen, J.-É. Pin, and J. Rutten. Newton series, coinductively: a comparative study of composition. Mathematical Structures in Computer Science, pages 1–29, 2017.
[4] F. Bergeron, G. Labelle, and P. Leroux. Combinatorial species and tree-like structures, volume 67 of Encyclopaedia of Mathematics. Cambridge University Press, 1998. Translated from 1994 original in French.
[5] J. Berstel and C. Reutenauer. Rational Series and their Languages, volume 12 of EATCS Monographs on Theoretical Computer Science. Springer Verlag, 1988.
[6] R. Bird and O. de Moor. Algebra of Programming. Series in Computer Science. Prentice Hall International, 1997.
[7] R.S. Bird. Algebraic identities for program calculation. Computer Journal, 32(2):122–126, 1989.
[8] L. Brand. A Division Algebra for Sequences and Its Associated Operational Calculus. The American Mathematical Monthly, 71(7):719–728, 1964.