
TL;DR
This paper investigates small 3-manifolds from the SnapPy census, comparing their properties with those of random 3-manifolds generated by established models to understand their similarities and differences.
Contribution
It provides a comparative analysis of small 3-manifolds in the census with random models, offering insights into their geometric and topological characteristics.
Findings
Differences between census manifolds and random models identified.
Statistical properties of small 3-manifolds characterized.
Insights into the structure of small 3-manifolds gained.
Abstract
In this paper we study the manifolds in the census of "small" 3-manifolds as available in SnapPy. We compare our results with the statistics of random 3-manifolds obtained using the Dunfield Thurston and Rivin models.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGeometric and Algebraic Topology · Topological and Geometric Data Analysis · Data Management and Algorithms
Experiments with the census
Igor Rivin
Mathematics Department, Temple University
Abstract.
In this paper we study the manifolds in the census of “small” 3-manifolds as available in SnapPy. We compare our results with the statistics of random 3-manifolds obtained using the Dunfield Thurston and Rivin models.
2010 Mathematics Subject Classification:
57-04;57M50
This paper was written when the author was invited to present a talk at the CUNY Graduate Center Seminar on Group Theory. The author would like to thank the organizers for their hospitality.
1. Introduction
In past work ([Riv14]) we have studied the statistics of random manifolds fibering over the circle, using a model (the “Rivin Model”) similar in spirit to that used by N. Dunfield and W.P.Thurston in [DT06]. The distributions of manifolds produced by both models is clearly skewed (the manifolds tend to be "long and skinny".) Many have argued that the "right" model is that of randomly gluing tetrahedra together, then throwing out those gluings that are not manifolds. Unfortunately, this is not at all probabilistically tractable, so we do the next best thing and look at all manifolds possessing small triangulations, thanks to the census of such manifolds built into SnapPy - [CDGW]. This paper started life as a Jupyter notebook.
2. Preliminaries
First, we import the usual (and some unusual) libraries:
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}1}]:} \PY{k+kn}{from} \PY{n+nn}{snappy} \PY{k}{import} \PY{o}{*} \PY{k+kn}{from} \PY{n+nn}{multiprocessing} \PY{k}{import} \PY{n}{Pool} \PY{k+kn}{import} \PY{n+nn}{pandas} \PY{k}{as} \PY{n+nn}{pd} \PY{k+kn}{import} \PY{n+nn}{numpy} \PY{k}{as} \PY{n+nn}{np} \PY{k+kn}{import} \PY{n+nn}{functools} \PY{k+kn}{from} \PY{n+nn}{operator} \PY{k}{import} \PY{n}{mul} \PY{k+kn}{import} \PY{n+nn}{xgboost} \PY{k}{as} \PY{n+nn}{xg} \PY{k+kn}{import} \PY{n+nn}{matplotlib}\PY{n+nn}{.}\PY{n+nn}{pyplot} \PY{k}{as} \PY{n+nn}{plt} \PY{k+kn}{import} \PY{n+nn}{seaborn} \PY{k}{as} \PY{n+nn}{sns} \PY{k+kn}{from} \PY{n+nn}{sklearn}\PY{n+nn}{.}\PY{n+nn}{model\PYZus{}selection} \PY{k}{import} \PY{n}{train\PYZus{}test\PYZus{}split} \PY{k+kn}{from} \PY{n+nn}{sklearn}\PY{n+nn}{.}\PY{n+nn}{model\PYZus{}selection} \PY{k}{import} \PY{n}{GridSearchCV} \PY{k+kn}{from} \PY{n+nn}{sklearn}\PY{n+nn}{.}\PY{n+nn}{cluster} \PY{k}{import} \PY{n}{KMeans} \PY{k+kn}{from} \PY{n+nn}{sklearn}\PY{n+nn}{.}\PY{n+nn}{manifold} \PY{k}{import} \PY{n}{TSNE}
Now we define some utility functions to deal with SnapPy’s goofy formats.
First, the length spectrum
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}2}]:} \PY{k}{def} \PY{n+nf}{mung\PYZus{}spec}\PY{p}{(}\PY{n}{specline}\PY{p}{)}\PY{p}{:} \PY{n}{mult} \PY{o}{=} \PY{n}{specline}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{multiplicity}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]} \PY{n}{thelen} \PY{o}{=} \PY{n}{specline}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{length}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]} \PY{k}{return} \PY{p}{[}\PY{n}{thelen}\PY{p}{]}\PY{o}{*}\PY{n}{mult}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}3}]:} \PY{k}{def} \PY{n+nf}{mung\PYZus{}all\PYZus{}spec}\PY{p}{(}\PY{n}{thespec}\PY{p}{)}\PY{p}{:} \PY{n}{thelens} \PY{o}{=} \PY{p}{[}\PY{n}{mung\PYZus{}spec}\PY{p}{(}\PY{n}{i}\PY{p}{)}\PY{k}{for} \PY{n}{i} \PY{o+ow}{in} \PY{n}{thespec}\PY{p}{]} \PY{k}{return} \PY{n+nb}{sum}\PY{p}{(}\PY{n}{thelens}\PY{p}{,} \PY{p}{[}\PY{p}{]}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}4}]:} \PY{k}{def} \PY{n+nf}{mung\PYZus{}complexes}\PY{p}{(}\PY{n}{clist}\PY{p}{)}\PY{p}{:} \PY{n}{cs} \PY{o}{=} \PY{p}{[}\PY{p}{[}\PY{n+nb}{float}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{real}\PY{p}{(}\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{n+nb}{float}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{imag}\PY{p}{(}\PY{p}{)}\PY{p}{)}\PY{p}{]} \PY{k}{for} \PY{n}{x} \PY{o+ow}{in} \PY{n}{clist}\PY{p}{]} \PY{k}{return} \PY{n+nb}{sum}\PY{p}{(}\PY{n}{cs}\PY{p}{,} \PY{p}{[}\PY{p}{]}\PY{p}{)}
Now, put all about the manifold in one line. The function returns None if the Dirichlet domain can not be constructed, so the manifold is not hyperbolic:
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}5}]:} \PY{k}{def} \PY{n+nf}{getline}\PY{p}{(}\PY{n}{m}\PY{p}{,} \PY{n}{cutoff}\PY{o}{=}\PY{l+m+mf}{3.0}\PY{p}{,} \PY{n}{numcurves}\PY{o}{=}\PY{l+m+mi}{10}\PY{p}{)}\PY{p}{:} \PY{k}{try}\PY{p}{:} \PY{n}{namelist} \PY{o}{=} \PY{p}{[}\PY{n}{m}\PY{o}{.}\PY{n}{name}\PY{p}{(}\PY{p}{)}\PY{p}{]} \PY{n}{thespec} \PY{o}{=} \PY{n}{m}\PY{o}{.}\PY{n}{length\PYZus{}spectrum}\PY{p}{(}\PY{n}{cutoff}\PY{o}{=}\PY{n}{cutoff}\PY{p}{)} \PY{n}{lenlist} \PY{o}{=} \PY{n}{mung\PYZus{}complexes}\PY{p}{(}\PY{n}{mung\PYZus{}all\PYZus{}spec}\PY{p}{(}\PY{n}{thespec}\PY{p}{)}\PY{p}{)} \PY{n}{vollist} \PY{o}{=} \PY{p}{[}\PY{n+nb}{float}\PY{p}{(}\PY{n}{m}\PY{o}{.}\PY{n}{volume}\PY{p}{(}\PY{p}{)}\PY{p}{)}\PY{p}{]} \PY{n}{homo} \PY{o}{=} \PY{n}{m}\PY{o}{.}\PY{n}{homology}\PY{p}{(}\PY{p}{)} \PY{n}{qr} \PY{o}{=} \PY{n+nb}{int}\PY{p}{(}\PY{n}{homo}\PY{o}{.}\PY{n}{betti\PYZus{}number}\PY{p}{(}\PY{p}{)}\PY{p}{)} \PY{n}{ed} \PY{o}{=} \PY{p}{[}\PY{n+nb}{int}\PY{p}{(}\PY{n}{i}\PY{p}{)} \PY{k}{for} \PY{n}{i} \PY{o+ow}{in} \PY{n}{homo}\PY{o}{.}\PY{n}{elementary\PYZus{}divisors}\PY{p}{(}\PY{p}{)} \PY{k}{if} \PY{n}{i} \PY{o}{\PYZgt{}} \PY{l+m+mi}{0}\PY{p}{]} \PY{n}{torsion} \PY{o}{=} \PY{n}{functools}\PY{o}{.}\PY{n}{reduce}\PY{p}{(}\PY{n}{mul}\PY{p}{,} \PY{n}{ed}\PY{p}{,} \PY{l+m+mi}{1}\PY{p}{)} \PY{n}{res} \PY{o}{=} \PY{n}{namelist} \PY{o}{+} \PY{n}{lenlist}\PY{p}{[}\PY{p}{:}\PY{n}{numcurves}\PY{p}{]} \PY{o}{+} \PY{n}{vollist} \PY{o}{+} \PY{p}{[}\PY{n}{qr}\PY{p}{,} \PY{n}{torsion}\PY{p}{]} \PY{k}{except}\PY{p}{:} \PY{n}{res} \PY{o}{=} \PY{k+kc}{None} \PY{k}{return} \PY{n}{res}
For some reason, it’s faster to first read in a list, then iterate over it:
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}6}]:} \PY{n}{zoo}\PY{o}{=} \PY{n+nb}{list}\PY{p}{(}\PY{n}{OrientableClosedCensus}\PY{p}{)}
Now, read everything in:
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}7}]:} \PY{n}{thelines} \PY{o}{=} \PY{p}{[}\PY{n}{getline}\PY{p}{(}\PY{n}{i}\PY{p}{)} \PY{k}{for} \PY{n}{i} \PY{o+ow}{in} \PY{n}{zoo}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}12}]:} \PY{n}{thelines} \PY{o}{=} \PY{p}{[}\PY{n}{i} \PY{k}{for} \PY{n}{i} \PY{o+ow}{in} \PY{n}{thelines} \PY{k}{if} \PY{n}{i} \PY{o+ow}{is} \PY{o+ow}{not} \PY{k+kc}{None}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}13}]:} \PY{n}{linedf} \PY{o}{=} \PY{n}{pd}\PY{o}{.}\PY{n}{DataFrame}\PY{o}{.}\PY{n}{from\PYZus{}records}\PY{p}{(}\PY{n}{thelines}\PY{p}{,} \PY{n}{columns} \PY{o}{=} \PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{name}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{a}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{A}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{b}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{B}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{c}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{C}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{d}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{D}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{e}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{E}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{betti}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{torsion}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}14}]:} \PY{n+nb}{len}\PY{p}{(}\PY{n}{zoo}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}14}]:} 11031
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}15}]:} \PY{n+nb}{len}\PY{p}{(}\PY{n}{thelines}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}15}]:} 10963
3. First Results
We see that out of the 11031 manifolds, 68 are not hyperbolic.
The lower case letters columns are of the shortest (5) geodesics, and the capital letters are of the twisting.
How about the homology, how is that distributed?
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}17}]:} \PY{n}{linedf}\PY{o}{.}\PY{n}{betti}\PY{o}{.}\PY{n}{value\PYZus{}counts}\PY{p}{(}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}17}]:} 0 10836 1 126 2 1 Name: betti, dtype: int64
We see that 10836 out of the 10963 hyperbolic manifolds (or close to 99%) are rational homology spheres. This is consistent with the Dunfield-Thurston model, and also with random fibered manifolds having with probability approaching
There are a number of results (notably by Culler and Shalen with co-authors) on the influence of Betti numbers on volume. Let’s see what we see here:
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}18}]:} \PY{n}{linedf}\PY{o}{.}\PY{n}{boxplot}\PY{p}{(}\PY{n}{column}\PY{o}{=}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{,} \PY{n}{by}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{betti}\PY{l+s+s1}{\PYZsq{}}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}18}]:} <matplotlib.axes._subplots.AxesSubplot at 0x1d270e45a90>
Of course, the number of samples is small in the higher Betti numbers groups, but it is interesting that the volume decreases as the Betti number increases.
What about torsion?
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}19}]:} \PY{n}{linedf}\PY{o}{.}\PY{n}{torsion}\PY{o}{.}\PY{n}{hist}\PY{p}{(}\PY{n}{bins}\PY{o}{=}\PY{l+m+mi}{30}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}19}]:} <matplotlib.axes._subplots.AxesSubplot at 0x1d2713358d0>
We notice that the highest values of torsion are quite large (given how small our manifolds are). Let’s see how torsion and volume are related.
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}20}]:} \PY{n}{sns}\PY{o}{.}\PY{n}{jointplot}\PY{p}{(}\PY{n}{x}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{torsion}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{y}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{data}\PY{o}{=}\PY{n}{linedf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}20}]:} <seaborn.axisgrid.JointGrid at 0x1d271397e10>
We see that high torsion leads to large volume, though not the other way around. This is different from the Dunfield-Thurston and random fibered models, where both volume and log of torsion grow linearly with complexity. Speaking of log torsion, let’s take a look.
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}21}]:} \PY{n}{linedf}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{logtor}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{o}{=}\PY{n}{np}\PY{o}{.}\PY{n}{log}\PY{p}{(}\PY{n}{linedf}\PY{o}{.}\PY{n}{torsion}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}22}]:} \PY{n}{sns}\PY{o}{.}\PY{n}{jointplot}\PY{p}{(}\PY{n}{x}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{logtor}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{y}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{data} \PY{o}{=} \PY{n}{linedf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}22}]:} <seaborn.axisgrid.JointGrid at 0x1d270e45828>
We see that both volume and log torsion are potentially tending to a gaussian distribution. We also see the linear density cutoff in the center graph, showing that the random models do have something going for them. The graph also seems to indicate that for some
4. Pairwise relationships
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}24}]:} \PY{n}{sns}\PY{o}{.}\PY{n}{pairplot}\PY{p}{(}\PY{n}{linedf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}24}]:} <seaborn.axisgrid.PairGrid at 0x1d2727ee908>
Above, on the diagonal we have the histograms of the various columns and the off-diagonal cells are the scatter plots of columns against one another. We see many interesting phenomena.
Notice that the imaginary parts of the complex lengths (the rotations) are NOT equidistributed - they are well-separated from 0 (and )
- they seem to become somewhat less so for longer geodesics.
The length of the systole (shortest geodesic) is exponentially distributed, while 3. 3.
The lengths of the -th shortest geodesics become more and more normally distributed as becomes large. 4. 4.
Both the real and the imaginary parts of the lengths seem to be uncorrelated (aside from the obvious relation of ordering on the real parts). 5. 5.
Log of torsion seems more-or-less normally distributed.
The scatter graphs of volume vs the other observables are also interesting. Let’s see if there is any volume between volume and the systole:
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}25}]:} \PY{n}{sns}\PY{o}{.}\PY{n}{regplot}\PY{p}{(}\PY{n}{x}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{a}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{y}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{data}\PY{o}{=}\PY{n}{linedf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}25}]:} <matplotlib.axes._subplots.AxesSubplot at 0x1d27af28710>
While knowing that the systole is short gives us little information on the volume, knowing that it is long tells us that the volume is small, which is a bit counter-intuitive, at least to this author. Also, it is pretty that an inequality of the vorm where is the systole length, and holds.
What about other geodesics?
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}26}]:} \PY{n}{sns}\PY{o}{.}\PY{n}{regplot}\PY{p}{(}\PY{n}{x}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{c}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{y}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{data}\PY{o}{=}\PY{n}{linedf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}26}]:} <matplotlib.axes._subplots.AxesSubplot at 0x1d27e1cff98>
The second longest curve has some slight (but positive) predictive power.
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}27}]:} \PY{n}{sns}\PY{o}{.}\PY{n}{regplot}\PY{p}{(}\PY{n}{x}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{d}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{y}\PY{o}{=}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{n}{data}\PY{o}{=}\PY{n}{linedf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}27}]:} <matplotlib.axes._subplots.AxesSubplot at 0x1d27e1d1b70>
Same with the third…
5. The space of 3-manifolds
The question is now, whether we can deduce any of the observables if we know the others. For example, can we predict the volume from the length spectrum?
There is an existence proof, and a construction. For existence, let’s see if there are approximate linear relationships between our many fields (let’s drop the homological invariants for now):
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}28}]:} \PY{n}{linedfmin} \PY{o}{=} \PY{n}{linedf}\PY{p}{[}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{a}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{A}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{b}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{B}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{c}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{C}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{d}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{D}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{e}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{E}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}29}]:} \PY{n}{u}\PY{p}{,} \PY{n}{s}\PY{p}{,} \PY{n}{v} \PY{o}{=} \PY{n}{np}\PY{o}{.}\PY{n}{linalg}\PY{o}{.}\PY{n}{svd}\PY{p}{(}\PY{n}{linedfmin}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}30}]:} \PY{n}{s}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}30}]:} array([573.37142695, 257.38960156, 238.87874045, 218.20623087, 209.10291146, 203.66714934, 45.86448121, 24.94621714, 17.98065278, 13.50396573, 10.96093658])
We see that most of the energy is contained in the top six singular values, so the space is approximately six dimensional (as a linear space).
Are the twist parameters redundant somehow?
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}51}]:} \PY{n}{linedfmin2} \PY{o}{=} \PY{n}{linedf}\PY{p}{[}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{a}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{b}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{c}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{d}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{e}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}52}]:} \PY{n}{u}\PY{p}{,} \PY{n}{s}\PY{p}{,} \PY{n}{v} \PY{o}{=} \PY{n}{np}\PY{o}{.}\PY{n}{linalg}\PY{o}{.}\PY{n}{svd}\PY{p}{(}\PY{n}{linedfmin2}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}53}]:} \PY{n}{s}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}53}]:} array([572.95525389, 45.89914386, 24.99847935, 17.9864745 , 13.50661835, 10.96897694])
Not at all! It looks like the space of volume and the five lengths, only one or two dimensions are signifcant!
Let’s try to go another way and see if knowing the length spectrum we can predict the volume. For this we will use boosting - a very effective machine learning technique.
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}69}]:} \PY{n}{featdf} \PY{o}{=} \PY{n}{linedfmin2}\PY{p}{[}\PY{n}{linedfmin2}\PY{o}{.}\PY{n}{columns}\PY{p}{[}\PY{p}{:}\PY{o}{\PYZhy{}}\PY{l+m+mi}{1}\PY{p}{]}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}70}]:} \PY{n}{targdf} \PY{o}{=} \PY{n}{linedfmin2}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}71}]:} \PY{n}{X\PYZus{}train}\PY{p}{,} \PY{n}{X\PYZus{}test}\PY{p}{,} \PY{n}{y\PYZus{}train}\PY{p}{,} \PY{n}{y\PYZus{}test} \PY{o}{=} \PY{n}{train\PYZus{}test\PYZus{}split}\PY{p}{(}\PY{n}{featdf}\PY{p}{,} \PY{n}{targdf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}72}]:} \PY{n}{clf1} \PY{o}{=} \PY{n}{xg}\PY{o}{.}\PY{n}{XGBRegressor}\PY{p}{(}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}73}]:} \PY{n}{parameters} \PY{o}{=} \PY{p}{\PYZob{}}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{objective}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{reg:linear}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{,}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{learning\PYZus{}rate}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[}\PY{l+m+mf}{0.1}\PY{p}{,} \PY{o}{.}\PY{l+m+mi}{3}\PY{p}{,} \PY{l+m+mf}{0.5}\PY{p}{]}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{max\PYZus{}depth}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[} \PY{l+m+mi}{5}\PY{p}{,} \PY{l+m+mi}{6}\PY{p}{,} \PY{l+m+mi}{7}\PY{p}{]}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{min\PYZus{}child\PYZus{}weight}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[}\PY{l+m+mi}{9}\PY{p}{]}\PY{p}{,}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{silent}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[}\PY{l+m+mi}{1}\PY{p}{]}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{subsample}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[} \PY{l+m+mf}{0.6}\PY{p}{]}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{colsample\PYZus{}bytree}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[} \PY{l+m+mf}{0.8}\PY{p}{]}\PY{p}{,}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{n\PYZus{}estimators}\PY{l+s+s1}{\PYZsq{}}\PY{p}{:} \PY{p}{[} \PY{l+m+mi}{1000}\PY{p}{]}\PY{p}{\PYZcb{}}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}74}]:} \PY{n}{clf} \PY{o}{=} \PY{n}{GridSearchCV}\PY{p}{(}\PY{n}{clf1}\PY{p}{,}\PY{n}{parameters}\PY{p}{,} \PY{n}{cv} \PY{o}{=} \PY{l+m+mi}{2}\PY{p}{,} \PY{n}{n\PYZus{}jobs} \PY{o}{=} \PY{l+m+mi}{5}\PY{p}{,} \PY{n}{verbose}\PY{o}{=}\PY{k+kc}{True}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}75}]:} \PY{n}{clf}\PY{o}{.}\PY{n}{fit}\PY{p}{(}\PY{n}{X\PYZus{}train}\PY{p}{,}\PY{n}{y\PYZus{}train}\PY{p}{)}
\KV@do
,commandchars=
{},,
Fitting 2 folds for each of 9 candidates, totalling 18 fits
\KV@do
,commandchars=
{},,
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}75}]:} GridSearchCV(cv=2, error_score=’raise-deprecating’, estimator=XGBRegressor(base_score=0.5, booster=’gbtree’, colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective=’reg:linear’, random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1), fit_params=None, iid=’warn’, n_jobs=5, param_grid={’objective’: [’reg:linear’], ’learning_rate’: [0.1, 0.3, 0.5], ’max_depth’: [5, 6, 7], ’min_child_weight’: [9], ’silent’: [1], ’subsample’: [0.6], ’colsample_bytree’: [0.8], ’n_estimators’: [1000]}, pre_dispatch=’2*n_jobs’, refit=True, return_train_score=’warn’, scoring=None, verbose=True)
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}76}]:} \PY{n}{preds} \PY{o}{=} \PY{n}{clf}\PY{o}{.}\PY{n}{predict}\PY{p}{(}\PY{n}{X\PYZus{}test}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}101}]:} \PY{n}{np}\PY{o}{.}\PY{n}{linalg}\PY{o}{.}\PY{n}{norm}\PY{p}{(}\PY{n}{preds}\PY{o}{\PYZhy{}}\PY{n}{y\PYZus{}test}\PY{p}{)}\PY{o}{/}\PY{n}{np}\PY{o}{.}\PY{n}{sqrt}\PY{p}{(}\PY{n+nb}{len}\PY{p}{(}\PY{n}{y\PYZus{}test}\PY{p}{)}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}101}]:} 0.7417280255021659
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor} }]:}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}43}]:} \PY{n}{y\PYZus{}test}\PY{o}{.}\PY{n}{describe}\PY{p}{(}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}43}]:} count 2741.000000 mean 5.061871 std 0.805381 min 1.529477 25% 4.626565 50% 5.228348 75% 5.657743 max 6.453448 Name: volume, dtype: float64
So we explain about 10% of the standard deviation, or 20% of the variance…
What if we use all the complex length info?
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}103}]:} \PY{n}{featdf} \PY{o}{=} \PY{n}{linedfmin}\PY{p}{[}\PY{n}{linedfmin}\PY{o}{.}\PY{n}{columns}\PY{p}{[}\PY{p}{:}\PY{o}{\PYZhy{}}\PY{l+m+mi}{1}\PY{p}{]}\PY{p}{]} \PY{n}{targdf} \PY{o}{=} \PY{n}{linedfmin}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]} \PY{n}{X\PYZus{}train}\PY{p}{,} \PY{n}{X\PYZus{}test}\PY{p}{,} \PY{n}{y\PYZus{}train}\PY{p}{,} \PY{n}{y\PYZus{}test} \PY{o}{=} \PY{n}{train\PYZus{}test\PYZus{}split}\PY{p}{(}\PY{n}{featdf}\PY{p}{,} \PY{n}{targdf}\PY{p}{)} \PY{n}{clf1} \PY{o}{=} \PY{n}{xg}\PY{o}{.}\PY{n}{XGBRegressor}\PY{p}{(}\PY{p}{)} \PY{n}{clf} \PY{o}{=} \PY{n}{GridSearchCV}\PY{p}{(}\PY{n}{clf1}\PY{p}{,}\PY{n}{parameters}\PY{p}{,} \PY{n}{cv} \PY{o}{=} \PY{l+m+mi}{2}\PY{p}{,} \PY{n}{n\PYZus{}jobs} \PY{o}{=} \PY{l+m+mi}{5}\PY{p}{,} \PY{n}{verbose}\PY{o}{=}\PY{k+kc}{True}\PY{p}{)} \PY{n}{clf}\PY{o}{.}\PY{n}{fit}\PY{p}{(}\PY{n}{X\PYZus{}train}\PY{p}{,}\PY{n}{y\PYZus{}train}\PY{p}{)}
\KV@do
,commandchars=
{},,
Fitting 2 folds for each of 9 candidates, totalling 18 fits
\KV@do
,commandchars=
{},,
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}103}]:} GridSearchCV(cv=2, error_score=’raise-deprecating’, estimator=XGBRegressor(base_score=0.5, booster=’gbtree’, colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective=’reg:linear’, random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1), fit_params=None, iid=’warn’, n_jobs=5, param_grid={’objective’: [’reg:linear’], ’learning_rate’: [0.1, 0.3, 0.5], ’max_depth’: [5, 6, 7], ’min_child_weight’: [9], ’silent’: [1], ’subsample’: [0.6], ’colsample_bytree’: [0.8], ’n_estimators’: [1000]}, pre_dispatch=’2*n_jobs’, refit=True, return_train_score=’warn’, scoring=None, verbose=True)
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}104}]:} \PY{n}{preds} \PY{o}{=} \PY{n}{clf}\PY{o}{.}\PY{n}{predict}\PY{p}{(}\PY{n}{X\PYZus{}test}\PY{p}{)} \PY{n}{np}\PY{o}{.}\PY{n}{linalg}\PY{o}{.}\PY{n}{norm}\PY{p}{(}\PY{n}{preds}\PY{o}{\PYZhy{}}\PY{n}{y\PYZus{}test}\PY{p}{)}\PY{o}{/}\PY{n}{np}\PY{o}{.}\PY{n}{sqrt}\PY{p}{(}\PY{n+nb}{len}\PY{p}{(}\PY{n}{y\PYZus{}test}\PY{p}{)}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}104}]:} 0.5423190901708408
So we are doing pretty well (getting about 40% of the information).
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}63}]:} \PY{k+kn}{from} \PY{n+nn}{sklearn}\PY{n+nn}{.}\PY{n+nn}{linear\PYZus{}model} \PY{k}{import} \PY{n}{LinearRegression}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}78}]:} \PY{n}{reg} \PY{o}{=} \PY{n}{LinearRegression}\PY{p}{(}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}79}]:} \PY{n}{reg}\PY{o}{.}\PY{n}{fit}\PY{p}{(}\PY{n}{X\PYZus{}train}\PY{p}{,} \PY{n}{y\PYZus{}train}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}79}]:} LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}80}]:} \PY{n}{rpred} \PY{o}{=} \PY{n}{reg}\PY{o}{.}\PY{n}{predict}\PY{p}{(}\PY{n}{X\PYZus{}test}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}100}]:} \PY{n}{np}\PY{o}{.}\PY{n}{linalg}\PY{o}{.}\PY{n}{norm}\PY{p}{(}\PY{n}{y\PYZus{}test}\PY{o}{\PYZhy{}}\PY{n}{rpred}\PY{p}{)}\PY{o}{/}\PY{n}{np}\PY{o}{.}\PY{n}{sqrt}\PY{p}{(}\PY{n+nb}{len}\PY{p}{(}\PY{n}{y\PYZus{}test}\PY{p}{)}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}100}]:} 0.7917050890601595
We see that the first three lengths in the length spectrum give us pretty much all the information!
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}84}]:} \PY{n}{newdf} \PY{o}{=} \PY{n}{linedf}\PY{p}{[}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{logtor}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{a}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{]}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}85}]:} \PY{n}{u}\PY{p}{,} \PY{n}{s}\PY{p}{,} \PY{n}{v} \PY{o}{=} \PY{n}{np}\PY{o}{.}\PY{n}{linalg}\PY{o}{.}\PY{n}{svd}\PY{p}{(}\PY{n}{newdf}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}86}]:} \PY{n}{s}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}86}]:} array([659.45727388, 106.79900214, 15.42933463])
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}105}]:} \PY{n}{reg2} \PY{o}{=} \PY{n}{LinearRegression}\PY{p}{(}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}107}]:} \PY{n}{reg2}\PY{o}{.}\PY{n}{fit}\PY{p}{(}\PY{n}{newdf}\PY{p}{[}\PY{p}{[}\PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{logtor}\PY{l+s+s1}{\PYZsq{}}\PY{p}{,} \PY{l+s+s1}{\PYZsq{}}\PY{l+s+s1}{volume}\PY{l+s+s1}{\PYZsq{}}\PY{p}{]}\PY{p}{]}\PY{p}{,} \PY{n}{newdf}\PY{o}{.}\PY{n}{a}\PY{p}{)}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}107}]:} LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor}108}]:} \PY{n}{reg2}\PY{o}{.}\PY{n}{coef\PYZus{}}
\KV@do
,commandchars=
{},,
{\color{outcolor}Out[{\color{outcolor}108}]:} array([-0.02357474, -0.01766166])
\KV@do
,commandchars=
{},,
{\color{incolor}In [{\color{incolor} }]:}
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[CDGW] Marc Culler, Nathan M. Dunfield, Matthias Goerner, and Jeffrey R. Weeks. Snap Py, a computer program for studying the geometry and topology of 3 3 3 -manifolds. Available at http://snappy.computop.org (DD/MM/YYYY).
- 2[DT 06] Nathan M Dunfield and William P Thurston. Finite covers of random 3-manifolds. Inventiones mathematicae , 166(3):457–521, 2006.
- 3[Riv 14] Igor Rivin. Statistics of random 3-manifolds occasionally fibering over the circle. ar Xiv preprint ar Xiv:1401.5736 , 2014.
