Enumeration of sequences with large alphabets

M. Oguzhan Kulekci

arXiv:1211.2926 · cs.DS · November 14, 2012 · 1 cites

TL;DR

This paper develops efficient enumerative coding schemes for sequences over large alphabets, introducing a new method that outperforms basic schemes, especially for DNA sequence applications.

Contribution

The paper proposes a novel enumeration-based coding method for large alphabet sequences, improving efficiency over naive representations and extending to DNA sequences.

Findings

01

For relatively large $n$, the new coding scheme needs approximately $(σ - 1) \log (σ - 1)$ fewer bits than the naive $(σ - 1) ⌈\log (n + 1)⌉$ representation.

02

Experimental results show the new method outperforms basic schemes for large alphabets.

03

The approach is effective for DNA sequence encoding, demonstrating practical utility.
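The bit counts in the first finding can be checked numerically. The following is a minimal sketch, not the paper's code: the function names are illustrative, logarithms are assumed to be base 2, and $K(σ, n)$ is computed from the sum given in the abstract, which requires $n \geq 1$.

```python
import math


def count_vectors(sigma: int, n: int) -> int:
    """K(sigma, n) as the sum from the abstract; assumes sigma >= 1, n >= 1."""
    return sum(
        math.comb(n - 1, sigma - 1 - i) * math.comb(sigma, i)
        for i in range(sigma)
    )


def naive_bits(sigma: int, n: int) -> int:
    """(sigma - 1) * ceil(log2(n + 1)): one fixed-width field per count.

    For n >= 0, ceil(log2(n + 1)) equals n.bit_length(), which avoids
    floating-point rounding.
    """
    return (sigma - 1) * n.bit_length()


def enumerative_bits(sigma: int, n: int) -> float:
    """log2 K(sigma, n): bits needed to index one vector among all K."""
    return math.log2(count_vectors(sigma, n))


if __name__ == "__main__":
    for sigma in (4, 16, 64):
        n = 10**6
        saved = naive_bits(sigma, n) - enumerative_bits(sigma, n)
        estimate = (sigma - 1) * math.log2(sigma - 1)
        print(f"sigma={sigma}: saved {saved:.1f} bits, "
              f"(sigma-1)*log2(sigma-1) = {estimate:.1f}")
```

The printed savings and the $(σ - 1) \log (σ - 1)$ estimate should be of the same order rather than identical, since the paper states the relation only approximately.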

Abstract

This study focuses on efficient schemes for enumerative coding of $σ$-ary sequences by mainly borrowing ideas from Öktem & Astola's \cite{Oktem99} hierarchical enumerative coding and Schalkwijk's \cite{Schalkwijk72} asymptotically optimal combinatorial code on binary sequences. By observing that the number of distinct $σ$-dimensional vectors having an inner sum of $n$, where the values in each dimension are in range $[0 \ldots n]$, is $K(σ, n) = \sum_{i=0}^{σ-1} \binom{n-1}{σ-1-i} \binom{σ}{i}$, we propose representing $C$ vector via enumeration, and present necessary algorithms to perform this task. We prove $\log K(σ, n)$ requires approximately $(σ - 1) \log (σ - 1)$ less bits than the naive $(σ - 1) ⌈\log (n + 1)⌉$ representation for relatively large $n$, and examine the results for varying alphabet sizes experimentally.…
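To illustrate what representing a vector "via enumeration" means, the sketch below maps a vector of $σ$ non-negative counts summing to $n$ to its index in lexicographic order and back. This is a generic textbook ranking, not the algorithm from the paper; the names `num_vectors`, `rank`, and `unrank` are hypothetical, and the count of vectors uses the standard stars-and-bars identity $\binom{n+σ-1}{σ-1}$, which agrees with the sum $K(σ, n)$ above by Vandermonde's identity.

```python
from math import comb


def num_vectors(parts: int, total: int) -> int:
    """Number of non-negative integer vectors of length `parts` summing to `total`."""
    if parts == 0:
        return 1 if total == 0 else 0
    return comb(total + parts - 1, parts - 1)


def rank(vec: list[int]) -> int:
    """Lexicographic index of `vec` among vectors of the same length and sum."""
    remaining = sum(vec)
    index = 0
    for pos, value in enumerate(vec[:-1]):
        parts_left = len(vec) - pos - 1
        # Skip every vector that shares the prefix but has a smaller entry here.
        for smaller in range(value):
            index += num_vectors(parts_left, remaining - smaller)
        remaining -= value
    return index


def unrank(index: int, sigma: int, n: int) -> list[int]:
    """Inverse of `rank`; assumes 0 <= index < num_vectors(sigma, n)."""
    vec = []
    remaining = n
    for pos in range(sigma - 1):
        parts_left = sigma - pos - 1
        value = 0
        while True:
            block = num_vectors(parts_left, remaining - value)
            if index < block:
                break
            index -= block
            value += 1
        vec.append(value)
        remaining -= value
    vec.append(remaining)  # the last entry is forced by the fixed sum
    return vec


if __name__ == "__main__":
    # Symbol counts of the DNA string "GATTACA" in the order A, C, G, T.
    counts = [3, 1, 1, 2]
    r = rank(counts)
    print(r, unrank(r, sigma=4, n=7))
```

The index always lies in $[0, K(σ, n))$, so it fits in $⌈\log K(σ, n)⌉$ bits, which is the quantity the abstract compares against the naive fixed-width representation.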

Taxonomy

Topics: Algorithms and Data Compression · semigroups and automata theory · Fractal and DNA sequence analysis