A novel algorithm to get the Fourier power spectra of a real sequence
Jiasong Wang, Changchuan Yin

TL;DR
This paper introduces a new algorithm for computing Fourier power spectra of real sequences, enabling the detection of new frequencies beyond traditional DFT methods, with applications demonstrated on protein sequences.
Contribution
The paper presents a novel algorithm that relates Fourier power spectra of integer and fractional periods, revealing new frequency components in real sequences.
Findings
Identifies new frequencies in Fourier spectra not found by traditional DFT.
Mathematically proves the relation between spectra of integer and fractional periods.
Demonstrates the algorithm's effectiveness on protein sequences.
Abstract
For a real sequence of length m = nl, we may deduce its congruence derivative sequence of length l. The discrete Fourier transform of the original sequence can be calculated from the discrete Fourier transform of the congruence derivative sequence. Based on the relation between the discrete Fourier transforms of the two sequences, the features of the Fourier power spectra of the integer and fractional periods of a real sequence are investigated. It is proved mathematically that, after calculating the Fourier power spectrum at an integer period, the Fourier power spectra of the fractional periods associated with this integer period can be easily represented by the computational result of the Fourier power spectrum at the integer period. A computational experiment using a protein sequence shows that some of the computed results are a kind of Fourier power spectra…
A novel algorithm to get the Fourier power spectra of a real sequence
Jiasong Wang (1), Changchuan Yin (2, *)
1. Department of Mathematics, Nanjing University, Nanjing, Jiangsu 210093, China
2. Department of Mathematics, Statistics, and Computer Science, The University of Illinois at Chicago, Chicago, IL 60607-7045, USA
(*) Corresponding author, email: [email protected]
Abstract
For a real sequence of length $m = nl$, we may deduce its congruence derivative sequence of length $l$. The discrete Fourier transform of the original sequence can be calculated from the discrete Fourier transform of the congruence derivative sequence. Based on the relation between the discrete Fourier transforms of the two sequences, the features of the Fourier power spectra of the integer and fractional periods of a real sequence are investigated. It is proved mathematically that, after calculating the Fourier power spectrum at an integer period, the Fourier power spectra of the fractional periods associated with this integer period can be easily represented by the computational result of the Fourier power spectrum at the integer period. A computational experiment using a protein sequence shows that some of the computed results are Fourier power spectra corresponding to new frequencies that cannot be obtained from the traditional discrete Fourier transform. Therefore, the algorithm provides a new realization method for the discrete Fourier transform of a real sequence.
Keywords: Fourier power spectrum, real sequence, congruence derivative sequence, integer periods, associated fractional periods
Journal: arXiv.org
1 Introduction
Digital signal processing (DSP) techniques have been widely applied in the periodicity analysis of time series and have become a major research technique in bioinformatics (Anastassiou, 2001; Chen et al., 2003). The main prerequisite for applying signal processing to symbolic sequences is to map the sequences onto numerical time series. For example, the Voss (Voss, 1992) or Z-curve (Zhang and Zhang, 1994) representations may encode a symbolic DNA sequence as a numerical sequence, and hydrophobicity mapping may transform a protein sequence into a numerical one (Kyte and Doolittle, 1982). After numerical mapping, DSP methods can be employed to study the features, structures, and functions of the symbolic sequences. The most common signal processing approach is the Fourier transform (Welch, 1967), which has been widely used to study periodicity and repetitive regions in symbolic DNA sequences (Silverman and Linsker, 1986; Anastassiou, 2001), as well as in genome comparison (Yin et al., 2014; Yin and Yau, 2015). We have surveyed the mathematical properties of the Fourier spectrum for symbolic sequences (Wang et al., 2014). For a review of DSP methods for the study of biological sequences, one may refer to our book (Wang and Yan, 2013).
Periods in biological sequences can be categorized into two types: integer periods and fractional periods. For integer periods, the 3-base periodicity of protein-coding regions is often used in gene finding (Tiwari et al., 1997; Yin and Yau, 2005, 2007; Yin, 2015), and 2-base periodicity was found in introns of genomes (Arquès and Michel, 1987; Zhao et al., 2018). A 2-periodicity exists in the regions of protein sequences that form β-sheet structures. Fractional periods, corresponding to fractional cycles in numerical series, are prevalent and important in the structures and functions of protein sequences and genomes. For example, the 3.6-periodicity in protein sequences determines the α-helix secondary structure (Eisenberg et al., 1984; Gruber et al., 2005; Leonov and Arkin, 2005; Yin and Yau, 2017). Strong 10.4- or 10.5-base periodicity in genomes is associated with nucleosomes (Trifonov, 1998; Salih and Trifonov, 2015). A 6.5-base periodicity is present in C. elegans introns (Messaoudi et al., 2013). A fractional period spectrum may offer high-resolution and precise features of sequences, but accurately identifying fractional periods in a large genome is challenging. The demands of periodicity research on DNA and protein sequences motivate us to investigate advanced techniques for the fractional periodicity analysis of time series.
Because the Fourier transform is critical to solving problems across science and technology, new advancements of the Fourier transform are constantly emerging. One of these achievements, the fractional Fourier transform (FRFT), is a generalization of the Fourier transform that has been rediscovered many times over the past hundred years (Ozaktas et al., 1996; Sejdić et al., 2011). The FRFT can be regarded as a development of the continuous Fourier transform, and the discrete fractional Fourier transform (DFRFT) as the extension of the discrete Fourier transform (DFT) (Candan et al., 2000). The FRFT and DFRFT have been successfully used to analyze time-frequency information in quantum mechanics and quantum optics, in the study of time-frequency distributions, and in many other applications (Sejdić et al., 2011). The emergence of these various forms of the Fourier transform inspires the search for new formulations. The Fourier power spectra of the traditional Fourier transform and of the continuous or discrete fractional Fourier transform are often used as a primary criterion in their applications. For example, based on the power spectra of the DFRFT of DNA sequences, the phylogenetic trees of numerous species can be constructed to study evolutionary history (Qian and Luan, 2018). Based on our experience in calculating the discrete Fourier power spectrum, a new realization method of the DFT is proposed in this paper.
We previously proposed a method, periodic power spectrum (PPS), which directly computes Fourier power spectrum based on periodic distributions of signal strength on periodic positions (Wang et al., 2012; Yin and Wang, 2016). The advantage of the PPS method is that it avoids spectral leakage and reduces background noise, which both appear in the power spectrum of Fourier transform. Therefore, the PPS method can capture all latent integer periodicities in DNA sequences. We have utilized this method in the detection of latent periodicities in different genome elements, including exons and microsatellite DNA sequences.
Building on the main idea of the PPS method, which employs the periodic distributions of the elements of a real sequence, the Fourier power spectrum at a specific integer period of the sequence can be calculated efficiently and directly. Using the computed results for the integer period, we propose an algorithm that easily calculates the Fourier power spectra of the fractional periods associated with that integer period for a real sequence, and we present the mathematical proof of the algorithm in this work. Numerical experiments show that the algorithm is an extension of the DFT realization, similar to the DFRFT.
2 Theorems and concepts
2.1 Fourier transform
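For a numerical sequence $x$ of length $m$, $x_0, x_1, \ldots, x_{m-1}$, its discrete Fourier transform at frequency $k$ is defined as (Welch, 1967)
$$X(k) = \sum_{j=0}^{m-1} x_j\, e^{-i 2\pi jk/m}, \quad k = 0, 1, \ldots, m-1, \tag{2.1}$$
and its Fourier power spectrum ($PS$) at frequency $k$ is
$$PS(k) = X(k)\, X^{*}(k), \quad k = 0, 1, \ldots, m-1, \tag{2.2}$$
where $*$ indicates the complex conjugate.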
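As a minimal sketch of the DFT and power spectrum definitions above, the following Python code evaluates the DFT sum directly and multiplies it by its conjugate. The function names `dft` and `power_spectrum`, the 0-based indexing, and the toy period-2 input are illustrative assumptions and do not come from the paper.

```python
import cmath

def dft(x, k):
    """Direct DFT sum of the sequence x at integer frequency k (0-based indices)."""
    m = len(x)
    return sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))

def power_spectrum(x, k):
    """Power spectrum at frequency k: X(k) times its complex conjugate."""
    X = dft(x, k)
    return (X * X.conjugate()).real

# Hypothetical toy input: a period-2 signal of length m = 6.
x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
spectrum = [power_spectrum(x, k) for k in range(len(x))]
peak = max(range(len(x)), key=lambda k: spectrum[k])
print(peak, len(x) / peak)  # peak frequency index and the matching period m/k
```

For this input the peak sits at the frequency whose period $m/k$ equals 2, which is the usual way a frequency index is read as a period in the rest of the text.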
2.2 Concepts
The Fourier power spectra of a signal can be represented by the congruence derivative sequence of the original sequence.
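Definition 2.1.
For a real number sequence $x$ of length $m$, if two positive integers $n$ and $l$ satisfy $m = nl$, the congruence derivative sequence of $x$, denoted $y$, has length $l$, and its element is defined by
$$y_t = \sum_{j=0}^{n-1} x_{jl+t}, \quad t = 0, 1, \ldots, l-1, \tag{2.3}$$
and $y$ is named the congruence derivative sequence or modulo distribution sequence. If the sequence length $m$ does not satisfy $m = nl$, then zeros are padded to the sequence so that $m = nl$.
Its DFT is
$$Y(k) = \sum_{t=0}^{l-1} y_t\, e^{-i 2\pi tk/l}, \quad k = 0, 1, \ldots, l-1. \tag{2.4}$$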
We introduce the relationship of DFTs between the original sequence and its congruence derivative sequence (Wang et al., 2012).
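Theorem 2.1.
For a real sequence $x$ of length $m$, suppose $m = nl$, then the DFT of the sequence $x$ at frequency $kn$ is equal to the DFT of its congruence derivative sequence $y$ at frequency $k$,
$$X(kn) = Y(k), \quad k = 0, 1, \ldots, l-1. \tag{2.5}$$
Proof.
We know that
$$X(kn) = \sum_{j=0}^{m-1} x_j\, e^{-i 2\pi jkn/m} = \sum_{j=0}^{nl-1} x_j\, e^{-i 2\pi jk/l} = \sum_{t=0}^{l-1} \sum_{q=0}^{n-1} x_{ql+t}\, e^{-i 2\pi (ql+t)k/l} = \sum_{t=0}^{l-1} \Big( \sum_{q=0}^{n-1} x_{ql+t} \Big) e^{-i 2\pi tk/l} = Y(k),$$
where the third equality writes $j = ql + t$ and the fourth uses $e^{-i 2\pi qk} = 1$.
∎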
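The relation in Theorem 2.1 can be checked numerically. The sketch below builds the congruence derivative sequence by summing entries whose indices are congruent modulo $l$, then compares the two DFTs. The helper names `dft` and `congruence_derivative` and the sample values are assumptions made for illustration only.

```python
import cmath

def dft(seq, k):
    """Direct DFT sum of seq at integer frequency k."""
    size = len(seq)
    return sum(v * cmath.exp(-2j * cmath.pi * t * k / size) for t, v in enumerate(seq))

def congruence_derivative(x, l):
    """Add up the entries of x whose indices are congruent modulo l.

    Missing tail entries behave like padded zeros, so len(x) need not be a multiple of l.
    """
    y = [0.0] * l
    for j, value in enumerate(x):
        y[j % l] += value
    return y

# Hypothetical data with m = 12, l = 4, n = 3.
x = [0.5, -1.2, 3.0, 2.2, -0.7, 1.1, 0.0, 4.0, -2.5, 1.3, 0.9, -0.4]
l = 4
n = len(x) // l
y = congruence_derivative(x, l)
max_gap = max(abs(dft(x, k * n) - dft(y, k)) for k in range(l))
print(max_gap)  # expected to be at rounding-error level
```

The length-$l$ transform touches only $l$ terms per frequency, which is where the saving over the length-$m$ transform comes from.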
It is clear that in formula (2.5) is the trivial case.
We know that $X(kn)$ is the DFT of sequence $x$ at frequency $kn$; in other words, $X(kn)$ is the DFT of sequence $x$ at period $m/(kn) = l/k$, and the corresponding Fourier power spectrum of $x$ at period $l/k$ is written as $PS(l/k)$, where $k = 1, 2, \ldots, l-1$. When $k = 1$, we call $PS(l)$ the Fourier power spectrum at integer period $l$, and $PS(l/k)$, $k = 2, \ldots, l-1$, are named the associated fractional Fourier power spectra for the integer period $l$, because all the numerators of the fractional periods are $l$.
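We here introduce the self-shift summation of a sequence $y$.
Definition 2.2.
For a real sequence $y$ of length $l$, and $j = 0, 1, \ldots, \lfloor l/2 \rfloor$, its self $j$-shift summation is defined by
$$S_j = \sum_{t=0}^{l-1} y_t\, y_{t+j}, \tag{2.6}$$
with $t + j$ taken modulo $l$, $t = 0, 1, \ldots, l-1$.
A special case is for $j = l/2$ when $l$ is an even number,
$$S_{l/2} = 2 \sum_{t=0}^{l/2-1} y_t\, y_{t+l/2}.$$
If we write sequence $y_t$, $t = 0, 1, \ldots, l-1$, as a vector, $\mathbf{y} = (y_0, y_1, \ldots, y_{l-1})^{T}$, then $S_j$ can be realized by the circular autocorrelation of vector $\mathbf{y}$. For a special case, $S_0$ is equal to the inner product of vector $\mathbf{y}$ with itself, $S_0 = \mathbf{y}^{T}\mathbf{y}$.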
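A minimal sketch of the self-shift summation, read here as a circular autocorrelation with indices taken modulo the sequence length. The function name `self_shift_sum` and the four-element example are hypothetical, and the circular reading is an assumption based on the phrase "taken modulo" in the definition.

```python
def self_shift_sum(y, j):
    """Circular self j-shift summation: sum of y[t] * y[(t + j) mod l]."""
    l = len(y)
    return sum(y[t] * y[(t + j) % l] for t in range(l))

y = [1.0, 2.0, 3.0, 4.0]                      # hypothetical sequence, l = 4
S = [self_shift_sum(y, j) for j in range(len(y) // 2 + 1)]

# For even l the middle shift pairs every product twice, so half the sum suffices.
half = len(y) // 2
middle_from_half = 2.0 * sum(y[t] * y[t + half] for t in range(half))
print(S, middle_from_half)
```

Shift 0 reduces to the inner product of the vector with itself, and the middle shift for even length can be computed from only half of the products.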
2.3 Computations of Fourier power spectra of integer period and its associated fractional periods for a real sequence
In formula (2.5), if $k = 1$, we get the DFT of sequence $x$ at the integer period $l$. Our previous research proposed a fast algorithm to compute the Fourier power spectra at integer periods of a real sequence, with an application to capturing all latent integer periodicities in DNA sequences (Yin and Wang, 2016), and presented a fundamental algorithm for calculating the $PS$ of fractional periods for a real sequence (Wang and Yin, 2016).
Here, we further investigate the properties of the integer and fractional periods of a real sequence. Notice that formula (2.4) is the DFT of the real sequence $y$. When $k = 1$, $Y(1)$ is the DFT at the integer period $l$ for the sequence; when $k = 2, \ldots, l-1$, $Y(2), \ldots,$ and $Y(l-1)$ are the DFTs at the fractional periods $l/2, \ldots, l/(l-1)$, respectively. These values are the DFTs of the fractional periods that are associated with the integer period $l$.
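Theorem 2.2.
The Fourier power spectra of the integer period $l$ and its associated fractional periods $l/k$ for a real sequence are symmetric.
Proof.
Due to the conjugate symmetry of the Fourier transform of a real sequence,
$$Y(l-k) = Y^{*}(k), \quad k = 1, 2, \ldots, l-1. \tag{2.7}$$
The power spectrum, $PS$, satisfies the following symmetric property
$$PS\!\left(\frac{l}{l-k}\right) = Y(l-k)\, Y^{*}(l-k) = Y^{*}(k)\, Y(k) = PS\!\left(\frac{l}{k}\right), \quad k = 1, 2, \ldots, l-1. \tag{2.8}$$
∎
This theorem indicates that, to compute the $PS$ of the integer period $l$ and all its associated fractional periods, calculating $PS(l/k)$, $k = 1, 2, \ldots, \lfloor l/2 \rfloor$, is sufficient.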
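The symmetry stated in Theorem 2.2 is easy to confirm numerically. In this sketch, `ps_at_period` is a hypothetical helper that evaluates the squared magnitude of the DFT of a length-$l$ real sequence at frequency $k$, and the sample values are arbitrary.

```python
import cmath

def ps_at_period(y, k):
    """Squared magnitude of the DFT of y at frequency k, i.e. the spectrum at period l/k."""
    l = len(y)
    Y = sum(v * cmath.exp(-2j * cmath.pi * t * k / l) for t, v in enumerate(y))
    return abs(Y) ** 2

y = [2.0, -1.0, 0.5, 3.0, 1.5, -2.0, 0.0]     # hypothetical real sequence, l = 7
l = len(y)
gaps = [abs(ps_at_period(y, k) - ps_at_period(y, l - k)) for k in range(1, l)]
print(max(gaps))  # expected to be at rounding-error level
```

Because frequencies $k$ and $l-k$ give the same value, only the first half of the frequency range needs to be evaluated.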
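Let $\mathbf{w}$ denote
$$\mathbf{w} = (1, \omega, \omega^{2}, \ldots, \omega^{l-1})^{T}, \quad \omega = e^{-i 2\pi k/l},$$
and $\mathbf{y}$ denote
$$\mathbf{y} = (y_0, y_1, \ldots, y_{l-1})^{T},$$
where $T$ is the transpose of a vector. Formula (2.5) can be rewritten as
$$X(kn) = Y(k) = \mathbf{w}^{T}\mathbf{y} = \mathbf{y}^{T}\mathbf{w},$$
therefore, the Fourier power spectrum of the real sequence is as follows
$$PS(l/k) = Y(k)\, Y^{*}(k) = \big(\mathbf{y}^{T}\mathbf{w}\big)\big(\bar{\mathbf{w}}^{T}\mathbf{y}\big) = \mathbf{y}^{T} A\, \mathbf{y}, \tag{2.9}$$
where the bar denotes the complex conjugate. The coefficient matrix of this quadratic form is written as
$$A = \mathbf{w}\,\bar{\mathbf{w}}^{T} = \begin{pmatrix} 1 & \bar{\omega} & \bar{\omega}^{2} & \cdots & \bar{\omega}^{l-1} \\ \omega & 1 & \bar{\omega} & \cdots & \bar{\omega}^{l-2} \\ \omega^{2} & \omega & 1 & \cdots & \bar{\omega}^{l-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \omega^{l-1} & \omega^{l-2} & \omega^{l-3} & \cdots & 1 \end{pmatrix}.$$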
Theorem 2.3.
Matrix $A$ is a Hermitian Toeplitz matrix.
Proof.
Suppose the elements of $A$ are $a_{pq}$, $p, q = 0, 1, \ldots, l-1$; $a_{pq}$ can be represented as
$$a_{pq} = e^{-i 2\pi k (p-q)/l}. \tag{2.10}$$
If $p = q$, then $a_{pp} = 1$, i.e. the diagonal elements of $A$ are ones; else if $p > q$, then $a_{pq}$ is in the lower triangular part of $A$; otherwise $a_{pq}$ is in the upper triangular part of $A$. From the representation of $a_{pq}$ in formula (2.10), $a_{qp} = \bar{a}_{pq}$; therefore, matrix $A$ is a Hermitian matrix.
A Toeplitz matrix is a matrix that has identical elements along any line parallel to the diagonal. In matrix $A$, the element $a_{pq}$ depends only on $p - q$, and all elements along a line parallel to the diagonal share the same difference of subscripts: $p - q = 1$, or $2$, $\ldots$, or $l-1$ below the diagonal, and $q - p = 1$, or $2$, $\ldots$, or $l-1$ above it, respectively. Therefore, matrix $A$ is a Hermitian Toeplitz matrix. ∎
Formula (2.9) is the mathematical representation of $PS(l/k)$ for a real sequence. From the viewpoint of computational mathematics, to calculate $PS(l/k)$ efficiently we need an alternative procedure, instead of directly computing (2.9), in order to save computational cost.
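For reference, the direct evaluation of the quadratic form that the text sets out to avoid can be sketched as follows. The sketch assumes matrix entries of the form $e^{-i 2\pi k (p-q)/l}$, a sign convention that does not affect the result for real input. The names `coefficient_matrix` and `quadratic_form` and the sample data are illustrative assumptions.

```python
import cmath

def coefficient_matrix(l, k):
    """Entries a[p][q] = exp(-2*pi*i*k*(p - q)/l) of the quadratic form's matrix."""
    return [[cmath.exp(-2j * cmath.pi * k * (p - q) / l) for q in range(l)]
            for p in range(l)]

def quadratic_form(A, y):
    """Evaluate y^T A y for a real vector y (costs l*l multiplications)."""
    l = len(y)
    return sum(y[p] * A[p][q] * y[q] for p in range(l) for q in range(l))

y = [2.0, -1.0, 0.5, 3.0, 1.5]                # hypothetical sequence, l = 5
l, k = len(y), 2
A = coefficient_matrix(l, k)

# Hermitian: a[q][p] is the conjugate of a[p][q]. Toeplitz: constant along diagonals.
hermitian = all(abs(A[p][q] - A[q][p].conjugate()) < 1e-12
                for p in range(l) for q in range(l))
toeplitz = all(abs(A[p][q] - A[p - 1][q - 1]) < 1e-12
               for p in range(1, l) for q in range(1, l))
Y = sum(v * cmath.exp(-2j * cmath.pi * t * k / l) for t, v in enumerate(y))
form = quadratic_form(A, y)
print(hermitian, toeplitz, abs(form - abs(Y) ** 2))
```

The quadratic form agrees with the squared magnitude of the DFT, but it visits every matrix entry, which motivates the cheaper expression in terms of self-shift summations.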
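Theorem 2.4.
For a real sequence $x$ of length $m$, if $m = nl$ and its congruence derivative sequence is $y$, then $PS(l/k)$ of the sequence can be expressed as follows.
If $l$ is an odd number,
$$PS(l/k) = S_0 + 2 \sum_{j=1}^{(l-1)/2} S_j \cos\frac{2\pi kj}{l}. \tag{2.11}$$
Otherwise,
$$PS(l/k) = S_0 + 2 \sum_{j=1}^{l/2-1} S_j \cos\frac{2\pi kj}{l} + S_{l/2} \cos(k\pi). \tag{2.12}$$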
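Proof.
Recall that the entries of matrix $A$ satisfy $a_{qp} = \bar{a}_{pq}$ and that matrix $A$ is a Toeplitz matrix. Notice that the variables of the quadratic form (2.9) are real, with the property $y_p y_q = y_q y_p$, and recall the definitions of $S_j$ and $S_{l/2}$. The expansion of (2.9), $\mathbf{y}^{T} A\, \mathbf{y}$, can then be decomposed into three parts. The first part comes from the diagonal elements of matrix $A$ and corresponds to the quadratic expression of the congruence derivative sequence, $S_0$, that is,
$$\sum_{p=0}^{l-1} a_{pp}\, y_p^{2} = \sum_{p=0}^{l-1} y_p^{2} = S_0. \tag{2.13}$$
The second part is the quadratic expression of the entries of the lower triangular part of $A$. Grouping the entries by the difference of subscripts $d = p - q$, we have the following.
$$\sum_{p>q} a_{pq}\, y_p y_q = \sum_{d=1}^{l-1} e^{-i 2\pi kd/l} \sum_{q=0}^{l-1-d} y_{q+d}\, y_q. \tag{2.14}$$
The third part is the quadratic expression of the entries of the upper triangular part of $A$. According to the conclusion of Theorem 2.3, the third part is equal to the complex conjugate of the second part,
$$\sum_{p<q} a_{pq}\, y_p y_q = \sum_{d=1}^{l-1} e^{i 2\pi kd/l} \sum_{q=0}^{l-1-d} y_{q+d}\, y_q. \tag{2.15}$$
The summation of formulas (2.13), (2.14), and (2.15) is the value
$$PS(l/k) = S_0 + 2 \sum_{d=1}^{l-1} \cos\frac{2\pi kd}{l} \sum_{q=0}^{l-1-d} y_{q+d}\, y_q.$$
The differences $d$ and $l-d$ have the same cosine, $\cos(2\pi k(l-d)/l) = \cos(2\pi kd/l)$, and their inner sums add up to the circular sum of (2.6), $\sum_{q=0}^{l-1-d} y_{q+d}\, y_q + \sum_{q=0}^{d-1} y_{q+l-d}\, y_q = S_d$. Using the notation $S_j$ and pairing $d = j$ with $d = l - j$, the above formula can be written as follows: when $l$ is an odd number,
$$PS(l/k) = S_0 + 2 \sum_{j=1}^{(l-1)/2} S_j \cos\frac{2\pi kj}{l};$$
similarly, when $l$ is an even number, the difference $d = l/2$ is unpaired and contributes $2\cos(k\pi) \sum_{q=0}^{l/2-1} y_{q+l/2}\, y_q = S_{l/2}\cos(k\pi)$, so that
$$PS(l/k) = S_0 + 2 \sum_{j=1}^{l/2-1} S_j \cos\frac{2\pi kj}{l} + S_{l/2} \cos(k\pi).$$
∎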
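A hedged numerical check of the cosine expansion: the sketch below assumes the self-shift summation is the full circular sum $\sum_t y_t\, y_{(t+j) \bmod l}$, under which the spectrum equals the shift-0 sum, plus twice the cosine-weighted sums for shifts below $l/2$, plus a middle term with weight $\cos(k\pi)$ when $l$ is even. The names `shift_sum`, `ps_from_shift_sums`, and `ps_direct` are hypothetical.

```python
import cmath
import math

def shift_sum(y, j):
    """Circular self j-shift summation of y."""
    l = len(y)
    return sum(y[t] * y[(t + j) % l] for t in range(l))

def ps_from_shift_sums(y, k):
    """Power spectrum at period l/k from the self-shift sums and cosines only."""
    l = len(y)
    total = shift_sum(y, 0)
    for j in range(1, (l - 1) // 2 + 1):          # j = 1 .. floor((l - 1) / 2)
        total += 2.0 * shift_sum(y, j) * math.cos(2.0 * math.pi * k * j / l)
    if l % 2 == 0:                                 # extra middle term for even l
        total += shift_sum(y, l // 2) * math.cos(math.pi * k)
    return total

def ps_direct(y, k):
    """Reference value: squared magnitude of the DFT of y at frequency k."""
    l = len(y)
    Y = sum(v * cmath.exp(-2j * cmath.pi * t * k / l) for t, v in enumerate(y))
    return abs(Y) ** 2

odd_y = [2.0, -1.0, 0.5, 3.0, 1.5]                # hypothetical data, l = 5
even_y = [2.0, -1.0, 0.5, 3.0, 1.5, -0.5]         # hypothetical data, l = 6
worst = max(abs(ps_from_shift_sums(y, k) - ps_direct(y, k))
            for y in (odd_y, even_y) for k in range(1, len(y)))
print(worst)  # expected to be at rounding-error level
```

Only real multiplications and cosines are involved, and the shift sums do not depend on $k$, so they can be computed once and reused for every fractional period.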
From formulas (2.11) and (2.12), for $k = 1$, to compute $PS(l)$ we only need to calculate the $(l+1)/2$ quadratic expressions $S_j$, $j = 0, 1, \ldots, (l-1)/2$, when $l$ is an odd number, and otherwise the $l/2 + 1$ quadratic expressions $S_j$, $j = 0, 1, \ldots, l/2$, together with their coefficients $\cos(2\pi j/l)$, respectively.
According to the spectrum symmetry formula (2.8), computing the Fourier power spectra of the integer period $l$ and all its associated fractional periods, i.e., calculating all $PS(l/k)$, $k = 1, 2, \ldots, l-1$, requires calculating only $\lfloor l/2 \rfloor$ Fourier power spectra. There is a much more efficient way to compute them, as follows.
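Theorem 2.5.
The Fourier power spectra $PS(l/k)$ of the associated fractional periods $l/k$ of the real sequence can be deduced from the results in the computational procedure of $PS(l)$ at integer period $l$, $k = 1$.
Proof.
We prove that the computed results at $k = 1$ are sufficient for the computation for $k > 1$. For a fixed $l$, from the result of formula (2.8), the range of $k$ should be $1 < k \le \lfloor l/2 \rfloor$. Based on the computed results at $k = 1$, we have obtained $S_j$, $j = 0, 1, \ldots, \lfloor l/2 \rfloor$, and their corresponding cosine coefficients,
$$\cos\frac{2\pi \cdot 0}{l} = 1,\ \cos\frac{2\pi \cdot 1}{l},\ \cos\frac{2\pi \cdot 2}{l},\ \ldots,\ \cos\frac{2\pi \lfloor l/2 \rfloor}{l},$$
respectively, where $S_{(l-1)/2}$ corresponds to the coefficient $\cos(\pi (l-1)/l)$ when $l$ is an odd number, and the coefficient of $S_{l/2}$ is $\cos\pi = -1$ when $l$ is an even number.
If we can represent any cosine coefficient of $S_j$ as one of
$$\cos\frac{2\pi r}{l}, \quad r = 0, 1, \ldots, \lfloor l/2 \rfloor,$$
when we calculate $PS(l/k)$, then we complete the proof, no matter whether $l$ is an odd number or an even number.
For a fixed $k$, to compute $PS(l/k)$, we need the coefficients of $S_j$, i.e., $\cos(2\pi kj/l)$, $j = 1, 2, \ldots, \lfloor l/2 \rfloor$.
Suppose $r$ is the remainder of $kj$ modulo $l$, $0 \le r < l$, so that $\cos(2\pi kj/l) = \cos(2\pi r/l)$. If $r \le \lfloor l/2 \rfloor$, $r$ is kept; otherwise, notice that $\cos(2\pi r/l) = \cos(2\pi (l-r)/l)$, and set $r = l - r$. Then $\cos(2\pi kj/l)$ is one of the following
$$\cos\frac{2\pi r}{l}, \quad r = 0, 1, \ldots, \lfloor l/2 \rfloor.$$
This proves that all the coefficients of $S_j$ come, one by one, from
$$\cos\frac{2\pi \cdot 0}{l},\ \cos\frac{2\pi \cdot 1}{l},\ \ldots,\ \cos\frac{2\pi \lfloor l/2 \rfloor}{l}.$$
∎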
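The index reduction used in the proof can be illustrated in a few lines: the product $kj$ is reduced modulo $l$ and folded into the range $0$ to $\lfloor l/2 \rfloor$, so every cosine needed for frequency $k$ is already present in the table computed for $k = 1$. The function name `fold_index` and the choice $l = 18$ are assumptions for illustration.

```python
import math

def fold_index(k, j, l):
    """Index r in 0..floor(l/2) with cos(2*pi*k*j/l) == cos(2*pi*r/l)."""
    r = (k * j) % l
    return r if r <= l // 2 else l - r

l = 18
# Cosine table that is available after the computation at the integer period.
table = [math.cos(2.0 * math.pi * r / l) for r in range(l // 2 + 1)]
worst = max(abs(math.cos(2.0 * math.pi * k * j / l) - table[fold_index(k, j, l)])
            for k in range(1, l // 2 + 1) for j in range(l // 2 + 1))
print(fold_index(5, 4, l), fold_index(5, 3, l), worst)
```

No new cosine has to be evaluated for any $k > 1$; the work per fractional period reduces to integer arithmetic and table lookups.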
3 Algorithms and computation
3.1 Algorithms
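For computing $PS(l/k)$, $k = 1, 2, \ldots, \lfloor l/2 \rfloor$, three functions are designed: (1) a function to construct the congruence derivative sequence $y$ of the original sequence $x$, (2) a function to yield the autocorrelation $S_j$ of vector $\mathbf{y}$ for shift $j$ (formula (2.6)), and (3) a function to yield the special case
$$S_{l/2} = 2 \sum_{t=0}^{l/2-1} y_t\, y_{t+l/2}$$
when $l$ is an even number.
To speed up the computation, the algorithm uses a built-in cosine series, $C$, which is defined as
$$C = \left( \cos\frac{2\pi \cdot 0}{l},\ \cos\frac{2\pi \cdot 1}{l},\ \ldots,\ \cos\frac{2\pi \lfloor l/2 \rfloor}{l} \right),$$
and the indices of columns are $0, 1, \ldots, \lfloor l/2 \rfloor$.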
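Algorithm 1. To compute the $PS$ at period $l$ for a real sequence $x$ of length $m$.
Input sequence $x$;
Call function (1) to obtain the congruence derivative sequence $y$;
Set $PS(l) = S_0$, then
for $j = 1$ to $p$ ($p = \lfloor (l-1)/2 \rfloor$), do $PS(l) = PS(l) + 2 S_j C_j$;
Then, if $l$ is an odd number, output $PS(l)$; otherwise, output $PS(l) - S_{l/2}$.
Algorithm 2. After running Algorithm 1, we have obtained the congruence derivative sequence $y$ of the sequence $x$ and $S_j$, $j = 0, 1, \ldots, \lfloor l/2 \rfloor$. To calculate the Fourier power spectra of the fractional periods $l/k$, $k = 2, 3, \ldots, (l-1)/2$ when $l$ is an odd number, otherwise, $k = 2, 3, \ldots, l/2$.
For a fixed $k$, set $PS(l/k) = S_0$,
For $j = 1$ to $\lfloor (l-1)/2 \rfloor$ do $r = kj \bmod l$;
If $r \le \lfloor l/2 \rfloor$, then keep $r$, else $r = l - r$;
then $PS(l/k) = PS(l/k) + 2 S_j C_r$, until $j = \lfloor (l-1)/2 \rfloor$;
If $l$ is an odd number, $PS(l/k)$ is complete;
Otherwise, if $l$ is an even number, if $k$ is even, then $PS(l/k) = PS(l/k) + S_{l/2}$, else $PS(l/k) = PS(l/k) - S_{l/2}$.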
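The two algorithms can be combined into one short routine. This is a sketch under stated assumptions, not the authors' implementation: the self-shift summation is taken as the full circular sum, the cosine table holds $\cos(2\pi r/l)$ for $r = 0, \ldots, \lfloor l/2 \rfloor$, and the names `congruence_derivative` and `fractional_spectra` as well as the synthetic test signal are hypothetical.

```python
import math

def congruence_derivative(x, l):
    """Sum the entries of x whose indices are congruent modulo l."""
    y = [0.0] * l
    for j, v in enumerate(x):
        y[j % l] += v
    return y

def fractional_spectra(x, l):
    """Return {k: PS(l/k)} for k = 1 .. floor(l/2), reusing S_j and one cosine table."""
    y = congruence_derivative(x, l)
    half = l // 2
    S = [sum(y[t] * y[(t + j) % l] for t in range(l)) for j in range(half + 1)]
    C = [math.cos(2.0 * math.pi * r / l) for r in range(half + 1)]
    spectra = {}
    for k in range(1, half + 1):
        ps = S[0]
        for j in range(1, (l - 1) // 2 + 1):
            r = (k * j) % l
            if r > half:
                r = l - r                  # fold the index back into the table range
            ps += 2.0 * S[j] * C[r]
        if l % 2 == 0:                     # middle shift: coefficient cos(k*pi) is +1 or -1
            ps += S[half] if k % 2 == 0 else -S[half]
        spectra[k] = ps
    return spectra

# Synthetic toy signal with period 3.6 and length 72 (not the protein data of the paper).
x = [math.cos(2.0 * math.pi * j / 3.6) for j in range(72)]
spectra = fractional_spectra(x, 18)
best_k = max(spectra, key=spectra.get)
print(best_k, 18 / best_k)
```

On this synthetic input the largest value falls at the fractional period $18/5 = 3.6$, which mirrors how the period is read off in the experiments that follow.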
3.2 Computational experiments
We chose the protein sesquiterpene synthase, which has an α-helix structure (PDB: 4GAX), for the computational experiment (Li et al., 2013). Using the hydrophobicity representation, the symbolic protein sequence is mapped to a numerical sequence. Then, the integer periods $l = 18$ and $l = 36$ in the numerical sequence are examined, respectively. The computational results are as follows. For $l = 18$, among $PS(18/k)$, $k = 1, 2, \ldots, 9$, the maximum is 21864, which appears at period $18/5 = 3.6$. This shows the α-helix secondary structure with period 3.6.
For $l = 36$, $PS(36/k)$, $k = 1, 2, \ldots, 18$, are 9718.7, 563.84, 13134, 5267.3, 13318, 3936.1, 10628, 1116.7, 9272.5, 21864, 1190.8, 4381.2, 14623, 939.8, 7902.5, 2165, 616.43, and 1681, respectively. Again, the maximum is 21864, at period $36/10 = 3.6$.
It is easy to see that some of these spectra cannot be obtained from the traditional DFT. If the chosen $l$ is a prime number that is less than $m$ and is not a factor of the original sequence length, the $PS$ of the associated fractional periods cannot be obtained from the traditional DFT.
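Whether a period $l/k$ is reachable by the traditional DFT can be checked with integer arithmetic. The sketch assumes that "traditional DFT" means the DFT of the unpadded sequence of length $m$, whose bins correspond to periods $m/q$ for integer $q$; a period $l/k$ then lies on that grid only if $km/l$ is an integer. The function name `on_dft_grid` and the length $m = 100$ are hypothetical.

```python
def on_dft_grid(m, l, k):
    """True if period l/k equals m/q for some integer q, i.e. it is a bin of the length-m DFT."""
    return (k * m) % l == 0

m = 100                                   # hypothetical sequence length
for l in (18, 7, 20):
    off_grid = [k for k in range(1, l // 2 + 1) if not on_dft_grid(m, l, k)]
    print(l, off_grid)                    # values of k whose period l/k is not a DFT bin
```

For a prime $l$ that does not divide $m$, every $k$ below $l$ is off the grid, which matches the statement about prime periods; when $l$ divides $m$, all associated periods are ordinary DFT bins.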
4 Conclusion and prospect
Over the years, we have used the Fourier transform to study the structures, functions, interactions, and evolution of biological sequences. In particular, for periodicity research on these sequences, we proposed the PPS method as an alternative to the traditional DFT. Based on this investigation, in this paper we propose algorithms that directly compute the Fourier power spectra of an integer period and its associated fractional periods for a real sequence. The distinctive feature of the algorithm is that, once the Fourier power spectrum of the integer period has been computed, the power spectra of its associated fractional periods are obtained easily and efficiently, and the mathematical proof ensures the validity of the algorithm.
The computational experiments not only demonstrate the efficiency of the algorithm, but also suggest a new computational realization method for the DFT.
There is still room for improving the algorithm, and we will continue to study the mathematical principles of the method. Its wide applications deserve further exploration and research in the future.
References
- Anastassiou, D., 2001. Genomic signal processing. Signal Processing Magazine, IEEE 18 (4), 8–20.
- Arquès, D. G., Michel, C. J., 1987. Periodicities in introns. Nucleic Acids Research 15 (18), 7581–7592.
- Candan, C., Kutay, M. A., Ozaktas, H. M., 2000. The discrete fractional Fourier transform. IEEE Transactions on Signal Processing 48 (5), 1329–1337.
- Chen, J., Li, H., Sun, K., Kim, B., 2003. How will bioinformatics impact signal processing research? Signal Processing Magazine, IEEE 20 (6), 106–206.
- Eisenberg, D., Weiss, R., Terwilliger, T., 1984. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc. Natl. Acad. Sci. 81, 140–144.
- Gruber, M., Söding, J., Lupas, A. N., 2005. Repper repeats and their periodicities in fibrous proteins. Nucleic Acids Research 33 (suppl 2), W239–W243.
- Kyte, J., Doolittle, R. F., 1982. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157 (1), 105–132.
- Leonov, H., Arkin, I. T., 2005. A periodicity analysis of transmembrane helices. Bioinformatics 21 (11), 2604–2610.
