Информация о цитировании статьи получена из
Scopus
Статья опубликована в журнале из списка Web of Science и/или Scopus
Дата последнего поиска статьи во внешних источниках: 28 мая 2015 г.
Аннотация:Let s=s_1..s_n be a text (or sequence) on a finite alphabet Σ of size σ. A fingerprint in s is the set of distinct characters appearing in one of its substrings. The problem considered here is to compute the set F of all fingerprints of all substrings of s in order to answer efficiently certain questions on this set. A substring s_i..s_j is a maximal location for a fingerprint f∈F (denoted by <i,j>) if the alphabet of s_i..s_j is f and s_i−1, s_j+1, if defined, are not in f. The set of maximal locations in s is L (it is easy to see that |L|<=nσ). Two maximal locations <i,j> and <k,l> such that s_i..s_j=s_k..s_l are named copies, and the quotient set of L according to the copy relation is denoted by LC.
We first present new exact efficient algorithms and data structures for the following three problems: (1) to compute F; (2) given f as a set of distinct characters in Σ, to answer if f represents a fingerprint in F; (3) given f, to find all maximal locations of f in s. As well as in papers concerning succinct data structures, in the paper all space complexities are counted in bits. Problem 1 is solved either in View the MathML source worst-case time (in this paper all logarithms are intended as base two logarithms) using View the MathML source bits of space, or in View the MathML source randomized expected time using View the MathML source bits of space. Problem 2 is solved either in O(|f|) expected time if only View the MathML source bits of working space for queries is allowed, or in worst-case O(|f|/ϵ) time if a working space of View the MathML source bits is allowed (with ϵ a constant satisfying 0<ϵ<1). These algorithms use a data structure that occupies View the MathML source bits. Problem 3 is solved with the same time complexity as Problem 2, but with the addition of an occ term to each of the complexities, where occ is the number of maximal locations corresponding to the given fingerprint. Our solution of this last problem requires a data structure that occupies View the MathML source bits of memory.
In the second part of our paper we present a novel Monte Carlo approximate construction approach. Problem 1 is thus solved in O(n+|L|) expected time using View the MathML source bits of space but the algorithm is incorrect with an extremely small probability that can be bounded in advance.