Resumen
Re-Pairis a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem, we present, given a text of length n whose characters are drawn from an integer alphabet with size ??=????(1)
s
=
n
O
(
1
)
, an ??(min(??2,??2lglog????lglglg??/log????))
O
(
min
(
n
2
,
n
2
lg
log
t
n
lg
lg
lg
n
/
log
t
n
)
)
time algorithm computing Re-Pair with max((??/??)lg??,
max
(
(
n
/
c
)
lg
n
,
???lg???)+??(lg??)
n
lg
t
)
+
O
(
lg
n
)
bits of working space including the text space, where ??=1
c
=
1
is a fixed user-defined constant and ??
t
is the sum of ??
s
and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm seems not practical since a preliminary version already needs roughly one hour for computing Re-Pair on one megabyte of text.