Finding patterns in strings using suffix arrays
Finding regularities in large data sets requires systems that are
efficient in both time and space. Here, we describe a newly developed system that
exploits the internal structure of the enhanced suffix array to find
significant patterns in a large collection of sequences. The system
searches exhaustively for all significantly compressing patterns,
where a pattern may consist of symbols and skips or wildcards.
We demonstrate a possible application of the system by detecting
interesting patterns in a Dutch and an English corpus.
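
As a rough illustration of the underlying idea, the sketch below builds a suffix array together with its longest-common-prefix (LCP) table, which is one component of an enhanced suffix array, and uses them to list repeated substrings with their occurrence counts. It is not the system described here: the function names are hypothetical, the construction is deliberately naive, and it handles neither skips or wildcards nor any test of how significantly a pattern compresses the data.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Fine for a sketch; real systems use linear or near-linear algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text, sa):
    # Kasai-style LCP computation: lcp[k] is the length of the longest
    # common prefix of the suffixes at sa[k - 1] and sa[k]; lcp[0] is 0.
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def repeated_patterns(text, min_len=2):
    # For each pair of adjacent suffixes, take their longest shared prefix
    # and count how many suffixes in the surrounding interval share it.
    sa = build_suffix_array(text)
    lcp = build_lcp_array(text, sa)
    counts = {}
    for k in range(1, len(sa)):
        length = lcp[k]
        if length < min_len:
            continue
        pattern = text[sa[k]:sa[k] + length]
        if pattern in counts:
            continue
        lo = k
        while lo > 0 and lcp[lo] >= length:
            lo -= 1
        hi = k
        while hi + 1 < len(sa) and lcp[hi + 1] >= length:
            hi += 1
        counts[pattern] = hi - lo + 1
    return counts


if __name__ == "__main__":
    print(repeated_patterns("banana"))
```

Because all occurrences of a substring sit next to each other in the sorted suffix order, each count comes from the width of an interval in the LCP table rather than from rescanning the text.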