The Deferred Frequency Index (DFI) is a tool for string mining under frequency constraints, i.e., predicates that depend only on how frequently a pattern occurs in the data. The frequency of a pattern in a database is the number of distinct sequences in that database that contain the pattern at least once. The implementation currently provides three predicates and can easily be extended with user-defined frequency predicates. Frequencies are computed during the construction of a suffix tree over all databases, which makes it possible to restrict index construction to a problem-specific minimum, referred to as the optimal monotonic hull.
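The frequency definition and the idea of a frequency predicate can be illustrated with a small brute-force sketch. This is not how DFI works internally: DFI computes frequencies while building a suffix tree, whereas this sketch uses naive substring tests. All names here (`count_frequencies`, `mine_patterns`, `min_max_predicate`) are hypothetical, and the two-database "frequent in one, rare in the other" predicate is an assumed example, not necessarily one of DFI's three built-in predicates. The sketch also shows why construction can stop early: a sequence containing a pattern also contains every prefix of it, so frequency never increases when a pattern is extended, and an anti-monotonic condition can prune whole subtrees.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a frequency predicate receives one frequency per database.
FrequencyPredicate = Callable[[Sequence[int]], bool]


def count_frequencies(pattern: str, databases: List[List[str]]) -> List[int]:
    """For each database, count the distinct sequences containing `pattern` at least once."""
    return [sum(1 for seq in db if pattern in seq) for db in databases]


def mine_patterns(
    databases: List[List[str]],
    alphabet: str,
    predicate: FrequencyPredicate,
    prune: FrequencyPredicate,
) -> List[str]:
    """Enumerate patterns depth-first, extending them one character at a time.

    `prune` must be anti-monotonic: if it holds for a pattern, it must also
    hold for every extension, so the whole subtree below it can be skipped.
    """
    results: List[str] = []
    stack = [""]
    while stack:
        prefix = stack.pop()
        for c in alphabet:
            pattern = prefix + c
            freqs = count_frequencies(pattern, databases)
            # A pattern that occurs nowhere has no occurring extensions either.
            if sum(freqs) == 0 or prune(freqs):
                continue
            if predicate(freqs):
                results.append(pattern)
            stack.append(pattern)
    return sorted(results)


def min_max_predicate(min_freq: int, max_freq: int) -> FrequencyPredicate:
    """Hypothetical two-database predicate: frequent in database 0, rare in database 1."""
    return lambda f: f[0] >= min_freq and f[1] <= max_freq


if __name__ == "__main__":
    db0 = ["ACGT", "ACGA", "TTAC"]
    db1 = ["GGGG", "TTTT"]
    found = mine_patterns(
        [db0, db1],
        alphabet="ACGT",
        predicate=min_max_predicate(2, 0),
        # Only the minimum-frequency half is safe to prune on, because
        # frequency never increases when a pattern is extended.
        prune=lambda f: f[0] < 2,
    )
    print(found)  # ['A', 'AC', 'ACG', 'C', 'CG']
```

Note that the maximum-frequency condition cannot be used for pruning: a pattern that is too frequent in database 1 may have extensions that are rare enough, so its subtree still has to be explored.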
(c) Copyright 2010 by David Weese and Marcel H. Schulz
- Type: URI object. Database files in FASTA/FASTQ or plain-text format (one string per line). [fq,fastq,fa,fasta,faa,ffn,fna,frn,embl,gbk,raw,sam]
- Type: URI object. Output filename. Default: <stdout>. [txt]