The Deferred Frequency Index (DFI) is a tool for string mining under frequency constraints, i.e., predicates that depend only on how frequently a pattern occurs in the data. The frequency of a pattern is defined as the number of distinct sequences in a database that contain the pattern at least once. The current implementation provides three predicates and can easily be extended with user-defined frequency predicates. The frequencies are computed while a suffix tree over all databases is being constructed, which makes it possible to limit index construction to a problem-specific minimum, referred to as the optimal monotonic hull.
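To make the frequency definition and the idea of a frequency predicate concrete, here is a minimal brute-force sketch in Python. It is not DFI's implementation or API: the names `frequency`, `min_max_predicate` and `mine` are hypothetical, and the min/max threshold predicate is only an illustrative example, not necessarily one of the three predicates DFI ships with. Instead of a lazily built suffix tree, it naively enumerates every substring, so it is only practical for tiny inputs.

```python
from typing import Callable, List, Sequence

# A predicate receives one frequency per database and decides whether to report the pattern.
Predicate = Callable[[Sequence[int]], bool]


def frequency(pattern: str, database: List[str]) -> int:
    """Count the sequences in `database` that contain `pattern` at least once.

    Multiple occurrences inside the same sequence are counted only once.
    """
    return sum(1 for seq in database if pattern in seq)


def min_max_predicate(min_freq: int, max_freq: int) -> Predicate:
    """Hypothetical two-database predicate: frequent in the first, rare in the second."""
    def pred(freqs: Sequence[int]) -> bool:
        return freqs[0] >= min_freq and freqs[1] <= max_freq
    return pred


def mine(databases: List[List[str]], predicate: Predicate) -> List[str]:
    """Return, in sorted order, all substrings whose frequency vector satisfies `predicate`.

    Brute force: every distinct substring of every sequence is a candidate.
    """
    candidates = {
        seq[i:j]
        for db in databases
        for seq in db
        for i in range(len(seq))
        for j in range(i + 1, len(seq) + 1)
    }
    result = []
    for pattern in sorted(candidates):
        freqs = [frequency(pattern, db) for db in databases]
        if predicate(freqs):
            result.append(pattern)
    return result


if __name__ == "__main__":
    db1 = ["ACGT", "ACGA", "TTACG"]
    db2 = ["ACTT", "GGGG"]
    # Patterns present in all 3 sequences of db1 and absent from db2.
    print(mine([db1, db2], min_max_predicate(3, 0)))
```

On this toy input, `A`, `C`, `G` and `AC` occur in every sequence of `db1` but also in `db2`, so only `ACG` and `CG` are reported. Note that `frequency("A", ["AAAA"])` is 1, matching the "distinct sequences" definition above.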

(c) Copyright 2010 by David Weese and Marcel H. Schulz

Input Ports

  1. Type: URI Object
    Database files in FASTA/FASTQ or plain-text format (one string per line). [fq,fastq,fa,fasta,faa,ffn,fna,frn,embl,gbk,raw,sam]

Output Ports

  1. Type: URI Object
    Output filename. Default: <stdout>. [txt]


