bloomfilter-2.0.1.2: Pure and impure Bloom Filter implementations.
CopyrightBryan O'Sullivan
LicenseBSD3
MaintainerBryan O'Sullivan <bos@serpentine.com>
Stabilityunstable
Portabilityportable
Safe HaskellSafe-Inferred
LanguageHaskell2010

Data.BloomFilter.Easy

Description

An easy-to-use Bloom filter interface.

Synopsis

Easy creation and querying

data Bloom a #

An immutable Bloom filter, suitable for querying from pure code.

Instances

Instances details
Show (Bloom a) # 
Instance details

Defined in Data.BloomFilter

Methods

showsPrec :: Int -> Bloom a -> ShowS

show :: Bloom a -> String

showList :: [Bloom a] -> ShowS

NFData (Bloom a) # 
Instance details

Defined in Data.BloomFilter

Methods

rnf :: Bloom a -> ()

easyList #

Arguments

:: Hashable a 
=> Double

desired false positive rate (0 < e < 1)

-> [a]

values to populate with

-> Bloom a 

Create a Bloom filter with the given false positive rate and members. The hash functions used are computed by the cheapHashes function from the Hash module.

elem :: a -> Bloom a -> Bool #

Query an immutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

notElem :: a -> Bloom a -> Bool #

Query an immutable Bloom filter for non-membership. If the value is present, return False. If the value is not present, there is still some possibility that False will be returned.

length :: Bloom a -> Int #

Return the size of an immutable Bloom filter, in bits.

Example: a spell checker

This example reads a dictionary file containing one word per line, constructs a Bloom filter with a 1% false positive rate, and spellchecks its standard input. Like the Unix spell command, it prints each word that it does not recognize.

import Data.BloomFilter.Easy (easyList, elemB)

main = do
  filt <- (easyList 0.01 . words) `fmap` readFile "usrsharedictwords"
  let check word | elemB word filt = ""
                 | otherwise         = word ++ "\n"
  interact (concat . map check . lines)
 

Useful defaults for creation

safeSuggestSizing #

Arguments

:: Int

expected maximum capacity

-> Double

desired false positive rate (0 < e < 1)

-> Either String (Int, Int) 

Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.

The false positive rate is the rate at which queries against the filter should return True when an element is not actually present. It should be a fraction between 0 and 1, so a 1% false positive rate is represented by 0.01.

suggestSizing #

Arguments

:: Int

expected maximum capacity

-> Double

desired false positive rate (0 < e < 1)

-> (Int, Int) 

Behaves as safeSuggestSizing, but calls error if given invalid or out-of-range inputs.