hamsterdb Embedded Database 2.0.0.rc3UNSTABLE
Remember the upper and lower bound kays for this database; update them when we insert a new key, maybe even update them when we delete/erase a key.
These bounds are collected on the fly while searching (find()): they are stored in here as soon as a find() operation hits either the lower or upper bound of the key range stored in the database.
The purpose of storing these bounds is to speed up out-of-bounds key searches significantly: by comparing incoming keys with these bounds, we can immediately tell whether a key will have a change of being found or not, thus precluding the need to traverse the btree - which would produce the same answer in the end anyhow.
WARNING: having these key (copies) in here means we'll need to clean them up when we close the database connection, or we'll risk leaking memory in the key->data here.
NOTE #1: this is the humble beginning of what in a more sophisticated database server system would be called a 'histogram' (Oracle, etc.). Here we don't spend the effort to collect data for a full histogram, but merely collect info about the extremes of our stored key range.
NOTE #2: I'm pondering whether this piece of statistics gathering should be allowed to be turned off by the user 'because he knows best' where premium run-time performance requirements are at stake. Yet... The overhead here is a maximum of two key comparisons plus 2 key copies (which can be significant when we're talking about extended keys!) when we're producing find/insert/erase results which access a btree leaf node which is positioned at the upper/lower edge of the btree key range.
Hence, worst case happens for sure with tiny databases, as those will have ONE btree page only (root=leaf!) and the worst case is reduced to 1 key comparison + 1 key copy for any larger database, which spans two btree pages or more.
To further reduce the worst case overhead, we also store the within-btree-node index of the upper/lower bound key: when this does not change, there is no need to compare the key - unless the key is overwritten, which is a special case of the insert operation.
common rescale tracker as the rescaling is done on all operations data at once, so these remain 'balanced'.
Fringe case consideration: when there's, say, a lot of FIND going on with a few ERASE operations in between, is it A Bad Thing that the ERASE stats risc getting rescaled to almost nil then? Answer: NO. Because there's a high probability that the last ERASE btree leaf node isn't in cache anymore anyway -- unless it's the same one as used by FIND.
The reason we keep track of 3 different leaf nodes is only so we can supply good hinting in scanerios where FIND, INSERT and/or ERASE are mixed in reasonable ratios; keeping track of only a single btree leaf would deny us some good hinting for the other operations.