PageIndex

Index structure

This article covers the indexing mechanisms those are supported by jCoreDB. The first, but not the only one, is the I(ndex) S(equential) A(ccess) M(ethod). ISAM clusters the data. So if you have n values by using m clusters, then the cluster size is s = n / m . ISAM additionally uses multiple levels to achieve the sub-clustering. So an additional parameter is the number of levels. Here an example. If you have the values {v_1, v_2, v_3, v_4, v_5, ..., v_n} by assuming the order (v_1 <= v_2 <= ... <= v_n) and that we use a cluster size of 6 then the first clustering is defined by the following elements:  (v_1, v_7, v_13, v_18, ...) whereby the cluster with the name v_1 contains the elements {v_1, v_2, v_3, v_4, v_5, v_6}. So if we search for an element v_j with v_i <= v_j <= v_k and v_i and v_k are cluster identifiers (boundaries) then we no longer need to iterate over the n elements in order to find it. Instead we just need to iterate over the lower boundary elements in order to know that an element is within a specific cluster. Then we just need to iterate over the cluster itself in order to find the element. If the cluster is too big then it makes sense to cluster the cluster, which means to add a new level. So let's split the cluster into 2 other clusters. We now find our cluster v_1 divided into the clusters v_1 and v_4: ( (v_1, v_2, v_3),(v_4, v_5, v_6) ). It's easy to see that we are talking about a 2 layer tree here:

 

Important is that ISAM is a static storage / index structure. So it does not adapt very well regarding new data. So you need to recreate the index from time to time get best search results. So how to add new values to an ISAM index? Our implementation will just add the value to the lowest level (ordered list of values) of the index by keeping the number of levels constant but by increasing the size of the cluster. This sounds easy but we have to make sure that the effort of inserting to and reorganizing this ordered list is not too high. A reorganization operation is required in order to rebuild the index properly from time to time.