Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 16 Current »

Index structure

This article covers the indexing mechanisms those are supported by jCoreDB. The first, but not the only one, is the I(ndex) S(equential) A(ccess) M(ethod). ISAM clusters the data. So if you have n values by using m clusters, then the cluster size is s = n / m . ISAM additionally uses multiple levels to achieve the sub-clustering. So an additional parameter is the number of levels. Here an example. If you have the values {v_1, v_2, v_3, v_4, v_5, ..., v_n} by assuming the order (v_1 <= v_2 <= ... <= v_n) and that we use a cluster size of 6 then the first clustering is defined by the following elements:  (v_1, v_7, v_13, v_18, ...) whereby the cluster with the name v_1 contains the elements {v_1, v_2, v_3, v_4, v_5, v_6}. So if we search for an element v_j with v_i <= v_j <= v_k and v_i and v_k are cluster identifiers (boundaries) then we no longer need to iterate over the n elements in order to find it. Instead we just need to iterate over the lower boundary elements in order to know that an element is within a specific cluster. Then we just need to iterate over the cluster itself in order to find the element. If the cluster is too big then it makes sense to cluster the cluster, which means to add a new level. So let's split the cluster into 2 other clusters. We now find our cluster v_1 divided into the clusters v_1 and v_4: ( (v_1, v_2, v_3),(v_4, v_5, v_6) ). It's easy to see that we are talking about a 2 layer tree here:

 

Important is that ISAM is a static storage / index structure. So it does not adapt very well regarding new data. So you need to recreate the index from time to time get best search results. So how to add new values to an ISAM index? Our implementation will just add the value to the lowest level (ordered list of values) of the index by keeping the number of levels constant but by increasing the size of the cluster. This sounds easy but we have to make sure that the effort of inserting to and reorganizing this ordered list is not too high. A reorganization operation is required in order to rebuild the index properly from time to time.

Implementation idea

The idea is that we use jCoreDB's FileSystem and PageBuffer to realize the index. The parameters should differ. So for instance smaller segment sizes and page sizes may be required. Data is stored in containers, containers are divided in segments and segments are made of blocks (pages). So an index is represented by a container. All we need to specify is the number of levels. The next level then contains twice as much elements as the previous one (factor 2). So the first level is realized by using one segments, the second one by using 2 segments, the third one by using 4 segments and so on. The segment size is not constant. Instead it depends on the number of elements those should be stored. The last level can be extended by new segments in order to have room for additional data. An extend happens only on the last level. It is necessary to insert new data by keeping the existing order of the values.

 

So on the last level we want to keep a kind of order without the need to reorganize the level again and again. The easiest way is something like 'append only'. Which means that we append at the end of the file by updating just pointers.

Index operations

Let's investigate the following operations:

  • INSERT
  • SCAN
  • UPDATE
  • DELETE
  • CREATE
  • RECREATE

SCAN

  1. Let's assume we are searching for the value v_17
  2. We iterate over the first level: v_17 >= v_1 --> v_17 >= v_7 --> v_17 >= v_13 --> v_17 < v_18 --> v_17 is inside the cluster with the id v_13.
  3. We iterate over the second level: v_17 >= v_13 --> v_17 >= v_16 --> v_14 --> v_17 < v_18 --> v_17 is inside the cluster with the id v_16
  4. We iterate over the elements of the cluster v_16: v_16 --> v_17 --> We found v_17 and it's corresponding pointer to the data set.

INSERT

  1. Let's only look at the last level, which is this ordered list. We want to avoid reorganization efforts and so need something which avoids to sort the list again and again.
  2. We iterate over levels by finding that v_16 is the right cluster.
  3. Then we iterate over the elements of the cluster v_16: v_17 >= v_16  --> v_17 <= v_18 --> So what we need to do is to add v_17 between v_16 and v_18 (surprise (wink) )
  4. Instead adding v_17 between them, we perform an append operation by setting pointers

 

UPDATE

An Update does only affect the last level, too. An Update means to find a specific value in the index by replacing it by another one. Usually we would just append the new value by changing the pointers as shown in the Insert example.

  • No labels