Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Important is that ISAM is a static storage / index structure. So it does not adapt very well regarding new data. So you need to recreate the index from time to time get best search results. So how to add new values to an ISAM index? Our implementation will just add the value to the lowest level (ordered list of values) of the index by keeping the number of levels constant but by increasing the size of the cluster. This sounds easy but we have to make sure that the effort of inserting to and reorganizing this ordered list is not too high. A reorganization operation is required in order to rebuild the index properly from time to time.

Implementation idea

The idea is that we use jCoreDB's FileSystem and PageBuffer to realize the index. The parameters should differ. So for instance smaller segment sizes and page sizes may be required. Data is stored in containers, containers are divided in segments and segments are made of blocks (pages). So an index is represented by a container. All we need to specify is the number of levels. The next level then contains twice as much elements as the previous one (factor 2). So the first level is realized by using one segments, the second one by using 2 segments, the third one by using 4 segments and so on. The segment size is not constant. Instead it depends on the number of elements those should be stored. The last level can be extended by new segments in order to have room for additional data. An extend happens only on the last level. It is necessary to insert new data by keeping the existing order of the values.

 

Image Removed

So on the last level we want to keep a kind of order without the need to reorganize the level again and again. The easiest way is something like 'append only'. Which means that we append at the end of the file by updating just pointers.

Index operations

Let's investigate the following operations:

  • INSERT
  • SCAN
  • UPDATE
  • DELETE
  • CREATE
  • RECREATE

SCAN

  1. Let's assume we are searching for the value v_17
  2. We iterate over the first level: v_17 >= v_1 --> v_17 >= v_7 --> v_17 >= v_13 --> v_17 < v_18 --> v_17 is inside the cluster with the id v_13.
  3. We iterate over the second level: v_17 >= v_13 --> v_17 >= v_16 --> v_14 --> v_17 < v_18 --> v_17 is inside the cluster with the id v_16
  4. We iterate over the elements of the cluster v_16: v_16 --> v_17 --> We found v_17 and it's corresponding pointer to the data set.

INSERT

  1. Let's only look at the last level, which is this ordered list. We want to avoid reorganization efforts and so need something which avoids to sort the list again and again.
  2. We iterate over levels by finding that v_16 is the right cluster.
  3. Then we iterate over the elements of the cluster v_16: v_17 >= v_16  --> v_17 <= v_18 --> So what we need to do is to add v_17 between v_16 and v_18 (surprise (wink) )
  4. Instead adding v_17 between them, we perform an append operation by setting pointers

Image Removed

 

UPDATE

An Update does only affect the last level, too. An Update means to find a specific value in the index by replacing it by another one. Usually we would just append the new value by changing the pointers as shown in the Insert example.