Dot Net Thoughts

January 1, 2008

Persisting the Doubly Linked List

Filed under: Misc Thoughts — dotnetthoughts @ 9:54 am
Tags: , , ,

Recently, we discovered that I needed a doubly linked list to chain objects together in our code. .Net has made this an incredibly easy process, as it provides a LinkedList generic object which manages the creation of the list, as well as the inserting and the deletion of the nodes.

Our project also requires that we persist our linked list to a database. The task seems easy enough. All that needed to be done was to create a table which contains our data, as well as pointers to the previous and next nodes. In other words, our initial table structure looked like this:

Id            int 
ParentId      int (FK to Id) 
ChildId       int (FK to Id) 
Description   nvarchar(30)

This structure seems to work on the surface, but we very quickly realized two very critical problems.

The first problem is that inserting the data into this data structure requires two passes. On the first pass, we insert all of the records into the database. Only after all of the records have been inserted can we assign links to both the parent and child records in the ParentId and ChildId columns.

foreach (link in theChain) 
   {  //Insert the record. }  

foreach (link in theChain) 
   {  //Update the record to include the parent and child pointers }

 The second problem is that the data can fall out of sync with itself. For example, what happens if the data ends up looking like this due to some misbehaving code? Id 1 believes that the child record should be Id 2, but Id 2 believes that it is the top of its own chain.

Id: 1   ParentId: null   ChildId: 2 
Id: 2   ParentId: null   ChildId: 3

Both of these problems can be solved by treating the doubly linked list as a singly linked list in the database. If you have the links of a chain going in one direction, you should be able to determine the links going the other way. We initally avoided this option, because we thought the query to retrieve the data would be extremely complex. (Query the parent with a union of the child, maybe into a temporary table. Ugh.)

While on a walk, yesterday, though, I came up with the idea of simply writing a query with an additional join that would return the data with the links in both directions. Our database would no longer need the ChildId column. If we order our data so that parents always fall above their children (the natural state of a linked list), we can insert all of this data in a single pass. Since there is no ChildId, the data can’t become inconsistent.

Id            int 
ParentId      int (FK to Id) 
Description   nvarchar(30)

We retrieving data to recreate the LinkedList in code, we can get both parent and child ids by linking the LinkedList table to itself.

SELECT parentList.Id, parentList.ParentId, childList.Id AS ChildId
FROM LinkedList parentList
LEFT JOIN LinkedList childList ON parentList.Id = childList.ParentId

It’s always a neat experience when an elegant solution comes out of the blue to solve a complex problem. I’m amazed at how often walking away and letting the subconscience mind work will lead to a better solution than when it is being actively developed. Seems like a good New Year’s resolution will be to walk more. Leads to a healthier me, and healthier code.

Good luck and code safe!

MW

Create a free website or blog at WordPress.com.