Dot Net Thoughts

January 1, 2008

Persisting the Doubly Linked List

Filed under: Misc Thoughts — dotnetthoughts @ 9:54 am
Tags: , , ,

Recently, we discovered that I needed a doubly linked list to chain objects together in our code. .Net has made this an incredibly easy process, as it provides a LinkedList generic object which manages the creation of the list, as well as the inserting and the deletion of the nodes.

Our project also requires that we persist our linked list to a database. The task seems easy enough. All that needed to be done was to create a table which contains our data, as well as pointers to the previous and next nodes. In other words, our initial table structure looked like this:

Id            int 
ParentId      int (FK to Id) 
ChildId       int (FK to Id) 
Description   nvarchar(30)

This structure seems to work on the surface, but we very quickly realized two very critical problems.

The first problem is that inserting the data into this data structure requires two passes. On the first pass, we insert all of the records into the database. Only after all of the records have been inserted can we assign links to both the parent and child records in the ParentId and ChildId columns.

foreach (link in theChain) 
   {  //Insert the record. }  

foreach (link in theChain) 
   {  //Update the record to include the parent and child pointers }

 The second problem is that the data can fall out of sync with itself. For example, what happens if the data ends up looking like this due to some misbehaving code? Id 1 believes that the child record should be Id 2, but Id 2 believes that it is the top of its own chain.

Id: 1   ParentId: null   ChildId: 2 
Id: 2   ParentId: null   ChildId: 3

Both of these problems can be solved by treating the doubly linked list as a singly linked list in the database. If you have the links of a chain going in one direction, you should be able to determine the links going the other way. We initally avoided this option, because we thought the query to retrieve the data would be extremely complex. (Query the parent with a union of the child, maybe into a temporary table. Ugh.)

While on a walk, yesterday, though, I came up with the idea of simply writing a query with an additional join that would return the data with the links in both directions. Our database would no longer need the ChildId column. If we order our data so that parents always fall above their children (the natural state of a linked list), we can insert all of this data in a single pass. Since there is no ChildId, the data can’t become inconsistent.

Id            int 
ParentId      int (FK to Id) 
Description   nvarchar(30)

We retrieving data to recreate the LinkedList in code, we can get both parent and child ids by linking the LinkedList table to itself.

SELECT parentList.Id, parentList.ParentId, childList.Id AS ChildId
FROM LinkedList parentList
LEFT JOIN LinkedList childList ON parentList.Id = childList.ParentId

It’s always a neat experience when an elegant solution comes out of the blue to solve a complex problem. I’m amazed at how often walking away and letting the subconscience mind work will lead to a better solution than when it is being actively developed. Seems like a good New Year’s resolution will be to walk more. Leads to a healthier me, and healthier code.

Good luck and code safe!

MW

Advertisements

4 Comments »

  1. Hi MW,

    I’ve been reading your blog for a while, very interesting read thanks.

    Reading this post almost made me wince, as I can image it would be an annoyingly difficult thing to accomplish in a nice way. You couldn’t store the list in the database as a serialised blob or similar?

    Comment by kwiksand — January 1, 2008 @ 2:05 pm | Reply

  2. Kwiksand–

    Thanks for reading. It’s good to know that people find the blog useful.

    I really like your question, and I think I’ll blog a more complete answer in the next week or two.

    The quick answer is, yes you could serialize the object to a BLOB and store it in the database, but I really wouldn’t recommend that solution. If the signature of the object you serialize ever changes, things can get messy in a hurry. Also, you’re going to lose the ability to see precisely what is contained in the database itself. I seem to recall that there are some decorators you can use to allow you to update objects, but there are some pretty specific rules. When writing to a database, I personally choose to leave the BLOBs for data such as jpegs, gifs or other very well defined binary types.

    All that said, I wonder exactly how something like a BizTalk orchestration is persisted to the database. I’d be willing to bet it is serialized in a manner similar to your recommendation.

    Thanks again for reading!

    MW

    Comment by dotnetthoughts — January 1, 2008 @ 7:28 pm | Reply

  3. Somehow i missed the point. Probably lost in translation 🙂 Anyway … nice blog to visit.

    cheers, Vindicatory!!

    Comment by Vindicatory — June 19, 2008 @ 7:34 am | Reply

  4. […] like most of the guys here, so I couldn’t give an advice based on my experience, but the following article is by a bloke who seems to know what he is doing. The only problem I think is that when done this […]

    Pingback by Never modifying data: Are pointers a good solution? - dBforums — February 14, 2009 @ 11:05 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: