Recursive Hierarchical Joins in C# and LINQ

Here's a clean solution to doing recursive hierarchical joins in LINQ and C#. I couldn't find an example that didn't use an intermediate "node" class, so I'd like to introduce my approach which provides extension methods to IEnumerabe<T> for doing recursive hierarchical joins but doesn't use any baked in intermediate traversal interface/class and instead allows you to define your own classes for the resulting hierarchy.

Modelling Hierarchical Data

First off, here's some flat data that has a parent-child relationship:

Text | ID | Parent ID
A    | 1  | -
B    | 2  | -
C    | 3  | 1
D    | 4  | 1
E    | 5  | 2

This data can be represented using the following class:

public class FlatData
{
   public string Text { get; set; }
   public int Id { get; set; }
   public int ParentId { get; set; }
}

However I'd much rather have it in the format of a hierarchical model:

A
 \_ 
    C
 \_ 
    D
B
 \_ 
    E

Which could be represented by this class:

public class NodeData
{
   public string Text { get; set; }
   public IEnumerable<NodeData> Children { get; set; }
}

So, how do we get from FlatData to NodeData?

Enter the Recursive Hierarchical Join LINQ Extension Methods

Using my RecursiveJoin LINQ extension method, you can transform the flat data (represented by FlatData) into hierarchical data (represented by NodeData) as follows:

FlatData[] elements = new FlatData[]
{
   new FlatData {Id = 1, Text = "A"},
   new FlatData {Id = 2, Text = "B"},
   new FlatData {Id = 3, ParentId = 1, Text = "C"},
   new FlatData {Id = 4, ParentId = 1, Text = "D"},
   new FlatData {Id = 5, ParentId = 2, Text = "E"}
};

IEnumerable<NodeData> nodes = elements.RecursiveJoin(element => element.Id,
   element => element.ParentId,
   (FlatData element, IEnumerable<NodeData> children) => new NodeData()
   {
      Text = element.Text,
      Children = children
   });

There will be two root nodes: A and B. A will have Children C and D. B will have children E.

The example above uses the simplest form of the RecursiveJoin method which has the following signature:

public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
   Func<TSource, TKey> parentKeySelector,
   Func<TSource, TKey> childKeySelector,
   Func<TSource, IEnumerable<TResult>, TResult> resultSelector);

When using the RecursiveJoin method, it's necessary to specify the parameters in the resultSelector lambda, otherwise the compiler gets confused (unfortunately!).

More Complex Recursive Hierarchical Joins

We can also get the index and depth of each element in the list we are recursively joining on. Let's say we want to store the depth of each node in the hierarchy using the following class:

public class DeepNodeData
{
   public int Depth { get; set; }
   public string Text { get; set; }
   public IEnumerable<DeepNodeData> Children { get; set; }
}

We can use one of the overloaded versions of the RecursiveJoin method to populate the Depth property:

FlatData[] elements = new FlatData[]
{
   new FlatData {Id = 1, Text = "A"},
   new FlatData {Id = 2, Text = "B"},
   new FlatData {Id = 3, ParentId = 1, Text = "C"},
   new FlatData {Id = 4, ParentId = 1, Text = "D"},
   new FlatData {Id = 5, ParentId = 2, Text = "E"}
};

IEnumerable<DeepNodeData> nodes = elements.RecursiveJoin(element => element.Id,
   element => element.ParentId,
   (FlatData element, int index, int depth, IEnumerable<DeepNodeData> children) =>
   {
      return new DeepNodeData()
      {
         Text = element.Text,
         Children = children,
         Depth = depth
      };
   });

The example above uses the following form of the RecursiveJoin method which has this signature:

public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
   Func<TSource, TKey> parentKeySelector,
   Func<TSource, TKey> childKeySelector,
   Func<TSource, int, int, IEnumerable<TResult>, TResult> resultSelector);

The Recursive Hierarchical Join LINQ Extension Methods

I chose to call the extension methods RecursiveJoin rather than HierarchicalJoin as I think "recursive" is easier to type and spell than "hierarchical" or "hierarchy" (although intellisence does a good job of helping out)!

using System;
using System.Collections.Generic;
using System.Linq;

public static class RecursiveJoinExtensions
{
   public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
      Func<TSource, TKey> parentKeySelector,
      Func<TSource, TKey> childKeySelector,
      Func<TSource, IEnumerable<TResult>, TResult> resultSelector)
   {
      return RecursiveJoin(source, parentKeySelector, childKeySelector,
         resultSelector, Comparer<TKey>.Default);
   }

   public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
      Func<TSource, TKey> parentKeySelector,
      Func<TSource, TKey> childKeySelector,
      Func<TSource, int, IEnumerable<TResult>, TResult> resultSelector)
   {
      return RecursiveJoin(source, parentKeySelector, childKeySelector,
         (TSource element, int depth, int index, IEnumerable<TResult> children)
            => resultSelector(element, index, children));
   }

   public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
      Func<TSource, TKey> parentKeySelector,
      Func<TSource, TKey> childKeySelector,
      Func<TSource, IEnumerable<TResult>, TResult> resultSelector,
      IComparer<TKey> comparer)
   {
      return RecursiveJoin(source, parentKeySelector, childKeySelector,
         (TSource element, int depth, int index, IEnumerable<TResult> children)
            => resultSelector(element, children), comparer);
   }

   public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
      Func<TSource, TKey> parentKeySelector,
      Func<TSource, TKey> childKeySelector,
      Func<TSource, int, IEnumerable<TResult>, TResult> resultSelector,
      IComparer<TKey> comparer)
   {
      return RecursiveJoin(source, parentKeySelector, childKeySelector,
         (TSource element, int depth, int index, IEnumerable<TResult> children)
            => resultSelector(element, index, children), comparer);
   }

   public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
      Func<TSource, TKey> parentKeySelector,
      Func<TSource, TKey> childKeySelector,
      Func<TSource, int, int, IEnumerable<TResult>, TResult> resultSelector)
   {
      return RecursiveJoin(source, parentKeySelector, childKeySelector,
         resultSelector, Comparer<TKey>.Default);
   }

   public static IEnumerable<TResult> RecursiveJoin<TSource, TKey, TResult>(this IEnumerable<TSource> source,
      Func<TSource, TKey> parentKeySelector,
      Func<TSource, TKey> childKeySelector,
      Func<TSource, int, int, IEnumerable<TResult>, TResult> resultSelector,
      IComparer<TKey> comparer)
   {
      // prevent source being enumerated more than once per RecursiveJoin call
      source = new LinkedList<TSource>(source);

      // fast binary search lookup
      SortedDictionary<TKey, TSource> parents = new SortedDictionary<TKey, TSource>(comparer);
      SortedDictionary<TKey, LinkedList<TSource>> children
         = new SortedDictionary<TKey, LinkedList<TSource>>(comparer);

      foreach (TSource element in source)
      {
         parents[parentKeySelector(element)] = element;

         LinkedList<TSource> list;

         TKey childKey = childKeySelector(element);

         if (!children.TryGetValue(childKey, out list))
         {
            children[childKey] = list = new LinkedList<TSource>();
         }

         list.AddLast(element);
      }

      // initialize to null otherwise compiler complains at single line assignment
      Func<TSource, int, IEnumerable<TResult>> childSelector = null;

      childSelector = (TSource parent, int depth) =>
      {
         LinkedList<TSource> innerChildren = null;

         if (children.TryGetValue(parentKeySelector(parent), out innerChildren))
         {
            return innerChildren.Select((child, index)
               => resultSelector(child, index, depth, childSelector(child, depth + 1)));
         }

         return Enumerable.Empty<TResult>();
      };

      return source.Where(element => !parents.ContainsKey(childKeySelector(element)))
         .Select((element, index) => resultSelector(element, index, 0, childSelector(element, 1)));
   }
}

It took me a while to arrive at the solution presented here as I couldn't get past wanting to use a generic class to represent each node in the output for traversal, but I'm pleased with the resultSelector approach to allow an arbitrary class to represent the hierarchy. Go forth recursively!

Published Saturday 07 April 2012, 09:43 AM by Keith
Filed under: Algorithms C# LINQ

4 Comments

# re: Recursive Hierarchical Joins in C# and LINQ

Thursday 19 April 2012, 05:51 AM by Brian Noronha

Very Nice.....This Method would fit perfectly for me while developing a nested commenting forum just like DISQUS

# re: Recursive Hierarchical Joins in C# and LINQ

Tuesday 02 October 2012, 01:05 AM by Kevin

Keith, Thanks for your instant reply about Sort feature extension. I will have a try! Best wishes to you.

# re: Recursive Hierarchical Joins in C# and LINQ

Wednesday 15 May 2013, 11:30 AM by jay

Too Good and very useful

# re: Recursive Hierarchical Joins in C# and LINQ

Tuesday 13 August 2013, 12:33 PM by Nirvit Jani

its really great.

It is very helpful to bind any tree control or menu control....

Thanks

Leave a Comment

(required)
   

(required)
   


   

(required)
   

(required)
 

Valid XHTML 1.0 Strict