Using Linq and Extension Methods to chunk large data sets

Ever needed to take a large list and split it into smaller subsets of data for processing? Well this is the Extension Method for you. Tonight we had to split a small dataset (500 items) into even smaller sets of 10 so the provider’s web service wouldn’t timeout.

Seeing as I was going to miss out on my evening, I thought I’d see if I could do it a little differently using Linq and this is what I came up with:

/// <summary>
/// Simple method to chunk a source IEnumerable into smaller (more manageable) lists
/// </summary>
/// <param name="source">The large IEnumerable to split</param>
/// <param name="chunkSize">The maximum number of items each subset should contain</param>
/// <returns>An IEnumerable of the original source IEnumerable in bite size chunks</returns>
public static IEnumerable<IEnumerable<TSource>> ChunkData<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    for (int i = 0; i < source.Count(); i += chunkSize)
        yield return source.Skip(i).Take(chunkSize);
} 

It should extend any IEnumerable and allow you to split it into smaller chunks which you can then process to your heart’s content.

Here’s a quick example of it in use:

var list = new List<string>() { "Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6", "Item 7", "Item 8", "Item 9", "Item 10" };
Console.WriteLine("Original list is {0} items", list.Count);
var chunked = list.ChunkData(3);
Console.WriteLine("Returned the data in {0} subsets", chunked.Count());
int i = 1;
foreach (var subset in chunked)
{
    Console.WriteLine("{0} items are in subset #{1}", subset.Count(), i++);
    int si = 1;
    foreach (var s in subset)
        Console.WriteLine("\t\tItem #{0}: {1}", si++, s);
}

And this will output

Original list is 10 items
Returned the data in 4 subsets
3 items are in subset #1
		Item #1: Item 1
		Item #2: Item 2
		Item #3: Item 3
3 items are in subset #2
		Item #1: Item 4
		Item #2: Item 5
		Item #3: Item 6
3 items are in subset #3
		Item #1: Item 7
		Item #2: Item 8
		Item #3: Item 9
1 items are in subset #4
		Item #1: Item 10

2 lines of code to do all that work -Neat

Author

Tim

comments powered by Disqus