Tim

Footprints in the snow of a warped mind

Using Linq and Extension Methods to chunk large data sets


© 2012 Tim Gaunt.


# Thursday, August 12, 2010


Thursday, August 12, 2010 9:32:44 AM (GMT Daylight Time, UTC+01:00)

Ever needed to take a large list and split it into smaller subsets of data for processing? Well, this is the extension method for you. Tonight we had to split a small dataset (500 items) into even smaller sets of 10 so the provider’s web service wouldn’t time out.

Seeing as I was going to miss out on my evening, I thought I’d see if I could do it a little differently using LINQ, and this is what I came up with:

/// <summary>
/// Simple method to chunk a source IEnumerable into smaller (more manageable) lists
/// </summary>
/// <param name="source">The large IEnumerable to split</param>
/// <param name="chunkSize">The maximum number of items each subset should contain (must be greater than zero)</param>
/// <returns>An IEnumerable of the original source IEnumerable in bite-size chunks</returns>
public static IEnumerable<IEnumerable<TSource>> ChunkData<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    // Count once up front rather than re-counting the source on every pass of the loop
    int count = source.Count();
    for (int i = 0; i < count; i += chunkSize)
        yield return source.Skip(i).Take(chunkSize);
}

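It extends any IEnumerable<T> and lets you split it into smaller chunks which you can then process to your heart’s content. It needs to sit in a static class with System.Linq imported, and because Skip and Take are lazy, each chunk walks the source again from the start when you enumerate it, so it suits in-memory lists best.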

Here’s a quick example of it in use:

var list = new List<string>() { "Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6", "Item 7", "Item 8", "Item 9", "Item 10" };
Console.WriteLine("Original list is {0} items", list.Count);
var chunked = list.ChunkData(3);
Console.WriteLine("Returned the data in {0} subsets", chunked.Count());
int i = 1;
foreach (var subset in chunked)
{
    Console.WriteLine("{0} items are in subset #{1}", subset.Count(), i++);
    int si = 1;
    foreach (var s in subset)
        Console.WriteLine("\t\tItem #{0}: {1}", si++, s);
}

And this will output:

Original list is 10 items
Returned the data in 4 subsets
3 items are in subset #1
		Item #1: Item 1
		Item #2: Item 2
		Item #3: Item 3
3 items are in subset #2
		Item #1: Item 4
		Item #2: Item 5
		Item #3: Item 6
3 items are in subset #3
		Item #1: Item 7
		Item #2: Item 8
		Item #3: Item 9
1 items are in subset #4
		Item #1: Item 10

A couple of lines of code to do all that work. Neat!
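
One thing to watch: Skip and Take go back over the source for every chunk, which is fine for a List but could hurt if the source is expensive to enumerate (a database query, say). If that ever bites, a single-pass version is one way round it. This is only a rough sketch, not what we actually ran: ChunkingExtensions and ChunkDataSinglePass are names I have made up for illustration, and it buffers each chunk into a List instead of staying fully lazy.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingExtensions // hypothetical class name
{
    // Hypothetical single-pass alternative to ChunkData: walks the source
    // once and buffers each chunk into a List before handing it back.
    public static IEnumerable<List<TSource>> ChunkDataSinglePass<TSource>(
        this IEnumerable<TSource> source, int chunkSize)
    {
        // Validate eagerly; the iterator body below only runs once enumerated
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(source, chunkSize);
    }

    private static IEnumerable<List<TSource>> Iterate<TSource>(
        IEnumerable<TSource> source, int chunkSize)
    {
        var chunk = new List<TSource>(chunkSize);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<TSource>(chunkSize);
            }
        }

        // Hand back whatever is left over as a final, shorter chunk
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

Called as list.ChunkDataSinglePass(3) on the ten-item list above, it should give the same four subsets (3, 3, 3 and 1 items). On .NET 6 and later, the built-in Enumerable.Chunk does much the same job.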

 
