How parallel LINQ works
.NET Framework 4.0 ships with parallel computing extensions for LINQ (PLINQ). Previously these extensions were a separate download from CodePlex, and you can still use that download if you are on a version older than 4.0. I wrote a small, simple and pretty pointless example that illustrates how parallel queries work in .NET Framework 4.0.
My example creates a simple data source that contains the integers from 1 to 100 and then queries this source twice. First we run the query with the parallel extensions, and then we run the usual LINQ query. Each query writes out its results as they are processed. It is really simple code, but it returns some interesting results.
// Needs "using System;" and "using System.Linq;" (PLINQ lives in System.Linq).
private static void ParallelAndSerialQueries()
{
    // Source data that both the parallel and the serial query use.
    var source = Enumerable.Range(1, 100);

    // Parallel query: AsParallel() hands the query over to PLINQ.
    var parallelQuery = from p in source.AsParallel()
                        where p.ToString().Contains("1")
                        select p;

    // ForAll handles each result on the worker threads, in no guaranteed order.
    parallelQuery.ForAll(x => Console.WriteLine("Parallel: " + x));

    // Serial query: plain LINQ to Objects.
    var serialQuery = from s in source
                      where s.ToString().Contains("1")
                      select s;

    // Enumerate the serial results one by one, in source order.
    foreach (var x in serialQuery)
    {
        Console.WriteLine("Serial: " + x);
    }

    Console.WriteLine("Press Enter to exit...");
    Console.ReadLine();
}
Serial | Parallel |
---|---|
1 10 11 12 13 14 15 16 17 18 19 21 31 41 51 61 71 81 91 100 | 10 11 1 14 12 13 16 17 18 19 21 31 15 51 61 41 71 81 100 91 |
Now let's compare the data that the queries returned. With the classic serial LINQ query, the values are handled in ascending order: a bigger number always comes after a smaller one. This is the same order the numbers have in our data source.
The parallel results are different. If we look at them, we can see that the numbers are not always in order. Some numbers are printed much sooner or later than we might expect. This is because of the parallel nature of the first query.
Parallel work does not all run on the same processor core. Some cores are busier and others have more free resources, so each core finishes its share of the work at a different time. That is why the parallel results appear in a seemingly random order. Strictly speaking the order is not random: as I just said, it depends on the workload of the cores.
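If the order matters, PLINQ can be asked to preserve it. Here is a minimal sketch that assumes the same source and filter as above. The class and method names (OrderedParallelQuery, FilterOrdered) are made up for illustration. The filter still runs in parallel, but AsOrdered() tells PLINQ to hand the results back in source order, and the sketch enumerates them with foreach rather than ForAll, because ForAll runs its action on the worker threads as results become available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedParallelQuery
{
    // Hypothetical helper: returns the numbers in [1, count] whose text
    // contains "1". The filter runs in parallel; AsOrdered() makes PLINQ
    // return the results in the order of the source.
    public static List<int> FilterOrdered(int count)
    {
        return Enumerable.Range(1, count)
            .AsParallel()
            .AsOrdered()
            .Where(p => p.ToString().Contains("1"))
            .ToList();
    }

    public static void Main()
    {
        // The list is already in source order, so a plain foreach prints
        // the values in the same order as the serial query.
        foreach (var x in FilterOrdered(100))
        {
            Console.WriteLine("Ordered parallel: " + x);
        }
    }
}
```

Preserving order is extra bookkeeping for PLINQ, so it is worth asking for only when the order of the results really matters.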
Is parallel processing more powerful?
Parallel computing is not another silver bullet that automagically solves performance issues. In the current example the serial query was about two times faster than the parallel one (exact numbers with 10000 elements in the source: serial 0.68 seconds, parallel 1.31 seconds). The work done per element here is tiny, so the cost of splitting the work and coordinating threads outweighs the gain. Of course, there are scenarios where parallel computing performs far better than serial computing. I recommend that you test the performance of your LINQ queries before you make your final decision between serial and parallel processing.
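A rough way to run such a test is to time both variants with Stopwatch. The sketch below is not the code behind the numbers above: it assumes a source of 10000 integers and counts the matches instead of printing them, and the names QueryTiming, Measure, CountSerial and CountParallel are invented for this example. Timings depend on the machine and on the work done per element, so treat the output as a hint and repeat the measurement several times.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class QueryTiming
{
    // Hypothetical helper: runs the action once and returns the elapsed time.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Serial LINQ: counts the numbers in [1, count] whose text contains "1".
    public static int CountSerial(int count)
    {
        return Enumerable.Range(1, count)
            .Count(p => p.ToString().Contains("1"));
    }

    // The same count, evaluated with PLINQ.
    public static int CountParallel(int count)
    {
        return Enumerable.Range(1, count)
            .AsParallel()
            .Count(p => p.ToString().Contains("1"));
    }

    public static void Main()
    {
        const int size = 10000; // assumed source size

        // Warm-up run so one-time startup costs do not distort the timings.
        CountSerial(size);
        CountParallel(size);

        var serialTime = Measure(() => CountSerial(size));
        var parallelTime = Measure(() => CountParallel(size));

        Console.WriteLine("Serial:   " + serialTime.TotalMilliseconds + " ms");
        Console.WriteLine("Parallel: " + parallelTime.TotalMilliseconds + " ms");
    }
}
```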