LINQ

LINQ is based on query operators, defined as extension methods. It works with any object that implements the IEnumerable<T> or IQueryable<T> interface.

A LINQ query is not executed when it is defined; it runs only when its results are enumerated. This is called deferred execution. Enumeration happens when a conversion method such as ToList is called or when the results are iterated with a foreach statement (which calls the MoveNext method on the enumerator).
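A minimal sketch of deferred execution, assuming a C# 9 or later compiler (top-level statements); the variable names are illustrative only. The query sees elements added to the source after the query was defined, while the list produced by ToList is a fixed snapshot.

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };

// Defining the query does not run it.
IEnumerable<int> query = numbers.Where(n => n > 1);

numbers.Add(4);                       // added after the query was defined

List<int> snapshot = query.ToList();  // the query executes here: 2 3 4

numbers.Add(5);

Console.WriteLine(string.Join(" ", snapshot)); // 2 3 4
Console.WriteLine(string.Join(" ", query));    // 2 3 4 5 (the query runs again)
```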

LINQ queries can be written in query syntax or in method syntax. The examples below show equivalent expressions in both.

Not all LINQ operators are supported in query syntax. The compiler always transforms the query syntax into method syntax.
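Operators without a query-syntax keyword, such as Distinct and Count, can still be combined with a query expression by calling them as methods on the parenthesized query. The following is a small sketch with made-up data, assuming a C# 9 or later compiler (top-level statements):

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

int[] nums = { 1, 2, 2, 3, 4, 4 };

// Mixed syntax: the query expression filters, the method calls do the rest.
int mixed = (from n in nums
             where n % 2 == 0
             select n).Distinct().Count();

// The same query written entirely in method syntax.
int methodOnly = nums.Where(n => n % 2 == 0).Distinct().Count();

Console.WriteLine($"{mixed} {methodOnly}"); // 2 2
```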

Example #1:

// Collection used in example #1.
class Item
{
    public int Id { get; set; }
    public string Text { get; set; }
 
    public override string ToString() { return $"Id={Id}, Text={Text}"; }  
}
...
List<Item> items = new List<Item>()
{
    new Item { Id = 1, Text = "aaa" },
    new Item { Id = 2, Text = "bbb" },
    new Item { Id = 3, Text = "ccc" }
};
Query syntax:

var query = from n in items
            select n.Text;

Method syntax:

var query = items
           .Select(n => n.Text);

Result: aaa bbb ccc

Query syntax:

var query = from n in items
            where n.Id < 3
            select n.Text;

Method syntax:

var query = items
           .Where(n => n.Id < 3)
           .Select(n => n.Text);

Result: aaa bbb

Query syntax:

var query = from n in items
            orderby n.Text descending, n.Id
            select n;

Method syntax:

var query = items
           .OrderByDescending(n => n.Text)
           .ThenBy(n => n.Id)
           .Select(n => n); // optional

Result:

Id=3, Text=ccc
Id=2, Text=bbb
Id=1, Text=aaa

Example #2:

// Collection used in example #2.
class Code
{
    public string Prefix { get; set; }
    public int[] Numbers { get; set; }
}
...
List<Code> codes = new List<Code>()
{
    new Code { Prefix = "A", Numbers = new[] { 1,2,3 } },
    new Code { Prefix = "B", Numbers = new[] { 4,5,6 } },
    new Code { Prefix = "C", Numbers = new[] { 7,8,9 } }
};  
Query syntax:

var query = from c in codes
            from n in c.Numbers
            where n < 5
            select n;

Method syntax:

var query = codes
           .SelectMany(c => c.Numbers)
           .Where(n => n < 5);

Result: 1 2 3 4

Example #3:

// Collection used in example #3.
int[] nums = { 82, 35, 20 };
Query syntax:

var query =
    from n in nums
    let p = n * 0.1
    orderby p
    select new { Number = n, TenPercent = p };

Method syntax:

var query =
    nums
   .Select(n => new { n, p = n * 0.1 })
   .OrderBy(x => x.p) // 'x' is a transparent identifier
   .Select(x => new { Number = x.n, TenPercent = x.p });

Results:

{ Number = 20, TenPercent = 2 }
{ Number = 35, TenPercent = 3.5 }
{ Number = 82, TenPercent = 8.2 }

Example #4:

// Collection used in example #4.
string[] names = { "A B", "X Y Z", "B 52" };
Query syntax:

var query = from name in names
            from word in name.Split()
            select word;

Method syntax:

var query = names
           .SelectMany(name => name.Split());

Result: A B X Y Z B 52

To show the query results, simply iterate over the elements:

foreach (var x in query)
    Console.WriteLine(x.ToString());

Inner Joins

The result of the inner join is a sequence of all the pairs of elements where the key from the first element is the same as the key from the second element.

Example: Match parents to children. The left sequence (children) is streamed, but the right sequence (parents) is buffered. If one sequence is significantly bigger than the other, it is therefore worth putting the smaller sequence on the right of the join.

// Note that we are using a smaller sequence on the right of the join for performance reasons.
var query = from c in children
            join p in parents
                on c.ParentId equals p.Id
            select new { p.Name, ChildName = c.Name };
 
foreach (var entry in query)
    Console.WriteLine($"{entry.Name}: {entry.ChildName}");
Fred: Sophia
Fred: Jackson
Fred: Abigail
Alice: Madison
Alice: Alexander
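For comparison, the same inner join can be written in method syntax with the Join operator. This is a sketch, not part of the original notes: it assumes a C# 9 or later compiler (top-level statements) and uses anonymous types in place of the Parent and Child classes so that the block is self-contained.

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

// Anonymous types stand in for the Parent and Child classes.
var parents = new[]
{
    new { Id = 1, Name = "Fred" },
    new { Id = 2, Name = "Alice" },
    new { Id = 3, Name = "Sam" }
};
var children = new[]
{
    new { ParentId = 1, Name = "Sophia" },
    new { ParentId = 1, Name = "Jackson" },
    new { ParentId = 1, Name = "Abigail" },
    new { ParentId = 2, Name = "Madison" },
    new { ParentId = 2, Name = "Alexander" }
};

var query = children.Join(
    parents,                 // right (buffered) sequence
    c => c.ParentId,         // key taken from the left element
    p => p.Id,               // key taken from the right element
    (c, p) => new { p.Name, ChildName = c.Name });

foreach (var entry in query)
    Console.WriteLine($"{entry.Name}: {entry.ChildName}");
```

Sam has no children, so he does not appear in the result.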

Example: Filter a joined sequence. The filter is applied before the join, which is more efficient than filtering afterward because fewer elements take part in the join. The query expression is also simpler when the left sequence is the one that requires filtering.

var query = from c in children
            where c.Name.StartsWith("A") // filter the left sequence
            join p in parents
                on c.ParentId equals p.Id
            select new { p.Name, ChildName = c.Name };
 
foreach (var entry in query)
    Console.WriteLine($"{entry.Name}: {entry.ChildName}");
Fred: Abigail
Alice: Alexander

Test data:

class Parent
{
    public int Id { get; set; }
    public string Name { get; set; }
}
 
class Child
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public string Name { get; set; }
}
...
List<Parent> parents = new List<Parent>();
parents.Add(new Parent { Id = 1, Name = "Fred" });
parents.Add(new Parent { Id = 2, Name = "Alice" });
parents.Add(new Parent { Id = 3, Name = "Sam" });
 
List<Child> children = new List<Child>();
children.Add(new Child { Id = 1, ParentId = 1, Name = "Sophia" });
children.Add(new Child { Id = 2, ParentId = 1, Name = "Jackson" });
children.Add(new Child { Id = 3, ParentId = 1, Name = "Abigail" });
children.Add(new Child { Id = 4, ParentId = 2, Name = "Madison" });
children.Add(new Child { Id = 5, ParentId = 2, Name = "Alexander" });

Group Joins

Each element of a group join consists of an element from the left sequence and also a sequence of all the matching elements of the right sequence. The sequence is empty if the left element doesn't match any right elements.

Example: Return all parents and their children (if any):

var query = from p in parents
            join c in children
                on p.Id equals c.ParentId
                into groupedChildren
            select new
            {
                Parent = p,
                Children = groupedChildren
            };
 
foreach (var entry in query)
{
    Console.Write(entry.Parent.Name + ": ");
    foreach (var child in entry.Children)
        Console.Write(child.Name + " ");
    Console.WriteLine();
}
Fred: Sophia Jackson Abigail
Alice: Madison Alexander
Sam:

Example: Return all the parents and count their children:

var query = from p in parents
            join c in children
                on p.Id equals c.ParentId
                into groupedChildren
            select new
            {
                Parent = p,
                Count = groupedChildren.Count()
            };
 
foreach (var entry in query)
    Console.WriteLine($"{entry.Parent.Name}: {entry.Count}");
Fred: 3
Alice: 2
Sam: 0

Test data is the same as for Inner Join.
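In method syntax a group join corresponds to the GroupJoin operator. The sketch below repeats the child-count example; it assumes a C# 9 or later compiler (top-level statements) and uses anonymous types instead of the Parent and Child classes to stay self-contained.

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

// Anonymous types stand in for the Parent and Child classes.
var parents = new[]
{
    new { Id = 1, Name = "Fred" },
    new { Id = 2, Name = "Alice" },
    new { Id = 3, Name = "Sam" }
};
var children = new[]
{
    new { ParentId = 1, Name = "Sophia" },
    new { ParentId = 1, Name = "Jackson" },
    new { ParentId = 1, Name = "Abigail" },
    new { ParentId = 2, Name = "Madison" },
    new { ParentId = 2, Name = "Alexander" }
};

var query = parents.GroupJoin(
    children,
    p => p.Id,
    c => c.ParentId,
    // groupedChildren is empty for a parent without children.
    (p, groupedChildren) => new { Parent = p.Name, Count = groupedChildren.Count() });

foreach (var entry in query)
    Console.WriteLine($"{entry.Parent}: {entry.Count}");
// Fred: 3
// Alice: 2
// Sam: 0
```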

Cross Joins

The result of a cross join contains every pair of elements from the input sequences.

Example: Multiply each element of one array by each element of another array:

int[] data1 = { 3, 5, 7 };
int[] data2 = { 4, 5, 6 };
 
var query = from d1 in data1
            from d2 in data2
            select new { d1, d2, mul = d1 * d2};
 
foreach(var data in query)
    Console.WriteLine("{0} * {1} = {2}", data.d1, data.d2, data.mul);
3 * 4 = 12
3 * 5 = 15
3 * 6 = 18
5 * 4 = 20
5 * 5 = 25
5 * 6 = 30
7 * 4 = 28
7 * 5 = 35
7 * 6 = 42

Example: In this example the right sequence (numbers) depends on the current value of the left sequence (codes):

Code[] codes = 
{
    new Code { Prefix = "A", Numbers = new int[] { 1, 2, 3 } },
    new Code { Prefix = "B", Numbers = new int[] { 4, 5, 6 } },
    new Code { Prefix = "C", Numbers = new int[] { 7, 8, 9 } }
};
 
var query = from c in codes
            from n in c.Numbers // SelectMany
            select new { code = c.Prefix + n };
 
foreach (var entry in query)
    Console.WriteLine(entry.code); // A1 A2 A3 B4 B5 B6 C7 C8 C9
 
...
 
class Code
{
    public string Prefix { get; set; }
    public int[] Numbers { get; set; }
}

This is how it works:

Step #1 - Each element of the left sequence is used to generate a right sequence:

left right
A 1, 2, 3
B 4, 5, 6
C 7, 8, 9

Step #2 (flattening) - The left element is paired with each element of the new sequence:

left right
A 1
A 2
A 3
B 4
B 5
B 6
C 7
C 8
C 9
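The two steps above map onto the SelectMany overload that takes a collection selector and a result selector. The following sketch assumes a C# 9 or later compiler (top-level statements) and uses an anonymous type in place of the Code class:

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

// An anonymous type stands in for the Code class.
var codes = new[]
{
    new { Prefix = "A", Numbers = new[] { 1, 2, 3 } },
    new { Prefix = "B", Numbers = new[] { 4, 5, 6 } },
    new { Prefix = "C", Numbers = new[] { 7, 8, 9 } }
};

var query = codes.SelectMany(
    c => c.Numbers,             // step 1: generate the right sequence from the left element
    (c, n) => c.Prefix + n);    // step 2: pair the left element with each right element

Console.WriteLine(string.Join(" ", query)); // A1 A2 A3 B4 B5 B6 C7 C8 C9
```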

Grouping

The result of grouping is a sequence of groups, where each group is itself a sequence of the elements that share a key. Each group also has a Key property, which holds the key for that group.

Query continuation is closely related to grouping. It uses the into keyword to make the result of one query expression the initial sequence of another. It can follow a group by clause or a select clause.

Test data used with grouping examples:

class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}
...
Person[] people =
{
    new Person { Name = "Mike", Age = 41, City = "Toronto", Country = "Canada" },
    new Person { Name = "Henryk", Age = 23, City = "Lublin", Country = "Poland" },
    new Person { Name = "Alice", Age = 31, City = "Ottawa", Country = "Canada" },
    new Person { Name = "John", Age = 16, City = "Toronto", Country = "Canada" },
    new Person { Name = "Bruce", Age = 59, City = "Toronto", Country = "Canada" },
    new Person { Name = "Piotr", Age = 48, City = "Lublin", Country = "Poland" }
};  

Example: Group people by their country:

var query = from p in people
            group p by p.Country; // key
 
foreach (var entry in query) // each entry has a sequence of people
{
    Console.Write($"{entry.Key}: ");
    foreach (var person in entry)
        Console.Write($"{person.Name} ");
    Console.WriteLine();
}
Canada: Mike Alice John Bruce
Poland: Henryk Piotr

Example: Group people by their country and city:

var query = from p in people
            group p by new { p.Country, p.City }; // key is an anonymous type
 
foreach (var entry in query)
{
    Console.Write($"{entry.Key.Country}/{entry.Key.City}: ");
    foreach (var person in entry)
        Console.Write($"{person.Name} ");
    Console.WriteLine();
}
Canada/Toronto: Mike John Bruce
Poland/Lublin: Henryk Piotr
Canada/Ottawa: Alice

Example: Obtain the number of the people in each country/city:

var query = from p in people
            group p by new { p.Country, p.City } into g
            select new
            {
                Place = g.Key,
                Count = g.Count()
            };
 
foreach (var entry in query)
{
    Console.WriteLine($"{entry.Place.Country}/{entry.Place.City}: {entry.Count}");
}
Canada/Toronto: 3
Poland/Lublin: 2
Canada/Ottawa: 1

Example: Obtain the number of the people in each country/city and sort them ascending by the count:

var query = from p in people
            group p by new { p.Country, p.City } into g
            select new
            {
                Place = g.Key,
                Count = g.Count()
            }
            into result
            orderby result.Count
            select result; // the 'select' clause is required in query syntax
 
foreach (var entry in query)
{
    Console.WriteLine($"{entry.Place.Country}/{entry.Place.City}: {entry.Count}");
}
Canada/Ottawa: 1
Poland/Lublin: 2
Canada/Toronto: 3

Example: Group people by their country and show the average age in each country:

var query = from p in people
            group p by p.Country into g
            orderby g.Key // order by country
            select new
            {
                Country = g.Key,
                AvgAge = g.Average(x => x.Age)
            };
 
foreach (var entry in query)
    Console.WriteLine($"{entry.Country}: {entry.AvgAge}");
Canada: 36.75
Poland: 35.5
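The grouping queries can also be written in method syntax with GroupBy. The sketch below repeats the average-age example; it assumes a C# 9 or later compiler (top-level statements) and an anonymous type holding only the Person properties that the query needs.

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

// An anonymous type stands in for the Person class.
var people = new[]
{
    new { Name = "Mike",   Age = 41, Country = "Canada" },
    new { Name = "Henryk", Age = 23, Country = "Poland" },
    new { Name = "Alice",  Age = 31, Country = "Canada" },
    new { Name = "John",   Age = 16, Country = "Canada" },
    new { Name = "Bruce",  Age = 59, Country = "Canada" },
    new { Name = "Piotr",  Age = 48, Country = "Poland" }
};

var query = people
    .GroupBy(p => p.Country)   // group p by p.Country into g
    .OrderBy(g => g.Key)       // orderby g.Key
    .Select(g => new { Country = g.Key, AvgAge = g.Average(x => x.Age) });

// The decimal separator in the output depends on the current culture.
foreach (var entry in query)
    Console.WriteLine($"{entry.Country}: {entry.AvgAge}");
```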

Correlated Subqueries

You can use correlated subqueries for mapping a relational object model to a hierarchical object model. Keep in mind that with local queries this technique is inefficient because every combination of outer and inner elements must be enumerated to get the few matching combinations. A better choice for local queries is Join or GroupJoin.

Example: Obtain a list of subfolders located in a given folder (C:\Temp) and all the files in each subfolder:

using System.IO;
using System.Linq;
...
DirectoryInfo[] dirs = new DirectoryInfo(@"C:\Temp").GetDirectories();
 
var query = from d in dirs
            where (d.Attributes & FileAttributes.System) == 0
            select new
            {
                DirectoryName = d.FullName,
                Created = d.CreationTime,
 
                // This is a correlated subquery. It references the object 'd' from the outer query.
                Files = from f in d.GetFiles()
                        where (f.Attributes & FileAttributes.Hidden) == 0
                        select new { FileName = f.Name, f.Length, }
            };
 
foreach (var dir in query)
{
    Console.WriteLine("Directory: {0}, Created: {1:d}", dir.DirectoryName, dir.Created);
    foreach (var file in dir.Files)
        Console.WriteLine("  {0} ({1:N0})", file.FileName, file.Length);
}
Directory: C:\Temp\OldNotes, Created: 2015-05-05
  abc.txt (187)
  Something Important.txt (327)
  Old document.doc (4,944)
Directory: C:\Temp\Backup, Created: 2015-04-20
  notes.tar (6,078)
  Pictures from Kathmandu.zip (2,345,020)

Example: Obtain the list of products each customer ordered at least once. Also, obtain a list of suppliers in the same city as the customer:

var customers = new List<Customer>();
var suppliers = new List<Supplier>();
 
// ... populate the customers, their orders, and the suppliers
 
var query = from c in customers
            select new
            {
                c.CustomerName,
                c.City,
                Products = (from o in c.Orders
                            select new
                            {
                                o.ProductName
                            }).Distinct(),
                Suppliers = from s in suppliers
                            where s.City == c.City
                            select s
            };         
...
public class Customer
{
    public string CustomerName { get; set; }
    public string City { get; set; }
    public List<Order> Orders { get; set; }
}
 
public class Order
{
    public int Quantity { get; set; }
    public string ProductName { get; set; }
}
 
public class Supplier
{
    public string SupplierName { get; set; }
    public string City { get; set; }
}            
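For local queries, the supplier lookup can be expressed as a group join instead of a correlated subquery, so that the suppliers are indexed by city once rather than scanned for every customer. This is a sketch under assumptions: it needs a C# 9 or later compiler (top-level statements), and the customer and supplier values are invented sample data held in anonymous types.

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

// Hypothetical sample data; anonymous types stand in for Customer and Supplier.
var customers = new[]
{
    new { CustomerName = "Ann", City = "Toronto" },
    new { CustomerName = "Bob", City = "Lublin" },
    new { CustomerName = "Cy",  City = "Ottawa" }
};
var suppliers = new[]
{
    new { SupplierName = "S1", City = "Toronto" },
    new { SupplierName = "S2", City = "Toronto" },
    new { SupplierName = "S3", City = "Lublin" }
};

var query = from c in customers
            join s in suppliers
                on c.City equals s.City
                into localSuppliers   // empty when no supplier is in the customer's city
            select new
            {
                c.CustomerName,
                c.City,
                Suppliers = localSuppliers
            };

foreach (var entry in query)
{
    var names = string.Join(", ", entry.Suppliers.Select(s => s.SupplierName));
    Console.WriteLine($"{entry.CustomerName} ({entry.City}): {names}");
}
// Ann (Toronto): S1, S2
// Bob (Lublin): S3
// Cy (Ottawa):
```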

Let

The let clause introduces a new variable with a value that can be based on other variables.

Example: Introduce a variable 'p' representing 10% of the variable 'n':

int[] nums = { 82, 35, 20 };
 
var query = from n in nums
            let p = n * 0.1
            orderby p
            select new { Number = n, TenPercent = p };
 
foreach (var entry in query)
    Console.WriteLine($"{entry.Number}: {entry.TenPercent:F2}");
20: 2.00
35: 3.50
82: 8.20

Operators

According to the book C# 5.0 in a Nutshell (Chapter 9, pages 377-379) LINQ operators can be grouped into three categories:

  • Operators that accept a sequence (or multiple sequences) and emit a single sequence.
  • Operators that accept a sequence and emit an element or a scalar.
  • Operators that take no input sequence and generate an output sequence (for example, Enumerable.Range).
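One operator from each category, as a brief sketch (assuming a C# 9 or later compiler with top-level statements; the choice of operators is illustrative):

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Collections.Generic;
using System.Linq;

// Generation: no input sequence, produces a sequence.
IEnumerable<int> generated = Enumerable.Range(1, 5);        // 1 2 3 4 5

// Sequence in, sequence out.
IEnumerable<int> evens = generated.Where(n => n % 2 == 0);  // 2 4

// Sequence in, single element or scalar out.
int first = evens.First();                                  // 2
int count = evens.Count();                                  // 2

Console.WriteLine($"{first} {count}"); // 2 2
```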

Cast and OfType

Both Cast and OfType:

  • are extension methods on the nongeneric IEnumerable type
  • take an untyped sequence
  • return a strongly typed sequence

They differ in how they handle elements of the wrong type:

  • Cast casts each element to the target type and fails if any element is not of the right type.
  • OfType skips any elements of the wrong type.

Example: OfType

ArrayList list = new ArrayList(); // ArrayList stores object elements
list.Add(2);
list.Add(4);
list.Add("AA");
list.Add(5);
list.Add("BB");
 
IEnumerable<int> nums = list.OfType<int>();
IEnumerable<string> strings = list.OfType<string>();
 
foreach (int i in nums) Console.WriteLine(i); // 2 4 5
foreach (string s in strings) Console.WriteLine(s); // AA BB
 
// Check if a collection contains any elements of a given type.
bool hasStrings = list.OfType<string>().Any();    // true
bool hasObjects = list.OfType<object>().Any();    // true - all elements derive from object
bool hasDateTime = list.OfType<DateTime>().Any(); // false - there are no DateTime elements

Example: Cast

int[] arr1 = { 1, 2, 3 };
List<int> result1 = arr1.Cast<int>().ToList();
foreach (int i in result1) Console.WriteLine(i); // 1 2 3
 
object[] arr2 = { 1, "A", 3 };
 
// throws an InvalidCastException when ToList enumerates the string element "A"
List<int> result2 = arr2.Cast<int>().ToList(); 

Skip and Take

Example: Use Skip and Take to implement paging:

int[] nums = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
 
const int SIZE = 4; // page size
 
var paged1 = nums.Skip(0 * SIZE).Take(SIZE); // 1 2 3 4
var paged2 = nums.Skip(1 * SIZE).Take(SIZE); // 5 6 7 8
var paged3 = nums.Skip(2 * SIZE).Take(SIZE); // 9 10

Union

Test data:

var list1 = new List<Item>();
list1.Add(new Item { Id = 1, Text = "AAA" });
list1.Add(new Item { Id = 2, Text = "BBB" });
list1.Add(new Item { Id = 3, Text = "CCC" });
 
var list2 = new List<Item>();
list2.Add(new Item { Id = 1, Text = "111" });
list2.Add(new Item { Id = 2, Text = "222" });
list2.Add(new Item { Id = 3, Text = "333" });
...
class Item
{
    public int Id { get; set; }
    public string Text { get; set; }
}

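Example: Merge two collections using Union. Union returns the distinct elements of both collections. Item does not override Equals, so elements are compared by reference and all six items are returned: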

var all = list1.Union(list2);
 
foreach (var e in all)
    Console.WriteLine($"{e.Id}:{e.Text}"); // 1:AAA 2:BBB 3:CCC etc.
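Union removes duplicates according to the default equality comparer, which is easier to see with value types. The following sketch (made-up data, assuming a C# 9 or later compiler with top-level statements) contrasts Union with Concat, which keeps duplicates:

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;

int[] a = { 1, 2, 3 };
int[] b = { 3, 4 };

var union = a.Union(b).ToArray();    // duplicates removed
var concat = a.Concat(b).ToArray();  // duplicates kept

Console.WriteLine(string.Join(" ", union));  // 1 2 3 4
Console.WriteLine(string.Join(" ", concat)); // 1 2 3 3 4
```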

Example: Merge two collections using Union. Create an anonymous type to hold elements of the resulting collection:

// Method #1
var all = list1
         .Select(e => new { Name = e.Text })
         .Union(list2.Select(e => new { Name = e.Text }));
 
// Method #2
var all = (from e in list1 select new { Name = e.Text })
         .Union
          (from e in list2 select new { Name = e.Text });
 
foreach (var e in all)
    Console.WriteLine(e.Name);

Except

Example: Select only elements that are present in arr1 but are missing in arr2:

int[] arr1 = { 1, 2, 3, 4, 5 };
int[] arr2 = { 3, 5 };
 
List<int> missing = arr1.Except(arr2).ToList(); // 1,2,4

String comparison

Example: Use StringComparison.OrdinalIgnoreCase to improve performance in a case-insensitive filter:

string[] names = { "A1", "a2", "B1" };
 
List<string> filtered1 = names.Where(n => 
    n.StartsWith("a")).ToList(); // a2
 
List<string> filtered2 = names.Where(n => 
    n.StartsWith("a", StringComparison.OrdinalIgnoreCase)).ToList(); // A1,a2

Exception Handling

  • Wrap the enumeration of the query result with a try…catch block rather than the query itself (a consequence of deferred execution).
  • Avoid using the results of methods or constructors directly as data sources for a query. Instead, assign their results to variables and wrap the assignment in a try…catch block.
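The first point can be illustrated with a short sketch (made-up input, assuming a C# 9 or later compiler with top-level statements). The try block around the query definition never catches anything, because int.Parse is not called until the query is enumerated:

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Collections.Generic;
using System.Linq;

string[] inputs = { "1", "2", "x" };
var parsed = new List<int>();
bool definitionThrew = false;
bool enumerationThrew = false;

IEnumerable<int> query = Enumerable.Empty<int>();
try
{
    // Only builds the query; int.Parse is not called yet.
    query = inputs.Select(s => int.Parse(s));
}
catch (FormatException)
{
    definitionThrew = true; // never reached
}

try
{
    // int.Parse runs here, one element at a time, and fails on "x".
    foreach (int n in query)
        parsed.Add(n);
}
catch (FormatException)
{
    enumerationThrew = true;
}

Console.WriteLine($"{definitionThrew} {enumerationThrew} {parsed.Count}"); // False True 2
```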

LINQ Implementations

A few of the LINQ implementations:

  • LINQ to Objects - Manipulates collections of objects (System.Linq namespace).
  • LINQ to ADO.NET - Manipulates relational data:
    • LINQ to SQL - Mapping between custom .NET types and the physical SQL table schema.
    • LINQ to Entities - Handles an Object Relational Mapping (ORM) that uses a conceptual Entity Data Model (EDM) rather than a physical data layer.
    • LINQ to DataSet - Enables querying a DataSet by using LINQ.
  • LINQ to XML (System.Xml.Linq namespace)
  • LINQ to Rx (Reactive Extensions) - Provides push model for data flow; built on IObservable<T> and IObserver<T>.

PLINQ

  • Parallel Language-Integrated Query (PLINQ) can turn a sequential query into a parallel one.
  • Parallel processing does not guarantee any particular order.
  • Extension methods for using PLINQ are defined in the System.Linq.ParallelEnumerable class.

Methods that can be used to alter how the query behaves:

  • AsUnordered - Makes an ordered query unordered. Unordered queries are more efficient than ordered ones.
  • AsSequential - Disables parallelization.
  • WithCancellation - Specifies a cancellation token.
  • WithDegreeOfParallelism - Specifies the maximum number of concurrent tasks used to execute the query.
  • WithExecutionMode - Can be used to force the query to execute in parallel.
  • WithMergeOptions - Tweaks how the results are buffered.
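Several of these methods can be chained after AsParallel. The sketch below is illustrative only: it assumes a C# 9 or later compiler (top-level statements), and the degree of parallelism and the execution mode are arbitrary choices, not recommendations.

```csharp
// Top-level statements (C# 9 or later); wrap in Main for older compilers.
using System;
using System.Linq;
using System.Threading;

using var cts = new CancellationTokenSource();

int[] squares = Enumerable.Range(1, 10)
    .AsParallel()
    .AsOrdered()                                               // keep the input order
    .WithDegreeOfParallelism(2)                                // at most 2 concurrent tasks
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism) // do not fall back to sequential
    .WithCancellation(cts.Token)                               // cts.Cancel() would abort the query
    .Select(n => n * n)
    .ToArray();

Console.WriteLine(string.Join(" ", squares)); // 1 4 9 16 25 36 49 64 81 100
```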

Example: Convert a query into a parallel one by calling AsParallel on the input sequence:

var numbers = Enumerable.Range(0, 10);
var parallelResult = numbers.AsParallel()
    .Where(i => i % 2 == 0)
    .ToArray();
 
foreach (int i in parallelResult)
    Console.WriteLine(i);
 
// Note that the result may be in a different order than the input,
// for example: 0 4 6 8 2

Example: Preserve ordering by using the AsOrdered operator. The query is processed in parallel and the results are buffered and sorted:

var numbers = Enumerable.Range(0, 10);
var parallelResult = numbers.AsParallel().AsOrdered()
    .Where(i => i % 2 == 0)
    .ToArray();
 
// Output: 0 2 4 6 8
foreach (int i in parallelResult)
    Console.WriteLine(i);

Example: If you just need to run a function over each element of a ParallelQuery, you may improve performance by using the ForAll method:

var numbers = Enumerable.Range(0, 10);
var parallelResult = numbers.AsParallel()
    .Where(i => i % 2 == 0);
 
// Provide a delegate to execute on each element of the query.
parallelResult.ForAll(e => Console.WriteLine(e));

Example: Handle exceptions using AggregateException:

static void Main(string[] args)
{
    var numbers = Enumerable.Range(0, 11);
    try
    {
        var parallelResult = numbers.AsParallel()
            .Where(n => TestNumber(n));
 
        parallelResult.ForAll(n => Console.Write(n + " "));
    }
    catch (AggregateException exc)
    {
        Console.WriteLine("\nThere were {0} exceptions:", exc.InnerExceptions.Count);

        // Examine the exceptions by accessing the exc.InnerExceptions collection.
        foreach (Exception innerExc in exc.InnerExceptions)
            Console.WriteLine(innerExc.Message);
    }
}

public static bool TestNumber(int n)
{
    if (n == 0 || n == 10)
        throw new ArgumentException(String.Format("The number is {0}.", n));
    return true;
}

Output (the numbers printed and their order vary between runs):

2 8 9 1 3 4 5
There were 2 exceptions:
The number is 0.
The number is 10.
notes/csharp/linq.txt · Last modified: 2017/05/04 by leszek