Don't get the title wrong! I definitely love LINQ. Most people know me as the one who maximizes LINQ usage in projects. I have been using it since its early days, on .NET 2.0 via LinqBridge!
However, if there is a razor in your hand, you can either perform unbelievable surgeries or hurt your hand if you don't use it properly.
A problem with LINQ is that it makes you feel smart while you do something stupid!
Here are some of my experiences from software development teams that use LINQ.
Deferred vs. Materialized
I think deferred execution is one of the most amazing features in LINQ. It is powerful, but it is easy to misuse if you don't know how it works. Consider the following code:
var posts = new List<Post>
{
    new Post { Title = "Old post", CreationYear = 2000 },
    new Post { Title = "New post", CreationYear = 2016 }
};

// Build the filter, then change the data before enumerating.
var selectedPosts = posts.Where(p => p.CreationYear == 2016);
posts[0].CreationYear = 2016;

foreach (var post in selectedPosts)
{
    Console.WriteLine(post.Title);
}

// Output:
// Old post
// New post
Unfortunately, the output of this code contains "Old post". But why? We changed posts[0].CreationYear after creating the selectedPosts variable! The reason is that the execution of .Where is postponed, or deferred, until the result is needed. In this example the result is first needed in the foreach loop, so the filter runs there, after the data has already changed.
In fact, the name selectedPosts is misleading. The variable does not contain the selected posts; it contains a query that returns the selected posts whenever it is executed. So a more fitting name would be query.
It is good practice to keep the concepts of query and list separate when using LINQ. Let's rewrite the code using this naming convention:
// This is just the query.
var query = posts.Where(p => p.CreationYear == 2016);

posts[0].CreationYear = 2016;

// Here the query is executed and a real list is created.
var selectedPosts = query.ToList();

foreach (var post in selectedPosts)
{
    Console.WriteLine(post.Title);
}
With the new naming, the code is easier to follow. It says there is a query, then the data changes, and at the end the query executes and the result is stored in selectedPosts.
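To make the difference between a query and a list more visible, here is a minimal sketch. The Post class and the DeferredDemo name are my own assumptions for illustration. Unlike the sample above, it takes the ToList() snapshot before the data changes, so the two results can be compared side by side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model class, assumed for this sketch.
public class Post
{
    public string Title { get; set; } = "";
    public int CreationYear { get; set; }
}

public static class DeferredDemo
{
    // Returns how many posts the deferred query sees versus the materialized list.
    public static (int QueryCount, int ListCount) Run()
    {
        var posts = new List<Post>
        {
            new Post { Title = "Old post", CreationYear = 2000 },
            new Post { Title = "New post", CreationYear = 2016 }
        };

        // Only a description of the filter; nothing runs yet.
        var query = posts.Where(p => p.CreationYear == 2016);

        // Snapshot: the filter runs now and the result is stored in a list.
        var selectedPosts = query.ToList();

        // Change the data after the snapshot was taken.
        posts[0].CreationYear = 2016;

        // The query runs again against the current data; the list stays as it was.
        return (query.Count(), selectedPosts.Count);
    }

    public static void Main()
    {
        var (queryCount, listCount) = Run();
        Console.WriteLine($"query: {queryCount}, list: {listCount}"); // query: 2, list: 1
    }
}
```

The query is evaluated again every time it is enumerated, so it picks up the change, while the list keeps whatever matched at the moment ToList() was called.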
Mixing the providers
The LINQ architecture is based on providers. It is designed so that you can create providers for different data stores. Consider this query:
// Get the posts which were created in 2016.
var query = posts.Where(post => post.CreationYear == 2016);
var selectedPosts = query.ToList();
The point is that .Where behaves differently depending on the type of the posts variable. There are lots of providers, such as:
- LINQ to SQL
- LINQ to Objects
- LINQ to Entities
- LINQ to XML
- LINQ to Twitter
- and ...
In this example I use "LINQ to Objects" and "LINQ to Entities" to demonstrate the mistakes.
LINQ to Objects
If posts is a List<Post>, executing query.ToList() behaves roughly like this:
var selectedPosts = new List<Post>();
foreach (var item in posts)
{
    if (item.CreationYear == 2016)
        selectedPosts.Add(item);
}
LINQ to Entities
If posts is a DbSet<Post> from Entity Framework, executing query.ToList() behaves roughly like this:
-- Executing a SQL query like:
SELECT * FROM POSTS WHERE CREATION_YEAR = 2016
You see, the behavior is totally different: here a SQL query is executed to evaluate the query.
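One way to picture why the same .Where call can behave so differently: for in-memory collections the lambda is compiled into a delegate that is simply called, while IQueryable providers receive the lambda as an expression tree that they can inspect and translate. The sketch below only shows those two shapes of the same predicate; the Post class and the PredicateShapes name are assumptions for illustration, and no real provider or database is involved.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical model class, assumed for this sketch.
public class Post
{
    public string Title { get; set; } = "";
    public int CreationYear { get; set; }
}

public static class PredicateShapes
{
    // What LINQ to Objects works with: compiled code that can only be invoked.
    public static readonly Func<Post, bool> AsDelegate =
        p => p.CreationYear == 2016;

    // What IQueryable providers work with: a data structure describing the code.
    public static readonly Expression<Func<Post, bool>> AsExpression =
        p => p.CreationYear == 2016;

    public static void Main()
    {
        var post = new Post { Title = "New post", CreationYear = 2016 };

        // The delegate is just called for each item.
        Console.WriteLine(AsDelegate(post)); // True

        // The expression tree can be examined node by node, which is what
        // lets a provider turn it into something like a SQL WHERE clause.
        Console.WriteLine(AsExpression.Body.NodeType); // Equal
    }
}
```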
Mixing LINQ to Objects and LINQ to Entities
Now it is time to show how the confusion happens. Consider the following code:
var list1 = context.Set<Post>().Where(p => p.CreationYear == 2016);
var list2 = list1.Where(p => p.Title != null).ToList();
var list3 = list2.Where(p => p.Author == "Mehran").ToList();
What happens in this sample?
- list1: this variable contains an IQueryable<Post>, and nothing is executed for it yet.
- list2: this is where something is executed, because of the ToList() call. Since list1 is an IQueryable<Post> and its provider is LINQ to Entities, the result is a List<Post>. Entity Framework executes a query to populate the list:
SELECT * FROM POSTS WHERE CreationYear = 2016 AND Title is not null
- list3: since this Where is applied to list2, which is of type List<Post>, the LINQ to Objects provider does the work.
As you see in the example, because there is a ToList call in the definition of list2, the next Where clause is not added to the SQL query. This is very bad! I'm loading lots of posts (maybe millions of them) into memory, and then picking a few of them (the posts of Mehran) using a loop. I'm not using the power of the database to filter the data. But if I remove that ToList, list2 stays an IQueryable<Post> and this filtering is done efficiently by the database:
SELECT * FROM POSTS WHERE CreationYear = 2016 AND Title is not null AND Author = 'Mehran' -- where clause added
This query returns only the posts of "Mehran", which might be tens of rows instead of millions! That matters a lot for performance.
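The difference can be checked without a database. In the sketch below, AsQueryable() on an in-memory list is only a stand-in for context.Set<Post>(), so no SQL is generated; the Post class and the ProviderMixDemo name are assumptions for illustration. What it does show is the point where the chain stops being an IQueryable<Post>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model class, assumed for this sketch.
public class Post
{
    public string Title { get; set; } = "";
    public string Author { get; set; } = "";
    public int CreationYear { get; set; }
}

public static class ProviderMixDemo
{
    // In-memory stand-in for context.Set<Post>(); an assumption, not Entity Framework.
    static IQueryable<Post> Posts() => new List<Post>
    {
        new Post { Title = "Old post", Author = "Mehran", CreationYear = 2000 },
        new Post { Title = "New post", Author = "Mehran", CreationYear = 2016 },
        new Post { Title = "Other post", Author = "Someone", CreationYear = 2016 }
    }.AsQueryable();

    // Every filter is composed on the IQueryable, so a real provider
    // could translate all three conditions into a single query.
    public static IQueryable<Post> ComposedQuery() =>
        Posts().Where(p => p.CreationYear == 2016)
               .Where(p => p.Title != null)
               .Where(p => p.Author == "Mehran");

    // ToList() in the middle materializes the data; the last Where
    // is plain LINQ to Objects running over the list in memory.
    public static IEnumerable<Post> BrokenChain() =>
        Posts().Where(p => p.CreationYear == 2016)
               .Where(p => p.Title != null)
               .ToList()
               .Where(p => p.Author == "Mehran");

    public static void Main()
    {
        Console.WriteLine(ComposedQuery() is IQueryable<Post>); // True
        Console.WriteLine(BrokenChain() is IQueryable<Post>);   // False
        Console.WriteLine(ComposedQuery().Count());             // 1
    }
}
```

Both versions return the same post here. The difference is where the Author filter runs, which only becomes painful once the data lives in a real database.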
Complicated LINQ Queries
With LINQ you can do a lot of complex work in one query. You can read from an XML file, apply a formula to each line, and select the lines that meet some conditions, all in a single query. This is not good!
A problem with LINQ is that it makes you feel smart while you do something stupid!
Do not write complex LINQ queries. LINQ was invented to bring clarity to your code, not complexity! Whenever a query makes you feel cool, it's time to refactor it!
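As a rough sketch of what such a refactoring could look like, the XML scenario can be split into small named steps instead of one dense query. Everything here is made up for illustration: the XML shape, the views-per-year formula, and the names ReadableQueryDemo, ReadPosts, ViewsPerYear and PopularTitles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReadableQueryDemo
{
    // Hypothetical sample data, assumed for this sketch.
    public const string SampleXml =
        "<posts>" +
        "<post title=\"Old post\" year=\"2000\" views=\"10\" />" +
        "<post title=\"New post\" year=\"2016\" views=\"250\" />" +
        "</posts>";

    // Step 1: only parsing, no business logic.
    static IEnumerable<(string Title, int Year, int Views)> ReadPosts(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("post")
            .Select(e => (
                Title: e.Attribute("title")?.Value ?? "",
                Year: int.Parse(e.Attribute("year")?.Value ?? "0"),
                Views: int.Parse(e.Attribute("views")?.Value ?? "0")));
    }

    // Step 2: the formula has a name and can be tested on its own.
    static double ViewsPerYear(int views, int year, int currentYear)
    {
        return (double)views / (currentYear - year + 1);
    }

    // Step 3: short queries over named intermediate results.
    public static List<string> PopularTitles(string xml, int currentYear, double minScore)
    {
        var posts = ReadPosts(xml);
        var scored = posts.Select(p => (
            Title: p.Title,
            Score: ViewsPerYear(p.Views, p.Year, currentYear)));
        var popular = scored.Where(s => s.Score >= minScore);

        return popular.Select(s => s.Title).ToList();
    }

    public static void Main()
    {
        foreach (var title in PopularTitles(SampleXml, 2016, 100))
        {
            Console.WriteLine(title); // New post
        }
    }
}
```

It is still LINQ all the way down, but each step says what it does, and a bug in the formula no longer hides inside a ten-line query.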