Join and Group By with LINQ to Entities
I have banged my head against some LINQ to Entities stuff today. It started out when I was doing a query to get some stats out of my forum. I should point out that I’m a newbe with LINQ to Entities and the Entity Framework so if I have done anything stupid here you are free to mock me in the comments.
var query = (from s in db.ForumStatsSet where s.LogDate >= date1 && s.LogDate <= date2 group s by new { s.Topic.topicID, s.Topic.subject, s.Topic.datum, s.Topic.Forum.forumID, s.Topic.Forum.forumName, s.Topic.Forum.ForumGroup.name } into g orderby g.Count() descending select new TopicStatsData { TopicId = g.Key.topicID, Count = g.Count(), Subject = g.Key.subject, ForumId = g.Key.forumID, ForumName = g.Key.forumName, ForumGroupName = g.Key.name });
This code looked fine to me. It’s kind of massive and has a lot of joins but it should be used in an admin interface so i did not care that much. But it turned out that it timeout every time I ran it. I had a look at the SQL that was generated and that was not a pretty sight. A massive amount of subqueries and left outer joins.
I posted a question on the new and really really good code community Stackoverflow and got the tip that I should join the tables before group by clue. And that is what I tried to do.
The problem was that I got this error all the time.
from s in DB.ForumStatsSet
from t in s.Topic
Error:
An expression of type ‘WhoaAdmin.Models.Topic’ is not allowed in a subsequent from a clause in a query expression with source type ‘System.Data.Objects.ObjectQuery<WhoaAdmin.Models.ForumStats>’. Type inference failed in the call to ‘SelectMany’.
I tried to google that error and I got nothing. I realized that maybe you have to do the joins in the correct order and that was it and I ended up with this query.
var query = (from fg in db.ForumGroupSet from f in fg.Forums from t in f.Topics from s in t.ForumStats where s.LogDate >= date1 && s.LogDate <= date2 group s by new { t.topicID, t.subject, t.datum, f.forumID, f.forumName, fg.name } into g orderby g.Count() descending select new TopicStatsData { TopicId = g.Key.topicID, Count = g.Count(), Subject = g.Key.subject, ForumId = g.Key.forumID, ForumName = g.Key.forumName, ForumGroupName = g.Key.name });
The SQL I ended up with looks kind of awful but probably the SQL engine does something like this “under the hood” on regular SQL group by to. At least it is kind of fast. The big difference in the generated SQL is that it has replaced all “LEFT OUTER JOINS” with “INNER JOINS” and they are of course faster.
Please leave a comment.