In this blog post we will check a few LINQ queries and see which of them are executing N+1
SQL statements where N
is the number of selected records.
Given is a DbContext
with 2 entities Product
and ProductGroup
. (Repo with sample code: github.com/PawelGerr/Presentation-EntityFrameworkCore)
public class DemoDbContext : DbContext
{
public DbSet Products { get; set; }
public DbSet ProductGroups { get; set; }
}
public class Product
{
public Guid Id { get; set; }
public string Name { get; set; }
public Guid GroupId { get; set; }
public ProductGroup Group { get; set; }
}
public class ProductGroup
{
public Guid Id { get; set; }
public string Name { get; set; }
public ICollection Products { get; set; }
}
Let’s print out all product groups having the word "Group"
in their names with corresponding products via Include()
first and using Select()
second.
// Using Include()
var groups = Context.ProductGroups
.Include(g => g.Products)
.Where(g => g.Name.Contains("Group"))
.ToList();
Print(groups);
// Using Select()
var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
g.Products
})
.ToList();
Print(groups);
In both cases 2 SQL statements are executed by EF Core: 1 for the product groups and 1 for the products. On the contrary, EF 6 executes just 1 statement. This may imply that the performance of EF 6 is better than the of EF Core, but in practice it is worse because the queries are getting huge and produce more load on the database.
-- Fetching product groups
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
ORDER BY [g].[Id]
-- Fetching products
SELECT [g.Products].[Id], [g.Products].[GroupId], [g.Products].[Name]
FROM [Products] AS [g.Products]
INNER JOIN
(
SELECT [g0].[Id]
FROM [ProductGroups] AS [g0]
WHERE CHARINDEX(N'Group', [g0].[Name]) > 0
) AS [t] ON [ g.Products].[GroupId] = [t].[Id]
ORDER BY [t].[Id]
Now we don’t take all products but only those with the term "1"
in their names and print them out twice(!).
var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Products = g.Products.Where(p => p.Name.Contains("1"))
})
.ToList();
Print(groups); // 1st iteration over product groups
Print(groups); // 2nd iteration over product groups
The result is disappointing. Having 5 product groups matching the condition we get 11 SQL statement executions: 1 query for fetching 5 product groups and (2 * 5 = 10) for fetching the products. Let’s put a ToList()
at the end of the products query.
var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Products = g.Products.Where(p => p.Name.Contains("1")).ToList()
})
.ToList();
Now we have 6 (=1+5) queries being sent to the database, it is getting better but still not satisfying.
-- 1 query for fetching product groups
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
-- 5 queries for fetching products (i.e. 1 query per fetched product group)
SELECT [p].[Id], [p].[GroupId], [p].[Name]
FROM [Products] AS [p]
WHERE (CHARINDEX(N''1'', [p].[Name]) > 0) AND
(@_outer_Id = [p].[GroupId])
Obviously, EF Core has some difficulties translating queries if Select()
contains a filtered collection. Let’s select just the first product.
var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Product = g.Products.FirstOrDefault()
})
.ToList();
Print(groups);
Solutions
We can reduce the number of queries by not using the navigational property Products
but doing the “JOIN” by ourselves, for example via GroupJoin
.
var productsQuery = Context.Products.Where(i => i.Name.Contains("1"));
var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.GroupJoin(productsQuery, g => g.Id, p => p.GroupId, (g, p) => new
{
ProductGroup = g,
Products = p
})
.ToList();
Print(groups);
The previous LINQ query produces just 1 query.
SELECT [g].[Id], [g].[Name], [t].[Id], [t].[GroupId], [t].[Name]
FROM [ProductGroups] AS [g]
LEFT JOIN
(
SELECT [i].[Id], [i].[GroupId], [i].[Name]
FROM [Products] AS [i]
WHERE CHARINDEX(N'1', [i].[Name]) > 0
) AS [t] ON [g].[Id] = [t].[GroupId]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
ORDER BY [g].[Id]
An alternative solution is to fetch the data separately and doing lookups in .NET.
var groupsQuery = Context.ProductGroups
.Where(g => g.Name.Contains("Group"));
var productsByGroupId = groupsQuery
.SelectMany(g => g.Products.Where(i => i.Name.Contains("1")))
.ToLookup(p => p.GroupId);
var groups = groupsQuery
.Select(g => new
{
ProductGroup = g,
Products = productsByGroupId[g.Id]
})
.ToList();
The generated SQL statements are easier to handle by the database but there are 2 of them.
-- For product groups
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
-- For products
SELECT [g.Products].[Id], [g.Products].[GroupId], [g.Products].[Name]
FROM [ProductGroups] AS [g]
INNER JOIN [Products] AS [g.Products]
ON [g].[Id] = [g.Products].[GroupId]
WHERE (CHARINDEX(N'Group', [g].[Name]) > 0) AND
(CHARINDEX(N'1', [g.Products].[Name]) > 0)
Closing Words
The query generation of EF Core is not optimal yet but the Entity Framework Team is currently working on the “N+1 queries” problem so we will re-check all queries with EF Core 2.1 very soon.
In general, whether it is EF 6, EF Core or other O/R mapper it is recommended to use a database profiling tool, so we get good understanding of the technology we use yet again.