What is the "N+1 Queries Problem"?
The N+1 Queries Problem is an O/R mapper specific issue. It may occur when trying to load data from a table (like ProductGroups
) along with its related data (like Products
) having a one-to-many or many-to-many relationship when the related data is on the many-side.
Please note: I this article I use synchronous methods for better readability.
Also note: In this article we are talking about one-to-many/many-to-many relationships only. A one-to-one relationship does not cause the N+1 Queries Problem.
In general, there are 2 reasons for the problem to arise. The first one is either a limitation, bug, a (controversial) feature, or a design flow in the O/R mapper itself. For example, EF 2 executed N+1 SQL statements with following LINQ query: 1 statement for loading N ProductGroups
and N statements for loading the Products
.
var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Product = g.Products.FirstOrDefault()
})
.ToList();
The second reason is an inappropriate use of the O/R mapper itself. Loading data in a loop, as depicted below, will lead to N+1 roundtrips to the database.
var groups = Context.ProductGroups.ToList();
var products = new List();
foreach (var group in groups)
{
var product = Context.Products.FirstOrDefault(p => p.ProductGroupId == group.Id);
products.Add(product);
}
Hidden Issues in EF 7?
In the early versions, there were some LINQ queries that lead to N+1 Queries Problem. With Entity Framework Core 3 (EF 3) those queries usually raised an InvalidOperationException
.
I’m glad to say that none of the culprit queries from the past are leading to the N+1 issue. Entity Framework gets better with every version. EF 7 even translates GroupBy
, which was impossible a few versions ago.
var firstProducts = Context.Products
.GroupBy(p => p.ProductGroupId)
.Select(g => g.FirstOrDefault())
.ToList();
The previous query translates to the following SQL statement:
SELECT *
FROM (
SELECT [p].[ProductGroupId]
FROM [Products] AS [p]
GROUP BY [p].[ProductGroupId]
) AS [t]
LEFT JOIN (
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY [p0].[ProductGroupId] ORDER BY [p0].[Id]) AS [row]
FROM [Products] AS [p0]
) AS [t1]
WHERE [t1].[row] <= 1
) AS [t0] ON [t].[ProductGroupId] = [t0].[ProductGroupId]
Lazy Loading
Nothing has changed regarding Lazy Loading. The feature should be used with care.
Loading of all Products
for 10 ProductGroups
with the following LINQ query leads to 11 SQL statements.
var lookup = Context.Products
.ToLookup(p => p.Group.Name);
One SQL statement for loading all Products
and 10 for loading the ProductGroups
.
// executed once
SELECT * FROM [Products] AS [p]
// excuted N times
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE [g].[Id] = @__p_0
The issue can be solved using eager loading, i.e., via Include
or by the explicit selection of the required properties and entities. The latter approach is more optimal because EF 7 loads only the name instead of the whole ProductGroup
.
var lookup = Context.Products
.Include(p => p.Group)
.ToLookup(p => p.Group.Name);
// Alternative: explicit selection of the ProductGroup-Name
var lookup = Context.Products
.Select(p => new
{
Product = p,
p.Group.Name
})
.ToLookup(p => p.Name, p => p.Product);
The Issue Remains
I think it is safe to say that EF 7 is not susceptible to N+1 Queries Problem on its own. Still, the issue will haunt us forever, and the reason has nothing to do with Entity Framework (Core). Any database request made directly or indirectly within a loop will lead to multiple SQL statements and should be avoided. But it is easier said than done. I can’t remember any project which didn’t make requests inside a loop (unintentionally) because they are not so easy to spot, like in the example at the beginning of this article. Here is another example of a query that is executed indirectly within a loop.
var products = Context.Products.ToList();
var prices = new List();
foreach (var product in products)
{
var price = GetPrice(product.Id);
prices.Add(price);
}
// ---------------
private Price GetPrice(int productId)
{
// oversimplified
return Context.Prices.FirstOrDefault(p => p.ProductId == productId);
}
The previous example is still very easy to spot and to fix because the code fragments are close together. Now put these fragments into different C# files and add a few other methods between the foreach
-loop and the LINQ query Context.Prices
. Depending on the depth of the call stack and the complexity of the code, the issue will be very hard to detect.
Furthermore, there are loops that don’t look like loops, i.e., they don’t have the usual keywords like for
, foreach
, or while
. The following LINQ statement is one of them:
var products = Context.Products.ToList();
var prices = products.Select(product => GetPrice(product.Id))
.ToList();
Other “hidden loops” to look out for are recursions.
Summary
Although EF 7 runs into N+1 Queries Problem with Lazy Loading only, still, the problem is and will remain in our projects due to loops.
Some recommendations: better code structure, reduced call stack depth, and check the number of executed SQL statements in integration tests (see IDbCommandInterceptor).