Entity Framework Core 7 Performance: Cartesian Explosion

In Entity Framework Core 3 (EF 3) the SQL statement generation (re)introduced the Cartesian Explosion problem. A lot has happened since then, so it is time to revisit the issue with Entity Framework Core 7 (EF 7).

In diesem Artikel:

pg
Pawel Gerr ist Architekt und Consultant bei Thinktecture. Er hat sich auf .NET Core Backends spezialisiert und kennt Entity Framework von vorne bis hinten.

What is a "Cartesian Explosion"?

As the name implies, it has something to do with a cartesian product, i.e., with JOINs. When performing a JOIN on the one-to-many relationship then the rows on the one-side are replicated N times whereby N is the number of matching records on the many-side.

Please note: In this article, I use synchronous methods for better readability.
Also note: In this article, we are talking about one-to-many/many-to-many relationships only. A one-to-one relationship does not cause the Cartesian Explosion.

Here is an example for JOIN-ing 1 ProductGroup with 1000 Products.
The corresponding LINQ query:

				
					var groups = Context.ProductGroups
          .Include(g => g.Products)
          .ToList();
				
			

The SQL statement is like the following one:

				
					SELECT *
FROM [ProductGroups] AS [p]
LEFT JOIN [Products] AS [p0]
   ON [p].[Id] = [p0].[ProductGroupId]
ORDER BY [p].[Id]

				
			

And the result set:

ProductGroup Id
Product Id
1
1
1
2
1
3
1
1
1000

The columns of the ProductGroup are replicated 1000 times.

If there were another navigational property Sellers and there were 10 sellers per Product, then the result set will contain 1 * 1000 * 10 = 10000 rows, although we have just 1 + 1000 + 10 = 1011 records in the database.

				
					var groups = Context.ProductGroups
          .Include(g => g.Products)
          .ThenInclude(p => p.Sellers)
          .ToList();
				
			

With a few Include/ThenInclude more, the result set (i.e., the cartesian product) will explode.

EF-Forced "ORDER BY"

The larger result set due to JOINs is not the only performance issue. Depending on the included navigational properties, EF might add one or more ORDER BY clauses.

The good news is EF 7 doesn’t add an additional ORDER BY clause anymore when 1 navigational property is included; in contrast, EF 3 orders the data by both product group ID and product ID.

				
					SELECT *
FROM [ProductGroups] AS [p]
LEFT JOIN [Products] AS [p0]
   ON [p].[Id] = [p0].[ProductGroupId]
ORDER BY [p].[Id] // EF 7
// ORDER BY [p].[Id], [p0].[Id] // EF 3
				
			

The performance hit comes back when 2 or more navigational properties are included, e.g., Sellers.

				
					var groups = Context.ProductGroups
          .Include(g => g.Products)
          .ThenInclude(p => p.Sellers)
          .ToList();
				
			

For better materialization, i.e., the population of .NET entities, EF puts all identifiers into the ORDER BY clause. The resulting query will produce a considerable load on the database if the result set gets big.

				
					SELECT *
FROM [ProductGroups] AS [p]
LEFT JOIN (
    SELECT *
    FROM [Products] AS [p0]
    LEFT JOIN (
        SELECT *
        FROM [SellerProducts] AS [s]
        INNER JOIN [Sellers] AS [s0] ON [s].[SellerId] = [s0].[Id]
    ) AS [t] ON [p0].[Id] = [t].[ProductId]
) AS [t0] ON [p].[Id] = [t0].[ProductGroupId]
ORDER BY [p].[Id], [t0].[Id], [t0].[ProductId], [t0].[SellerId]

				
			

Query Splitting

The solution to the Cartesian Explosion Problem in EF 7 is still the same as in EF 3. The LINQ query should be split into multiple SQL statements if (and only if) the database load rises significantly. This can be done either manually, by writing multiple LINQ queries, or by using the method AsSplitQuery.

Please note, that the „alternative“ way produces not the same result as the original query. The query Context.Sellers.ToList() loads all sellers and not just the ones attached to a product. Loading all sellers is an additional optimization because we know that (almost) all of them are going to be returned anyway.

For the following comparisons, the database contains 1000 products, 10 product groups, and 2 sellers.

				
					var groups = Context.ProductGroups
       // .AsSplitQuery()
          .Include(g => g.Products)
          .ThenInclude(p => p.Sellers)
          .ToList();

// Alternative to "AsSplitQuery"
var groups = Context.ProductGroups.ToList();
var products = Context.Products.ToList();
var sellers = Context.Sellers.ToList();
				
			

Below are some database statistics (MS SQL Server). The absolute numbers are not relevant. Just look at the relative difference, especially in Reads and Rows.

With query splitting, the CPU time seems to be less than 1ms, so the statistics return 0ms. I put 1ms instead because 0ms looks very odd.

Without AsSplitQuery
With AsSplitQuery
Alternative Way
CPU
16
1
1
Duration
233
420
264
Reads
6123
36
16
Rows
2000
2010
1012

With query splitting, the duration gets up because there are multiple round-trips to the database. But this increase in duration is more than acceptable if the number of reads goes down from 6k to 36 or even 16.

Summary

The Cartesian Explosion is still an issue in EF 7 and it is very likely not to disappear in the newer versions. EF team gave us the method AsSplitQuery which can mitigate the problem. This tool yields good results, but it won’t be a replacement for manual optimization by the developers.

Kostenloser
Newsletter

Aktuelle Artikel, Screencasts, Webinare und Interviews unserer Experten für Sie

Verpassen Sie keine Inhalte zu Angular, .NET Core, Blazor, Azure und Kubernetes und melden Sie sich zu unserem kostenlosen monatlichen Dev-Newsletter an.

Newsletter Anmeldung
Diese Artikel könnten Sie interessieren
Entity Framework Core
pg

Entity Framework Core 7: N+1 Queries Problem

The N+1 queries problem has been our constant companion since day one of Entity Framework (Core). Entity Framework Core 2 (EF 2) introduced a new feature that caused the "N+1 queries problem" more often and was more difficult to detect, so it was removed in the following version. After a little back and forth, let's see how Entity Framework Core 7 (EF 7) handles this issue and why it will likely remain in the future.
09.01.2023
Entity Framework Core
pg

Entity Framework Core: User-defined Fields and Tables

The requirement to store additional fields, unknown at development time, in a relational database is not new. Nonetheless, none of the projects I know of are willing to change the database structure at runtime. What if there is a project which needs dynamically created fields and doesn't want or cannot use entity–attribute–value model or switch to No-SQL databases?
20.09.2022
Blazor
favicon

[Sneak Preview] Blazor WebAssembly: The Power Of EF Core And SQLite In The Browser – In-Depth

Rumor has it, this is the dream of many .NET developers: Using SQLite and EF Core in the browser with Blazor WebAssembly. Is this possible? In this article, we will have a look how you can manage your offline data with the help of SQLite and EF Core by persisting the data with standard browser features and APIs.
12.04.2022
Entity Framework Core
pg

Entity Framework Core: Default Comparer For Byte Arrays May Waste Lots Of Memory And CPU

The default implementation of Entity Framework Core prefers to play it safe (for good reasons) when working with byte arrays. This 'safety' is - in some use cases - unnecessary and costs us a lot of memory and CPU. In this article, we will see that doing less is sufficient for the given property thanks to one of the most overlooked features of Entity Framework.
24.03.2022
Entity Framework Core
pg

Entity Framework Core 5 Performance: Power Of Table Hints

A few months ago, I had to implement a Web API for a relatively simple use case, but there was a challenge. The web endpoint must read and update a specific record using Entity Framework Core 5 and be capable of handling thousands of requests per second. Another requirement was the *data integrity*, so a transaction was a must. With high concurrency and Microsoft SQL Server as the database, we can end up in a deadlock if the SQL Server locks the records or rather the pages in an *unfavorable* way. Let's see how *table hints* can help us.
29.07.2021
Entity Framework Core
pg

Do Not Waste Performance By Not Using Temp Tables With Entity Framework Core

It has been a while since I released my article about the usage of temp tables in Entity Framework (v6). Meanwhile, Microsoft has released a completely rewritten version of its O/R mapper so my old approach is no longer applicable. But before we learn about a new one, let us think about what we might need temp tables for.
25.06.2020