The Performance Issue „Cartesian Explosion“ Made Its Comeback In Entity Framework Core 3

In Entity Framework Core 3.0/3.1 the SQL statement generation underwent significant changes. As we have seen in the previous post these changes removed both the implicit client-side evaluation and the N+1 Query Problem (which is good!). Unfortunately, these changes (re)introduced another issue: the Cartesian Explosion Problem.

In diesem Artikel:

pg
Pawel Gerr ist Architekt und Consultant bei Thinktecture. Er hat sich auf .NET Core Backends spezialisiert und kennt Entity Framework von vorne bis hinten.

What is a "Cartesian Explosion"?

As implied by the name, it has something to do with a cartesian product, i.e. with JOINs. When performing a JOIN on the one-to-many relationship then the rows of the one-side are being replicated N times whereby N is the number of matching records on the many-side.

Here is an example for JOIN-ing 1 ProductGroup with 1000 Products.
The corresponding LINQ query would look like:

				
					var groups = Context.ProductGroups
          .Include(g => g.Products)
          .ToList();
				
			

The SQL statement is similar to the following one:

				
					SELECT *
FROM ProductGroups
LEFT JOIN
    Products
    ON Products.GroupId = ProductGroups.Id
				
			

And the result set:

ProductGroup Id
Product Id
1
1
1
2
1
3
1
1
1000

As we see, the columns of the ProductGroup are replicated 1000 times. Imagine there are 10 sellers per Product – the result set will contain 1 * 1000 * 10 = 10000 rows although we have just 1 + 1000 + 10 = 1011 records in the database.

I should be clear what happens if we add a few Includes more. The result set (i.e. the cartesian product) would explode.

EF-forced "ORDER BY"

The larger result set due to JOINs is not the only cause for lower performance. Let’s look at the SQL statement generated by EF. Btw, the SQL statement above is not complete but the following one is:

				
					SELECT 
    [p].[Id], [p].[Name], [p].[RowVersion],
    [p0].[Id], [p0].[GroupId], [p0].[Name], [p0].[RowVersion]
FROM
    [ProductGroups] AS [p]
LEFT JOIN 
    [Products] AS [p0] 
    ON [p].[Id] = [p0].[GroupId]
ORDER BY
    [p].[Id], [p0].[Id]

				
			

For internal purposes, the EF adds an ORDER BY clause to order the entities by their identifiers. So, with a result set of that huge size, the ordering of this data will produce considerable load on the database.

Query splitting (back to the roots)

The solution of the Cartesian Explosion Problem that came with Entity Framework Core 3 is the same as with Entity Framework (non-Core) 6. We split 1 LINQ query in multiple queries if (and only if) the database load rises significantly.

When using our (oversimplified) example from above then the solution is to load Products and ProductGroups separately.

				
					var groups = Context.ProductGroups.ToList();
var products = Context.Products.ToList();
				
			

Here are some database statistics (MS SQL Server) I get when loading data having two one-to-many relationships before and after query splitting. The absolute numbers are not relevant, just look at the relative difference, especially in the Reads and Rows.

Before splitting
After splitting
CPU
31
16
Duration
75
3
Reads
5300
350
Rows
12000
2300

Summary

In this blog article, I wanted to convey two things: there is a new (old) issue we have to be aware of, and this issue can be solved.
The difficulty is finding such queries and determining how to split them. If we split too much, we waste time. If we split too little, we waste performance. The tools I highly recommend using for this task are the database statistics and execution plans.

Kostenloser
Newsletter

Aktuelle Artikel, Screencasts, Webinare und Interviews unserer Experten für Sie

Verpassen Sie keine Inhalte zu Angular, .NET Core, Blazor, Azure und Kubernetes und melden Sie sich zu unserem kostenlosen monatlichen Dev-Newsletter an.

Newsletter Anmeldung
Diese Artikel könnten Sie interessieren
Entity Framework Core
pg

Entity Framework Core 7 Performance: Cartesian Explosion

In Entity Framework Core 3 (EF 3) the SQL statement generation (re)introduced the Cartesian Explosion problem. A lot has happened since then, so it is time to revisit the issue with Entity Framework Core 7 (EF 7).
14.03.2023
Entity Framework Core
pg

Entity Framework Core 7: N+1 Queries Problem

The N+1 queries problem has been our constant companion since day one of Entity Framework (Core). Entity Framework Core 2 (EF 2) introduced a new feature that caused the "N+1 queries problem" more often and was more difficult to detect, so it was removed in the following version. After a little back and forth, let's see how Entity Framework Core 7 (EF 7) handles this issue and why it will likely remain in the future.
09.01.2023
Entity Framework Core
pg

Entity Framework Core: User-defined Fields and Tables

The requirement to store additional fields, unknown at development time, in a relational database is not new. Nonetheless, none of the projects I know of are willing to change the database structure at runtime. What if there is a project which needs dynamically created fields and doesn't want or cannot use entity–attribute–value model or switch to No-SQL databases?
20.09.2022
Blazor
favicon

[Sneak Preview] Blazor WebAssembly: The Power Of EF Core And SQLite In The Browser – In-Depth

Rumor has it, this is the dream of many .NET developers: Using SQLite and EF Core in the browser with Blazor WebAssembly. Is this possible? In this article, we will have a look how you can manage your offline data with the help of SQLite and EF Core by persisting the data with standard browser features and APIs.
12.04.2022
Entity Framework Core
pg

Entity Framework Core: Default Comparer For Byte Arrays May Waste Lots Of Memory And CPU

The default implementation of Entity Framework Core prefers to play it safe (for good reasons) when working with byte arrays. This 'safety' is - in some use cases - unnecessary and costs us a lot of memory and CPU. In this article, we will see that doing less is sufficient for the given property thanks to one of the most overlooked features of Entity Framework.
24.03.2022
Entity Framework Core
pg

Entity Framework Core 5 Performance: Power Of Table Hints

A few months ago, I had to implement a Web API for a relatively simple use case, but there was a challenge. The web endpoint must read and update a specific record using Entity Framework Core 5 and be capable of handling thousands of requests per second. Another requirement was the *data integrity*, so a transaction was a must. With high concurrency and Microsoft SQL Server as the database, we can end up in a deadlock if the SQL Server locks the records or rather the pages in an *unfavorable* way. Let's see how *table hints* can help us.
29.07.2021