Unnecessary Fuzzy Searches May Hurt Your Entity Framework Core Performance

After talking about performance issues like N+1 Queries and the Cartesian Explosion that made its comeback in Entity Framework Core 3, we will today look at a performance issue that is not tied to any Entity Framework version but is rather a general one.

In diesem Artikel:

Unnecessary Fuzzy Searches May Hurt Your Entity Framework Core Performance
Pawel Gerr ist Architekt und Consultant bei Thinktecture. Er hat sich auf .NET Core Backends spezialisiert und kennt Entity Framework von vorne bis hinten.

What do I mean by "unnecessary fuzzy searches"?

In this article, I consider any filter criteria (i.e. WHERE clause) that can be more exact/precise.
For example, when filtering for a Product with unique names (of same length) we could do it either using Contains() or the equality == operator:

				
					var name = "my product";

var product1 = Context.Products.FirstOrDefault(p => p.Name.Contains(name));
// OR
var product2 = Context.Products.FirstOrDefault(p => p.Name == name);

				
			

The same applies when comparing numbers or dates using ><==, and so on.

Why would someone do an unnecessary fuzzy search?

Hardly anyone would intentionally do something unnecessary, but unintentionally, it happens…

One of the causes for sub-optimal filters is the lack of exact domain knowledge and/or side effects in the programming model used. I’ve seen code as in the example above using Contains() instead of == multiple times and the authors could not fully explain why they took this approach because both ways work. The best thing we can do in such a situation is to sensitize users of EF Core for issues like these.

Another cause can be unfavorable code sharing. Imagine, in one of our core projects we found a method LoadProducts that takes a few parameters, and among them the parameter string name.
Now, our code could look like:

				
					var name = "my product";

var product = someRepository.LoadProducts(name).FirstOrDefault();
				
			

After a quick test we are sure that the method produces a correct response, which is true, but let us have a closer look.

				
					public List<Product> LoadProducts(
    string name = null,
    string someOtherFilter = null)
{
    var query = Context.Products.AsQueryable();

    if(name != null)
        query = query.Where(p => p.Name.Contains(name));

    if(someOtherFilter != null)
        query = query.Where(...);

    return query.ToList();
}
				
			

As we can see, there are a few issues with this method. First, this method is clearly not made for our use case but for some kind of global product search, probably for displaying the data on a user interface. And second, it resides in a core project although we are pretty sure it is not a general-purpose business logic but for a specific use case that differs from ours.

Increased database load due to fuzzy searches

In the previous examples, we saw that the LINQ queries or rather the filter criteria could be changed without affecting the outcome but there was no evidence that the comparison via == performs better than using Contains().

Now we will execute two slightly different LINQ queries, look at the SQL statements generated by EF and analyze the execution plans to determine the winner. An execution plan describes all operations the database performs to handle the SQL request.
I’m using MS SQL Server for the following tests.

Given is a table with 100 Products having columns Id and Name and an unique index on the column Name. The LINQ queries are similar to the one in the method LoadProducts and return exactly 1 product.

				
					var name = "my product";

// query 1
var product1 = Context.Products.Where(p => p.Name.Contains(name)).ToList();

// query 2
var product2 = Context.Products.Where(p => p.Name == name).ToList();
				
			

The corresponding SQL statements (simplified for the sake of readability).

				
					-- query 1
SELECT *
FROM Products
WHERE CHARINDEX(@__name_0, Name) > 0

-- query 2
SELECT *
FROM Products
WHERE Name = @__name_0
				
			

In both cases, there is just one operation (worth mentioning) being executed by the database. For the query using Contains() the database performs a so-called Index Scan, i.e. the database scans the whole data source record by record. The other query using the == comparison does an Index Seek, i.e. the database uses the index to jump right to the requested record.

Depending on how large the data source is, scanning it can lead to a considerable load on the database, but that is not all.

(Unnecessary) Scanning of the data source is just the tip of the iceberg

The performance loss due to the scanning of the whole data source is just the beginning of a chain reaction. If the query in the example above is just a small part of a bigger query, Index Scan will lead to bigger internal working sets the database has to work with. Due to bigger working sets the database may decide for sub-optimal JOIN order (from our perspective). The increased complexity may lead to parallelization and/or use of internal temp tables, which are both not for free. In the end, the database will require more memory for handling the query (high query memory grant), leaving us with very bad performance.

Summary

When writing database queries with EF Core then the filter criteria should be both as precise and as restrictive as possible. Otherwise, not just the query itself but the overall database performance could suffer greatly. If you need numbers to compare with when optimizing database requests, I recommend analyzing execution plans.

Kostenloser
Newsletter

Aktuelle Artikel, Screencasts, Webinare und Interviews unserer Experten für Sie

Verpassen Sie keine Inhalte zu Angular, .NET Core, Blazor, Azure und Kubernetes und melden Sie sich zu unserem kostenlosen monatlichen Dev-Newsletter an.

Diese Artikel könnten Sie interessieren
.NET
Incremental Roslyn Source Generators in .NET 6: Adapt Code Generation Based on Project Dependencies – Part 5

Incremental Roslyn Source Generators in .NET 6: Adapt Code Generation Based on Project Dependencies – Part 5

The Roslyn Source Generator, implemented in the previous articles of the series, emits some C# code without looking at the dependencies of the current .NET (Core) project. In this article our DemoSourceGenerator should implement a JsonConverter, but only if the corresponding library (e.g. Newtonsoft.Json) is referenced by the project.
08.07.2022
Unterschiede
.NET
Blazor WebAssembly vs. Blazor Server – Welche Unterschiede gibt es und wann wähle ich was?

Blazor WebAssembly vs. Blazor Server – Welche Unterschiede gibt es und wann wähle ich was?

Das Blazor Framework von Microsoft gibt es inzwischen in drei "Geschmacksrichtungen". Die erste ist Blazor WebAssembly, die zweite Blazor Server, und zu guter Letzt gibt es noch Blazor Hybrid. In diesem Artikel wollen wir uns die zwei "echten", also Browser-basierten, Web-Anwendungs-Szenarien WebAssembly und Server anschauen.
04.07.2022
Three different textured walls
.NET
Dependency Injection Scopes in Blazor

Dependency Injection Scopes in Blazor

The dependency injection system is a big part of how modern ASP.NET Core works internally: It provides a flexible solution for developers to structure their projects, decouple their dependencies, and control the lifetimes of the components within an application. In Blazor - a new part of ASP.NET Core - however, the DI system feels a bit odd, and things seem to work a bit differently than expected. This article will explain why this is not only a feeling but indeed the case in the first place and how to handle the differences in order to not run into problems later on.
31.05.2022
.NET
Asynchrone Operationen: Blazor WebAssembly für Angular-Entwickler – Teil 5 [Screencast]

Asynchrone Operationen: Blazor WebAssembly für Angular-Entwickler – Teil 5 [Screencast]

Eine Webanwendung will natürlich auch mit Daten gefüttert werden. Doch diese müssen irgendwo her kommen. Nichts liegt näher als diese von einer Web API zu laden. Dieser Screencast zeigt, wie asynchrone Operationen in Blazor funktionieren und welche gravierenden Unterschiede es zu Angular gibt.
26.05.2022
.NET
Typings: Blazor WebAssembly für Angular-Entwickler – Teil 4 [Screencast]

Typings: Blazor WebAssembly für Angular-Entwickler – Teil 4 [Screencast]

C# und TypeScript entstammen der Feder der selben Person. Doch sind sie deshalb auch gleich? In diesem Teil der Screencast-Serie erfahren Sie, wie mit Typen in den beiden Programmiersprachen verfahren wird und welche Unterschiede es gibt.
19.05.2022
.NET
Bindings: Blazor WebAssembly für Angular-Entwickler – Teil 3 [Screencast]

Bindings: Blazor WebAssembly für Angular-Entwickler – Teil 3 [Screencast]

Wer Komponenten einsetzt, steht früher oder später vor der Fragestellung, wie man Daten an die Komponente übergibt oder auf Ereignisse einer Komponente reagiert. In diesem Screencast wird gezeigt wie Bindings bei Komponenten funktionieren, also wie eine Komponente Daten von außerhalb benutzen und Rückmeldung bei Aktionen geben kann.
12.05.2022