Do Not Waste Performance By Not Using Temp Tables With Entity Framework Core

It has been a while since I released my article about the usage of temp tables in Entity Framework (v6). Meanwhile, Microsoft has released a completely rewritten version of its O/R mapper so my old approach is no longer applicable. But before we learn about a new one, let us think about what we might need temp tables for.

In diesem Artikel:

pg
Pawel Gerr ist Architekt und Consultant bei Thinktecture. Er hat sich auf .NET Core Backends spezialisiert und kennt Entity Framework von vorne bis hinten.

Potential issues without using temp tables

The usage of temp tables is beneficial if we have to work with a lot of data and/or with multiple databases, particularly with different types of DbContext. One of the most common use cases is when we have to load data like Products with specific identifiers.

				
					// could contain hundreds or thousands of product ids
List<Guid> productIds = ...;
DemoDbContext ctx = ...;

var products = await ctx.Products
                        .Where(p => productIds.Contains(p.Id))
                        .ToListAsync();
				
			

Without temp tables there is not much choice but to use the method Contains that sends all productIds to the database. The corresponding SQL statement looks like this:

				
					SELECT *
FROM Products
WHERE Id IN 
(
    'a1204953-d5e5-477c-a5d3-00111e810bcd', 
    ...,
    '880d3f31-2184-4566-9b63-1a762dcc5c13'
)
				
			

With a few dozen IDs, this approach performs very well, but with hundreds or thousands of values, the performance starts to degrade. Furthermore, it is not uncommon that the productIds are used in more than one query, for example, to load Products and OrderItems.

				
					List<Guid> productIds = ...;

// 1st usage of "productIds"
var products = await ctx.Products
                        .Where(p => productIds.Contains(p.Id))
                        .ToListAsync();

// 2nd usage of "productIds"
var orderItems = await ctx.OrderItems
                          .Where(p => productIds.Contains(p.ProductId))
                          .ToListAsync();
				
			

Multiple transmissions of a large SQL statement to the database and its parsing is certainly an issue, but likely not the biggest one. Depending on the particular query, the database may decide on an execution plan that is not the ideal one, due to the lack of information about the data. One wrong decision by the database may lead to a chain reaction (see: Scanning of the data source is just the tip of the iceberg) that can cause considerable load on the database server.

Benefits of temp tables

One benefit of the temp tables is that we do not need to send the same data like productIds multiple times to the database when using it in multiple queries. Furthermore, we may choose a more performant approach to send the identifiers to the database like using SqlBulkCopy in case of MS SQL Server. Last but not least, we can provide the database with more information about the data, so the database can to choose a better execution plan.
In our case, we can insert all productIds into a temp table and create a unique (clustered) index. By this, the database receives important information, so the records are unique and ordered. With unique, ordered data, the database should be able to choose the best operations to process the request.

The previous example may not yield the best performance when using built-in features only, but it is supported by Entity Framework Core. However, working with a collection of simple values like productIds is not the same as having a collection of tuples because we can not use Contains() anymore but need the method Join().
A concrete example would be loading OrderItems for specific customers and products. The following LINQ query symbolizes our intent but throws an InvalidOperationException at runtime because a Join of an EF-query with an in-memory collection is not supported.

				
					List<(Guid CustomerId, Guid ProductId)> customerProductTuples = ...;

var orderItems = await ctx.OrderItems
                          //.AsEnumerable()
                          .Join(customerProductTuples,
                                i => new { i.Order.CustomerId, i.ProductId },
                                t => new { t.CustomerId, t.ProductId },
                                (i, t) => i)
                          .ToListAsync();
				
			

Sure, we could add AsEnumerable() before Join(), but that will load all OrderItems from the database, which is not an option, especially with big tables.

Adding support for temp tables

Adding support for temp tables to Entity Framework Core can be divided into three parts: introduction of temp tables to Entity Framework Core, creation of the temp tables, and inserting records. The first part does not depend on the concrete vendor of the database, so it applies to the MS SQL Server, SQLite, MySQL, and further. The last two parts are vendor-specific, especially the (bulk) insert of data, so each database requires a specific implementation.

Moreover, I will show you some key elements required to add temp table support to Entity Framework Core 3.1. By the end of this article, we will have a fully functional prototype. If you need more detailed information, feel free to look into the real code. The sources can be found in Azure DevOps: Thinktecture.EntityFrameworkCore.

We will start with the easier part, the introduction of temp tables to Entity Framework Core.

Using temp tables in queries

A temp table has to be introduced to Entity Framework Core before it can be used in queries. For that, we go to OnModelCreating() of the corresponding DbContext and configure a new entity. The entity may have any number of columns you need but in this example we will use just 1 column of the type Guid.

				
					public class MyTempTable
{
   public Guid Id { get; set; }
}

public class DemoDbContext : DbContext
{
   ...
   
   protected override void OnModelCreating(ModelBuilder modelBuilder)
   {
      ...

      modelBuilder.Entity<MyTempTable>()
                  .HasNoKey()
                  .ToView("#MyTempTable");
   }
}
				
			

The entity MyTempTable is created as keyless because (1), I do not want to decide yet, whether the column Id will be unique or not, and (2), it does not really matter for temp tables. Furthermore, the entity is configured as a View, so the EF migrations ignores it. The mentioned configuration parameters are not essential. The only line that matters is modelBuilder.Entity<MyTempTable>().

To get a reference to an IQueryable<MyTempTable>, we use the method Set<T> defined in DbContext. Additionally, I select the column Id right away because the helper class MyTempTable is of no interest.

				
					public class DemoDbContext : DbContext
{
   ...
   
   public IQueryable<Guid> MyTempTable => Set<MyTempTable>()
                                               .Select(t => t.Id);
}
				
			

Please note: In this simplified example, I am using the fixed table name #MyTempTable. This limits us to one temp table per database connection. In the real code, I add a suffix (#MyTempTable_1#MyTempTable_2, etc.) so the names are unique per connection.

For a generic approach the access to corresponding IQueryable<T> would be:

				
					public IQueryable<Guid> MyTempTable 
            => Set<MyTempTable>()
                   .FromSqlRaw($"SELECT * FROM {escaped-table-name}")
                   .Select(t => t.Id);
				
			

After configuration of the new temp table, we can use it like any other entity. 

				
					var products = await ctx.Products
                        .Where(p => ctx.MyTempTable.Contains(p.Id))
                        .ToListAsync();
				
			

The LINQ query above produces the following SQL statement.

				
					SELECT *
FROM Products
WHERE Id IN (SELECT Id FROM #MyTempTable)
				
			

Looks good so far, despite the exception SqlException: Invalid object name '#MyTempTable' because there is no #MyTempTable yet.

Creation of the temp table

 

The creation of a table requires some manual work, but the Model of Entity Framework Core will help us. First, we need to implement a method that generates the required SQL. We do this inside our DemoDbContext. For simplicity, we skip some column properties, like the DEFAULT-value, that are not necessary in our example.

				
					private string GetCreateTableSql(
   bool createPk)
{
   var sqlGenHelper = this.GetService<ISqlGenerationHelper>();

   var entity = Model.FindEntityType(typeof(MyTempTable));
   var tableName = entity.GetViewName();
   var escapedTableName = sqlGenHelper.DelimitIdentifier(tableName);

   var idProperty = entity.FindProperty(nameof(MyTempTable.Id));
   var columnName = idProperty.GetColumnName();
   var escapedColumnName = sqlGenHelper.DelimitIdentifier(columnName);
   var columnType = idProperty.GetColumnType();
   var nullability = idProperty.IsNullable ? "NULL" : "NOT NULL";

   var pkSql = createPk ? $", PRIMARY KEY ({escapedColumnName})" : null;

   var sql = $@"
CREATE TABLE {escapedTableName}
(
   {escapedColumnName} {columnType} {nullability}
   {pkSql}
);";
   return sql;
}
				
			

For a more generic approach, we could iterate over all properties to generate SQL for all columns. The Model provides us with all the necessary information.

What is left is the implementation of a method that executes the SQL statement. But before the execution, we must open the connection and keep it open. If the connection is closed, the Entity Framework Core will open the connection, execute the SQL, and close the connection again, which drops the temp table.

Please note that OpenConnectionAsync will not throw an exception if the connection is open already, in this case, the Entity Framework Core increments an internal counter only. The (real) database connection is going to be closed if the counter drops to 0.

				
					public async Task CreateMyTempTableAsync(
   bool createPk,
   CancellationToken cancellationToken = default)
{
   var sql = GetCreateTableSql(createPk);

   await Database.OpenConnectionAsync(cancellationToken);

   try
   {
      await Database.ExecuteSqlRawAsync(sql, cancellationToken);
   }
   catch (Exception)
   {
      Database.CloseConnection();
      throw;
   }
}
				
			

Let us try out everything we have implemented so far. The LINQ query does not throw any exceptions anymore, so the temp table has been created on the database successfully.

				
					await ctx.CreateMyTempTableAsync(true);

var products = await ctx.Products
                        .Where(p => ctx.MyTempTable.Contains(p.Id))
                        .ToListAsync();
				
			

The SQL statement of CreateMyTempTableAsync is:

				
					CREATE TABLE [#MyTempTable]
(
   [Id] uniqueidentifier NOT NULL
   , PRIMARY KEY ([Id])
)
				
			

Let’s insert some data into #MyTempTable.

(Bulk) Insert into temp table

 

The implementation of the bulk insert into a (temp) table differs the most depending on the database vendors. With MS SQL Server, we could use SqlBulkCopy, with SQLite we may Prepare a SqliteCommand and re-use it, or we build an INSERT statement with multiple VALUES to save round trips to the database, etc.

For this prototype, I will use SqlBulkCopy and to get the best performance, and we need to implement IDataReaderSqlBulkCopy will use the data reader to iterate over the productIds. Our implementation will not implement all methods and properties of the interface, but just a few that are used by the SqlBulkCopy.

				
					public class MyTempTableDataReader : IDataReader
{
   private readonly IEnumerator<Guid> _enumerator;

   public int FieldCount => 1;

   public MyTempTableDataReader(
      IEnumerable<Guid> values)
   {
      _enumerator = values.GetEnumerator();
   }

   public bool Read()
   {
      return _enumerator.MoveNext();
   }

   public object GetValue(int i)
   {
      if (i == 0)
         return _enumerator.Current;

      throw new ArgumentOutOfRangeException();
   }

   public void Dispose()
   {
      _enumerator.Dispose();
   }

   // all other members throw NotImplementedException
   public int Depth => throw new NotImplementedException();
   public object this[int i] => throw new NotImplementedException();
   public bool GetBoolean(int i) => throw new NotImplementedException();
   ...
}
				
			

Next, we have to configure the instance of SqlBulkCopy.

				
					private SqlBulkCopy GetSqlBulkCopy()
{
   var sqlGenHelper = this.GetService<ISqlGenerationHelper>();

   var sqlCon = (SqlConnection)Database.GetDbConnection();
   var sqlTx = (SqlTransaction?)Database.CurrentTransaction?.GetDbTransaction();

   var entity = Model.FindEntityType(typeof(MyTempTable));
   var tableName = entity.GetViewName();
   var escapedTableName = sqlGenHelper.DelimitIdentifier(tableName);

   var idProperty = entity.FindProperty(nameof(MyTempTable.Id));
   var idColumnName = idProperty.GetColumnName();

   return new SqlBulkCopy(sqlCon, SqlBulkCopyOptions.Default, sqlTx)
          {
             DestinationTableName = escapedTableName,
             ColumnMappings =
             {
                new SqlBulkCopyColumnMapping(0, idColumnName)
             }
          };
}
				
			

In the end, we open the connection, create the MyTempTableDataReader, push all data to the database, and close the connection again.

				
					public async Task BulkInsertIntoMyTempTableAsync(
   IEnumerable<Guid> values,
   CancellationToken cancellationToken = default)
{
   using var bulkCopy = GetSqlBulkCopy();
   await Database.OpenConnectionAsync(cancellationToken);

   try
   {
      using var reader = new MyTempTableDataReader(values);
      await bulkCopy.WriteToServerAsync(reader, cancellationToken);
   }
   finally
   {
      Database.CloseConnection();
   }
}
				
			

Now our prototype is complete, and the example behaves as expected!

				
					List<Guid> productIds = ...;

await ctx.CreateMyTempTableAsync(true);
await ctx.BulkInsertIntoMyTempTableAsync(productIds);

var products = await ctx.Products
                        .Where(p => ctx.MyTempTable.Contains(p.Id))
                        .ToListAsync();
				
			

Summary

In this article, we looked at an approach, how to create and use temp tables with Entity Framework Core 3.1. Interestingly enough, the bulk insert was the most elaborate part because Entity Framework Core cannot help us much besides providing metadata about the table.
To improve the prototype, we may implement some (project-specific) convenience methods, so the temp tables‘ usage goes almost unnoticed.

If you have any ideas, suggestions, or questions, please write me an email pawel.gerr@thinktecture.com!

Kostenloser
Newsletter

Aktuelle Artikel, Screencasts, Webinare und Interviews unserer Experten für Sie

Verpassen Sie keine Inhalte zu Angular, .NET Core, Blazor, Azure und Kubernetes und melden Sie sich zu unserem kostenlosen monatlichen Dev-Newsletter an.

Newsletter Anmeldung
Diese Artikel könnten Sie interessieren
Database Access with Sessions
.NET
KP-round

Data Access in .NET Native AOT with Sessions

.NET 8 brings Native AOT to ASP.NET Core, but many frameworks and libraries rely on unbound reflection internally and thus cannot support this scenario yet. This is true for ORMs, too: EF Core and Dapper will only bring full support for Native AOT in later releases. In this post, we will implement a database access layer with Sessions using the Humble Object pattern to get a similar developer experience. We will use Npgsql as a plain ADO.NET provider targeting PostgreSQL.
15.11.2023
Old computer with native code
.NET
KP-round

Native AOT with ASP.NET Core – Overview

Originally introduced in .NET 7, Native AOT can be used with ASP.NET Core in the upcoming .NET 8 release. In this post, we look at the benefits and drawbacks from a general perspective and perform measurements to quantify the improvements on different platforms.
02.11.2023
.NET
KP-round

Optimize ASP.NET Core memory with DATAS

.NET 8 introduces a new Garbage Collector feature called DATAS for Server GC mode - let's make some benchmarks and check how it fits into the big picture.
09.10.2023
.NET CORE
pg

Incremental Roslyn Source Generators: High-Level API – ForAttributeWithMetadataName – Part 8

With the version 4.3.1 of Microsoft.CodeAnalysis.* Roslyn provides a new high-level API - the method "ForAttributeWithMetadataName". Although it is just 1 method, still, it addresses one of the biggest performance issue with Source Generators.
16.05.2023
AI
favicon

Integrating AI Power into Your .NET Applications with the Semantic Kernel Toolkit – an Early View

With the rise of powerful AI models and services, questions come up on how to integrate those into our applications and make reasonable use of them. While other languages like Python already have popular and feature-rich libraries like LangChain, we are missing these in .NET and C#. But there is a new kid on the block that might change this situation. Welcome Semantic Kernel by Microsoft!
03.05.2023
.NET
sg

.NET 7 Performance: Regular Expressions – Part 2

There is this popular quote by Jamie Zawinski: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."

In this second article of our short performance series, we want to look at the latter one of those problems.
25.04.2023