Potential issues without using temp tables
The usage of temp tables is beneficial if we have to work with a lot of data and/or with multiple databases, particularly with different types of DbContext
. One of the most common use cases is when we have to load data like Products
with specific identifiers.
// could contain hundreds or thousands of product ids
List productIds = ...;
DemoDbContext ctx = ...;
var products = await ctx.Products
.Where(p => productIds.Contains(p.Id))
.ToListAsync();
Without temp tables there is not much choice but to use the method Contains
that sends all productIds
to the database. The corresponding SQL statement looks like this:
SELECT *
FROM Products
WHERE Id IN
(
'a1204953-d5e5-477c-a5d3-00111e810bcd',
...,
'880d3f31-2184-4566-9b63-1a762dcc5c13'
)
With a few dozen IDs, this approach performs very well, but with hundreds or thousands of values, the performance starts to degrade. Furthermore, it is not uncommon that the productIds
are used in more than one query, for example, to load Products
and OrderItems
.
List productIds = ...;
// 1st usage of "productIds"
var products = await ctx.Products
.Where(p => productIds.Contains(p.Id))
.ToListAsync();
// 2nd usage of "productIds"
var orderItems = await ctx.OrderItems
.Where(p => productIds.Contains(p.ProductId))
.ToListAsync();
Multiple transmissions of a large SQL statement to the database and its parsing is certainly an issue, but likely not the biggest one. Depending on the particular query, the database may decide on an execution plan that is not the ideal one, due to the lack of information about the data. One wrong decision by the database may lead to a chain reaction (see: Scanning of the data source is just the tip of the iceberg) that can cause considerable load on the database server.
Benefits of temp tables
One benefit of the temp tables is that we do not need to send the same data like productIds
multiple times to the database when using it in multiple queries. Furthermore, we may choose a more performant approach to send the identifiers to the database like using SqlBulkCopy
in case of MS SQL Server. Last but not least, we can provide the database with more information about the data, so the database can to choose a better execution plan.
In our case, we can insert all productIds
into a temp table and create a unique (clustered) index
. By this, the database receives important information, so the records are unique and ordered. With unique, ordered data, the database should be able to choose the best operations to process the request.
The previous example may not yield the best performance when using built-in features only, but it is supported by Entity Framework Core. However, working with a collection of simple values like productIds
is not the same as having a collection of tuples because we can not use Contains()
anymore but need the method Join()
.
A concrete example would be loading OrderItems
for specific customers and products. The following LINQ query symbolizes our intent but throws an InvalidOperationException
at runtime because a Join
of an EF-query with an in-memory collection is not supported.
List<(Guid CustomerId, Guid ProductId)> customerProductTuples = ...;
var orderItems = await ctx.OrderItems
//.AsEnumerable()
.Join(customerProductTuples,
i => new { i.Order.CustomerId, i.ProductId },
t => new { t.CustomerId, t.ProductId },
(i, t) => i)
.ToListAsync();
Sure, we could add AsEnumerable()
before Join()
, but that will load all OrderItems
from the database, which is not an option, especially with big tables.
Adding support for temp tables
Adding support for temp tables to Entity Framework Core can be divided into three parts: introduction of temp tables to Entity Framework Core, creation of the temp tables, and inserting records. The first part does not depend on the concrete vendor of the database, so it applies to the MS SQL Server, SQLite, MySQL, and further. The last two parts are vendor-specific, especially the (bulk) insert of data, so each database requires a specific implementation.
Moreover, I will show you some key elements required to add temp table support to Entity Framework Core 3.1. By the end of this article, we will have a fully functional prototype. If you need more detailed information, feel free to look into the real code. The sources can be found in Azure DevOps: Thinktecture.EntityFrameworkCore.
We will start with the easier part, the introduction of temp tables to Entity Framework Core.
Using temp tables in queries
A temp table has to be introduced to Entity Framework Core before it can be used in queries. For that, we go to OnModelCreating()
of the corresponding DbContext
and configure a new entity. The entity may have any number of columns you need but in this example we will use just 1 column of the type Guid
.
public class MyTempTable
{
public Guid Id { get; set; }
}
public class DemoDbContext : DbContext
{
...
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
...
modelBuilder.Entity()
.HasNoKey()
.ToView("#MyTempTable");
}
}
The entity MyTempTable
is created as keyless because (1), I do not want to decide yet, whether the column Id
will be unique or not, and (2), it does not really matter for temp tables. Furthermore, the entity is configured as a View
, so the EF migrations ignores it. The mentioned configuration parameters are not essential. The only line that matters is modelBuilder.Entity<MyTempTable>()
.
To get a reference to an IQueryable<MyTempTable>
, we use the method Set<T>
defined in DbContext
. Additionally, I select the column Id
right away because the helper class MyTempTable
is of no interest.
public class DemoDbContext : DbContext
{
...
public IQueryable MyTempTable => Set()
.Select(t => t.Id);
}
Please note: In this simplified example, I am using the fixed table name #MyTempTable
. This limits us to one temp table per database connection. In the real code, I add a suffix (#MyTempTable_1
, #MyTempTable_2
, etc.) so the names are unique per connection.
For a generic approach the access to corresponding IQueryable<T>
would be:
public IQueryable MyTempTable
=> Set()
.FromSqlRaw($"SELECT * FROM {escaped-table-name}")
.Select(t => t.Id);
After configuration of the new temp table, we can use it like any other entity.
var products = await ctx.Products
.Where(p => ctx.MyTempTable.Contains(p.Id))
.ToListAsync();
The LINQ query above produces the following SQL statement.
SELECT *
FROM Products
WHERE Id IN (SELECT Id FROM #MyTempTable)
Looks good so far, despite the exception SqlException: Invalid object name '#MyTempTable'
because there is no #MyTempTable
yet.
Creation of the temp table
The creation of a table requires some manual work, but the Model
of Entity Framework Core will help us. First, we need to implement a method that generates the required SQL. We do this inside our DemoDbContext
. For simplicity, we skip some column properties, like the DEFAULT
-value, that are not necessary in our example.
private string GetCreateTableSql(
bool createPk)
{
var sqlGenHelper = this.GetService();
var entity = Model.FindEntityType(typeof(MyTempTable));
var tableName = entity.GetViewName();
var escapedTableName = sqlGenHelper.DelimitIdentifier(tableName);
var idProperty = entity.FindProperty(nameof(MyTempTable.Id));
var columnName = idProperty.GetColumnName();
var escapedColumnName = sqlGenHelper.DelimitIdentifier(columnName);
var columnType = idProperty.GetColumnType();
var nullability = idProperty.IsNullable ? "NULL" : "NOT NULL";
var pkSql = createPk ? $", PRIMARY KEY ({escapedColumnName})" : null;
var sql = $@"
CREATE TABLE {escapedTableName}
(
{escapedColumnName} {columnType} {nullability}
{pkSql}
);";
return sql;
}
For a more generic approach, we could iterate over all properties to generate SQL for all columns. The Model
provides us with all the necessary information.
What is left is the implementation of a method that executes the SQL statement. But before the execution, we must open the connection and keep it open. If the connection is closed, the Entity Framework Core will open the connection, execute the SQL, and close the connection again, which drops the temp table.
Please note that OpenConnectionAsync
will not throw an exception if the connection is open already, in this case, the Entity Framework Core increments an internal counter only. The (real) database connection is going to be closed if the counter drops to 0.
public async Task CreateMyTempTableAsync(
bool createPk,
CancellationToken cancellationToken = default)
{
var sql = GetCreateTableSql(createPk);
await Database.OpenConnectionAsync(cancellationToken);
try
{
await Database.ExecuteSqlRawAsync(sql, cancellationToken);
}
catch (Exception)
{
Database.CloseConnection();
throw;
}
}
Let us try out everything we have implemented so far. The LINQ query does not throw any exceptions anymore, so the temp table has been created on the database successfully.
await ctx.CreateMyTempTableAsync(true);
var products = await ctx.Products
.Where(p => ctx.MyTempTable.Contains(p.Id))
.ToListAsync();
The SQL statement of CreateMyTempTableAsync
is:
CREATE TABLE [#MyTempTable]
(
[Id] uniqueidentifier NOT NULL
, PRIMARY KEY ([Id])
)
Let’s insert some data into #MyTempTable
.
(Bulk) Insert into temp table
The implementation of the bulk insert into a (temp) table differs the most depending on the database vendors. With MS SQL Server, we could use SqlBulkCopy
, with SQLite we may Prepare
a SqliteCommand
and re-use it, or we build an INSERT
statement with multiple VALUES
to save round trips to the database, etc.
For this prototype, I will use SqlBulkCopy
and to get the best performance, and we need to implement IDataReader
. SqlBulkCopy
will use the data reader to iterate over the productIds
. Our implementation will not implement all methods and properties of the interface, but just a few that are used by the SqlBulkCopy
.
public class MyTempTableDataReader : IDataReader
{
private readonly IEnumerator _enumerator;
public int FieldCount => 1;
public MyTempTableDataReader(
IEnumerable values)
{
_enumerator = values.GetEnumerator();
}
public bool Read()
{
return _enumerator.MoveNext();
}
public object GetValue(int i)
{
if (i == 0)
return _enumerator.Current;
throw new ArgumentOutOfRangeException();
}
public void Dispose()
{
_enumerator.Dispose();
}
// all other members throw NotImplementedException
public int Depth => throw new NotImplementedException();
public object this[int i] => throw new NotImplementedException();
public bool GetBoolean(int i) => throw new NotImplementedException();
...
}
Next, we have to configure the instance of SqlBulkCopy
.
private SqlBulkCopy GetSqlBulkCopy()
{
var sqlGenHelper = this.GetService();
var sqlCon = (SqlConnection)Database.GetDbConnection();
var sqlTx = (SqlTransaction?)Database.CurrentTransaction?.GetDbTransaction();
var entity = Model.FindEntityType(typeof(MyTempTable));
var tableName = entity.GetViewName();
var escapedTableName = sqlGenHelper.DelimitIdentifier(tableName);
var idProperty = entity.FindProperty(nameof(MyTempTable.Id));
var idColumnName = idProperty.GetColumnName();
return new SqlBulkCopy(sqlCon, SqlBulkCopyOptions.Default, sqlTx)
{
DestinationTableName = escapedTableName,
ColumnMappings =
{
new SqlBulkCopyColumnMapping(0, idColumnName)
}
};
}
In the end, we open the connection, create the MyTempTableDataReader
, push all data to the database, and close the connection again.
public async Task BulkInsertIntoMyTempTableAsync(
IEnumerable values,
CancellationToken cancellationToken = default)
{
using var bulkCopy = GetSqlBulkCopy();
await Database.OpenConnectionAsync(cancellationToken);
try
{
using var reader = new MyTempTableDataReader(values);
await bulkCopy.WriteToServerAsync(reader, cancellationToken);
}
finally
{
Database.CloseConnection();
}
}
Now our prototype is complete, and the example behaves as expected!
List productIds = ...;
await ctx.CreateMyTempTableAsync(true);
await ctx.BulkInsertIntoMyTempTableAsync(productIds);
var products = await ctx.Products
.Where(p => ctx.MyTempTable.Contains(p.Id))
.ToListAsync();
Summary
In this article, we looked at an approach, how to create and use temp tables with Entity Framework Core 3.1. Interestingly enough, the bulk insert was the most elaborate part because Entity Framework Core cannot help us much besides providing metadata about the table.
To improve the prototype, we may implement some (project-specific) convenience methods, so the temp tables’ usage goes almost unnoticed.
If you have any ideas, suggestions, or questions, please write me an email pawel.gerr@thinktecture.com!