Please note: this article is not about whether a byte array should or should not be used with relational databases but rather about “if you do, then be aware of …”
EF's default behavior with byte arrays
When working with byte arrays and change tracking is active, Entity Framework Core (EF) does not just compare the object references of the arrays on SaveChanges, but the content as well. If the corresponding property represents some kind of bit mask, i.e., every byte in the array is changed independently, then comparing every byte is necessary. But most of the time, I see in projects that such properties are used for persisting small binary data, like thumbnails, which are considered immutable. In such cases, it is unlikely that someone will change single bytes inside the array. If the thumbnail has to be changed, the byte array is replaced by another byte array, i.e., the new value is a completely new object reference.
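To make the scenario concrete, here is a minimal sketch of the kind of entity this article has in mind; the class name MyEntity is an assumption, only the Bytes property reappears in the snippets below.

public class MyEntity
{
    public int Id { get; set; }

    // Small, immutable binary payload such as a thumbnail. When it changes,
    // the whole array is replaced instead of mutating single bytes.
    public byte[] Bytes { get; set; } = Array.Empty<byte>();
}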
How much does it cost?
For such binary data, comparing the content is not wrong in general, just unnecessary. I’ve benchmarked a few use cases in terms of memory and CPU usage: one entity uses the default behavior, the other a custom ValueComparer.
The benchmarks update 10k entities with a 1 kB array each. Before calling SaveChanges, the property is assigned one of two new arrays: the best case differs from the stored array in a single byte at the beginning, the worst case in a single byte at the end.
// array read from database = [0,0,0,...,0];
var newArray_bestCase = [1,0,0,...,0];
var newArray_worstCase = [0,0,0,...,1];
All benchmarks do two things: update the Bytes property and call SaveChanges.
entitiesLoadedFromDb.ForEach(e => e.Bytes = newArray_bestCase); // or newArray_worstCase
await myDbContext.SaveChangesAsync();
For benchmarking, I use the library BenchmarkDotNet with the MemoryDiagnoser.
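The benchmark class could look roughly like the following sketch. It is written under assumptions: CreateDbContext, the entity set name, and details such as resetting state between iterations are placeholders, not the original benchmark code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // reports Gen 0/Gen 1 collections and allocated memory
public class ByteArrayUpdateBenchmarks
{
    private MyDbContext _dbContext = null!;
    private List<MyEntity> _entities = null!;
    private byte[] _newArrayBestCase = null!;
    private byte[] _newArrayWorstCase = null!;

    [GlobalSetup]
    public void Setup()
    {
        _dbContext = CreateDbContext();            // hypothetical helper
        _entities = _dbContext.Entities.ToList();  // 10k entities, 1 kB array each

        _newArrayBestCase = new byte[1024];
        _newArrayBestCase[0] = 1;                  // differs at the beginning

        _newArrayWorstCase = new byte[1024];
        _newArrayWorstCase[^1] = 1;                // differs at the end
    }

    [Benchmark]
    public async Task Default_WorstCase()
    {
        _entities.ForEach(e => e.Bytes = _newArrayWorstCase);
        await _dbContext.SaveChangesAsync();
    }
}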
| Method            | Mean       | Error    | StdDev   | Gen 0 | Gen 1 | Allocated |
|------------------ |-----------:|---------:|---------:|------:|------:|----------:|
| Default_BestCase  |   337.0 ms |  4.01 ms |  3.75 ms |  7000 |  2000 |     63 MB |
| Default_WorstCase | 1,220.7 ms | 11.84 ms | 11.07 ms | 66000 |  2000 |    531 MB |
| Custom_BestCase   |   325.6 ms |  6.16 ms |  5.46 ms |  8000 |  2000 |     65 MB |
| Custom_WorstCase  |   330.5 ms |  5.02 ms |  4.70 ms |  8000 |  2000 |     65 MB |
In the worst case, the memory usage rises from 63 MB to 531 MB (ca. 850%) and the duration from 337 ms to 1,220 ms (over 350%) when using the default behavior. With the custom ValueComparer, the values always stay low.
Use reference equality for opaque binary data
The ValueComparer can be changed in OnModelCreating or in an IEntityTypeConfiguration<T> via the method SetValueComparer. The method expects an instance of ValueComparer, which can be implemented from scratch or by using the generic class ValueComparer<T>. The constructor of ValueComparer<T> expects three expressions:
- equalsExpression: compares two instances, here using reference equality
- hashCodeExpression: computes the hash code
- snapshotExpression: passes the reference of the array as is, because that is enough for reference equality
builder.Property(e => e.Bytes)
    .Metadata
    .SetValueComparer(new ValueComparer<byte[]>(
        // equalsExpression: reference equality instead of comparing every byte
        (obj, otherObj) => ReferenceEquals(obj, otherObj),
        // hashCodeExpression
        obj => obj.GetHashCode(),
        // snapshotExpression: the reference itself is a sufficient snapshot
        obj => obj));
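If the configuration lives in a separate class, the same call can be placed in an IEntityTypeConfiguration<T>, as mentioned above. A minimal sketch, assuming the hypothetical class names MyEntity and MyEntityConfiguration:

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.ChangeTracking;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

public class MyEntityConfiguration : IEntityTypeConfiguration<MyEntity>
{
    public void Configure(EntityTypeBuilder<MyEntity> builder)
    {
        builder.Property(e => e.Bytes)
            .Metadata
            .SetValueComparer(new ValueComparer<byte[]>(
                (obj, otherObj) => ReferenceEquals(obj, otherObj),
                obj => obj.GetHashCode(),
                obj => obj));
    }
}

The configuration class is then registered in OnModelCreating via ApplyConfiguration or ApplyConfigurationsFromAssembly.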
Summary
In this article, we looked at the ValueComparer and how it affects memory and CPU usage when using byte arrays with EF. Although we were talking about byte arrays only, the same performance issues can arise with any custom object that uses a ValueConverter (please note: Converter, not Comparer).
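As a hedged illustration of that last point: the following fragment is not from the original article; the Settings type and the e.Settings property are hypothetical. A value object serialized to JSON via a ValueConverter can be given the same reference-equality ValueComparer, as long as it is treated as immutable.

// Hypothetical immutable value object stored as a JSON string.
builder.Property(e => e.Settings)
    .HasConversion<string>(
        s => JsonSerializer.Serialize(s, (JsonSerializerOptions?)null),
        s => JsonSerializer.Deserialize<Settings>(s, (JsonSerializerOptions?)null)!)
    .Metadata
    .SetValueComparer(new ValueComparer<Settings>(
        (a, b) => ReferenceEquals(a, b), // reference equality instead of a deep comparison
        s => s.GetHashCode(),
        s => s));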