Sunday, September 9, 2018

Comparison of Entity Framework to Core Data

Object Relational Mapping libraries connect in-memory object graphs to relational databases. Object-oriented programming is built upon the idea that there is an in-memory object graph, where each object is an instance of a class. An ORM is the software that can save that object graph to a database, either on-disk or using a service across the network.

Entity Framework is Microsoft’s premier ORM library, and Core Data is Apple’s premier ORM library. Both have the same goals - to persist an object graph to a database - but they were developed by different companies for different languages. It stands to reason that they made some different design choices.

Which Entity Framework?


Microsoft is infamous for creating multiple ways to do the same thing, and ORM libraries are no different. There are two versions of Entity Framework: Entity Framework 6, and Entity Framework Core. The documentation says that Entity Framework Core is the new hotness. Also, Entity Framework Core is open source.

So let’s start using Entity Framework Core, right? Well, not so fast. It turns out that you have to pick a runtime that Entity Framework Core will run on top of.

Which Runtime?


Entity Framework was originally developed for .NET. So that’s fine, but it turns out there are multiple versions of .NET.
  • .NET Framework only runs on Windows
  • .NET Core is written by Microsoft, and runs on Windows, Linux, and macOS. The documentation says that .NET Core is better than .NET Framework. Also, .NET Core is open source.
  • .NET Standard is just a standard. It isn’t a piece of software - it’s a specification that describes a level of support that a runtime needs to have in order to be compliant. Xamarin is another .NET runtime that supports the .NET Standard (and it runs on iOS / Android). Targeting the standard means your app will work in every compliant .NET implementation, but it won’t have access to some of the libraries only present in .NET Core.
  • The Universal Windows Platform is a runtime compliant with the .NET Standard. The Entity Framework documentation says that UWP is now supported. One interesting note: as part of the compilation process, the platform-independent .NET bytecode is run through the .NET Native toolchain, which produces a platform-dependent binary. They say this is to improve performance. (So I guess this means that the Universal Windows Platform isn’t really universal?) This compilation is somewhat lossy because reflection doesn’t fully work in native apps, and it sounds like Entity Framework had some bugs here that they had to fix.
There’s an example in the Entity Framework Core documentation about how to use it with the Universal Windows Platform, and UWP is the new hotness, so I’ll use that. If you dig into the example, you’ll find that the Entity Framework tools don’t work with UWP projects, so they had to make a dummy .NET Core project with nothing inside it, just to run the tools. How unfortunate.

Getting Entity Framework


Entity Framework is not built in to the system. Instead, you’ll have to get it from Visual Studio’s blessed package manager, named NuGet. When you install packages with NuGet, they’re not installed across the whole system; instead, they’re installed only for a single project. NuGet is built in to Visual Studio - simply go to Project -> Manage NuGet Packages to search/install packages.

Entity Framework is designed to be pluggable to different kinds of databases, and each database has its own package inside NuGet. The example uses a SQLite database, so it uses the Microsoft.EntityFrameworkCore.Sqlite package. There is also another package, Microsoft.EntityFrameworkCore.Tools, which includes command-line tools to generate migration code / apply migrations, so that one is included too.



How to get Core Data


It’s already part of the platform, and there’s only one version. Just use it.

High Level


Both libraries have a concept of a “context” which is the thing that holds the link to all the objects in the object graph. For Entity Framework, this is the Microsoft.EntityFrameworkCore.DbContext, and for Core Data, this is the NSManagedObjectContext. When you create an object, you register it with the context, and when you delete an object, you notify the context that it has been deleted. After you’ve done all your modifications, you tell the context to “save,” which stores all the changes in the database.

Entity Framework:
var blog = new Blog { Url = url };
db.Blogs.Add(blog);
db.SaveChanges();


Core Data:
let blog = Blog(context: context)
blog.url = url
try context.save()


Read/Modify/Write operations are also quite similar:

Entity Framework:
var blog = db.Blogs.First();
blog.Url = url;
db.SaveChanges();


Core Data:
let fetchRequest: NSFetchRequest<Blog> = Blog.fetchRequest()
fetchRequest.fetchLimit = 1
let blog = try context.fetch(fetchRequest)[0]
blog.url = url
try context.save()



Context


In Core Data, the NSManagedObjectContext is just a class. When modifications are made to the object graph, the NSManagedObjectContext makes a strong reference to the modified object (because Swift is reference-counted, the distinction between strong and weak references is important). When it gets saved, the NSManagedObjectContext knows what to save.

However, in Entity Framework, the DbContext is magical. The application needs to subclass DbContext, and the subclass needs to have DbSet properties. These DbSets refer to the various tables in the database. When the DbContext’s constructor is run, it uses reflection to inspect itself, find all the DbSet properties, and inspect the generic type argument to determine the data model. It builds up a Microsoft.EntityFrameworkCore.ModelBuilder, and lets you make any last-minute changes you want inside DbContext.OnModelCreating().

Objects


In Core Data, each object in the object graph is represented by NSManagedObject. This object acts like a dictionary; you can “set properties” by using the Key-Value Coding functions value(forKey:) and setValue(_:forKey:). You can get better type-safety if you subclass NSManagedObject for each of your entities, and add typed properties. However, if you do this, you have to make sure that getting/setting these properties calls the Key-Value Coding methods on the NSManagedObject. Swift has a helpful attribute, @NSManaged, which does this for you. Xcode will even generate the subclass for you at compilation time, with the appropriately typed @NSManaged properties, if you select the entity and pick the appropriate value for “Codegen” in the right sidebar. (Or you can use the managedObjectClassName string property on NSEntityDescription when building the NSManagedObjectModel, and Core Data will construct this class at runtime using the Objective-C runtime.)



NSManagedObjects know which context they belong to, and their initializer requires you to pass in the context. This is presumably so when values get modified, the NSManagedObject can notify the NSManagedObjectContext.
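Here is a minimal sketch of the two styles side by side. The Blog class, its url attribute, and the in-code model are my own assumptions for illustration (normally Xcode's Codegen would produce the subclass), and the in-memory store is only there so the snippet can run on its own.

```swift
import CoreData

// Hypothetical entity class. @NSManaged tells Swift that Core Data
// supplies the accessor implementations at runtime.
@objc(Blog)
final class Blog: NSManagedObject {
    @NSManaged var url: String?
}

// Describe the "Blog" entity in code so no .xcdatamodeld file is needed.
let urlAttribute = NSAttributeDescription()
urlAttribute.name = "url"
urlAttribute.attributeType = .stringAttributeType
urlAttribute.isOptional = true

let blogEntity = NSEntityDescription()
blogEntity.name = "Blog"
blogEntity.managedObjectClassName = "Blog" // matches @objc(Blog) above
blogEntity.properties = [urlAttribute]

let model = NSManagedObjectModel()
model.entities = [blogEntity]

// An in-memory store keeps the sketch self-contained.
let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                   configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

// The initializer takes the context, so the object knows where it lives.
let blog = Blog(entity: blogEntity, insertInto: context)

blog.url = "https://example.com"                      // typed access
let viaKVC = blog.value(forKey: "url") as? String     // dictionary-style read
blog.setValue("https://example.org", forKey: "url")   // dictionary-style write

print(viaKVC ?? "nil", blog.url ?? "nil")
```

Both styles read and write the same underlying storage; the typed property just gives the compiler something to check.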

In Entity Framework, each object in the object graph is just a regular object. No subclassing required, and no manifest or custom model creation code either. The DbContext learns about the object’s shape from reflection. This means that the ChangeTracker in the DbContext doesn’t automatically know about changes; instead, it has to call DetectChanges(), which iterates through the known objects. This is done automatically when it’s required.

Connection Between Classes and Data


In Core Data, when the system wants to populate a property of an object, it can do it dynamically, because access to the property is routed through value(forKey:) and setValue(_:forKey:). This way, the code that sets the property doesn’t have to know the name of the field at compilation time, which matters when the data model is created at runtime.
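To make that concrete, here is a rough sketch of populating an object when the attribute names only exist as strings at runtime. The makeBlogModel() helper, the "Blog" entity, and the row dictionary are hypothetical stand-ins; Core Data's real fetch path is internal, so this only imitates the idea from the outside.

```swift
import CoreData

// Hypothetical one-entity model with two optional string attributes,
// built in code so the sketch needs no .xcdatamodeld file.
func makeBlogModel() -> NSManagedObjectModel {
    let entity = NSEntityDescription()
    entity.name = "Blog"
    entity.properties = ["url", "title"].map { name -> NSPropertyDescription in
        let attribute = NSAttributeDescription()
        attribute.name = name
        attribute.attributeType = .stringAttributeType
        attribute.isOptional = true
        return attribute
    }
    let model = NSManagedObjectModel()
    model.entities = [entity]
    return model
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeBlogModel())
try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                   configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

let blog = NSEntityDescription.insertNewObject(forEntityName: "Blog", into: context)

// Pretend this row came back from the database.
let row: [String: Any] = ["url": "https://example.com", "title": "Hello"]

// The attribute names come from the model at runtime, not from source code.
for name in blog.entity.attributesByName.keys {
    blog.setValue(row[name], forKey: name)
}
```

Nothing in the loop mentions url or title by name, which is the whole point.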

However, in Entity Framework, objects are just regular classes. This is a problem, though; how can Entity Framework set the correct property on the class when the name of the property is only known at runtime (because the model can be modified at runtime)? Well, it turns out it uses Linq expression trees to build a program at runtime that can set properties that are only known at runtime. This is extremely powerful; it looks like you can use expression trees to write almost anything that you could write in C#.

Data Model


In Entity Framework, the DbContext constructor uses reflection to discover the object graph. You get a chance to modify the model at runtime in DbContext.OnModelCreating(), which is called inside the DbContext’s constructor. Adding an entity to the model requires a class to match that entity. Properties are more flexible: a property can be present in the model without being present in the class (Entity Framework calls this a shadow property). This is valuable for things like automatically saved date fields.

In Core Data, there is a separate data file that describes the model declaratively (with the file extension .xcdatamodeld). You can edit these with a GUI inside Xcode. This file corresponds to a NSManagedObjectModel, which you can build at runtime instead, if you want. Then, when you bring up the Core Data stack, you can specify this model.
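For comparison, here is roughly what building the model at runtime looks like, with no .xcdatamodeld file involved. The entity and attribute names are made up for the example, and the in-memory store is an assumption to keep it self-contained.

```swift
import CoreData

// Attributes of a hypothetical "Blog" entity.
let urlAttribute = NSAttributeDescription()
urlAttribute.name = "url"
urlAttribute.attributeType = .stringAttributeType

let createdAtAttribute = NSAttributeDescription()
createdAtAttribute.name = "createdAt"
createdAtAttribute.attributeType = .dateAttributeType

let blogEntity = NSEntityDescription()
blogEntity.name = "Blog"
blogEntity.properties = [urlAttribute, createdAtAttribute]

// The model is just a collection of entity descriptions.
let model = NSManagedObjectModel()
model.entities = [blogEntity]

// Bring up the stack and hand it the model.
let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                   configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

// No subclass exists, so the object is a plain NSManagedObject.
let blog = NSManagedObject(entity: blogEntity, insertInto: context)
blog.setValue("https://example.com", forKey: "url")
blog.setValue(Date(), forKey: "createdAt")
try context.save()

let request = NSFetchRequest<NSManagedObject>(entityName: "Blog")
let count = try context.count(for: request)
print("Saved \(count) blog(s)")
```

The GUI editor in Xcode produces the same kind of NSManagedObjectModel; this is just the long-hand version.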

Fetch Queries


In Entity Framework, the DbSet implements the IQueryable interface. This is an interface that represents a query node inside the Linq framework. Functions like .Where() and .OrderBy() operate on these nodes and return other nodes, letting you chain up these operators. These operators aren’t actually applied at the time you call the function; instead they are a sort of retained-mode program. Whenever you want to actually pull data out of the query at the end, the runtime will look at the chain of operators and figure out how best to apply it (usually by creating SQL that matches the operation). However, some of the operations need to be applied by the client; this transparently works, but it obviously isn’t great for performance.

Core Data uses the same sort of thing, encapsulated by NSPredicate and NSExpression. NSExpression is the same kind of node inside a retained-mode program. These are quite powerful; you can even call arbitrary selectors on arbitrary objects. The big difference between this and Linq is that, in true Objective-C style, NSExpression isn’t typed, but Linq is typed.
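A small sketch of what those nodes look like, using only Foundation. The "url" key path and the sample dictionaries are assumptions for illustration; in Core Data you would assign the predicate to a fetch request's predicate property rather than evaluate it by hand.

```swift
import Foundation

// Build the predicate node by node...
let keyPath = NSExpression(forKeyPath: "url")
let prefix = NSExpression(forConstantValue: "https://")
let built = NSComparisonPredicate(leftExpression: keyPath,
                                  rightExpression: prefix,
                                  modifier: .direct,
                                  type: .beginsWith,
                                  options: [])

// ...or let the format-string parser build the same kind of tree.
let parsed = NSPredicate(format: "url BEGINSWITH %@", "https://")

// Nothing is typed: the key path is a string, and any object that
// answers value(forKey:) can be evaluated, including a dictionary.
let secure: [String: Any] = ["url": "https://example.com"]
let insecure: [String: Any] = ["url": "http://example.com"]
let matchesBuilt = built.evaluate(with: secure)
let matchesParsed = parsed.evaluate(with: secure)
let matchesInsecure = built.evaluate(with: insecure)

// Expressions are little programs in their own right; the result comes
// back as Any?, so the cast is on you.
let sum = NSExpression(format: "2 + 3 * 4")
    .expressionValue(with: nil, context: nil) as? NSNumber

print(matchesBuilt, matchesParsed, matchesInsecure, sum ?? 0)
```

A typo in the "url" string would compile fine and only show up at runtime, which is the trade-off compared to Linq's typed lambdas.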

Parallelism


Both Entity Framework and Core Data’s contexts are single-threaded, which means the managed objects all have to live on the same thread as their context. However, fetches and stores involve round trips to databases, which can be quite slow and would block the main thread. Entity Framework gets around this by providing Async versions of the fetching / saving functions. In this model, the objects live on the main thread, but the UI can still be redrawn during the slow database operations.

Core Data has two approaches to this. One way is to host the entire Core Data object graph in another thread. You get this if the NSManagedObjectContext is initialized with the concurrencyType argument set to .privateQueueConcurrencyType. If you do this, the NSManagedObjectContext will create its own private queue, and operations on the NSManagedObjectContext are only valid from that queue. You run code on that queue by using NSManagedObjectContext’s perform(_:) function. Inside the callback, you can execute your fetch requests, build up some data, and post a message back to the main queue with your data (but not with NSManagedObjects!).
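The private-queue approach might look something like this. The makeBlogCoordinator() helper and the "Blog" entity are hypothetical, and the semaphore exists only so a command-line script waits for the result; in an app you would hop back with DispatchQueue.main.async instead of blocking.

```swift
import CoreData

// Hypothetical in-memory stack with a single "Blog" entity.
func makeBlogCoordinator() throws -> NSPersistentStoreCoordinator {
    let urlAttribute = NSAttributeDescription()
    urlAttribute.name = "url"
    urlAttribute.attributeType = .stringAttributeType
    let entity = NSEntityDescription()
    entity.name = "Blog"
    entity.properties = [urlAttribute]
    let model = NSManagedObjectModel()
    model.entities = [entity]
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)
    return coordinator
}

// The context owns a private queue; touch it only inside perform.
let background = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
background.persistentStoreCoordinator = try makeBlogCoordinator()

var urls: [String] = []
let done = DispatchSemaphore(value: 0) // script-only: wait for the block

background.perform {
    // This block runs on the context's private queue.
    let blog = NSEntityDescription.insertNewObject(forEntityName: "Blog", into: background)
    blog.setValue("https://example.com", forKey: "url")
    do {
        try background.save()
        let request = NSFetchRequest<NSManagedObject>(entityName: "Blog")
        let fetched = try background.fetch(request)
        // Copy out plain values; the managed objects stay on this queue.
        urls = fetched.compactMap { $0.value(forKey: "url") as? String }
    } catch {
        print("Core Data error: \(error)")
    }
    done.signal()
}

done.wait()
print(urls)
```

Only the array of strings crosses back to the caller; the NSManagedObjects never leave the queue they were created on.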

Alternatively, you can use the main queue, and use NSAsynchronousFetchRequest to fetch objects asynchronously. As far as I can tell, there is no equivalent call for NSManagedObjectContext.save(), and from my sampling, it appears that NSManagedObjectContext.save() is synchronous (though perhaps it doesn't have to be?).

Entity Framework:
var blog = await db.Blogs.FirstAsync();
blog.Url = url;
await db.SaveChangesAsync();


Core Data:
let fetchRequest: NSFetchRequest<Blog> = Blog.fetchRequest()
fetchRequest.fetchLimit = 1
let asynchronousFetch = NSAsynchronousFetchRequest(fetchRequest: fetchRequest) { (result) in
    let blog = result.finalResult![0]
    blog.url = url
    do {
        try context.save()
    } catch {
        …
    }
}
try context.execute(asynchronousFetch)


Edit: The Core Data Best Practices video from 2012 describes how you can achieve asynchronous saves by using a parent/child NSManagedObjectContext pair. You set the child to live on one thread and the parent to live on the other thread, and when you tell the child to save, it will just push its changes to the other context on the other thread. Then you can asynchronously tell the other thread to save by using perform(_:).
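As I understand the pattern, it could be sketched like this. The makeBlogCoordinator() helper and the "Blog" entity are hypothetical, the store is in-memory so the example stands alone, and the semaphore is only there so a script can observe the result.

```swift
import CoreData

// Hypothetical in-memory stack with a single "Blog" entity.
func makeBlogCoordinator() throws -> NSPersistentStoreCoordinator {
    let urlAttribute = NSAttributeDescription()
    urlAttribute.name = "url"
    urlAttribute.attributeType = .stringAttributeType
    let entity = NSEntityDescription()
    entity.name = "Blog"
    entity.properties = [urlAttribute]
    let model = NSManagedObjectModel()
    model.entities = [entity]
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)
    return coordinator
}

let coordinator = try makeBlogCoordinator()
let blogEntity = coordinator.managedObjectModel.entitiesByName["Blog"]!

// Parent: private queue, the only context that talks to the store.
let parent = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
parent.persistentStoreCoordinator = coordinator

// Child: main queue, where the UI-facing objects live.
let child = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
child.parent = parent

let blog = NSManagedObject(entity: blogEntity, insertInto: child)
blog.setValue("https://example.com", forKey: "url")

// Saving the child only pushes the changes up to the parent.
try child.save()

// The write to the store happens later, on the parent's queue.
let done = DispatchSemaphore(value: 0) // script-only: wait for the save
parent.perform {
    do { try parent.save() } catch { print("Save failed: \(error)") }
    done.signal()
}
done.wait()

// A separate context attached to the same store can now see the object.
let verifier = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
verifier.persistentStoreCoordinator = coordinator
let savedCount = try verifier.count(for: NSFetchRequest<NSManagedObject>(entityName: "Blog"))
print("Store contains \(savedCount) blog(s)")
```

The main queue pays only for the in-memory push in child.save(); the slow part runs inside parent.perform.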

Migrations


In Entity Framework, a migration is modeled as a chunk of code. However, this code is written by one of the tools inside Microsoft.EntityFrameworkCore.Tools. The command line tool saves a snapshot of whatever the current database schema is, and can create a new schema by using the same mechanism that DbContext uses when it creates a schema at runtime. Then, after you’ve created a migration, you can apply it, which involves running the code on your local development machine to upgrade the database to the new version. These tools even have documentation. You have a chance to fine-tune the migration by editing the source code the tool created, because creating the migration code and applying it to the database are two distinct steps. Because the migration is generated code, you can run it in your app instead of on your local development machine.

But wait, not so fast! The command-line tools use reflection on your source code to generate a model? Yep. That means the command-line tools build your source code. Then they look in your source code for the new model. If the command-line tools are supposed to perform the migration, then they’re supposed to connect to the database, too. But wait, how do they connect to the database? Well, your source code connects to the database … and the command-line tools will just run that code. The documentation describes what functions / classes the tools will look for in your code and run on your local machine.

Core Data handles migrations totally differently. Some simple migrations can happen automatically, right when you open the database (and you can check whether your change is “simple” by using a class function on NSMappingModel.) But, more complicated migrations are described declaratively in a .xcmappingmodel file, which Xcode lets you edit with a GUI. The expressions are described by strings, which (presumably) are the same strings that NSExpression accepts. This file corresponds to a NSMappingModel, which you can construct at runtime instead of loading from a bundle. Then, when you want to run the migration, you can use NSMigrationManager and pass in the NSMappingModel you want it to use. (One gotcha: to create a .xcmappingmodel in Xcode, it has to be between two different versions of the same model. You can create a new version of a model by selecting Editor -> Add Model Version.)
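The “is my change simple?” check can be sketched as follows. The two model versions are built in code here purely for illustration (real projects would load them from the versioned .xcdatamodeld), and makeModel(attributeNames:) is a hypothetical helper.

```swift
import CoreData

// Hypothetical helper: a "Blog" entity with the given optional string attributes.
func makeModel(attributeNames: [String]) -> NSManagedObjectModel {
    let entity = NSEntityDescription()
    entity.name = "Blog"
    entity.properties = attributeNames.map { name -> NSPropertyDescription in
        let attribute = NSAttributeDescription()
        attribute.name = name
        attribute.attributeType = .stringAttributeType
        attribute.isOptional = true
        return attribute
    }
    let model = NSManagedObjectModel()
    model.entities = [entity]
    return model
}

// Version 2 adds an optional "title" attribute to version 1.
let version1 = makeModel(attributeNames: ["url"])
let version2 = makeModel(attributeNames: ["url", "title"])

// If Core Data can infer a mapping, the change counts as "simple".
var isSimpleChange = false
do {
    _ = try NSMappingModel.inferredMappingModel(forSourceModel: version1,
                                                destinationModel: version2)
    isSimpleChange = true
} catch {
    print("This change needs a hand-written mapping model: \(error)")
}
print("Can migrate automatically: \(isSimpleChange)")
```

To have Core Data apply such a migration when the store is opened, you pass NSMigratePersistentStoresAutomaticallyOption and NSInferMappingModelAutomaticallyOption in the options dictionary when adding the persistent store.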

Configuring the Database


The constructor to Microsoft.EntityFrameworkCore.DbContext requires a Microsoft.EntityFrameworkCore.DbContextOptions, which is built by a Microsoft.EntityFrameworkCore.DbContextOptionsBuilder. C# has this nifty feature (extension methods) where you can declare a static function, give its first argument the “this” keyword, and that function will appear as if it were a member of the class. So, the individual database package adds a function to the DbContextOptionsBuilder. (I haven’t investigated what the package does inside this function.) Then, the client code calls optionsBuilder.UseSqlite(connectionString), for example. You can use Microsoft.Data.Sqlite.SqliteConnectionStringBuilder() to build the connection string. You do this inside the DbContext.OnConfiguring() function so the command-line tools know how to configure the database.

Core Data works differently. Each persistent store is described via a NSPersistentStoreDescription, which includes a string “type” property. This “type” refers to the registeredStoreTypes registry inside NSPersistentStoreCoordinator, which can be extended with additional subclasses of NSPersistentStore. There are also 4 built-in strings for well-known database types.
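A short sketch of picking a store by its type string. The model, the "BlogModel" container name, and the choice of the in-memory type are assumptions for illustration; a real app would more likely use NSSQLiteStoreType with a file URL.

```swift
import CoreData

// Hypothetical one-entity model, built in code so no bundle resources are needed.
let urlAttribute = NSAttributeDescription()
urlAttribute.name = "url"
urlAttribute.attributeType = .stringAttributeType
let blogEntity = NSEntityDescription()
blogEntity.name = "Blog"
blogEntity.properties = [urlAttribute]
let model = NSManagedObjectModel()
model.entities = [blogEntity]

// The store is selected by a plain string in the description.
let storeDescription = NSPersistentStoreDescription()
storeDescription.type = NSInMemoryStoreType

let container = NSPersistentContainer(name: "BlogModel", managedObjectModel: model)
container.persistentStoreDescriptions = [storeDescription]

var loadedType: String?
container.loadPersistentStores { description, error in
    if let error = error {
        fatalError("Could not load store: \(error)")
    }
    loadedType = description.type
}

// The registry maps type strings to NSPersistentStore subclasses.
let knownTypes = NSPersistentStoreCoordinator.registeredStoreTypes.keys.sorted()
print("Loaded:", loadedType ?? "none", "Registered:", knownTypes)
```

Where Entity Framework plugs in a database through an extension method on the options builder, Core Data looks the string up in a registry.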
