Extensible Data Types based on Associated Types
The Problem
Algebraic data types are notorically hard to extend. To see why, consider the following declarations:
data Bird = Duck | Emu | Seagull canFly :: Bird -> Bool canFly Duck = True canFly Emu = False canFly Seagull = True
It is hard to add another bird, such as Penguin, not only because the definition of the data type would have to be extended, but especially because all functions that consume this data type, such as canFly, need to be extended, too. Obvious attempts to solve the problem, such as requiring a default case, all have serious drawbacks.
In case we know a priori that we might want to extend a data type, we can define it such that it is more flexible.
data Bird ext = Duck | Emu | Seagull | Other ext
However, this requires to define all functions with more foresight, too.
canFly :: (ext -> Bool) -> Bird -> Bool canFly canOtherFly Duck = True canFly canOtherFly Emu = False canFly canOtherFly Seagull = True canFly canOtherFly (Other ext) = canOtherFly ext
At least, we can now define
data OtherBird = Penguin -- This could of course have more than one alternative type ExtBird = Bird OtherBird otherCanFly :: OtherBird -> Bool otherCanFly Penguin = False extCanFly :: ExtBird -> Bool extCanFly = canFly otherCanFly
If all functions that use canFly accept the canOtherFly argument, we can use them on the extended data type without modification, simply by passing in otherCanFly.
This approach certainly has a number of drawbacks. Firstly, explicitly threading the additional argument to canFly through all functions that directly or indirectly use canFly is tedious to begin with, but if we have multiple extensible data types, it can quickly get impractical. Secondly, we might want to extend a data type multiple times; i.e., we might modify the definition of OtherBird to also get an extra type parameter to plug in an another extension. This makes the explicit threading of the extra arguments a notch more tedious. Thirdly, we have to prepare a data type and all its consuming functions suitably to be extensible.
We can not hope to overcome the third point as long as we want to have both standard data types and extensible data types. Having only extensible data types has clear disadvantages as long as they are more work to define and use then vanilla data types. Moreover, they might also be undesirable for formally reasoning about programs. Nevertheless, we might hope to eliminate the first two drawbacks.
Using Associated Types
The above example can be implemented as follows with associated types:
class Birds a where data Bird a canFly :: Bird a -> Bool data Std instance Birds Std where data Bird Std = Duck | Emu | Seagull canFly Duck = True canFly Emu = False canFly Seagull = True
We can extend the Bird data type and all class methods consuming that type by simply adding more instances, such as
data Other instance Birds Other where data Bird Other = Penguin canFly Penguin = False
Here the explicit threading of the manual method to extensible data types is replaced by the dictionary passing mechanism used by the compiler to implement type classes. Moreover, multiple extensions of one data type require no additional work from the programmer. However, we still need to decide a priori whether to define the data type in this manner and we need to put all functions over the data type that we like to be extensible into the class definition. (We'll come back to that last point below.) In essence, we collect all data types whose variants we want to be alternatives for each other into one indexed type family.
Whether functions using methods, such as canFly work transparently in the face of extensions depends on their signature. The following function will obviously only work for standard birds:
swimsAndFlys :: Bird Std -> Bool swimsAndFlys Duck = canFly Duck swimsAndFlys Emu = False swimsAndFlys Seagull = canFly Seagull
In contrast, the next function works fine for any bird we ever include in the class:
tooFatToFly :: Birds a => Bird a -> Bool tooFatToFly bird = not . canFly
Caveat: We cannot easily put the (extended) elements of a data type into a homogenous structure, such as a list; so, [Emu, Penguin] leads to a type error, as it would require Bird Std and Bird Other to unify. Note that, when using the manual encoding outlined initially, [Emu, Penguin] would be ill-typed, too, but [Emu, Other Penguin] would be fine.
OO Languages
It is well known that algebraic data types in functional languages and classes in OO languages have orthogonal extension properties. In a functional language, it is easy to add a new function scrutinising a data type, but hard to add a new variant. In an OO language, it is easy to add a new variant (i.e., instance) to a class, but hard to add a new member function.
Interestingly, our proposed solution to extensible data types based on associated types inherits the properties of OO languages that we can easily add new variants, but have a closed set of class methods operating on these variants. Functions defined outside the class can operate on the variants defined in one instance (such as swimsAndFlys) above, but not on variants spanning multiple instances.
Type Classes with Existential Types
SeanSeefried
rightly points out that we can achieve a very similar encoding with a
combination of existential types and type classes using an encoding
suggested by
Konstantin Läufer. The basic definition would be
class Birds a where canFly :: a -> Bool data BIRD = forall a.Birds a => MakeBird a instance Birds BIRD where canFly (MakeBird b) = canFly b data Bird = Duck | Emu | Seagull instance Birds Bird where canFly Duck = True canFly Emu = False canFly Seagull = True
Now, we can extend this with
data ExtBird = Penguin instance Birds ExtBird where canFly _ = False
This leads to the question of what the difference between the associated types and the existential types solution is. There seem to be two main differences:
-
the denotation of the encoded type family and
-
the impact on compiler optimisations.
In the associated types variant, the type family is denoted Bird a and its members are obtained by suitable instantiation of the type variable a; i.e., we get Bird Std and Bird Other. In the variant using existential types, the type family is denoted BIRD and its members are Bird and ExtBird. More precisely, with associated types, the family/family-member relationship is embedded in the type structure as it corresponds to type variable instantation. However, with existential types that relationship is a meta property of the program that is not reflected in the type structure (i.e., apart from a suitable choice of identifiers the denotation of the family and it's members bears no resemblence). In other words, associated types allow a deep encoding of type families, whereas the encoding with existantial types is shallow.
The disadvantage of the shallow encoding is that we have to coerce between values of the type family BIRD and its members Bird and ExtBird by adding or stripping the data constructor MakeBird. That's not required for the associated types encoding. Nevtherless, the shallow encoding also has an advantage. We can easily compile a homogenous list from an arbitrary collection of BIRDs, we can't do that with the associated type Bird a. Which of this is a more serious concern probably depends on the requirements of the concrete application.
The construction of the data type BIRD (in the existential types variant) uses a type context, namely Birds a on the existential type. In terms of the implementation, this means that the dictionary representing instances of Birds at runtime will be included into every value of that type. If such a value is decomposed - by pattern matching on MakeBird (as done in the BIRD instance of Birds) - it is this dictionary which is used when class methods are applied to the value embedded in MakeBird. This means that any optimisations that the compiler performs to use type specialisation to eliminate the overhead of dictionary passing will be in vain for values embedded in MakeBird. This is not the case when associated types are used, but whether that's a concern will again depend on the concrete application.
Please email comments to Manuel Chakravarty. [Last update 16 Feb 5]