Haskell Style Guide

The following style rules are more what you'd call "guidelines" than actual rules. There are no style marks in the assignments, but some bonus sections are marked subjectively, and I also read through your code so that I can avoid penalising you multiple times for the same error. If I can't read or understand your code, your marks may suffer: I may be unable to determine the cause of your problems when marking, or to tell whether you have actually completed the bonus parts correctly.

HLint

If your code can get through HLint and GHC's -Wall flag with no problems, I'll be happy with at least some of the basics. HLint can check things such as:

  • Opportunities to simplify your code
  • Extra parentheses
  • Poor choice of function (e.g. reimplementing parts of the standard library)

If you have the Haskell Platform, installing HLint should be a matter of running:

$ cabal install hlint
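As a rough illustration (a hypothetical example, not taken from any assignment), the first definition below compiles without complaint, but HLint would typically flag a redundant bracket, a lambda that can be eta-reduced, and a concat (map ...) that can be replaced by concatMap. The exact hints and their wording vary between HLint versions. The second definition is the kind of code you end up with after applying them.

```haskell
import Data.Char (toUpper)

-- Before: compiles, but HLint will suggest several simplifications.
shoutAllBefore :: [String] -> String
shoutAllBefore ws = concat (map (\w -> (map toUpper w)) ws)

-- After: the lambda is eta-reduced, the extra brackets are gone,
-- and concat . map is replaced by the standard library's concatMap.
shoutAll :: [String] -> String
shoutAll = concatMap (map toUpper)
```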

Comments

Comments are there to explain why, not how or what. That is what the code is for.

Good:

import Data.Char (isDigit, ord)

digitVal :: Char -> Maybe Int
digitVal c | isDigit c = Just $ ord c - ord '0' -- because the digits appear consecutively after '0'
                                                -- in the ASCII table
           | otherwise = Nothing

Bad:

digitVal :: Char -> Maybe Int
digitVal c | isDigit c = Just $ ord c - ord '0' -- subtract the zero character from the character's ASCII value
           | otherwise = Nothing

digitVal :: Char -> Maybe Int
digitVal c | isDigit c = Just $ ord c - ord '0' -- use the ord function to get the ASCII value.
           | otherwise = Nothing

Naming Things

In general, all names should use camelCase.

Don't name things

Coming up with good names for things is hard, so avoid naming things where it is easy (and reasonable) to do so. Partial application, operator sections, function composition and η-reduction are all good tools for this. But don't go crazy golfing your code so that no names are introduced anywhere: it is probably wise to avoid functions like flip, curry or uncurry except where they make it clearer what's going on.

In particular, observe that:

f x = x / 3

and

f = (/ 3)

are equivalent (the η rule), and the second avoids the problem of naming the argument.
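The sketch below (all function names are hypothetical) contrasts a version that names every value, a point-free version built from composition and a standard predicate, and an over-golfed version. One caveat: the η rule gives equivalent definitions at the same type, but without a type signature the monomorphism restriction can give a point-free binding such as f = (/ 3) a less general type than f x = x / 3, so it pays to write the signature.

```haskell
-- Every value is named, including a lambda parameter.
countEvensNamed :: [Int] -> Int
countEvensNamed xs = length (filter (\x -> x `mod` 2 == 0) xs)

-- Point-free: composition and the standard predicate even remove the names.
countEvens :: [Int] -> Int
countEvens = length . filter even

-- The η rule from above; the signature keeps it at the intended type.
third :: Double -> Double
third = (/ 3)

-- Reasonable: the arguments are named and the intent is obvious.
dotProduct :: Num a => [a] -> [a] -> a
dotProduct xs ys = sum (zipWith (*) xs ys)

-- Too golfed: equivalent, but much harder to read.
dotProductGolfed :: Num a => [a] -> [a] -> a
dotProductGolfed = (sum .) . zipWith (*)
```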

Functions

Functions should in general be named after what they return rather than the computation they perform. Exceptions are monadic actions and algorithms with established names (e.g. infer is an acceptable name for a type inference function). Avoid meaningless action words like process, manage, munge, perform, get, do or run, except where they have accepted usages (e.g. run is often used for monad transformers, as in runStateT). Most importantly, do not restate the type signature as the function name. Convey through its name how the function is to be used.

As a corollary to this, it's important to provide meaningful type signatures to functions. See the section on effective use of types for more pointers.

Good:

digitVal :: Char -> Maybe Int
mostGeneralUnifier :: Term -> Term -> Substitution
launchTheMissiles :: LaunchCodes -> IO Devastation
sum :: [Int] -> Int

Bad:

getTheValue :: Char -> Maybe Int
substitutionForTwoTerms :: Term -> Term -> Substitution
runMissiles :: LaunchCodes -> IO Devastation
listOfIntegersToInt :: [Int] -> Int

Locally bound variables

Local variables that represent data should typically have short names. Single letters can and sometimes should be used, but only if they make the structure of the code clear and each letter is chosen intelligently. Avoid long names except for top-level bindings. If a function has a large number of local variables for data, this is a code smell and it should probably be decomposed.

The reason for short local variable names is to make them easily distinguishable from top-level function definitions.

For lists in particular, the x:xs convention is near universal: the variable name for the tail is the plural of the variable name for the head (which is typically a single character).

Avoid shadowing names (-Wall will warn you if you do), except in the case of record field projections.

Once again, avoid restating type signatures, and avoid filler words like "the". Also avoid naming a variable after what you will do to it (e.g. theListToSum).

Type variables should always be a single character.

Good:

digits n = map digitVal (show n)

unifyAndApply t1 t2 t = mostGeneralUnifier t1 t2 `subst` t

launchTheMissiles code = initiateLaunchSequence code >> fire

sum :: (Num a) => [a] -> a
sum []     = 0
sum (x:xs) = x + sum xs

Bad:

digits theInt = map digitVal (show theInt)

unifyAndApply firstTerm secondTerm termToSubstitute = mostGeneralUnifier firstTerm secondTerm `subst` termToSubstitute

launchTheMissiles p = initiateLaunchSequence p >> fire

sum :: (Num num) => [num] -> num
sum [] = 0
sum (head:tail) = head + sum tail

Record Fields

While common Haskell practice says otherwise, I despise the practice of prefixing field names with the name of their type. If a record type Dog has a field for its age, call the field age, not dogAge. Name conflicts can be resolved by putting the records in separate modules and using qualified imports.

Future extensions of GHC will make record fields overloaded and make the type-prefixing completely obsolete, so please don't do it.
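For what it's worth, GHC 8.0 and later ship the DuplicateRecordFields extension, which lets two record types in the same module share a field name. The sketch below assumes such a GHC and uses hypothetical Dog and Cat types. It sticks to construction and pattern matching, where the constructor tells GHC which field is meant, because using a bare selector such as age or a record update on a duplicated field can be ambiguous.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}

-- Both records use plain field names: no dogAge or catName prefixes.
data Dog = Dog { name :: String, age :: Int }
data Cat = Cat { name :: String, age :: Int }

-- The constructor in each pattern says which record's field is meant.
dogYears :: Dog -> Int
dogYears Dog { age = a } = a * 7

greeting :: Cat -> String
greeting Cat { name = n } = "Hello, " ++ n
```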

Types

Types should be named based on their semantic domain, not their representation.

Good:

type Length = Int
type VariableName = String
type Substitution = [(VariableName, Term)]

Bad:

type SignedInt32 = Integer
type ListOfChars = String
type ListOfPairs = [(ListOfChars, Term)]

Effective Use of Types

The mantra here is to make illegal states unrepresentable, that is, prevent bugs by making it difficult to write a type-correct program that is incorrect.

Use strings only for strings

Do not use so-called "stringly typed" data structures. Data should only have type String if it is genuinely best represented as a sequence of characters.

If you read a string from the outside world, you ought to parse it into some more richly structured data type as soon as possible. Do not let unvalidated strings permeate your program.
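A minimal sketch of parsing at the boundary, using a hypothetical Colour type: the raw String is checked once by parseColour, and everything downstream works with Colour, so an unrecognised value like "purple" can never reach it.

```haskell
import Data.Char (toLower)

-- A hypothetical domain type: only these three values exist.
data Colour = Red | Green | Blue
  deriving (Show, Eq)

-- Parse input from the outside world as early as possible.
-- Matching is case-insensitive; anything unrecognised is rejected here.
parseColour :: String -> Maybe Colour
parseColour s = case map toLower s of
  "red"   -> Just Red
  "green" -> Just Green
  "blue"  -> Just Blue
  _       -> Nothing

-- Downstream code takes a Colour, so it never has to validate a String.
hexCode :: Colour -> String
hexCode Red   = "#FF0000"
hexCode Green = "#00FF00"
hexCode Blue  = "#0000FF"
```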

Avoid Partial Functions

The best way to avoid crashes due to pattern match failures is to write functions that are total: that is, they produce a result (not an exception!) for every input.

There are, in general, two ways to transform a partial function into a total one:

  • Restrict the domain: Change the type of the function's input so that the cases where the function is undefined cannot arise. GADTs (a GHC extension) or "smart constructors" (see below) can help here:
data NonEmptyList a = Cons a [a]

head :: NonEmptyList a -> a
head (Cons a _) = a
  • Expand the codomain: Change the type of the function's output so that the cases where the function was undefined now have a result. Typically this involves a Maybe type.
head :: [a] -> Maybe a
head (a:_) = Just a
head []    = Nothing
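You rarely need to define NonEmptyList yourself: the base library's Data.List.NonEmpty module (in base since version 4.9, i.e. GHC 8.0) provides a NonEmpty type with the constructor :|, a total head, and nonEmpty :: [a] -> Maybe (NonEmpty a), which acts as a smart constructor. The sketch below, with hypothetical function names, uses it for both approaches.

```haskell
import Data.List.NonEmpty (NonEmpty (..), nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Restrict the domain: a NonEmpty always has a first element,
-- so this is total with no Maybe in sight.
firstItem :: NonEmpty a -> a
firstItem = NE.head

-- Expand the codomain: nonEmpty returns Nothing for [],
-- and maximum is total on the NonEmpty it produces.
safeMaximum :: Ord a => [a] -> Maybe a
safeMaximum xs = maximum <$> nonEmpty xs
```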

Use Newtypes and "Smart" Constructors

Imagine a function that produced an identity matrix of the given dimensions:

identity :: Int -> Int -> Matrix

We cannot tell a priori whether the first argument is the number of rows or the number of columns. We can alleviate this problem somewhat with type synonyms:

type Rows = Int
type Cols = Int
identity :: Rows -> Cols -> Matrix

But if I were to call identity with the arguments swapped, the compiler would still not complain, because a type synonym is interchangeable with the type it names (here, Int). It is better to use a newtype here:

newtype Rows = Rows Int
newtype Cols = Cols Int
identity :: Rows -> Cols -> Matrix

This way, while it is still trivial to convert Ints to Rows and Cols, it is impossible to confuse them unintentionally without a compiler error.
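A runnable sketch of the newtype version, assuming (hypothetically) that a Matrix is a list of rows of Ints; the text above leaves Matrix abstract. Swapping the arguments, as in identity (Cols 3) (Rows 2), is now a type error.

```haskell
newtype Rows = Rows Int
newtype Cols = Cols Int

-- Hypothetical representation: a list of rows, each a list of entries.
type Matrix = [[Int]]

-- Ones on the main diagonal, zeros elsewhere.
identity :: Rows -> Cols -> Matrix
identity (Rows r) (Cols c) =
  [ [ if i == j then 1 else 0 | j <- [1 .. c] ] | i <- [1 .. r] ]

-- identity (Cols 3) (Rows 2) would be rejected by the type checker.
```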

We can go one step further by using a so-called "smart" constructor, to ensure that both dimensions are positive:

-- Dimensions.hs
module Dimensions (Rows, Cols, rows, cols) where

newtype Rows = Rows Int
newtype Cols = Cols Int

rows :: Int -> Maybe Rows
rows x | x > 0     = Just $ Rows x
       | otherwise = Nothing

cols :: Int -> Maybe Cols
cols x | x > 0     = Just $ Cols x
       | otherwise = Nothing

-- Main.hs (a separate file)
module Main where
import Dimensions
identity :: Rows -> Cols -> Matrix

Now it is impossible to call identity without first checking that the dimensions are positive, as the data constructors Rows and Cols are not exported from the module Dimensions. On the other hand, this approach is a fair bit more heavyweight, so a balance must be struck.
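To show how a caller uses the smart constructors, here is a single-file sketch. In one module the Rows and Cols constructors are of course visible, so it demonstrates only the calling pattern, not the export-list protection. It assumes a hypothetical Matrix = [[Int]] representation and adds a hypothetical checkedIdentity, which combines the two Maybe results with <$> and <*> and yields Nothing if either dimension is not positive.

```haskell
newtype Rows = Rows Int
newtype Cols = Cols Int

-- Smart constructors, as in the Dimensions module.
rows :: Int -> Maybe Rows
rows x | x > 0     = Just $ Rows x
       | otherwise = Nothing

cols :: Int -> Maybe Cols
cols x | x > 0     = Just $ Cols x
       | otherwise = Nothing

-- Hypothetical representation: a list of rows of entries.
type Matrix = [[Int]]

identity :: Rows -> Cols -> Matrix
identity (Rows r) (Cols c) =
  [ [ if i == j then 1 else 0 | j <- [1 .. c] ] | i <- [1 .. r] ]

-- Validates the raw Ints first; identity only ever sees checked dimensions.
checkedIdentity :: Int -> Int -> Maybe Matrix
checkedIdentity r c = identity <$> rows r <*> cols c
```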

Formatting

In general I am not too concerned about formatting so long as you are consistent.

I would also prefer lines to stay under 120 characters, as they are easier to read in the marking tool.

Indentation and Alignment, let and where

Try to align the bars of guards, the equals signs of pattern-matching equations, and the arrows of case expressions.
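A small illustration of that alignment, with hypothetical functions:

```haskell
-- Guards: the bars line up.
classify :: Int -> String
classify n
  | n < 0     = "negative"
  | n == 0    = "zero"
  | otherwise = "positive"

-- Pattern matching: the equals signs line up.
describe :: Maybe Int -> String
describe Nothing  = "nothing"
describe (Just n) = "just " ++ show n

-- Case expressions: the arrows line up.
summarise :: [Int] -> String
summarise xs = case xs of
  []    -> "empty"
  [x]   -> "just " ++ show x
  x : _ -> "starts with " ++ show x
```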

For do notation, you are welcome to use either the indentation-based layout syntax (the generally preferred style in Haskell programming) or the curly-braces-and-semicolons version; the latter is acceptable because syntax errors can be hard to debug with layout. Just make sure you always use the same style: don't mix them.

If a line gets too long, you can break after the equals sign:

function f = 
    veryLongOtherFunction f + evenLongerStuff + ...

Typically you will want to indent the body by 4 spaces, so that the where keyword can sit at 2 spaces and its bindings at 4:

function f =
    veryLongOtherFunction f + evenLongerStuff + ...
  where 
    binding1 = foo
    binding2 = bar

I prefer the following style for let expressions:

function f = let x = foo
                 y = bar 
              in result

When to use let and when to use where

In general, use where for locally-defined functions, and let for intermediate values:

f i = let x = firstStep i
          y = secondStep x
       in thirdStep y
  where
    firstStep a = ...
    secondStep b = ...
    thirdStep c = ...
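A concrete version of that pattern, using a hypothetical meanSquare function: the helper function square goes in the where clause, while the intermediate values squares and total are bound with let.

```haskell
-- Mean of the squares of a list (NaN for an empty list, since it computes 0 / 0).
meanSquare :: [Double] -> Double
meanSquare xs = let squares = map square xs
                    total   = sum squares
                 in total / fromIntegral (length xs)
  where
    square x = x * x
```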

Large, multiline lists, do notation, records and operator sequences

Typical Haskell style puts operators, commas and semicolons at the start of each line:

foo = do { putStrLn "hello world"
         ; s <- getLine
         ; putStrLn s
         }

data Bar = Bar { a :: Bool
               , b :: Char
               }

baz = [ "this"
      , "is"
      , "a"
      , "list"
      , "of"
      , "words"
      ]

blah a b c = something a 
          <> somethingElse b
          <> c

I like this style, but I also don't mind if you use this style:

foo = do { -- or just use alignment
  putStrLn "hello world";
  s <- getLine;
  putStrLn s
 }

data Bar = Bar { a :: Bool,
                 b :: Char
               }

baz = ["this",
       "is",
       "a",
       "list",
       "of",
       "words"]

blah a b c = something a <>
             somethingElse b <>
             c

Parentheses etc.

Follow HLint's suggestions!

Author: Liam O'Connor

Created: 2014-07-31 Thu 19:18

