Epsilon

Create a Monadic Parser

Posted on May 7, 2013

Online I found this very interesting paper on creating a monadic parser from scratch. I thought it would make for a fun project to build my own similar but slightly different version of a parser following this paper, so this next series of posts will deal with creating a parser entirely from scratch. You can find the code for these posts here.

Just a warning, this is for exercise only. I wouldn’t recommend using this to create your own language, there are better parser libraries like Parsec to use instead.

Defining a Data Type

To build a parser, we have to think about a few things first. First, let’s solidify what a parser is. For this article, we will think of a parser as a function that takes a string as input, and possibly outputs something useful. Since the purpose of this article is to create parsers we can combine to produce different parsers, we also should think of the parser returning any string it did not parse. Also, there might be a case where the string can not be parsed, so we must allow for that possibility in our library With that idea, we define a Parser data type

newtype Parser a = Parser ( String -> Maybe (a, String) )

A Simple Parser

So with this data type, let’s create a simple parser that parses a single character. For this parser, we will not care what character is parsed. As long as it can parse a character, this parser will return one.

eat :: Parser Char
eat = Parser eat'
  where
    eat' (x:xs) = Just (x,xs)
    eat' _      = Nothing

To use this function, let’s define some helper functions that will extract the parser function from the parser

parse :: Parser a -> (String -> Maybe (a, String))
parse (Parser p) = p

extract :: Maybe (a, String) -> Maybe a
extract = fmap fst 

runParser :: (Parser a) -> String -> (Maybe a)
runParser p s = extract $ (parse p) s

So now lets use the eat Parser we made

main = print $ runParser eat "omnomnom"
-- output: Just 'o'

Making a Monad of the Data Type

Earlier we said we want to combine parsers to create new parsers. To do that, we need to create a function that will combine the parsers in a way that makes sense to us.

So let’s try to create that function.

combine :: Parser a -> Parser b -> Parser b
combine (Parser p1) (Parser p2) = Parser combine'
  where
    combine' = join . fmap (p2 . snd) . p1

There we go! With this function, we call the first parser, then take the leftover string and put it into the second parser. After that, we return the contents of the second parser. Now we can do this.

main = print $ runParser (eat `combine` eat) "omnomnom"
-- output: Just 'm'

We have this combine function, but let’s go further than this. Let’s try to use existing structure to help define the possibilities of our parser. To do this, we will define Parser as a monad.

import Control.Monad

instance Monad Parser where
    return x = Parser $ \cs -> Just (x, cs)
    
    (>>=) p f = Parser $ join . fmap applyf . parse p
      where
        applyf (v, ys) = parse (f v) ys

So return makes a parser that does not parse a value, and passes the parameter of return to the parser. The bind function takes a parser, passes what that parser leaves unparsed to a function. This function then takes the value of the parser and uses that as an argument to return a parser. So in a way, this is what the combine function does, but instead uses a function that returns a parser.

Verifying the Data Type is a Monad

There are three laws a monad has to follow, so let’s check that the Parser type follows these laws.

Left Identity

The left identity law states that given a value x of type a and a function that takes a value of type a and returns a Parser, that

return x >>= p = p x

So for our parser, return was defined to place the x value into a null Parser, or a Parser that does not parse the string. The bind function takes that value x and applies it to the parser p. Thus, we can see that this law holds for how we defined return and the bind function.

Right Identity

The right identity law states that given a parser p, that

p >>= return = p

So the bind function takes the value out of the parser p, passes it to return. The return function just re-wraps that into the Parser type, so the right identity law holds.

Associativity

The associativity law states that given a parser p and two functions that return parsers f and g that

(p >>= f) >>= g = p >>= (\x -> f x >>= g)

So for our parser, we know that parser p passes its value and leftover string to the function f and the parser it produces. This step is done again between the parser f creates and the function g. The right side of the equality above is stating this more explicitly, but it does the same thing as the before.

This means that the Parser type we created is a monad under our definition of the bind function and return.

Adding onto the Monad Structure

While the monad structure makes combining parsers easier, there are a few more things we can do to make the Parser type easier to use. One way we can do this is have a function that takes in two parser. The function will try to apply a parser, and if it fails (returns Nothing) then return whatever is in the other parser. Put more simply, lets have an ‘or’ operator for our Parser. On top of that, let’s create the concept of a null Parser we described earlier

nullParser = Parser $ const Nothing

choice p1 p2 = Parser $ \cs -> case parse p1 cs of
                                    Nothing -> parser p2 cs
                                    x       -> x

The nullParser function is a Parser that returns Nothing no matter the input. The choice function gives us the ability to try multiple different parsers before saying we failed. Let’s give it a try.

char c = Parser char'
  where
    char' (x:xs) | x == c    = Just (c, xs)
                 | otherwise = Nothing
          _                  = Nothing

oneof (c:cs) = choice (char c) (oneof cs)
oneof _      = nullParser

letter = oneof "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

main = print $ runParser letter "aasdd"
-- output: Just 'a'

Here we created a few more useful parsers, a parser that will parser any character of a string that is given to it, and a parser that parses letters of the alphabet. We defined these functions, but really they already describe another structure called a Monoid. A monoid has a function that takes two monoids. This function must be associative, and it must have some monoid that is a unit.

So for the Parser type, we defined nullParser and choice. In this case nullParser is the unit monoid. That is, it is the monoid where applying choice on it and a different parser always gives the result of the different parser. So you can see that nullParser definitely has that property. The last thing to check is that choice is associative, or that

(p1 `choice` p2) `choice` p3 = p1 `choice` (p2 `choice` p3)

Since choice acts like an or operator, we know it must be associative, just like the or operator. So the next step is to make Parser a monadplus.

instance MonadPlus Parser where
    mempty  = nullParser
    mappend = choice

With this, we can take advantage of all the the helper functions that can be used with monoids.

Calculator Parser

We will finish this article by using our new found Parser type to make a parser for a calculator language. So first, let’s map out what the syntax for the calculator is. We will make the calculator use a lisp like mathematical notation, where it will accept integers, and operations like multiplication, division, addition, and subtraction. An example program for this calculator language might be

(* 1 (+ 1 2))

Written in the more familiar mathematical notation, this is equivalent to

(1 + 2) * 1

Integer Parser

So first lets parse integers. An integer will be one or more digit, and that’s it. So let’s define a parser for it.

many :: Parser a -> Parser [a]
many p = Parser $ many' []
  where
    many' xs cs = case parse p cs of
                        Nothing       -> Just (xs, cs)
                        Just (x, cs') -> many' (x:xs) cs'

many1 :: Parser a -> Parser [a]
many1 p = Parser $ \cs -> listToMaybe . parse (many p) $ cs
  where
    listToMaybe Nothing        = Nothing
    listToMaybe (Just ([], _)) = Nothing
    listToMaybe  x             = x
    
digit = oneof "0123456789"

integer = many1 digit

And with this, we’ve defined a couple more important functions for us, many and many1. The many function trys to apply a parser as many times as it can, and dumps all the successful trys into a list The many1 function does the same, except gives Nothing if that list is empty.

Operator Parser

This is really simplistic if all we do is parse the characters

operator = oneof "+*/-"

Putting it together

Now all that’s left is to put it together and make a parser for our calculator

operator = oneof "+*/-"

digit = oneof "0123456789"

integer = many1 digit

spaces = many $ char ' '

expression = expression' `mappend` integer

expression' = do
    char '('
    spaces
    
    operator
    
    spaces
    
    expression
    
    spaces
    
    expression
    char ')'

So this is the whole parser. It checks for parenthesis, and spaces, followed by an operator and two other expressions!

comments powered by Disqus