## Create a Monadic Parser

Posted on May 7, 2013Online I found this very interesting paper on creating a monadic parser from scratch. I thought it would make for a fun project to build my own similar but slightly different version of a parser following this paper, so this next series of posts will deal with creating a parser entirely from scratch. You can find the code for these posts here.

Just a warning, this is for exercise only. I wouldn’t recommend using this to create your own language, there are better parser libraries like Parsec to use instead.

## Defining a Data Type

To build a parser, we have to think about a few things first. First, let’s solidify what a parser is. For this article, we will think of a parser as a function that takes a string as input, and possibly outputs something useful. Since the purpose of this article is to create parsers we can combine to produce different parsers, we also should think of the parser returning any string it did not parse. Also, there might be a case where the string can not be parsed, so we must allow for that possibility in our library With that idea, we define a `Parser`

data type

`newtype Parser a = Parser ( String -> Maybe (a, String) )`

## A Simple Parser

So with this data type, let’s create a simple parser that parses a single character. For this parser, we will not care what character is parsed. As long as it can parse a character, this parser will return one.

```
eat :: Parser Char
eat = Parser eat'
where
eat' (x:xs) = Just (x,xs)
eat' _ = Nothing
```

To use this function, let’s define some helper functions that will extract the parser function from the parser

```
parse :: Parser a -> (String -> Maybe (a, String))
parse (Parser p) = p
extract :: Maybe (a, String) -> Maybe a
extract = fmap fst
runParser :: (Parser a) -> String -> (Maybe a)
runParser p s = extract $ (parse p) s
```

So now lets use the `eat`

Parser we made

```
main = print $ runParser eat "omnomnom"
-- output: Just 'o'
```

## Making a Monad of the Data Type

Earlier we said we want to combine parsers to create new parsers. To do that, we need to create a function that will combine the parsers in a way that makes sense to us.

So let’s try to create that function.

```
combine :: Parser a -> Parser b -> Parser b
combine (Parser p1) (Parser p2) = Parser combine'
where
combine' = join . fmap (p2 . snd) . p1
```

There we go! With this function, we call the first parser, then take the leftover string and put it into the second parser. After that, we return the contents of the second parser. Now we can do this.

```
main = print $ runParser (eat `combine` eat) "omnomnom"
-- output: Just 'm'
```

We have this `combine`

function, but let’s go further than this. Let’s try to use existing structure to help define the possibilities of our parser. To do this, we will define `Parser`

as a monad.

```
import Control.Monad
instance Monad Parser where
return x = Parser $ \cs -> Just (x, cs)
(>>=) p f = Parser $ join . fmap applyf . parse p
where
applyf (v, ys) = parse (f v) ys
```

So `return`

makes a parser that does not parse a value, and passes the parameter of `return`

to the parser. The bind function takes a parser, passes what that parser leaves unparsed to a function. This function then takes the value of the parser and uses that as an argument to return a parser. So in a way, this is what the `combine`

function does, but instead uses a function that returns a parser.

## Verifying the Data Type is a Monad

There are three laws a monad has to follow, so let’s check that the `Parser`

type follows these laws.

### Left Identity

The left identity law states that given a value `x`

of type `a`

and a function that takes a value of type `a`

and returns a `Parser`

, that

`return x >>= p = p x`

So for our parser, return was defined to place the `x`

value into a null `Parser`

, or a Parser that does not parse the string. The bind function takes that value `x`

and applies it to the parser `p`

. Thus, we can see that this law holds for how we defined `return`

and the bind function.

### Right Identity

The right identity law states that given a parser `p`

, that

`p >>= return = p`

So the bind function takes the value out of the parser `p`

, passes it to `return`

. The `return`

function just re-wraps that into the `Parser`

type, so the right identity law holds.

### Associativity

The associativity law states that given a parser `p`

and two functions that return parsers `f`

and `g`

that

`(p >>= f) >>= g = p >>= (\x -> f x >>= g)`

So for our parser, we know that parser `p`

passes its value and leftover string to the function `f`

and the parser it produces. This step is done again between the parser `f`

creates and the function `g`

. The right side of the equality above is stating this more explicitly, but it does the same thing as the before.

This means that the `Parser`

type we created is a monad under our definition of the bind function and `return`

.

## Adding onto the Monad Structure

While the monad structure makes combining parsers easier, there are a few more things we can do to make the `Parser`

type easier to use. One way we can do this is have a function that takes in two parser. The function will try to apply a parser, and if it fails (returns `Nothing`

) then return whatever is in the other parser. Put more simply, lets have an ‘or’ operator for our `Parser`

. On top of that, let’s create the concept of a null `Parser`

we described earlier

```
nullParser = Parser $ const Nothing
choice p1 p2 = Parser $ \cs -> case parse p1 cs of
Nothing -> parser p2 cs
x -> x
```

The `nullParser`

function is a `Parser`

that returns `Nothing`

no matter the input. The `choice`

function gives us the ability to try multiple different parsers before saying we failed. Let’s give it a try.

```
char c = Parser char'
where
char' (x:xs) | x == c = Just (c, xs)
| otherwise = Nothing
_ = Nothing
oneof (c:cs) = choice (char c) (oneof cs)
oneof _ = nullParser
letter = oneof "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
main = print $ runParser letter "aasdd"
-- output: Just 'a'
```

Here we created a few more useful parsers, a parser that will parser any character of a string that is given to it, and a parser that parses letters of the alphabet. We defined these functions, but really they already describe another structure called a Monoid. A monoid has a function that takes two monoids. This function must be associative, and it must have some monoid that is a unit.

So for the `Parser`

type, we defined `nullParser`

and `choice`

. In this case `nullParser`

is the unit monoid. That is, it is the monoid where applying `choice`

on it and a different parser always gives the result of the different parser. So you can see that `nullParser`

definitely has that property. The last thing to check is that `choice`

is associative, or that

`(p1 `choice` p2) `choice` p3 = p1 `choice` (p2 `choice` p3)`

Since `choice`

acts like an or operator, we know it must be associative, just like the or operator. So the next step is to make `Parser`

a monadplus.

```
instance MonadPlus Parser where
mempty = nullParser
mappend = choice
```

With this, we can take advantage of all the the helper functions that can be used with monoids.

## Calculator Parser

We will finish this article by using our new found `Parser`

type to make a parser for a calculator language. So first, let’s map out what the syntax for the calculator is. We will make the calculator use a lisp like mathematical notation, where it will accept integers, and operations like multiplication, division, addition, and subtraction. An example program for this calculator language might be

`(* 1 (+ 1 2))`

Written in the more familiar mathematical notation, this is equivalent to

`(1 + 2) * 1`

### Integer Parser

So first lets parse integers. An integer will be one or more digit, and that’s it. So let’s define a parser for it.

```
many :: Parser a -> Parser [a]
many p = Parser $ many' []
where
many' xs cs = case parse p cs of
Nothing -> Just (xs, cs)
Just (x, cs') -> many' (x:xs) cs'
many1 :: Parser a -> Parser [a]
many1 p = Parser $ \cs -> listToMaybe . parse (many p) $ cs
where
listToMaybe Nothing = Nothing
listToMaybe (Just ([], _)) = Nothing
listToMaybe x = x
digit = oneof "0123456789"
integer = many1 digit
```

And with this, we’ve defined a couple more important functions for us, `many`

and `many1`

. The `many`

function trys to apply a parser as many times as it can, and dumps all the successful trys into a list The `many1`

function does the same, except gives `Nothing`

if that list is empty.

### Operator Parser

This is really simplistic if all we do is parse the characters

`operator = oneof "+*/-"`

### Putting it together

Now all that’s left is to put it together and make a parser for our calculator

```
operator = oneof "+*/-"
digit = oneof "0123456789"
integer = many1 digit
spaces = many $ char ' '
expression = expression' `mappend` integer
expression' = do
char '('
spaces
operator
spaces
expression
spaces
expression
char ')'
```

So this is the whole parser. It checks for parenthesis, and spaces, followed by an operator and two other expressions!

comments powered by Disqus