Online I found this very interesting paper on creating a monadic parser from scratch. I thought it would make for a fun project to build my own similar but slightly different version of a parser following this paper, so this next series of posts will deal with creating a parser entirely from scratch. You can find the code for these posts here.
Just a warning, this is for exercise only. I wouldn’t recommend using this to create your own language, there are better parser libraries like Parsec to use instead.
Defining a Data Type
To build a parser, we have to think about a few things first. First, let’s solidify what a parser is. For this article, we will think of a parser as a function that takes a string as input, and possibly outputs something useful. Since the purpose of this article is to create parsers we can combine to produce different parsers, we also should think of the parser returning any string it did not parse. Also, there might be a case where the string can not be parsed, so we must allow for that possibility in our library With that idea, we define a
Parser data type
newtype Parser a = Parser ( String -> Maybe (a, String) )
A Simple Parser
So with this data type, let’s create a simple parser that parses a single character. For this parser, we will not care what character is parsed. As long as it can parse a character, this parser will return one.
eat :: Parser Char eat = Parser eat' where eat' (x:xs) = Just (x,xs) eat' _ = Nothing
To use this function, let’s define some helper functions that will extract the parser function from the parser
parse :: Parser a -> (String -> Maybe (a, String)) parse (Parser p) = p extract :: Maybe (a, String) -> Maybe a extract = fmap fst runParser :: (Parser a) -> String -> (Maybe a) runParser p s = extract $ (parse p) s
So now lets use the
eat Parser we made
main = print $ runParser eat "omnomnom" -- output: Just 'o'
Making a Monad of the Data Type
Earlier we said we want to combine parsers to create new parsers. To do that, we need to create a function that will combine the parsers in a way that makes sense to us.
So let’s try to create that function.
combine :: Parser a -> Parser b -> Parser b combine (Parser p1) (Parser p2) = Parser combine' where combine' = join . fmap (p2 . snd) . p1
There we go! With this function, we call the first parser, then take the leftover string and put it into the second parser. After that, we return the contents of the second parser. Now we can do this.
main = print $ runParser (eat `combine` eat) "omnomnom" -- output: Just 'm'
We have this
combine function, but let’s go further than this. Let’s try to use existing structure to help define the possibilities of our parser. To do this, we will define
Parser as a monad.
import Control.Monad instance Monad Parser where return x = Parser $ \cs -> Just (x, cs) (>>=) p f = Parser $ join . fmap applyf . parse p where applyf (v, ys) = parse (f v) ys
return makes a parser that does not parse a value, and passes the parameter of
return to the parser. The bind function takes a parser, passes what that parser leaves unparsed to a function. This function then takes the value of the parser and uses that as an argument to return a parser. So in a way, this is what the
combine function does, but instead uses a function that returns a parser.
Verifying the Data Type is a Monad
There are three laws a monad has to follow, so let’s check that the
Parser type follows these laws.
The left identity law states that given a value
x of type
a and a function that takes a value of type
a and returns a
return x >>= p = p x
So for our parser, return was defined to place the
x value into a null
Parser, or a Parser that does not parse the string. The bind function takes that value
x and applies it to the parser
p. Thus, we can see that this law holds for how we defined
return and the bind function.
The right identity law states that given a parser
p >>= return = p
So the bind function takes the value out of the parser
p, passes it to
return function just re-wraps that into the
Parser type, so the right identity law holds.
The associativity law states that given a parser
p and two functions that return parsers
(p >>= f) >>= g = p >>= (\x -> f x >>= g)
So for our parser, we know that parser
p passes its value and leftover string to the function
f and the parser it produces. This step is done again between the parser
f creates and the function
g. The right side of the equality above is stating this more explicitly, but it does the same thing as the before.
This means that the
Parser type we created is a monad under our definition of the bind function and
Adding onto the Monad Structure
While the monad structure makes combining parsers easier, there are a few more things we can do to make the
Parser type easier to use. One way we can do this is have a function that takes in two parser. The function will try to apply a parser, and if it fails (returns
Nothing) then return whatever is in the other parser. Put more simply, lets have an ‘or’ operator for our
Parser. On top of that, let’s create the concept of a null
Parser we described earlier
nullParser = Parser $ const Nothing choice p1 p2 = Parser $ \cs -> case parse p1 cs of Nothing -> parser p2 cs x -> x
nullParser function is a
Parser that returns
Nothing no matter the input. The
choice function gives us the ability to try multiple different parsers before saying we failed. Let’s give it a try.
char c = Parser char' where char' (x:xs) | x == c = Just (c, xs) | otherwise = Nothing _ = Nothing oneof (c:cs) = choice (char c) (oneof cs) oneof _ = nullParser letter = oneof "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" main = print $ runParser letter "aasdd" -- output: Just 'a'
Here we created a few more useful parsers, a parser that will parser any character of a string that is given to it, and a parser that parses letters of the alphabet. We defined these functions, but really they already describe another structure called a Monoid. A monoid has a function that takes two monoids. This function must be associative, and it must have some monoid that is a unit.
So for the
Parser type, we defined
choice. In this case
nullParser is the unit monoid. That is, it is the monoid where applying
choice on it and a different parser always gives the result of the different parser. So you can see that
nullParser definitely has that property. The last thing to check is that
choice is associative, or that
(p1 `choice` p2) `choice` p3 = p1 `choice` (p2 `choice` p3)
choice acts like an or operator, we know it must be associative, just like the or operator. So the next step is to make
Parser a monadplus.
instance MonadPlus Parser where mempty = nullParser mappend = choice
With this, we can take advantage of all the the helper functions that can be used with monoids.
We will finish this article by using our new found
Parser type to make a parser for a calculator language. So first, let’s map out what the syntax for the calculator is. We will make the calculator use a lisp like mathematical notation, where it will accept integers, and operations like multiplication, division, addition, and subtraction. An example program for this calculator language might be
(* 1 (+ 1 2))
Written in the more familiar mathematical notation, this is equivalent to
(1 + 2) * 1
So first lets parse integers. An integer will be one or more digit, and that’s it. So let’s define a parser for it.
many :: Parser a -> Parser [a] many p = Parser $ many'  where many' xs cs = case parse p cs of Nothing -> Just (xs, cs) Just (x, cs') -> many' (x:xs) cs' many1 :: Parser a -> Parser [a] many1 p = Parser $ \cs -> listToMaybe . parse (many p) $ cs where listToMaybe Nothing = Nothing listToMaybe (Just (, _)) = Nothing listToMaybe x = x digit = oneof "0123456789" integer = many1 digit
And with this, we’ve defined a couple more important functions for us,
many function trys to apply a parser as many times as it can, and dumps all the successful trys into a list The
many1 function does the same, except gives
Nothing if that list is empty.
This is really simplistic if all we do is parse the characters
operator = oneof "+*/-"
Putting it together
Now all that’s left is to put it together and make a parser for our calculator
operator = oneof "+*/-" digit = oneof "0123456789" integer = many1 digit spaces = many $ char ' ' expression = expression' `mappend` integer expression' = do char '(' spaces operator spaces expression spaces expression char ')'
So this is the whole parser. It checks for parenthesis, and spaces, followed by an operator and two other expressions!comments powered by Disqus