ask @user1 @user2 every day evening Please fill timesheets for today by evening?
ask #general every week friday What was one thing you learned last week?
ask #general every month first Submit your expense report by end of day.
ask #team by 18h Confirm once you are done with your changes for code-freeze
ask #marketing by friday Read this book - link-to-book
ask @user1 @user2 by today Please review this blog I wrote - link-to-blog
Why Not NLP?
We actually started by trying to write the command line the way we are used to on a Linux terminal, and evaluated several libraries for that: click, docopt, argparse and optparse. However, that style turned out to be too hard for non-tech users. Our HR wouldn’t have used it and we wanted them to!
We then went to the other extreme and ran a trial of pure natural language parsing using wit.ai and api.ai. It was too verbose and ambiguous. Our audience would use this many times a day and can learn a specific syntax instead of having to explain themselves to an NLP engine.
The solution lay somewhere between pure NLP and a Linux shell-like command line.
Designing Context Free Grammar
To define a Context Free Grammar (CFG), you need a set of generative rules that can produce every valid sentence in your language. We used the excellent pyparsing library to write the grammar. Here is our grammar.py
Quick highlights on how the grammar was written.
- Define lowest-level tokens as Group
- Define combinations of one or more tokens
- Define sentences
- Define parse actions. These are methods that parse a token and return a Python value, e.g. tomorrow is parsed to a datetime, 2d to a timedelta, and so on.
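The four steps above can be sketched roughly as follows. This is a minimal illustration and not the actual grammar.py: every name here (target, time_of_day, days, ask_command and so on) is hypothetical, and it covers only a small slice of the command language.

```python
from datetime import timedelta

from pyparsing import Group, Keyword, OneOrMore, Optional, Regex, oneOf

# 1. Lowest-level tokens (names and patterns are illustrative assumptions).
target = Regex(r"[@#][\w.-]+")                     # @user1 or #general
time_of_day = oneOf("morning afternoon evening")
days = Regex(r"\d+d")                              # e.g. 2d
message = Regex(r".+")                             # free text up to end of line


# 4. Parse action: turn the matched text into a Python value.
def to_timedelta(tokens):
    return timedelta(days=int(tokens[0][:-1]))


days.setParseAction(to_timedelta)

# 2. Combinations of one or more tokens.
targets = Group(OneOrMore(target))("targets")
schedule = Keyword("every") + Keyword("day") + Optional(time_of_day("time_of_day"))
deadline = Keyword("by") + days("deadline")

# 3. Sentences.
ask_command = (
    Keyword("ask")("command") + targets + (schedule | deadline) + message("message")
)
```

The results names in parentheses, such as ("targets"), are what later let us pull arguments out of the parsed command by name.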
That completes the grammar, which can generate all sentences of the language. Pyparsing magic can also use it to parse a string. Not really magic if you remember Compilers 101, but here it is anyway.
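As a rough sketch of that parsing step, assuming a cut-down, hypothetical grammar of the form ask, then targets, then message (the real grammar also handles schedules and deadlines):

```python
from pyparsing import Group, Keyword, OneOrMore, Regex

# Hypothetical cut-down grammar: ask <targets...> <message>
target = Regex(r"[@#][\w.-]+")
ask_command = (
    Keyword("ask")("command")
    + Group(OneOrMore(target))("targets")
    + Regex(r".+")("message")
)

# parseAll=True makes the parser fail if any input is left unmatched.
result = ask_command.parseString(
    "ask @user1 @user2 Please review this blog", parseAll=True
)

# Collect the named results into a plain dict of command + arguments.
parsed = {
    "command": result["command"],
    "targets": result["targets"].asList(),
    "message": result["message"],
}
```

The same grammar object that describes valid sentences also does the parsing, so there is no separate parser to keep in sync.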
That’s our command with all arguments! We can write a simple Python method to execute it.
That’s it. Our command DSL and executor are ready!
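One way such an executor might look, assuming the parsed command arrives as a plain dict with a command key plus keyword arguments. The handler, the HANDLERS table and the return value are all hypothetical; a real bot would post to Slack instead of returning strings.

```python
def ask(targets, message):
    """Hypothetical handler: build one outgoing line per target."""
    return ["{}: {}".format(target, message) for target in targets]


# Map command names from the grammar to the methods that implement them.
HANDLERS = {"ask": ask}


def execute(parsed):
    """Look up the handler by command name and call it with the other fields."""
    args = dict(parsed)                     # copy so the caller's dict is untouched
    handler = HANDLERS[args.pop("command")]
    return handler(**args)
```

For example, execute({"command": "ask", "targets": ["@user1"], "message": "hi"}) calls ask(targets=["@user1"], message="hi"). Adding a new command then means adding a sentence to the grammar and an entry to the table.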
When parsing fails, users want to know exactly why and where. This can be done as follows.
Error Stops: these are the - signs used instead of + signs in the grammar above. Error stops prevent back-tracking at those breakpoints. If everything before the error stop matches, the parser treats that match as final and won’t try another alternative, even if the rest of the string fails to match. That helps because you know which sentence failed, instead of getting an ambiguous “couldn’t match” error. That said, error stops make the grammar more limiting, so use them with care.
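The difference between + and - can be seen in a small sketch. The two grammars below are hypothetical and exist only to contrast the operators; they are not taken from our grammar.

```python
from pyparsing import Keyword, ParseSyntaxException, Regex, oneOf

day = oneOf("monday tuesday wednesday thursday friday")
days = Regex(r"\d+d")

# "+" allows back-tracking: if "by <day>" fails, "by <days>" is tried next.
loose = (Keyword("by") + day) | (Keyword("by") + days)

# "-" is an error stop: once "by" has matched, a failure after it is final.
strict = (Keyword("by") - day) | (Keyword("by") - days)


def try_parse(grammar, text):
    """Return the token list, or a message if an error stop fired."""
    try:
        return grammar.parseString(text).asList()
    except ParseSyntaxException as err:
        return "stopped at column {}: {}".format(err.col, err.msg)
```

Here try_parse(loose, "by 2d") succeeds through the second alternative, while try_parse(strict, "by 2d") stops inside the first alternative and never reaches the second, which is exactly the "more limiting" trade-off. Note that ParseSyntaxException is not a subclass of ParseException, so a handler that should catch both kinds of failure needs to catch ParseBaseException.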
Token Parsing is easier. Just make sure parse actions throw the right exception with the right message which can be shown to the user.
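A sketch of what that could look like for an hour token such as 18h. The token, the to_hour parse action and the parse_or_explain helper are hypothetical, and raising ParseFatalException (so the parser does not back-track past the message) is our assumption about a reasonable choice here.

```python
from pyparsing import ParseBaseException, ParseFatalException, Regex


def to_hour(text, loc, tokens):
    """Parse action: convert '18h' to the int 18, rejecting impossible hours."""
    hour = int(tokens[0][:-1])
    if hour > 23:
        raise ParseFatalException(
            text, loc, "hour must be between 0 and 23, got {}".format(hour)
        )
    return hour


hour = Regex(r"\d+h").setParseAction(to_hour)


def parse_or_explain(text):
    """Return (value, None) on success or (None, user-facing message) on failure."""
    try:
        return hour.parseString(text, parseAll=True)[0], None
    except ParseBaseException as err:
        pointer = " " * (err.col - 1) + "^"   # caret under the failing column
        return None, "{}\n{}\n{}".format(err.line, pointer, err.msg)
```

The message string is built from the exception’s line, col and msg attributes, so the user sees their own input, where it went wrong, and why.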
Generic Tokens: tokens should be as generic as possible. In the example grammar, we have defined time of day as morning, evening etc. Instead, we could have defined it as a string and managed error handling in the parse action. That would allow misspellings to be corrected or reported in the parser, e.g. morn instead of morning.
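A minimal sketch of such a generic token, assuming we accept any word and let the parse action decide. Using difflib from the standard library to guess the intended word is our own choice for illustration, and the TIMES_OF_DAY list is hypothetical.

```python
import difflib

from pyparsing import ParseFatalException, Word, alphas

TIMES_OF_DAY = ["morning", "afternoon", "evening"]


def parse_time_of_day(text, loc, tokens):
    """Accept a known word, correct a near miss, or report a clear error."""
    word = tokens[0].lower()
    if word in TIMES_OF_DAY:
        return word
    close = difflib.get_close_matches(word, TIMES_OF_DAY, n=1)
    if close:
        return close[0]                      # e.g. "morn" is corrected to "morning"
    raise ParseFatalException(
        text,
        loc,
        "unknown time of day '{}', expected one of: {}".format(
            word, ", ".join(TIMES_OF_DAY)
        ),
    )


# The token itself is generic: any alphabetic word reaches the parse action.
time_of_day = Word(alphas).setParseAction(parse_time_of_day)
```

Whether to silently correct a near miss or to report it back to the user is a product decision; the point is that a generic token gives the parse action the chance to do either.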
That’s it. CFGs and pyparsing are a powerful way to quickly build expressive DSLs. You define a generative grammar and use pyparsing to parse command strings into method names and arguments. With all the excitement around bots, I expect to use it more often (at least until NLP becomes what it’s supposed to be).