13.04.2017       Issue 173 (10.04.2017 - 16.04.2017)       Articles

Building a Slack Bot CLI Using Context Free Grammar and Pyparsing


At HashedIn, we use Slack for our internal communication. We wanted to build a Slack Bot to automate follow-ups. Here are some example commands we wanted to support.

ask @user1 @user2 every day evening
Please fill timesheets for today by evening?
ask #general every week friday
What was one thing you learned last week?
ask #general every month first
Submit your expense report by end of day.
ask #team by 18h
Confirm once you are done with your changes for code-freeze
ask #marketing by friday 
Read this book - link-to-book
ask @user1 @user2 by today 
Please review this blog I wrote - link-to-blog

Why Not NLP?

We started by trying a command line like the ones we are used to on a Linux terminal, and evaluated several libraries for that: click, docopt, argparse and optparse. However, that style turned out to be too hard for non-technical users. Our HR team wouldn’t have used it, and we wanted them to!

We then went to the other extreme and ran a trial of pure natural language parsing using wit.ai and api.ai. It was too verbose and ambiguous. Our audience would use the bot many times a day and could learn a specific syntax instead of having to explain themselves to an NLP engine.

The solution lay somewhere between pure NLP and a Linux-shell-like command line.

Designing Context Free Grammar

Our first proofs of concept used regular expressions. They soon became too complicated. It was time to write our own small DSL and a Context Free Grammar for parsing it.
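To illustrate how quickly the regular-expression route grows, here is a minimal sketch covering only the "ask ... by ..." form. The pattern is hypothetical (the original expressions are not shown in this article) and handles just a handful of deadline values.

```python
import re

# Hypothetical pattern for the deadline form only: ask <targets> by <eta> <question>
ASK_BY = re.compile(
    r"^ask\s+"
    r"(?P<targets>(?:[@#][\w-]+\s+)+)"             # one or more @user / #channel
    r"by\s+"
    r"(?P<eta>today|tomorrow|\d+[mdh])\s+"         # only a few deadline forms
    r"(?P<question>.+)$",
    re.IGNORECASE | re.DOTALL,
)

match = ASK_BY.match("ask @user1 @user2 by today Please review this blog")
targets = match.group("targets").split()
print(targets, match.group("eta"), match.group("question"))
```

Supporting "every", day names, time-of-day labels and optional arguments would mean more alternations packed into the same pattern, which is the kind of growth a grammar splits into small named rules.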

To define a Context Free Grammar (CFG), you need a set of production rules that can generate every valid sentence in your language. We used the excellent pyparsing library to write the grammar. Here is our grammar.py:

# grammar.py:

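from pyparsing import Word, Group, ZeroOrMore, nums, alphanums, OneOrMore, CaselessLiteral, FollowedBy, Regex, Optional

# The parse actions used below (parse_username, parse_user_group, date_parse, interval_parse, ...)
# are plain Python functions defined elsewhere; they are not shown in this listing.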

USER = Group(Regex(r"@[A-Z0-9a-z_]+")).setParseAction(parse_username)
CHANNEL = Group(Regex(r"#[A-Z0-9a-z_-]+")).setParseAction(parse_user_group)

DAYS = Group(CaselessLiteral('monday') | 'mon'
             | 'tuesday' | 'tue'
             | 'wednesday' | 'wed'
             | 'thursday' | 'thur'
             | 'friday' | 'fri'
             | 'saturday' | 'sat'
             | 'sunday' | 'sun'
             ).setParseAction(lambda tokens: date_parse(tokens[0][0]))

RELATIVE_DAY = Group(CaselessLiteral('today')
                     | 'tomorrow'
                     | 'yesterday'
                     | 'day-after-tomorrow'
                     ).setParseAction(lambda tokens: date_parse(tokens[0][0]))

RELATIVE_TIME = Group(Regex(r"\d+[mdh]")
                      ).setParseAction(lambda tokens: interval_parse(str(tokens[0][0])))

SPECIAL_CHARACTERS = "~`!@#$%^&*()'?"
MULTILINE_TEXT = Group(Regex(r".*")).setParseAction(lambda tokens: str(tokens[0][0]))

TIME_OF_DAY = (Word(nums, max=2) + FollowedBy(' ')
               | Word(nums, max=2) + ':' + Word(nums, exact=2))

TIME_OF_DAY_LABEL = (CaselessLiteral('morning')
                     | 'evening'
                     | 'afternoon'
                     | 'night'
                     )

INTERVAL_RELATIVE_TIME = Group(Regex(r"\d+[mdh]")).setParseAction(
                           lambda tokens: repeated_interval_parse(str(tokens[0][0])))

INTERVAL_DAYS = Group(CaselessLiteral('monday') | 'mon'
                      | 'tuesday' | 'tue'
                      | 'wednesday' | 'wed'
                      | 'thursday' | 'thur'
                      | 'friday' | 'fri'
                      | 'saturday' | 'sat'
                      | 'sunday' | 'sun'
                      ).setParseAction(lambda tokens: repeated_date_parse(tokens[0][0]))

INTERVAL_DAY = Group(CaselessLiteral('day') | 'week').setParseAction(lambda tokens: repeated_day_parse(tokens[0][0]))

TIME_OF_DAY = TIME_OF_DAY_LABEL | TIME_OF_DAY
TIME_ETA = DAYS | RELATIVE_DAY | RELATIVE_TIME | TIME_OF_DAY
TIME_INTERVAL = INTERVAL_RELATIVE_TIME | INTERVAL_DAYS | INTERVAL_DAY
USERS = Group(OneOrMore(USER))
CHANNELS = Group(OneOrMore(CHANNEL))

"""
Ask Command Formats
ask @user1 @user2 by tomorrow What are you up to?
ask @user1 every day evening Please update the time sheet and let me know.
"""
COMMAND_ASK_DEADLINE = (CaselessLiteral('ask')('command') +
                        ZeroOrMore(USERS('users')) +
                        ZeroOrMore(CHANNELS('channels')) +
                        CaselessLiteral('by')('type') -
                        TIME_ETA('eta') +
                        ZeroOrMore(TIME_OF_DAY)('time_of_day') +
                        MULTILINE_TEXT('question'))

COMMAND_ASK_REPEAT = (CaselessLiteral('ask')('command') +
                      ZeroOrMore(USERS('users')) +
                      ZeroOrMore(CHANNELS('channels')) +
                      CaselessLiteral('every')('type') -
                      TIME_INTERVAL('interval') +
                      Optional(TIME_OF_DAY)('time') +
                      MULTILINE_TEXT('question'))

COMMAND_ASK = COMMAND_ASK_REPEAT | COMMAND_ASK_DEADLINE

COMMAND_LINE = COMMAND_ASK

Quick highlights on how the grammar was written:

  1. Define the lowest-level tokens as Group, e.g. USER, CHANNEL, DAYS, RELATIVE_DAY, INTERVAL_DAYS etc.
  2. Define combinations of one or more tokens, e.g. TIME_OF_DAY, TIME_ETA, TIME_INTERVAL, USERS, CHANNELS etc.
  3. Define sentences: COMMAND_ASK_DEADLINE, COMMAND_ASK_REPEAT, COMMAND_LINE.
  4. Define parse actions. These are functions that take a matched token and return a Python value, e.g. tomorrow is parsed to a datetime, 2d to a timedelta and so on.
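The parse actions from step 4 are not listed in the article. A minimal sketch of what two of them might look like is given here; the function names mirror those used in grammar.py, but the bodies are assumptions, and this date_parse only handles relative days, not weekday names.

```python
from datetime import datetime, timedelta

# Hypothetical helpers: names follow grammar.py, implementations are assumed.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def interval_parse(text):
    """Convert a token such as '2d' or '18h' into a timedelta."""
    amount, unit = int(text[:-1]), text[-1]
    if unit not in _UNITS:
        raise ValueError("unknown time unit %r in %r" % (unit, text))
    return timedelta(**{_UNITS[unit]: amount})


def date_parse(text, now=None):
    """Convert a relative day such as 'tomorrow' into a datetime at midnight."""
    offsets = {"yesterday": -1, "today": 0, "tomorrow": 1, "day-after-tomorrow": 2}
    key = text.lower()
    if key not in offsets:
        raise ValueError("unknown relative day %r" % text)
    now = now or datetime.now()
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_day + timedelta(days=offsets[key])
```

Passing `now` explicitly keeps the helper easy to test, e.g. `date_parse("tomorrow", now=datetime(2016, 12, 31, 15, 30))` gives `datetime(2017, 1, 1, 0, 0)`.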

That completes the grammar, which generates all sentences of the language. Pyparsing’s magic is that it can also use the same grammar to parse a string. Not really magic if you remember Compilers 101, but here it is anyway.

>>> command_str = "ask @user_id by tomorrow Question text goes here"
>>> COMMAND_LINE.parseString(command_str).asDict()  # output tidied for readability
{
  "command": "ask",
  "users": ["user_id"],
  "type": "by",
  "eta": datetime(2016, 1, 1),  # next day’s date
  "question": "Question text goes here"
}
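Because the full grammar depends on parse actions that are not shown, it cannot be run as is. A reduced, self-contained sketch of the deadline command follows. It assumes pyparsing is installed, keeps the targets before "by" as the grammar does, and leaves the eta as a plain string instead of converting it to a datetime.

```python
from pyparsing import CaselessLiteral, Group, OneOrMore, Optional, Regex

# Reduced sketch of the deadline command; the parse action here is an assumption.
USER = Regex(r"@[A-Za-z0-9_]+").setParseAction(lambda tokens: tokens[0][1:])
USERS = Group(OneOrMore(USER))
ETA = CaselessLiteral("today") | CaselessLiteral("tomorrow") | Regex(r"\d+[mdh]")
QUESTION = Regex(r".+")

COMMAND = (CaselessLiteral("ask")("command")
           + Optional(USERS("users"))
           + CaselessLiteral("by")("type")
           + ETA("eta")
           + QUESTION("question"))

parsed = COMMAND.parseString("ask @user_id by tomorrow Question text goes here").asDict()
print(parsed)
```

The named results ("command", "users", "type", "eta", "question") are what end up as dictionary keys, so they double as the argument names for whatever function executes the command.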

That’s our command with all its arguments! We can write a simple Python function to execute it.

def ask(type, question, users=None, channels=None, eta=None):
   # set up a reminder for the user(s) at the right times; a missing eta means "now"
   ...
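One possible way to connect the parsed dictionary to the executor is a small dispatch table keyed by the command name. The sketch below is an assumption about how that glue could look; `execute`, `COMMANDS` and the stand-in `ask` body are hypothetical and only return a summary string instead of scheduling anything.

```python
# Hypothetical glue code: maps the parsed "command" name to a Python function.
def ask(type, question, users=None, channels=None, eta=None, interval=None, **extra):
    """Stand-in executor; a real bot would schedule Slack reminders here."""
    targets = list(users or []) + list(channels or [])
    return "%s %s -> %s: %s" % (type, eta or interval, ", ".join(targets), question)


COMMANDS = {"ask": ask}


def execute(parsed):
    """Look up the handler by command name and pass the rest as keyword arguments."""
    args = dict(parsed)  # copy so the caller's dict is not modified
    name = args.pop("command")
    if name not in COMMANDS:
        raise ValueError("unknown command: %s" % name)
    return COMMANDS[name](**args)


result = execute({"command": "ask", "type": "by", "users": ["user_id"],
                  "eta": "tomorrow", "question": "Question text goes here"})
print(result)
```

Adding a new command then means adding one sentence to the grammar and one entry to the table.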

That’s it. Our command DSL and executor are ready!

Error handling

When parsing fails, users want to know exactly why and where. This can be done as follows.

Error Stops: These are the - operators used in place of + in the grammar above. Error stops prevent backtracking at those points. If everything before the error stop matches, the parser treats that match as final and won’t try another alternative even if the rest of the string fails to match. That helps because you know which sentence failed, instead of getting an ambiguous “couldn’t match” error. That said, error stops make the grammar more restrictive, so use them with care.
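The difference can be seen in a small sketch that builds the same two sentences with and without error stops. The tokens are simplified stand-ins for those in grammar.py (an assumption, with no parse actions), and pyparsing is assumed to be installed.

```python
from pyparsing import CaselessLiteral, ParseBaseException, Regex

# Simplified stand-ins for the tokens in grammar.py (assumed, no parse actions).
ETA = CaselessLiteral("today") | CaselessLiteral("tomorrow")
INTERVAL = CaselessLiteral("day") | CaselessLiteral("week")
TEXT = Regex(r".+")


def build(use_error_stop):
    """Return 'ask every ...' | 'ask by ...', with or without error stops."""
    ask, by, every = (CaselessLiteral(word) for word in ("ask", "by", "every"))
    if use_error_stop:
        repeat = ask + every - INTERVAL + TEXT
        deadline = ask + by - ETA + TEXT
    else:
        repeat = ask + every + INTERVAL + TEXT
        deadline = ask + by + ETA + TEXT
    return repeat | deadline


def failure_kind(grammar, text):
    """Return the exception class name raised for text, or None if it parses."""
    try:
        grammar.parseString(text)
    except ParseBaseException as exc:
        return type(exc).__name__
    return None


print(failure_kind(build(False), "ask by someday do it"))
print(failure_kind(build(True), "ask by someday do it"))
```

Without error stops the failure surfaces as an ordinary ParseException after both alternatives were tried. With them, the failure after "by" is raised as a ParseSyntaxException, which stops the parser from trying further alternatives and ties the error to the deadline sentence.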

Token Parsing: This is easier. Just make sure parse actions raise the right exception with a message that can be shown to the user.
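As a sketch of this idea, a parse action can validate the token and raise a pyparsing exception carrying a user-facing message. The `parse_time` action and the HH:MM token below are illustrative assumptions, not part of the grammar above; pyparsing is assumed to be installed.

```python
from datetime import time

from pyparsing import ParseBaseException, ParseFatalException, Regex


def parse_time(source, location, tokens):
    """Parse action: turn 'HH:MM' into datetime.time or fail with a clear message."""
    hour, minute = (int(part) for part in tokens[0].split(":"))
    if hour > 23 or minute > 59:
        message = "invalid time %r: hours must be 0-23 and minutes 0-59" % tokens[0]
        raise ParseFatalException(source, location, message)
    return time(hour, minute)


TIME = Regex(r"\d{1,2}:\d{2}").setParseAction(parse_time)


def check(text):
    """Return the parsed value, or the message that would be shown to the user."""
    try:
        return TIME.parseString(text)[0]
    except ParseBaseException as exc:
        return exc.msg


print(check("18:30"))
print(check("27:80"))
```

Raising ParseFatalException stops parsing immediately, so the message written in the parse action is the one the caller sees.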

Generic Tokens: Tokens should be as generic as possible. In the example grammar, we defined time of day as morning, evening etc. Instead, we could have defined it as a generic string and handled errors in the parse action. That would allow misspellings to be corrected or reported by the parser, e.g. morn instead of morning.
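A minimal sketch of such a tolerant parse function, using the standard library's difflib to find the closest known label. The function name and the label list are assumptions for illustration; in a grammar it could be attached as the parse action of a generic word token.

```python
import difflib

LABELS = ["morning", "afternoon", "evening", "night"]


def parse_time_label(word):
    """Accept an exact label, correct a close misspelling, or raise a helpful error."""
    word = word.lower()
    if word in LABELS:
        return word
    close = difflib.get_close_matches(word, LABELS, n=1)
    if close:
        return close[0]
    raise ValueError(
        "unknown time of day %r; expected one of: %s" % (word, ", ".join(LABELS))
    )


print(parse_time_label("morn"))
```

Whether to correct silently or to ask the user for confirmation is a product decision; the same lookup supports both.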

Summary

That’s it. CFGs and pyparsing are a powerful way to quickly build expressive DSLs. You define a generative grammar and use pyparsing to parse command strings into method names and arguments. With all the excitement around bots, I expect to use this approach more often (at least until NLP becomes what it’s supposed to be).


