Nov 17

NLP with AIML - Part 1 - Stopwords

Folks learning about chatbots and AIML are likely to ask if AIML is a Natural Language Processing (NLP) implementation. The answer is no. However, can one use AIML to perform NLP functions? ABSOLUTELY! It takes some work and creative use of recursion and predicates, but it can be done.

This should be the first in a series of posts that describe how you can perform NLP tasks with AIML. In this post, we'll tackle a very simple concept: removing stopwords.

In NLP, there is the concept of a pre-processing step whereby very common (and potentially meaningless) words are removed. Examples of these words are: is, the, and, I, will, just, so, than... So an input of "this is a simple sentence" becomes "simple sentence."
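
To make the idea concrete outside of AIML, here is a minimal Python sketch of the same filtering step. The `STOPWORDS` set is a small hand-picked subset for illustration (not the full list used by the bot), and `remove_stopwords` is a hypothetical helper name.

```python
# Hypothetical, hand-picked subset of stopwords, for illustration only.
STOPWORDS = {"this", "is", "a", "the", "and", "i", "will", "just", "so", "than"}


def remove_stopwords(sentence: str) -> str:
    """Return the sentence with every stopword dropped, keeping word order."""
    kept = [word for word in sentence.split() if word.lower() not in STOPWORDS]
    return " ".join(kept)


print(remove_stopwords("this is a simple sentence"))  # simple sentence
```

The comparison is done on the lowercased word so that "The" and "the" are treated the same, which is roughly what AIML's input normalization gives you for free.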

In most programming languages, we would loop through the sentence and test each word. AIML doesn't naturally have this feature (OK, technically AIML 2.0 has a tag called <loop/>, but apart from the number-counting example in the standard, I can't see how to use it in other ways; please comment if you have good examples of <loop/>). But we can force AIML to loop using reductions, that is, the <srai> tag: a category handles one word and then calls itself on the rest of the input. The part where we really need to get creative is how we stop the loop. In this case, I test for when we've reached the end of the input.
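
The "loop by reduction" idea can be mirrored in Python with plain recursion. This is only a sketch of the shape of the technique, not the actual AIML in the bot: each call handles one word (like one reduction), and the empty-input check plays the role of the end-of-loop test. The names `reduce_sentence` and `STOPWORDS` are made up for this example.

```python
# Hypothetical, hand-picked subset of stopwords, for illustration only.
STOPWORDS = {"this", "is", "a", "the", "and", "i", "will", "just", "so", "than"}


def reduce_sentence(remaining: str, kept: str = "") -> str:
    """Handle one word per call, then recurse on the rest of the input."""
    remaining = remaining.strip()
    # End-of-input test: this is what stops the "loop".
    if not remaining:
        return kept.strip()
    # Split off the first word; everything after it is handled by the next call.
    first, _, rest = remaining.partition(" ")
    if first.lower() not in STOPWORDS:
        kept = f"{kept} {first}"
    return reduce_sentence(rest, kept)


print(reduce_sentence("this is a simple sentence"))  # simple sentence
```

The `kept` argument stands in for a predicate that accumulates the surviving words between reductions; in AIML there is no function argument to carry state, so a predicate is the natural place to keep it.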

At this point, I think it's better to show than tell...

Here's a flowchart of the method (no criticizing my flowcharting skills... but please ask questions!):

[Figure: Flowchart for stopword removal in AIML]

And even better than that, you can see this at work in two ways:

1) If you have an account on the Pandorabots playground (playground.pandorabots.com), search in the clubhouse for "NLP Test Bot" and you can chat with it directly. Here's a screenshot of a chat:
[Image: Chat window with NLP Test Bot]

2) I have a zip file of 3 AIML files that you can play around with yourself. These are the same files that are loaded in the NLP Test Bot on the Pandorabots playground. A couple of things to know: the list of stopwords comes from Python's NLTK library. It seemed like a good list. However, given that in AIML we normally expand contractions, half of this list could probably be removed. The other thing to know is that it's AIML v2.0. That's because I am using the <first> and <rest> tags.

Download nlpinaiml.z7

2a) Could this be done with AIML v1.0? Yup. Before the <first> and <rest> tags came into existence, I had a category that looked something like this:

<category><pattern>SPLITME * *</pattern>
<template><think>
<set name="firstword"><star index="1"/></set>
<set name="remainder"><star index="2"/></set>
</think></template></category>


There was also a second category, SPLITME *. With both categories in place, this one would match only when the input after SPLITME was a single word, since the two-star pattern needs at least two words.
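
For readers who think in code, the effect of the two SPLITME categories can be sketched in Python. This assumes the first wildcard ends up with exactly one word and the second with everything after it, and it assumes the single-word case leaves the remainder empty (the post does not show what the second category sets). `split_me` is a hypothetical name.

```python
from typing import Tuple


def split_me(text: str) -> Tuple[str, str]:
    """Split text into (firstword, remainder), like the two SPLITME categories."""
    words = text.strip().split(None, 1)
    if not words:
        # Nothing to split: no first word and no remainder.
        return "", ""
    if len(words) == 1:
        # Single-word case, analogous to the "SPLITME *" category.
        return words[0], ""
    # Two or more words, analogous to the "SPLITME * *" category.
    return words[0], words[1]


print(split_me("this is a simple sentence"))  # ('this', 'is a simple sentence')
print(split_me("sentence"))                   # ('sentence', '')
```

An empty remainder here plays the same role as the end-of-input test in the stopword loop: once there is nothing left to split, the recursion can stop.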

Hopefully I'll get around to posting some of the next steps... like part of speech tagging (hint: I do a lot of the same stuff with reduction and setting predicates... the trick is to have a good base of words) and then what really matters: how you can use this to have an interesting chatbot!

Please comment/ask questions...