17
Nov 17

NLP with AIML - Part 1 - Stopwords

Folks learning about chatbots and AIML are likely to ask if AIML is a Natural Language Processing (NLP) implementation. The answer is no. However, can one use AIML to perform NLP functions? ABSOLUTELY! It takes some work and creative use of recursion and predicates, but it can be done.

This should be the first in a series of posts that describe how you can perform NLP tasks with AIML. In this post, we'll tackle a very simple concept: removing stopwords.

In NLP, there is the concept of a pre-processing step whereby very common (and potentially meaningless) words are removed. Examples of these words are: is, the, and, I, will, just, so, than... So an input of "this is a simple sentence" becomes "simple sentence."

In most programming languages, we can conceive of looping through and testing each word. AIML doesn't naturally have this feature (OK, technically in AIML 2.0 there is a tag called <loop/>, but with the exception of the example in the standard, which involves counting numbers, I can't see how to use it in other ways - please comment if you have good examples of <loop/>). But we can force AIML to loop using reductions... the <srai> tag. The part where we really need to get creative is how we stop the loop. In this case, I basically have a test for when we're at the end.
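To make the loop-by-recursion idea concrete outside of AIML, here is a rough Python sketch. It is not the AIML from the bot: it only mirrors the shape of the approach (peel off the first word, recurse on the rest, stop when nothing is left). The tiny STOPWORDS set and the function name remove_stopwords are made up for illustration; a real list would be much longer.

```python
# A small hand-picked sample of stopwords, for illustration only.
STOPWORDS = {"this", "is", "a", "the", "and", "i", "will", "just", "so", "than"}


def remove_stopwords(sentence):
    """Drop stopwords by recursing on the remainder of the sentence."""
    # Split into the first word and everything after it.
    parts = sentence.split(None, 1)

    # End-of-input test: no words left, so the recursion stops here.
    if not parts:
        return ""

    first = parts[0]
    rest = parts[1] if len(parts) > 1 else ""

    # "Loop" by calling ourselves on the remainder (the role a reduction plays in AIML).
    cleaned_rest = remove_stopwords(rest)

    if first.lower() in STOPWORDS:
        return cleaned_rest
    return (first + " " + cleaned_rest).strip()


print(remove_stopwords("this is a simple sentence"))
```

The important piece is the empty-input check at the top: without a test for "we're at the end," the recursion would never stop.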

At this point, I think it's better to show than tell...

Here's a flowchart of the method (no criticizing my flowcharting skills... but please ask questions!):

Flowchart for stopword removal in AIML

And even better than that, you can see this at work in two ways:

1) If you have an account on the Pandorabots Playground (playground.pandorabots.com), search in the clubhouse for "NLP Test Bot" and you can chat directly. Here's a screenshot of a chat:
Chat window with NLP Test Bot

2) I have a zip file of 3 AIML files that you can play around with yourself. These are the same files that are loaded in the NLP Test Bot on the Pandorabots Playground. A couple of things to know... the list of stopwords comes from Python's NLTK library. It seemed like a good list. However, given that in AIML we normally expand contractions, half of this list could probably be removed. The other thing to know is that it's AIML v2.0. That's because I am using the <first> and <rest> tags.

Download nlpinaiml.z7

2a) Could this be done with AIML v1.0? Yup. Before the <first> and <rest> tags came into existence, I had a category that looked something like this:

<category>
<pattern>SPLITME * *</pattern>
<template>
<think>
<set name="firstword"><star index="1"/></set>
<set name="remainder"><star index="2"/></set>
</think>
</template>
</category>

 

There was also a second category: SPLITME * which, when both of these existed, would match if * only had one word.

Hopefully I'll get around to posting some of the next steps... like part-of-speech tagging (hint: I do a lot of the same stuff with reduction and setting predicates... the trick is to have a good base of words) and then what really matters: how you can use this to have an interesting chatbot!

Please comment/ask questions...

26
Aug 14

Musings on Ray Kurzweil, Moore's Law, and the not-so-far-off future

I write science fiction. I was with my critique group the other night and one of the gentlemen critiquing my work was concerned with the date I chose for the setting of one of my stories. It was not just the date, but the date combined with the fact that the technology I was positing didn't seem advanced enough.

He cited Ray Kurzweil and Kurzweil's predictions about the integration of non-biological intelligence with human intelligence, and was emphatic enough about it that I decided I needed to re-listen to Kurzweil's TED talks.

I did that today.

I like Kurzweil. I really do. But... I think he goes a little too far. At the center of his talks is Moore's Law. Kurzweil emphatically (and correctly) points out how well Moore's Law has held up over the years, and concludes that it will therefore continue to hold. He also likes to apply the concept of exponential growth to anything digital. Sure. No issue there.

But he makes some unfair and inconsistent extrapolations when he mentions intelligence and our ability to understand intelligence over the coming decades. Just because we are collecting data at an exponential rate and digitizing data at an exponential rate does not imply we are UNDERSTANDING anything, especially intelligence, at anything close to that rate.

While we might have access to an exponentially larger quantity of data than we did in the recent past, while we might be able to compute exponentially faster... we are not exponentially more intelligent.

And we are not going to magically understand intelligence in the coming decades solely based on the rate that our technology is expanding.

Please... don't confuse my assertion that we won't understand intelligence with not understanding brain function. Certainly over the last several decades we've learned a lot about biology and how the brain WORKS. But that's not intelligence. Not by a long shot.

I'm in Jeff Hawkins's camp when he writes in his fantastic book "On Intelligence" that we don't yet have a framework for understanding the brain (in terms of intelligence, not biology... different things) and until we do, we won't be making the fantastic leaps in technology that Kurzweil predicts.

(side note: another post that I hope to get out soon will be on Jeff Hawkins's book and how it makes a fantastic case for AIML)

10
Jan 14

Always use protection...

...for your passwords, of course! (what did you think I meant?)

Twenty years ago when I was in college (OMG... did I really just type that?) I had three passwords to remember. Two for two different email systems I had access to and one for my dorm room voice mail. That was it. There was no online anything. My bank didn't dole out ATM cards till a few years later.

Well, times have changed. Sitting on my computer here, I have a spreadsheet that has 113 passwords. And that doesn't count the half a dozen passwords and PINs I have to remember and use on a daily basis at my day job. I'm sure it also doesn't include a handful of "throw away" passwords... you know those websites that require you to sign in and create an account even though you'll only ever be there that one time? Yeah, there must be a couple dozen or so of those with my name on them.

Everything is fine and dandy when I'm home. But when I travel, I have some issues. You see... I keep my passwords in a spreadsheet (a password protected spreadsheet, of course) - I think I mentioned that.

I'm sometimes a little old school with my computing habits. I haven't fully adjusted to having all my sensitive data in "the cloud." Yes, there are some supposedly really good password keepers out there. But I've never really been comfortable with them and their ability to keep my data in sync between my desktop and other devices (smartphone, laptop, and now tablet). I'm not saying that they aren't safe and secure, I'm only saying that I've never felt comfortable. So I keep my spreadsheet.

Which is a problem when I travel. If I were traveling with my laptop, I'd still be fine. The laptop has two passwords to get on and in, and the spreadsheet is hidden and itself has a password. But these days, if I'm traveling it's for vacation and I rarely need to travel with my laptop since I have that smartphone and tablet. And I'm not comfortable with the Excel readers (and not even sure if they do the password thing).

And while I try to do things like pay my bills before I leave town... I never know when I'm going to need one of those passwords that I don't use too often.

You see where this is going? I needed another way to be comfortable with written passwords.

Enter my little Java app, the Password Obfuscator.

I figured that if I modified my password in a way that only made sense to me, I'd feel comfortable writing it down and comfortable saving it in some cloud-enabled online app that syncs to a website, my phone, etc. (like Evernote; I love Evernote for non-sensitive things). That way, if someone got hold of my password(s), by the time they figured out my scheme (if they figured out my scheme), I'd already have received some email that my account was locked out and voila!

For me, the best way to obfuscate a password was to look at the keyboard and shift all the letters and numbers by some number of keys to the right. So, if my password was "dog" and my shift number was 2 (on the QWERTY keyboard), the password I'd write down would be "gqj" (note the rollover from "p" to "q" at the end of the row). The chance that anyone would get "dog" from seeing that, well... it's not impossible... but small enough for me to be comfortable writing some things down.
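The shifting scheme described above can be sketched in a few lines of Python. This is not the Java app itself (its source isn't shown here), just a guess at the core idea: each keyboard row wraps around on itself, case is preserved, and anything not on a letter or digit row is left alone. The row layout, the shift_digits flag, and the function names obfuscate and recover are assumptions made for this example.

```python
# Rows of a US QWERTY keyboard; each row wraps around on itself.
LETTER_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
DIGIT_ROW = "1234567890"


def shift_char(ch, shift, shift_digits=True):
    """Move one character `shift` keys to the right within its keyboard row."""
    rows = LETTER_ROWS + ([DIGIT_ROW] if shift_digits else [])
    lower = ch.lower()
    for row in rows:
        pos = row.find(lower)
        if pos != -1:
            moved = row[(pos + shift) % len(row)]  # modulo handles the rollover
            return moved.upper() if ch.isupper() else moved
    return ch  # symbols and anything else are left unchanged


def obfuscate(password, shift, shift_digits=True):
    """Shift every character of the password to the right by `shift` keys."""
    return "".join(shift_char(c, shift, shift_digits) for c in password)


def recover(written, shift, shift_digits=True):
    """Undo obfuscate() by shifting the same number of keys to the left."""
    return obfuscate(written, -shift, shift_digits)


print(obfuscate("dog", 2))  # gqj
print(recover("gqj", 2))    # dog
```

Shifting left by the same amount is all it takes to get the original back, which is the by-hand step you'd do at the keyboard when you need the password.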

Anyone could do this with a password or two. But 113? Yeah, it was much easier to write the little Java app that could do it for me.

The app is here:  http://riotsw.com/passwordObfuscator.html

It should be reasonably straightforward. Choose how many characters you want to move your passwords by. Decide if you want to change case. Decide if you want to move your numbers and characters, too. Then copy your passwords into the input, press the "Do it!" button and voila... examine your output. Here's what it looks like:

Picture1

You can further obfuscate things by not tying them directly to their site. For example, if "test" was the password for your bank... well, you're likely not in danger of forgetting what bank is yours, right? So wherever you keep the changed password, just label it "my bank." Someone who finds that isn't going to try every bank in the world.

Then when you need to use the password, you only need to remember how you shifted it. I presume that anyone looking at a keyboard can reverse the process to get their password (and since it's one at a time, it shouldn't be that cumbersome).

A note about passwords... Good passwords are long passwords. Long doesn't necessarily mean complicated or difficult to remember. myMymyMymyMySharona is a better password than My.1!Shar0na#.

The best post and description of why this is true is here: https://www.grc.com/haystack.htm
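For a rough feel of the numbers, here is a small Python sketch of a brute-force "search space" count in the spirit of that page. The character-class sizes (26 lowercase, 26 uppercase, 10 digits, 33 symbols including space) and the function names are assumptions for this example, and the count only models an attacker who tries every possible string; it says nothing about dictionary or pattern-based guessing.

```python
import string


def alphabet_size(password):
    """Add up the sizes of the character classes the password draws from."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation or c == " " for c in password):
        size += 33  # 32 ASCII punctuation marks plus the space
    return size


def search_space(password):
    """Count every string over that alphabet, from length 1 up to the password's length."""
    a = alphabet_size(password)
    return sum(a ** k for k in range(1, len(password) + 1))


long_simple = search_space("myMymyMymyMySharona")
short_complex = search_space("My.1!Shar0na#")
print(long_simple > short_complex)  # True
```

Under these assumptions, the longer letters-only password has a search space millions of times larger than the shorter one with digits and symbols, because the extra length outweighs the bigger alphabet.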

I would love to muse on passwords some more... but it's time to watch Big Bang Theory.
