Enhancing AIML Bots using Semantic Web Technologies

Eric Freese
eric (dot) freese (at) lexisnexis (dot) com

Abstract

AIML [Artificial Intelligence Markup Language] is a derivative of XML [Extensible Markup Language] that enables pattern-based, stimulus-response knowledge content to be served, received and processed on the Web and offline in the manner that is presently possible with HTML [Hypertext Markup Language] and XML. AIML was designed for ease of implementation, ease of use by newcomers, and for interoperability with XML and XML derivatives such as XHTML. Software reads the AIML objects and provides application-level functionality based on their structure. The AIML interpreter is part of a larger application generically known as a bot, which carries the larger functional set of interaction based on AIML. A software module called a responder handles the human-to-bot or bot-to-bot interface work between an AIML interpreter and its object(s).

RDF [Resource Description Framework] is a language for representing information, specifically metadata, about resources in the World Wide Web. The underlying structure of any RDF expression is a collection of triples. Each triple represents a statement of a relationship between the things that it links. Each triple has three parts: a subject, an object, and a predicate (also called a property) that denotes the relationship. In many cases the triples can be used to form simple, human-understandable sentences.

This paper discusses a methodology and provides examples of the conversion of RDF triples to AIML topics and categories which can then be used within an AIML-based bot. The statements representing the domain knowledge can then be used in a conversation handled by the responder. The combination of these two technologies allows the knowledge represented within the RDF to be accessed interactively using natural language by a human user.

Keywords: RDF; Semantic Web

Eric Freese

Eric Freese is a consulting software engineer with LexisNexis. He has nearly 20 years of experience in the areas of document, information, and knowledge management with specific expertise in the development and implementation of XML technologies. His experience includes research, analysis, specification, design, development, testing, implementation, integration and management of information systems in a wide range of environments. He has significant research experience in human interface design, graphics interface development and artificial intelligence. Freese was a founding member of TopicMaps.Org, the organization that developed the XTM [XML Topic Maps] specification, and served as the chairman of this group. He continues to strive to build the Star Trek computer so his mother will finally understand what he does for a living.

Eric Freese [LexisNexis]

Extreme Markup Languages 2007® (Montréal, Québec)

Copyright © 2007 Eric Freese. Reproduced with permission.

Introduction

RDF has been described as the lingua franca for the semantic web. Encoding data in RDF allows computers to do browsing, searching, querying, etc. for human users. These computers will be able to seek out knowledge distributed throughout the web, mesh it together and do something useful with it. RDF can be encoded in several different syntaxes including XML, N3 and Turtle to name a few. While some of these are more human readable than others, a reasonable person would not ask an 8-year-old to read the data and be able to understand it.

Imagine being able to interact with a computer that contains the knowledge stored in RDF files using natural human language. A simple conversation could allow the computer to provide a user with exactly the information that he is looking for without having to search through a multitude of search hits. Parents of young children are told to "Answer the question being asked". This guidance tells us to provide simple answers to simple questions and not a full dissertation on a particular subject. Quite often, the simple answer is the desired result. If more information is required, subsequent questions will follow, to be sure (and hopefully not all of them are "Why?").

The current technology on the web feels more like trying to take a sip from a fire hose than from a cup. Most searches on popular search engines will yield thousands of hits unless you feel lucky. In fact, the trend seems to favor a large number of hits rather than a limited number of high-quality matches. The goal of the project described in this paper is to allow users to access large amounts of knowledge by interacting with a system using natural language. Information is delivered in small chunks. The user can receive additional information by requesting more detail or by asking further questions.

ALICE - A Quick Introduction

ALICE [Artificial Linguistic Internet Computer Entity] was developed in the late 1990s by Dr. Richard Wallace. The goal of the project was to bridge the divide between human and computer interaction. The architecture is minimalist. The core program itself is rather compact, but allows for the data to be as large or as small as necessary for a given task.

When you chat with ALICE you will find that she likes to think of herself as a sentient entity, claims to know a lot of gossip and expresses a fondness for Dr. Wallace. For his work on ALICE, Wallace was awarded the Loebner Prize in 2000, 2001, and 2004. Based on the Turing Test, the prize is awarded to the most "human" computer program. The Turing Test is a proposal for a test of a machine's capability to demonstrate thought. Described by Professor Alan Turing in the 1950 paper "Computing Machinery and Intelligence," it proceeds as follows: a human judge engages in a natural language conversation with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, then the machine is said to pass the test. It is assumed that both the human and the machine try to appear human. The winner of the annual contest is the best entry relative to other entries that year, irrespective of how good it is in an absolute sense. However, even the casual user will often expose a computer's mechanistic aspects in short conversations.[TT]

Those familiar with early attempts at communication with machine intelligence may remember a system called ELIZA. ELIZA was a kind of computerized psychiatrist written in the 1960s by Professor Joseph Weizenbaum at the Massachusetts Institute of Technology. ELIZA was an exercise in human response and natural language communication with so-called machine intelligence. There are still versions of ELIZA in existence today. Is ALICE just ELIZA with angle brackets? Conceptually ALICE is not much more complicated than ELIZA. The main difference is the ability to define customizable AIML files that can be added to ALICE, making her more extensible and customizable than ELIZA was. Does this minimize the potential use of ALICE in real-world applications? I think not and hope to demonstrate that in this paper.

AIML - A Primer

AIML describes a class of data objects called AIML objects and partially describes the behavior of computer programs that process them. AIML is a derivative of XML.

AIML objects are made up of units called topics and categories, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form AIML elements. AIML elements encapsulate the stimulus-response knowledge contained in the document. Character data within these elements is sometimes parsed by an AIML interpreter, and sometimes left unparsed for later processing by a responder.

Categories

The basic unit of knowledge in AIML is called a category. Each category consists of an input question, an output answer, and an optional context. The question, or stimulus, is called the pattern. The answer, or response, is called the template. There are two optional methods for defining context using the <that> and <topic> markup. The <that> tag appears inside a category, and its pattern must match the robot's last utterance. Remembering one last utterance is important if the robot asks a question. The <topic> tag appears outside the category, and collects a group of categories together. The topic may be set inside any template. The use of these will be shown in more detail below.

The pattern language within AIML is simple, consisting only of words, spaces, and the wildcard symbols "_" and "*". The words may consist of letters and numerals, but no other characters. The pattern language is case insensitive. Words within the pattern are separated by a single space, and the wildcard characters function like words.
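The pattern language described above can be illustrated with a minimal Python sketch. It is not the matcher used by any particular AIML interpreter: the function names `normalize` and `match` are hypothetical, both wildcards are assumed to absorb one or more words, and the choice between competing categories is left out entirely. Real interpreters also apply substitutions during normalization that are omitted here.

```python
import re


def normalize(text):
    """Uppercase the input and keep only letters, digits and spaces."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", text)
    return cleaned.upper().split()


def match(pattern, words):
    """Return the wildcard bindings if `pattern` matches `words`, else None.

    Both arguments are lists of words. In this sketch "_" and "*" are
    treated alike: each absorbs one or more words (an assumption).
    """
    if not pattern:
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head in ("_", "*"):
        # Try every non-empty run of words as the wildcard's binding.
        for n in range(1, len(words) + 1):
            tail = match(rest, words[n:])
            if tail is not None:
                return [" ".join(words[:n])] + tail
        return None
    if words and words[0] == head:
        return match(rest, words[1:])
    return None


# Patterns are split directly; only user input goes through normalize().
bindings = match("DO YOU KNOW WHO * IS".split(),
                 normalize("Do you know who Socrates is?"))
```

Because the input is uppercased and stripped of punctuation before matching, the sketch is case insensitive in the same sense as the pattern language, and the wildcard binding is what a template would later retrieve.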

The template language is also designed to represent the response as simply as possible for the task at hand. In its simplest form, the template consists of only plain, unmarked text. More generally, AIML tags allow the reply to save data, activate other programs, give conditional responses, and recursively call the pattern matcher to insert the responses from other categories. Most AIML tags belong to this template side sublanguage.

AIML supports two ways to interface other languages and systems. The <system> tag executes any program accessible as an operating system shell command, and inserts the results in the reply. The <javascript> tag allows arbitrary scripting inside the templates.

AIML processing is similar to querying a simple database of questions and answers. However, the pattern matching "query" language is much simpler than something like SQL. A category template may contain the recursive <srai> tag, so that the output depends not only on one matched category, but also any others recursively reached through <srai>.

Recursion

AIML implements recursion with the <srai> operator. No agreement exists about the meaning of the acronym. The "AI" stands for artificial intelligence, but "SR" may mean "stimulus-response," "syntactic rewrite," "symbolic reduction," "simple recursion," or "synonym resolution." The disagreement over the acronym reflects the variety of applications for <srai> in AIML. Each of these is described in more detail in a subsection below:

  1. Symbolic Reduction: Reduce complex grammatical forms to simpler ones.
  2. Divide and Conquer: Split an input into two or more subparts, and combine the responses to each.
  3. Synonyms: Map different ways of saying the same thing to the same reply.
  4. Spelling or grammar corrections.
  5. Detecting keywords anywhere in the input.
  6. Conditionals: Certain forms of branching may be implemented with <srai>.
  7. Any combination of (1)-(6).

The danger of <srai> is that it could permit the creation of infinite loops. Although there are some risks, the <srai> tag is much simpler than any of the iterative block structured control tags which might have replaced it.
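One common way to contain the risk of runaway recursion is a depth limit. The following Python sketch is an illustration only, not a description of how ALICE handles the problem: the limit `MAX_DEPTH`, the exception `SraiLoopError`, and the representation of a template as a list of text fragments and `("srai", ...)` tuples are all assumptions, and patterns are matched exactly, without wildcards, to keep the example short.

```python
MAX_DEPTH = 10  # hypothetical limit on nested <srai> calls


class SraiLoopError(RuntimeError):
    """Raised when <srai> recursion goes deeper than MAX_DEPTH."""


def respond(categories, text, depth=0):
    """Build a reply, following ("srai", target) parts recursively."""
    if depth > MAX_DEPTH:
        raise SraiLoopError("srai recursion too deep at: " + text)
    template = categories.get(text.upper())
    if template is None:
        return ""
    reply = []
    for part in template:
        if isinstance(part, tuple) and part[0] == "srai":
            # Re-enter the matcher with the rewritten input.
            reply.append(respond(categories, part[1], depth + 1))
        else:
            reply.append(part)
    return "".join(reply)


# Several inputs reduced to one base category.
GREETINGS = {
    "HELLO": ["Hi there!"],
    "HI": [("srai", "HELLO")],
    "HOWDY": [("srai", "HELLO")],
}

# Two categories that rewrite to each other and never terminate.
LOOP = {
    "PING": [("srai", "PONG")],
    "PONG": [("srai", "PING")],
}
```

With the guard in place, `respond(GREETINGS, "howdy")` resolves through the base category, while `respond(LOOP, "ping")` raises `SraiLoopError` instead of recursing indefinitely.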

Symbolic Reduction

Symbolic reduction refers to the process of simplifying complex grammatical forms into simpler ones. Usually, the atomic patterns in categories storing bot knowledge are stated in the simplest possible terms. For example, we tend to prefer patterns like "WHO IS SOCRATES" to ones like "DO YOU KNOW WHO SOCRATES IS" when storing biographical information about Socrates.

Many of the more complex forms reduce to simpler forms using AIML categories designed for symbolic reduction:

<category>
  <pattern>DO YOU KNOW WHO * IS</pattern>
  <template>
    <srai>WHO IS <star/></srai>
  </template>
</category>

Whatever input matched this pattern, the portion bound to the wildcard "*" may be inserted into the reply with the markup <star/>. This category reduces any input of the form "Do you know who X is?" to "Who is X?"

Divide and Conquer

Many individual sentences may be reduced to two or more subsentences, and the reply formed by combining the replies to each. A sentence beginning with the word "Yes" for example, if it has more than one word, may be treated as the subsentence "Yes." plus whatever follows it.

<category>
  <pattern>YES *</pattern>
  <template>
    <srai>YES</srai> <sr/>
  </template>
</category>

The markup <sr/> is simply an abbreviation for <srai><star/></srai>. The net effect is that the response will be the combination of 2 separate responses based on how the stimulus is divided. This may have unexpected results, or in many cases produce correct results that the botmaster might never have taken into account. For example:

Client: I bet you are gay.
ALICE: Actually I am not the gambling type.  Actually as a machine
  I have no need for sex.

In this case ALICE linked two different categories which both coincidentally have a moral theme. But this specific combination was not "preprogrammed" in the AIML files. Is this artificial intelligence or coincidence? It is possible to argue both sides but as much, if not more, knowledge is gained by unexpected results as it is from an original hypothesis.

Synonyms

The AIML standard does not permit more than one pattern per category. Synonyms are perhaps the most common application of <srai>. Many ways to say the same thing can be reduced to a single base category, which contains the reply:

<category>
  <pattern>HELLO</pattern>
  <template>Hi there!</template>
</category>

<category>
  <pattern>HI</pattern>
  <template>
    <srai>HELLO</srai>
  </template>
</category>

<category>
  <pattern>HI THERE</pattern>
  <template>
    <srai>HELLO</srai>
  </template>
</category>

<category>
  <pattern>HOWDY</pattern>
  <template>
    <srai>HELLO</srai>
  </template>
</category>

<category>
  <pattern>HOLA</pattern>
  <template>
    <srai>HELLO</srai>
  </template>
</category>

Spelling and Grammar Correction

The single most common client spelling mistake is the use of "your" when "you're" or "you are" is intended. Not every occurrence of "your," however, should be turned into "you're." A small amount of grammatical context is usually necessary to catch this error:

<category>
  <pattern>YOUR A *</pattern>
  <template>I think you mean "you're" or "you are" not "your."
    <srai>YOU ARE A <star/></srai>
  </template>
</category>

Here the bot both corrects the client input and acts as a language tutor.

Keywords

Frequently it is useful to write an AIML template which is activated by the appearance of a keyword anywhere in the input sentence. The general format of four AIML categories is illustrated by this example borrowed from ELIZA:

<category>
  <pattern>MOTHER</pattern>
  <template> Tell me more about your family. </template>
</category>

<category>
  <pattern>_ MOTHER</pattern>
  <template>
    <srai>MOTHER</srai>
  </template>
</category>

<category>
  <pattern>MOTHER _</pattern>
  <template>
    <srai>MOTHER</srai>
  </template>
</category>

<category>
  <pattern>_ MOTHER *</pattern>
  <template>
    <srai>MOTHER</srai>
  </template>
</category>

The first category both detects the keyword when it appears by itself, and provides the generic response. The second category detects the keyword as the suffix of a sentence. The third detects it as the prefix of an input sentence, and finally the last category detects the keyword anywhere within the sentence. Each of the last three categories uses <srai> to link to the first, so that all four cases produce the same reply, but it needs to be written and stored only once.

Conditionals

It is possible to write conditional branches in AIML, using only the <srai> tag. Consider three categories:

<category>
  <pattern>WHO IS HE</pattern>
  <template>
    <srai>WHOISHE <get name="he"/></srai>
  </template>
</category>

<category>
  <pattern>WHOISHE *</pattern>
  <template>He is <get name="he"/>.</template>
</category>

<category>
  <pattern>WHOISHE UNKNOWN</pattern>
  <template>I don't know who he is.</template>
</category>

Provided that the predicate "he" is initialized to "Unknown," the categories execute a conditional branch depending on whether "he" has been set. As a convenience to the botmaster, AIML also provides the equivalent function through the <condition> tag.

Targeting

Bot personalities are created and shaped through a cyclical process of supervised learning called targeting. Targeting is a cycle incorporating user, bot, and botmaster. Whenever the user inputs something that finds no complete match among the categories, the statement is logged by the bot and delivered as a target to the botmaster. The botmaster then creates suitable responses, starting with the most common queries. This cycle produces a progressively more refined bot personality.
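The bookkeeping side of the targeting cycle can be sketched in a few lines of Python. This is a hypothetical illustration rather than the logging mechanism of any AIML interpreter: the class `TargetLog` and its methods are invented names, and inputs are grouped only by uppercasing and collapsing whitespace.

```python
from collections import Counter


class TargetLog:
    """Collects inputs that found no complete match, for the botmaster."""

    def __init__(self):
        self._counts = Counter()

    def record(self, user_input):
        # Group inputs that differ only in case or spacing.
        key = " ".join(user_input.upper().split())
        self._counts[key] += 1

    def top_targets(self, n=10):
        """Return the n most frequent unmatched inputs with their counts."""
        return self._counts.most_common(n)


log = TargetLog()
log.record("Where is the library")
log.record("where is  the library")
log.record("What is a triple")
targets = log.top_targets(1)
```

Sorting by frequency reflects the practice of writing responses for the most common queries first.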

The art of AIML writing is most apparent in creating default categories, which provide noncommittal replies to a wide range of inputs. This example allows the user to train the bot on the fly:

User: Where do I find the web page with Australopithecus?
ALICE: Where do you want to find it?
User: Bad Answer Alice, time to Learn a new Line
ALICE: How would you answer "Where do I find the web page with
  Australopithecus?"
User: I would say: "http://www.ancientskulls.net/australopithecus.html"
ALICE: Thanks, human. Try asking me again.
User: Where do I find the web page with Australopithecus?
ALICE: http://www.ancientskulls.net/australopithecus.html

This is exactly the same as targeting when the botmaster is the teacher. More accurately, it is an AIML interface to targeting. This scenario raises many problems, such as how to weed out hackers and other garbage givers. One answer might be to turn on the Learn function only within a restricted sandbox, although the free labor of many chatters on the web could help the bot grow wise more quickly.

That is the crux of the learning problem in a nutshell. It all comes down to editing. Either the botmaster edits good quality replies to begin with, or if the bot is "open to the public" then the teacher has to edit out all the unsatisfactory answers supplied by all that "free" labor. Based on the experience of many AIML bot developers, compared with a hypothetical learning machine capable of learning language like a child, the AIML targeting approach appears to be more efficient.

Context

The element <that> in AIML refers to the robot's previous utterance. Specifically, if the robot responds with a multiple sentence paragraph, the value of <that> is set to the last sentence in the sequence. The choice of the keyword "that" is motivated by its use in ordinary language:

User: Today is yesterday.
ALICE: That makes no sense.
User: The answer is 3.1412926 approximately.
ALICE: That is cool.

In AIML the syntax <that>...</that> encloses a pattern that matches the robot's previous utterance. A common application of <that> is found in yes-no questions:

<category>
  <pattern>YES</pattern>
  <that>DO YOU LIKE MOVIES</that>
  <template>What is your favorite movie?</template>
</category>

This category is activated when the client says YES. The robot must be able to find out what the client is saying "yes" to. If the robot asked, "Do you like movies?" this category matches, and the response, "What is your favorite movie?" allows the conversation to continue along the same lines.

One interesting application of <that> is a set of categories that enable the robot to respond to knock-knock jokes. The categories:

<category>
  <pattern>KNOCK KNOCK</pattern>
  <template>Who's there?</template>
</category>

<category>
  <pattern>*</pattern>
  <that>WHOS THERE</that>
  <template><person/> who?</template>
</category>

<category>
  <pattern>*</pattern>
  <that>* WHO</that>
  <template>Ha ha very funny, <get name="name"/>.</template>
</category>

produce the following dialog:

User: Knock knock.
ALICE: Who's there?
User: Banana.
ALICE: Banana who?
User: Knock knock.
ALICE: Who's there?
User: Banana.
ALICE: Banana who?
User: Knock knock.
ALICE: Who's there?
User: Orange.
ALICE: Orange who?
User: Orange you glad I didn't say banana.
ALICE: Ha ha very funny, Nancy.

Internally the AIML interpreter stores the input pattern, that pattern and topic pattern along a single path, like: INPUT <that> THAT <topic> TOPIC. When the values of <that> or <topic> are not specified, the program implicitly sets the values of the corresponding THAT or TOPIC pattern to the wildcard "*".

The first part of the path to match is the input. If more than one category has the same input pattern, the program may distinguish between them depending on the value of <that>. If two or more categories have the same <pattern> and <that>, the final step is to choose the reply based on the <topic>.
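The single-path idea can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not the data structure used by ALICE: the helper names are hypothetical, candidate categories are simply tried in order of fewest wildcards, and an empty <that> or <topic> value on the input side is represented by a placeholder word so that a wildcard has something to bind to.

```python
import re


def words(text):
    """Uppercase the text and keep only letters, digits and spaces."""
    return re.sub(r"[^A-Za-z0-9 ]+", " ", text).upper().split()


def input_path(text, that="", topic=""):
    """Path for a user input plus the current conversational context."""
    return (words(text) + ["<that>"] + (words(that) or ["*"])
            + ["<topic>"] + (words(topic) or ["*"]))


def category_path(pattern, that="*", topic="*"):
    """Path for a category; unspecified context defaults to "*"."""
    return pattern.split() + ["<that>"] + that.split() + ["<topic>"] + topic.split()


def matches(pattern, tokens):
    """True if the pattern path matches the token path."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head in ("_", "*"):
        # A wildcard absorbs one or more tokens (assumption of this sketch).
        return any(matches(rest, tokens[n:]) for n in range(1, len(tokens) + 1))
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])


def select(categories, text, that="", topic=""):
    """Return the template of the most specific matching category, or None."""
    target = input_path(text, that, topic)
    ranked = sorted(categories,
                    key=lambda c: sum(1 for w in c[0] if w in ("_", "*")))
    for path, template in ranked:
        if matches(path, target):
            return template
    return None


CATEGORIES = [
    (category_path("YES", that="DO YOU LIKE MOVIES"), "What is your favorite movie?"),
    (category_path("YES"), "I see."),  # hypothetical fallback category
]

reply = select(CATEGORIES, "Yes", that="Do you like movies?")
```

Both categories share the input pattern YES, so the choice between them is made by the THAT portion of the path, which is the situation described above.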

This structure seems to suggest a design rule: never use <that> unless you have written two categories with the same <pattern>, and never use <topic> unless you write two categories with the same <pattern> and <that>. That being said, one of the most useful applications for <topic> is to create subject-dependent conversation starters like:

<topic name="CARS">
  <category>
    <pattern>*</pattern>
    <template>
      <random>
        <li>What's your favorite car?</li>
        <li>What kind of car do you drive?</li>
        <li>Do you get a lot of parking tickets?</li>
        <li>My favorite car is one with a driver.</li>
      </random>
    </template>
  </category>
</topic>

The <set> element allows the program to add further context as the discussion progresses. The <get> element allows the program to "remember" what has been said in the past. Consider the following example:

<category>
  <pattern>DO YOU LIKE ICE CREAM</pattern>
  <template>What is not to like about
    <set name="it"><set name="topic">ice cream</set></set>?
  </template>
</category>

<topic name="ice cream">
  <category>
    <pattern>WHAT IS YOUR FAVORITE FLAVOR</pattern>
    <template>My favorite flavor of <get name="topic"/> is chocolate
      peanut butter.</template>
  </category>
</topic>

In this case the user has asked the program about ice cream. The first <set> element is used to tell the computer that the word "it" now refers to "ice cream". The second <set> element sets the current topic to "ice cream." This allows the program to filter any responses using the <topic> markup discussed previously. In the second template, the <get> element allows the topic ("ice cream") to be inserted into the response. The <think> element can also be used to set the context. The difference between that and the method described above is that the contents of the <think> element are processed by the program but not displayed to the user.

Considering the vast size of the set of things people could say that are grammatically correct or semantically meaningful, the number of things people actually do say is surprisingly small. Steven Pinker, in his book How the Mind Works wrote, "Say you have ten choices for the first word to begin a sentence, ten choices for the second word (yielding 100 two-word beginnings), ten choices for the third word (yielding a thousand three-word beginnings), and so on. (Ten is in fact the approximate geometric mean of the number of word choices available at each point in assembling a grammatical and sensible sentence). A little arithmetic shows that the number of sentences of 20 words or less (not an unusual length) is about 10^20."

Experience has shown that for chat robot programmers, Pinker's calculations, while mathematically accurate, do not necessarily model the real world. Experiments with ALICE indicate that the number of choices for the "first word" is more than ten, but it is only about two thousand. Specifically, about 2000 words cover 95% of all the first words input to ALICE. The number of choices for the second word is only about two. To be sure, there are some first words ("I" and "You" for example) that have many possible second words, but the overall average is just under two words. The average branching factor decreases with each successive word.
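The two estimates can be compared with a short calculation. The Python sketch below is a back-of-the-envelope illustration, with a hypothetical function name, that reproduces Pinker's figure and then applies the ALICE observations under a loud assumption: it holds the branching factor at two for every word after the first, whereas the text notes that the factor keeps decreasing, so the second figure is if anything an overestimate.

```python
def sentence_count(first_word_choices, later_word_choices, max_len=20):
    """Count word sequences of length 1..max_len for fixed branching factors."""
    total = 0
    for length in range(1, max_len + 1):
        total += first_word_choices * later_word_choices ** (length - 1)
    return total


# Pinker: ten choices at every position, sentences of 20 words or less.
pinker = sentence_count(10, 10)    # roughly 1.1 x 10^20

# ALICE: about 2000 first words, then about two choices per later word
# (assumed constant here; the observed factor decreases further).
alice = sentence_count(2000, 2)    # roughly 2.1 x 10^9
```

Under these assumptions the space of inputs shrinks by about eleven orders of magnitude, which is why a finite set of categories can cover so much of what people actually say.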

A plot of the core of the ALICE brain is shown below and can be found online at http://www.alicebot.org/documentation/gallery/. The spiral images outline a territory of language that has been effectively "conquered" by ALICE, and AIML. These plots also provide guidance for adding new information to the knowledge base.

It is possible to concoct riddles and linguistic paradoxes that show how difficult the natural language problem is. "John saw the mountains flying over Zurich" or "Fruit flies like a banana" reveal the ambiguity of language and the limits of an ALICE-style approach. However, ALICE already knows how to respond to these. It is believed that the basic outline of the spiral graphs will look much the same. The "big trees" from "A *" to "YOUR *" may become bigger, but unless the English language itself changes we won't find any more big trees. The territory of language "understood" by ALICE contains a large percentage of the population of sentences that people commonly use. Expanding the borders even more we will continue to absorb the stragglers outside, until the very last human critic cannot think of one sensible sentence to "fool" ALICE.

RDF - A Quick Primer

RDF is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web. Examples include information about items available from on-line shopping facilities (e.g., information about specifications, prices, and availability), or the description of a Web user's preferences for information delivery.

RDF is intended for situations in which this information needs to be processed by applications, rather than being only displayed to people. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created.

RDF is based on the idea of identifying things using Web identifiers (called URIs [Uniform Resource Identifiers]), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values.

The figure above illustrates that RDF uses URIs to identify:

  • individuals, e.g., Eric Miller, identified by http://www.w3.org/People/EM/contact#me
  • kinds of things, e.g., Person, identified by http://www.w3.org/2000/10/swap/pim/contact#Person
  • properties of those things, e.g., mailbox, identified by http://www.w3.org/2000/10/swap/pim/contact#mailbox
  • values of those properties, e.g. mailto:em@w3.org as the value of the mailbox property (RDF also uses character strings such as "Eric Miller", and values from other data types such as integers and dates, as the values of properties)

The graph above can be represented in natural language by the following group of statements "there is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is em@w3.org, and whose title is Dr.". Using RDF/XML syntax the statements would be marked up as follows:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>

RDF provides a general, flexible method to decompose any knowledge into small pieces, called triples, with some rules about the semantics (meaning) of those pieces.

The foundation is breaking knowledge down into a labeled, directed graph. Each edge in the graph represents a fact, or a relation between two things. The connection in the example from the node "http://www.w3.org/People/EM/contact#me" labeled "http://www.w3.org/2000/10/swap/pim/contact#mailbox" to the node "mailto:em@w3.org" represents the fact that Eric Miller has an email address of "mailto:em@w3.org." A fact represented this way has three parts: a subject, a predicate (i.e., verb), and an object. The subject is what's at the start of the connection, the predicate is the type of connection (its label), and the object is what's at the end of the connection.

The set of documents that makes up the RDF specification tells us two important things. First, it outlines the abstract model, i.e., how to use triples to represent knowledge about the world. Second, it describes how to encode those triples in XML.

Most of the abstract model of RDF comes down to four simple rules:

  1. Facts can be expressed as Subject-Predicate-Object triples, also known as statements. These facts are like little English sentences.
  2. Subjects, predicates, and objects are given as names for entities, also called resources (dating back to RDF's application to metadata for web resources) or nodes (from graph terminology). Entities represent something, a person, website, or something more abstract like states and relations.
  3. Names are URIs, which are global in scope, always referring to the same entity in any RDF document in which they appear.
  4. Objects can also be given as text values, called literal values, which may or may not be typed using XML Schema data types.

Table 1
Start Node (Subject) | Connector (Predicate) | End Node (Object)
http://www.w3.org/People/EM/contact#me | http://www.w3.org/2000/10/swap/pim/contact#fullName | Eric Miller
http://www.w3.org/People/EM/contact#me | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2000/10/swap/pim/contact#Person
http://www.w3.org/People/EM/contact#me | http://www.w3.org/2000/10/swap/pim/contact#mailbox | mailto:em@w3.org
http://www.w3.org/People/EM/contact#me | http://www.w3.org/2000/10/swap/pim/contact#personalTitle | Dr.

The table above illustrates the set of statements made in the graph. Each row in the triples table represents a fact. This satisfies the need for being able to represent knowledge as a graph.
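The correspondence between the RDF/XML markup and the rows of Table 1 can be checked with a short Python sketch that uses only the standard library. It is deliberately not a general RDF/XML parser: the function names are hypothetical, and the code assumes the simple shape of the Eric Miller example, namely typed node elements carrying rdf:about whose children are either literal-valued or rdf:resource-valued properties. It also makes no distinction between URI and literal objects.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for the simple shape above."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        # The element name of a typed node supplies an rdf:type statement.
        triples.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF_NS)
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, uri(prop.tag), value))
    return triples


triples = extract_triples(RDF_XML)
```

The four tuples produced correspond to the four rows of the table, although in document order rather than table order. A production system would rely on a full RDF toolkit instead.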

URIs provide the globally unique, distributed naming system we need for distributed knowledge. URIs can have the same syntax or format as website addresses (URLs [Uniform Resource Locators]), so you will see RDF files that contain URIs, such as http://www.w3.org/1999/02/22-rdf-syntax-ns#type. The fact that it looks like a web address is totally incidental. There may or may not be an actual website at that address, and it doesn't matter for RDF. It is just a very verbose identifier. (Although sometimes there is something useful at the address.) There are also other types of URIs besides http: URIs, such as URNs and TAGs, which you'll see below. URIs are used as global names because they provide a way to break down the space of all possible names into units that have obvious owners. Since URIs can be quite long, in RDF notations they're usually abbreviated using the concept of namespaces from XML.

Literal values, like "Eric Miller," allow text to be included in RDF. This is used heavily when RDF is used for metadata, its original purpose. In fact, literal values are primarily what tie RDF to the real world, since URIs are just arbitrary strings.

These concepts form most of the abstract RDF model for encoding knowledge. It's analogous to the common API [Application Programming Interface] that most XML libraries provide. If it weren't for curious humans always peeking into files, the actual format of XML wouldn't matter so much as long as we had our appendChild, setAttribute, etc. Of course, we do need a common file format for exchanging data, which RDF provides.

Introducing AIMEE

At last year's Extreme Markup Languages conference I presented Semetag, which collects metadata from common applications using RDF and manages it in an integrated package. In the time since the last conference Semetag has migrated from a standalone desktop application to a web-based application (http://www.semetag.com). One portion of the system that was not complete last year was an interface between an AIML-based chat system and the RDF metadata that had been collected and managed by the system. The result is AIMEE [Artificially Intelligent Metadata Enabled Entity]. AIMEE consists of the basic ALICE knowledge base, but is also connected to the components of the Semetag system in order to "learn" from their metadata and other metadata that is collected from the Internet. This section will discuss some of the more familiar RDF-based standards such as Dublin Core and FOAF, but it is planned that many other applications (such as Musicbrainz, CIA [Central Intelligence Agency] World Factbook, etc.) will also be incorporated into the system prior to the conference.

Semetag uses the Jena Semantic Web framework to manage the RDF metadata stored within the application. A JDBC-enabled database (MySQL) is used to allow persistence of the data models. Jena includes an RDF API, an OWL [Web Ontology Language] API, query capabilities using RDQL [RDF Data Query Language] and a rule-based inference engine.

The AIML engine described within the paper also resides on the server and stores its knowledge base within MySQL. Java and PHP are used to tie the two systems together.

Creating AIML categories from RDF Triples

As we have seen in the RDF discussion, RDF triples look very much like simple sentences. In theory it should be relatively straightforward to take the triples and create simple statements from them. These simple statements could form the basis of a set of <template> elements within an AIML knowledge set. Based on the RDF example shown in the previous section, it is possible to build simple statements such as "Eric Miller's email address is em@w3.org." or "Eric Miller's title is 'Dr.'" Further inferences could also be made based on domain knowledge, such as "Eric Miller works for W3C." The other, perhaps more difficult, part of the equation is the development of the <pattern> elements to trigger the template responses.

In order to create the patterns that a user might enter, it is necessary to be familiar with the RDF data being added to AIMEE's brain. This can be done through analysis of the RDF data itself. However, if an RDF or OWL schema is available for the data set, the quality of the patterns can be increased significantly. Returning again to the RDF example, we see that the resource identified as "http://www.w3.org/People/EM/contact#me" is of type "http://www.w3.org/2000/10/swap/pim/contact#Person". This allows us to know that questions about this resource might begin with "Who". There are other metadata items included that would indicate questions that begin with "What". Using the email example, the following AIML category can be created:

<category>
  <pattern>WHAT IS ERIC MILLERS EMAIL ADDRESS</pattern>
  <template>
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set>'s email address is <a href="mailto:em@w3.org">em@w3.org</a>
  </template>
</category>

The category shown above establishes a simple question and response dialog. If the user enters "What is Eric Miller's email address?", the bot will respond "Eric Miller's email address is em@w3.org." with the address linkable if a web browser is being used as the display mechanism. However, the only way to get this response is to enter the question exactly as shown. AIML allows patterns to be set up which provide more flexible conversations.
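The step from a triple to a category can be sketched in a few lines of Python. This is a minimal illustration, not AIMEE's actual generator: the function names are hypothetical, the property phrase ("email address") is assumed to have been chosen already, and the <set> and <a> markup of the example above is left out for brevity.

```python
import re
from xml.sax.saxutils import escape


def normalize_pattern(text):
    """Upper-case the text and strip punctuation, the form used by the patterns above."""
    no_punct = re.sub(r"[^A-Za-z0-9 ]", "", text)
    return " ".join(no_punct.upper().split())


def make_category(pattern_text, template_text):
    """Wrap a question/answer pair in a single AIML <category>."""
    return (
        "<category>\n"
        f"  <pattern>{normalize_pattern(pattern_text)}</pattern>\n"
        f"  <template>{escape(template_text)}</template>\n"
        "</category>"
    )


def category_from_triple(subject_name, property_phrase, literal):
    """Phrase one (subject, property, literal) statement as a 'What is ...' category."""
    question = f"What is {subject_name}'s {property_phrase}"
    answer = f"{subject_name}'s {property_phrase} is {literal}"
    return make_category(question, answer)


aiml = category_from_triple("Eric Miller", "email address", "em@w3.org")
print(aiml)
```

The pattern is normalized separately from the template because only the pattern has to be upper-case and free of punctuation; the template keeps the natural spelling that the user will read.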

The categories shown below extend the previous example to allow other forms of query:

<category>
  <pattern>WHAT IS ERIC MILLERS EMAIL ADDRESS</pattern>
  <template>
    <srai>ERIC MILLER EMAIL</srai>
  </template>
</category>

<category>
  <pattern>ERIC MILLER EMAIL</pattern>
  <template>
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set>'s email address is <a href="mailto:em@w3.org">em@w3.org</a>
  </template>
</category>

<category>
  <pattern>PLEASE GIVE * ERIC MILLERS EMAIL *</pattern>
  <template>
    <srai>ERIC MILLER EMAIL</srai>
  </template>
</category>

<category>
  <pattern>_ ERIC MILLERS EMAIL * </pattern>
  <template>
    <srai>ERIC MILLER EMAIL</srai>
  </template>
</category>

<category>
  <pattern>_ ERIC MILLER * EMAIL * </pattern>
  <template>
      <srai>ERIC MILLER EMAIL</srai>
  </template>
</category>

The first category is nearly identical to the previous example. The main difference is that it now references a generic category (the 2nd one) using the <srai> tag. The third category will allow the bot to respond to a request such as "Please give me Eric Miller's email address". The "*" character allows any number of words to appear in its place and still match the pattern. Unlike the "_" character, the "*" also holds the text in memory for use in the <star> element within the <template>. The fourth category will allow the bot to respond to a request such as "Can you tell me Eric Miller's email address?" or "Give me Eric Miller's email." The "_" character specifies that any number of words can occur prior to the specified pattern. The last category will allow the bot to respond to a request such as "How do I contact Eric Miller by email?" The last three categories also reference the 2nd generic category making it possible to define a single response to any number of possible queries (see the "Synonyms" recursion discussion earlier in this paper). This makes the task of maintaining the knowledge base much easier.
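Because every variant simply redirects to the generic category, the variants themselves can be produced mechanically. The following Python sketch assumes a fixed list of pattern shapes (taken from the categories above) and a naive possessive formed by appending "S"; both the list and the helper names are illustrative assumptions.

```python
def srai_category(pattern, target):
    """Build a category whose only job is to redirect to `target` through <srai>."""
    return (
        "<category>\n"
        f"  <pattern>{pattern}</pattern>\n"
        f"  <template><srai>{target}</srai></template>\n"
        "</category>"
    )


# Pattern shapes copied from the categories above; {name} and {possessive}
# are filled in for each person.
VARIANT_SHAPES = [
    "WHAT IS {possessive} EMAIL ADDRESS",
    "PLEASE GIVE * {possessive} EMAIL *",
    "_ {possessive} EMAIL *",
    "_ {name} * EMAIL *",
]


def email_variants(name):
    """Return the redirecting categories for one person's email question."""
    upper = name.upper()
    target = f"{upper} EMAIL"
    # Naive possessive: "ERIC MILLER" becomes "ERIC MILLERS".
    possessive = upper + "S"
    return [
        srai_category(shape.format(name=upper, possessive=possessive), target)
        for shape in VARIANT_SHAPES
    ]


variants = email_variants("Eric Miller")
print("\n\n".join(variants))
```

New phrasings encountered in the conversation logs would then be handled by adding one more shape to the list and regenerating the categories.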

The keys to being able to build a reasonably complete knowledge base are two-fold. First is the domain knowledge of the RDF or OWL schema being used for a particular set of data. Second is knowledge of the tendencies of the users in how they interact with the bot.

If other queries are encountered that the bot does not know how to handle, they can be added to the core template for the RDF property and that portion of AIMEE's "brain" can be rebuilt.

The key to being able to convert RDF triples to AIML that makes sense from a natural language viewpoint is the schema for the RDF. A logical starting point for the creation of AIML patterns and templates is the rdfs:label for the subject, property and object of each triple.

By knowing the constraints defined within the schema, we can determine that the property contact:fullName contains a string ("Eric Miller") that can be used as a textual representation for the resource with the URI "http://www.w3.org/People/EM/contact#me". We also see that this resource has a type represented by the URI "http://www.w3.org/2000/10/swap/pim/contact#Person". This allows us to assume that we can phrase pattern questions using the pronoun "Who", such as "Who is Eric Miller?". Unlike Jeopardy, not all patterns need to be in the form of a question. Some could be statements, such as "Tell me about Eric Miller."

The rdfs:label for the property presents a different kind of challenge. In English, the property usually contains a full phrase that ties the subject and object together. The schema may or may not include a label that is complete enough for this purpose and, therefore, could be extended. There are also instances where simple extension is not enough and a query must be used to fully build the pattern and template. For example, the rdfs:label for contact:mailbox might only consist of the word "email". In order to create a statement like "Eric Miller's email address is em@w3.org", the rdfs:label for the property must be extended to "email address is". Another solution would be to use a query mechanism to find the triples and format the patterns and templates based on the query results. This will be discussed in more detail shortly.

The object within the triple may consist of an RDF resource or it might simply contain text. If the object refers to a resource, then we can use a method similar to that used for the subject to find an acceptable textual value to use for the patterns and template. If the object is already a string of text, then it could be used directly.

We have been looking at how triples can be converted into AIML patterns and templates. However, you might have noticed a small challenge: many of the triples do not form statements that a user would enter in natural language. For example, a user will most likely not enter a URI and try to match it to a name or an email address. In the example, we are using the object of the triple whose property is "contact:fullName" as the subject within the triple whose property is "contact:mailbox" along with the string contained within the object. To do this, a set of queries must be developed to find the pieces of information needed to construct sensible statements. This methodology provides several benefits:

  • eliminates the need to modify a schema that we might not control;
  • more power in finding the most useful pieces of information in order to build the pattern/template; and,
  • more flexibility in the definition of the pattern/template.
These benefits will be taken into account as we examine a couple of industry standard schemas, the Dublin Core Metadata Initiative and FOAF, and how data marked using these schemas can be imported into AIMEE's knowledge base.
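The query that pairs a person's name with a mailbox amounts to a join on the shared subject URI. The Python sketch below models the triples as plain tuples rather than using a real RDF store or query language; the helper names are hypothetical, and the fullName row is assumed from the discussion of contact:fullName above.

```python
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
ME = "http://www.w3.org/People/EM/contact#me"

# (subject, predicate, object) rows in the style of the triples table.
triples = [
    (ME, CONTACT + "fullName", "Eric Miller"),
    (ME, CONTACT + "mailbox", "mailto:em@w3.org"),
    (ME, CONTACT + "personalTitle", "Dr."),
]


def objects(rows, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in rows if s == subject and p == predicate]


def name_email_pairs(rows):
    """Join fullName and mailbox on their shared subject URI."""
    pairs = []
    for subject, predicate, name in rows:
        if predicate != CONTACT + "fullName":
            continue
        for mailbox in objects(rows, subject, CONTACT + "mailbox"):
            # Drop the URI scheme so the address reads naturally in a sentence.
            if mailbox.startswith("mailto:"):
                mailbox = mailbox[len("mailto:"):]
            pairs.append((name, mailbox))
    return pairs


pairs = name_email_pairs(triples)
print(pairs)
```

The URI never appears in the result; it serves only as the key that connects the two human-readable values.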

The use of RDF to build AIMEE's knowledge base also addresses the targeting issue mentioned earlier in this paper. One of the main challenges of defining a bot's personality is the process of adding information. By using publicly available and hopefully accurate RDF data sets, the problem of finding a source of the information is alleviated. This leaves the interaction with the user as the remaining challenge. By reviewing how the users interact with the information and finding areas where AIMEE did not respond in the most appropriate manner, new rules can be created and new AIML files can be generated that take these edge cases into account and prepare her for them in the future.

AIMEE and the Dublin Core

The Dublin Core Metadata Initiative was founded in 1995 and has become one of the preeminent standards for web metadata. It consists of 15 elements that can be used to describe written materials on the web, such as HTML pages. While any of the 15 elements can be used to generate natural language statements, a set of 11 seems more useful as a starting point than the others. These 11 are:

  • identifier - an unambiguous reference to the resource within a given context, possibly a URI, URL, DOI [Digital Object Identifier], or ISBN [International Standard Book Number]
  • creator - an entity primarily responsible for making the content of the resource, typically the name of the creating entity
  • contributor - an entity responsible for making contributions to the content of the resource, typically the name of the contributing entity
  • publisher - the entity responsible for making the resource available, typically the name of the publishing entity
  • subject - the topic of the content of the resource, typically expressed as keywords, phrases, or classification codes
  • description - an account of the content of the resource
  • title - the name given to a resource, typically the formal name by which the resource is known
  • type - the nature or genre of the content of a resource
  • relation - a reference to a related resource
  • date - a date associated with an event in the life cycle of the resource, typically the date of creation or availability of the resource
  • coverage - the extent or scope of the content of the resource, typically including spatial locations, temporal period, or jurisdiction

In this section we will be considering the following Dublin Core example:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/RDF/DC/"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://purl.org/metadata/dublin_core_elements">
    <dc:Title>
      Dublin Core Metadata Element Set: Reference Description
    </dc:Title>
    <dc:Creator rdf:resource="http://purl.net/people/eric"/>
    <dc:Creator rdf:resource="http://purl.net/people/stu"/>
    <dc:Subject>
      Metadata, Dublin Core element, resource description
    </dc:Subject>
    <dc:Description>This document is the reference description of the Dublin Core
      Metadata Element Set designed to facilitate resource discovery.</dc:Description>
    <dc:Publisher>OCLC Online Computer Library Center, Inc.</dc:Publisher>
    <dc:Type>Technical Report</dc:Type>
    <dc:Date>1997-11-02</dc:Date>
    <dc:Relation>
      <rdf:Description>
         <dc:Relation.Type>IsBasisFor</dc:Relation.Type>
         <dc:Relation.Identifier>
                ftp://ftp.ietf.org/internet-drafts/draft-kunze-dc-02.txt
         </dc:Relation.Identifier>
         <dc:Title>Dublin Core Metadata for Simple Resource Discovery</dc:Title>
         <dc:Creator rdf:resource="http://purl.net/people/stu"/>
         <dc:Creator>John A. Kunze</dc:Creator>
         <dc:Creator>Carl Lagoze</dc:Creator>
         <dc:Type>Internet RFC</dc:Type>
      </rdf:Description>
    </dc:Relation>
  </rdf:Description>

  <rdf:Description rdf:about="http://purl.net/people/eric">
    <owl:sameAs rdf:resource="http://www.w3.org/People/EM/contact#me"/>
  </rdf:Description>

</rdf:RDF>
This example describes a technical report written by Eric Miller and Stu Weibel entitled "Dublin Core Metadata Element Set: Reference Description". The second entry uses OWL to say that one of the creators of this document is the same person whose contact information we have been discussing throughout this paper. It should be noted that it is through entries such as the second one that it becomes possible to start aggregating knowledge about certain resources.

If we look at this sample, we can see several statements that can be made about the document being described that would work well as templates within an AIML file. For example,

  • Eric Miller and Stu Weibel created a technical report entitled 'Dublin Core Metadata Element Set: Reference Description.'
  • Eric Miller created this document. He created this document with Stu Weibel. (and vice versa)
  • This document discusses the subjects of metadata, Dublin Core element, and resource description.
  • This document can be summarized as follows: 'This document is the reference description of the Dublin Core Metadata Element Set designed to facilitate resource discovery.'
  • This document was published by OCLC Online Computer Library Center, Inc.
  • This document is the basis for another document entitled 'Dublin Core Metadata for Simple Resource Discovery.'

Once we have identified pieces of information that would work well within AIMEE's brain, we must consider the patterns that a user might enter in searching for this information. In the first round we'll concentrate on some sample questions and then look for opportunities to use recursion and reduction to simplify the AIML while also making it more flexible. Candidate initial questions could include:

  • What works has Eric Miller created?
  • What subjects are discussed in the document "Dublin Core Metadata Element Set: Reference Description?"
  • Please summarize this document.
  • Who published the document?
The following sections will use some of the Dublin Core elements to create <category> elements for use within AIMEE's brain.

dc:Creator

What works has Eric Miller created? - We could also use Stu Weibel as the subject. We can also use different verb synonyms based on the type of the resource. In the case of a technical report, "written" or "authored" would also work. (This would be an excellent use case for the addition of the WordNet RDF dataset within AIMEE's brain as well.)

You may recall a statement made earlier that a set of queries would be beneficial in the development of the AIML patterns and templates. This will be demonstrated now. Upon further examination of the RDF, there is not a single triple that provides all the data needed to build this pattern and the template. Instead, the dataset will need to be queried using an RDF query language such as SPARQL, SeRQL, RQL, RDQL, etc. For simplicity, this paper will use pseudo-code rather than a particular language.

To build the pattern, first we need to query for all the values of <dc:Creator>. Once we have the list of resource creators, we can query for all the resources that were created by each <dc:Creator>. Finally we need to retrieve the <dc:Title> for each resource. Based on the example above, the query should return one resource created by Eric Miller and two created by Stu Weibel. One of these resources was created by both.
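The three query steps can be illustrated with a small Python sketch over the example data. It stands in for a real RDF query language: the triples are plain tuples, the blank-node label for the nested description and the table of display names for the two purl.net resources are assumptions made for the illustration, and the function name is hypothetical.

```python
from collections import defaultdict

DC = "http://purl.org/RDF/DC/"
REPORT = "http://purl.org/metadata/dublin_core_elements"
RFC = "_:relation1"  # hypothetical label for the nested, unnamed description

triples = [
    (REPORT, DC + "Title", "Dublin Core Metadata Element Set: Reference Description"),
    (REPORT, DC + "Creator", "http://purl.net/people/eric"),
    (REPORT, DC + "Creator", "http://purl.net/people/stu"),
    (RFC, DC + "Title", "Dublin Core Metadata for Simple Resource Discovery"),
    (RFC, DC + "Creator", "http://purl.net/people/stu"),
    (RFC, DC + "Creator", "John A. Kunze"),
    (RFC, DC + "Creator", "Carl Lagoze"),
]

# Assumed lookup from creator resource to a readable name; in practice this
# would come from further queries (for example through owl:sameAs).
DISPLAY_NAMES = {
    "http://purl.net/people/eric": "Eric Miller",
    "http://purl.net/people/stu": "Stu Weibel",
}


def works_by_creator(rows):
    """Group the titles of all described resources under each creator's name."""
    titles = {s: o for s, p, o in rows if p == DC + "Title"}
    works = defaultdict(list)
    for subject, predicate, creator in rows:
        if predicate == DC + "Creator":
            works[DISPLAY_NAMES.get(creator, creator)].append(titles[subject])
    return dict(works)


works = works_by_creator(triples)
for creator, created in works.items():
    print(creator, "->", created)
```

Creators that appear only as literal strings are kept under the string itself, so the same grouping covers both resource-valued and text-valued <dc:Creator> entries.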

Based on those results the following AIML can be created:

<category>
  <pattern>DUBLINCORE WORKS ERIC MILLER CREATED</pattern>
  <template>
    <set name="he"><set name="topic">Eric Miller</set></set> created a
    technical report entitled '<set name="it"><set name="documentname1"><set
    name="documentname">Dublin Core Metadata Element Set: Reference
    Description</set></set></set>'.
  </template>
</category>

<category>
  <pattern>WHAT * HAS ERIC MILLER CREATED</pattern>
  <template>
    <srai>DUBLINCORE WORKS ERIC MILLER CREATED</srai>
  </template>
</category>

<category>
  <pattern>WHAT * HAS ERIC MILLER WRITTEN</pattern>
  <template>
    <srai>DUBLINCORE WORKS ERIC MILLER CREATED</srai>
  </template>
</category>

<category>
  <pattern>DUBLINCORE WORKS STU WEIBEL CREATED</pattern>
  <template>
    <set name="he"><set name="topic">Stu Weibel</set></set> created [1] a
    technical report entitled '<set name="documentname1">Dublin Core Metadata
    Element Set: Reference Description</set>.' He also created [2] an Internet
    RFC entitled '<set name="documentname2">Dublin Core Metadata for Simple
    Resource Discovery</set>.'
  </template>
</category>

<category>
  <pattern>WHAT * HAS STU WEIBEL CREATED</pattern>
  <template>
    <srai>DUBLINCORE WORKS STU WEIBEL CREATED</srai>
  </template>
</category>

<category>
  <pattern>WHAT * HAS STU WEIBEL WRITTEN</pattern>
  <template>
    <srai>DUBLINCORE WORKS STU WEIBEL CREATED</srai>
  </template>
</category>
In the set of <category> elements for Eric Miller, we see the base category which provides AIMEE with the answer to any number of possible queries. It is unlikely that a user will enter the string "DUBLINCORE WORKS ERIC MILLER CREATED", but this provides a central location to which any number of patterns can point using <srai>. The two subsequent <category> elements provide examples of ways in which the query might be phrased by a user. The use of the <set> elements allows AIMEE to store some context for the discussion. The "it" variable allows AIMEE to keep track of the current thing being discussed. The "topic" variable is very similar but, in this case, is set to the value of "Eric Miller" as the topic or subject of the discussion. Each document from the query will be counted and assigned a number, e.g. "documentname1", "documentname2", etc. The "documentname" variable is used to store the name of the document being discussed.

The second set of three <category> elements contains the information for the resources created by Stu Weibel. Notice that the <template> is somewhat different from the one for Eric Miller. Since Stu has written more than one document, we must devise a way to indicate this to the user and provide a mechanism where the user is not forced to enter the entire title of the document. We will see how this is used in the following section.
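A template of this numbered form can be generated from the list of works returned by the query. The Python sketch below is one possible way to do so; the function name is hypothetical, and the pronoun "He" is hard-coded only because it matches the example.

```python
from xml.sax.saxutils import escape


def numbered_works_template(creator, works):
    """Build template text that numbers each work and stores its title.

    `works` is a list of (type phrase, title) pairs, for example
    ("a technical report", "Some Title").
    """
    sentences = []
    for number, (kind, title) in enumerate(works, start=1):
        lead = "created" if number == 1 else "He also created"
        sentences.append(
            f"{lead} [{number}] {kind} entitled "
            f"'<set name=\"documentname{number}\">{escape(title)}</set>.'"
        )
    subject = f'<set name="he"><set name="topic">{escape(creator)}</set></set>'
    return subject + " " + " ".join(sentences)


template = numbered_works_template(
    "Stu Weibel",
    [
        ("a technical report", "Dublin Core Metadata Element Set: Reference Description"),
        ("an Internet RFC", "Dublin Core Metadata for Simple Resource Discovery"),
    ],
)
print(template)
```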

dc:Subject

What subjects are discussed in the document "Dublin Core Metadata Element Set: Reference Description?" - If this document has been presented to the user in a prior statement, this would be an ideal candidate for the use of the contextual capabilities of AIML. The user will not want to continually enter the long title, so other patterns should be developed that allow the user to use words like "it" or "the document".

To build the pattern, we need to query for all the resources that have one or more <dc:Subject> elements defined. We also need to retrieve the <dc:Title> for each resource in the list. Based on the example above, the query should return only one resource.

<category>
  <pattern>DUBLINCORE DOCUMENT SUBJECTS</pattern>
  <template>
    <condition name="documentname">
      <li value="Dublin Core Metadata Element Set: Reference Description">
        <get name="documentname"/> discusses the subjects Metadata,
        Dublin Core element, resource description</li>
      <li>What document are you asking about?</li>
    </condition>
  </template>
</category>

<category>
  <pattern>DOCUMENT *</pattern>
  <that>WHAT DOCUMENT ARE YOU ASKING ABOUT</that>
  <template>
    <srai>DUBLINCORE DOCUMENT <star/> SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>THE FIRST ONE</pattern>
  <that>WHAT DOCUMENT ARE YOU ASKING ABOUT</that>
  <template>
    <srai>DUBLINCORE DOCUMENT 1 SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ SUBJECTS * IN * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ SUBJECTS * OF * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ TOPICS * IN * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ TOPICS * OF * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>DUBLINCORE DOCUMENT * SUBJECTS</pattern>
  <template>
    <think><set name="documentnumber"><star/></set></think>
    <condition name="documentnumber">
      <li value="1"><set name="it"><set name="documentname">
        <get name="documentname1"/></set></set></li>
      <li value="2"><set name="it"><set name="documentname">
        <get name="documentname2"/></set></set></li>
      <li value="3"><set name="it"><set name="documentname">
        <get name="documentname3"/></set></set></li>
      ...
      <li value="10"><set name="it"><set name="documentname">
        <get name="documentname10"/></set></set></li>
    </condition>
    <srai>DUBLINCORE DOCUMENT SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ SUBJECTS * OF DOCUMENT *</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT <star index="2"/> SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ TOPICS * IN DOCUMENT *</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT <star index="2"/> SUBJECTS</srai>
  </template>
</category>

<category>
  <pattern>_ TOPICS * OF DOCUMENT *</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT <star index="2"/> SUBJECTS</srai>
  </template>
</category>
As in the previous set of <category> elements, a generic base pattern is created and several other patterns then point to it. The <condition> element checks to see whether AIMEE has set a value for the "documentname" variable. If so, she can respond accordingly. If not, she will tell the user that she doesn't know which document is being discussed. However, in the second set of <category> elements we take advantage of AIMEE's ability to recall context and allow the user to ask shorter questions. If you recall in the <category> elements created for <dc:Creator>, Stu Weibel is credited with creating two documents. The user will need to specify to AIMEE which document they are discussing. This can be done using a string such as "What are the subjects discussed in document 2?" AIMEE will match the pattern and set the "documentname" variable to the title of the second document, represented by the "documentname2" variable. Once she has set that context, she can then proceed to answer the question.
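The way the numbered variables resolve to a current document can be modelled outside of AIML. The following Python sketch is only a rough model of the <condition> logic, with the bot's variables held in a dictionary; the function names are hypothetical, and a result of None stands for the fallback branch in which the bot asks which document is meant.

```python
# Subject text known for each document title, as generated from <dc:Subject>.
SUBJECTS = {
    "Dublin Core Metadata Element Set: Reference Description":
        "Metadata, Dublin Core element, resource description",
}


def select_document(predicates, number):
    """Mimic DUBLINCORE DOCUMENT * SUBJECTS: copy documentnameN into the context."""
    predicates["documentnumber"] = str(number)
    title = predicates.get(f"documentname{number}")
    if title is not None:
        predicates["documentname"] = title
        predicates["it"] = title
    return predicates


def answer_subjects(predicates):
    """Mimic DUBLINCORE DOCUMENT SUBJECTS: answer only for a known current document."""
    title = predicates.get("documentname")
    if title in SUBJECTS:
        return f"{title} discusses the subjects {SUBJECTS[title]}"
    return None  # fallback branch: the bot asks which document is meant


# Variables as they would stand after the Stu Weibel template has been used.
context = {
    "documentname1": "Dublin Core Metadata Element Set: Reference Description",
    "documentname2": "Dublin Core Metadata for Simple Resource Discovery",
}

first = answer_subjects(select_document(dict(context), 1))
second = answer_subjects(select_document(dict(context), 2))
print(first)
print(second)
```

In this model the second document yields the fallback, since the example data defines <dc:Subject> only for the first one.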

dc:Description

Please summarize this document. - The user can enter statements as well as commands and the AIML developer needs to take this possibility into account when developing the patterns for a given dataset.

To build the pattern, we need to query for all the resources that have one or more <dc:Description> elements defined. We also need to retrieve the <dc:Title> for each resource in the list. Based on the example above, the query should return only one resource.

<category>
  <pattern>DUBLINCORE SUMMARIZE DOCUMENT</pattern>
  <template>
    <condition name="documentname">
      <li value="Dublin Core Metadata Element Set: Reference Description">
        This document is the reference description of the Dublin Core Metadata
        Element Set designed to facilitate resource discovery.</li>
      <li>I don't have a summary for the document.</li>
    </condition>
  </template>
</category>

<category>
  <pattern>_ SUMMARIZE * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE SUMMARIZE DOCUMENT</srai>
  </template>
</category>

<category>
  <pattern>_ SUMMARY * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE SUMMARIZE DOCUMENT</srai>
  </template>
</category>

<category>
  <pattern>DUBLINCORE SUMMARIZE DOCUMENT *</pattern>
  <template>
    <think><set name="documentnumber"><star/></set></think>
    <condition name="documentnumber">
      <li value="1"><set name="it"><set name="documentname">
        <get name="documentname1"/></set></set></li>
      <li value="2"><set name="it"><set name="documentname">
        <get name="documentname2"/></set></set></li>
      <li value="3"><set name="it"><set name="documentname">
        <get name="documentname3"/></set></set></li>
      ...
      <li value="10"><set name="it"><set name="documentname">
        <get name="documentname10"/></set></set></li>
    </condition>
    <srai>DUBLINCORE SUMMARIZE DOCUMENT</srai>
  </template>
</category>

<category>
  <pattern>_ SUMMARIZE * DOCUMENT *</pattern>
  <template>
    <srai>DUBLINCORE SUMMARIZE DOCUMENT <star index="2"/></srai>
  </template>
</category>

<category>
  <pattern>_ SUMMARY * DOCUMENT *</pattern>
  <template>
    <srai>DUBLINCORE SUMMARIZE DOCUMENT <star index="2"/></srai>
  </template>
</category>
This set of <category> elements works very much like the previous set. The main difference is the ability to respond to a directive in addition to answering a question. The user could enter "Please summarize document 2" or "How would you summarize the document?" and get the same results.

dc:Publisher

Who published the document? - In English, corporate entities can also be referred to using personal pronouns. The patterns will also need to handle beginnings such as "What organization ..." or "What company ..."

To build the pattern, we need to query for all the resources that have one or more <dc:Publisher> elements defined. We also need to retrieve the <dc:Title> for each resource in the list. Based on the example above, the query should return only one resource.

<category>
  <pattern>DUBLINCORE DOCUMENT PUBLISHER</pattern>
  <template>
    <condition name="documentname">
      <li value="Dublin Core Metadata Element Set: Reference Description">
        <get name="documentname"/> was published by OCLC Online Computer
        Library Center, Inc.
      </li>
      <li>I don't know who published this document.</li>
    </condition>
  </template>
</category>

<category>
  <pattern>_ PUBLISHED * DOCUMENT</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT PUBLISHER</srai>
  </template>
</category>

<category>
  <pattern>DUBLINCORE DOCUMENT * PUBLISHER</pattern>
  <template>
    <think><set name="documentnumber"><star/></set></think>
    <condition name="documentnumber">
      <li value="1"><set name="it"><set name="documentname">
        <get name="documentname1"/></set></set></li>
      <li value="2"><set name="it"><set name="documentname">
        <get name="documentname2"/></set></set></li>
      <li value="3"><set name="it"><set name="documentname">
        <get name="documentname3"/></set></set></li>
      ...
      <li value="10"><set name="it"><set name="documentname">
        <get name="documentname10"/></set></set></li>
    </condition>
    <srai>DUBLINCORE DOCUMENT PUBLISHER</srai>
  </template>
</category>

<category>
  <pattern>_ PUBLISHED * DOCUMENT *</pattern>
  <template>
    <srai>DUBLINCORE DOCUMENT <star index="2"/> PUBLISHER</srai>
  </template>
</category>
This set of <category> elements is very similar to the previous sets. Other Dublin Core elements could be processed in the same way, but in the interest of space, they won't be discussed in this paper.

Introducing AIMEE to Your Friends using FOAF

The FOAF [Friend of a Friend] project is an effort to define an RDF vocabulary for expressing metadata about people, and their interests, relationships and activities. The basic categories of information it defines are:

  • foaf:Person
  • foaf:Document
  • foaf:Image

There are five classes of properties which define the categories:

  • the basic class contains information such as names, home pages, and email addresses;
  • the personal information class contains more detailed information including who a person knows, their interests and projects;
  • an online accounts class contains information about the various online identities a person might have using different chat servers and other online accounts;
  • a projects and groups class includes information on any projects or groups of which the person might be a part;
  • a documents and images class allows files to be attached to a person's information using the <foaf:depiction> element.
An example of FOAF markup (with some Dublin Core) is shown below.
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<foaf:Person rdf:about="http://www.w3.org/People/EM/contact#me">
  <foaf:name>Eric Miller</foaf:name>
  <foaf:firstName>Eric</foaf:firstName>
  <foaf:surname>Miller</foaf:surname>
  <foaf:mbox rdf:resource="mailto:em@csail.mit.edu"/>
  <foaf:mbox rdf:resource="mailto:em@zepheira.com"/>
  <foaf:depiction>
    <foaf:Image rdf:about=
        "http://www.ilrt.bristol.ac.uk/people/cmdjb/events/dc7/orig/eric.png">
      <dc:Thumbnail rdf:resource=
        "http://www.ilrt.bristol.ac.uk/people/cmdjb/events/dc7/orig/eric.png"/>
      <dc:Title>Eric Miller, relaxing.</dc:Title>
      <dc:Description>Eric Miller, relaxing.</dc:Description>
      <dc:Format>image/png</dc:Format>
    </foaf:Image>
  </foaf:depiction>
  <foaf:homepage rdf:resource="http://purl.org/net/eric/"/>
  <foaf:homepage rdf:resource="http://www.w3.org/People/EM/" />
  <rdfs:seeAlso rdf:resource="http://purl.org/net/eric/webwho.xrdf"/>
  <foaf:workplaceHomePage rdf:resource="http://zepheira.com/" />
  <foaf:nick>em</foaf:nick>
  <foaf:knows>
    <foaf:Person rdf:about="http://www.w3.org/People/Berners-Lee/card#i">
      <foaf:name>Tim Berners-Lee</foaf:name>
      <foaf:isPrimaryTopicOf rdf:resource=
        "http://en.wikipedia.org/wiki/Tim_Berners-Lee"/>
      <foaf:homepage rdf:resource="http://www.w3.org/People/Berners-Lee/"/>
      <foaf:mbox rdf:resource="mailto:timbl@w3.org"/>
      <rdfs:seeAlso rdf:resource="http://www.w3.org/People/Berners-Lee/card"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="dajobe">
      <foaf:name>Dave Beckett</foaf:name>
      <foaf:mbox rdf:resource="mailto:dave.beckett@bristol.ac.uk"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="matola">
      <foaf:name>Tod Matola</foaf:name>
      <foaf:mbox rdf:resource="mailto:matola@oclc.org"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="danbri">
      <foaf:name>Dan Brickley</foaf:name>
      <foaf:mbox rdf:resource="mailto:danbri@w3.org"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="weibel">
      <foaf:name>Stu Weibel</foaf:name>
      <foaf:mbox rdf:resource="mailto:weibel@oclc.org"/>
      <foaf:workplaceHomepage rdf:resource="http://www.oclc.org"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="baker">
      <foaf:name>Tom Baker</foaf:name>
      <foaf:mbox rdf:resource="mailto:thomas.baker@gmd.de"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="connolly">
      <foaf:name>Dan Connolly</foaf:name>
      <foaf:mbox rdf:resource="mailto:connolly@w3.org"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person rdf:ID="swick">
      <foaf:name>Ralph Swick</foaf:name>
      <foaf:mbox rdf:resource="mailto:swick@w3.org"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:interest>
    <rdf:Description rdf:about="http://purl.org/rss"
      dc:Title="RDF Site Summary (RSS)"/>
  </foaf:interest>
  <foaf:interest>
    <rdf:Description rdf:about="http://dublincore.org/"
      dc:Title="Dublin Core Metadata Initiative"/>
  </foaf:interest>
  <foaf:interest>
    <rdf:Description rdf:about="http://www.w3.org/2001/sw/"
      dc:Title="Semantic Web"/>
  </foaf:interest>
  <foaf:interest>
    <rdf:Description rdf:about="http://www.w3.org/RDF/"
      dc:Title="Resource Description Framework (RDF)"/>
  </foaf:interest>
  <foaf:interest>
    <rdf:Description rdf:about="http://www.w3.org/XML/"
      dc:Title="Extensible Markup Language (XML)"/>
  </foaf:interest>
</foaf:Person>
This example describes Eric Miller: how to contact him, whom he knows, and some of his interests. Some of the information expressed here in FOAF was shown earlier, in the RDF section of this paper, using the non-standard "contact" markup. Using OWL's <owl:sameAs> markup we can map the FOAF terms to the "contact" terms.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="contact:Person">
    <owl:sameAs rdf:resource="foaf:Person"/>
  </rdf:Description>

  <rdf:Description rdf:about="contact:fullName">
    <owl:sameAs rdf:resource="foaf:name"/>
  </rdf:Description>

  <rdf:Description rdf:about="contact:mailbox">
    <owl:sameAs rdf:resource="foaf:mbox"/>
  </rdf:Description>

  <rdf:Description rdf:about="contact:personalTitle">
    <owl:sameAs rdf:resource="foaf:title"/>
  </rdf:Description>
</rdf:RDF>
In mapping the equivalency of the two markup schemes, we have effectively combined the entries. In doing so, we can also reuse the AIML generated previously.
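
The effect of the mapping can be illustrated with a small sketch. It is not the demonstration system's code: the SAME_AS table simply mirrors the four <owl:sameAs> statements above, and the function names, the tuple representation of properties, and the abbreviated entries are assumptions made for illustration.

```python
# Hypothetical equivalence table mirroring the owl:sameAs statements above.
SAME_AS = {
    "contact:Person": "foaf:Person",
    "contact:fullName": "foaf:name",
    "contact:mailbox": "foaf:mbox",
    "contact:personalTitle": "foaf:title",
}


def canonical(term):
    """Return the FOAF term mapped to a "contact" term, or the term unchanged."""
    return SAME_AS.get(term, term)


def merge_entries(*entries):
    """Merge (property, value) pairs from several entries under canonical names."""
    merged = {}
    for entry in entries:
        for prop, value in entry:
            values = merged.setdefault(canonical(prop), [])
            if value not in values:  # keep each value once, in first-seen order
                values.append(value)
    return merged


# Abbreviated stand-ins for the "contact" entry and the FOAF entry.
contact_entry = [
    ("contact:fullName", "Eric Miller"),
    ("contact:mailbox", "mailto:em@w3.org"),
]
foaf_entry = [
    ("foaf:name", "Eric Miller"),
    ("foaf:mbox", "mailto:em@csail.mit.edu"),
    ("foaf:mbox", "mailto:em@zepheira.com"),
]

combined = merge_entries(contact_entry, foaf_entry)
print(combined["foaf:mbox"])
```

Once both entries are keyed by the same property names, a category written against foaf:mbox can draw on addresses from either source.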

If we look at the main FOAF RDF sample, we can see several statements that can be made about Eric Miller that would work well as templates within an AIML file. For example,

  • Eric Miller has several email addresses.
  • Eric Miller has web pages set up at several URLs.
  • Eric Miller knows Tim Berners-Lee, Dave Beckett, Tod Matola, Dan Brickley, Stu Weibel, Tom Baker, Dan Connolly and Ralph Swick.
  • His interests include RDF Site Summary (RSS), Dublin Core Metadata Initiative, Semantic Web, RDF, and XML.

Once we have identified pieces of information that would work well within AIMEE's brain, we must consider the patterns that a user might enter when searching for this information. In the first round we'll concentrate on some sample questions and then look for opportunities to use recursion and reduction to simplify the AIML while also making it more flexible. Candidate initial questions could include:

  • What email addresses are associated with Eric Miller?
  • What web pages are associated with Eric Miller?
  • Who does Eric Miller know? (and conversely, Who knows Eric Miller?)
  • What are Eric Miller's interests? (and Who shares an interest with Eric Miller?)
The following sections will use some of the FOAF elements to create <category> elements for use within AIMEE's brain.

foaf:mbox

What email addresses are associated with Eric Miller? We will start with the categories defined previously for Eric Miller's email address. Since the FOAF entry for Eric includes several addresses, we must extend the model a bit to handle this use case.

To build the pattern, we need to query the FOAF entry for the person identified by the URI "http://www.w3.org/People/EM/contact#me" and retrieve all the <foaf:mbox> elements defined. We will also query for the <foaf:name> element. Based on the example above, the query should return three addresses (two from the FOAF entry and one from the "contact" entry earlier in the paper).
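
The query step might be sketched as follows. This is a minimal illustration using Python's standard xml.etree.ElementTree module rather than a full RDF toolkit. The function name mailboxes, the abbreviated FOAF_XML sample, and the assumption that the person is identified by an rdf:about attribute are all hypothetical. The sketch covers only the two addresses held in the FOAF entry; the third address comes from the "contact" entry.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Abbreviated stand-in for the FOAF entry; only the queried properties are kept.
FOAF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="{FOAF_NS}">
  <foaf:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <foaf:name>Eric Miller</foaf:name>
    <foaf:mbox rdf:resource="mailto:em@csail.mit.edu"/>
    <foaf:mbox rdf:resource="mailto:em@zepheira.com"/>
  </foaf:Person>
</rdf:RDF>
"""


def mailboxes(xml_text, person_uri):
    """Return (name, [addresses]) for the foaf:Person with the given URI."""
    root = ET.fromstring(xml_text)
    for person in root.iter(f"{{{FOAF_NS}}}Person"):
        if person.get(f"{{{RDF_NS}}}about") != person_uri:
            continue
        name = person.findtext(f"{{{FOAF_NS}}}name")
        boxes = [
            # Strip the mailto: scheme so only the address remains.
            mbox.get(f"{{{RDF_NS}}}resource").replace("mailto:", "", 1)
            for mbox in person.findall(f"{{{FOAF_NS}}}mbox")
        ]
        return name, boxes
    return None, []


name, boxes = mailboxes(FOAF_XML, "http://www.w3.org/People/EM/contact#me")
print(name, boxes)
```

The name and the address list returned here are the values that fill the template of the base category.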

<category>
  <pattern>FOAF MBOX HTTP WWW W3 ORG PEOPLE EM CONTACT ME</pattern>
  <template>I have 3 email addresses listed for
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set>. [1] <a href="mailto:em@w3.org">em@w3.org</a>,
      [2] <a href="mailto:em@csail.mit.edu">em@csail.mit.edu</a>,
      and [3] <a href="mailto:em@zepheira.com">em@zepheira.com</a>
  </template>
</category>

<category>
  <pattern>FOAF MBOX WHICH ERIC MILLER</pattern>
  <template>
    <srai>FOAF MBOX HTTP WWW W3 ORG PEOPLE EM CONTACT ME</srai>
  </template>
</category>

<category>
  <pattern>WHAT IS ERIC MILLERS EMAIL ADDRESS</pattern>
  <template>
    <srai>FOAF MBOX WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>PLEASE GIVE * ERIC MILLERS EMAIL *</pattern>
  <template>
    <srai>FOAF MBOX WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>_ ERIC MILLERS EMAIL *</pattern>
  <template>
    <srai>FOAF MBOX WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>_ ERIC MILLER * EMAIL *</pattern>
  <template>
    <srai>FOAF MBOX WHICH ERIC MILLER</srai>
  </template>
</category>
There are some differences in this version of the categories. First, we have added the "FOAF MBOX" string to the beginning of the <srai> elements to help denote where the data originated. We have also changed the name of the base <category> to include the URI for Eric Miller. This change was made to take advantage of the URI, which (we hope) uniquely identifies Eric Miller. Notice the format of the URI. AIML drops all punctuation in its patterns, so the resulting name includes only the alphanumeric characters, with spaces in place of the punctuation.

There is still the problem of more than one person named "Eric Miller" being contained within AIMEE's brain. The <category> elements have already been set up to handle such an instance in the event that another "Eric Miller" is added to the knowledge base. In the current setup AIMEE must figure out which "Eric Miller" is being requested. She determines that there is only one and proceeds to deliver the requested information. To handle the case of more than one person with the same name, the initial query would need to be modified to first look for all <foaf:Person> elements that have a <foaf:name> of "Eric Miller". If more than one exists, then a new set of patterns would be needed to allow the user to tell AIMEE which "Eric Miller" they are asking about. This could be done by looking at additional FOAF properties such as <foaf:title>, <foaf:nick>, or even <foaf:knows>. An example of such a case is shown below for "John Doe":
<category>
  <pattern>FOAF MBOX WHICH JOHN DOE</pattern>
  <template>
    <condition name="johndoe">
      <li value="johndoe1"><srai>FOAF MBOX HTTP WWW DOCBUBBA COM</srai></li>
      <li value="johndoe2"><srai>FOAF MBOX HTTP WWW JOHNSFAIRYTALES COM</srai></li>
      <li value="johndoe3"><srai>FOAF MBOX HTTP WWW DEARJOHN COM</srai></li>
      <li>I know of 3 people with the name "John Doe" <br/>
      Please enter the number:<br/>
      [1] Dr. John Doe (aka "Bubba"); [2] John Doe (who knows Simple Simon,
      Cinderella and Red Riding Hood), [3] John Doe (interests include wine,
      women, song) [0] none of these<br/>
      Which John Doe are you asking about?</li>
    </condition>
  </template>
</category>

<category>
  <pattern>*</pattern>
  <that>WHICH JOHN DOE ARE YOU ASKING ABOUT</that>
  <template>
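    <!-- "choice" is a hypothetical predicate name: <condition> tests a named
         predicate, so the user's reply is stored before branching on it -->
    <think><set name="choice"><star/></set></think>
    <condition name="choice">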
      <li value="1">
        <think><set name="johndoe">johndoe<star/></set></think>
        <srai>FOAF MBOX HTTP WWW DOCBUBBA COM</srai></li>
      <li value="2">
        <think><set name="johndoe">johndoe<star/></set></think>
        <srai>FOAF MBOX HTTP WWW JOHNSFAIRYTALES COM</srai></li>
      <li value="3">
        <think><set name="johndoe">johndoe<star/></set></think>
        <srai>FOAF MBOX HTTP WWW DEARJOHN COM</srai></li>
      <li>I'm sorry I can't help you!</li>
    </condition>
  </template>
</category>
In this case we added some more descriptive information in the hopes that the user can select the person they are asking about. Once the user selects, AIMEE proceeds on, just as in the Eric Miller case. Also notice that in the first <category>, the <condition> element checks to make sure that AIMEE hasn't already been told which "John Doe" is being discussed. If she already knows, she won't ask again.
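
Two steps from this section lend themselves to a short sketch: turning a URI into the punctuation-free form used in the base patterns, and detecting names shared by more than one person. The function names and the list of (URI, name) pairs are assumptions made for illustration; the John Doe URIs are inferred from the patterns above and are placeholders, as in the example itself.

```python
import re
from collections import defaultdict


def uri_to_pattern(uri):
    """Replace each run of punctuation with one space and upper-case the rest."""
    return re.sub(r"[^A-Za-z0-9]+", " ", uri).strip().upper()


def base_pattern(prop, uri):
    """Build a base pattern such as 'FOAF MBOX HTTP WWW ...'."""
    return f"FOAF {prop.upper()} {uri_to_pattern(uri)}"


def ambiguous_names(people):
    """Map each name carried by more than one URI to the list of those URIs."""
    by_name = defaultdict(list)
    for uri, name in people:
        by_name[name].append(uri)
    return {name: uris for name, uris in by_name.items() if len(uris) > 1}


# Hypothetical (URI, name) pairs, as the modified initial query might return them.
people = [
    ("http://www.w3.org/People/EM/contact#me", "Eric Miller"),
    ("http://www.docbubba.com", "John Doe"),
    ("http://www.johnsfairytales.com", "John Doe"),
    ("http://www.dearjohn.com", "John Doe"),
]

print(base_pattern("mbox", people[0][0]))
print(ambiguous_names(people))
```

Only names that appear in the result of ambiguous_names would need the extra <condition> and <that> categories; every other name can keep the simpler single-person form.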

foaf:homepage, foaf:workplaceHomepage

What web pages are associated with Eric Miller? FOAF allows two types of home page to be connected to a person: a regular home page and a workplace home page. In this example we will show all home pages whenever the user asks for a home page and show only the workplace home page if the user asks for it specifically. The same issue about more than one person sharing a name exists in this case also, but for brevity, we will not discuss it further in this or subsequent examples. Suffice it to say that the extensions shown in the "John Doe" example previously would need to be applied to any FOAF property that is carried forward into AIML.

To build the pattern, we need to query the FOAF entry for the person identified by the URI "http://www.w3.org/People/EM/contact#me" and retrieve all the <foaf:homepage> and <foaf:workplaceHomepage> elements defined, keeping track of which is which. We will also query for the <foaf:name> element. Based on the example above, the query should return three home pages (two personal and one workplace).

<category>
  <pattern>FOAF HOMEPAGE HTTP WWW W3 ORG PEOPLE EM CONTACT ME</pattern>
  <template>I have 3 home pages listed for
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set> - [1] <a href="http://purl.org/net/eric/">http://purl.org/net/eric/</a>,
      [2] <a href="http://www.w3.org/People/EM/">http://www.w3.org/People/EM/</a>,
      and [3] (workplace) <a href="http://zepheira.com/">http://zepheira.com/</a>
  </template>
</category>

<category>
  <pattern>FOAF WORKPLACEHOMEPAGE HTTP WWW W3 ORG PEOPLE EM CONTACT ME</pattern>
  <template>I have 1 workplace home page listed for
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set> - <a href="http://zepheira.com/">http://zepheira.com/</a>.
  </template>
</category>

<category>
  <pattern>FOAF WORKPLACEHOMEPAGE WHICH ERIC MILLER</pattern>
  <template>
    <srai>FOAF WORKPLACEHOMEPAGE HTTP WWW W3 ORG PEOPLE EM CONTACT ME</srai>
  </template>
</category>

<category>
  <pattern>FOAF HOMEPAGE WHICH ERIC MILLER</pattern>
  <template>
    <srai>FOAF HOMEPAGE HTTP WWW W3 ORG PEOPLE EM CONTACT ME</srai>
  </template>
</category>

<category>
  <pattern>_ SHOW * ERIC MILLERS HOME PAGE</pattern>
  <template>
    <srai>FOAF HOMEPAGE WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>_ SHOW * ERIC MILLERS HOMEPAGE</pattern>
  <template>
    <srai>FOAF HOMEPAGE WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>_ SHOW * ERIC MILLERS WORKPLACE HOME PAGE</pattern>
  <template>
    <srai>FOAF WORKPLACEHOMEPAGE WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>_ HOMEPAGE * ERIC MILLER _</pattern>
  <template>
    <srai>FOAF HOMEPAGE WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>_ ERIC MILLER * HOMEPAGE _</pattern>
  <template>
    <srai>FOAF HOMEPAGE WHICH ERIC MILLER</srai>
  </template>
</category>
Notice that each workplace home page was identified as such when the home pages were listed. This lets the user know that those home pages can be requested specifically, if desired.
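
The reduction categories in this section all share one shape: a user-facing pattern whose template redirects to a canonical pattern through <srai>. A sketch of generating them is given below. The function name reduction_category and the choice of Python's standard xml.etree.ElementTree module for serialization are assumptions, not a description of how the categories in this paper were produced.

```python
import xml.etree.ElementTree as ET


def reduction_category(pattern, target):
    """Build a <category> whose template redirects to `target` via <srai>."""
    category = ET.Element("category")
    ET.SubElement(category, "pattern").text = pattern
    template = ET.SubElement(category, "template")
    ET.SubElement(template, "srai").text = target
    return category


# Two of the user phrasings shown above, reduced to the same canonical pattern.
phrasings = [
    "_ SHOW * ERIC MILLERS HOME PAGE",
    "_ SHOW * ERIC MILLERS HOMEPAGE",
]
categories = [
    reduction_category(p, "FOAF HOMEPAGE WHICH ERIC MILLER") for p in phrasings
]
aiml = "\n".join(ET.tostring(c, encoding="unicode") for c in categories)
print(aiml)
```

Adding a new phrasing then costs one more entry in the list rather than a hand-written category.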

foaf:knows

Who does Eric Miller know? (and conversely, Who knows Eric Miller?) According to the FOAF spec, "FOAF documents describe the characteristics and relationships amongst friends of friends, and their friends, and the stories they tell." It is the connectedness of people that has made FOAF as widely used as it is. When a person creates his or her FOAF entry, they can make the statement that they "know" another person. Conversely, other people can claim they "know" the first person. It is quite likely that the lists will not be mirror images of each other, but that might not really be important.

Another potential application of the <foaf:knows> element is something similar to the "Six Degrees of Kevin Bacon" (http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon) which demonstrates the connectedness of the Hollywood community through working relationships with Kevin Bacon, or with those who have worked with Kevin Bacon, or those who have worked with Bacon's coworkers, etc. We may find that it is, indeed, a very small world.
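
As a rough illustration of the idea, the number of <foaf:knows> hops between two people can be found with a breadth-first search. The sketch below is not part of the demonstration system. The dictionary representation of the graph is an assumption, and "Example Person" is an invented node added only so that a two-hop path exists.

```python
from collections import deque


def degrees_of_separation(knows, start, target):
    """Return the number of foaf:knows hops from start to target, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == target:
            return hops
        for friend in knows.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


# The first entry is abbreviated from the FOAF sample; the second is invented.
knows = {
    "Eric Miller": ["Tim Berners-Lee", "Dave Beckett", "Dan Brickley"],
    "Dan Brickley": ["Example Person"],
}

print(degrees_of_separation(knows, "Eric Miller", "Example Person"))
```

The search follows <foaf:knows> in one direction only, so a person who is merely claimed as an acquaintance by someone else is not treated as knowing that person in return.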

To build the pattern, we need to query the FOAF entry for the person identified by the URI "http://www.w3.org/People/EM/contact#me" and retrieve all the <foaf:knows> elements defined. We will also query for the <foaf:name> element. Based on the example above, the query should return eight resources.

To build the converse pattern, we need to query for all the instances of the <foaf:knows> element that reference the URI "http://www.w3.org/People/EM/contact#me" as well as the associated <foaf:name> elements.
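
Both queries can be answered from a single pass over the triples by building two indexes, one per direction. The sketch below assumes the RDF has already been reduced to (subject, predicate, object) tuples; the function name and the short "#..." identifiers, which stand in for full URIs, are assumptions made for illustration.

```python
from collections import defaultdict

EM = "http://www.w3.org/People/EM/contact#me"


def index_knows(triples):
    """Build forward (whom X knows) and converse (who knows X) indexes."""
    forward = defaultdict(list)
    converse = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "foaf:knows":
            forward[subject].append(obj)
            converse[obj].append(subject)
    return forward, converse


# Illustrative triples; identifiers are abbreviated stand-ins for full URIs.
triples = [
    (EM, "foaf:knows", "#dajobe"),
    (EM, "foaf:knows", "#danbri"),
    (EM, "foaf:nick", "em"),
    ("#danbri", "foaf:knows", EM),
]

forward, converse = index_knows(triples)
print(forward[EM], converse[EM])
```

The two lists for a given URI need not mirror each other, which matches the observation above that one person's claims may differ from another's.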

<category>
  <pattern>FOAF KNOWN BY HTTP WWW W3 ORG PEOPLE EM CONTACT ME</pattern>
  <template>
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set> knows the following people: Tim Berners-Lee, Dave Beckett, Tod Matola,
      Dan Brickley, Stu Weibel, Tom Baker, Dan Connolly and Ralph Swick. </template>
</category>

<category>
  <pattern>FOAF KNOWS HTTP WWW W3 ORG PEOPLE EM CONTACT ME</pattern>
  <template>The following people know
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set>: Tod Matola, Dan Brickley, Stu Weibel, Tim Berners-Lee, Eric Freese,
      Dave Beckett, and Uche Ogbuji.
  </template>
</category>

<category>
  <pattern>FOAF KNOWN BY WHICH ERIC MILLER</pattern>
  <template>
    <srai>FOAF KNOWN BY HTTP WWW W3 ORG PEOPLE EM CONTACT ME</srai>
  </template>
</category>

<category>
  <pattern>FOAF KNOWS WHICH ERIC MILLER</pattern>
  <template>
    <srai>FOAF KNOWS HTTP WWW W3 ORG PEOPLE EM CONTACT ME</srai>
  </template>
</category>

<category>
  <pattern>WHO DOES ERIC MILLER KNOW</pattern>
  <template>
    <srai>FOAF KNOWN BY WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>WHO KNOWS ERIC MILLER</pattern>
  <template>
    <srai>FOAF KNOWS WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>LIST ERIC MILLERS ACQUAINTANCES</pattern>
  <template>
    <srai>FOAF KNOWN BY WHICH ERIC MILLER</srai>
    <srai>FOAF KNOWS WHICH ERIC MILLER</srai>
  </template>
</category>
Notice in the last <category> element that we decided to combine the lists if the user asks for Eric Miller's acquaintances. In this case, we've assumed that Eric is somehow acquainted with those who claim to know him, even if he didn't list them in his FOAF profile. Also notice that we didn't provide any hints for the user to select any of the names listed. This is because names are relatively easy to enter. This also reduces the number of <category> elements that need to be created to handle the entire list of people.

foaf:interest

What are Eric Miller's interests? (and Who shares an interest with Eric Miller?) Shared interests are another area where groups of people can be formed. The topics or subjects of interest could be included as part of a larger taxonomy that could be used to define broader or narrower areas of interest.

To build the pattern, we need to query the FOAF entry for the person identified by the URI "http://www.w3.org/People/EM/contact#me" and retrieve all the <foaf:interest> and <foaf:topic_interest> elements defined along with their respective titles or names. We will also query for the <foaf:name> element. Based on the example above, the query should return five resources.

To build the converse pattern, we need to query for all the instances of the <foaf:interest> and <foaf:topic_interest> elements along with their respective titles or names. For each item of interest we will query for all persons who have stated an interest in the topic.

<category>
  <pattern>FOAF INTEREST HTTP WWW W3 ORG PEOPLE EM CONTACT ME</pattern>
  <template>
    <set name="he">
      <set name="topic">Eric Miller</set>
    </set> expresses interest in the following subjects: RDF Site Summary (RSS),
      Dublin Core Metadata Initiative, Semantic Web, Resource Description Framework
      (RDF), and Extensible Markup Language (XML).</template>
</category>

<category>
  <pattern>FOAF INTEREST IN HTTP WWW W3 ORG 2001 SW</pattern>
  <template>The following people expressed an interest in
    <set name="it">
      <set name="topic">Semantic Web</set>
    </set>: Eric Miller, Dan Brickley, Stu Weibel, Tim Berners-Lee, Eric Freese,
      Bernard Vatant and Uche Ogbuji.
  </template>
</category>

<category>
  <pattern>FOAF INTEREST WHICH ERIC MILLER</pattern>
  <template>
    <srai>FOAF INTEREST HTTP WWW W3 ORG PEOPLE EM CONTACT ME</srai>
  </template>
</category>

<category>
  <pattern>WHAT ARE ERIC MILLERS INTERESTS</pattern>
  <template>
    <srai>FOAF INTEREST WHICH ERIC MILLER</srai>
  </template>
</category>

<category>
  <pattern>WHO IS INTERESTED IN THE SEMANTIC WEB</pattern>
  <template>
    <srai>FOAF INTEREST IN HTTP WWW W3 ORG 2001 SW</srai>
  </template>
</category>
Note that the lists of names could become very large, which will make presenting this information to the user a challenge. Notice also that the last <category> does not check which "Semantic Web" is meant. It is hoped that subjects with the same name are actually related in some way. If that assumption proves incorrect, the subjects could be differentiated by their URIs.
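
One possible way to keep a long list readable is to name only the first few entries and count the rest. This is offered as a sketch of one option, not as the approach taken in AIMEE; the function name summarize and the default limit of five are arbitrary choices made for illustration.

```python
def summarize(names, limit=5):
    """Join names into a phrase, naming at most `limit` of them."""
    if len(names) > limit:
        hidden = len(names) - limit
        return ", ".join(names[:limit]) + f" and {hidden} more"
    if len(names) > 1:
        return ", ".join(names[:-1]) + " and " + names[-1]
    return names[0] if names else ""


# Names taken from the "Semantic Web" category above.
interested = [
    "Eric Miller", "Dan Brickley", "Stu Weibel", "Tim Berners-Lee",
    "Eric Freese", "Bernard Vatant", "Uche Ogbuji",
]

print(summarize(interested, limit=3))
```

A follow-up category could then offer the remaining names on request.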

Scalability

As can be seen, a single RDF statement can yield several AIML categories, which raises the issue of scalability. Many AIML implementations attempt to load the entire brain into RAM each time they are started. That approach clearly will not work in this application. A persistent storage method is needed to manage and query the large number of AIML categories without holding them all in RAM. A database will also provide query methods that allow rapid searching of the information. Several large RDF datasets are available; they are discussed in the next section. Extending AIMEE's brain to include these datasets will demonstrate the scalability of the concept presented in this paper. Unfortunately, this testing is not complete at the time of this writing. It is hoped that this issue can be addressed when the paper is presented in Montreal.
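
A minimal sketch of what such persistent storage might look like is shown below, using Python's standard sqlite3 module. The table layout and function names are assumptions made for illustration. Keying a category on its pattern alone is a simplification, since a real store would also need to account for <that> and topic context, and wildcard matching is outside the scope of the sketch.

```python
import sqlite3


def open_brain(path=":memory:"):
    """Open (or create) a category store; ':memory:' keeps this demo self-contained."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS category ("
        " pattern TEXT PRIMARY KEY,"
        " template TEXT NOT NULL)"
    )
    return db


def add_category(db, pattern, template):
    """Insert a category, replacing any existing one with the same pattern."""
    db.execute(
        "INSERT OR REPLACE INTO category (pattern, template) VALUES (?, ?)",
        (pattern, template),
    )


def lookup(db, pattern):
    """Exact-match lookup of a template; returns None when nothing matches."""
    row = db.execute(
        "SELECT template FROM category WHERE pattern = ?", (pattern,)
    ).fetchone()
    return row[0] if row else None


db = open_brain()
add_category(
    db,
    "WHO KNOWS ERIC MILLER",
    "<srai>FOAF KNOWS WHICH ERIC MILLER</srai>",
)
print(lookup(db, "WHO KNOWS ERIC MILLER"))
```

Passing a file path instead of ":memory:" would let the categories survive a restart without being reloaded into RAM in full.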

The Linking Open Data Project

The Open Data Movement aims at making data freely available to everyone. There are already various interesting open data sources available on the Web. Examples include Wikipedia, Wikibooks, Geonames, Musicbrainz, Wordnet, the DBLP bibliography and many more which are published under Creative Commons or Talis licenses.

The goal of the Linking Open Data project is to build a data commons by making various open data sources available on the Web as RDF and by setting RDF links between data items from different data sources. Collectively, the published datasets currently consist of over one billion RDF triples, which are interlinked by 120,000 RDF links. Current candidate datasets include:

  • DBLP - computer science bibliography
  • DBpedia - structured information from Wikipedia
  • DBtune, Jamendo - Creative Commons music repositories
  • Geonames - world-wide geographical database
  • Musicbrainz - music and artist database
  • Project Gutenberg - literary works in the public domain
  • Revyu - community reviews about anything
  • RDF Book Mashup - Books from the Amazon API
  • US Census Data - statistical information about the U.S.
  • World Factbook - country statistics, compiled by the CIA

RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications.

There are already some data publishing efforts. Examples include the DBpedia.org project, the Geonames Ontology, the D2R Server publishing the DBLP bibliography and the DBtune music server. There are also initial efforts to interlink these data sources. For instance, the DBpedia RDF descriptions of cities include <owl:sameAs> links to the Geonames data about each city. Another example is the RDF Book Mashup, which links book authors to paper authors within the DBLP bibliography.

Conclusion

This project is demonstrating that the knowledge contained within an RDF dataset can be extracted into an AIML-based bot for natural language use by the general public. A methodology for extracting the knowledge has been discussed and demonstrated. The extraction methodology is also extensible: as new patterns are discovered, the bot gains additional "understanding". A demonstration system shows how these technologies can be connected successfully into a working application.


Bibliography

[AIML] Wallace, Richard S. AIML tutorial. http://www.pandorabots.com/pandora/pics/wallaceaimltutorial.html

[AJF] Faaborg, Alexander J. Leveraging Metadata for Natural Language Processing: Dublin Core XML to AIML Conversion. 2001.

[DCMI] Dublin Core Metadata Element Set, Version 1.1. 2006. http://www.dublincore.org/documents/dces/

[EDF] Freese, Eric. From Metadata to Personal Semantic Webs. 2006.

[RDF] RDF Primer. http://www.w3.org/TR/rdf-primer/

[RDF2] Tauberer, Joshua. What is RDF? 2006. http://www.xml.com/pub/a/2001/01/24/rdf.html

[RSW] Wallace, Richard S. The Elements of AIML Style. 2003.

[RSW2] Wallace, Richard S. Be Your Own Botmaster. 2005.

[TT] "Turing test" http://en.wikipedia.org/wiki/Turing_test


