The unsolvable identity problem

William Kent
billkent@bkent.net

Abstract

The identity problem is intractable. To shed light on the problem, which currently is a swirl of interlocking problems that tend to get tumbled together in any discussion, we separate out the various issues so they can be rationally addressed one at a time as much as possible. We explore various aspects of the problem, pick one aspect to focus on, pose an idealized theoretical solution, and then explore the factors rendering this solution impractical. The success of this endeavor depends on our agreement that the selected aspect is a good one to focus on, and that the idealized solution represents a desirable target to try to approximate as well as we can. If we achieve consensus here, then we at least have a unifying framework for coordinating the various partial solutions to fragments of the problem.

Keywords: Modeling

William Kent

William (Bill) Kent is the author of Data and Reality, perhaps the best book ever written about the concepts behind data modeling. Originally published in 1978, Data and Reality was republished in 1998 by 1stBooks.

In the preface to the second edition, Kent says: “Many texts and reference works are available to keep you on the leading edge of data processing technology. That’s not what this book is about. This book addresses timeless questions about how we as human beings perceive and process information about the world we operate in, and how we struggle to impose that view on our data processing machines. The concerns at this level are the same whether we use hierarchical, relational, or object-oriented information structures; whether we process data via punched-card machines or interactive graphic interfaces; whether we correspond by paper mail or e-mail; whether we shop from paper-based catalogs or the web. No matter what the technology, these underlying issues have to be understood. Failure to address these issues imperils the success of your application regardless of the tools you are using. … The scope of the book extends beyond computer technology. The questions aren’t so much about how we process data as about how we perceive reality, about the constructs and tactics we use to cope with complexity, ambiguity, incomplete information, mismatched viewpoints, and conflicting objectives.”

Bill Kent claims to have had several careers, including technical writing and research on object-oriented data models. He has worked at Hewlett-Packard and IBM, and has degrees in chemical engineering and math.


Extreme Markup Languages 2003® (Montréal, Québec)

Copyright © 2003 William Kent. Reproduced with permission.

Introduction and context

A good keynote speech should be like a bolt of lightning that powerfully illuminates the subject at hand (without being destructive).

The problem is, I can’t “identify” the subject at hand. So, I will focus on my own area of competence, and hope that it finds applicability to your problems. In particular, I will focus on the problem of identity in a computational context.

Objectives

We aren’t really going to prove the unsolvability of the identity problem in any formal way. Demonstrating its intractability is good enough for our purposes.

We do want to shed light on the problem, which currently is a swirl of interlocking problems that tend to get tumbled together in any discussion. One of our goals here is to separate out the various issues so they can be rationally addressed one at a time as much as possible — divide and conquer.

We will explore various aspects of the problem, pick one aspect to focus on, pose an idealized theoretical solution, and then explore the factors rendering this solution impractical. The success of this endeavor depends on your agreement that the selected aspect is a good one to focus on, and that the idealized solution represents a desirable target to try to approximate as well as we can. If we achieve consensus here, then we at least have a unifying framework for coordinating the various partial solutions to fragments of the problem.

My context

First of all, my background, so you’ll know where I’m coming from.

I retired from Hewlett-Packard almost four years ago (having previously worked at IBM), and my serious involvement with data processing ended a few years before that. My work focused on databases, data modeling, conceptual modeling, object systems, and standards. I’ve been on a number of committees, and I’ve written a book and numerous papers. I have no PhD and no formal education in computer science; my degrees are in chemical engineering and math.

I’ve never been in touch with GML or SGML. My closest contact was as a user of HTML. As far as I knew, markup languages dealt with text formatting.

Since being invited to give this keynote talk, my contacts have been limited to email exchanges with members of the program committee for this conference, and the documents cited in the references below. I haven’t even read all of those very thoroughly. That will serve as my excuse if anything I say seems inaccurate, or if you find that your perspective is not being represented.

What I (don’t) know about XML facilities

The heart of the XML paradigm seems to be the insertion of tags into textual documents on computer media. Such tagging provides access to information in tagged documents and pointers to untagged documents. Document tagging is supplemented with some facilities for accessing information in untagged documents via intermediaries, such as tagged descriptions or abstracts of untagged documents as well as mappings to databases.

In the usual spectrum that runs from the “real world” on top down to hardware at the bottom, somewhere below XML is a notion of architecture, as reflected in REST [16]. Also somewhere below XML are all the actual data sources on computer media, such as documents and databases.

Immediately above XML seem to be schemas and DTDs (which I only recently learned means “Document Type Definitions”), which describe the structures of documents.

Around the same level, or maybe just slightly above, are query languages. These enable retrieval of information from the underlying data sources, and perhaps also update of that information.

At a somewhat higher level are a host of facilities for managing semantics, the connection to the “real world”. In this area I’ve seen mention of such things as topic maps, scopes, themes, name spaces, user profiles, standard vocabularies, schemas, and perhaps other things as well. Existing database facilities such as schema definitions are also involved.

Don’t attach great significance to the ordering in this spectrum. It’s just an intuitive way of trying to get some perspective.

What I don’t see is which of these facilities are formally part of the XML picture, how they relate to one another, and how they all interact in managing information semantics. Even REST, at the architectural level, has things to say about identifiers. One paper at this conference talks about constraints in XML. And what I’ve seen of topic maps seems to incorporate semantic specifications as well as specific data.

So, as I said, I’ll just pay attention to the simple matter of identity.

The “real world”

The information we are managing is by no means limited to the “real world”, whatever that may be. The information may be historical (no longer true), speculative and predictive (not yet true), opinionated or subjective (true to some people, not to others), and even fictional or false (not true at all).

Cyberspace, domains, the universe, and ice-nine

We are talking about information management within computer facilities. Cyberspace is our overarching context.

Our territory is expanding from structured and carefully managed databases and files to the broad field of documents, whatever that may mean. We live in a climate of increasing globalization. The Internet paradigm shift is moving us all into one huge cyberspace.

It thus seems natural to feel that we are heading toward a time when anybody can provide and retrieve any information anywhere in this cyberspace, subject of course to appropriate access rights [04].

This is obviously unmanageable as one huge monolith.

Let me specialize the term “domain”, since you have pre-empted similar terms such as “scope”, “subject”, and “topic” to have specialized meanings.

By domains I mean various ways of grouping information, such as by enterprise organizational units (agency, project, department, division, company, college, university, etc.), by business (pharmaceuticals, medicine, printers, clothing, education, etc.), by subject matter (physics, astronomy, astrology, poetry, history, literature, etc.), by application (whatever that means), by document (whatever that means) such as one book or one database, by users’ areas of interest, by geographical units such as countries and cities, by cultures, and other kinds as well.

There are also subtle linguistic domains, as between American English and English English, e.g., “the committee has” vs. “the committee have”. Do you know what “napkin” means, or used to mean, in England? BTW, what does “English” mean? And how many people know what BTW means?

Even worse are regional variants within a language. In Prodigal Summer, Barbara Kingsolver observes that in Appalachia, “I don’t care to” means “I don’t mind” rather than “I dislike”. People in the Midwest often say “I miss not having something” to mean they miss having it. What do “dinner” and “supper” mean to you? Is a spider a skillet?

Combinations of domains can proliferate. The union of any domains can be a domain. We could define Universe to mean the union of all domains. Thus, the terms “global”, “Universe”, and “cyberspace” might be fairly synonymous.

To be a little more precise, cyberspace and the “real world” overlap, since cyberspace contains information about itself (Section 2-8-2). Furthermore, the boundary between cyberspace and the “real world” is blurry because some of the information we deal with is contained in documents we can only point to, such as text, maps, photographs, and other graphics which are not on computer media. We might wish to speak of an “information space” which includes cyberspace as well as such external documents.

You obviously have recognized the phenomenon of multiple domains and their interrelationships. A good start is made in Marc de Graauw’s “Business Maps: Topic Maps Go B2B” [13].

You recognize the impossibility of managing a single Universe within which interrelationships among all other domains can be managed consistently and efficiently. Your general paradigm seems to be the development of semantic tools within a single domain, then developing means for coordinating across a few domains at a time. You do recognize most of the specific problems involved in coordinating domains. It’s like crystallization starting at a multitude of local sites, extending slowly outward from each site.

Email from Mike Sperberg-McQueen 4/7/03:

But the first requirement of allowing different communities to develop is to ensure that they can develop in relative isolation from each other; at the syntactic level, this means ensuring that markup defined by one community can be distinguished mechanically from markup defined by a different community — i.e., that name collisions can be avoided.

I don’t know if you fully realize the complexity of the interrelationships among domains. They overlap in all sorts of ways, including time-varying partial overlaps as well as the inclusion of one domain within another. For example, if a name space is defined for some domain, does it automatically apply to its subdomains? How would it interact with another name space defined in a subdomain?

Even within the scope of your own work, there is serious ambivalence around such terms as reference, resource, document, identifier, name, topic, subject, entity (I’m sure you can all extend this list). And any of those terms might be used both in a specialized technical sense and in the ordinary natural-language sense within the same document.

You’re going to wind up with a heap of interlocking domains that overlap and subsume one another. You’re going to have to deal with a mess of mappings, constraints, inheritance rules and exceptions. How are you going to keep them consistent? All of that is likely to become a major problem.

And a given user is likely to have various user profiles in various situations. The several profiles of a given user will have the same sorts of interactions with each other, not to mention the interaction of all those profiles with all those other domains.

I hope you don’t harbor an expectation that the semantics of all these domains will crystallize like ice-nine (Vonnegut: Cat’s Cradle) into a single crystalline whole without fracture lines. Our Universe is more like sea ice, continually breaking up and reassembling itself in big lumpy pieces. We’re not going to skate over it smoothly.

Global and local solutions

Various localized solutions exist. In some domains, an employee number, social security number, or part number serves as an adequate identifier. Such localized solutions require some means of knowing which kind of identifier is being presented. The real problems arise when we try to coordinate across multiple domains.

Our attention here goes more toward global solutions that can work across all domains. By “global” we might mean the Universe, i.e., the union of all domains (Section 1-5 above).

The identity of the problem

The first problem of identity is the identity of the problem. What is the problem we’re trying to solve? What’s the key question?

The identity problem arises in various computer situations. Sometimes we want to know how to “identify” something, i.e., specify some thing we are interested in. Sometimes we want to identify a link in a chain: to determine if George is David’s uncle, we need to ask whether any of David’s parents is the same as any of George’s siblings. Sometimes we want to supply information about a thing, and we need to determine whether that thing is already in the information system, i.e., avoid introduction of duplicates.
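As a minimal sketch of the link-in-a-chain case, the following Python fragment tests the uncle relationship by asking whether any parent of one person is one and the same object as any sibling of the other. The `Person` class, the `is_uncle` helper, and the sample family are hypothetical. The sketch assumes that each person is represented by exactly one object, which is the very assumption at issue, and it ignores gender, so it would report aunts as well.

```python
class Person:
    """A person record; each real person is assumed to have exactly one instance."""

    def __init__(self, name):
        self.name = name
        self.parents = []   # list of Person objects
        self.siblings = []  # list of Person objects


def is_uncle(candidate, child):
    """True if some parent of `child` is the very same object as some sibling of `candidate`."""
    return any(p is s for p in child.parents for s in candidate.siblings)


# Hypothetical family: Bill is George's sibling and David's parent.
george = Person("George")
bill = Person("Bill")
david = Person("David")
george.siblings.append(bill)
bill.siblings.append(george)
david.parents.append(bill)

# A second record that merely carries the same name is a different object.
other_bill = Person("Bill")
stranger = Person("Stranger")
stranger.siblings.append(other_bill)

george_is_uncle = is_uncle(george, david)      # link found through the shared object
stranger_is_uncle = is_uncle(stranger, david)  # matching names are not enough
```

The comparison uses `is` rather than a comparison of names, so two records with the name "Bill" do not complete the chain unless they are one and the same record.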

What is this?

Sometimes by “identity” we simply mean “what is this?”.

Somehow we’ve managed to point to something, and we want to know what it is. You’d have to be a mind-reader to answer that properly. We’d rarely be satisfied to get back just a part number or social security number. Do you want to know what type of thing it is (assuming that we’ve even agreed on the meaning of “type” [26])? Do you allow that it might have multiple types? Do you want to know something about its properties? Do you want to know what roles it plays in relation to other things?

Clearly,

(1) What is this?

is not a very useful statement of our key question about identity.

What do I want?

Another possibility is:

(2) What can I specify in a request to get what I want?

That seems too vague. It includes all sorts of names, expressions, queries, function calls, and even search engine keyword phrases, as well as progressive refinement of searches.

That’s not a very crisp key question about identity.

But, as near as I can understand, that seems to be the notion of identity presented in REST [16]. Maybe that’s another topic for debate.

Is it the same thing? or identical?

Another possibility:

(3) How can we tell if two things are the same?

Big fat ambiguities here. A guy gets a hamburger at a lunch counter. The guy next to him says, “Hey, that looks good. I’ll have the same thing.” So the waitress slides over the first guy’s plate.

“Same thing” and “identical thing” don’t always mean the same thing. “Same” doesn’t always mean the same thing, nor does “identical”. Identical twins are not the same object. Is this pencil the same as that one? If you and I are each eating hamburgers, are we eating the same thing?

Are they one and the same?

Let’s resolve that ambiguity by specializing identical to mean “one and the same thing” (Section 3-3-1).

In terms of classical set theory, this means that a set constructed of identical things has a cardinality of one, i.e., the set contains one member.

This refines our question to:

(4) How can we tell if two things are identical?

Careful, now. There’s a grammatical absurdity there. If they are two things, can they possibly be one and the same thing? In fact, what did the plural mean when we said we constructed a set of “identical things”?

Is it the same to you?

On the other hand, maybe it’s not absurd. Maybe it isn’t objectively decidable whether we are talking about one and the same thing. Maybe it depends on roles, or points of view. There are situations in which what appears to be one thing from one viewpoint might be several things from another.

Even the notion of being embodied in the same physical object doesn’t assure “oneness”.

Who am I? George says I’m his brother; David says I’m his father. Sometimes roles matter more than physical objects. If an employee holds two jobs with the same company, shouldn’t he count as two employees (maybe with two employee numbers)? Representatives of several groups at a meeting might incidentally be the same person. The distinction matters depending on whether we are counting chairs or votes. “Embodied in same person as” might be a legitimate relationship among distinct entities.
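The chairs-versus-votes distinction can be illustrated with a small Python sketch. The attendance list and the names in it are made up for illustration, and using a name string to stand for a physical person is itself an assumption.

```python
# Hypothetical attendance list: (group represented, person) pairs.
representations = [
    ("Finance", "Pat"),
    ("Engineering", "Pat"),
    ("Marketing", "Lee"),
]

# Counting votes: every representation counts, even when one person holds two.
votes = len(representations)

# Counting chairs: only distinct physical persons need a seat.
chairs = len({person for _group, person in representations})
```

Here the same meeting yields three entities under one view and two under the other, and neither count is wrong.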

That leads to a much more complex, situational concept of identity. The question now becomes:

(5) Should these things be considered identical in this situation?

I will ignore that question. Would you?

Does identity depend on time? on the properties of a thing?

(6) What else might determine identity?

There are other notions of identity as well. Sometimes it is tied to some notion of “inherent qualities”, and their change over time. Am I the man I used to be?

Some people even argue that identity by definition is time-dependent. A thing at one point in time is not “the same thing” at another point in time. “Is that the same person in these pictures?” “No, one’s a six-year-old boy and the other’s a 30-year-old man.” You’re entitled to your opinion.

A related question can also be troublesome for us: How much can a thing change and still be the same thing? How many parts can we swap between two cars before one becomes the other?

Identities also evolve in other ways. The murderer turns out to be the butler. Countries, cities, and corporations change their names, merge and split.

Even if you have uniquely identified the city in which a person was born, can you then deduce the country in which he was born? The country may have changed its name, or split, or merged with another, or a border may have shifted to the other side of the city.

Does an Agilent employee work for the same company, or hold the same job, as before Agilent’s spinoff from Hewlett-Packard? Can he have more years of service than the age of his company?

What’s the “identity” of a share of stock before and after a stock split (e.g., for tax purposes: how long have you held the stock)?

From Steve Newcomb’s bio:

…GCA/IDEAlliance “Extreme Markup Languages” annual conference series. This conference series began in 1994 under the name “International HyTime Conference”, and it has been held every year since then. The name changed to “Metastructures” in 1998, and, when it was merged with the GCA Markup Technologies conference series in 2000, the name was changed to its current name, “Extreme Markup Languages”.

From Tommie Usdin’s 6/16 email to me:

IDEAlliance used to be GCA (the Graphic Communications Association). GCA was a subdivision, and then a spin-off of Printing Industries of America, a trade association. GCA was started by typesetters (and printers) who were looking for more efficient ways to typeset material that was also to be used for other purposes. In 1970 or thereabouts the GenCode Committee of the GCA set out to create a list of all of the generic tags that anyone would use in typesetting — to standardize generic tagging. They came to the conclusion that they had set out to make a list of everything anyone would ever be interested in, and that was impossible. SGML grew out of the work of this committee.

One consequence is that we may have to accept that the notion of which things are identical may vary with time.

Problems of institutional security extend the problem in another direction, incorporating confidence levels. How similar are these things? How confident are we that they are the same?

I will also ignore such questions. Would you?

Are we referring to the same thing?

Let’s return to question (4), “How can we tell if two things are identical?”, and try to cast it in a more sensible form. Let’s assume that at any point in time we can definitely decide whether two things are one and the same. Oops, that’s still absurd: two things are one? It doesn’t make sense to talk about the things. It makes more sense to talk about references to things.

A reference is loosely meant to include names (of any sort, including identifiers), variables, expressions, queries, or anything else intended to denote one particular thing at some particular time (we now ignore references to collections of things, unless we want to treat a collection as an identifiable thing in itself). There is a sense in which a reference “returns” or “evaluates to” a thing. Then the identity question can be put as:

(7) Do two references refer to one and the same thing?

We’ll return to that in Section 3-3.
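Programming languages already draw this distinction between things and references to things. As a rough illustration in Python, where the `Thing` class and the `look_alike` helper are hypothetical names, the `is` operator asks whether two references denote one and the same object, while a property comparison only asks whether two objects resemble each other.

```python
class Thing:
    """A plain object; by default Python objects compare equal only to themselves."""

    def __init__(self, description):
        self.description = description


def look_alike(x, y):
    """Resemblance: the two things carry the same description."""
    return x.description == y.description


burger = Thing("hamburger")
first_ref = burger                  # two references ...
second_ref = burger                 # ... to one and the same thing
other_burger = Thing("hamburger")   # a second thing that merely looks the same

identical = first_ref is second_ref               # one and the same thing
resembles = look_alike(first_ref, other_burger)   # "I'll have the same thing"
distinct = first_ref is not other_burger          # but not the same plate

# A set built from references to one thing has cardinality one.
cardinality_one = len({first_ref, second_ref})
cardinality_two = len({first_ref, other_burger})
```

The sketch only works because the language runtime hands out object identities for us; in an information system spanning many domains, nothing plays that role automatically.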

What’s identifiable?

A rich source of debate. The following examples cover a broad spectrum from the reasonable to the unreasonable. Do we all agree on where to draw the line? What happens if the line is drawn in many different places in many different domains?

Individuals

The traditional no-brainers: employees, departments, companies, products, customers, buildings, books, countries, cities, streets…

No-brainers??? There’s grief enough around even these simple identifiable things [17]. But that’s not our topic for today.

Computer stuff

Just so we don’t forget, computer facilities are themselves part of our Universe. Cyberspace contains information about itself. Somebody cares about identifying bits of hardware and their interconnections, disks, tracks, sectors, transactions, sessions, packets, routes, log files, lock files, recoverable “deleted” files, other files, buffers, error occurrences and other events, software elements (source and compiled, in granularities from application suites down to statements and variables)…

Data values?

There’s a common and very ill-defined notion that attributes and relationships are different, and that things and data values are different [28]. The ideas are connected. Relationships are connections among things, while attributes connect things with data values. There are secondary issues as to whether relationships are binary or n-ary (i.e., can a relationship connect more than two things?), and whether attributes can be multi-valued (i.e., can they be a list or set of things?). If you want to make such distinctions, this is another rich and time-honored area of disagreement.

The best I can do here is to illustrate by example. “John is married to Mary” is generally considered to be a relationship between John and Mary, while “John’s eyes are blue” is generally considered to be an attribute of John.

At the core of the distinction is the question of whether data values represent things. If John’s eyes are blue, Johann’s eyes are blau, and Jean’s eyes are bleu, do they all have the same eye color? As data values, they are different character strings that don’t match. Alternatively, we could recognize that the color is a single underlying identifiable thing (concept, topic) which has various names in various languages.

The same thing arises with dates. Is a date a formatted character string (with simple mappings, e.g., mmddyy ↔ yyyymmdd), or a representation-independent reference to a point or interval of time, interpretable in any calendar system? Is Date a topic type?

Will a query seeking the events that occurred on a specified date find events recorded according to the Hebrew or Chinese calendars? Remember, we are in global cyberspace.
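For the simple case of two string formats within one calendar, a short Python sketch shows the difference between comparing the strings and comparing what they denote. The sample date is arbitrary, and the sketch assumes the Gregorian calendar throughout; it says nothing about Hebrew or Chinese calendar dates, which the standard `datetime` module does not handle.

```python
from datetime import datetime

# Two "names" for one day, in mmddyy and yyyymmdd form (arbitrary sample date).
as_mmddyy = "080403"
as_yyyymmdd = "20030804"

# Compared as data values (character strings), they do not match.
strings_match = as_mmddyy == as_yyyymmdd

# Compared as references to a day, they denote the same one.
day_a = datetime.strptime(as_mmddyy, "%m%d%y").date()
day_b = datetime.strptime(as_yyyymmdd, "%Y%m%d").date()
days_match = day_a == day_b
```

Note that the two-digit year is only resolved to 2003 by a convention built into the parser, which is one more piece of context the bare string does not carry.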

There’s a secondary issue around date and time. Are they expressed locally or universally, i.e., in terms of local time zones or GMT? Do we assume that people born on the same date (and perhaps even at the same time) were born simultaneously? The same date and time in different locations might not occur simultaneously. Did you ever arrive at your destination before you started out?
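The point about simultaneity can be sketched with Python's timezone-aware datetimes. The two fixed offsets below are assumptions standing in for Montréal in summer (UTC-4) and Tokyo (UTC+9); the birth times are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets; real time zones also carry daylight-saving rules.
montreal = timezone(timedelta(hours=-4))
tokyo = timezone(timedelta(hours=9))

# Two births recorded with the same local date and time.
born_in_montreal = datetime(2003, 8, 4, 9, 0, tzinfo=montreal)
born_in_tokyo = datetime(2003, 8, 4, 9, 0, tzinfo=tokyo)

# The wall-clock readings are the same ...
same_wall_clock = (
    born_in_montreal.replace(tzinfo=None) == born_in_tokyo.replace(tzinfo=None)
)

# ... but the instants are not: aware datetimes compare as points in time.
simultaneous = born_in_montreal == born_in_tokyo
gap = born_in_montreal - born_in_tokyo
```

Under these assumptions the two births are thirteen hours apart, even though the recorded date and time are character-for-character the same.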

Postal addresses are another thorny area. An addressable location is an identifiable thing, whose “name” can be spelled many different ways. And, once in a while, the address of a location does change.

The “data value” question also arises with any other measured quantity. (Did you know that dates and times measure time intervals from certain starting points?) Will equal lengths be detected if they are expressed (“named”) in different units? What about rectangular and polar coordinates of the same point? What about GPS coordinates using different datums?
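Detecting equal lengths expressed in different units requires agreeing on a common unit first. A minimal sketch follows, using exact rational arithmetic to avoid rounding; the unit table and the helper names `to_metres` and `same_length` are illustrative assumptions, not part of any standard library.

```python
from fractions import Fraction

# Conversion factors to metres, held as exact fractions.
METRES_PER_UNIT = {
    "m": Fraction(1),
    "cm": Fraction(1, 100),
    "in": Fraction(254, 10000),   # 1 inch is defined as exactly 2.54 cm
    "ft": Fraction(3048, 10000),  # 1 foot is exactly 12 inches
}


def to_metres(value, unit):
    """Convert a (value, unit) 'name' for a length into metres."""
    return Fraction(value) * METRES_PER_UNIT[unit]


def same_length(a, b):
    """True if two (value, unit) pairs name the same length."""
    return to_metres(*a) == to_metres(*b)


twelve_inches_is_a_foot = same_length((12, "in"), (1, "ft"))
centimetres_match_too = same_length(("30.48", "cm"), (1, "ft"))
same_digits_different_length = same_length((12, "in"), (12, "cm"))
```

The comparison only succeeds when every record states its unit, which is exactly the information that locally managed databases tended to leave implicit.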

The question even arises with numbers themselves. Are numbers topics? Could 101 and 5 be different names for the same thing? Do you recognize the possibility that:

10 + 10 = 10 and 10 × 10 = 10?

Those equalities are true if we tag with appropriate number bases:

10<2> + 10<8> = 10<10> and 10<2> × 10<8> = 10<16>

In effect, “10” is serving as the name of many different numbers.
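These equalities can be checked directly with Python's built-in `int`, which takes the base as its second argument. The variable names are only for illustration.

```python
# The numeral "10" read in four different bases.
two = int("10", 2)
eight = int("10", 8)
ten = int("10", 10)
sixteen = int("10", 16)

# 10 (base 2) + 10 (base 8) = 10 (base 10)
sum_holds = two + eight == ten

# 10 (base 2) x 10 (base 8) = 10 (base 16)
product_holds = two * eight == sixteen
```

The string `"10"` alone identifies nothing; it becomes a reference to a number only once the base is supplied.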

Quiz: do you recognize the following names for ten?

1010, 101, 12, 10, A, 00010000, X, 1E1
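One plausible reading of the quiz can be sketched in Python. The choice of notation for each name (binary, base 3, octal, decimal, hexadecimal, packed binary-coded decimal, Roman numeral, scientific notation) is an assumption, and the helpers `from_roman` and `from_bcd` are hypothetical.

```python
def from_roman(text):
    """Decode a Roman numeral using the usual subtractive rule."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        v = values[ch]
        # A smaller value before a larger one is subtracted (as in IX).
        total += -v if values.get(nxt, 0) > v else v
    return total


def from_bcd(bits):
    """Packed binary-coded decimal: each group of four bits is one decimal digit."""
    digits = [str(int(bits[i:i + 4], 2)) for i in range(0, len(bits), 4)]
    return int("".join(digits))


# Each name paired with the notation it is assumed to be written in.
READINGS = {
    "1010": lambda s: int(s, 2),
    "101": lambda s: int(s, 3),
    "12": lambda s: int(s, 8),
    "10": lambda s: int(s, 10),
    "A": lambda s: int(s, 16),
    "00010000": from_bcd,
    "X": from_roman,
    "1E1": lambda s: int(float(s)),
}

decoded = {name: read(name) for name, read in READINGS.items()}
all_name_ten = set(decoded.values()) == {10}
```

Each string needs its own decoding rule before the eight names can be seen to denote one number.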

Further complications arise from inconsistent treatment of such matters in various domains. Databases are notoriously inconsistent in their treatment of such things as cities, countries, languages, skills, and even the identities of relatives. They might be treated either as data values or as identifiable things (e.g., identified by unique keys). Treating them as data values opens the door to all sorts of matching problems arising from different spellings, abbreviations, misspellings, synonyms, etc. [23], [25].

Do you agree these are identity issues? Another point of divided opinion.

These were quite trivial and ignorable concerns in the old days, when most domains consisted of one or a few closely managed databases. Textual information was entered in the same language, specialized terminology was locally understood, measurement data was entered in manageable units (application programmers were supposed to know when conversions were needed), dates were all entered according to the local calendar, and times were according to the local time zone. Information validity was much more closely screened than in newspapers or magazines. Errors were corrected in situ, immediately visible in future data retrieval — unlike belated errata and retractions, which don’t amend the original and often go unnoticed.

It will matter more in the global Universe.

Aggregates?

Are aggregates (sets, bags, lists, arrays, matrices, etc.) identifiable things?

In classical set theory, a set is defined by its extension (members), regardless of how the set expression is written. Thus, {x,y} and {y,x} are the same set. Furthermore, since classical sets don’t include duplicate members, if x and y are identical (Section 3-3-1 below), then these are also the same set as {x} and {y}.

In effect we have here one set denoted by four different names.
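Python's built-in sets follow the same extensional rule, so the four names can be compared directly. This is only a sketch; `x` and `y` are placeholder references to one arbitrary object.

```python
x = object()
y = x  # x and y are two references to one and the same thing

# Four ways of writing the set down.
names_for_the_set = [{x, y}, {y, x}, {x}, {y}]

# Set equality compares members, not the way the set was written.
all_same_set = all(s == names_for_the_set[0] for s in names_for_the_set)
cardinality = len({x, y})
```

The comparison uses `==`, which looks at membership. The four Python set objects are still separate objects in memory, so even here the question "the same set?" has more than one answer depending on which notion of sameness is meant.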

Suppose I asked you to write down the list of program committee members for this conference. What are the chances you will all write them in the same order? Or spell and punctuate their names the same? We are likely to see many “names” for that one list.

Do you agree this is an identity issue? Another point of divided opinion.

Unreals?

Trolls, leprechauns, gnomes, elves, fairies, wizards, witches, goblins, hobgoblins, dragons, gods, spirits, Lilliputians, mermaids, and all the other inhabitants of fiction and dreams. Does the troll that attacked Hermione have an identity?

UFOs, extraterrestrials … (Of course we can argue if these are unreal.)

Indistinguishables?

All flora and fauna. Rocks. Plots of land. Snowflakes. Dinosaurs. Bones and bone fragments. Pottery and shards. Pencils. Sheets of paper. Pieces of currency…

Somebody somewhere cares. Somebody in some laboratory is tagging individual ants and recording their behavior. For trial evidence, people are tagging and distinguishing drops of blood and saliva, fingernail scrapings, fragments of glass … Life and death could hang in the balance.

Other stuff?

Dreams, desires, goals, potential, disappointments, skills, personality traits, success and its criteria …

Somebody somewhere cares about identifying these things, too. Anything anyone might study or write about might be identifiable.

Any word whatsoever?

Similar considerations apply to the meanings of any word. There are synonyms within a single language, and translation equivalents in other languages. There’s also the ambiguity problem: which possible meaning is intended by a particular occurrence? Words even get redefined within the scope (oops, non-technical sense) of a document or other domain.

In all of these cases, there is a single identifiable concept denoted by various names, or a single ambiguous name which might denote various concepts. In fields involving language translation, a plausible paradigm would be to assign identifiers to concepts and let words be names for the concepts.
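A minimal sketch of that paradigm in Python follows. The concept identifiers, the word list, and the helpers `denotes` and `same_concept` are all hypothetical; a real vocabulary would also need scopes, versions, and someone to maintain it.

```python
from collections import defaultdict

# Hypothetical (language, word, concept identifier) assignments.
NAMES = [
    ("en", "blue", "concept:color-blue"),
    ("de", "blau", "concept:color-blue"),
    ("fr", "bleu", "concept:color-blue"),
    ("en", "spider", "concept:arachnid"),
    ("en", "spider", "concept:skillet"),   # regional usage
    ("en", "skillet", "concept:skillet"),
]

concepts_by_name = defaultdict(set)
for lang, word, concept in NAMES:
    concepts_by_name[(lang, word)].add(concept)


def denotes(lang, word):
    """Return every concept this word might name in this language."""
    return concepts_by_name.get((lang, word), set())


def same_concept(ref_a, ref_b):
    """True only if both names are unambiguous and denote one and the same concept."""
    a, b = denotes(*ref_a), denotes(*ref_b)
    return len(a) == 1 and a == b


blau_is_bleu = same_concept(("de", "blau"), ("fr", "bleu"))
spider_is_skillet = same_concept(("en", "spider"), ("en", "skillet"))
spider_meanings = denotes("en", "spider")
```

Synonyms and translations collapse onto one identifier, while the ambiguous word returns two candidates and the question of which one was intended is left to context.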

The email that Mike Sperberg-McQueen sent me on April 7, 2003 is a gold-mine of challenges (opportunities?). Here’s an excerpt:

To take a simple example: suppose that for whatever reason I want to talk about graph theory. I define ‘arc’, ‘node’, ‘graph’, ‘simple graph’, ‘complex graph’, ‘tree’, and so on. Meanwhile, somewhere in France at one or the other CNRS laboratory, someone is doing the same thing. Even supposing that they define the terms in English and that they use the terms ‘arc’, ‘node’, etc., instead of ‘edge’, ‘vertex’, etc., what are the chances that a system will know that my arc is the same as the French researcher’s arc? my node the same as hers?

Now suppose they have actually used different terminology — graph theory is notorious for its practitioners’ inability to decide whether a graph can contain multiple arcs connecting a pair of nodes or not (if it can, a graph with no cycles is a simple graph; if it can, a graph with cycles is a complex graph, or something else). However small the chances were before, they are smaller now.

Now suppose that I have used one definition of ‘tree’ and the French researcher a different definition. There are several definitions of ‘tree’ in graph theory, and one of the exercises set for first-year courses is to prove that all of them are equivalent.

And now suppose that (in the course, perhaps, of a study of graph-theory pedagogy) I have chosen to create distinct identifiers for the different definitions of ‘tree’, so that I can trace who says what where about the different definitions.

Now let us suppose the existence of a third party interested in graph theory and in using RDF, or topic maps, to represent propositions about it. And let us suppose that, chastened by warnings about the need to use standard identifiers for standard concepts, in order not to stand in the way of later data integration, this third party does what neither I nor the French team did: they actually look to see whether someone has defined identifiers for the requisite concepts before they set about their business. What is the likelihood that they, or any system which tries to integrate the different data sources, are going to do something consistent and useful when asked to distinguish among:

tree_1: a connected forest
tree_2: a connected graph with no cycles
tree_3: a graph with no cycles and n-1 edges (where n is the number of vertices)
tree_4: a connected graph with n-1 edges
tree_5: a connected graph such that each edge is a bridge
tree_6: a connected graph such that any two vertices are connected by exactly one path
tree_7: a graph with no cycles, such that adding any new edge creates exactly one cycle
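As a minimal sketch of why behavior alone cannot separate these identifiers, the following Python checks three of the definitions above (tree_2, tree_3, tree_4) against every simple graph on a few labeled vertices. The function names and the edge-list representation are assumptions made for illustration only; they belong to no published vocabulary.

```python
from itertools import combinations

def components(n, edges):
    """Count connected components of an undirected graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)})

def is_connected(n, edges):
    return components(n, edges) == 1

def has_cycle(n, edges):
    # A graph is acyclic exactly when edges = vertices - components.
    return len(edges) != n - components(n, edges)

def tree_2(n, edges):  # connected graph with no cycles
    return is_connected(n, edges) and not has_cycle(n, edges)

def tree_3(n, edges):  # no cycles and n-1 edges
    return not has_cycle(n, edges) and len(edges) == n - 1

def tree_4(n, edges):  # connected graph with n-1 edges
    return is_connected(n, edges) and len(edges) == n - 1

def definitions_agree(n):
    """True if the three predicates agree on every simple graph with n vertices."""
    possible = list(combinations(range(n), 2))
    for k in range(len(possible) + 1):
        for edges in combinations(possible, k):
            answers = {tree_2(n, edges), tree_3(n, edges), tree_4(n, edges)}
            if len(answers) != 1:
                return False
    return True
```

The three predicates give the same answer on every graph tested, yet the identifiers tree_2 and tree_4 still name different definitions. Whether they denote "the same thing" depends on whether one cares about the concept or about its wording.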

Now, let us suppose that my actual interests lie with more complex concepts with slightly less clear-cut definitions. I wish to refer to the fact that the right to speak Spanish and to transact government business in Spanish was guaranteed to the people of New Mexico, at least for a period of 20 years, by the treaty of Guadalupe Hidalgo in 1848. How do I refer to the treaty and its provisions, in such a way as to make the reference distinct from a reference to, say, the copy of the treaty residing in the National Archives of the U.S., or the piece of parchment on which that copy was engrossed? Steve tried to explain this to me once, but did not manage to persuade me that there is a reliable answer.

The experience of institutions which use controlled vocabularies for whatever purpose is, to put it bluntly, not encouraging. I would summarize it this way: if it matters enough to someone, controlled vocabularies can improve their situation enough to be worth using, i.e., to be better than the alternative. But even substantial effort will only reduce, not eliminate, false agreement among things which ought in theory to be distinct or false disagreement among things which ought to be the same. Inter-indexer consistency in the application of controlled vocabularies for subject indexing, for example, is unlikely to be 100%, no matter how it’s measured.

All systems which rely on resource identity seem to me to be controlled vocabularies.

Then there’s this from an email Debbie Lapeyre sent me on June 6th:

3) What is XML?

People who say the word “XML” mean several incompatible things:

  1. XML the Data Format (and data model(s))
  2. XML the Data Vocabulary (these are also called “languages”)
    XML schemas define tags sets and tag relationships. What relationships can be defined depends on which schema language is specified. A set of tags, data models, and business rules is called a “language” (sorry about that) or a vocabulary. Folks write a vocabulary and use it to markup their data.
    There are 100s of these. The eBusiness/eCommerce folks have tags for invoices, quantity-ordered, and trust-levels. The semiconductor people have tags for characteristics, conditions, and part-number. The pharmaceutical folks have tags for dosage, side-effect, and drug-name. The STM journal publishers have tags for genus-species, footnote, and authors-affiliation. The recipe folks have tags for ingredient, cooking-time, and cooking-method. The smoke stack emissions folks have tags for CO2-level, mean-temperature, and particle-size. The travel agents have smoking-room-preference, elite-membership-status, and surname.
  3. XML the Application — use of tags, vocabularies, and supporting software to make SOMETHING happen: publishing a webpage, drawing vector graphics on a screen, finding all the documents that support my argument, carrying a phone message to another place, facilitating a data exchange between two unlike software packages or applications.
  4. XML the All-Encompassing Technology. Around the original XML recommendation (which is just the data format and one schema language, with no explicit data model) has grown a monster community of standards:
    • alternate schema languages: W3C XML schema, RELAX NG, XDR, etc.
    • hypertext markup conventions: XLink, XPointer, XPath
    • querying languages (SQL equivalents for XML)
    • protocols for message exchange, metadata definition exchange, etc.
    • formalized data models
    • infrastructure practice standards
    • APIs

    When some folks say “XML”, they mean all of this, which they insist is necessary for the development of interoperable applications.
  5. I’ve probably missed a few.

The problem even extends to XML tags. An email from Jim Mason April 8th:

I’m still trying to solve problems for publishing documents, but life has gotten more complex. SGML/XML allows users to define their own descriptive markup. So I have two copies of the same document, marked up by different users at different times. It’s easy for me to translate the other fellow’s <para> to my <p>, but what do I do with the fact that he and I see the hierarchy of document structures differently: he encapsulates groups of related units differently from me. (For example, is a list at the end of a paragraph part of it or a free-standing unit.) So I’m taking trees apart and reconstructing them. He’s just interested in typesetting his version. I’m mostly concerned with maintaining mine in a web of dependencies (if this document changes, I have to change a lot of others that refer to it as an authority).

Where do we draw the line between the identity problem and the generalized dictionary problem?

Harry Potter

Consider Harry Potter and the Sorcerer’s Stone.

  • That in itself is a single identifiable concept or topic. It is “one thing” written by J.K. Rowling.
  • There are “the” book and “the” movie, two more distinct identifiable topics.
  • There is the book in hardcover and paperback.
  • There are translations of the book into various languages.
  • There are various abridgements and condensed versions published.
  • There could be revised editions in the future.
  • There were various drafts before publication.
  • If the books were shorter, the book could be incorporated into a single collection of the Harry Potter books bound in one cover.
    • Do you allow for a book within a book?
    • What would a <book_title> mean in this document?
    • If I’m holding it in my hand, how many books am I holding?
  • The book may be made available on tape and/or CD, perhaps with various narrators and different abridgements. Certainly in different languages.
  • The movie may have several versions: director’s cut, edited for TV, maybe future remakes, not to mention subtitling or dubbing in various languages.
  • The movie may be on film, videotape, or DVD, perhaps packaged with different sets of bonus features.
  • It may get turned into one or more stage plays, operas, TV programs. All with scripts of their own. And videotapes of them.
  • There are multiple physical copies of most of these things, including digital copies in many computers.
    • If I’m holding three copies of it, how many books am I holding?

This “one book” may have literally millions (billions?) of identifiable manifestations. Somebody cares about each one of them. How can they each be identified to everyone’s satisfaction?

We don’t address such problems of versions, copies, etc. in our treatment of the identity problem. Treatments are likely to differ in various domains.

That’s yet another obstacle to finding a universal solution to the identity problem.

Is the key question none of the above?

Have I still not asked the right question? Maybe you think the identity problem is concerned with none of the above. So it goes.

Do you know the answers?

Even if you believe you know the right questions and the right answers, do you think your neighbor agrees?

Do you care?

You might not care. You might not be concerned with the Holy Grail of a universal identity theory. I hope you still get some good insights about the identity problem from this paper.

The unsolvable identity problem

The identity of the problem

Why is the identity problem unsolvable? To begin with, as just shown, we don’t agree on what the identity problem is. That’s not unusual. According to Kent’s Law, the experts in topic X can’t agree on a definition of X. In some sense, while “the identity problem” is a singular identifiable topic, it has a multitude of real manifestations (versions?) that are not identical.

Therefore, it’s quite unlikely that there exists a single perfect solution to all these variants of the problem. That’s sort of a cheap trick argument, but it’s a start.

The unsolvability of one identity problem

We can describe one idealized solution that we all seem to be trying to approximate, based on one particular key question as the problem to be solved. Then we can show that it is theoretically impossible to achieve this solution. While there may conceivably exist other theoretical ideal solutions, I can’t imagine one, unless it solves a different problem. Then we’ll examine practical limitations.

An idealized solution

Let me describe one idealized solution [19], addressing the problem posed by question (7) in Section 2-7 above:

(7) Do two references refer to one and the same thing?

The solution involves references (Section 2-7 above), an Identical predicate (Section 3-3-1), and globally unique and singular identifiers (Section 3-3-3).

Since we are operating inside a computational system, we don’t usually expect a reference to materialize the real thing, but rather some computational surrogate for the real thing. The real thing might be materialized if it’s inside the computational system (Section 2-8-2 above).

The “Identical” predicate

To formalize the identity notion, we introduce an abstract Boolean Identical predicate, such that

Identical(Reference1, Reference2)
is either true or false. We redefine the symbol ≡ to stand for this predicate, i.e.,
Reference1≡Reference2
means:
Identical(Reference1, Reference2)

Language still gets in our way. We will speak of one reference being identical to another, meaning that they evaluate to the same thing. The references themselves may appear different.

The key question for the identity problem now becomes:

(8) When is it true that Identical(Reference1, Reference2), i.e., Reference1≡Reference2?

This treatment crystallizes the identity question in a rather formal way. If we are to have a consensus solution to the identity problem, we must agree on how the Identical predicate works. To some extent this is open to debate (another question we might disagree on), but we will constrain its behavior somewhat.

Whatever else we may disagree about, I hope we agree that identity should at least have the following characteristics…

Equivalence

To begin with, Identical is an equivalence relation:

  • Reflexivity (a reference is identical to itself):
    Reference1≡Reference1
  • Symmetry (if one reference is identical to another, the second is identical to the first):
    Reference1≡Reference2 ⇒ Reference2≡Reference1
  • Transitivity (references identical to the same reference are identical to each other):
    ( Reference1≡Reference2 ∧ Reference2≡Reference3 ) ⇒ Reference1≡Reference3
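One way to picture these three properties is a union-find structure, in which every asserted identity merges two classes of references. This is only a sketch under the assumption that references can be represented as hashable values; the class and method names are invented here and are not taken from [19].

```python
class IdentityClasses:
    """Union-find over references; each class stands for one 'thing'."""

    def __init__(self):
        self._parent = {}

    def _find(self, ref):
        # An unseen reference starts out in a class of its own.
        self._parent.setdefault(ref, ref)
        root = ref
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited reference at the root.
        while self._parent[ref] != root:
            self._parent[ref], ref = root, self._parent[ref]
        return root

    def assert_identical(self, ref1, ref2):
        """Record that ref1 and ref2 denote the same thing."""
        self._parent[self._find(ref1)] = self._find(ref2)

    def identical(self, ref1, ref2):
        """Reflexive, symmetric, and transitive by construction."""
        return self._find(ref1) == self._find(ref2)
```

Because identity is decided by comparing class representatives, reflexivity, symmetry, and transitivity hold automatically. What the structure cannot do is tell us which identities to assert in the first place.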

Cardinality

We also impose a counting constraint ensuring that identical references refer to one thing. Using the usual meaning of {} in set theory as the constructor of a set without duplicates, we have (Section 2-8-4)

Reference1≡Reference2 ⇒ {Reference1, Reference2} ≡ {Reference2, Reference1} ≡ {Reference1} ≡ {Reference2}

The cardinality of this set is one.

A simple illustration of what’s happening here:

{1+1, 2} ≡ {2, 1+1} ≡ {1+1} ≡ {2}

In effect, we have one set as an underlying abstract topic having a multitude of names.

By the way, this counting constraint could be circular. The behavior of the {} constructor, i.e., the elimination of duplicates, may well rely on how the Identical predicate is defined.
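The circularity can be seen in a small Python sketch. Python's set constructor eliminates duplicates by calling an equality method, so the count of "things" is only as good as that method. The Ref class and count_things function are hypothetical names introduced for this illustration.

```python
# Python's set constructor removes duplicates using __eq__ and __hash__,
# so the size of a set depends on how "identical" has been defined.
assert {1 + 1, 2} == {2, 1 + 1} == {1 + 1} == {2}
assert len({1 + 1, 2}) == 1

class Ref:
    """Hypothetical reference: an expression as written, plus what it denotes."""

    def __init__(self, text, denotes):
        self.text = text
        self.denotes = denotes

    def __eq__(self, other):
        # Two references are identical when they denote the same value,
        # however different their texts look.
        return isinstance(other, Ref) and self.denotes == other.denotes

    def __hash__(self):
        return hash(self.denotes)

def count_things(refs):
    """Cardinality of the set of references under the equality defined above."""
    return len(set(refs))
```

Changing the body of __eq__ changes the answer returned by count_things, which is the dependency the paragraph above warns about.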

Behavior (substitutability)

Another constraint is that identical things (I use the phrase loosely) should behave the same. This is sometimes called “substitutability”. Let’s use “f(x)” to mean any sort of command or request that induces some behavior relative to x, such as returning some of its properties. We should require that:

Reference1≡Reference2 ⇒ f(Reference1)=f(Reference2) for any function f

Here we mean more than simple equality of returned values. We also mean the same behavior in all other respects. Note that the phrase “for any function f” gets us into second order logic.

The converse doesn’t always hold, since we are at best talking about behaviors and responses in the computational system. The fact that things behave “identically” doesn’t really prove they are identical. We know that two pencils are distinct because they are in different locations, but we probably haven’t recorded the locations of pencils. Thus, things may behave identically in a computational sense, and still not be the same thing.
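The pencil example can be mimicked in Python, with the caveat that this is an analogy only: object identity (`is`) stands in for real-world distinctness, and the Pencil class with its two fields is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pencil:
    color: str
    length_mm: int  # location is deliberately not recorded

a = Pencil("yellow", 190)
b = Pencil("yellow", 190)

def observable(p):
    """Everything the system can report about a pencil."""
    return (p.color, p.length_mm)

# Every recorded property matches, so the two behave identically...
same_behavior = observable(a) == observable(b) and a == b
# ...yet they are two distinct objects.
same_object = a is b
```

Here same_behavior is true while same_object is false: matching behavior inside the system is not evidence of identity outside it.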

Identification strategies

One approach to determining whether Reference1≡Reference_n is to exploit the transitivity property and seek a chain of intermediate references such that:

Reference1≡Reference2 ∧ Reference2≡Reference3 ∧ … ∧ Reference_(n-1)≡Reference_n

For example, suppose that Reference1 returned a social security number and Reference5 returned an employee number. We could establish the equivalence Reference1≡Reference5 if we can find references such that:

  • Reference2 returned the passport number of someone having the given social security number,
  • Reference3 returned the driver’s license number of someone having the given passport number,
  • Reference4 returned the employee number of someone having the given driver’s license number, and
  • That employee number matched Reference5.
(Actually, it would be more precise to illustrate this as a chain of nested function calls, but the principle is there.)
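The chain above might be sketched as a sequence of lookups, each one feeding the next. The tables and all identifier values below are invented for illustration; a real system would query separate registries, each with its own failure modes.

```python
# Hypothetical lookup tables; every identifier here is made up.
PASSPORT_BY_SSN = {"ssn-001": "pp-A17"}
LICENSE_BY_PASSPORT = {"pp-A17": "dl-5520"}
EMPLOYEE_BY_LICENSE = {"dl-5520": "emp-42"}

def employee_for_ssn(ssn):
    """Follow ssn -> passport -> license -> employee; None if a link is missing."""
    value = ssn
    for table in (PASSPORT_BY_SSN, LICENSE_BY_PASSPORT, EMPLOYEE_BY_LICENSE):
        value = table.get(value)
        if value is None:
            return None
    return value

def same_person(ssn, employee_number):
    """True only if the whole chain exists and ends at the given employee number."""
    found = employee_for_ssn(ssn)
    return found is not None and found == employee_number
```

A single missing link anywhere in the chain leaves the question unanswered, which hints at why the strategy scales poorly.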

Such a strategy might be described as point-to-point path finding. It seems quite impractical in the general case.

An alternative strategy would be based on globally unique and singular identifiers (GUSIs) (Section 3-3-3). Each reference would be resolved to a GUSI, if possible, and the references are equivalent if the GUSIs match. This generally reduces the transitivity paths to be explored.

That strategy might be described as a hub-and-spoke algorithm.
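As a rough worked count, under the assumption (made here, not in the source) that each kind of identifier needs an explicit mapping to whatever it is compared against: with k kinds of identifier, direct point-to-point comparison needs a mapping for every pair, whereas hub-and-spoke needs one mapping per kind, to the GUSI.

```latex
\[
\underbrace{\binom{k}{2} = \frac{k(k-1)}{2}}_{\text{point-to-point}}
\qquad\text{versus}\qquad
\underbrace{k}_{\text{hub-and-spoke}},
\qquad\text{e.g. } k = 10:\; 45 \text{ versus } 10 .
\]
```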

There can be hybrid strategies in which certain identifiers are designated as local “hubs” in the search path, perhaps then requiring hub-to-hub equivalences.

Such identification strategies would have to be incorporated into the definition of the Identical predicate.

Globally unique and singular identifiers

This is a generalized approach to such notions as object identifiers, base names, etc.

We define GUSIs [globally unique and singular identifiers] as a class of computational surrogates that can be placed in one-to-one correspondence with things they denote. By definition, there are enough of them to denote all things that will be identified. Alternatively, we say that the number of things identifiable by GUSIs is limited to the number of GUSIs.

In today’s technology, computational surrogates and hence GUSIs ultimately reduce to finite bit strings. We assume some contextual means of determining when a bit string is intended to serve as a GUSI, somewhat analogous to the notion of data type or field name.

Globality means that a GUSI is recognized throughout the Universe. Uniqueness means that different things cannot be denoted by the same GUSI. Singularity means that a thing cannot be denoted by more than one GUSI. (In contrast, social security numbers are unique but not singular, since a person may have more than one of them.)

When a thing is denoted by a GUSI, that GUSI represents its identity. Identity can be based on GUSIs. If two references return the same GUSI, they refer to one and the same thing.

A reference might be a GUSI, it might return a GUSI, or it might be found identical to a GUSI — or none of these may occur. We don’t assume that every identifiable thing has a corresponding GUSI.

Let GetGUSI be an operator (function) that returns the GUSI associated with a reference, if it has one. Thus, GetGUSI(Reference_i) returns at most one GUSI. If the reference is itself a GUSI, it requires no further evaluation:

GetGUSI(GUSI_i) = GUSI_i

Its implementation is likely to involve path-following (Section 3-3-2) until a GUSI is reached, as well as a decision process for detecting when such a GUSI does not exist.
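A minimal sketch of such a path-following GetGUSI is given below. It assumes that each reference resolves to at most one further reference and that GUSIs can be recognized by a prefix. Both assumptions, along with the table contents and the names NEXT, is_gusi, and get_gusi, are invented for illustration.

```python
# Hypothetical data: the reference that each reference resolves to next.
NEXT = {
    "nickname:bill": "emp-42",
    "emp-42": "ssn-001",
    "ssn-001": "gusi:7f3a",
    "loop-a": "loop-b",
    "loop-b": "loop-a",
}

def is_gusi(ref):
    """Contextual test for 'this value is intended to serve as a GUSI'."""
    return ref.startswith("gusi:")

def get_gusi(ref):
    """Follow NEXT links until a GUSI is reached; return None if there is none."""
    seen = set()
    while ref is not None and ref not in seen:
        if is_gusi(ref):
            return ref  # GetGUSI(GUSI_i) = GUSI_i falls out of this branch
        seen.add(ref)
        ref = NEXT.get(ref)
    return None  # dead end or cycle: no GUSI can be found for this reference
```

The `seen` set is the simplest possible decision process for detecting that no GUSI exists. It works only because this toy search space is finite and fully known.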

The core identity proposition

The core identity proposition is that, if the GUSIs exist, then two references are identical if and only if the GUSIs match:

( ∃GetGUSI(Reference1) ∧ ∃GetGUSI(Reference2) ) ⇒ ( GetGUSI(Reference1)=GetGUSI(Reference2) ⇔ Reference1≡Reference2 )
Thus, if the GUSIs exist, references are identical if the GUSIs match, and the GUSIs must match if the references are identical.

That’s not the only mechanism for establishing identity. Whether or not GUSIs exist, Reference1≡Reference2 can be established by other means. That’s useful for things not identified by GUSIs. Thus,

Reference1≡Reference2 if GetGUSI(Reference1)=GetGUSI(Reference2) or some other way works (e.g., matching employee numbers).

Matching by employee number must thus be incorporated into the definition of the Identical predicate.

But remember that if things identified by employee numbers also have GUSIs, then things with matching employee numbers must have matching GUSIs.
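The proposition and its fallback could be combined along the following lines. This is a sketch, not a definitive implementation: the registries are invented, and the choice to return None for "cannot decide" is an assumption made here, since the text gives matching employee numbers only as a sufficient condition.

```python
# Hypothetical registries; every value is invented for illustration.
GUSI_OF = {"ssn-001": "gusi:7f3a", "pp-A17": "gusi:7f3a", "ssn-002": "gusi:91c0"}
EMPLOYEE_NUMBER_OF = {"badge-9": "emp-42", "payroll-3": "emp-42", "badge-5": "emp-77"}

def identical(ref1, ref2):
    """Return True or False when a decision can be made, else None (unknown)."""
    if ref1 == ref2:
        return True  # reflexivity
    g1, g2 = GUSI_OF.get(ref1), GUSI_OF.get(ref2)
    if g1 is not None and g2 is not None:
        # Both GUSIs exist, so they settle the question in both directions.
        return g1 == g2
    e1, e2 = EMPLOYEE_NUMBER_OF.get(ref1), EMPLOYEE_NUMBER_OF.get(ref2)
    if e1 is not None and e1 == e2:
        # Some other way works: matching employee numbers establish identity.
        return True
    return None  # no GUSIs to compare and no other evidence
```

Note the asymmetry: a GUSI mismatch proves non-identity, but an employee-number mismatch proves nothing in this sketch, because nothing in the text says employee numbers are singular.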

The future of computational surrogates and GUSIs

A parenthetical digression.

Until fairly recently, our notion of computational surrogate was limited to some sort of linear string of text. Modern technology allows us to consider computational surrogates in the form of pictures, fingerprints, hand prints, voice prints, and retinal scans. Clearly the algorithm behind the Identical predicate gets more complex.

We could go way out on a limb and speculate that future systems might (re-)embrace non-digital technology, such as analog and even holographic techniques. This could relax some of the theoretical limitations of finiteness and countability (Section 4-1 below).

Why that idealized solution doesn’t work

Theoretical reasons

Some reasons why that solution doesn’t work depend on what you think should be identifiable (Section 2-8), and to what extent we rely on GUSIs for identification.

The set of potentially identifiable things may be uncountable (more infinite than the integers). It certainly is infinite (just consider the set of possible books, or possible musical compositions). And it is really larger than the largest numbers that can practically be represented in any computing system.

GUSIs, being finite strings over a finite alphabet, are countable. Hence GUSIs cannot identify uncountably many things.

GUSIs as defined are finite, but unbounded in length. Thus, there are in theory infinitely many GUSIs. In reality, at any point in time there will be some upper limit to the manageable lengths of GUSIs, making the set of GUSIs finite. Thus, GUSIs cannot identify an infinite number of things.
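To make the finiteness concrete, here is a worked count for bit-string GUSIs with a maximum length L. The choice L = 128 is purely illustrative and is not a limit proposed in this paper.

```latex
\[
\#\{\text{nonempty bit strings of length} \le L\}
  = \sum_{j=1}^{L} 2^{j} = 2^{L+1} - 2,
\qquad
L = 128:\; 2^{129} - 2 \approx 6.8 \times 10^{38}.
\]
```

The number is enormous but finite, so any fixed length limit puts a ceiling on how many things can ever be identified.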

This isn’t really a proof of unsolvability. In point of fact, the set of things we wish to identify at any given moment is likely to be finite.

Pragmatic reasons

Management of GUSIs

Practical considerations may limit the reasonable lengths of GUSIs, further limiting the set of potentially identifiable things.

Maintaining global recognition, uniqueness and singularity of GUSIs is highly impractical. In effect, it requires a facility that behaves like a global omniscient naming authority.

The GetGUSI function can be a challenge. Even if GetGUSI(Reference_i) exists, it may be quite hard to find.

Different domains may impose different length limits on GUSIs. They may also take different positions on the question of identifiable things (Section 2-8), as well as other critical issues such as management of versions and copies (Section 2-8-9). The obstacles get even worse if the various domains don’t interpret and apply various XML facilities in the same way (Section 1-3).

There is a question as to whether GUSIs can disappear, and how to propagate such disappearance if it is allowed. Also, if allowed, can a GUSI be reused? (This might be considered a theoretical problem.)

Such considerations limit the capacity of any identification system. I won’t elaborate; you know these things better than I do.

The Identical predicate

Managing the transitivity of the Identical predicate (Section 3-3-1-1) is another obstacle. Trying to determine the equivalence of two references may require searching the Universe for a third reference (or chain of references) to which they are both equivalent.

The Identical predicate in effect incorporates all identification heuristics. It may involve uncertainty and user intervention for confirmation. It could get arbitrarily complex, and might even be uncomputable.

The Identical predicate will rarely be invoked by a user directly. Most often it will occur as an internal step in some path following process, making decisions hidden from the user. Its correctness thus becomes a matter of much greater concern.

The absence of GUSIs or any other suitable identifiers

Existing databases may not get GUSIs incorporated. Most databases will have their own separate forms of identifiers, which may or may not be mappable to each other or to GUSIs.

Even worse, documents in the broad sense (books, magazine and newspaper articles, etc.) often won’t include unique identifiers of any kind.

“The TAO of Topic Maps” [07] suggests in Section 3-3-1 that “Puccini was born in Lucca” could be in a topic map. Such a fact could also appear in various documents or databases. There might be no further identification of “Puccini” or “Lucca”, or there might be some computationally undetectable elaboration elsewhere in the document.

Just for fun, I looked up “Puccini” in Earthlink’s White Pages, and found:

  • 30 Puccinis in the New York City area
  • 21 in the San Francisco area
  • More than 250 in Italy, and
  • 21 in Lucca alone!
And this only includes living people currently living in these places with listed telephone numbers.

You might try this with “John Williams”. I know of at least three famous ones in the music domain alone.

I then played with city names in Google. Luckily, there seems to be only one Lucca. But what of cities like Springfield, Athens, Cairo? They might be mentioned in a document like a local newspaper, with no further identification.

And if we are just told the name of a person’s birthplace, the place name might designate a city, state, country, hospital, or something else.

Managing the evolution of identity

Evolution of identity (Section 2-6) poses real headaches. Distinct GUSIs may need to merge somehow, and one GUSI may need to be replaced by several. This is quite hard to manage, and much, much harder to manage consistently across multiple domains.
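The easier half of this, merging, might be sketched with forwarding links from retired GUSIs to their survivors. The GusiRegistry class and its methods are hypothetical, and the sketch assumes a single registry that every domain consults, which is exactly the assumption that fails in practice.

```python
class GusiRegistry:
    """Hypothetical registry that records merges as forwarding links."""

    def __init__(self):
        self._forward = {}  # retired GUSI -> GUSI it was merged into

    def current(self, gusi):
        """Follow forwarding links to the GUSI now in force."""
        while gusi in self._forward:
            gusi = self._forward[gusi]
        return gusi

    def merge(self, retired, survivor):
        """Declare that two GUSIs denote one thing; the survivor stays in force."""
        r, s = self.current(retired), self.current(survivor)
        if r != s:  # already merged: nothing to do, and no cycle is created
            self._forward[r] = s
```

Splitting one GUSI into several is deliberately left out. Old references to the original GUSI cannot simply be forwarded, because someone has to decide which successor each of them meant.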

Semantic headaches

An ideal solution becomes even more remote if we consider the semantics of issues such as:

  • Situational or relative identity (Section 2-5)
  • Evolution of identity (Section 2-6)
  • What’s identifiable (Section 2-8)
  • Versions and copies (Section 2-8-9)

Where the rubber meets the road

Maybe my concerns are too philosophical.

OMG [Object Management Group] has been highly successful in reaching its goals. That success was driven by giving priority to the need for a practical common interface allowing all the participating businesses to profitably sell software and hardware. Philosophy fell by the wayside. Though I was often a dissenter, in a pragmatic way I suspect they’re right. Deep and consistent semantic foundations may well be an illusory Holy Grail, a black hole, because there doesn’t seem to be any single universal philosophical foundation. That, too, was part of the message of Data and Reality.

I wrote this about X3H7, the ANSI/X3 Technical Committee dealing with Object Information Management:

Ideally, X3H7 should proceed serially, first learning the current state of object technology from the work of other groups and the published literature, then synthesizing it all into a coherent strategy and recommendations, and finally trying to guide other groups toward a common goal. In reality, we are constrained by the same accelerated time-to-market as everyone else, and so we are tackling all those stages simultaneously.

You need to be pragmatic about objectives. Basically we are all intent on capturing, maintaining, and delivering information cost-effectively (in the short and long run) in the context of computer capabilities and performance. We have conflicting criteria:

  • Do it efficiently and profitably — now.
  • Do it in a way that will survive the evolution of technology and life.

We are doomed to forever argue the tradeoffs. Those who do not learn from history…

So, how do we do the best we can within the constraints of reality? Stick to our principles, but accept compromise, constant evolution, occasional revolution, and sometimes disappointment.

Conclusions

A general unified theory of identity is elusive. It probably doesn’t exist. The main reasons:

  • The problem is not well defined.
  • There are theoretical and practical limitations to what can be achieved.
  • There are too many semantic issues.
  • There are too many domains. We can’t achieve a consistent solution across all of them.

But this quest for the Holy Grail is educational. We have identified a number of semantic issues that need to be considered in any solutions we implement. There are many, many more, but that’s beyond the scope of this paper.

So what do we do? Cope, as we always do. If there is no ideal solution, we develop solutions that are good enough. The trouble is that what’s good enough for you today isn’t good enough for me tomorrow. We are forever doomed to compromise, extend, patch and rework to make our good enough solutions a little better. We’ll never get it right. That’s life.

Human beings manage to cope somehow with imperfect identification schemes. Our computer systems might do no better than that.

The tone of this paper is not intended to be discouraging. The intent is to foster a state of mind, a wary humility about how far you can get with how much effort. There will always be booby-traps out there. It’s very close to the notion that the more you know, the more you know you don’t know.

You are to be commended for venturing farther out from shore into uncharted territory. If I wasn’t retired, this is the field I’d love to be working in.

Editorial note

As drafted, the bibliography for this paper was divided into subsections for “XML stuff” and “Information semantics”, something not possible with the present DTD. References 01-16 pertain to XML. References on information semantics are further divided into those materials written by Mr. Kent (references 17-28) and those by other authors (references 29-36). Mr. Kent acknowledged that the references are incomplete and advised readers to “Follow the reference chains”. With regard to XML materials, he noted, “These were my main sources about XML facilities, but I haven’t read all of these thoroughly”. As to his own works on information semantics, Mr. Kent indicated that most “... are available on my web site http://www.bkent.net”.


Acknowledgments

My thanks to the program committee members for all their help: B. Tommie Usdin, Deborah A. Lapeyre, James D. Mason, Steven R. Newcomb, and C.M. Sperberg-McQueen.


Bibliography

[01] “Introduction to XML” (video), Synthbank Technology Series.

[02] “Extensible Markup Language (XML)”, W3C, http://www.w3.org/XML/.

[03] “The Cover Pages”, OASIS, http://xml.coverpages.org/.

[04] Steven R. Newcomb, “A Perspective on the Quest for Global Knowledge Interchange”, Chapter 3 of XML Topic Maps — Creating and Using Topic Maps for the Web (Jack Park and Sam Hunting, eds.), 2003: Addison-Wesley. (This chapter available free at http://www.aw.com/samplechapter/0201749602.pdf.)

[05] Email from C.M. Sperberg-McQueen to Bill Kent, June 11, 2003 7:29 AM. (This provocative position paper should not be lost in the dead-email bin.)

[06] Lars Marius Garshol, “What Are Topic Maps?”, O’Reilly, http://www.xml.com/pub/a/2002/09/11/topicmaps.html.

[07] Steve Pepper, “The TAO of Topic Maps”, Ontopia, http://www.ontopia.net/topicmaps/materials/tao.html.

[08] “A Practical Introduction to Topic Maps”, Techquila, http://www.techquila.com/practical_intro.html.

[09] “Learn more about topic maps”, Ontopia, http://www.ontopia.net/topicmaps/learn_more.html.

[10] “Topic Map Specification” (the topic map standards topic map), Ontopia, http://www.ontopia.net/omnigator/models/topicmap_complete.jsp?tm=tm-standards.xtm.

[11] “Guide to the topic maps standards”, ISO/IEC JTC 1/SC34 N323, http://www.y12.doe.gov/sgml/sc34/document/0323.htm.

[12] Lars Marius Garshol and Graham Moore, “The Standard Application Model for Topic Maps”, ISO/IEC, http://www.isotopicmaps.org/sam/.

[13] Marc de Graauw, “Business Maps: Topic Maps Go B2B”, O’Reilly, http://www.xml.com/pub/a/2002/08/21/topicmapb2b.html.

[14] Ronald Bourret, “XML and Databases”, http://www.rpbourret.com/xml/XMLAndDatabases.htm.

[15] “XML and Databases: General Resources”, The Cover Pages, OASIS, http://xml.coverpages.org/xmlAndDatabases.html#general.

[16] Roy Thomas Fielding, Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine, 2000. Chapter 5: “Representational State Transfer (REST)”, http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.

[17] William Kent, Data and Reality, 1stBooks, 1998. Available via http://www.1stbooks.com. Excerpts available on my web site.

[18] William Kent, “The Breakdown of the Information Model in Multi-Database Systems”, SIGMOD Record 20(4) Dec 1991.

[19] William Kent, “A Rigorous Model of Object Reference, Identity, and Existence”, Journal of Object-Oriented Programming 4(3), June 1991, pp. 28-38.

[20] Elizabeth Fong, William Kent, Ken Moore, and Craig Thompson (editors), X3/SPARC/DBSSG/OODBTG Final Report, Sept 17, 1991. Available from NIST.

[21] William Kent, “The Many Forms of a Single Fact”, Proc. IEEE COMPCON, Feb. 27-Mar. 3, 1989, San Francisco.

[22] William Kent, “The Leading Edge of Database Technology”, in E.D. Falkenberg, P. Lindgreen (eds), Information System Concepts: An In-depth Analysis, North Holland, 1989 (Proc. IFIP TC8/WG8.1 Working Conference, Oct. 18-20 1989, Namur, Belgium). Also in F.H. Lochovsky (ed), Entity-Relationship Approach to Database Design and Querying, Elsevier Science Publishers (North Holland), 1990 (Proc. Eighth International Conference on the Entity Relationship Approach, Oct. 18-20 1989, Toronto, Canada).

[23] William Kent, “Limitations of Record Based Information Models”, ACM Transactions on Database Systems 4(1), March 1979. Also in John Mylopoulos and Michael Brodie (eds), Readings in Artificial Intelligence and Databases, Morgan Kaufmann, 1989.

[24] William Kent, “Employee Was a Subtype of Person”, unpublished, 1988.

[25] William Kent, “The Entity Join”, Proc. Fifth Intl. Conf. on Very Large Data Bases, Oct. 3-5, 1979, Rio de Janeiro, Brazil, pp. 232-238, Morgan Kaufmann, 1979.

[26] William Kent, “The Type and Class Definition Game”, unpublished, 1992. (Try applying this game to your own terminology.)

[27] William Kent, “Objects and Object Systems”, unpublished, 1992.

[28] William Kent, “Attributes? Why?”, unpublished, 1992.

[29] The ANSI/X3/SPARC DBMS Framework, Report of the Study Group on Database Management Systems, (D. Tsichritzis and A. Klug, editors), AFIPS Press, 1977.

[30] P.P.S. Chen, “The Entity-Relationship Model: Toward a Unified View of Data”, ACM Transactions on Database Systems 1 (1), March 1976, pp. 9-36.

[31] Colin Cherry, On Human Communication, MIT Press, 1966.

[32] Don Fabun, Communications: The Transfer of Meaning, Glencoe Press, 1968.

[33] G.M. Nijssen, Modelling in Database Management Systems, North Holland, 1976. (Proc. IFIP TC-2 Working Conf., Freudenstadt, W. Germany, Jan. 5-9, 1976).

[34] G.M. Nijssen, Architecture and Models in Database Management Systems, North Holland, 1977. (Proc. IFIP TC-2 Working Conf., Nice, France, Jan. 3-7, 1977).

[35] Joseph Weizenbaum, Computer Power and Human Reason, W.H. Freeman, 1976.

[36] Benjamin Lee Whorf, Language, Thought, and Reality, MIT Press, 1956.


