The domain of domains

Robert Schmidt
schmidt@agpw

Abstract

A Domain is a scale that can be used to answer some question. Examples of Domains include Longitude, the color scale, and distance. Domains are perhaps the most fundamental element of all knowledge. This paper dissects eight lists of values that meet the generally accepted definition of Domain. The goal is to find useful distinctions between the eight that can lead to smarter DTDs, better validation, and more capable automated data processing. This paper challenges the reader to find attributes (AKA dimensions) on which to contrast the eight examples. There are many attributes that can be used to describe Domains, but not all attributes apply to all Domains. A table is devised showing which attributes apply to which Domains. Domains are classified based on the kinds of Attributes each has. Those Domains that can be described by the same Attributes are placed in the same class (or sub-class).

Keywords: Knowledge Representation; Modeling

Robert Schmidt

Bob Schmidt began his consulting career in 1982 with the only recently devalued firm of Arthur Andersen. He stayed with the big "n" accounting firms until he left his position as "Master Analyst" at Ernst & Young in 1990 to pursue a career teaching data modeling. He wrote the critically acclaimed InfoStructor and Data Modeling for Information Professionals. Amazon readers rank the works as five and 4.5 stars respectively. InfoStructor was also sold under the Sybase and IBM brands. He left teaching to liquidate his father's businesses in 1997. The liquidation was very successful, leaving Bob free to pursue his own entrepreneurial desires. In 1998, he returned to information systems work and is currently developing e-commerce businesses.

Robert Schmidt [agpw, inc.]

Extreme Markup Languages 2002® (Montréal, Québec)

Copyright © 2002 Robert Schmidt. Reproduced with permission.

Prolog

To invent a language such as XML is to take on a serious responsibility. I don’t like to exaggerate; turning a new tomato plant loose on our beautiful planet is more audacious. Still, turning a new language loose in our Dilbertian environment may have unforeseen consequences. Natural languages, such as English or French, have evolved over centuries to provide a large number of tenses, clauses, and other devices to express most of what we want to say. XML will be used by thousands of us for thousands of hours to describe the transactions, policies, and other aspects of our enterprises. So, I know we are all working to understand and develop XML so that it can be the language we need for our enterprises.

To the extent that I might contribute to the development of this language, to XML, I would work to tie the structures of XML closely to the ways we express meaning using natural languages. That is, I would not care to replicate the tenses, clauses, and so on of any natural language. I would like for there to be an XML structure for each unique logical construct natural languages use to express our understanding of our beautiful planet.

In natural languages, Domains are sets of words that have a definition in the way we generally think of individual words to have definition. So, we define ‘one,’ but we also define ‘integer.’ Defining the set independently of context enables us to make inferences across facts that appear in completely different contexts or languages. I can compare the weight of a Sumo wrestler with that of a half-back. When we begin to study anything, one of the first things we do is develop a standard measurement. So we invent distance when we are running after wildebeests, and then we invent cholesterol level when we keep eating the meat but stop the running. We cannot start to understand the relationship of running to cholesterol except that we have these measures. As Terry Halpin, the caretaker of ORM [Object-Role Modeling] asserts, Domains are the ‘glue that holds all human knowledge together.’ [ORM]

Domains are the foundations of Attributes, which are in turn the foundations of Classes and in turn Supertypes. A thorough understanding of Domains will make every other part of language more easily understood.

XML is a language. A language must have all the necessary devices to express all relevant facts. The Domain is a critical device of all natural languages. If the varieties of Domain described in this paper are not just an appendix to language but devices with important uses, then XML should be reviewed for possibly extending its ability to describe Domains.

Candidate Domains

Assume that the following concepts can be compared and contrasted based on their Attributes:

  1. US ZIP Codes
  2. Color Scale as a child would know it
  3. The idea of 'Yes' in opposition to 'No'
  4. Counts of things
  5. Measurement of value in U.S. Dollars
  6. Position as expressed in degrees of Longitude
  7. Percentage
  8. Measurement in feet

The eight items in the list are chosen because they all meet the definition of 'Domain' as described by C.J. Date, i.e., "A pool of values from which the actual values appearing in a column (in a relational table) may be drawn."[Date] Because this is the generally accepted definition, I will refer to each of the eight items as 'Domains' even though one conclusion of this paper is that one of these eight is not a Domain.

Ground Rules

Although all eight fit Date's definition, each of these examples is unique in some fundamental way. For example, 'Position as expressed in degrees of Longitude' differs from 'Measurement in feet' in obvious ways; for one, there are only 360° of Longitude. For our purposes, this is a superficial distinction because we can think of both Longitude and Feet as having the Attribute 'range of values.' For the purposes of this paper, when Domains have ALL the same Attributes they will stay in the same class. Consider the differences between 'Position as expressed in degrees of Longitude' and ‘Counts of things.’ We speak of 37° 30’ Longitude; we do not speak of 37½ passengers. So Longitude has an attribute describing the way we describe fractions; Counts does not. When one Domain has even a SINGLE Attribute that is not an Attribute of a second Domain, then those two Domains may not be in the same class.
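
The grouping rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names and the attribute sets assigned to each Domain are hypothetical, chosen to mirror the Longitude, Feet, and Counts examples in this section.

```python
from collections import defaultdict

# Hypothetical attribute sets; the names are illustrative assumptions,
# not a catalog taken from the paper.
DOMAIN_ATTRIBUTES = {
    "Longitude": {"range of values", "fraction notation"},
    "Feet": {"range of values", "fraction notation"},
    "Counts": {"range of values"},
}

def classify(domains):
    """Group Domains whose attribute sets are identical."""
    classes = defaultdict(list)
    for name, attributes in domains.items():
        # A frozenset key means "ALL the same Attributes".
        classes[frozenset(attributes)].append(name)
    return sorted(sorted(members) for members in classes.values())

print(classify(DOMAIN_ATTRIBUTES))  # [['Counts'], ['Feet', 'Longitude']]
```

With only these two attributes, Longitude and Feet land in the same class; adding a single attribute to one of them would split the class, which is exactly the SINGLE Attribute rule.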

Each of the eight Domains is made up of more or less distinct elements, e.g., ZIP Codes include 63112, 77098; 'ten feet' is an element of 'Measurement in feet.' For purposes of this discussion, the way each element of any of these Domains looks or sounds is immaterial. So I can write ten feet, 10 feet, diez feet, or 1010 (base 2) feet and mean exactly the same thing. Instead, I will focus on the meaning of the elements and the coherence of each list.
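
As a rough sketch of that point, the following Python maps several spellings onto one element. The word table and the function name are assumptions made for illustration, and the binary spelling is written with Python's `0b` prefix.

```python
# Hypothetical spellings of the same element; the lookup table is an assumption.
WORDS = {"ten": 10, "diez": 10}

def parse_feet(text):
    """Return the number of feet named by a string such as '10 feet'."""
    token, unit = text.split()
    if unit != "feet":
        raise ValueError(f"not a measurement in feet: {text!r}")
    if token in WORDS:
        return WORDS[token]
    if token.startswith("0b"):
        return int(token, 2)  # binary notation, e.g. 0b1010
    return int(token)

spellings = ["ten feet", "10 feet", "diez feet", "0b1010 feet"]
print({parse_feet(s) for s in spellings})  # {10}
```

Four different surface forms collapse to a single value, which is the sense in which the appearance of an element is immaterial.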

Pre-Work

I encourage the reader to complete the following table before reading ahead. This is, of course, optional; I am not looking.

Table 1
List                                           Example Elements of List   Distinguishing Attribute
US ZIP Codes                                   63139|77081
Color Scale as a child would know it           Red|Blue
The idea of 'Yes' in opposition to 'No'        Yes|No
Counts of things                               100 squid|100 wigs
Measurement of value in U.S. Dollars           1$US|2$US
Position as expressed in degrees of Longitude  0°|180°
Percentage                                     100%|0%
Measurement in feet                            10 feet|11 feet

In the last column, the reader should come up with an Attribute setting that Domain apart from at least one other entry in the table. My somewhat surprising (but hopefully not too surprising) answers are given as the paper proceeds.

I am waiting.

Take your time.

Look, I would really rather you get involved in this than just read it. How about I make this a matching question? Which of the Domains 1-8:

  1. Is subjective?
  2. Has a finite range of values?
  3. Is meaningless without an arbitrary designation?
  4. Measures in increments that are constantly changing?
  5. Has unique rules of sets and subsets?
  6. Is the combination of other Domains?
  7. Does not measure anything?
  8. Always involves two things being measured?

In the sections that follow, I[1] will take each Domain in turn and discuss the Attribute that most sets it apart from the other Domains.

Measure

'ZIP Codes' is the Domain that stands out among the eight. In a sense, it is the opposite of the other seven. A ZIP code can be measured: it has a place in Longitude, a size in square feet, and a percentage of Hindi-speaking inhabitants. But a ZIP code does not measure anything; I am not '63112' anything.

Knowing that I live in ZIP Code ‘63112’ is probably meaningless to you. Of course, you could learn a lot about me by looking up information about the 63112 ZIP. For instance, I live in a racially diverse neighborhood. You could write me at 63112. To the post office, 63112 maps to a place in the real world. The ZIP Code, an identifier, is a key to other information but has no meaning in and of itself. The other seven Domains have an inherent meaning; for example, I know what 100% means. It may not be relevant until it is attached to something like a ZIP code, but I know what it means.

The elements of the other seven Domains cannot be described. For example, what can you say about 10 feet? You could say 10 feet is the width of Broadway, but it is the road that is described, not 10 feet. You could say that 10 feet was ‘wide,’ but you are only creating an equivalence between one Domain, ‘narrow|wide,’ and the Domain of ‘distance.’ You can create an infinite number of such equivalences without giving meaning to 10 feet.

What does give meaning to 10 feet? We have direct experience with the concept of a foot. It's at the end of our leg. Who hasn't measured something by placing one foot in front of the other? We understand all the elements of the Domain 'Measurement in feet' once we understand one foot. There is no way, and no need, to define each element of a Domain.

With constructs such as ZIP Code, we must describe each element; with constructs such as weight, we cannot describe any single element. ZIP Codes gain meaning with measurement and description; Domains give meaning by measuring and describing. So, lists of ZIP Codes, and innumerable similar lists[2], are conceptually 180° from Domains. The organizing idea bringing together the elements of a true Domain is the fact that the Domain measures. Constructs such as ZIP Codes should not be considered Domains at all.

My answer for the row in the pre-work table 'US ZIP Codes' is 'Does not measure anything.'

Specific

The type of Domain characterized by 'Color Scale as a child would know it' is very common. This Domain is one of the first things we teach infants; children's sections of book stores are full of books showing the color red and associating that color with the word 'red.' There is no other way to teach about 'red' but to see it. In this same category of Domains we have 'big|little, hot|cold, young|old, person-of-color|caucasian.' I do not know the actual percentage of all Domains in use that are loosey-goosey like color, but it is a whole, whole lot.

The other six Domains are objective. If you and I were to disagree about how long something is, then we would remeasure together. If you say it is blue, but I say it is blue-green, then we would have to resort to a different Domain to resolve our difference. We would have to actually measure the saturation of different wavelengths of light and convert our measurement into something objective.

My answer to 'Color Scale as a child would know it' is 'Is subjective.'

Range/Precision

Five of the six remaining Domains are an infinite (or nearly infinite) set of elements. I can be at 103° 13' 10" of Longitude. The coast of England can be 100,000 miles long. I can have a googol of atoms.

One Domain is black and white, on or off, pregnant or not. You can be 'probably pregnant' or 'almost on' or a 'shade of grey,' but these are not the same logical construct as ‘on|off.’ When you are 'probably' anything, you are using the Domain 'percentage' — the number of times a trial was true divided by the number of trials.
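
The definition of 'probably' given above is simple enough to write down. A minimal sketch, with a hypothetical function name and made-up trial outcomes:

```python
def likelihood_percent(outcomes):
    """Percentage of trials that came out true."""
    if not outcomes:
        raise ValueError("need at least one trial")
    # True counts as 1 and False as 0, so sum() is the number of true trials.
    return 100.0 * sum(outcomes) / len(outcomes)

trials = [True, True, False, True]  # made-up trial outcomes
print(likelihood_percent(trials))   # 75.0
```

Each individual trial is still an element of 'yes|no'; only the summary over many trials belongs to 'percentage.'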

While it may or may not be a rarity in actual experience, the concept of 'yes' or 'no' is not lost on a criminal lawyer, a semi-conductor, or a mother.

So my answer to the question what makes "The idea of 'Yes' in opposition to 'No'" distinct is ‘Has a finite range of values.’

Standard

Four of the remaining five Domains have a standard for measurement that is applied across whatever is being measured. If I want to know where a boat is or where a building might be, I can use Longitude. If I want to know how far it is between the ball and the cup, I can use meters—just like I can between the boat and the building. I can price my boat and my ball in US$.

When used properly, a squid makes as good a Domain as a foot. Just as the proper use of a foot is for measuring distance, the proper use of a squid is in counting.[3]

But I can only count squid in squid. I can only count wigs in wigs. Squid are not just useful for counting squid, but they are in fact the only proper way to count squid whether you are in Melbourne or New York City. There is no metric v. English argument.

Squid lack the Attribute of 'convention,' but they do have a peculiar property all their own. 'Squid' can be used to count any of a variety of cephalopod mollusks if you are not too keen on keeping each variety separate. Likewise, the count of squid can get lost in a count of 'marine animals.'

The ability to count items of a type or of a subtype has its own set of rules. These rules are obvious to any four-year-old, and so I will only bring them to your attention without further definition.
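
For readers older than four, one possible reading of those rules is that a count of a subtype also contributes to the count of every supertype above it. The sketch below assumes a small hypothetical hierarchy; 'octopus' and 'cod' are invented here only to give the squid some company.

```python
from collections import Counter

# Hypothetical supertype chain, following the squid example.
PARENT = {
    "squid": "cephalopod",
    "octopus": "cephalopod",
    "cephalopod": "marine animal",
    "cod": "marine animal",
}

def roll_up(counts):
    """Add each count to its own type and to every supertype above it."""
    totals = Counter()
    for kind, n in counts.items():
        while kind is not None:
            totals[kind] += n
            kind = PARENT.get(kind)  # None once the top of the chain is reached
    return totals

totals = roll_up({"squid": 100, "octopus": 3, "cod": 7})
print(totals["squid"], totals["cephalopod"], totals["marine animal"])  # 100 103 110
```

The 100 squid remain countable as squid, yet they also disappear into the 110 marine animals, as the text describes.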

My answer for ‘Counts of things’ is ‘Has unique rules of sets and subsets.’

Determinate

In a museum in Paris, there is a rod that is the standard for the length of a meter. The time it takes for the Earth to complete one revolution around the Sun determines the length of a year. Whether it is a length of stick or a length of time, these standards are determinate.

One Domain in our list is notoriously unpredictable. In fact, fortunes have been made and lost riding the changing reach of each of the elements of this Domain. We can measure the distance to a star +/- a light-year, but we cannot measure the value of a US$ with any predictability from year to year.

When we say that a Dollar just will not 'go as far' in Paris as it used to, we mean to say that relative to the French Franc the US$ is smaller. Or when we say that 'inflation is raising prices,' we mean to say that the US$ is shrinking in value relative to squid.

The predictable nature of most Domains makes them generally convertible so long as they measure the same thing. For instance, a sidereal year[4] equals 1 standard year plus 6 hours, 9 minutes and 9.54 seconds. A decameter is ten meters. A meter is about 39 inches. But, to convert the value of something from US$ to French Francs would require a stopwatch and a 16-page contract. This is why currencies are said to 'float': there is no external point of reference. Even when currencies were said to be 'tied to the price of gold,' then gold was just another currency.
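
The contrast shows up in the shape of a conversion function. In this hedged sketch the length conversion relies on fixed constants, while the currency conversion cannot be written without a rate supplied from outside; the function names are hypothetical.

```python
METERS_PER_DECAMETER = 10
INCHES_PER_METER = 39.37  # approximate; the text rounds this to 39

def decameters_to_inches(decameters):
    """Determinate conversion: the factors never change."""
    return decameters * METERS_PER_DECAMETER * INCHES_PER_METER

def convert_currency(amount, rate):
    """Indeterminate conversion: the caller must supply the rate of the moment."""
    if rate <= 0:
        raise ValueError("exchange rate must be positive")
    return amount * rate

print(round(decameters_to_inches(1), 1))  # 393.7
```

No constant can stand in for `rate`; any value hard-coded today would be wrong tomorrow.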

If you and I disagree about how big, or heavy, or old something is, then we can get a different measuring device or ask an independent person to measure it. But when we disagree about how valuable something is, then we resort to negotiation.

My answer to 'Measurement of value in U.S. Dollars' is 'Measures in increments that are constantly changing.'

Point of reference

It would be useless to you to know that you were at 89°W Longitude if you didn't know that 0° Longitude ran through Greenwich, England. ‘Position as expressed in degrees of Longitude’ typifies a type of Domain that describes things in relation to an abstract construction. In the same way, our systems of timekeeping require the establishment of a zero year. Christians figure it all started about 2002 years ago, the Jewish tradition puts it at 5762, and my Macintosh figures everything based on 1904.
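
The dependence on an arbitrary zero is easy to demonstrate with timestamps. The sketch below uses the Macintosh zero point of 1904 mentioned above; the second zero point, 1 January 1970, is an assumption brought in purely for contrast and is not discussed in this paper.

```python
from datetime import datetime, timedelta

MAC_EPOCH = datetime(1904, 1, 1)    # the Macintosh zero point
OTHER_EPOCH = datetime(1970, 1, 1)  # assumption: a second zero point for contrast

def from_offset(seconds, epoch):
    """Interpret a bare number of seconds relative to a chosen zero point."""
    return epoch + timedelta(seconds=seconds)

# The same bare number names two different moments.
print(from_offset(0, MAC_EPOCH))    # 1904-01-01 00:00:00
print(from_offset(0, OTHER_EPOCH))  # 1970-01-01 00:00:00

# Shifting by the gap between the two zero points reconciles them.
gap = (OTHER_EPOCH - MAC_EPOCH).total_seconds()
print(from_offset(gap, MAC_EPOCH) == OTHER_EPOCH)  # True
```

The number alone carries no meaning; it becomes a moment in time only once the designation of zero is known.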

On the other hand, ‘Measurement in feet’ describes the relationship of two points irrespective of any imaginary grid. Distances do not imply that any one point is an arbitrary zero; for example, Houston to Dallas is 150 air-miles, Dallas to Austin is 100 air-miles, and Austin to San Antonio is 50 air-miles. Note that no city serves as a zero point.

This is not to say that being independent of a zero point is some kind of advantage for measurement systems such as distance. You can also see the superiority of using Longitude and Latitude to describe location since the distance between any of these cities could be calculated if you knew their coordinates.
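
One common way to perform that calculation is the haversine formula, sketched here. The formula is a swapped-in technique, not something the paper specifies, and it assumes a spherical Earth with an approximate mean radius.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # approximate mean radius; spherical Earth assumed

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))

# A quarter of the way around the equator.
quarter = great_circle_miles(0, 0, 0, 90)
```

Coordinates for n places are enough to derive every pairwise distance, whereas a table of distances must store each pair explicitly.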

I will digress to point out another example, the Richter Scale, a logarithmic earthquake magnitude scale. Because of the logarithmic basis of the scale, each whole number increase in magnitude represents a tenfold increase in measured amplitude; as an estimate of energy, each whole number step in the magnitude scale corresponds to the release of about 31 times more energy than the amount associated with the preceding whole number value. [USGS] Like Longitude, the Richter scale is relative; it is relative to no release of energy. The Richter Scale example also demonstrates that there can be a mathematical relationship between the elements or steps of the Domain other than simple multiples. Though these differences are substantial, I did not feel that they amounted to a different class of Domain.
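
The two ratios quoted above can be reproduced numerically. The amplitude ratio follows directly from the base-10 definition; the energy ratio below assumes the standard relation in which energy grows as 10 raised to 1.5 times the magnitude, which is consistent with the 'about 31 times' figure.

```python
def amplitude_ratio(magnitude_step):
    """Ratio of measured amplitudes for a given difference in magnitude."""
    return 10 ** magnitude_step

def energy_ratio(magnitude_step):
    """Ratio of released energy, assuming energy scales as 10 ** (1.5 * M)."""
    return 10 ** (1.5 * magnitude_step)

print(amplitude_ratio(1))         # 10
print(round(energy_ratio(1), 1))  # 31.6
```

Two whole steps therefore mean a hundredfold amplitude and roughly a thousandfold energy, a relationship between elements that is not a simple multiple.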

My answer to 'Position as expressed in degrees of Longitude': 'Is meaningless without an arbitrary designation.'

Atomic

If only those types of Domains that we have already discussed existed, then there would be very few Domains in our language. Except for government fiat (e.g., the invention of Argentinian Australs) or advances in science (e.g., measurement of previously undetected phenomena), there are not many new Domains being added to our language. But, we can create Domains from Domains, making the list of Domains as infinite as the list of ZIP Codes.

We are able to combine Domains algebraically to create new Domains. ‘Percentage’ is a very common example. If your fuel tank holds 20 gallons, and there are 10 gallons left, then you are 50% full; literally, 10 gallons / 20 gallons. A wage can be $6.25/hour. A Domain such as ‘Percentage’ or ‘Dollars per Hour’ is distinct from all other types of Domains in that it is made up of other Domains. Domains can involve more than two other Domains, as in gravitational acceleration, measured in feet/second/second.
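
A minimal sketch of this algebra represents each Domain as a map from base unit to exponent, so that dividing one Domain by another subtracts exponents. The representation and the names are assumptions made for illustration.

```python
from collections import Counter

def divide(numerator, denominator):
    """Divide one unit by another; a unit is a map of base unit -> exponent."""
    result = Counter(numerator)
    result.subtract(denominator)  # keeps negative exponents
    return {unit: power for unit, power in result.items() if power != 0}

GALLON = {"gallon": 1}
DOLLAR = {"dollar": 1}
HOUR = {"hour": 1}
FOOT = {"foot": 1}
SECOND = {"second": 1}

print(divide(GALLON, GALLON))                # {} : a pure ratio, e.g. a percentage
print(divide(DOLLAR, HOUR))                  # {'dollar': 1, 'hour': -1}
print(divide(divide(FOOT, SECOND), SECOND))  # {'foot': 1, 'second': -2}
```

Gallons divided by gallons cancel to nothing, which is why a percentage carries no unit of its own, while the wage and the acceleration keep the Domains they were built from.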

The answer to 'Percentage': 'Is the combination of other Domains.'

Distance

The one Domain we have not discussed is 'Measurement in feet.' Domains of this type:

  • Measure (contrast ZIP Codes)
  • Are objective (contrast colors)
  • Can be infinite in range and precision (contrast true|false)
  • Use a standard for measurement that is independent of the thing measured (contrast counting)
  • Have a base measurement that does not change (contrast US$)
  • Do not rely on an arbitrary zero point (contrast Longitude)
  • Are atomic, not made up of other Domains (contrast percent)

One distinct property of this type of Domain is that instead of being relative to a zero point, the measurement is between two things being measured. Consider a distance table as shown here:

Table 2
            Houston     St. Louis
Houston     NA          850 miles
St. Louis   850 miles   NA

Notice how the table has an inherent redundancy; the distance from St. Louis to Houston must be the same as the distance from Houston to St. Louis.
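
One way to remove that redundancy is to key each distance on the unordered pair of places, so that it is stored once and found from either direction. The class below is a hypothetical sketch, not a structure proposed by the paper.

```python
class DistanceTable:
    """Store each distance once, keyed by the unordered pair of places."""

    def __init__(self):
        self._miles = {}

    def set(self, a, b, miles):
        # frozenset((a, b)) is the same key as frozenset((b, a)).
        self._miles[frozenset((a, b))] = miles

    def get(self, a, b):
        if a == b:
            return None  # the "NA" cells on the diagonal
        return self._miles[frozenset((a, b))]

table = DistanceTable()
table.set("Houston", "St. Louis", 850)
print(table.get("St. Louis", "Houston"))  # 850
```

The unordered key reflects the point of this section: the measurement belongs to the two things together, not to either one relative to a zero.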

So my answer to ‘Measurement in feet’ is ‘Always involves two things being measured.’

Summary Table

The following table completes the pre-work from the beginning of this paper and summarizes the points made in the body of the discussion.

Figure 1: Summary Table

Conclusion

These eight examples have led me to identify seven different sub-types of Domains. I have organized the Domains according to rules for classification. In brief, Domains with different Meta-Attributes are in different rows. Not too surprisingly, the Domains I have chosen as examples correspond to the sub-types I identify.

Being able to discuss the Attributes of each of these different Domains will lead us toward creating DTDs that support more kinds of documents, queries, and consistency checking.

Admittedly, this paper runs right up to the edge of practicality and stops short. I do not mean to be abrupt, but it would be presumptuous of me to suggest additions to XML to distinguish the seven constructs I catalog. I leave those who have a responsibility for the development of XML to consider these questions:

  • What need has human communication for these seven distinct constructs?
  • Does XML adequately account for these logical constructs?
  • Should Domains be an externally defined part of the XML standard?

There is thinking to be done on Domains. Should we also explore all the logical constructs that we all intuitively use to understand our world: stages, classes, and super-classes? This paper focuses narrowly on the ways we measure the world; it catalogs seven different logical constructs that we use regardless of what language we speak. The greatest importance may be in the very exercise of exploration and cataloging, a practice at the heart of all our logical disciplines.

Notes

1. "We", not just "I", if you please.

2. Come on, use your imagination. U.S. states, currencies, customers, products, ...

3. Do not use your foot for counting when in polite company.

4. The sidereal year measures the time in which the sun returns to the same position against the background of stars.


Bibliography

[Barker/Longman] Barker, Richard, and Cliff Longman. CASE*Method Function and Process Modelling. Addison-Wesley, 1992.

[Date] Date, C.J. A Guide to DB2. Addison-Wesley, 1984.

[Muench] Muench, Steve. Building Oracle XML Applications. O'Reilly, 2000.

[ORM] http://www.orm.com

[Schmidt] Schmidt, Bob. "A Taxonomy of Domains." Database Programming and Design, Sept. 1997.

[USGS] http://vulcan.wr.usgs.gov/Glossary/Seismicity/description_richter.html


