Browser bookmark management with Topic Maps

Thomas B. Passin
tpassin@comcast.net

Abstract

Making effective use of large collections of browser bookmarks is difficult. The user faces major challenges in finding specific entries, in finding specific or general kinds of entries, and in finding related references. In addition, the ability to add annotations would be very valuable.

This paper discusses a practical model for a bookmark collection that has been organized into nested folders. It is shown convincingly that the folder structure in no way implies a hierarchical taxonomy, nor does it reflect a faceted classification scheme. The model is presented as a topic map.

A number of simple enhancements to the basic information are described, including a very modest amount of semantic analysis on the bookmark titles. An approach for preserving user-entered annotations across bookmark updates is delineated. Some issues of user interface are discussed. In toto, the model, the computed enrichment, and the user interface work together to provide effective collocation and navigation capabilities.

A bookmark application that embodies this model has been implemented entirely within a standard browser. The topic map engine is written entirely in JavaScript. The utility of this application, which the author uses daily, is remarkable considering the simplicity of the underlying model. It is planned to give a live demonstration during the presentation.

Keywords: Topic Maps; Metadata

Thomas B. Passin

Thomas Passin has been working with XML-related technologies since 1998. He helped to create the XML version of the message set in SAE J2354 Advanced Traveler Information Systems, currently in balloting, and has created a number of demonstration applications that use XML, XSLT, and Python technologies together. He also consults at work about XML and XSLT matters, and is active on a number of related discussion lists.

His interest in Topic Maps developed naturally from past experience with data modeling. He is currently finishing a manuscript about the Semantic Web.

Mr. Passin studied physics at the Massachusetts Institute of Technology and the University of Chicago.

Thomas B. Passin [Principal Systems Engineer; Mitretek Systems]

Extreme Markup Languages 2003® (Montréal, Québec)

Copyright © 2003 Thomas B. Passin. Reproduced with permission.

Introduction

Browser bookmark collections pose both an opportunity and a challenge to knowledge management technology. Browser bookmarks often play the role of a major database of reference information. Everyone has them, and there is a large amount of semantic content in the arrangement and names of folders, and the titles of the actual bookmarked resources.

The tools available for navigating and viewing bookmark collections are not able to make much use of this information, and even the best that the author has tested have serious weaknesses. Thus, users turn to Google or another web search site. These sites often do the job, but the user has to repeat the process of winnowing out the undesired hits.

After reviewing problems and issues with current bookmark managers, this paper presents an analysis of the nature of bookmark collections — assumed to contain nested sets of folders into which the bookmarks are organized — and describes a Topic Map-based approach that significantly increases the usefulness of a bookmark collection. Some user interface issues are discussed. Then an example browser-based implementation is briefly described.

Why bookmarks are hard

In this paper we are primarily considering relatively large collections of bookmarks. The author’s own collection currently has 1855 bookmarked pages organized into a structure that has 723 folders. In fact, these bookmarks are collected by three different browsers. As we will see, this fact makes the collection even harder to keep organized.

Forgetting

It is impossible to remember where every interesting bookmark is located, and impossible to recall everything that has been stored. It is also hard to recall why sets of folders were organized the way they were. This makes it difficult to consistently file new bookmarks and to know where to look for old ones.

Organizing

Often it is plain hard to know how to classify a particular web page. With the rationale for the folder structure half-forgotten, and in the rush of the moment, a bookmark may get filed in strange and (later) unexpected places. New subtrees of folders may begin to grow. Naming conventions drift. For example, the author has noticed that he has been creating more plural folder names recently.

When multiple browsers are involved, it is even harder to keep a consistent organization of the folders.

Finding

In some browsers, and most bookmark managers, it is possible to search the bookmark collection by title. This is quite useful if one has some memory of the title, but this is often not the case.

Beyond finding a specific title, one often wants to find related bookmarks. If similar bookmarks are scattered among distinct folder subtrees, a simple search will not find them.

Serendipity is valuable and exciting. It should be possible to find references that are related but unexpected.

Merging bookmark sets

It should be possible to combine bookmark sets from several browsers. The difficulty in merging is that the various browsers will probably not have the same organization.

Annotating

The ability to annotate a bookmark in various ways would be useful. Some bookmark managers do allow a description to be attached to a bookmark, and some will let the user create one or more searchable keywords. But often one wants to make a number of different kinds of notes, a capability apparently not supported. A good knowledge management system should support this desire.

In addition, the annotations should not get deleted when the collection is refreshed or changed. Some bookmark managers are able to store a global set of bookmarks separate from the browser, which would reduce the update problem, but it would probably be better to be able to work with the bookmarks maintained by each browser.

Desirable features of a bookmark manager

The key goals for a bookmark manager are the same as for any other library-like collection of information — collocation of related information, and effective navigation of the collection (see [Svenonius 2000] for a modern review of these objectives). Merging and annotation capabilities go beyond the classical objectives of library science but are very desirable.

The problem for the design of a bookmark manager lies in providing these capabilities given the extremely uncontrolled and variable nature of the organization of bookmark collections.

Bookmark issues

In this section we look at the main issues with bookmark collections that make them hard to work with.

What do the folders mean?

In a typical display, bookmarks are laid out in a tree-like display with folders inside of other folders, much like a display of a file system. Some bookmark managers dispense with folders altogether, relying instead on title searches and category labels or keywords. This works for relatively small collections, but as the collections increase in size, management of the growing collection of categories generally becomes a major problem. If they exist in a flat space, there are too many of them and they are not structured effectively. If they are allowed to be structured hierarchically, they amount to the same thing as a set of nested folders.

Do the folders form a hierarchy?

Obviously the folder names have some relationship to the user’s notions of classification. It is tempting to imagine that the folders form a hierarchy of concepts, like a taxonomy. This is not the case, though. They are not a hierarchy, and subfolders do not depict progressively more specialized versions of their “parents”.

To show this, consider a fragment of an actual collection, taken from the author’s collection. Under one top-level folder, we have this structure:

Food 
   Bakeries 
   Baking 
      Sourdough
      Bread 
   Cooking 
   Recipes 
   Suppliers
   Tools 
      Knives

Now, obviously, Bakeries is not a kind of Food, nor is Sourdough a kind of Baking. On the other hand, Bread could be considered a subclass of Food, and likewise Knives could be a subclass of Tools.

Clearly, this fragment is in no way a taxonomy, structured as class-subclass-sub-subclass... Next we will see that it is not even a proper hierarchy.

Leaving aside the obvious point that the same bookmark could appear under different folders, it is more significant that the order is often arbitrary and could easily have been reversed. For example, we could easily have had this listing instead:

Food
   Bakeries 
   Bread 
      Sourdough 
      Baking

If the order could be altered and still make sense, it is impossible that the folders represent a true hierarchy. As we have seen, though, a particular sub folder could in fact be a subclass of its “parent”. But this is not true in general, and not even necessarily within any one branch.

The KWIC [KeyWord In Context] technique sometimes permutes the order of keywords in compound terms. It would seem that bookmark folders can tolerate some degree of permutation, suggesting that they are more akin to compound indexing terms.

Do the subfolders represent facets of their parent folders?

Faceted classification schemes present a variety of subproperties that may be applied to an entity. A complete classification includes all facets that apply to the thing of interest, but their order is not generally significant.

Since we have seen that folder order is not always significant, could the subfolders represent facets? Unfortunately, this is not true in general either. Facets are supposed to represent orthogonal and exhaustive collections of applicable subproperties [Tzitzikas 2003]. Obviously, the subfolders shown above do not represent orthogonal concepts, let alone exhaustive sets.

Indeed, in most cases the subfolders do not represent properties or subproperties at all. Many times they represent different perspectives on the “parent” folder. Thus, Baking is more a perspective on Food than a facet of it. At the same time, some subfolders may well actually be facets.

To sum up, typical collections of bookmark folders are not true hierarchies, are not taxonomies, and are not faceted classification schemes. They are instead a somewhat incoherent combination of all of these together with other, probably unnamed schemes. This is just what one would expect from a person untrained in library classification and working without a controlled vocabulary.

In Section 3, we will see how to model real folder collections so as to get the most mileage from them.

Semantic content of bookmark collections

There is a great deal of semantic information in the titles of bookmarked resources. Although its quality is variable and its terms are uncontrolled, most titles had some meaning to the page author and also have some meaning to the user who created the bookmark. Because titles are uncontrolled, variable, and short, it is hard to do much useful computerized analysis of individual titles, especially with relatively simple software.

Users, though, are skilled at extracting useful information from titles. Title searches and browsing techniques make use of this human strength.

In a similar way, the titles of folders made sense to the person who created them, in the context of her thoughts and goals at the time. Thus, searching and browsing of folder titles are likely to be useful as well. Whether much useful semantic analysis can be done on the titles is open to question, again because they are short and inconsistent.

The structure of the folders somehow reflects the user’s notions of classification. In some way, a subfolder must have seemed at one time to have some meaningful relation to its containing folder. The structures can be analyzed, if suitable principles can be found for doing so.

Change and stability in Folderland

As noted earlier, both the organization and the naming styles of bookmark folders are prone to drift over time. This argues against trying to derive a fixed ontology ahead of time to model the collection. Instead, the design must be created anew from time to time, so that it can adapt to the changing structure of the collection.

Over time, it is likely that more and more duplicates of certain bookmarks will be filed in different folders. This can happen because the user forgets that a particular page was already captured, because it is being viewed in a different browser, or because the user just wishes to classify the same page differently because the focus of her interests has changed.

If the same bookmark is filed in different locations, it is likely that they have something in common. For example, a page might eventually get filed under “RDF”, and also under “Ontology”, and perhaps under “Knowledge Management” as well. What do these three have in common? Well, obviously, many things, but it might be hard to articulate the commonality that caused the specific page to be filed.

In a library setting, the librarian would spend the time necessary to arrive at a convincing classification, using a controlled vocabulary. But in the browsing setting, the user will file the page in a matter of seconds.

Because the different filing locations most likely have something in common, it is desirable, to satisfy the goal of effective collocation, to retrieve all of them when a particular bookmark is found.

Of course, some bookmarks get deleted from the collection as well. Dead bookmarks may or may not get purged.

Updates and annotations

The principal problems relating to personal annotations in a bookmark collection are first, to be able to preserve them when the bookmark collection changes or gets restructured, and second, to be able to save them if the user decides to change her software.

For the first, it seems best to attach annotations to each bookmark as defined by its URL. That way, even if the structure of the collection changes, the annotations will stay with the URL, which is generally what is intended.

For the second, it should be possible to save the bookmark collection with its annotations in some relatively standard format, either an XML format, or following some standard such as Topic Maps or RDF.
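The first point, keeping annotations with the URL rather than with a position in the folder tree, can be illustrated with a minimal sketch. The class name AnnotationStore, the function attachAnnotations, the record shapes, and the example.org URL are all hypothetical and chosen only for illustration; they are not taken from the implementation described later.

```javascript
// Hypothetical store: annotations are kept apart from the folder tree and
// keyed by URL, so moving or re-importing a bookmark does not lose them.
class AnnotationStore {
  constructor() {
    this.byUrl = new Map(); // URL -> array of { kind, text }
  }

  // Several kinds of note (comment, keyword, ...) may be attached to one URL.
  add(url, kind, text) {
    if (!this.byUrl.has(url)) this.byUrl.set(url, []);
    this.byUrl.get(url).push({ kind, text });
  }

  get(url) {
    return this.byUrl.get(url) || [];
  }
}

// Re-attach stored annotations to a freshly imported list of bookmarks.
// Each bookmark is assumed to look like { url, title, path }.
function attachAnnotations(bookmarks, store) {
  return bookmarks.map((b) => ({ ...b, annotations: store.get(b.url) }));
}

const store = new AnnotationStore();
store.add("http://example.org/rdf-primer", "comment", "Good first read");
store.add("http://example.org/rdf-primer", "keyword", "tutorial");

// The same page after the user has moved it to a different folder.
const refreshed = attachAnnotations(
  [{ url: "http://example.org/rdf-primer", title: "RDF Primer", path: "Ontology/RDF" }],
  store
);
```

Since the store never refers to a folder, restructuring the collection leaves the notes untouched; only a change of URL would orphan them.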

The model

In this section, we arrive at a suitable model for representing the folder structures. This model underlies the implementation covered in Section 6.

Folder structure as a subject language

We have seen how a set of bookmark folders is likely to be neither a taxonomy nor a set of classification facets, and how it tends to be inconsistent. In fact, uncontrolled bookmark structures are normally very informal. Any given subfolder may represent a subclass, a facet, a perspective, a “See also” relationship, or some other relation to the parent folder that may not even be easy to articulate. For example, in the author’s own collection, there are many folders whose names begin with “And”, such as “And Java”, “And Web Services”, “And Python”, and the like. These “And” folders are an attempt to indicate a relationship of equal status between the two concepts.

A further complication is that several folders may have the same name. For example, “Articles” may appear in folders in several unrelated branches. So the folder name by itself may not be enough to identify the right concept or filing place; we need somehow to represent the context in which it appears.

In the study of the organization of information as applied in the library sciences, there is a concept called “Subject Language” [Svenonius 2000]. A subject language is a vocabulary for describing the subject of a work in such a way that it can be recognized and thereby found; that is, it provides navigation capability for a collection. The Dewey Decimal System is a familiar subject language, but there are a vast number of others.

The insight here is that the folder collection represents a kind of subject language. It is informal to be sure, but a subject language nonetheless. Now, there are many types of subject languages, some of which are hierarchical and some not, some ordered and some not. The terms of the language may be atomic or compound. In fact, a faceted classification scheme can be seen as a kind of subject language.

Thus, we seek a subject language design in which there are compound terms, and in which order is not of great importance, but not completely ignorable either. The next section shows a practical way to accomplish this.

The approach to a subject language

In XML, nested elements are usually represented as some kind of a tree, much like nested folders in a file system or in a bookmark collection. But there is a common alternative way to express the same structure, namely path expressions such as XPath expressions. A set of three nested folders can be represented by the path expression A/B/C, instead of the tree-like view:

A
   B
      C

Now a compound subject language term might be written like this:

Food::Baking::Bread

This form matches a path expression exactly, except for the choice of separator, which is arbitrary anyway. With this insight, we can directly interpret the folder structure, written as path expressions, as a set of compound subject language terms. As we will see below, we can decompose the compound terms into simpler ones, and into atomic terms as well.

By using path expressions, the terms automatically carry their context with them. Atomic terms that are obtained from the decomposition of the compound terms no longer have their identifying context, but if some other atomic term is found to have the same name, there is a reasonable chance that the two have something in common. This will help us to find related resources in places where we have not thought to look.
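The reading of nested folders as path expressions can be made concrete with a small sketch. It assumes a hypothetical in-memory folder shape, { name, folders }, that is not taken from any particular browser's bookmark format, and it uses "/" as the separator; the function name folderPaths is likewise only illustrative.

```javascript
// Walk a nested folder structure and emit one path expression per folder.
// Assumed (hypothetical) folder shape: { name: string, folders?: Folder[] }.
// Folder names are assumed not to contain the separator themselves.
function folderPaths(folder, prefix = [], separator = "/") {
  const steps = [...prefix, folder.name];
  const paths = [steps.join(separator)];
  for (const child of folder.folders || []) {
    paths.push(...folderPaths(child, steps, separator));
  }
  return paths;
}

// Part of the "Food" fragment shown earlier in the paper.
const food = {
  name: "Food",
  folders: [
    { name: "Bakeries" },
    { name: "Baking", folders: [{ name: "Sourdough" }, { name: "Bread" }] },
    { name: "Tools", folders: [{ name: "Knives" }] },
  ],
};

const foodPaths = folderPaths(food);
// foodPaths now holds "Food", "Food/Bakeries", "Food/Baking",
// "Food/Baking/Sourdough", "Food/Baking/Bread", "Food/Tools", "Food/Tools/Knives"
```

Each resulting string is one compound term, and it carries its full context with it.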

Consider the fragment Software/Language/Java. We see again that it is not a taxonomy, for Language is not really a kind of software. Language/Software might have been better, since we speak of “Software Languages” (although “Programming Languages” is heard more often). “Language”, as used here, is more of a perspective (or perhaps a facet) of Software. Java, on the other hand, is a kind of software language.

It is hopeless to expect to deduce such relationships by analyzing the particular bookmark collection. Very likely the person who created it was not very clear or consistent in her notion of why she created the folders this way in the first place. But we can automatically create and decompose the subject language terms without regard to their semantics. Then we can look to see if there is any semantic enhancement that would be practical and useful. This is the path taken here.

Terms

The example fragment, Software/Language/Java, is thus to be considered a compound indexing term in an unknown subject language. It can be decomposed into one compound and one atomic term, that is to say, into Software/Language and Java. We call the first term the head term, and the second the tail. Figure 1 depicts this decomposition, which is the key to the analysis of the bookmark collection.

Figure 1: Model of bookmark indexing terms

Basic model of a bookmark collection, showing compound indexing terms decomposed into “head” and “tail” terms. Note that a bookmark is indexed by the compound term, rather than by the “tail” term, even though the folder tree display appears to show the bookmark under the “tail” term.

In the tree-like view, actual bookmarks seem to be filed under the folder “Java”. However, as we saw above, the context is critically important, and the context is captured by the full path expression. Therefore, as shown in Figure 1, the actual bookmarks (that is, the URL resources) are associated with the compound term, which in this case is Software/Language/Java. For labeling purposes, we may want to display the label of the tail term, which we can get through the “compounded from” association.

Figure 1 is drawn in the style of a topic map [ISO Topic Maps 2002]. In the topic map pattern, relationships, called associations, are non-directional. Each arc is described by the role it plays in the association. Topics are computer structures that represent concepts, and the type of a role is a topic because (of course) it is a concept in its own right. Figure 1 depicts a second kind of association as well, the association labeled “describes” that connects the actual bookmarked pages to their (compound) indexing term. The figure illustrates several resources associated to the indexing term via the single association, but alternatively there could be several “describes” associations, each with just one associated resource. Of course, each bookmarked resource also gets its own topic.

The head term, which in this example is still a compound term, also gets decomposed into its own head and tail terms. We continue to decompose the compound terms until they have all been fully decomposed. Of course, each intermediate compound term may have its own bookmarked resources associated with it.

At each step of this process, each compound indexing term gives rise to one association and three topics — one for the original term and one each for the head and tail terms. We note in passing that there might be duplication of atomic folder names, although those tail terms are not equivalent because they arise in different contexts.

This analysis into paths, compound indexing terms, and then into atomic terms, is the main step in analyzing the bookmark collection. It is technically easy to do, and ought to scale approximately as O(n) (except for any issues of indexing the new topics after they have been created).
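The decomposition procedure described above might be sketched as follows. This is a minimal illustration and not the code of the implementation discussed in Section 6: the names splitTerm and buildTermMap, the shape of the topic and association records, and the decision to key every topic by its term string (so that atomic terms with the same name share one topic) are all assumptions made for the example.

```javascript
// Split a compound term into its head (all steps but the last) and its tail
// (the last step). An atomic term has no head.
function splitTerm(term, separator = "/") {
  const cut = term.lastIndexOf(separator);
  if (cut === -1) return { head: null, tail: term };
  return { head: term.slice(0, cut), tail: term.slice(cut + separator.length) };
}

// Build topics and "compounded-from" associations from folder path expressions.
// Assumption: topics are keyed by term string, so same-named atomic terms merge.
function buildTermMap(paths, separator = "/") {
  const topics = new Map(); // term string -> { term, compound }
  const associations = [];  // { type, whole, head, tail }

  const topicFor = (term) => {
    if (!topics.has(term)) {
      topics.set(term, { term, compound: term.includes(separator) });
    }
    return topics.get(term);
  };

  for (const path of paths) {
    let term = path;
    // A term that already has a topic has already been decomposed (or is
    // atomic), so the walk up the path can stop there.
    while (!topics.has(term)) {
      topicFor(term);
      const { head, tail } = splitTerm(term, separator);
      if (head === null) break; // atomic term: nothing left to decompose
      topicFor(tail);
      associations.push({ type: "compounded-from", whole: term, head, tail });
      term = head;
    }
  }
  return { topics, associations };
}

const termMap = buildTermMap([
  "Software/Language/Java",
  "Software/Language/Python",
]);
// Software/Language is decomposed only once, although two paths pass through it.
```

In this sketch each distinct compound term is decomposed exactly once, which is in keeping with the roughly linear scaling suggested above.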

This construction procedure is entirely mechanical, yet it does capture some of the semantics of the collection. That is because the user created the structure based on her personal concepts and connotations. To capture the structure is to capture some part of its semantics. It is as if the structure and connections communicate to us in ghostly ways. If any analysis of the actual semantics of the labels should be possible, it would only enhance a structure that is already surprisingly rich. We will see in Section 4 that there is a simple semantic enhancement that can sometimes be made.

When a topic is created for a bookmarked resource, a check is made to see if a topic already exists with that URI. If there is one, it is used and a new one is not created. It is intended that no resource topic be duplicated. Since each bookmarked resource becomes a topic, naturally it is easy to attach any kind of annotation or metadata to it.
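The check for an existing resource topic could look something like the following sketch. The factory createResourceRegistry, its method names, and the record shapes are hypothetical; the sketch also assumes one "describes" association per filed bookmark, which is one of the two alternatives mentioned earlier.

```javascript
// Registry of resource topics, one per URI, plus the "describes" associations
// that tie each resource to the compound term it was filed under.
function createResourceRegistry() {
  const resources = new Map(); // URI -> resource topic
  const describes = [];        // { type: "describes", term, resource }

  // File a bookmark under a compound term, reusing the topic if the URI is known.
  function fileBookmark(term, uri, title) {
    let topic = resources.get(uri);
    if (!topic) {
      topic = { uri, title };
      resources.set(uri, topic);
    }
    describes.push({ type: "describes", term, resource: topic });
    return topic;
  }

  // All compound terms under which a given resource has been filed.
  function termsFor(uri) {
    return describes
      .filter((assoc) => assoc.resource.uri === uri)
      .map((assoc) => assoc.term);
  }

  return { resources, describes, fileBookmark, termsFor };
}

const registry = createResourceRegistry();
const first = registry.fileBookmark("RDF", "http://example.org/page", "A page");
const second = registry.fileBookmark("Ontology", "http://example.org/page", "A page");
// first and second are the same topic object; the page is now reachable
// from both "RDF" and "Ontology".
```

Because the two filings share one topic, finding the bookmark in one folder immediately yields the other folders in which it appears.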

Enrichment

In principle it should be possible to perform some analysis of folder labels, but with a small sample and little supporting text it would be hard to accomplish much automatically. Although it would be possible to arrange for the user to add semantic information (since the creator of the collection presumably would understand it better than a computer), in this work we take a different approach.

Those “And” terms

Earlier it was mentioned that the author’s collection includes a number of folders whose name starts with “And”, such as Python/And Java. Although this may be idiosyncratic to the author, it turns out that a very useful bit of semantic analysis can be done on these folders. The analysis is very simple (in fact it might be called “simple-minded”), but it has turned out to be quite effective.

The analysis consists in nothing more than splitting the name and creating a corresponding association. Figure 2 depicts this process.

Figure 2: Modeling folders whose names begin with “And”.

A compound term whose “tail” term starts with the word “And” is treated specially. The tail term is decomposed by splitting the string, and a new term is created, if one does not exist already. The type of relationship is called a “Co-mention” association, because the terms “Co-mention” each other.

The tail term, which is the term with “And” in it, is regarded as compound, and the label is split to extract the part following the “And”. A topic is created with this label, if one does not already exist, otherwise the existing one is used. A co-mention association is created to relate these terms back to the parent compound term.

For this scheme to be useful, the new terms have to match existing atomic terms. This is often the case, at least in the author’s collection. When this happy situation exists, the co-mention relationships allow for finding related subjects in unexpected places. This will be illustrated in Section 6 where a working implementation is discussed.
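The string splitting involved is small enough to show in full. The sketch below is only an approximation of the idea: the function name coMention and the fields of the returned record are hypothetical, and it assumes that the co-mention relates the extracted term to the head of the compound term.

```javascript
// Matches a folder label of the form "And <something>", e.g. "And Java".
// A label such as "Android" does not match, because whitespace is required.
const AND_LABEL = /^And\s+(.+)$/;

// If the tail of a compound term starts with "And", return a co-mention
// record relating the extracted term to the head of the compound term.
// Returns null for any other term.
function coMention(path, separator = "/") {
  const steps = path.split(separator);
  if (steps.length < 2) return null; // no parent term to relate to
  const tail = steps[steps.length - 1];
  const match = AND_LABEL.exec(tail);
  if (!match) return null;
  return {
    type: "co-mention",
    source: path,                                 // the full compound term
    head: steps.slice(0, -1).join(separator),     // e.g. "Python"
    mentioned: match[1].trim(),                   // e.g. "Java"
  };
}

const javaLink = coMention("Python/And Java");
const servicesLink = coMention("Python/And Web Services");
const noLink = coMention("Python/Tools"); // null: the tail is not an "And" folder
```

The extracted label ("Java", "Web Services") is only useful if it matches an atomic term that already exists elsewhere in the collection, which is why the lookup should reuse an existing topic where there is one.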

Equivalent terms

Search capabilities have proved to be useful in the implementation of this model, as discussed in Section 5 below. One could add some ability to search for synonyms and other equivalent kinds of terms, just as for any other search. One would presumably then want to bring in stemming, and the task starts growing beyond simple programming techniques.

In this connection it is interesting to ask how to present search results that came from matches on equivalent terms, since little semantic analysis has been done on the bookmark collection. For example, since the words and word senses are not well known for the collection, should all the results be mixed together, or should they be segregated somehow? The author has done a little experimenting about this in the implementation (see Section 6).

Other possibilities

Obviously any number of other enhancements can be devised. The unanswered question is to what extent could they be useful given the uncontrolled and variable nature of bookmark collections?

One possibility is to try to control the collection by getting the user to contribute her knowledge, the knowledge and ways of viewing the world that led to the structure of the collection in the first place. How to accomplish this, how useful the results would be, and whether users could be bothered to cooperate with the software in this way over time must remain for future work to answer.

Another possibility is to try to map the collection to some controlled vocabulary, and over time migrate the structure and navigation to make progressively more use of the controlled vocabulary. In addition to the question of user cooperation, it is unclear how the user could be induced to start using the controlled classifications in a typical browsing environment. When browsing, the user wants to make decisions and file bookmarks in a matter of seconds, and any questions or suggestions by the computer might seem intolerable. At the least, this represents an extremely difficult problem in usability and user interface design. But perhaps with sufficient cleverness a satisfactory approach could be devised.

User interfaces

A topic map constructed according to the model described in Section 3 is able to supply a rich serving of data to the user. How can it be presented? What kinds of interactions will be useful? Devising a good user interface for the map can be challenging. The author’s preliminary experiments with several general purpose topic map viewers showed clearly that a custom application would be necessary. The general purpose viewers were simply not able to present and navigate the linked information effectively enough. What characteristics are likely to be important?

Navigation

There are two classic ways to navigate bookmark collections. One is to browse, and the other is to search titles or, if available, keywords. The existence of our topic map will not change their usefulness (or lack of it, as the case may be). What the topic map of the collection does offer is links that are not present in the original, non-topic-mapped, collection. There are links between terms and bookmarked pages, just as for any collection, there are links between compound terms and atomic terms, and there are links between compound terms of differing degrees of decomposition. There are links between the “co-mention” terms, and there are indirect links between folders that are known to share the same bookmarked resource, simply by virtue of the fact that they share a resource. In this way, a search of folder titles usually returns a rich set of entries into the collection.

Tree views

To make good use of the topic map, then, we need to be able to capitalize on the links. As we learned in Section 3, it is the compound path that carries the context for a particular act of filing. We need to show the context, and this suggests that we show the path in some fashion. This could be done with a pseudo-tree view. A tree view can show the local context very well, but it can take up a lot of vertical space in the display. Also, a tree view is not as good for displaying remote but related contexts.

Path views

Alternatively, the path expressions can be listed. This provides a compact display in which it is easier to pick up separated but related contexts. But a list of paths is harder to make easy to read. Here is an example of a group of path statements:

Agents/FIPA/Tools
Annotation/Tools
Architecture/Tools
Conceptual Graphs/Tools
Food/Tools
Graphics And Visualization/Tools
Knowledge Representation/Tools
Ontology/OWL/Tools
Ontology/Tools
Publications/Tools
Python/And Web Services/Tools
Python/Tools

This list (which has been truncated to save space) is presented by the sample implementation in response to clicking on the atomic term “Tools”. If this set of data were presented in a tree format, it would be fairly incoherent and hard to absorb. Tree views work well when many leaves are at the end of a few branches, but here we have many branches, and it is their leaves that are related. A tree view would not be very effective, but with the path view, it is possible to scan and notice that, for example, if I am interested in tools for ontologies, I might want to look at tools stored under Conceptual Graphs as well.

In the example above, all the paths ended with the term of interest — “Tools” in this case. This is by no means always the case. To illustrate, here is another fragment:

Agents/FIPA/Ontology
Ontology/Applications
Ontology/Articles
Ontology/Cyc
Ontology/DAML+OIL
Ontology/Dublin Core
Ontology/Examples
Ontology/OWL

The user would probably have forgotten that an Ontology directory exists under Agents. This listing brings back not only the fact but also something about the rationale. These fragments merely hint at the power available using this system, which will emerge more clearly through the rest of this section and in Section 6.
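Listings like the two above can be produced by a very small lookup over the path expressions. The following is a sketch under the assumption that the paths are available as plain strings; the function name pathsMentioning is hypothetical, and the sample implementation may well go through the topic map associations instead.

```javascript
// Return, in sorted order, every path in which the atomic term appears as a
// complete step, whether at the end of the path or somewhere inside it.
function pathsMentioning(paths, atomicTerm, separator = "/") {
  return paths
    .filter((path) => path.split(separator).includes(atomicTerm))
    .sort();
}

// A few of the paths quoted in the listings above.
const samplePaths = [
  "Python/Tools",
  "Agents/FIPA/Ontology",
  "Ontology/OWL/Tools",
  "Food/Tools",
  "Ontology/Articles",
];

const toolPaths = pathsMentioning(samplePaths, "Tools");
const ontologyPaths = pathsMentioning(samplePaths, "Ontology");
```

Matching on whole steps rather than substrings keeps a term such as “Tools” from also selecting unrelated labels that merely contain the same letters.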

One might think, since it is argued above that path expressions are usually more useful than tree views, that a list of all path expressions in the collection should replace the usual full tree view. This can be done with the sample implementation, but in the author’s opinion, it does not work well because it is too dense and too rich for casual browsing. It seems that a path listing style view is better suited for limited sets of information about closely related subjects. Perhaps this conclusion could be changed by a sufficiently well designed display format.

Graphical views

The importance of the links suggests that some kind of graphical representation would be useful. In graphical views there are always three challenges to be met. One is achieving a satisfactory layout automatically, another is to avoid an overly cluttered graph, and the third is to have intelligent grouping of the nodes. Programming such graphical displays is harder than programming textual displays.

Clearly there is potential for devising a really good graphical display for the collection’s data. The sample implementation so far uses only textual displays.

Hyperlinking the compound paths

Obviously we expect to be able to click on a displayed path and receive some useful information in return. But with a path display such as the ones illustrated above, how should the hyperlinks be arranged?

In many applications that show navigation links in the form of compound paths, each step of the path is a separate hyperlink that allows a user to return to previous pages and to skip intervening links. However, this is not a good plan for our case. Here, each compound path represents a context that may contain filed URLs or other paths (i.e., folders). It is forward-looking rather than backward-looking, in the sense that the links are not there to help the user return anywhere, but instead to help her proceed.

Thus, it has proved better to let the entire compound path be a single link that returns all related information about it, including bookmarked pages and related folders. But this design has one drawback, because it is sometimes desirable to also allow clicking on the various steps in a compound path after all. The sample implementation deals with this conundrum by also listing the next higher part of each compound term but in a separate section of the listing. This makes for a longer listing but still seems to be effective.

Collocation

The examples above also illustrate a degree of collocation, that is, finding related information in one place or at least nearby. We saw, for example, potentially related folders. Not shown here but depicted in Section 6, the stored bookmarks for any expression are also shown on the same page as related path expressions.

If every related resource cannot be shown in one place at the same time, the next best thing is that there are easy routes to get from one set of related information to another. In an on-screen application, this usually translates to few mouse clicks and minimal cognitive effort. With other features that will be discussed further in Section 6, it is usually possible to move to potentially related information in three or fewer mouse clicks. Of course, one still has to decide if the bookmarked resources found in this manner are relevant. This would be done by using the titles of the resources, together with any annotations the user has made.

Searching

As mentioned at the start of Section 5, searching is likely to remain important, topic map or no. Can this topic map approach bring any new aspects to searching, compared with ordinary bookmark managers? The answer is both “yes” and “no”.

The answer is “no” in the sense that we can mainly search resource titles and folder titles. Some ordinary bookmark managers also offer searching of the URL strings themselves, and of keywords. With the topic map, we could search all of these, and also the compound paths. In addition, there are the new terms generated from decomposing the “And” terms (see Section 4-1), and of course, any other enrichment of the topic map that might be devised.

In the author’s experience, searching the path expressions per se is not very helpful because it is difficult to remember them and to spell them right. What does work very well is text search with partial matching among the titles of the atomic terms — the leaves of the folder branches, in other words, together with any synthetically generated atomic terms. The full set of compound paths that are related to the atomic “hits” can be just one mouse click away. This design has proven to be highly effective.
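
A minimal sketch of this kind of partial-match search follows. The function names, the case-insensitive matching, and the sample paths are assumptions made for illustration; in particular, the location of the Bookmarklets folder is invented for the example.

```javascript
// Collect the atomic terms (individual steps) that occur in a set of paths.
function atomicTerms(paths) {
  const terms = new Set();
  for (const path of paths) {
    for (const step of path.split("/")) {
      terms.add(step);
    }
  }
  return [...terms];
}

// Case-insensitive partial match against the atomic term titles.
function searchTerms(paths, query) {
  const needle = query.toLowerCase();
  return atomicTerms(paths)
    .filter((term) => term.toLowerCase().includes(needle))
    .sort();
}

// Illustrative paths only.
const bookmarkPaths = [
  "Web/Browsers/Bookmark Managers",
  "Web/Browsers/Bookmarklets",
  "Books/Programming",
];

console.log(searchTerms(bookmarkPaths, "bookmark"));
```

Because the match is made against short atomic terms rather than whole paths, the user only has to remember a fragment of one folder name.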

Of course, searches among the resource titles themselves continue to be extremely valuable. The titles contain a great deal of semantic information, and searching is a prime way to access it. So far the author has not felt the need to search the URL values themselves. Once in a while it would be useful, but rarely enough that he has not gone to the effort to write the code.

Browsing

The main method most people use to navigate and search their bookmarks is the classic browse through the folder tree. The sample implementation provides an especially convenient way to expand and contract the folders to make this easier. Nevertheless, the author has found browsing the folder tree to be the feature he uses the least. It is good to have, but rarely essential. That is because it is so easy to find a starting point and to see related paths through the collection. After getting used to the system, the author feels rather crippled when he is reduced to mere browsing of a tree view.

With the implementation, browsing is only the first step. Selecting any one folder in the browse immediately brings up the entire array of linked information, and one normally does not need to go back to browsing the tree view.

For example, as an experiment, the author began to browse the tree view for “Bookmark”, an appropriate term here. But there is no top level folder with that name. Where to look? Instead of browsing the folder tree any longer, a search of topic titles immediately returned “Bookmark Managers” and “Bookmarklets”. The first of these led through one intermediate mouse click to “Web/Browsers/Bookmark Managers”, which had URLs for five bookmark manager products, and a related link to Web/Browsers as well.

“Co-mentions” and pseudo-facets

Recall that so-called “co-mention” associations are created when terms starting with “And” are encountered. There may of course be other idioms that could be easily analyzed. When the user selects a term, the displayed results include any co-mentions, which are, of course, hyperlinks to their atomic terms. This provides another path to related information, one that would not be found by plain browsing. Here is an example. The selected term is “Python” (the programming language, as this collection has no entries as yet for the snake variety). The co-mentions are:

Python/And Web Services
XML/And Python

Notice how the target term, “Python”, may appear in either the first or the last position. The idea behind co-mentions is that the two terms are peers, and the use of “And” is a hint to that effect. As was mentioned earlier, this is the only bit of actual semantic analysis in the model to date, and it is barely worthy of the name “semantic”. But this simple relationship turns out to open up the collection to a surprising degree.
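
The co-mention idiom can be sketched as follows. This is an assumption-laden illustration rather than the engine's code: the function names are hypothetical, and pairing an “And” step with the step immediately before it is one plausible reading of the rule.

```javascript
// Hypothetical sketch of co-mention detection. A step that starts with
// "And " is taken as a hint that the rest of its name is a peer of the
// preceding step in the path.
function coMentions(paths) {
  const pairs = [];
  for (const path of paths) {
    const steps = path.split("/");
    for (let i = 1; i < steps.length; i++) {
      if (steps[i].startsWith("And ")) {
        pairs.push({ path, terms: [steps[i - 1], steps[i].slice(4)] });
      }
    }
  }
  return pairs;
}

// Co-mentions that involve a given atomic term, in either position.
function coMentionsFor(paths, term) {
  return coMentions(paths).filter((pair) => pair.terms.includes(term));
}

const pythonPaths = [
  "Python/And Web Services",
  "XML/And Python",
  "Python/Articles",
];

console.log(coMentionsFor(pythonPaths, "Python").map((pair) => pair.path));
```

Stripping the “And ” prefix yields the name of the peer topic, so “XML/And Python” links the topics XML and Python even though neither folder is a child of the other in any meaningful sense.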

During the discussion of the model in Section 3, we inquired whether child folders represent “facets”. It became clear that in general they do not, but sometimes they do. Because little semantic analysis of the collection is possible, there is no way to tell which child folders are facets and which are not. The same is true for perspectives and subclasses. There are also common patterns. For example, many folders in the author’s collection have a subfolder called “Articles”. Certainly an Article is some kind of information about the subject of the parent folder, even if it is not a facet or perspective.

It turns out to be useful to break out the immediate child folders as if they were facets or perspectives. In other words, we finesse the fact that we do not have enough semantic information about these “pseudo-facets” by ignoring it. For example, under “Python” there is a folder called “Articles” (path Python/Articles). So we include “Articles” in a list called, in the sample implementation, “Perspectives and Facets”.

This is useful because selecting “Articles” leads to all the other subjects for which we also have articles.
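
Both directions of this navigation can be sketched briefly. The function names and the sample paths are assumptions for illustration only.

```javascript
// Pseudo-facets of a folder: the names of its immediate child folders.
function pseudoFacets(paths, parent) {
  const prefix = parent + "/";
  const facets = new Set();
  for (const path of paths) {
    if (path.startsWith(prefix)) {
      facets.add(path.slice(prefix.length).split("/")[0]);
    }
  }
  return [...facets].sort();
}

// The reverse direction: every folder that has a child with the given name.
function subjectsWithFacet(paths, facet) {
  const suffix = "/" + facet;
  return paths
    .filter((path) => path.endsWith(suffix))
    .map((path) => path.slice(0, -suffix.length))
    .sort();
}

// Illustrative paths only.
const facetPaths = [
  "Python/Articles",
  "Ontology/Articles",
  "Ontology/OWL",
  "Books/Programming",
];

console.log(pseudoFacets(facetPaths, "Ontology"));      // child folders of Ontology
console.log(subjectsWithFacet(facetPaths, "Articles")); // subjects that have Articles
```

No judgment is made about whether a child folder is a true facet; the name alone is used as the shared navigation point.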

Screen real estate

Showing all this information on a single computer screen is of course difficult. Choices include limiting the amount in any one view, making it smaller, and opening other windows. None of these options are desirable, but so far one or more of them must be invoked. In the author’s opinion, there is a great need for creativity in this area.

In the sample implementation, information is presented in three side-by-side panels, and the actual bookmarked resources are opened in a separate window. The panels are side-by-side to minimize the amount of vertical scrolling needed when the returned listing is long. It succeeds because the entries are typically relatively short, so that horizontal scrolling or excessive wrapping is rarely needed. Long path expressions fare less well with this design. The approach works fairly well, at least for the author, but more creative and sophisticated designs are called for. Of course, this applies to any rich information-presenting system, not just the topic map bookmark manager.

Implementation

Up to now, a “sample implementation” has been mentioned many times. This application was developed to test and refine the model, to prototype user interface features, and to eventually provide a usable application for the author’s personal use. To make development and changes quick, and to avoid having to create user interface machinery for the application, the whole system is implemented in javascript as a stand-alone program in a web browser.

Standards support and cross-platform capabilities

The code makes use of HTML, CSS2, Javascript (ECMAScript), and DOM. Any standard browser that supports these standards sufficiently well should work. In practice, this means Internet Explorer 6 and Mozilla-based browsers. Since the code runs in Mozilla, the application should work cross-platform, although this has not been tested yet.

Architecture

The system is highly modularized and consists of a set of “core” javascript modules that contain the topic map engine and related utility code. This core has been used in several other topic map applications without modification. The engine, and a topic map editing application that uses it, is now — thanks to Alexander Johannesen — an open source project hosted on Sourceforge, under the name “TM4JScript”.

The engine is designed as a set of classes that attempt to implement the structures described in the [XTM] specification as closely as possible. XTM [XML Topic Maps] is an interchange format, part of the ISO standard for topic maps [ISO Topic Maps 2002], not a program description, but it is very feasible to consider the XML elements to be data structures and to turn them into objects.

This approach does not necessarily lead to an “efficient” program, but it is very efficient for its intended purpose. It is not necessary to learn specialized, high-performance programming structures and to mentally translate them to topic map constructs. This maximizes the power of the system to support prototyping and experimentation. Where performance becomes too slow, the first line of defense is to create indexes rather than sophisticated data structures.
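
The following sketch suggests what such a direct mapping, plus a simple index, might look like. The class shapes are hypothetical and loosely named after XTM elements; they are not the TM4JScript API, and modern JavaScript class syntax is used for brevity.

```javascript
// Minimal classes loosely modeled on the XTM topic and association elements.
class Topic {
  constructor(id, baseName) {
    this.id = id;
    this.baseName = baseName;
  }
}

class Association {
  // members: array of { role, topicId }
  constructor(type, members) {
    this.type = type;
    this.members = members;
  }
}

class TopicMap {
  constructor() {
    this.topics = new Map();   // topic id -> Topic
    this.associations = [];
    this.byMember = new Map(); // index: topic id -> associations it plays a role in
  }

  addTopic(topic) {
    this.topics.set(topic.id, topic);
    return topic;
  }

  addAssociation(association) {
    this.associations.push(association);
    // Maintain the index so lookups do not have to scan every association.
    for (const member of association.members) {
      if (!this.byMember.has(member.topicId)) {
        this.byMember.set(member.topicId, []);
      }
      this.byMember.get(member.topicId).push(association);
    }
    return association;
  }

  associationsOf(topicId) {
    return this.byMember.get(topicId) || [];
  }
}

const tm = new TopicMap();
tm.addTopic(new Topic("books", "Books"));
tm.addTopic(new Topic("books-programming", "Books/Programming"));
tm.addAssociation(
  new Association("head-of", [
    { role: "head", topicId: "books" },
    { role: "compound", topicId: "books-programming" },
  ])
);

console.log(tm.associationsOf("books").length);
```

The index is an ordinary map kept alongside the plain object structure, which matches the stated preference for adding indexes rather than replacing the straightforward data structures.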

Applications write pages dynamically, since there is no server to do it for them. Typical applications have several frames in an HTML frameset. Common code resides in the frameset where all the frames can refer to it. Frames communicate only by requesting the reload of another panel, passing any required information in query parameters. In this way, the application could be converted to a server-based one with very little effort beyond the actual porting of the engine. An earlier version of the engine was ported to Python with ease, which is not surprising because of the similarities between Javascript and Python, and because the code was written as if it were for Python so far as was possible.
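
The query-parameter convention can be illustrated with a short sketch. The page name, the parameter names, and the use of the modern URLSearchParams interface are all assumptions; the original code predates that interface and would have built and parsed the query string by hand.

```javascript
// Build the URL that one panel would ask another panel to load.
function panelUrl(page, params) {
  return page + "?" + new URLSearchParams(params).toString();
}

// Recover the parameters on the receiving side.
function readParams(url) {
  const queryStart = url.indexOf("?");
  const query = queryStart >= 0 ? url.slice(queryStart + 1) : "";
  return Object.fromEntries(new URLSearchParams(query));
}

// Hypothetical page and parameter names.
const requestUrl = panelUrl("details.html", {
  topic: "Books/Programming",
  view: "context",
});
const received = readParams(requestUrl);

console.log(requestUrl);
console.log(received.topic);
```

Since everything a panel needs arrives in the URL, the same request could later be answered by a server without changing the calling panel.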

Persisting and interchanging Topic Maps

The native format of a topic map file is a generated set of javascript instructions that cause the map to be constructed. There is utility code to write the native format and also to write standard XTM format topic map files. Because standard browsers cannot write to the file system in a portable way, it is necessary to display the generated output in the browser’s source view and then save the file from there.
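
The idea of a native format made of construction instructions can be sketched as below. The statement layout and the function name are hypothetical and much simpler than whatever the engine actually writes.

```javascript
// Hypothetical writer: serialize topics as JavaScript statements that
// rebuild the same structure when they are evaluated.
function writeNativeFormat(topics) {
  const lines = ["const map = { topics: {} };"];
  for (const topic of topics) {
    // JSON.stringify produces safely quoted string literals.
    lines.push(
      "map.topics[" + JSON.stringify(topic.id) + "] = { name: " +
        JSON.stringify(topic.name) + " };"
    );
  }
  lines.push("return map;");
  return lines.join("\n");
}

const nativeSource = writeNativeFormat([
  { id: "t1", name: "Books/Programming" },
  { id: "t2", name: "Ontology/OWL" },
]);

// Loading the "file" amounts to running the generated statements.
const rebuilt = new Function(nativeSource)();

console.log(nativeSource);
console.log(rebuilt.topics.t1.name);
```

A format of this kind needs no parser on the loading side, because the browser's own script loader does the work.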

The bookmark application does not need to import XTM files to date, but a set of XSLT stylesheets is able to convert an XTM file into the native javascript format. This capability is used by other applications that use the core engine.

Creating a bookmark Topic Map

To create the topic map from a set of bookmarks, the XML format for browser bookmarks called XBEL is used. Python scripts turn the browser bookmarks into XBEL files. The author uses three different web browsers (Internet Explorer, Mozilla, and Firebird, which is based on Mozilla). The XBEL files for these three browsers get merged by an XSLT stylesheet. Duplicate URLs in the same directory get purged (to avoid duplications between the browsers). Another XSLT stylesheet generates the path expressions for each folder and creates the basic topic map in an intermediate XML format (not XTM). A final XSLT stylesheet converts the intermediate XML to the native javascript format.
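
The two central steps of this pipeline, purging duplicate URLs within a folder and generating a path expression for each folder, are done with XSLT in the actual system. The sketch below restates them in JavaScript purely for illustration; the nested object layout, the function name, and the example.org URLs are assumptions.

```javascript
// Walk a nested folder structure, emitting one entry per folder with its
// path expression and its bookmarks, with duplicate URLs removed per folder.
function collectPaths(folders, parentPath = "") {
  const result = [];
  for (const folder of folders) {
    const path = parentPath ? parentPath + "/" + folder.title : folder.title;
    const seen = new Set();
    const bookmarks = (folder.bookmarks || []).filter((bookmark) => {
      if (seen.has(bookmark.url)) return false; // same URL already kept
      seen.add(bookmark.url);
      return true;
    });
    result.push({ path, bookmarks });
    result.push(...collectPaths(folder.folders || [], path));
  }
  return result;
}

// Illustrative merged collection; the same URL arrives from two browsers.
const mergedFolders = [
  {
    title: "Books",
    bookmarks: [],
    folders: [
      {
        title: "Programming",
        bookmarks: [
          { title: "Example A", url: "http://example.org/a" },
          { title: "Example A (second browser)", url: "http://example.org/a" },
          { title: "Example B", url: "http://example.org/b" },
        ],
      },
    ],
  },
];

const pathEntries = collectPaths(mergedFolders);
console.log(pathEntries.map((entry) => entry.path));
```

Duplicates are only purged within a single folder, so the same URL filed under two different folders is kept in both contexts.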

The whole process is driven by a batch file, which puts the javascript topic map file into a standard directory where the browser can find it. When the frameset for the application loads, the topic map gets imported and processed. This processing builds the basic map and then enriches it as discussed in Section 5.

It is interesting to note that the basic topic map can be constructed with XSLT. That is, topics are created for the compound terms and for the bookmarks themselves, and they are related by associations. The javascript application decomposes the compound terms, creates topics for the head and tail terms along with the associations that relate them, and enriches the topic map.
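
The decomposition step can be sketched as follows. It assumes that the head of a compound term is the path without its last step and that the tail is the last step; the paper's usage for Books/Programming is consistent with this, but for longer paths it is only an assumption, as are the function names.

```javascript
// Split a compound term into head and tail (assumed definitions, see above).
function decompose(path) {
  const steps = path.split("/");
  if (steps.length < 2) return null; // an atomic term has nothing to split
  return {
    head: steps.slice(0, -1).join("/"),
    tail: steps[steps.length - 1],
  };
}

// Add topics for heads and tails, plus associations tying them to the
// compound term they came from.
function enrich(paths) {
  const topics = new Set(paths);
  const associations = [];
  for (const path of paths) {
    const parts = decompose(path);
    if (!parts) continue;
    topics.add(parts.head);
    topics.add(parts.tail);
    associations.push({ compound: path, head: parts.head, tail: parts.tail });
  }
  return { topics: [...topics], associations };
}

const enriched = enrich(["Books/Programming", "Logic/Programming"]);
console.log(enriched.topics);
console.log(enriched.associations.length);
```

Because “Programming” becomes a single shared topic, both compound terms are reachable from it, which is what makes the cross-collection listings possible.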

Screen captures

In this section we use screen captures to illustrate some of the workings of the interface. Figure 3 shows a classic tree view. The display on the right is the result of clicking “Programming” under “Books” in the tree view. Notice that the tree view shows only folders and not actual bookmarks. This design feature is intended to reduce the amount of information in the view. The bookmarks themselves get listed when a folder is clicked. In this case there are three bookmarks in the chosen folder.

Figure 3: Screen shot of the implementation

The image shows a tree view on the left and on the right, the results of selecting the “Programming” folder under “Books”.

The figure with its annotations is fairly self-explanatory. Under Related Terms, the link to Books is of course a link to the “head” term of the path expression. The link to Programming under the heading Perspectives and Facets is more subtle. It lists all tail terms in which the indexing term for the page (i.e., Books/Programming) plays the head role. This amounts to a listing of the last step of the path of the given indexing term. However, the collection is not a tree and there is not necessarily a one to one relation between “parent” and “child”. The same indexing term may play a role in many other paths. Thus, listing these terms, which were called “pseudo-facets” earlier, provides navigational shortcuts that span the collection.

Figure 4 illustrates these pseudo-facets. It is the result of clicking on the “Programming” link.

Figure 4: Screen shot of the implementation

The image shows three indexing terms listed under “By Context”. Each one is related to the topic that is the focus of the page.

Three indexing terms (that is, path expressions) are listed under By Context. They all have Programming, the focal topic of the display, as part of their paths. This means that there is a “Programming” folder in the tree-like view under each of the three topics Books, Logic, and Software. The user might not have remembered that the other folders also contain a “Programming” folder, but now she has instant access to them.

In this example, the focal term always appears on the right hand side of the paths, but this is not always true, as we see in Figure 5.

Figure 5: Screen shot of the implementation

The image illustrates that the “Related Context” indexing terms may contain the focal topic at either end.

Notice how the display for Parsers shows a reversal of order: in two of the paths the term comes last, while in the other one it comes first. Once again we see how easily the application brings potentially related links together, in a manner impossible for conventional bookmark managers.

The last screen capture, Figure 6, illustrates several more features of the implementation. First, the left hand panel contains two “co-mention” links. This kind of link was discussed in Section 4. It comes from removing the “And” from the name of a term that starts with it, and then linking to the topic that has that name — in this case, XML.

Figure 6: Screen shot of the implementation

The image illustrates two “Co-mention” indexing terms, and on the right hand panel, shows a note added to a bookmarked page. The controls for editing annotations and adding new ones are also visible.

Once again, the design promotes additional modes of navigation through the collection.

The right panel shows details about a particular resource. Normally this view is displayed by clicking on the details link in one of the other views. In this case, after the data about the bookmarked resource was displayed, additional navigation was done in the left-hand panel. Therefore, the subjects of the two panels no longer match. In this illustration, a note has been added to the data for a specific bookmark. The figure also shows controls to add new annotations, to edit existing ones, and to save the annotations. This display also shows all associations in which the resource of interest participates. As the topic map is currently constructed, the only relationship is the one that links the resource to its indexing term. In the future, should any other associations be added to the resource, they would be listed here.

These static screen shots cannot capture the real power of the application. A live demonstration is planned for the conference presentation. The application is responsive in use, although the time to load a large collection of bookmarks is longer than desirable. For reference, the javascript-format topic map for the author’s bookmarks is over 2.5 MB long. It contains nearly 2000 bookmarks and over 700 folders.

Conclusions

The model delineated in Section 3 seems to work very well. It is simple and easy to understand once the mental picture is changed from hierarchies to compound indexing terms. The application is very effective for the author, who never uses the bookmark capabilities of his web browsers any more, except to capture new bookmarks. It cannot replace a good web search engine like Google, of course, but now it is feasible to search the bookmark collection before resorting to Google. Previously, it was often better to simply go to the Web even though the information would be in the bookmark collection.

Even better, the author constantly finds references of interest that he had lost or forgotten about. It is now rewarding to explore the collection, whereas in the past that had become impractical because of its size. So the experiment is a success, even though more development of the user interface is needed. The original goal of improving navigation and collocation has been achieved.

Potential improvements

The system certainly can be improved. The user interface is very busy because there is so much crosslinked information, and clever ideas in this area would be very welcome. It would probably be useful to devise a keyword scheme to give fast access to web pages that are often used. The usual problem with keyword schemes lies in their management, once there get to be many of them. This, of course, is another user interface problem, not a technical one. Naturally, the keywords would become topics in the topic map.

No doubt there are many ways to enrich the information in the topic map. It is remarkable how useful the system is given how little analysis is done. However, the variability and even instability over time that occur in practical bookmark collections make it hard to arrive at good approaches. In one experiment, the author devised a subschema to designate equivalent terms. Just one pair of such terms is built in at present, the pair Article and Paper. A search for “Article” will also match “Paper”, and vice-versa. This is useful, but it is unclear how far it would be practical to extend the set of word pairs.
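
A search that honors such equivalent terms might be sketched like this. The table layout, the function names, and the sample titles are assumptions; only the Article and Paper pair comes from the experiment described above.

```javascript
// Equivalent-term pairs; only this one pair is mentioned in the text.
const EQUIVALENT_TERMS = [["Article", "Paper"]];

// Expand a query into itself plus any terms declared equivalent to it.
function expandQuery(query) {
  const needle = query.toLowerCase();
  const expanded = new Set([needle]);
  for (const pair of EQUIVALENT_TERMS) {
    const lowered = pair.map((word) => word.toLowerCase());
    if (lowered.includes(needle)) {
      lowered.forEach((word) => expanded.add(word));
    }
  }
  return [...expanded];
}

// Partial, case-insensitive match of any expanded term against the titles.
function searchTitles(titles, query) {
  const needles = expandQuery(query);
  return titles.filter((title) =>
    needles.some((needle) => title.toLowerCase().includes(needle))
  );
}

// Illustrative titles only.
console.log(searchTitles(["Articles", "Papers", "Tools"], "Article"));
console.log(searchTitles(["Articles", "Papers", "Tools"], "Paper"));
```

Keeping the pairs in a small table makes the open question concrete: every added pair widens every search that touches either word, so the table has to stay small enough to remain predictable.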

RDF and other technologies

The basic model is very simple, and could easily be implemented in RDF, a relational database, or by other means as well. The topic map pattern was very helpful to the author in the beginning, and sticking to the pattern has made it relatively easy to see how to extend the application with new capabilities. The clear distinction in topic maps between identifiers and retrievable resources is also helpful. There are capabilities specific to topic maps, such as the scoping mechanism, that remain to be applied to the model. For example, scopes could be helpful in delineating larger contexts than the simple path expressions used to represent the folder structure. Such possibilities remain for future exploration.

Summary

This paper analyzes the structure of a bookmark collection and presents a model that represents it effectively as a topic map. User interface issues are explored. Finally, an example implementation is presented that uses a modular javascript topic map engine as the core of a standalone, browser-based, bookmark manager application.


Acknowledgments

The author would like to thank Nikita Ogievetsky, Sam Hunting, and Steve Newcomb for discussions and suggestions that were extremely helpful for this work. Thanks also go to Alexander Johannesen for creating the Sourceforge project TM4JScript to host the javascript topic map engine.


Bibliography

[ISO Topic Maps 2002] ISO/IEC 13250 Topic Maps, http://www.y12.doe.gov/sgml/sc34/document/iso13250-2nd-ed-v2.pdf.

[Svenonius 2000] Svenonius, E., “The Intellectual Foundation of Information Organization”, The MIT Press.

[Tzitzikas 2003] Tzitzikas, Y. et al., “An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies”, from The 13th European-Japanese Conference on Information Modeling and Knowledge Bases, http://www.csi.forth.gr/~tzitzik/publications/Tzitzikas_EJC_2003.pdf.

[XTM] XML Topic Maps (XTM) 1.0, http://www.topicmaps.org/xtm/1.0/.



Thomas B. Passin [Principal Systems Engineer, Mitretek Systems]
tpassin@comcast.net