Ninth ACM International Conference on Hypertext and Hypermedia

HYPERTEXT '98:

Marriott City Center, Pittsburgh, PA, USA, June 20-24, 1998

HT'98 Demos and Posters

Separating Textual Contents from Structures for Reading Hypertext Structured Medical Records

Vincent Brunie (1, 2), Pierre Morizet-Mahoudeaux (1), Bruno Bachimont (3)

(2) UMR CNRS 6599 Heudiasyc
University of Technology of Compiegne
BP 529, 60205 Compiegne cedex, France
Tel: (33) 1 40 77 96 06
Fax: (33) 1 45 86 56 85
E-mail: brunie@biomath.jussieu.fr

(1) Service d'Informatique Medicale de l'AP-HP
91 boulevard de l'hopital, 75634 Paris cedex 13, France

(3) Institut National de l'Audiovisuel
4, avenue de l'Europe, 94366 Bry-sur-Marne cedex, France

Hypertext systems are typically organised as sets of nodes and links. Nodes represent data and links support the navigation functions. An efficient way to implement a hypertext system is to use a structural markup scheme, for example SGML, to build structured representations of the documents contained in the hypertext. In this case navigation is driven by the documentary structures rather than by explicitly marked links. Furthermore, this structural approach makes it easy to generate links automatically and to build a typology of links.

We have used this approach at the Pitie-Salpetriere Hospital, Paris, France, to implement a prototype of a computerised medical record, called Hospitext. This experiment has shown that the efficiency of the approach relies on the existence of synthesis documents, which correspond to medically standardised readings of the record. The Hospitext system is equipped with tools that automatically collect textual units, together with pointers to their positions in the structures, in order to build dynamically new documents corresponding to these standard readings of the record. These synthesis tools fall into two categories, which together cover all the syntheses that we have identified in the hospital. The first category groups tools that build traditional navigation devices: tables of contents, indexes, navigation buttons, etc. The second category corresponds to more sophisticated tools that provide readings with added medical value: summaries of long documents, graphs, patient discharge summaries, etc.

This first experiment has already shown several limits of this documentary representation of hypertext when synthesis tools are used. The main weakness is that the synthesis documents generated by the system contain textual information that duplicates textual information contained in the original documents. As a consequence:

updates cannot be managed efficiently, because the same information is stored redundantly,

it is not possible to have structural links between the generated and the original documents, thus making it difficult for the reader to build associative relationships,

it is not possible to build a structural representation spanning several documents (which we call inter-documentary structures). Indeed, synthesis documents are just replications of parts of other documents, and thus cannot represent the corresponding inter-documentary structures. They are the material proof that these structures exist, but cannot represent them,

it is not possible to represent multiple levels of structures for the same document content, since the structured representation scheme is hierarchical.

We propose a new representation scheme for hypertexts to resolve the problems described above. It is based first on separating the textual contents from the structures, and second on dynamically building documents for reading. The initial goal of this scheme was to allow the representation of multiple levels of structure for the same document, but it has also proved to be a good solution to all four problems.

This representation scheme is based on two main categories of objects. First, contents are built from the textual parts of the documents. They are composed of characters standing for themselves. Contents are addressed across the whole record, so that any character can be identified unambiguously. In addition, addresses carry a partial order relation corresponding to the order in which the characters appear in the original documents. The order relation does not need to be total, so the addressing scheme may be multidimensional. Second, structures are tag trees, each representing a valid instance of an SGML Document Type Definition (DTD). Each tag is an SGML-like tag bound to an address pointing to a content. A tag is therefore composed of a label, a list of attributes, and an address. There are two types of tags: opening tags and closing tags. The address points to the first character after the tag insertion point. Each tree represents either an original document or a generated structuring document.
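As an illustration only, the two categories of objects could be sketched in Java as follows. This is not the code of the prototype: the class and method names (`Contents`, `Tag`, `Structure`, `textOf`) are hypothetical, an address is simplified to a single integer offset (so the order is total, whereas the scheme only requires a partial order), and a tag tree is stored flattened as a list of tags in document order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: shared contents plus tag structures that point into them. */
class ContentStructureSketch {

    /** Shared text store. Assumption: an address is a plain int offset. */
    static final class Contents {
        private final StringBuilder chars = new StringBuilder();

        /** Appends text and returns the address of its first character. */
        int append(String text) {
            int address = chars.length();
            chars.append(text);
            return address;
        }

        /** Address just past the last stored character. */
        int end() { return chars.length(); }

        String slice(int from, int to) { return chars.substring(from, to); }
    }

    /** SGML-like tag: label, attributes, and address of the first character after it. */
    static final class Tag {
        final String label;
        final boolean opening;
        final Map<String, String> attributes;
        final int address;

        Tag(String label, boolean opening, Map<String, String> attributes, int address) {
            this.label = label;
            this.opening = opening;
            this.attributes = attributes;
            this.address = address;
        }
    }

    /** One structure (an original or a generated document), stored as tags in document order. */
    static final class Structure {
        final List<Tag> tags = new ArrayList<>();

        Structure open(String label, int address) {
            tags.add(new Tag(label, true, Collections.<String, String>emptyMap(), address));
            return this;
        }

        Structure close(String label, int address) {
            tags.add(new Tag(label, false, Collections.<String, String>emptyMap(), address));
            return this;
        }

        /** Text between the first opening tag with this label and the next closing tag with it. */
        String textOf(Contents contents, String label) {
            int from = -1;
            for (Tag tag : tags) {
                if (!tag.label.equals(label)) continue;
                if (tag.opening && from < 0) from = tag.address;
                else if (!tag.opening && from >= 0) return contents.slice(from, tag.address);
            }
            throw new IllegalArgumentException("no element labelled " + label);
        }
    }

    public static void main(String[] args) {
        Contents contents = new Contents();
        int history = contents.append("Fever since Monday. ");
        int biology = contents.append("Glucose 5.2 mmol/L.");

        // Structure of the original report.
        Structure report = new Structure()
                .open("report", history).open("history", history).close("history", biology)
                .open("biology", biology).close("biology", contents.end())
                .close("report", contents.end());

        // A generated structure that points at the same characters; nothing is copied.
        Structure synthesis = new Structure()
                .open("summary", biology).close("summary", contents.end());

        System.out.println(report.textOf(contents, "biology"));
        System.out.println(synthesis.textOf(contents, "summary"));
    }
}
```

Because the generated structure only holds addresses, a correction made to the stored characters is seen through both structures, which is the property the redundancy problem calls for.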

The data model is manipulated through three kinds of operations: inputs, increments, and outputs. An input operation adds a structured document to the hypertext. An increment operation may add contents, structures, or both, to the hypertext. This can be done automatically by synthesis tools, or manually through annotation. Outputs are based on the projection operation, which builds an output tagged document from contents and structures. The units projected are documents (structure trees), each of which selects a set of content items. Each item can then be output in the order prescribed by the main structure. The output is a tagged document of the type defined by the projection. Moreover, a projection must define how content shared between several structures should influence the output process.
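The projection operation can be pictured with a minimal Java sketch, given here under stated assumptions rather than as the prototype's implementation. The names `Node` and `project` are hypothetical, contents are reduced to a single string, each element carries the address range it selects instead of separate opening and closing tag addresses, only leaf elements emit text (no mixed content), and the handling of content shared between structures is not modelled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of projection: structure tree + contents -> tagged document. */
class ProjectionSketch {

    /** One element of a structure tree; a leaf selects the characters in [from, to). */
    static final class Node {
        final String label;
        final int from;
        final int to;
        final List<Node> children;

        Node(String label, int from, int to, Node... children) {
            this.label = label;
            this.from = from;
            this.to = to;
            this.children = Arrays.asList(children);
        }
    }

    /** Builds the tagged output in the order prescribed by the structure. */
    static String project(String contents, Node root) {
        StringBuilder out = new StringBuilder();
        project(contents, root, out);
        return out.toString();
    }

    private static void project(String contents, Node node, StringBuilder out) {
        out.append('<').append(node.label).append('>');
        if (node.children.isEmpty()) {
            // Leaf: copy the selected characters from the shared contents.
            out.append(contents, node.from, node.to);
        } else {
            for (Node child : node.children) {
                project(contents, child, out);
            }
        }
        out.append("</").append(node.label).append('>');
    }

    public static void main(String[] args) {
        String contents = "Fever since Monday. Glucose 5.2 mmol/L. Discharged on day 3.";
        int glucose = contents.indexOf("Glucose");
        int discharge = contents.indexOf("Discharged");

        // A generated structure selecting two non-adjacent content items.
        Node synthesis = new Node("synthesis", 0, contents.length(),
                new Node("item", 0, glucose),
                new Node("item", discharge, contents.length()));

        System.out.println(project(contents, synthesis));
    }
}
```

The same contents can be projected through any number of structures, so a synthesis document is produced at reading time rather than stored as a copy.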

We have already implemented the kernel of a hypertext system corresponding to this representation scheme. It has been developed in the Java programming language. The system is able to take SGML and XML documents as input, to build their representation as a hypertext, to add structuring documents with a number of synthesis tools, and to project the results to an HTML-based browser for display.

This prototype has been used to build a hypertext system containing ten computerised medical records. It has shown the technical validity of the approach, and provides a kernel for future experiments. Although many developments and improvements are still necessary to obtain a complete hypertext system comparable to the Hospitext prototype, it has already provided efficient solutions to the problems of the Hospitext approach presented above.

The next stage of our project is to run an experiment comparable in volume to the one carried out for developing the Hospitext system. We intend to build a prototype containing hypertextual medical records based on the scheme proposed in this paper. The automatic synthesis tools implemented in the Hospitext prototype will be reimplemented in this new architecture, and new synthesis tools will be tested. More precisely, we will give the user the possibility of specifying personalised syntheses based on several levels of structure. These could make it possible, for example, to request a synthesis of all the biological results appearing in the introduction or conclusion of a given patient report. They could also make it possible to build synthesis documents corresponding to given annotations, such as "all the bacteriological reports tagged as important".
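A request combining two levels of structure, such as the biological results found in the introduction or conclusion of a report, might be sketched as below. This is a hypothetical illustration of a planned feature, not a description of an existing tool: `Span` and `select` are invented names, each element is reduced to a label and an integer address range over the same contents, and containment of address ranges stands in for whatever matching rule a real tool would define.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: query one structure, filtered by another over the same contents. */
class MultiLevelQuerySketch {

    /** An element reduced to its label and the address range [from, to) it covers. */
    static final class Span {
        final String label;
        final int from;
        final int to;

        Span(String label, int from, int to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }

        boolean contains(Span other) {
            return from <= other.from && other.to <= to;
        }
    }

    /** Returns the units with the given label that lie inside a section with one of the given labels. */
    static List<Span> select(List<Span> units, String unitLabel,
                             List<Span> sections, List<String> sectionLabels) {
        List<Span> hits = new ArrayList<>();
        for (Span unit : units) {
            if (!unit.label.equals(unitLabel)) continue;
            for (Span section : sections) {
                if (sectionLabels.contains(section.label) && section.contains(unit)) {
                    hits.add(unit);
                    break; // count each unit once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // First level of structure: rhetorical sections of the report.
        List<Span> sections = Arrays.asList(
                new Span("introduction", 0, 30),
                new Span("body", 30, 70),
                new Span("conclusion", 70, 100));

        // Second level of structure over the same addresses: medical units.
        List<Span> units = Arrays.asList(
                new Span("biology", 10, 20),
                new Span("biology", 40, 50),
                new Span("radiology", 75, 80),
                new Span("biology", 80, 90));

        List<Span> hits = select(units, "biology", sections,
                Arrays.asList("introduction", "conclusion"));
        for (Span hit : hits) {
            System.out.println(hit.label + " [" + hit.from + ", " + hit.to + ")");
        }
    }
}
```

Such a query is only expressible because both structures address the same contents; with replicated text there would be no common addresses to compare.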
