W. Keith Edwards and Elizabeth D. Mynatt
Graphics, Visualization, and Usability Center
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
keith@cc.gatech.edu, beth@cc.gatech.edu
ABSTRACT
While graphical user interfaces have gained much popularity in recent years, there are situations in which the need to use existing applications in a nonvisual modality is clear. Examples of such situations include the use of applications on hand-held devices with limited screen space (or even no screen space, as in the case of telephones), and use by people with visual impairments.
We have developed an architecture capable of transforming the graphical interfaces of existing applications into powerful and intuitive nonvisual interfaces. Our system, called Mercator, provides new input and output techniques for working in the nonvisual domain. Navigation is accomplished by traversing a hierarchical tree representation of the interface structure. Output is primarily auditory, although other output modalities (such as tactile) can be used as well. The mouse, an inherently visually-oriented device, is replaced by keyboard and voice interaction.
Our system is currently in its third major revision. We have gained insight into both the nonvisual interfaces presented by our system and the architecture necessary to construct such interfaces. This architecture uses several novel techniques to efficiently and flexibly map graphical interfaces into new modalities.
KEYWORDS: Auditory interfaces, GUIs, X, visual impairment, multimodal interfaces.
INTRODUCTION
The graphical user interface is, at this time, the most common vehicle for presenting a human-computer interface. There are times, however, when these interfaces are inappropriate. One example is when the task requires that the user's visual attention be directed somewhere other than the computer screen. Another example is when the computer user is blind or visually-impaired [BBV90][Bux86].

The goal of providing nonvisual access to graphical interfaces may sound like an oxymoron. The interface design issues of translating an interactive, spatially presented, visually-dense interface into an efficient, intuitive and non-intrusive nonvisual interface are numerous. Likewise, the software architecture issues of monitoring, modeling and translating unmodified graphical applications are equally complex.

(This paper was presented at the UIST '94 conference and is included in the proceedings for UIST '94: The Seventh Annual Symposium on User Interface Software and Technology, November 1994.)
The typical scenario for providing access to a graphical interface is as follows: while an unmodified graphical application is running, an outside agent (or screen reader) collects information about the application interface by watching objects drawn to the screen and by monitoring the application behavior. This screen reader then translates the graphical interface into a nonvisual interface, not only translating the graphical presentation into a nonvisual presentation, but providing different user input mechanisms as well.
During UIST 1992, we presented a set of strategies for mapping graphical interfaces into auditory interfaces, primarily with the aim of providing access for blind users [ME92]. These strategies, implemented in a system called Mercator, demonstrated a scheme for monitoring X Windows [Sch87] applications transparently to both the applications and the X Windows environment. Guidelines for creating a complex auditory version of the graphical interface using auditory icons and hierarchical navigation were also introduced.
Much has happened since November 1992. Both formal and informal evaluations of the techniques used to create Mercator interfaces have offered new insights into the design of complex auditory interfaces. Since these interface techniques have generally been welcomed by the blind and sighted communities, they now form a set of requirements for screen reader systems.
More significantly, the entire architecture of Mercator has been replaced in response to experiences acquired in building the system as well as by the auditory interface requirements. The new architecture is based on access hooks located in the Xt Intrinsics and Xlib libraries. These hooks allow state changes in application interfaces to be communicated to outside agents such as screen readers, customization programs and testing programs. The Mercator project played a significant role in championing and designing these hooks, which were accepted by the X Consortium (a vendor-neutral body which controls the X standard) and released with X11R6.
In addition to modifying Mercator to use these new hooks, we have restructured Mercator to support a simplified event model which allows nonvisual interfaces to be loaded and customized in an extremely flexible manner. A litmus test of our work is that this architecture is sufficient to model and transform numerous X applications.
This paper is organized as follows. The following section summarizes the design of Mercator interfaces. It also briefly describes some modifications to the interfaces since they were last reported in this forum. The next section introduces the Mercator architecture and the design issues that have influenced the new implementation. We step through the construction of our system, highlighting the general applicability of our work to other user interface monitoring and manipulation tasks. We close by summarizing the status of our current system, introducing some of our future research goals and acknowledging the sponsors of our research.
MERCATOR INTERFACES
The design of Mercator interfaces is centered around one goal--allowing a blind user to work with a graphical application in an efficient and intuitive manner. Previous Mercator papers have discussed the preferred use of audio output for North American users, as well as the object model for the auditory interface [ME92]. In short, information about the graphical interface is modeled in a tree structure which represents the graphical objects in the interface (push buttons, menus, large text areas and so on) and the hierarchical relationships between those objects. The blind user's interaction is based on this hierarchical model. Therefore blind and sighted users share the same mental model of the application interface--interfaces are made up of objects which can be manipulated to perform actions. This model is not contaminated with artifacts of the visual presentation such as occluded or iconified windows and other space-saving techniques used by graphical interfaces. In general, the blind user is allowed to interact with the graphical interface independent of its spatial presentation.

The contents of the application interface are conveyed through the use of speech and nonspeech audio. The first Mercator system established the use of auditory icons [Gav] and filtears [LC91] to convey the type of an object and its attributes. For example, a text-entry field is represented by the sound of an old-fashioned typewriter, while a text field which is not editable (such as an error message bar) is represented by the sound of a printer. Likewise a toggle button is represented by the sound of a chain-pull light switch, while a low-pass (muffling) filter applied to that auditory icon can convey that the button is unavailable (this attribute may be conveyed by "graying out" in a graphical interface). Additional design work has led to the use of auditory cues to convey hidden information in the auditory interface, such as mapping a pitch range to the length of a menu [My94][MW94]. Finally, the label for that button, and any other textual information, can be read by a speech synthesizer.
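Mappings of this sort are naturally expressed as rules in Mercator's interpreted rule language, described later in this paper. As a minimal sketch, using the playsound and resource extensions from Table 2 (the sound file names and the -muffle option are hypothetical, since the exact playsound syntax is not reproduced here):

proc PresentToggle {id} {
    # Hedged sketch: play the chain-pull light switch icon for a
    # toggle button; apply a muffling filtear when the widget is
    # insensitive ("grayed out" in the graphical interface).
    if {[resource $id sensitive] == "True"} {
        playsound lightswitch.au
    } else {
        playsound lightswitch.au -muffle 0.5
    }
}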
At the simplest level, users navigate Mercator interfaces by changing their position in the interface tree structure via keyboard input. Each movement (right, left, up or down arrow keys) positions the user at the corresponding object in the tree or informs the user, through an auditory cue, that there are no objects at the requested location. Additional keyboard commands allow the user to jump directly to different points in the tree structure. Likewise keyboard shortcuts native to the application, as well as user-defined macros, can be used to speed movement through the interface.
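For illustration, the down-arrow movement might be expressed with the bindkey, widget and currentobject extensions listed in Table 2. This is a hedged sketch: the "widget children" query form and the boundary.au cue are hypothetical stand-ins.

proc MoveDown {} {
    # Move to the first child of the current object, or play an
    # auditory cue if there are no objects below this one.
    set kids [widget children [currentobject]]
    if {[llength $kids] == 0} {
        playsound boundary.au
    } else {
        currentobject [lindex $kids 0]
    }
}
addaction MoveDown
bindkey Down MoveDown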
The navigation model has been extended to work in a multi-application environment. Essentially the user's desktop is a collection of tree structures. Users can quickly jump between applications while the system stores the focus for each application context. The user's current focus can also be used to control the presentation of changes to the application state. For example, a message window in an application interface may (minimally) use the following modes of operation:
• Always present new information via an auditory cue and synthesized speech.
• Signal new information via an auditory cue.
• Do not signal the presentation of new information.
These modes of operation can be combined in various ways depending on whether the application is the current focus. For example, an object can use one mode (always present via speech and/or nonspeech) when the application is the current focus and use another mode (signal via an auditory cue) when the application is not the current focus. Cues from applications which are not the current focus are preceded by a cue (speech or nonspeech) which identifies the sending application.
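A rule combining these modes might look like the following sketch, again in the rule language described later in this paper. The action's argument list, the resource names and the sound file are hypothetical.

proc MessageChanged {client id} {
    # Hedged sketch: speak new message text when its application is
    # the current focus; otherwise identify the sender and give only
    # an auditory cue.
    if {$client == [currentclient]} {
        playsound message.au
        speak [resource $id label]
    } else {
        speak [resource $client name]
        playsound message.au
    }
}
addaction MessageChanged
bindevent ValueChanged MessageChanged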
Mercator interfaces have undergone numerous informal evaluations by blind computer users. Generally, Mercator is demonstrated alongside commercial screen readers in the exhibit area of conferences devoted to technology and persons with disabilities. Feedback from blind users confirms that the hierarchical navigation and nonspeech auditory cues are intuitive and welcomed by the blind user community. This feedback is significant since Mercator is the only screen reader which uses these techniques, although versions of these techniques are now beginning to appear in other screen reader systems.
In capsule summary, the system requirements for the creation of Mercator interfaces cluster around three common themes:
• The construction of the interfaces must be based on the semantic organization of the application interface, not just its graphical presentation.
• The interfaces should be highly interactive and intuitive in the auditory space.
• The system must be able to generate efficient and compelling interfaces for a broad range of applications.

The remainder of this paper is devoted to describing the new architecture for Mercator, and how it is able to meet these requirements as well as providing a platform for other user interface monitoring and modeling tasks.

ARCHITECTURE

Given the user interface requirements described above, our goal was to build a software architecture capable of constructing these interfaces. Any architecture which is capable of producing such interfaces must address the following issues:

• The architecture must gather information with sufficient semantic content to support our object-based interaction model and yet be broadly applicable to a wide variety of applications.
• Our information capturing techniques must guarantee complete knowledge about the application's graphical interface.
• The architecture must support both a fine degree of control to configure the individual interfaces as well as coarse-grained control to explore the design space of nonvisual interfaces.
• Since we are providing new interaction techniques to control existing applications, we must support a separation between the semantic operations that the application provides and the syntactic grammar of user input.

In the following sections, we describe an architecture which addresses the aforementioned issues. This discussion explores the range of techniques for information capture that can provide varying degrees of semantic information about the graphical interface. Next, we detail the particular information capture strategy used by Mercator and discuss its applicability to other interface monitoring and modeling tasks.

Given the demands of fully monitoring interactive graphical applications, we describe the cooperation between the functional components of our system as it dispatches multiple forms of input from both the applications and the user. Next, we provide an overview of our modeling techniques for representing application interfaces and generic text, as well as strategies for handling potential inconsistencies in the data store.

We explain how our design allows us to provide highly flexible and dynamic nonvisual interfaces as well as a flexible and powerful event model which can represent user input as well as application output. Finally, we describe the input and output mechanisms in our system which support interactivity in the nonvisual domain, and provide a separation between syntax and semantics of application control.

Information Capture

When we began our work on this project, it became clear that there is, in fact, a spectrum of solutions for capturing information from applications. Along this spectrum we see essentially a trade-off between transparency and the semantic level of the information available to an external agent (see Figure 1).

FIGURE 1. A Spectrum of Solutions for Information Capture: internal (modify applications; per-application access systems), hybrid (modify toolkits; the current Mercator system), and external (use only existing facilities; the early Mercator system).

At one extreme of the spectrum, we have the option of directly modifying every application so that it provides information about its state to the external agent.
While this approach provides the highest possible degree of semantic information about what the application is doing, it is completely non-transparent: each application must be rewritten to be aware of the existence of the external agent. Obviously this end of the spectrum serves as a reference point only, and is not practical for a "real world" solution.

At the other extreme of the spectrum we can rely only on the facilities inherent in whatever platform the application was built on. In our design space, this platform meant the X Window System, specifically applications built using the Xlib and Xt libraries. This use of existing facilities is essentially the approach taken by the first version of Mercator. Our system interposed itself between the X server and the application and intercepted the low-level X protocol information on the client-server connection. This approach had the benefit that it was completely transparent to both the application and the X server (indeed, it was impossible for either to detect that they were not communicating with a "real" X client or server), but had a rather severe limitation: the information available using this approach was extremely low level. Essentially we had to construct a high-level structural model of the application from the low-level pixel-oriented information in the X protocol. Our first system also used the Editres widget customization protocol which appeared in X11R5 [Pet91], but we found that Editres was insufficient for all our needs. The use of these approaches was the only practical solution available to us in our first system, however, because of our requirement for application transparency.

There is a third possible solution strategy which lies near the middle point of these two extremes. In this strategy, the underlying libraries and toolkits with which the application is built are modified to communicate changes in application state to the external agent. This approach is not completely transparent--the libraries must be modified and applications relinked to use the new libraries--but all applications built with the modified libraries are accessible. The semantic level of information available to the external agent depends on the semantics provided by the toolkit library.

Modifications to Xt and Xlib

During our use of the first version of Mercator, it became clear that the protocol-level information we were intercepting was not sufficient to build a robust high-level model of application interfaces. We began to study a set of changes to the Xt Intrinsics toolkit which could provide the information needed to support a variety of external agents, including not only auditory interface agents, but also testers, profilers, and dynamic application configuration tools. Originally our intention was to build a modified Xt library which could be relinked into applications to provide access. Through an exchange with the X Consortium, however, it became clear that the modifications we were proposing could be widely used by a number of applications. As a result, a somewhat modified version of our "hooks" into Xt and Xlib are a part of the standard X11R6 release. A protocol, called RAP (Remote Access Protocol), uses these hooks to communicate changes in application state to an external agent. RAP also provides communication from the external agent to the application.

This section describes (in fairly X-specific terms) the design of the toolkit modifications which are present in X11R6.
We feel that the modifications are fairly complete and can serve as a guideline for developers of other toolkits who wish to be able to communicate information about interface state changes to external agents. Further, these hooks can be used to implement all of the functionality of the Editres system used in our previous implementation; the new hooks and RAP subsume the configuration capabilities of Editres. Table 1 presents the basic messages in RAP.

TABLE 1. Remote Access Protocol

GetResources: Retrieve the resources associated with a particular widget.
QueryTree: Retrieve the widget hierarchy of the application.
GetValues: Retrieve the values of a list of resources associated with a given widget.
SetValues: Change the values of a list of resources associated with a given widget.
AddCallback: "Turn on" a particular callback in the Hooks Object.
RemoveCallback: "Turn off" a particular callback in the Hooks Object.
ObjectToWindow: Map an object ID to a window ID.
WindowToObject: Map a window ID to an object ID.
LocateObject: Return the visible object that is under the specified X,Y location.
GetActions: Return a list of actions for a widget.
DoAction: Invoke an action on a widget (may not be available in all implementations).
CloseConnection: Shut down the RAP connection.
Block: Stall the client so that an external agent can "catch up."
ObjectCreated: Inform an agent that a new widget has been created.
ObjectDestroyed: Inform an agent that a widget has been destroyed.
ValueChanged: Inform an agent that a resource has changed.
GeometryChanged: Inform an agent that a widget's geometry (size, position) has changed.
ConfigurationChanged: Inform an agent that a widget's configuration (map/unmap state) has changed.

The hooks consist of a new widget, called the Hook Object, which is private to Xt. The Hook Object maintains lists of callback procedures which will be called whenever widgets are created or destroyed, their attributes (resources) are changed, or their configuration or geometry is updated. Some bookkeeping data is also maintained in the Hook Object. A new API has been added to Xt which allows application programmers to retrieve the Hook Object associated with a connection to the X server.

All of the Xt Intrinsics routines which can create or destroy widgets, or modify widget state, have been modified to call the appropriate callback functions that have been installed in the Hook Object. By default, no callbacks are installed in the Hook Object. Instead, a set of callbacks is installed in the Hook Object when an application is initially contacted by an external agent such as Mercator. These callback routines implement the application-to-Mercator half of the RAP protocol, which informs Mercator about changes in application state.

FIGURE 2. Old and New Mercator Architectures: the old architecture interposes a pseudoserver (with the Rules Engine, Editres Manager, Sound Manager and Model Manager) between the X server and the X client; the new architecture connects the Rules Engine, Sound Manager and Model Manager to the application through a toolkit agent in the Xt Intrinsics and Xlib.

Protocol Setup. The Xaw Vendor Shell widget (essentially the "outer window" interface object) has been modified to support the initial "jump-start" phase of the connection setup protocol. External agents (including testers and profilers--not just Mercator) pass a message to an application via a selection. This message contains the name of the protocol the external agent wishes to speak. Code in the Vendor Shell catches the message, and searches a table for the named protocol.
If found, an initializer routine will be called which will install the callback routines appropriate for that protocol in the Hooks Object. If the application needs to be able to receive messages from the external agent (in addition to simply passing information out via the hook callbacks), then it can create a communications port with an associated message handler for incoming messages from the external agent. This port is used for Mercator-to-application messages in the RAP protocol.

Replacing the Pseudoserver. We have also argued for a change to the lower-level Xlib library which has been adopted by the X Consortium for inclusion in X11R6. This change is an enhancement to the client-side extension mechanism in X which allows a function to optionally be called just before any X protocol packets are sent from the application to the server. A function can be installed in this "slot" by the protocol initializer which will pass to an external agent the actual X protocol information being generated by an application. This modification to Xlib serves as a "safety net" for catching any information which cannot be determined at the level of Xt. This hook in Xlib allows us to operate without the use of a pseudoserver system, unlike our previous implementation. Events from the server to the application are passed via a function installed in an already-extant client-side extension.

The modifications described above provide a general framework to allow a wide variety of external agents to cooperate closely with applications; these modifications consist of a number of small changes to Xt and Xlib which are, for the most part, invisible to applications which do not care about them. They allow protocols to be automatically installed and thus our goal of transparency has been achieved: any application built with the X11R6 libraries will be able to communicate with an external agent.

It is our conjecture that this foundation will be usable by other generic tools that will work across a wide array of applications. Although it may be possible to instrument a single application with, say, a profiling system, this infrastructure makes it possible to construct a generic profiler which will work across applications. Given the significant requirements that screen readers have for monitoring and interacting with graphical applications, it is reasonable to conclude that this architecture will be sufficient for less demanding tasks such as profiling, configuring or monitoring a graphical application. Since the system is built on standard X hooks which minimally impact the performance of a running application, this platform should be well-suited to other commercial and research endeavors.

Control Flow
Mercator must constantly monitor for changes in the states of the graphical applications as well as for user input. The new techniques for information capture mean that we are no longer slaved to the X protocol stream. In our earlier pseudoserver-based implementation, care had to be taken constantly to ensure that Mercator never blocked, since blocking would have stalled the X protocol stream and effectively deadlocked the system.
In our new system, using the Xlib hook, we have a more flexible control flow. We can generate protocol requests to the server at any time, and engage in potentially long computation without having to worry about deadlock. In our previous implementation a fairly complex callback system was constructed so that computation segments could be chained together to prevent potential deadlock situations. We avoid potential deadlocks in the new system because, unlike with the pseudoserver implementation, the client applications can continue running even if Mercator is blocked. The Xlib hook approach gives us the benefits of the pseudoserver approach--access to low-level information--without the costs associated with pseudoservers.
All components of Mercator which perform I/O are subclassed from a class called FDInterest. Each instance of this class represents a connection (over a file descriptor) to some external component of the system. For example, a separate FDInterest exists for each connection Mercator has to an application, each connection to an audio server, and so on. Each FDInterest is responsible for handling I/O to the entity it is connected to. This architecture makes the division of I/O responsibility much cleaner than in our older system. Figure 2 shows the older pseudoserver-based Mercator architecture alongside the newer architecture which replaces the pseudoserver with the Xlib hook.
The overall system is driven by RAP messages to and from the client applications. For example, whenever an application changes state (pops up a dialog box, for example), a RAP message is generated from the application to Mercator. It is received in Mercator by the FDInterest connected to that client. The FDInterest determines the type of the message and dispatches it according to a hard-coded set of rules which keep our model of the application interface up-to-date. For example, if a new widget is created, the FDInterest generates commands to add the new widget, along with its attributes, to our model of the interface.
Interface Modeling
Application interfaces are modeled in a data structure which maintains a tree for each client application. The nodes in this tree represent the individual widgets in the application. Widget nodes store the attributes (or resources) associated with the widget (for example, foreground color, text in a label, currently selected item from a list).
There are three storage classes in Mercator: the Model Manager (which stores the state of the user's desktop in its entirety), Client (which stores the context associated with a single application), and XtObject (which stores the attributes of an individual Xt widget). Each of these storage classes is stored in a hashed-access, in-core database for quick access. Each storage class has methods defined on it to dispatch events which arrive while the user's context is in that object. Thus, it is possible to define bindings for events on a global, per-client, or per-object basis.
Other objects in Mercator can access this data store at any time. A facility is provided to allow "conservative retrievals" from the data store. A data value marked as conservative indicates that an attempt to retrieve the value should result in the generation of a RAP message to the application to retrieve the most recent value as it is known to the application. This provides a further safety feature in case certain widgets do not use approved APIs to change their state.
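From the rule language's perspective (see Table 2 below) this machinery is invisible: a rule simply reads a value through the resource extension, and the data store decides whether a RAP GetValues round-trip is required. A minimal hypothetical sketch:

# Hedged sketch: read a resource through the data store. If the
# "label" value were marked conservative, this read would trigger
# a RAP GetValues message to the application behind the scenes.
set label [resource [currentobject] label]
speak "The current object is labeled $label."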
Text is stored in a special entity called a TextRep object. TextReps are created automatically whenever text is drawn to a window for the first time. TextReps are associated with objects in the data store and can be accessed by other components of Mercator to retrieve an up-to-date account of the text present in a given window. The Xlib hook keeps this information current; the text model maintains consistency over scrolling, font changes, and refreshes.
Embedded Computation to Dynamically Construct Interfaces
One of the more novel concepts in the Mercator implementation is its use of embedded interpreters to dynamically build the interface "on the fly" as the user is interacting with the application. Unlike graphical interfaces, where there is a constant, usually-static presentation of the interface on the screen, auditory interfaces are much more dynamic. In Mercator, the auditory presentation for a given object is generated at run-time by applying a set of transformation rules to the application model. These rules are solely responsible for producing the user interface (playing sounds, changing the user's current context, and so on). No interface code is located in the core of Mercator itself.

In the earlier implementation, these rules were hard-coded into the system in a stylized predicate/action notation expressed in C++. In the current implementation, all of the interface rules are expressed in an interpreted language which is parsed and executed as users interact with the application. The interpreted approach has the benefit that we can quickly experiment with new auditory interfaces without having to recompile the system. It also allows easy customization of interfaces by users and administrators.

The interpreted language is based on TCL (the Tool Command Language [Ous90]), with extensions specific to Mercator. TCL is a light-weight language complete with data types such as lists and arrays, subroutines, and a variety of control flow primitives, so Mercator rules have available to them all of the power of a general-purpose programming language. Table 2 presents some of the Mercator-specific extensions to TCL.
TABLE 2. Mercator Language Extensions

currentobject: Get or set the current object.
currentclient: Get or set the current client.
callaction: Fires the named action procedure.
playsound: Plays a sound, allowing control over volume, muffling, rate, etc.
addaction: Make the named action procedure callable from C++.
setfocus: Moves the pointer and sets the focus to the named object.
button: Synthesize either a button press or release from the mouse.
speak: Send a string to the speech synthesizer. Control over voice and interruptibility is provided.
sreader: Invoke screen reader functions on the specified object (word, line, paragraph reading, etc.).
widget: Retrieve information from the data store about widget hierarchy and state.
bindkey: Shortcut for bindevent which associates the named action procedure with a keypress.
bindevent: Associates an action with an event type.
key: Synthesize a key press or key release event to the object which currently has the focus.
resource: Get or set the value of a resource on the specified object.

When Mercator is first started, a base set of rules is loaded which provides some simple key-bindings and the basic navigation paradigm. Each time a new application is started, Mercator detects the presence of the application, retrieves its name, and will load an application-specific rule file if it exists. This allows an administrator or user to configure an interface for a particular application according to their desires.

Event/Action Model

After start-up time, rules are fired in response to Mercator events. Mercator events represent either user input or a change in state of the application (as represented by a change in the interface model). Thus, we use a traditional event-processing structure, but extend the notion of the event to represent not just user-generated events, but also application-generated events. Events are bound to actions, which are interpreted procedures fired automatically whenever a particular event type occurs. Action lists are maintained at all levels of the storage hierarchy, so it is possible to change event-action bindings globally, on a per-client basis, or on a per-widget basis.

As stated before, actions are fired due to either user input or a change in the state of the application. In the second case, we fire actions at the point the data model is changed, which ensures that application-generated actions are uniformly fired whenever Mercator is aware of the change. The call-out to actions occurs automatically whenever the data store is updated. This technique is reminiscent of access-oriented programming systems, in which changing a system variable causes some code to be run [Ste86].
Here is an example of an extremely simple action. This action is defined as a TCL procedure with four arguments: the name of the application, its class, the initial current location within that application, and an ID token which can be used to programmatically refer to the application. When the action fires, speech output is generated to inform the user of the presence of the new application, and the user's context is changed to the new application.
proc NewApplication {name class loc id} {
    speak "Application $name has started."
    currentclient $id
    speak "Current location is now $loc."
}
This action procedure is first made visible to the C++ side of Mercator through a call to addaction. Addaction is a language extension we have added which "publishes" the name of a TCL procedure so that it may be called from compiled code. After this, bindevent is called to bind the action procedure NewApplication with the event type (also called NewApplication) which is generated whenever a new application is started:
addaction NewApplication
bindevent NewApplication NewApplication
Bindings can be changed at any time, and rules can themselves change the event-to-action association.
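For instance, an application-specific rule file might both add a convenience key and rebind an event at run time. In this sketch the key name, sound file and action bodies are hypothetical, built only from the extensions in Table 2.

# Hedged sketch of an application-specific rule file, loaded when
# Mercator detects that the application has started.
proc ReadMessageLine {} {
    sreader line [currentobject]
}
addaction ReadMessageLine
bindkey F2 ReadMessageLine

# Swap in a terser handler; rebinding replaces the old association.
proc QuietValueChanged {client id} {
    playsound tick.au
}
addaction QuietValueChanged
bindevent ValueChanged QuietValueChanged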
Output
All output to the user is generated through the interface rules. The "hard-coded" portions of Mercator do not implement any interface. This reliance on interpreted code to implement the interface makes it easy to experiment with new interface paradigms.
Interface rules generate output by invoking methods on the various output objects in the system. Currently we support both speech and non-speech auditory output, and we are beginning to experiment with tactile output. The Speech object provides a "front-end" to a speech server which can be run on any machine on the network. This server is capable of converting text to speech using a number of user-definable voices.
The Audio object provides a similar front-end to a non-speech audio server. The non-speech audio server is capable of mixing, filtering, and spatializing sound, in addition to a number of other effects.
Both the Speech and Audio objects are interruptible, which is a requirement in a highly interactive environment.
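Interface rules reach this interruptibility through the speak extension (Table 2 notes control over voice and interruptibility). The exact option syntax is not reproduced here, so in the following hypothetical sketch -interrupt and the name resource are illustrative stand-ins.

# Hedged sketch: on movement, cut off any speech in progress and
# announce the newly focused object.
proc AnnounceObject {id} {
    speak -interrupt [resource $id name]
}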
The speech server supports the DECtalk hardware and the Centigram software-based text-to-speech system, and provides multiple user-defined voices. The non-speech audio server controls access to the built-in audio hardware and provides prioritized access, on-the-fly mixing, spatialization of multiple sound sources, room acoustics, and several filters. In our previous release, the non-speech audio server ran only on DSP-equipped workstations (either a NeXT machine or a Sun SPARCstation equipped with an Ariel DSP board). The current system will run on any Sun SPARCstation, although a SPARCstation 10 or better is required for spatialization [Bur92].

We are undertaking a commercialization effort to bring our work to the potential users of such a system.

FUTURE ISSUES

There are several new directions we wish to pursue with both the Mercator interface and the Mercator implementation. Our current implementation of the Mercator core is single-threaded. While the various I/O servers are implemented as separate heavy-weight processes, the actual application manager itself consists of only one thread of control. This can create problems in the case of, for example, non-robust interpreted code. Any interpreted code which loops indefinitely will effectively "hang" the system. We believe that a multithreaded approach will provide more modularity, robustness, and performance to the system.

We have also begun to experiment with voice input to the system. We are using the IN3 Voice Control System, from Command Corp, which is a software-only speech recognition system for Sun SPARCstations. Recognized words from the voice input system are passed into the rules engine just like any other events: keypresses, mouse button presses, and so on. We are investigating high-level abstractions for input and output so that users can easily select which I/O media they wish to use on the fly. Essentially these new abstractions would add a level of indirection between the low-level hardware-generated events and the tokens which the rules engine uses to fire action procedures.

FIGURE. Stages of input handling: a spoken command ("Select Print!") fires an action in the Rules Engine, which synthesizes a MouseButton1Down event to the application.

ACKNOWLEDGEMENTS

This work has been funded by Sun Microsystems and the NASA Marshall Space Flight Center. We are grateful to them for their generous support.

REFERENCES

[BBV90] L.H. Boyd, W.L. Boyd, and G.C. Vanderheiden. The graphical user interface: Crisis, danger and opportunity. Journal of Visual Impairment and Blindness, pages 496-502, December 1990.

[Bur92] David Burgess. Low cost sound spatialization. In UIST '92: The Fifth Annual Symposium on User Interface Software and Technology, November 1992.

[Bux86] William Buxton. Human interface design and the handicapped user. In CHI '86 Conference Proceedings, pages 291-297, 1986.

[Gav] William W. Gaver. The SonicFinder: An interface that uses auditory icons. Human-Computer Interaction, 4:67-94, 1989.

[LC91] Lester F. Ludwig and Michael Cohen. Multidimensional audio window management. International Journal of Man-Machine Studies, 34(3):319-336, March 1991.

[My94] E.D. Mynatt. Mapping GUIs to auditory interfaces. In G. Kramer (ed.), Auditory Display: The Proceedings of ICAD '92, SFI Studies in the Sciences of Complexity, Vol. XVIII, Addison-Wesley, April 1994.

[ME92] Elizabeth Mynatt and W. Keith Edwards. Mapping GUIs to auditory interfaces. In UIST '92: The Fifth Annual Symposium on User Interface Software and Technology Conference Proceedings, November 1992.

[MW94] Elizabeth Mynatt and Gerhard Weber. Nonvisual presentation of graphical user interfaces: Contrasting two approaches. In Proceedings of the ACM Conference on Human Factors in Computing Systems, 1994.

[Ous90] J.K. Ousterhout. TCL: An embeddable command language. In Proceedings of the 1990 Winter USENIX Conference, pages 133-146.

[Pet91] Chris D. Peterson. Editres--a graphical resource editor for X Toolkit applications. In Conference Proceedings, Fifth Annual X Technical Conference, Boston, Massachusetts, January 1991.

[Sch87] Robert W. Scheifler. X Window System protocol specification, version 11. Massachusetts Institute of Technology, Cambridge, Massachusetts, and Digital Equipment Corporation, Maynard, Massachusetts, 1987.

[Ste86] M.J. Stefik, D.G. Bobrow, and K.M. Kahn. Integrating access-oriented programming into a multiparadigm environment. IEEE Software, 3(1):10-18, January 1986.