Accessing Web Based Documents Through a Tree Structural Interface

Esmond Walshe

Barry McMullin

eAccessibility Lab

Research Institute for Networks and Communications Engineering (RINCE)

Preprint: July 2004
Presented at ICCHP 2004
Final version to appear in Proceedings of ICCHP 2004, Springer Lecture Notes in Computer Science (LNCS).

Abstract

In addition to the intrinsic accessibility difficulties that a graphical user interface poses for blind web users, specific usability penalties arise from the serial nature of blind-adapted interfaces (speech, braille). We propose an approach to the design of a browser tailored for blind users which may mitigate some of these difficulties. We suggest this may be achieved through maximal exploitation of structural HTML mark-up to support highly dynamic and interactive user control of content rendering.

Introduction

The graphical user interface (GUI) has become the predominant mode of human-computer interaction in the last decade. The ability to point and click on a screen element to perform simple tasks removes the need for the (visually enabled) user to learn numerous commands to achieve similar effects. However, adapting such interfaces for blind users poses many difficulties (Barnicle, 1999). So-called "screen readers" attempt to lexically analyse the primary visual interface structure to determine relationships between objects, and thereby generate a secondary synthesized speech or braille output. Even so, inherently spatial information is difficult to convey efficiently in a non-visual manner. Hence the incorporation of a primary non-visual interface has been advocated (Savidis & Stephanidis, 1998).

This paper proposes a primary non-visual (speech-based) interface for a web browsing application. This is an alternative to current approaches which layer a secondary screen reader interface on top of a generic, primarily GUI-based, browser. We are interested in the user efficiency of primary, non-visual, interfaces. Specifically, if a browser were designed from the bottom up, with a primary optimization for blind users, what would its interface characteristics be? How would they differ from those of present solutions?

It is well known that effective web accessibility for users with disabilities depends on a combination of careful server-side content design and tailored client-side browser technology. Unfortunately, relatively few sites currently deliver adequately accessible content (McMullin, 2002). However, improved training and education of web developers and authors, together with a variety of legal measures, is expected to lead to progressive improvement in this situation. In anticipation of this, we assume here the availability of fully accessible web content and consider the design of a web browser for blind users which would maximally exploit the accessibility features of such content.

Current Techniques

Human vision supports highly efficient scanning and integration of complex, multi-dimensional, scenes and images. By contrast, audio perceptual channels (such as speech) are one-dimensional or serial in nature, and intrinsically less efficient. This imposes a significant performance penalty on access to electronic information for blind users. A variety of ad hoc techniques have emerged in hybrid GUI/screen reader systems which attempt to mitigate this.

One such approach is to supply additional navigational functions based on direct analysis of the structural mark-up of the page (rather than its visual presentation). For example, JAWS for Windows in conjunction with Microsoft Internet Explorer (we'll term this platform JAWS-IE) allows the user to jump directly from the current position to the next header element, next hyperlink element, or next interactive control in an HTML page's underlying structure. Another possibility is to generate alternative views of the page alongside the main page view; again, JAWS-IE allows the user to generate a list of the hyperlink elements contained in the page.
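The "jump to next header / hyperlink / control" commands amount to a forward search, in document order, over the parsed mark-up. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming well-formed XHTML; the element groupings and the helper names `next_of_kind` and `local_name` are hypothetical choices for illustration, not the JAWS-IE implementation.

```python
import xml.etree.ElementTree as ET

# Illustrative element groups for "jump to next ..." commands (hypothetical).
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
LINKS = {"a"}
CONTROLS = {"input", "select", "textarea", "button"}


def local_name(tag):
    """Strip an XML namespace: '{http://www.w3.org/1999/xhtml}h1' -> 'h1'."""
    return tag.rsplit("}", 1)[-1]


def next_of_kind(root, current, kinds):
    """Hypothetical helper: return the first element after `current` (in
    document order) whose tag is in `kinds`; start from the top if `current`
    is None. Returns None when there is no further match."""
    passed_current = current is None
    for el in root.iter():  # iter() yields elements in document (pre-)order
        if passed_current and local_name(el.tag) in kinds:
            return el
        if el is current:
            passed_current = True
    return None


page = ET.fromstring(
    "<html><body><h1>News</h1><p>See <a href='/a'>A</a></p>"
    "<h2>Sport</h2><p><a href='/b'>B</a></p></body></html>")
first = next_of_kind(page, None, HEADINGS)
print(first.text, next_of_kind(page, first, HEADINGS).text)  # News Sport
```

Because the search runs over the structure rather than the visual layout, the same function serves every "next element of type X" command by changing only the `kinds` set.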

Another suggested measure is to analyse the content to create an abridged version automatically, for example by extracting the sentences containing the most common word trigrams and presenting them as a page summary (Zajicek et al., 1998). The effectiveness of such an approach will necessarily vary with the content being analysed.
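As an illustration of this kind of trigram-based abridgement, the sketch below counts word trigrams over the whole text and keeps, in their original order, the sentences containing the most frequent ones. It is a simplified reading of the idea in Python, not the algorithm of Zajicek et al.; the naive sentence splitter, the tokeniser and the `top_n` parameter are assumptions.

```python
import re
from collections import Counter


def sentences(text):
    """Naive sentence split on ., ! or ? followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_trigrams(sentence):
    """Lower-cased word trigrams; sentences shorter than 3 words yield none."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def trigram_summary(text, top_n=1):
    """Keep sentences containing any of the `top_n` most frequent trigrams
    (ties are broken by first occurrence, as Counter.most_common does)."""
    sents = sentences(text)
    counts = Counter(t for s in sents for t in word_trigrams(s))
    top = {t for t, _ in counts.most_common(top_n)}
    return [s for s in sents if top & set(word_trigrams(s))]


text = ("The screen reader reads the page. Users find the screen reader slow. "
        "Maps are visual.")
print(trigram_summary(text))
# ['The screen reader reads the page.', 'Users find the screen reader slow.']
```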

All of these forms of interaction can significantly improve efficiency for blind users. However, in current implementations such functionality is ad hoc rather than systematic. Further, the functions may not be well integrated (e.g. co-ordination between multiple "alternative" views is generally complicated).

Proposed Approach

Instead of statically exposing the entire web page to the user in a linearised form, and/or providing a fixed set of alternative partial views, we propose a single integrated page view which can be dynamically and interactively varied based on the page structure. Specifically, the user would be able to systematically and dynamically hide or expose aspects of the page structure and content. We conjecture that this will allow the user to easily vary the granularity or level of page detail being exposed, and thus to alternate efficiently between scanning and drilling down at any given level of the page's hierarchical structure.

Tree Structure

XHTML-based pages are naturally and necessarily structured as a hierarchical tree, with the higher-level nodes taking the form of container elements and the lowest-level nodes containing the actual textual content.
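To make the container/leaf distinction concrete, the following Python sketch parses a small fragment with `xml.etree.ElementTree` and lists every element with its depth and any text it directly contains. It assumes well-formed, un-namespaced mark-up, and the `outline` helper is hypothetical. Text following a child element (ElementTree's `tail`) is deliberately ignored, since it belongs to inline content rather than to the tree of containers.

```python
import xml.etree.ElementTree as ET


def outline(el, depth=0):
    """Hypothetical helper: yield (depth, tag, direct_text) for `el` and its
    descendants in document order. Only text before the first child is
    reported; tail text is ignored in this sketch."""
    yield depth, el.tag, (el.text or "").strip()
    for child in el:
        yield from outline(child, depth + 1)


doc = ET.fromstring(
    "<body><div><h1>Results</h1><p>Two items found.</p></div></body>")
for depth, tag, text in outline(doc):
    print("  " * depth + tag + (": " + text if text else ""))
# body
#   div
#     h1: Results
#     p: Two items found.
```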

We suggest that the facility to navigate this tree-like structure directly should give the user a powerful method for establishing a mental model of the page structure and interacting with it. (The W3C's prototype editor/browser Amaya provides an example of such tree-oriented rendering, though only in the form of a distinct, alternative page view.)
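Direct tree navigation needs little more than four moves: to the parent, to the first child, and to the next or previous sibling. A minimal Python sketch over `xml.etree.ElementTree` follows, assuming un-namespaced mark-up; the `TreeCursor` class and its method names are hypothetical. Because ElementTree elements do not record their parent, the sketch builds a child-to-parent map once up front.

```python
import xml.etree.ElementTree as ET


class TreeCursor:
    """Hypothetical cursor that moves through a parsed page by structure.
    Each move returns the new current node; an impossible move stays put."""

    def __init__(self, root):
        self.node = root
        # ElementTree has no parent pointers, so map child -> parent once.
        self.parent = {c: p for p in root.iter() for c in p}

    def _siblings(self):
        p = self.parent.get(self.node)
        return list(p) if p is not None else [self.node]

    def up(self):
        self.node = self.parent.get(self.node, self.node)
        return self.node

    def down(self):
        if len(self.node):  # len() of an element is its number of children
            self.node = self.node[0]
        return self.node

    def next(self):
        sibs = self._siblings()
        i = sibs.index(self.node)
        if i + 1 < len(sibs):
            self.node = sibs[i + 1]
        return self.node

    def prev(self):
        sibs = self._siblings()
        i = sibs.index(self.node)
        if i > 0:
            self.node = sibs[i - 1]
        return self.node


doc = ET.fromstring("<body><h1>A</h1><p>B</p><p>C</p></body>")
cur = TreeCursor(doc)
print(cur.down().text, cur.next().text, cur.next().text, cur.up().tag)
# A B C body
```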

Coupled with this tree-based navigation, we propose the ability to expand or collapse any node in the structure. Expanding the lowest level tree nodes will result in their textual content being exposed. Thus, both the tree structure and the content will be rendered within one single, integrated, but dynamic page view.
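One way to realise a single integrated view is to keep a set of expanded nodes and linearise the tree against it: a collapsed node is announced only by its element name, while an expanded node exposes its own text and its children. The Python sketch below assumes un-namespaced mark-up and ignores inline and tail text for brevity; `render` and the bracketed announcement format are hypothetical, standing in for what would be spoken output.

```python
import xml.etree.ElementTree as ET


def render(el, expanded, depth=0):
    """Hypothetical linearisation of the page view. `expanded` is the set of
    expanded elements; a collapsed element contributes one structural line
    and nothing else (inline/tail text is ignored in this sketch)."""
    pad = "  " * depth
    if el not in expanded:
        return [f"{pad}[{el.tag}, collapsed]"]
    lines = [f"{pad}[{el.tag}]"]
    text = (el.text or "").strip()
    if text:
        lines.append(f"{pad}  {text}")
    for child in el:
        lines.extend(render(child, expanded, depth + 1))
    return lines


doc = ET.fromstring("<body><h1>Title</h1><p>Body text.</p></body>")
h1, p = doc[0], doc[1]
print("\n".join(render(doc, {doc, h1})))
# [body]
#   [h1]
#     Title
#   [p, collapsed]
```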

Not all XHTML elements are equally useful for tree based navigation; we propose that tree navigation should be limited to the XHTML block (as opposed to inline) elements.
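The restriction to block elements can be sketched as a filter on children: only block-level children become navigable tree nodes, while inline elements such as `a` or `em` are flattened into their parent's text. The tag set below is an illustrative subset drawn from XHTML Basic, the helper names are hypothetical, and un-namespaced mark-up is assumed.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of XHTML Basic elements treated as tree nodes.
BLOCK = {"body", "div", "p", "h1", "h2", "h3", "h4", "h5", "h6",
         "ul", "ol", "li", "dl", "dt", "dd", "table", "caption", "tr",
         "td", "th", "blockquote", "pre", "address", "form"}


def block_children(el):
    """Hypothetical helper: children the user can navigate to as nodes."""
    return [c for c in el if c.tag in BLOCK]


def flattened_text(el):
    """All text under `el` with inline mark-up removed and whitespace
    normalised; intended for leaf blocks that have no block children."""
    return " ".join("".join(el.itertext()).split())


para = ET.fromstring(
    "<p>Read the <a href='/r'>full <em>report</em></a> now.</p>")
print(block_children(para), flattened_text(para))
# [] Read the full report now.
```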

The page view will thus consist of a combination of tree-structure element controls and the textual content of expanded nodes. Prosodic cues and auditory earcons will be used to distinguish block-level tree-structure information from inline content, and to indicate the structure of inline elements. However, to avoid adding unnecessary complexity to the user interface, the use of such prosodic and auditory cues must be carefully controlled (James, 1998): the more cues are used, the steeper the learning curve placed on the user is likely to be.

Dynamic Rendering

The ability to expand or collapse all elements at a given level in the tree, or all elements in a given sub-tree, is an important part of the proposed functionality. In this manner, a user can focus on a specific section whilst ignoring the content in the rest of the page.
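Assuming the view state is held as a set of expanded elements (a hypothetical representation), the level-wide and subtree-wide commands reduce to simple set operations. In the Python sketch below, expanding a level also expands the levels above it so that the newly exposed nodes are reachable; that choice, and all function names, are ours.

```python
import xml.etree.ElementTree as ET


def nodes_at_level(root, level):
    """Elements exactly `level` steps below `root` (the root is level 0)."""
    frontier = [root]
    for _ in range(level):
        frontier = [child for node in frontier for child in node]
    return frontier


def expand_level(expanded, root, level):
    """Hypothetical command: expand every node at `level`, plus all above."""
    for depth in range(level + 1):
        expanded.update(nodes_at_level(root, depth))


def collapse_level(expanded, root, level):
    """Collapse every node at `level`; descendants keep their own state, so
    re-expanding restores them."""
    expanded.difference_update(nodes_at_level(root, level))


def expand_subtree(expanded, node):
    expanded.update(node.iter())  # iter() includes `node` itself


def collapse_subtree(expanded, node):
    expanded.difference_update(node.iter())


doc = ET.fromstring(
    "<body><div><p>a</p></div><div><p>b</p><p>c</p></div></body>")
expanded = set()
expand_level(expanded, doc, 1)    # body and both divs
expand_subtree(expanded, doc[1])  # plus the second div's paragraphs
print(sorted(el.tag for el in expanded))
# ['body', 'div', 'div', 'p', 'p']
```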

In addition to this direct, tree-based expand and collapse functionality, we intend to provide the facility to dynamically hide or display specific element types. This would give the user a systematic way to constrain the page view according to criteria of their own: for example, a view consisting purely of header elements, or one containing only the page's emphasised text. As this would still act through the one single dynamic page view, there should be no problems with synchronization between views.
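A type-based view can be expressed as a filter applied to the same parsed tree, which is why no second model (and hence no synchronisation between views) is needed. A minimal Python sketch, assuming un-namespaced mark-up; `filtered_view` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def filtered_view(root, visible_tags):
    """Hypothetical helper: (tag, text) for each element whose type the user
    chose to show, in document order; other elements are hidden only from
    this view, not removed from the tree."""
    return [(el.tag, " ".join("".join(el.itertext()).split()))
            for el in root.iter() if el.tag in visible_tags]


doc = ET.fromstring(
    "<body><h1>Intro</h1><p>Some <em>key</em> text.</p>"
    "<h2>Method</h2><p>More <em>detail</em>.</p></body>")
print(filtered_view(doc, {"h1", "h2"}))  # headings-only view
print(filtered_view(doc, {"em"}))        # emphasised-text view
# [('h1', 'Intro'), ('h2', 'Method')]
# [('em', 'key'), ('em', 'detail')]
```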

Multidimensional Elements

While the overall structure of an XHTML page is a tree, certain elements introduce non-hierarchical relationships. These relationships are typically rendered spatially in visual media. Examples are table and form elements. These have generally posed particular obstacles for blind users.

In the case of tables, it is clearly necessary to be able to navigate directly in two dimensions: up and down through a column, and left or right through a row of cells (cf. the JAWS "table mode"). In addition, we propose the ability to dynamically hide or expose table content by row or column. (As we are restricting our scope to XHTML Basic, the additional complexity of nested tables will not arise.)
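Two-dimensional table navigation, together with hiding by row or column, can be sketched as a cursor over a grid of cell texts. The Python sketch below assumes a simple, un-namespaced table with no row or column spans (and, per the XHTML Basic restriction, no nested tables); the `TableCursor` class and its attributes are hypothetical.

```python
import xml.etree.ElementTree as ET


class TableCursor:
    """Hypothetical 2-D cursor over a simple table (no rowspan/colspan)."""

    def __init__(self, table):
        # iter("tr") also finds rows inside <thead>/<tbody>; cells are the
        # <td>/<th> children of each row, reduced to normalised text.
        self.grid = [[" ".join("".join(cell.itertext()).split())
                      for cell in tr] for tr in table.iter("tr")]
        self.row = self.col = 0
        self.hidden_rows, self.hidden_cols = set(), set()

    def cell(self):
        return self.grid[self.row][self.col]

    def move(self, drow, dcol):
        """Step one visible cell in a direction, e.g. (0, 1) = right,
        (1, 0) = down. Hidden rows/columns are skipped; at the edge of
        the table the cursor stays where it is."""
        if drow == 0 and dcol == 0:
            return self.cell()
        r, c = self.row + drow, self.col + dcol
        while 0 <= r < len(self.grid) and 0 <= c < len(self.grid[r]):
            if r not in self.hidden_rows and c not in self.hidden_cols:
                self.row, self.col = r, c
                break
            r, c = r + drow, c + dcol
        return self.cell()

    def visible_row(self):
        return [v for i, v in enumerate(self.grid[self.row])
                if i not in self.hidden_cols]

    def visible_column(self):
        return [row[self.col] for i, row in enumerate(self.grid)
                if i not in self.hidden_rows and self.col < len(row)]


table = ET.fromstring(
    "<table><tr><th>Name</th><th>Age</th><th>Town</th></tr>"
    "<tr><td>Ann</td><td>34</td><td>Cork</td></tr></table>")
tc = TableCursor(table)
tc.hidden_cols.add(1)                # hide the "Age" column
print(tc.move(0, 1), tc.move(1, 0))  # Town Cork
print(tc.visible_row())              # ['Ann', 'Cork']
```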


Incremental search within a page is an important alternative navigation strategy. We propose a search mechanism which will interact automatically with the dynamic hiding and exposing of elements, and which will be able to search explicitly by structure as well as by raw element content: for example, constraining a search for text to hyperlink elements or table cells.
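A structure-constrained incremental search, and its interaction with collapsed content, might be sketched in Python as follows: `search` finds the next element of the chosen types whose text contains the query, and `path_to` returns the ancestors a view would need to expand so that the hit becomes exposed. Both names, the default set of text-bearing elements, un-namespaced mark-up and case-insensitive substring matching are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Default search scope: text-bearing blocks (hypothetical choice).
TEXT_BLOCKS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
               "li", "dt", "dd", "td", "th", "pre", "address"}


def search(root, query, within=TEXT_BLOCKS, start_after=None):
    """Next element after `start_after` (document order) whose tag is in
    `within` and whose full text contains `query`, ignoring case."""
    q = query.lower()
    passed = start_after is None
    for el in root.iter():
        if passed and el.tag in within and q in "".join(el.itertext()).lower():
            return el
        if el is start_after:
            passed = True
    return None


def path_to(root, target):
    """Ancestors of `target`, outermost first: the nodes to expand so that
    a match inside collapsed content is exposed in the view."""
    parent = {c: p for p in root.iter() for c in p}
    path = []
    while target in parent:
        target = parent[target]
        path.append(target)
    return path[::-1]


doc = ET.fromstring(
    "<body><div><p>Opening hours</p><p>See <a href='/m'>the map</a></p>"
    "</div><table><tr><td>Map ref</td></tr></table></body>")
hit = search(doc, "map", within={"a"})  # link text only
print(hit.text, [el.tag for el in path_to(doc, hit)])
# the map ['body', 'div', 'p']
```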

Conclusion

It is hoped that the approaches outlined here to web page navigation and content presentation will demonstrate a noticeable increase in efficiency for blind users. This should be particularly apparent when accessing larger and/or structurally complex pages. Prototype implementation is underway and user-testing will follow.

References

Barnicle, K. (1999), Evaluation of the interaction between users of screen reading technology and graphical user interface elements, PhD thesis, Graduate School of Arts and Sciences, New York.
James, F. (1998), 'Lessons from Developing Audio HTML Interfaces', in Proceedings of the Third International ACM Conference on Assistive Technologies.
McMullin, B. (2002), 'Users with Disability Need Not Apply? Web Accessibility in Ireland', First Monday 7(12).
Savidis, A. & Stephanidis, C. (1998), 'The HOMER UIMS for dual user interface development: Fusing visual and non-visual interactions', Interacting with Computers 11(2), 173-209.
Zajicek, M., Powell, C. & Reeves, C. (1998), 'Orientation of Blind Users on the World Wide Web', in M. Hanson, ed., Contemporary Ergonomics, Taylor and Francis, London.

Acknowledgements

The work described here received financial support from AIB PLC. The work was carried out in the Research Institute for Networks and Communications Engineering (RINCE), established at DCU under the Programme for Research in Third Level Institutions operated by the Irish Higher Education Authority.

About the Authors

Mr. Esmond Walshe is completing his Ph.D. research at the eAccessibility Lab of the Research Institute for Networks and Communications Engineering (RINCE) at Dublin City University (DCU).

Dr. Barry McMullin is a senior lecturer in the School of Electronic Engineering of Dublin City University (DCU), and directs the eAccessibility Lab.

Correspondence Email:


Springer Verlag. Final version to appear in Proceedings of ICCHP 2004, Springer Lecture Notes in Computer Science (LNCS), under Springer Copyright Conditions.
