Screen readers enable the web to be browsed by people who can't see a visual display.  They convert content that is often designed to be viewed visually into either sound or refreshable Braille.
Images can pose a problem because they are inherently visual but often convey important information.  On the web, the solution standardized by the World Wide Web Consortium is to provide alternative text for images that can be read in place of the image.  If there is no alternative text, the screen reader can do several different things depending on the particular screen-reading product and the user's settings.  Options include skipping the image, reading the image filename, or, if the image is a link to another page, reading the URL of the linked page.
Thank you.  My name is Jeffrey P. Bigham, and today I'll be presenting work from the WebInSight project on making web images accessible, which is joint work with my advisor Richard Ladner and others at the University of Washington Computer Science & Engineering Department.
In this talk I first hope to convince you that images lacking alternative text are pervasive on the web.  Then I'll describe the WebInSight system, which calculates alternative text for web images and inserts it on the fly as users browse the web.
The WebInSight system is able to provide alternative text for many images that don’t have it already.
Now that we've seen the background that frames the problem, I'll discuss the web studies we conducted to show that the lack of alternative text is pervasive on the web.
Web designers aren’t generally malicious and they usually don’t actively avoid assigning alternative text or purposely assign incorrect alternative text.  Assigning alternative text is often a subjective process and many web designers simply don’t know what alternative text to provide.  Because the majority of web designers are not blind, they don’t see the immediate reward for providing alternative text.
Complicating matters, some images need alternative text and some images don’t.  If an image conveys information to a sighted user then it should be assigned alternative text that conveys that information to a blind user.  If an image does something when you click on it, then it needs alternative text.
Conversely, if its purpose is inherently visual, that is, if it's used for decorative or layout purposes, then it should be assigned zero-length alternative text.  Simply omitting the alt attribute for such images is not good because the screen reader does not know the significance of the image and will announce it anyway.  A common example of this is images used for spacing on web pages.  Often these images appear tens or hundreds of times on a single page, so forgetting to assign zero-length alternative text may cause the user to hear "spacer.gif" repeated many times on that page.
Using this subjective notion of significance as a guide, we developed an automatic test that correctly determines significance in nearly all cases.
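These notes don't spell out the exact rules of that test, but a minimal sketch of the kind of heuristic it might use, based on the criteria just described (linked images matter; tiny spacer images don't), could look like the following.  The size thresholds and filename hints are illustrative assumptions, not the published test.

```python
def is_significant(width, height, is_link, filename):
    """Rough guess at whether an image conveys information.

    Illustrative sketch only: the size thresholds and filename hints are
    assumptions, not the exact rules used in the WebInSight study.
    """
    # Images that act as links almost always need alternative text.
    if is_link:
        return True
    # Tiny images (spacers, rules, bullets) are usually decorative layout aids.
    if width < 10 or height < 10:
        return False
    # Common spacer naming conventions suggest a purely decorative image.
    if any(hint in filename.lower() for hint in ("spacer", "pixel", "blank")):
        return False
    # Otherwise assume a reasonably large image conveys information.
    return width * height >= 1000
```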
First, some previous studies have shown that fewer than half of web images are assigned alternative text.
Some look only at the percentage of images that have an alt attribute defined and ignore significance.
A study conducted by Petrie et al. manually labeled significance, but only evaluated 100 manually-selected web pages.
By calculating significance automatically, we were able to consider many more web pages and to explicitly consider the popularity and importance of those pages.  What matters isn't just how many images on the web have alternative text, but how many of the images that a user actually views have it.
In our first study, we examined all web traffic generated by members of the CSE department at the University of Washington.  We counted images viewed multiple times as separate images, in effect weighting the contribution of each image by its popularity.  From this we saw that 40.8% of the approximately 12,000,000 images viewed in a week were significant; of these, 63.2% were assigned alternative text.
We also considered five important groups of web sites and measured the percentage of significant images that were labeled.  These included the 500 highest-traffic websites, 136 computer science department homepages, and the top 100 international universities.  We also considered 137 federal agencies and the 50 U.S. states plus D.C.; we saw these as an important group because both are compelled by Section 508 to provide accessible information, so they show what effect legislation can have.
To help ameliorate this problem, we developed a system that automatically labels many images and inserts these labels into web pages on the fly.  The obvious question is: where do the labels come from?  We present two fully automatic methods and one method that leverages recent work that gets humans to label images.
The first method is context labeling.  The insight here is that many important images are links, and the pages they link to often serve as a source of good alternative text for those images.  In this labeling method we simply follow image links and use important fields, such as the title or heading tags of the linked page, to construct labels.
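As a rough illustration of context labeling, the sketch below fetches the page an image links to and uses its title or first heading as candidate alternative text.  The choice of libraries (requests, BeautifulSoup) and the exact fields consulted are assumptions for this example, not necessarily the precise fields WebInSight uses.

```python
import requests
from bs4 import BeautifulSoup

def context_label(link_url, timeout=5):
    """Derive candidate alternative text for a linked image from its target page."""
    try:
        response = requests.get(link_url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException:
        return None
    page = BeautifulSoup(response.text, "html.parser")
    # Prefer the page title; fall back to the first top-level heading.
    if page.title and page.title.string and page.title.string.strip():
        return page.title.string.strip()
    heading = page.find(["h1", "h2"])
    if heading and heading.get_text(strip=True):
        return heading.get_text(strip=True)
    return None
```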
The second method is OCR labeling.  Many images on the web contain text, but screen readers cannot directly use it in its graphic form.  Optical character recognition works well when applied to black text on white backgrounds scanned at a known resolution, but it can have problems with web images.  To label images using OCR, we first preprocess the image to convert it to a resolution and format that the OCR engine prefers.  Next, following a process similar to that developed by Jain et al., we convert it into several black-and-white versions by highlighting major color groups.  The OCR engine can handle these black-and-white images much more easily.
The example here shows that for this image, OCR on the original produces no text.  Our method produces six new black-and-white images.  The OCR produces no output for four of the six, garbage text for one, and correct text for the last.  We verify the output of the OCR using a web-based spelling checker.
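A minimal sketch of this pipeline, assuming Pillow for the color separation, pytesseract for recognition, and a local word list standing in for the web-based spelling checker, might look like this.  The palette size and acceptance threshold are illustrative choices rather than the system's actual parameters.

```python
from PIL import Image
import pytesseract

# A local word list stands in for the web-based spelling checker described above.
try:
    with open("/usr/share/dict/words") as words_file:
        ENGLISH_WORDS = {line.strip().lower() for line in words_file}
except OSError:
    ENGLISH_WORDS = set()

def looks_like_real_text(text, min_ratio=0.8):
    """Accept OCR output only if most of its tokens appear in the dictionary."""
    tokens = [w.strip(".,:;!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w]
    if not tokens or not ENGLISH_WORDS:
        return False
    hits = sum(1 for w in tokens if w in ENGLISH_WORDS)
    return hits / len(tokens) >= min_ratio

def ocr_label(path, colors=6):
    """Split an image into black & white layers, one per dominant color,
    and return the first OCR result that passes the dictionary check."""
    image = Image.open(path).convert("RGB")
    quantized = image.quantize(colors=colors)   # reduce to a small palette
    indices = list(quantized.getdata())         # palette index for each pixel
    for color_index in range(colors):
        # Pixels of this dominant color become black "text" on a white page.
        layer = Image.new("L", image.size, 255)
        layer.putdata([0 if p == color_index else 255 for p in indices])
        text = pytesseract.image_to_string(layer).strip()
        if text and looks_like_real_text(text):
            return text
    return None
```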
The final method of labeling images is motivated by the fact that our automatic methods don't apply to all images, and providing alternative text for arbitrary images is currently beyond the state of the art.  However, using the recent popular games originating from Luis von Ahn et al. at CMU, we can get humans to label such images.  These games provide a convenient mechanism for collecting labels quickly and for helping to ensure that the labels provided are correct.
Now that you’ve heard the motivation for why we want to add labels to web images without them and a little about how we formulate those labels, I’m going to briefly discuss the WebInSight system. 
The WebInSight system dynamically inserts alternative text into web pages and coordinates the labeling sources that I just described.  Our main concerns are that it not harm the user experience and that it maintain the security and privacy expectations of the user.
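The core insertion step can be sketched as a simple HTML rewrite.  Here label_for_image is a hypothetical stand-in for the combined labeling sources (context, OCR, and human labeling), and BeautifulSoup is an assumed choice of parser rather than the system's actual implementation.

```python
from bs4 import BeautifulSoup

def insert_alternative_text(html, label_for_image):
    """Add computed alternative text to images that have none.

    `label_for_image` maps an image URL to a label string or None; it is a
    hypothetical placeholder for the labeling sources described earlier.
    """
    page = BeautifulSoup(html, "html.parser")
    for img in page.find_all("img"):
        if img.get("alt") is not None:
            continue  # never override alternative text the author provided
        label = label_for_image(img.get("src", ""))
        if label:
            img["alt"] = label
    return str(page)
```

In the running system this kind of rewriting happens as pages are fetched, so users receive the labels without having to request them explicitly.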
The important part of any system like this is how well it works.  We think it does a good job.
To measure the performance of our system, we used it to attempt to label the images contained on the pages in our web studies.  These tests were done completely automatically and, therefore, used only the context and OCR labeling methods.
The system was able to label 43.2% of the unlabeled significant images, and through manual evaluation we determined that it provided correct labels 94.1% of the time.  Therefore, the recall of the system is approximately 40% (43.2% × 94.1% ≈ 40.7%), but the precision is quite high at nearly 95%.  Although we'd like the recall to be higher, we're encouraged that we are able to provide alternative text for so many images that need it but lack it, and to do so with very high precision.
If we return to the homepage of the Yale Alumni Association, you’ll see that the system was able to correctly label most of the important images on the site.  Here it has relied mainly on the labels produced by the context labeler and has provided valuable alternative text for 18 of the 21 images on this page that link to another page.
In conclusion, we've presented a system that dynamically provides alternative text for images on the web as users browse, increasing web accessibility.
While the current WebInSight System involves only a client interface, our goal is to attack inaccessibility with a dual approach:  tools for users that help them alter their environment to increase web accessibility and tools for content producers that help them create more accessible content.  Our belief is that many of the techniques for users can be adapted to help producers.
While we've consulted with users of screen readers from the beginning, we're currently designing user studies that will attempt to measure what users want out of a system like ours, how we can provide it, and how all of this can remain consistent with the expected user experience.
At the same time, we’re always trying to improve our system with better labeling modules.  We’re currently working on a version of our system that is implemented as a Firefox extension to allow users increased privacy and security.  Finally, we’re exploring ways to move beyond images to content restructuring and handling dynamic content.
We believe that on the content-production side of accessibility, the primary concern remains understanding the user: what forces create inaccessibility, and how can we use information about the motivations of web designers to create tools that help them make more accessible content?  For the foreseeable future, people will be better at creating accessible content than our automatic methods, but they need tools that better support this.  This is an incredibly important time for developing information accessibility tools.  Even as the Internet settles into its vital role in society, we're still dealing with the old struggles of images without alternative text and content that is structured inconveniently for accessibility tools.  The web is currently undergoing another revolution that will bring dynamic content and new web applications, both of which pose even harder problems for accessibility.  I hope we will be there to meet the challenge.
I encourage you to visit our website at http://webinsight.cs.washington.edu/ for more information about this and other WebInSight projects.
Thank you.