Monday, April 7, 2008

Mono UIAutomation Team: Getting Specific

I have spent most of my time so far getting to know Accerciser and AT-SPI (see this previous post for more information on those). As a byproduct of studying and working with the Accerciser code, I have also become quite a bit more familiar with GTK+, which has been nice.

Sandy, Knocte, and Mike Gorse have all been pumping out code for the project over the last few weeks. We also interviewed a candidate for the team last week. By the end of the week, I realized I had better get a clear vision of our project and goals so I could assess the work that would need to be done from a QA perspective and perhaps start working on some relevant tools, patches, or scripts.

In addition to what I had learned from the Python Powered Accessibility article (I recommend reading it before you go on if you haven't already done so), this is what I knew about the project:


Granted, this might mean a lot to someone who has worked with accessibility on Linux before, but it didn't mean a whole lot to me; not enough to have a clear vision, anyway. I wanted a really clear picture, so I first turned to Wikipedia and the MSDN documentation for some fundamental knowledge and came up with the following definitions:

Managed Code - executes under the management of Microsoft's CLR virtual machine in the .NET framework, or another similar virtual machine.

CLR - common language runtime, virtual machine component of Microsoft's .NET initiative. The CLR runs a form of bytecode called CIL (common intermediate language).

CIL - previously known as MSIL. Stack-based object-oriented assembly language executed by a virtual machine (e.g., CLR)

Unmanaged Code - executed directly by the computer's CPU. The programming language used to create the program determines whether it will run as managed code or not. C and C++ are examples of languages that are typically compiled to unmanaged code.

CLI - Common Language Infrastructure, an open specification from Microsoft that describes the executable code and runtime environment forming the core of a number of runtimes, including Microsoft's .NET Framework, Mono, and Portable.NET.

Microsoft UI Automation - a managed code application programming interface (API) that exposes user interface controls for test automation and assistive technology. Part of the .NET Framework starting with version 3.0. Successor to MSAA (Microsoft Active Accessibility).

UIA Clients - applications such as screen readers and testing frameworks written in managed code (e.g., C#/VB).

UIA Providers - UI implementations or application controls such as checkboxes. Written in managed code or C/C++ (unmanaged code).

Olive - set of add-on libraries for the Mono core that bring some new .NET APIs to Mono.

UIA Core - the UIA core masks any differences in the frameworks that underlie various pieces of the UI. For example, the Content property of a WPF button, the Caption property of a Win32 button, and the ALT property of an HTML image are all mapped to a single property.

AT - assistive technology. A generic term that includes assistive, adaptive, and rehabilitative devices and the process used in selecting, locating, and using them.

AT-SPI - Assistive Technology Service Provider Interface, a toolkit-neutral way of providing accessibility facilities in applications. AT-SPI can also be used for automated testing of user interfaces. AT-SPI is currently supported by GTK+ 2, Java/Swing, Mozilla, and StarOffice/OpenOffice. In our architecture, AT-SPI will act as the equivalent of the UIA core.

ATK - Accessibility Toolkit. A developer toolkit that allows programmers to use common GNOME accessibility features in their applications.

ATK<->UIA Bridge - mapping of ATK to the UIA provider APIs.
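The UIA Core definition was the one I had to read twice, so here is a toy Python sketch of the "masking" idea. Three widgets from different frameworks each keep their label under a different property name, and one lookup table exposes all of them under a single framework-neutral name. Everything here is hypothetical and for illustration only: the classes, `_LABEL_PROPERTY`, and `normalize_name` are stand-ins and not the real UIA API. As far as I understand, real UIA calls the unified property `Name`.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for native widgets from three different frameworks.
# Each one stores its human-readable label under a different attribute name.
@dataclass
class WpfButton:
    content: str

@dataclass
class Win32Button:
    caption: str

@dataclass
class HtmlImage:
    alt: str

# Toy version of the UIA core's job: know which native property holds the
# label for each framework, and expose it under one common name.
_LABEL_PROPERTY = {
    WpfButton: "content",
    Win32Button: "caption",
    HtmlImage: "alt",
}

def normalize_name(widget) -> str:
    """Return the widget's label as a single, framework-neutral 'Name'."""
    attr = _LABEL_PROPERTY[type(widget)]
    return getattr(widget, attr)

if __name__ == "__main__":
    widgets = [WpfButton("OK"), Win32Button("Cancel"), HtmlImage("Company logo")]
    for w in widgets:
        # A client only ever asks for the "Name", never for content/caption/alt.
        print(type(w).__name__, "->", normalize_name(w))
```

A client (screen reader or test script) only calls `normalize_name`, so it never has to know which toolkit drew the control. That property is what lets the same AT work across frameworks.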

A lot of these definitions weren't new to me, but it's nice having them all in one place. They didn't exactly answer my questions, but they provided me with enough background to ask some (at least somewhat) intelligent questions, so I took my inquisition to Sandy on IRC:

bgmerrell: from msdn "The UI Automation core masks any differences in the frameworks that underlie various pieces of UI. For example, the Content property of a WPF button, the Caption property of a Win32 button, and the ALT property of an HTML image are all mapped to a single property"
bgmerrell: so i assume we're going to have to do this masking for gtk widgets too, right?
sandy: basically, that's the idea with us implementing providers
sandy: has Calvin shown you his drawing of the client and provider sides?
bgmerrell: this thing? http://www.mono-project.com/files/3/37/Architecture.png
sandy: yeah!
sandy: so here's the deal
sandy: on the provider side, we are using the interfaces defined in UIA to map Winforms and Moonlight to UIA, and then we're mapping UIA to ATK
sandy: then on the client side, we're going to map the UIA client interfaces directly to at-spi
sandy: so we don't do anything with gtk
sandy: because the stack is already implemented there
sandy: gtk apps will show up over at-spi
sandy: and our client-side bridge will let us see them via the UIA client interfaces
sandy: UIA has their "core"
sandy: but we already have the equivalent of that
sandy: it's at-spi
bgmerrell: okay that makes sense. so, at-spi is currently supported by "GTK+ 2, Java/Swing, the Mozilla suite, and StarOffice/OpenOffice.org, and some QT support"
bgmerrell: so theoretically we will be able to control/access any of those to some degree?
sandy: that's right
bgmerrell: sandy: do you have any ideas on what we should do from a QA perspective to test the provider piece? Or do you think we're mostly going to be testing the provider via the client?
sandy: I think testing via accerciser and orca makes the most sense
sandy: for QA purposes
bgmerrell: okay, maybe i'm confused. on the provider side you don't really have an accessible UI because that's implemented on the client side isn't it?
bgmerrell: but you guys are talking about working with Accerciser, so accerciser must see something that's accessible
sandy: yes
sandy: so on the provider side
sandy: we are making winforms apps visible over at-spi
sandy: so existing ATs like Accerciser will be able to see our apps and interact with them
sandy: so I think the end-goal of this year is to make it so you can use existing Linux ATs to access winforms apps
sandy: if I understand correctly
sandy: so then I guess the best QA approach would be to test how well we achieve that goal
sandy: perhaps by writing accerciser or dogtail or whatever scripts that test interaction with winforms apps?
bgmerrell: "so existing ATs like Accerciser will be able to see our apps"
bgmerrell: how does that contrast to what we'll be doing on the client side?
sandy: so once we start working on the client side...
sandy: the goal is that we can write new ATs
sandy: or port new ATs from Windows
sandy: written using the UIA client interfaces
bgmerrell: instead of directly using at-spi?
sandy: then those ATs will be able to see all of the Linux apps that are exposed over at-spi, including gtk, winforms, qt, oo.o, mozilla, etc
sandy: bgmerrell: exactly
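To check that I had the two directions straight, I modeled Sandy's explanation as a small Python sketch. On the provider side, a Winforms control is described through a UIA provider, and a bridge turns that into an ATK-style object that shows up over AT-SPI. On the client side, a second bridge wraps everything on AT-SPI, whether GTK or Winforms, as UIA-style elements for a managed AT. This is only a model of the data flow. Every class, method, and the role mapping below is hypothetical and does not reflect the actual Mono code or the real ATK/AT-SPI APIs.

```python
# Toy model of the provider-side and client-side bridges. All names are
# hypothetical; they only mimic the direction in which data flows.

class AtspiDesktop:
    """Stand-in for AT-SPI: a registry of accessible objects from any toolkit."""
    def __init__(self):
        self.accessibles = []

    def register(self, accessible):
        self.accessibles.append(accessible)

class AtkObject:
    """Stand-in for an ATK accessible as it appears over AT-SPI."""
    def __init__(self, name, role, toolkit):
        self.name, self.role, self.toolkit = name, role, toolkit

# --- Provider side: Winforms -> UIA provider -> ATK -> AT-SPI ---------------
class WinformsButtonProvider:
    """Hypothetical UIA provider describing a Winforms button."""
    def __init__(self, text):
        self.text = text

    def get_property(self, prop):
        return {"Name": self.text, "ControlType": "Button"}[prop]

class UiaAtkBridge:
    """Maps a UIA provider onto an ATK-style object and publishes it on AT-SPI."""
    ROLE_MAP = {"Button": "push button"}  # hypothetical, one entry only

    def expose(self, provider, desktop):
        atk = AtkObject(provider.get_property("Name"),
                        self.ROLE_MAP[provider.get_property("ControlType")],
                        toolkit="winforms")
        desktop.register(atk)
        return atk

# --- Client side: AT-SPI -> UIA client interfaces ---------------------------
class UiaClientElement:
    """What a managed AT would see: UIA-style properties backed by AT-SPI."""
    def __init__(self, accessible):
        self._acc = accessible

    @property
    def name(self):
        return self._acc.name

    @property
    def role(self):
        return self._acc.role

def uia_client_view(desktop):
    """Wrap every AT-SPI accessible, whatever its toolkit, as a UIA element."""
    return [UiaClientElement(acc) for acc in desktop.accessibles]

if __name__ == "__main__":
    desktop = AtspiDesktop()
    # GTK needs no new bridge: its accessibility stack already reaches AT-SPI.
    desktop.register(AtkObject("Open", "push button", toolkit="gtk"))
    # Winforms reaches AT-SPI only through the provider-side bridge.
    UiaAtkBridge().expose(WinformsButtonProvider("Save"), desktop)
    # A managed AT using the client-side bridge sees both buttons uniformly.
    for element in uia_client_view(desktop):
        print(element.name, element.role)
```

The model has no class for GTK beyond the registry entry. That matches Sandy's point that "we don't do anything with gtk": the team only writes the two bridges, and AT-SPI sits in the middle as the shared "core".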

So now when I looked at the architecture image, things made a lot more sense.

Blue and green items are existing implementations, and beige items are the ones our team will implement. I hope to work on a managed AT next year.

This information made things a lot clearer for me, and I hope it will help others who are new to the project and/or to accessibility. If something is unclear, please let me know; I would like this to be clear enough that anyone interested in working on the project can get a good fundamental understanding of what we're doing.


2 comments:

Ray Wang said...

yes, it makes sense to me a lot. Thanks for your works :)

Unknown said...

when I see it again, it makes me more understand the project architecture, thanks Brian , it awesome