idea for Wayland p2p D-Bus accessibility

for assistive technology providers:

  • The assistive technology (AT) provider connects to the Wayland compositor and requests permission to use the accessibility (a11y) interfaces. Wayland has no good mechanism for permission systems, and portals can't be used in this case, so the provider is considered a privileged component and will generally be let through. Any additional security mechanisms are not specified here; they are left to the discretion of each compositor, since each compositor has its own mechanisms. The only mandatory requirement is that any permission confirmation dialog must be accessible to as many users as possible. For visually impaired users, a sound or a pre-recorded explanation, followed by a workflow such as "press Enter if you agree or Escape if you don't", would be enough. For deaf and hard of hearing users, a manually drawn window might work, but it would probably be better to involve a UI toolkit, which many compositors may not agree to. For people with motor impairments, switch access inside the permission window would go a long way. Because of these difficulties, especially in an environment as specialised as a Wayland compositor, we can't enforce anything. We do think a working system is better than a non-working one, so for now AT providers should simply be let through.
  • When the permission request returns, the AT receives a numeric identifier: the handle of a newly allocated Wayland object implementing an AT-specific interface, which we'll call wl_at_provider. From then on, the AT must use this object for all further accessibility operations with the compositor; any request to an accessibility-specific API made without this object shall be rejected with a protocol error.
  • The AT registers interest in the events it thinks might be useful to it. This interest can be updated later by submitting another list, but no events are received until a registration finishes successfully. Registering interest is therefore always required, even if the AT wants every event: explicit is better than implicit. If this is not done in a timely manner, the previously allocated object is deallocated, the AT receives an error event, and the compositor may optionally close the socket connection. If the connection is closed, the AT is prevented (by process ID) from connecting again for a finite period. The exact length of that period is left to the discretion of the compositor, but it must be user configurable, and it must be at least twenty seconds and at most two minutes.
  • After the AT has registered what it wants to receive (keyboard events, touchscreen, gamepad and possibly others, depending on the compositor), it must tell the compositor that the list is complete by calling a method on the wl_at_provider object it received at the very beginning of the connection. The compositor treats this as a single atomic operation: registering interest in a list of events, then signalling that the list is complete and the AT is ready to receive events. This pair of calls can be repeated any number of times, but nesting is not allowed. Attempting to begin a new interest list while the previous one has not ended should be answered with a protocol error, regardless of the compositor. No state is inherited between the current operation and any previous one. In fact, between the call that starts the interest list and the call that ends it, the compositor will not send any key events to the AT provider, whether or not any occurred, so ATs are encouraged to spend as little time in this section as possible: those events are lost, cannot be recovered, and have already been processed by the compositor.
  • For events such as "a new application is in focus", the event body contains an ID referring to a newly allocated instance of the wl_accessible_app interface (described below). Along with that object, the event also carries a file descriptor for a peer-to-peer D-Bus connection. In the future this could be any other transport method, as long as AT clients and servers agree on the details.
  • For events that pertain to the focused application, such as its window being resized or moved, the compositor includes the ID of that application's wl_accessible_app object in the event body, so the AT knows which application the event comes from.
  • Global events, such as mouse movement, are reported the same way as inside a normal Wayland surface, except that they are not restricted to a particular application. The compositor reports these events as they arrive, since an AT that got this far is considered trusted. And since the AT registered interest in such an event, it surely wants absolute screen coordinates, so that is what the compositor sends.
  • If the AT requested support for absolute screen positioning, it also receives an event whenever a window is moved, resized, or changes position in any other way. This event includes a tuple with the coordinates of the new position of the affected window's top-left corner. Since toolkits can report the positions of individual controls in the window's coordinate space, the AT can calculate absolute positions for every control it is interested in, as long as it keeps listening for these events and accurately updates its calculations each time.
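
The permission policy in the first bullet can be sketched as a small decision function. This is a minimal illustration under assumptions, not part of any real protocol: `decide_at_permission`, the `Decision` enum, and passing the user's answer as an XKB-style keysym name (`"Return"`, `"Escape"`) are all hypothetical. It encodes the two rules stated above: with no accessible prompt the provider is let through, and with an audio prompt Enter grants access while Escape refuses it.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical outcome of an AT permission request."""
    ALLOW = "allow"
    DENY = "deny"
    PENDING = "pending"


def decide_at_permission(prompt_available, key=None):
    """Compositor-side answer to an AT's request for accessibility access.

    prompt_available: whether this compositor can show an accessible prompt.
    key: keysym name of the user's answer, if any (hypothetical input format).
    """
    if not prompt_available:
        # No accessible prompt: a working system beats a non-working one.
        return Decision.ALLOW
    if key == "Return":
        return Decision.ALLOW    # "press Enter if you agree"
    if key == "Escape":
        return Decision.DENY     # "or Escape if you don't"
    return Decision.PENDING      # any other key: keep waiting for an answer
```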
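
To make the handle rule in the second bullet concrete, here is a toy compositor-side model. It is a sketch under loud assumptions: `Compositor`, `grant_at_provider`, `a11y_request` and `ProtocolError` are hypothetical names, clients are plain strings, and object IDs come from a simple counter rather than from libwayland. What it shows is that every accessibility request must name the wl_at_provider object handed to that same client, and anything else is rejected with a protocol error.

```python
class ProtocolError(Exception):
    """Raised when a client breaks the (hypothetical) accessibility protocol."""


class Compositor:
    """Toy compositor model; every name here is hypothetical."""

    def __init__(self):
        self._next_id = 1
        self._providers = {}  # wl_at_provider object ID -> client that owns it

    def grant_at_provider(self, client):
        """Permission granted: allocate a wl_at_provider and return its numeric ID."""
        object_id = self._next_id
        self._next_id += 1
        self._providers[object_id] = client
        return object_id

    def a11y_request(self, client, object_id, request):
        """Accept an accessibility request only on the client's own wl_at_provider."""
        if self._providers.get(object_id) != client:
            raise ProtocolError(f"{request!r} sent without a valid wl_at_provider")
        return f"{request} handled for {client}"
```

A request naming an unknown ID, or an ID that belongs to another client, raises `ProtocolError` instead of being served.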
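
The reconnect penalty from the third bullet boils down to clamping a user setting and remembering a per-PID expiry time. The sketch below uses hypothetical names (`ReconnectGuard`, `effective_ban_seconds`) and an injectable clock so it can be tested; only the twenty second and two minute bounds come from the proposal itself. It reads a monotonic clock so that changes to the wall-clock time cannot shorten or extend a ban.

```python
import time

MIN_BAN_SECONDS = 20   # lower bound stated in the proposal
MAX_BAN_SECONDS = 120  # upper bound stated in the proposal (two minutes)


def effective_ban_seconds(user_setting):
    """Clamp the user-configured ban period into the allowed range."""
    return max(MIN_BAN_SECONDS, min(MAX_BAN_SECONDS, user_setting))


class ReconnectGuard:
    """Hypothetical helper: remembers ATs (by PID) dropped for missing the deadline."""

    def __init__(self, user_setting, clock=time.monotonic):
        self.ban_seconds = effective_ban_seconds(user_setting)
        self._clock = clock
        self._banned_until = {}  # pid -> clock value at which the ban ends

    def ban(self, pid):
        self._banned_until[pid] = self._clock() + self.ban_seconds

    def may_connect(self, pid):
        expiry = self._banned_until.get(pid)
        if expiry is None:
            return True
        if self._clock() >= expiry:
            del self._banned_until[pid]  # ban is over; forget the entry
            return True
        return False
```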
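
The begin/end interest list described in the fourth bullet behaves like a small state machine. In this sketch, `begin_interest`, `add_interest` and `end_interest` are hypothetical request names standing in for whatever a real protocol would define, and `AtProviderState` is an illustrative server-side model. It shows the three rules from the text: nesting is a protocol error, a new list starts empty instead of inheriting the old one, and events arriving while a list is open are dropped rather than queued.

```python
class ProtocolError(Exception):
    """Raised when the AT breaks the (hypothetical) interest-list rules."""


class AtProviderState:
    """Server-side model of the interest list on one wl_at_provider object.

    begin_interest / add_interest / end_interest are hypothetical request names.
    """

    def __init__(self):
        self._pending = None        # list being built, or None when no list is open
        self.active = frozenset()   # interests from the last completed list
        self.delivered = []         # events actually sent to the AT

    def begin_interest(self):
        if self._pending is not None:
            raise ProtocolError("nested interest list")
        self._pending = set()       # start empty: nothing is inherited

    def add_interest(self, event_type):
        if self._pending is None:
            raise ProtocolError("add_interest outside an open interest list")
        self._pending.add(event_type)

    def end_interest(self):
        if self._pending is None:
            raise ProtocolError("end_interest without begin_interest")
        self.active = frozenset(self._pending)
        self._pending = None

    def on_input_event(self, event_type, payload):
        # Events arriving while a list is open are lost for good, not queued.
        if self._pending is None and event_type in self.active:
            self.delivered.append((event_type, payload))
```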
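
Wayland itself transfers file descriptors as `SCM_RIGHTS` ancillary data on its Unix domain socket, so the focus event from the fifth bullet can be imitated with Python's `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). The 4-byte payload carrying the wl_accessible_app object ID is a made-up wire format, not the real Wayland message encoding, and `send_focus_event` / `recv_focus_event` are hypothetical helpers.

```python
import socket
import struct

# Hypothetical wire format for the "application in focus" event: one
# native-endian 32-bit wl_accessible_app object ID. The D-Bus socket itself
# travels as SCM_RIGHTS ancillary data, the way Wayland passes descriptors.
FOCUS_EVENT = struct.Struct("=I")


def send_focus_event(conn, app_object_id, dbus_fd):
    """Compositor side (hypothetical helper): send the object ID plus the fd."""
    socket.send_fds(conn, [FOCUS_EVENT.pack(app_object_id)], [dbus_fd])


def recv_focus_event(conn):
    """AT side (hypothetical helper): return (object ID, received fd)."""
    data, fds, _flags, _addr = socket.recv_fds(conn, FOCUS_EVENT.size, 1)
    if len(data) != FOCUS_EVENT.size or not fds:
        raise ConnectionError("malformed focus event")
    (app_object_id,) = FOCUS_EVENT.unpack(data)
    return app_object_id, fds[0]
```

The receiving process gets its own duplicate of the descriptor, which is what lets the AT talk to the application directly without the compositor staying in the middle.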
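
Routing of window events and global pointer events can be sketched as two dispatch functions. `AtClient`, `dispatch_pointer_motion` and `dispatch_window_event` are hypothetical, and a Python list stands in for each AT's connection. Pointer motion goes, in absolute screen coordinates and with no surface restriction, to every AT that asked for it. Window events carry the focused application's wl_accessible_app object ID so the AT can tell where they came from.

```python
from dataclasses import dataclass, field


@dataclass
class AtClient:
    """Hypothetical compositor-side record of one connected AT."""
    name: str
    interests: frozenset                       # event types from the last completed interest list
    inbox: list = field(default_factory=list)  # stands in for the AT's Wayland connection


def dispatch_pointer_motion(ats, x, y):
    """Global pointer motion in absolute screen coordinates, not tied to any surface."""
    for at in ats:
        if "pointer_motion" in at.interests:
            at.inbox.append(("pointer_motion", x, y))


def dispatch_window_event(ats, kind, app_object_id, **details):
    """Window events for the focused app carry its wl_accessible_app object ID."""
    for at in ats:
        if kind in at.interests:
            at.inbox.append((kind, app_object_id, dict(details)))
```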
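
The coordinate arithmetic in the last bullet is a plain translation: a control's absolute position is the window's top-left corner plus the control's window-relative position. `PositionTracker` below is a hypothetical AT-side helper keyed by wl_accessible_app object ID. It assumes both values are in the same logical pixel units and ignores output scaling and decoration offsets, which a real implementation would have to account for.

```python
class PositionTracker:
    """Hypothetical AT-side cache of window origins, keyed by wl_accessible_app ID."""

    def __init__(self):
        self._origins = {}  # app object ID -> (x, y) of the window's top-left corner

    def on_window_geometry(self, app_object_id, top_left):
        # Call for every moved / resized / repositioned event from the compositor.
        self._origins[app_object_id] = top_left

    def to_screen(self, app_object_id, control_xy):
        """Translate toolkit-reported, window-relative coordinates to screen coordinates."""
        wx, wy = self._origins[app_object_id]
        cx, cy = control_xy
        return wx + cx, wy + cy
```

Because the cache is overwritten on every geometry event, stale positions only persist if the AT stops listening, which matches the condition stated above.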

for accessible applications:

  • The application connects to the compositor.
  • It registers itself as an accessible application. On completion, it receives the ID of a newly allocated object implementing the wl_accessible_app interface.
  • On the returned object, the application calls a set_descriptor method, passing it a file descriptor.
  • When this method completes in the compositor, the compositor emits the mirror event, "new application in focus", carrying the same file descriptor that was registered in this step. Every assistive technology registered in the same session receives the event along with the file descriptor, after which the AT and the application should be able to connect normally.
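
The application-side sequence above, together with the compositor's fan-out of the mirror event, can be modeled in a few lines. `set_descriptor` follows the name used in the text; `Session`, `register_accessible_app` and the `"app_focused"` event name are hypothetical, and each AT's connection is represented by a list. `os.dup` stands in for the per-recipient copy of the descriptor that passing it over a Unix socket would create.

```python
import os


class Session:
    """Hypothetical compositor-side model of one session's accessible apps and ATs."""

    def __init__(self):
        self._next_id = 1
        self.app_fds = {}     # wl_accessible_app object ID -> descriptor from set_descriptor
        self.at_inboxes = []  # one list per registered AT, standing in for its connection

    def register_accessible_app(self):
        """Allocate a wl_accessible_app object and return its ID."""
        app_id = self._next_id
        self._next_id += 1
        self.app_fds[app_id] = None  # registered, no descriptor yet
        return app_id

    def set_descriptor(self, app_id, fd):
        """Store the app's descriptor and emit the mirror event to every AT."""
        if app_id not in self.app_fds:
            raise ValueError(f"unknown wl_accessible_app {app_id}")
        self.app_fds[app_id] = fd
        for inbox in self.at_inboxes:
            # os.dup models the per-recipient copy that fd passing would create.
            inbox.append(("app_focused", app_id, os.dup(fd)))
```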

footnotes

  • The D-Bus information is received out of band, which means the compositor does not track whether the AT still has an open file descriptor to an application that is no longer in focus; in fact, it cannot provide any guarantee about that at all. Applications may optionally close the socket when they lose focus, if their developers consider an open connection to be an attack vector, but this is not recommended, because some ATs have special requirements for which they need information from background applications. If a toolkit implements this behaviour, it should be user configurable: while it is not a common pattern, a user may have requirements and reasons for wanting it, so give the user the choice.
  • In the future, the file descriptor can carry anything, as long as the AT server and client agree on the details. D-Bus is used for now because we are not aware of a better alternative, but that may change. It could even be shared memory: the compositor doesn't care what the descriptor carries, so the sky is the limit.
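
A toolkit that offers the optional close-on-unfocus behaviour from the first footnote might gate it behind a user setting that defaults to off. `A11ySettings`, `close_bus_when_unfocused` and `ToolkitA11yBridge` are hypothetical names; no existing toolkit API is implied. Closing the socket is visible to the AT as end-of-file on its side of the connection.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class A11ySettings:
    # Hypothetical user-facing toggle. Off by default, because some ATs need
    # information from background applications.
    close_bus_when_unfocused: bool = False


class ToolkitA11yBridge:
    """Hypothetical toolkit component owning the app's peer-to-peer bus socket."""

    def __init__(self, settings: A11ySettings, bus_socket: Optional[socket.socket]):
        self.settings = settings
        self.bus_socket = bus_socket

    def on_focus_lost(self) -> None:
        # Only close when the user explicitly opted in.
        if self.settings.close_bus_when_unfocused and self.bus_socket is not None:
            self.bus_socket.close()
            self.bus_socket = None
```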
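
Because the compositor never reads from the descriptor it forwards, the transport really is up to the two endpoints. In the sketch below, the hypothetical helper `forward_descriptor` duplicates whatever descriptor it is given without inspecting it, and only the receiving side, through the equally hypothetical `describe`, looks at what kind of object arrived. On Linux (Python 3.8+), a shared-memory file from `os.memfd_create` goes through exactly the same path as a socket.

```python
import os
import stat


def forward_descriptor(fd):
    """Compositor side (hypothetical): hand the AT its own copy, never reading it.

    This works the same for a D-Bus socket, a pipe, or a shared-memory file.
    """
    return os.dup(fd)


def describe(fd):
    """AT side (hypothetical): only the endpoints care what the transport is."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "file/shared memory"
    return "other"
```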