Access Control

Introduction

Since MIT's introduction of time-sharing in the 60s,¹ operating system's security was classically designed around the user: how to protect a user's applications, time quota, storage quota, and his or her own data, from many other users actively using the very same system.

This user-centric model of security was later inherited into the Personal Computing era, with slight differences. Applications written over the Wintel platform expected full access to the user's data, network, and even other applications files and data. By late 2008, the iPhone shifted the client applications landscape by forcing each application to run only in its own isolated sandbox. By 2012, Docker similarly changed the server side of the equation by mainstreaming application-centric containers and deployment.

The concept of user-centric security is so prevalent that Android, built way before the introduction of Linux Kernel namespaces,² had to achieve basic file-system separation between apps by giving each package a distinct Linux user ID at install time.³

Similar to the massively-succesful mobile model, and like the current efforts of macOS,⁴ the Linux desktop is migrating to a sandboxed model of security.⁵ PulseAudio, a core infrastructure layer of the Linux desktop, should be ready.

The Problem

PulseAudio, being written in the Wintel era of computing,⁶ heavily follows the user-centric model of security. It has usually evaded the security question, almost in entirety, by confining itself to run under user context.

In the current model, applications can do whatever they like to the state of user's sound. An application can do any of the following:

Snoop on other application's audio

An application can access the final mixed audio of all other applications, sent to any sink, through the use of monitors:

$ parec --file-format=wav -d $(pactl list sinks | grep Name | cut -d: -f2 | head).monitor snooped-audio.wav
Have unmediated access to the microphone

Any application can access and access from the microphone. This means that a single browser or internet-facing-application exploit can lead to full audio data access. The simple recording command below can run from the context of any application:

$ parec --file-format=wav -d $(pactl list sources | grep Name | grep -v '\.monitor' | cut -d: -f2 | head) microphone-audio.wav
Control system objects outside of the application domain

Any application can induce system-wide effects regarding user audio. For example, an application can suspend the main sink and the user won't hear audio from any application:

$ pactl suspend-sink $(pactl list sinks | grep Name | cut -d: -f2 | head) 1
Load and unload server modules, including network ones

Any application can load or unload server modules, with parameters that might drastically increase the server's attack vector by listening to more network cards, protocols, and ports. An example of this case can be:

$ pactl load-module module-native-protocol-tcp listen=0.0.0.0 port=1234
Induce the server to get reliably killed

Due to how PulseAudio is currently designed, certain workloads asked by the application can reliably kill the daemon by making it exceed its allocated real-time budget.⁷ ⁸ Typically speaking rtkit limits pulse real-time budget to 200ms – not to cause any havoc in the system if any pulse code goes awry.

For example, the following script will reliably kill the pulse server on a dual-core 3GHz x86 machine:

#!/bin/bash

wget -nc https://archive.org/download/TenMinutesOfWhiteNoisePinkNoiseAndBrownianNoise/WhiteNoise.flac

for i in {1..60}; do
    echo "Client #$i";
    pacat --latency-msec=5000 --rate=384000 WhiteNoise.flac &
    sleep 1
done

Moreover, this malicious client will instantaneously kill the server. It simply does so by quickly opening and writing to multiple connected streams.

Forbid other applications from using a sink

To protect resources, pulse has a hard-coded limit of 256 open streams (sink-inputs) per sink. If a malicious, or buggy, application makes the server reach that limit for the default sink, no other application in the system will be able to output any audio.

In the early days, this limit was set to 32 streams. It was increased later to 256 stream due to web browsers, and as a result HTML5 games, becoming PulseAudio-native.⁹

For example, this second malicious client will force-mute the system, without any ability to unmute it back except by killing the client or restarting the server. It does so by exhausting the number of open streams connected to the default sink, thus disabling any newly-connected app from outputting audio.

Forbid other applications from using a digital sink

PulseAudio's passthrough mode allows passing audio, in its compressed form, directly to the hardware. This model is especially prevalent in embedded SoCs where software audio decoding is costly in performance and power-consumption.

Since we can't do audio mixing without decoding the audio first, this mode requries exclusive access (forbids mixing) on the sink. A passthrough stream is thus allowed to start only if the sink has zero open streams. Malicious apps, or an honest system event, can block the system's core application from using passthrough. To handle this, the allow-passthrough module is to be merged in v10.0: it force-acquires the sink for passtrhough streams.¹⁰

Nonetheless from a containers perspective, and whether the above module is loaded or not, a malicious app can pretend to be a passthrough and block all other apps from accessing the acquired sink. The below simple command will disable all other apps from outputting audio:

$ pacat --passthrough /dev/zero
Stop all audio through the Device Reservation Protocol

For better cooperation with other audio servers requiring direct access to ALSA (e.g. JACK), the generic Device Reservation Protocol was created. It's implemented in PulseAudio, Jack, Kodi, etc. and uses d-bus's own "service name ownership" conflict management to serialize access to ALSA cards.

Through this protocol, any application can ask PulseAudio to relinquish access to the ALSA cards and effectively force-mute the system if misused. The following program is an example:

#!/usr/bin/env python3

import dbus, signal

service_prefix='org.freedesktop.ReserveDevice1.Audio'
object_prefix='/org/freedesktop/ReserveDevice1/Audio'
interface='org.freedesktop.ReserveDevice1'
priority=0x7fffffff       # Highest prio = INT32_MAX
bus = dbus.SessionBus()

def request_card_release(card_idx):    
    object = bus.get_object(service_prefix + card_idx, object_prefix + card_idx)
    release = object.get_dbus_method('RequestRelease', interface)
    result = str(release(priority))
    print('Asking pulse to release ALSA card #' + card_idx + ': ', end="")
    print('Success!' if result == '1' else 'Failure!')
    acquire_alsa_card(card_idx, dbus.bus.NAME_FLAG_REPLACE_EXISTING)

def acquire_alsa_card(card_idx, extra_flags=0):
    request = bus.request_name(service_prefix + card_idx,
                               dbus.bus.NAME_FLAG_DO_NOT_QUEUE |
                               dbus.bus.NAME_FLAG_ALLOW_REPLACEMENT |
                               extra_flags)
    if request == dbus.bus.REQUEST_NAME_REPLY_EXISTS:
        print('Pulse has previously acquired ALSA card #' + card_idx)
        request_card_release(card_idx)
    elif request == dbus.bus.REQUEST_NAME_REPLY_PRIMARY_OWNER:
        print("Pulse is now blocked from accessing ALSA card #" + card_idx)

acquire_alsa_card(str(0))
signal.pause()
Snoop on the server's global mempool

Shared memory is used in PulseAudio for audio transfers. For playback the client creates a shared-memory pool to send played audio to the server. For recording a system-wide & global shared-memory pool is used to send recorded audio to clients. Shared memory is used for low-latency and a single-copy sound architecture.

Classically, there are two global shared-memory pools created by the server:

All clients have full read-access to the two pools above and thus any client can fully snoop their contents. Since PulseAudio v9.0, the srbchannel pool has been redesigned & instantiated on a per-client basis, thus forbidding cross-application leaks.¹¹

A similar per-client effort is needed for the global mempool. Otherwise, any app can snoop behind the server's back and read the microphone's audio even if an access-control layer was merged. A large patchset is needed for this transformation.¹²

Threat model

As seen above, the PulseAudio protocol was not originally written with security at the application level in mind. We run an instance of PulseAudio for each user, and if the user has access to the audio hardware, then all applications being run as that user are trusted.

Before start looking at how to address this, we list the set of "attacks" that we wish to mitigate, and those that we explicitly do not address (at least currently).

Application level
  • An application should not be able to capture audio without authorisation

    • Do we want to provide fine-grained control over which devices capture is permitted from?
    • iOS/Android seems to aggregate all of this under "microphone access"
  • An application should only be able to perform any operations on its own input and output streams

    • While enumerating, only its own streams should be visible to the application
  • An application should not be able to create passthrough streams without authorisation

  • An application should only only be able to control the volume of its own streams

  • An application should only be able to play or remove a sample that it uploaded

    • Should all applications be allowed to upload samples?
  • An application should not be able to perform operations sinks/sources/cards without authorisation

    • It should be possible to enumerate only permitted devices
    • Should operations on sinks/sources/cards be allowed in bulk? (e.g. like UNIX root)
  • An application should not be able to set the default sink/source without authorisation

  • An application should not be able to load or unload modules without authorisation

    • Exception for modules loaded by policy (such as filter modules)?
  • An application should only be able to subscribe to events:

    • On its own streams
    • On devices that it can enumerate
    • Do we need to filter what events are sent?
  • An application should not be able to snoop on server's system-wide shared memory

Core level
  • It should not be possible for an application's stream to be moved to a device it is not authorised for
  • Should modules be allowed to change the volume of an application without authorisation (ducking, corking, ...)?

Proposed Designs

1. Separation of concerns

We can think of access control in PulseAudio as a separation of concerns mechanism. In this light we can see three entities:

  1. The transport layer. protocol-native over Unix domain sockets, and possibly bus1 in the future.¹³
  2. The core, which mediates access (mechanism part), in an independent manner from the transport layer
  3. External policy module, which provides policy part of the access

In the above manner, the transport layer won't directly access the pulse-core layer data structures as it's done now. Rather, this will be done through a well-defined core API. This core API shall then allow/deny access (mechanism) per hooks registered in core and implemented as external modules (policy).

Advantages:

  • Sound design that can serve us well in the following years

Disadvantages:

  • A major undertaking, requiring a redesign of both protocol-native and core
  • Each transport, especially the more advanced it is, have its own "native" access control mechanism. For example, we have polkit for d-bus, ¹⁴ and possibly a unique new mechanism for bus1. In this case, won't it be simpler to do Pulse own mechanism checks in terms of the protocol (native, bus1, etc.) own access-control mechanism interfaces?

2. Merging access-control mechanism inside the transport

This is the approach chosen by the access-control patch series submitted last year.¹⁵ New access-control hooks are added, and mapped 1-to-1 with protocol-native commands.

Advantages:

Disadvantages:

  • This second approach leads a design loophole. The 1-to-1 mapping ¹⁷ of protocol-native commands to access hooks inside pa-core is, at best, redundant. At worst, it heavily ties protocol-native with core, blocking future extensions. Meanwhile, putting the access hooks inside protocol-native itself introduces race-conditions with the policy modules.

  • What if we want to provide access-control checks beyond the semantics of a protocol module?

References & Footnotes

⁰¹²³⁴⁵⁶⁷⁸⁹