
How Printers can Breach our Privacy:
Acoustic Side-Channel Attacks on Printers

Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal, Caroline Sporleder

This webpage provides some (non-scientific) information about the project, intended for the public.
If you are interested in a more scientific treatment, read the paper or contact us.

Introduction

Medical data is generally considered private, and for good reason. Your medical records may reveal whether you suffer from a contagious disease (your friends are probably interested in this), whether you have a genetic condition that increases your risk of certain illnesses (your health insurance company might be interested in this), whether you really had a severe cold on Monday of this week, right after your vacation (your employer might be interested in this), and whether you suffer from a sexually transmitted disease or receive methadone as a substitute for illegal drugs (you simply think nobody should be interested in this).

All of this data is routinely printed in a doctor's practice. The printers are typically placed so that nobody can see what is being printed, and you might believe that your data is secure. This belief is not justified: in this study we show that printed text can be reconstructed from a recording of the sound emitted by the printer. A majority of doctors' practices use dot-matrix printers (see below for the results of a survey we commissioned on the usage of dot-matrix printers), and in some cases they are even required to do so.

In effect, this means that anyone sitting in the doctor's reception area can record the sound of the printer and reconstruct the printed text. Our novel attack takes as input a sound recording of a dot-matrix printer processing text and recovers up to 72% of the printed words. After an upfront training phase, the attack is fully automated. It uses a combination of machine learning, audio processing and speech recognition techniques, including spectrum features, Hidden Markov Models and linear classification; moreover, it allows for feedback-based incremental learning.

The attack in practice

We have successfully mounted the attack in the field, in a doctor's practice, and recovered the content of a medical prescription. (For privacy reasons, we asked for permission upfront and had the secretary print fresh prescriptions for a fictitious patient.) The attack was conducted under realistic - and arguably even pessimistic - circumstances: during rush hour, with many people chatting in the waiting room.

While medical data is one of the most striking targets of our attack, there is no limitation in principle: the attack also works, e.g., with bank statement printers that use dot-matrix technology.

Deployment of dot-matrix printers

Although outdated for private use, dot-matrix printers continue to play a surprisingly prominent role in businesses where confidential information is processed, in particular in banks (for printing account statements, transcripts of transactions, etc.) and doctors' practices (for printing patients' health records and medical prescriptions). The table below shows the results of a representative survey on this topic that we commissioned from a professional survey institute in Germany.

Doctors asked: n=541
  Use dot-matrix printers                       58.4%
   - for prescriptions                          79.4%
   - for other usage                            84.5%
  Printer placed within earshot of customers    72.2%
  Replacement planned                            4.7%

Banks asked: n=524
  Use dot-matrix printers                       30.0%
   - for bank statement printers                29.9%
   - for other usage                            83.4%
  Printer placed within earshot of customers    83.4%
  Replacement planned                            8.3%

The reasons for the continued use of dot-matrix printers are manifold: robustness, cheap deployment, incompatibility of modern printers with old hardware, and above all the lack of a business reason that would convince IT laypeople to modernize working IT hardware. Moreover, several European countries (e.g., Germany, Switzerland and Austria) require by law the use of dot-matrix (carbon-copy) printers for printing prescriptions of narcotic substances.

Overview of techniques

In slightly more scientific terms, our method for reconstruction of printed text from a previously taken recording of the sound emitted by the printer works as follows. We first conduct a training phase where words from a dictionary are printed, and characteristic sound features of these words are extracted and stored in a database.  We then use the trained characteristic features to recognize the printed text. This task is not trivial. Major challenges include:

Our work addresses these challenges, using a combination of machine learning techniques for audio processing and higher-level information about document coherence. Similar techniques are used in language technology applications, in particular in automatic speech recognition.

First, we develop a novel feature design that borrows from commonly used techniques for feature extraction in speech recognition and music processing. These techniques are geared towards the human ear, which is limited to approx. 20 kHz and whose sensitivity is logarithmic in the frequency; for printers, our experiments show that most interesting features occur above 20 kHz, and a logarithmic scale cannot be assumed. Our feature design reflects these observations by employing a sub-band decomposition that places emphasis on the high frequencies, and spreading filter frequencies linearly over the frequency range. We further add suitable smoothing to make the recognition robust against measurement variations and environmental noise.
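The idea of linearly spaced sub-bands followed by smoothing can be illustrated with a small sketch. This is not the feature extraction from the paper: the function names, the sample rate, the band limits, the number of bands and the moving-average smoothing across bands are all assumptions chosen for illustration, and the naive DFT is only practical for short frames.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..len(frame)//2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def linear_band_energies(spectrum, sample_rate, f_lo, f_hi, n_bands):
    """Sum spectral energy in n_bands equally wide bands (linear, not logarithmic)."""
    n_fft = 2 * (len(spectrum) - 1)
    hz_per_bin = sample_rate / n_fft
    width = (f_hi - f_lo) / n_bands
    energies = [0.0] * n_bands
    for k, mag in enumerate(spectrum):
        freq = k * hz_per_bin
        if f_lo <= freq < f_hi:
            # Guard against floating-point rounding at the upper edge.
            band = min(int((freq - f_lo) / width), n_bands - 1)
            energies[band] += mag * mag
    return energies

def smooth(values, radius=1):
    """Moving average over neighbouring bands (hypothetical smoothing choice)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

if __name__ == "__main__":
    # Assumed setup: 96 kHz sampling, synthetic 30 kHz tone, bands from 20 to 48 kHz.
    rate = 96000
    frame = [math.sin(2 * math.pi * 30000 * t / rate) for t in range(64)]
    bands = linear_band_energies(magnitude_spectrum(frame), rate, 20000, 48000, 4)
    print(smooth(bands))
```

Spacing the bands linearly gives the region above 20 kHz the same resolution as lower frequencies, which a logarithmic (ear-oriented) scale would not.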

Second, we deal with the decay time and the induced blurring by resorting to a word-based approach instead of decoding individual letters. A word-based approach requires additional upfront effort such as an extended training phase as the dictionary grows larger, and it does not permit us to increase recognition rates by using, e.g., spell-checking. Recognition of words based on training the sound of individual letters (or pairs/triples of letters), however, is infeasible because the sound emitted by printers blurs so strongly over adjacent letters.

Third, we employ speech recognition techniques to increase the recognition rate: we use Hidden Markov Models (HMMs) that rely on the statistical frequency of word sequences in text to rule out incorrect word combinations. The strong blurring, however, requires the use of at least 3-grams over the words of the dictionary to be effective, which causes existing implementations for this task to fail because of memory exhaustion. To tame memory consumption, we implemented a delayed computation of the transition matrix that underlies HMMs, and in each step of the search procedure we adaptively removed words with only weakly matching features from the search space.
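The two memory-saving ideas described above, computing transition probabilities only when they are needed and discarding weakly matching words at each step, can be sketched as a small Viterbi-style search over word 3-grams. This is an illustrative sketch, not the authors' implementation: the add-alpha smoothing, the fixed beam width, the toy corpus and all function names are assumptions.

```python
import math
from collections import Counter
from functools import lru_cache

def make_trigram_model(corpus_sentences, vocab_size, alpha=0.1):
    """Return a lazily evaluated, memoised log P(w3 | w1, w2) with add-alpha smoothing."""
    tri, bi = Counter(), Counter()
    for sent in corpus_sentences:
        words = ["<s>", "<s>"] + sent
        for a, b, c in zip(words, words[1:], words[2:]):
            tri[(a, b, c)] += 1
            bi[(a, b)] += 1

    @lru_cache(maxsize=100_000)
    def logp(a, b, c):
        # Computed on demand: no full transition matrix is ever materialised.
        return math.log((tri[(a, b, c)] + alpha) / (bi[(a, b)] + alpha * vocab_size))

    return logp

def decode(candidates, logp, beam=3):
    """candidates: one dict {word: acoustic log-score} per printed word position."""
    # Each state is (previous word, current word); value is (score, word sequence).
    hyps = {("<s>", "<s>"): (0.0, [])}
    for scores in candidates:
        # Pruning: keep only the `beam` best-matching words at this position.
        kept = sorted(scores, key=scores.get, reverse=True)[:beam]
        new_hyps = {}
        for (a, b), (score, seq) in hyps.items():
            for w in kept:
                s = score + scores[w] + logp(a, b, w)
                if (b, w) not in new_hyps or s > new_hyps[(b, w)][0]:
                    new_hyps[(b, w)] = (s, seq + [w])
        hyps = new_hyps
    return max(hyps.values(), key=lambda h: h[0])[1]

if __name__ == "__main__":
    lm = make_trigram_model([["a", "printer", "is"], ["the", "printer", "is"]], vocab_size=6)
    acoustic = [
        {"a": -1.0, "go": -0.9},
        {"printer": -1.0, "pointer": -0.9},
        {"is": -1.0, "in": -0.9},
    ]
    print(decode(acoustic, lm))
```

In the toy example the acoustic scores slightly favour the wrong word at every position, and the language model is what pulls the result back to a plausible word sequence. A beam that is too narrow removes the correct word before the language model can help, which is the trade-off behind pruning only weakly matching candidates.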

We built a prototype implementation that bootstraps the recognition routine from a database of word features trained using supervised learning. Afterwards, the prototype automatically recognizes text with recognition rates of up to 72%.
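How a database of trained word features might be built and queried can be sketched as follows. This is a deliberately simplified stand-in, not the prototype itself: the class name WordDatabase, the running-mean update and the nearest-mean lookup by Euclidean distance are assumptions, whereas the attack as described uses linear classification and HMMs on top of the stored features.

```python
import math

class WordDatabase:
    """Maps each dictionary word to the running mean of its training feature vectors."""

    def __init__(self):
        self.mean = {}   # word -> mean feature vector (list of floats)
        self.count = {}  # word -> number of training samples seen

    def train(self, word, features):
        """Supervised update; can also be fed corrected output as feedback."""
        n = self.count.get(word, 0)
        old = self.mean.get(word, [0.0] * len(features))
        self.mean[word] = [(m * n + f) / (n + 1) for m, f in zip(old, features)]
        self.count[word] = n + 1

    def candidates(self, features, top=3):
        """Return the `top` closest words together with their distances."""
        ranked = sorted((math.dist(vec, features), word) for word, vec in self.mean.items())
        return [(word, d) for d, word in ranked[:top]]

if __name__ == "__main__":
    db = WordDatabase()
    db.train("printer", [1.0, 0.0])   # toy two-dimensional features
    db.train("cable", [0.0, 1.0])
    print(db.candidates([0.9, 0.1], top=2))
```

Returning several ranked candidates per word, rather than a single best guess, leaves room for a later language-model step to choose among them.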

[Figure: HH Decay]

The figure shows a spectrogram of the sound recorded while printing the letter "H" twice. Time is shown on the x-axis and frequency on the y-axis; darker colors indicate higher intensity.

Demonstration of an actual reconstruction

The original text (the beginning of the Wikipedia article on printers):

In computing, a printer is a peripheral which produces a hard copy (permanent human-readable text and/or graphics) of documents stored in electronic form, usually on physical print media such as paper or transparencies. Many printers are primarily used as local peripherals, and are attached by a printer cable or, in most newer printers, a USB cable to a computer which serves as a document source. Some printers, commonly known as network printers, have built-in network interfaces (typically wireless or Ethernet), and can serve as a hardcopy device for any user on the network. Individual printers are often designed to support both local and network connected users at the same time.

The result after reconstruction:

In computing, a printer in a peripheral which produces a hard body (permanent human-readable text and/or graphics) of documents source in electronic form. usually as physical print media such as pages or transparencies. Many Printers are primarily used go local peripherals, end are attached go A printer could or, in most newer printers; a USB cable go A computer which served de = document source. some printers, commonly known go network printers; have built-in network interfaces (typically wireless as ethernet), god way serve As a hardcopy device for out year we who network. Individual Printers use often designed so support born local god network connected users as too some tree.
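One simple way to put a number on how close a reconstruction is to the original is to count the words that match position by position. The sketch below assumes that both texts contain the same number of words in the same order, and that case and surrounding punctuation should be ignored; these are assumptions made for illustration, not the evaluation procedure used in the paper.

```python
import string

def word_recognition_rate(original, reconstructed):
    """Fraction of original words reproduced at the same position (case-insensitive)."""
    def normalise(text):
        return [w.strip(string.punctuation).lower() for w in text.split()]

    orig, rec = normalise(original), normalise(reconstructed)
    if not orig:
        raise ValueError("original text is empty")
    hits = sum(o == r for o, r in zip(orig, rec))
    return hits / len(orig)

if __name__ == "__main__":
    print(word_recognition_rate("a printer is a peripheral",
                                "a printer in a peripheral"))
```

If words were dropped or inserted during reconstruction, a positional comparison would undercount matches, and an alignment-based measure such as word-level edit distance would be the more appropriate choice.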

Scientific publication

  1. Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal, Caroline Sporleder.
    Acoustic Side-Channel Attacks on Printers.
    In Proceedings of the 19th USENIX Security Symposium, August 2010.

Contact

In case of questions, please contact:

Prof. Dr. Michael Backes   [backes (at) cs (dot) uni-saarland (dot) de]
Dr. Markus Dürmuth   [duermuth (at) cs (dot) uni-saarland (dot) de]
Sebastian Gerling, M.Sc.   [sgerling (at) cs (dot) uni-saarland (dot) de]