How Printers can Breach our Privacy:
Acoustic Side-Channel Attacks on Printers
Medical data is generally considered private, and for good reason: your medical records may reveal whether you suffer from a contagious disease (your friends are probably interested in this), whether you have a genetic condition that increases your risk of certain illnesses (your health insurance company might be interested in this), whether you really had a severe cold on Monday, right after your vacation (your employer might be interested in this), and whether you suffer from a sexually transmitted disease or receive methadone as a substitute for illegal drugs (you simply think nobody should be interested in this).
Data of this kind is regularly printed in a doctor's practice. The printers are typically placed such that nobody can see what is printed, and you might believe that your data is secure. However, this belief is not justified. In this study we show that printed text can be reconstructed from a recording of the sound emitted by the printer. A majority of doctors' practices use dot-matrix printers (see below for the results of a survey we commissioned on the usage of dot-matrix printers), and in some cases they are even required to do so.
In effect this means that any person sitting in the reception area of the doctor can record the sound of the printer and can reconstruct the printed text. Our novel attack takes as input a sound recording of a dot-matrix printer processing text, and recovers up to 72% of printed words. After an upfront training phase, the attack is fully automated and uses a combination of machine learning, audio processing and speech recognition techniques, including spectrum features, Hidden Markov Models and linear classification; moreover, it allows for feedback-based incremental learning.
The attack in practice
We have successfully mounted the attack in the field, in a doctor's practice, and recovered the content of a medical prescription. (For privacy reasons, we asked for permission upfront and let the secretary print fresh prescriptions for a fictitious patient.) The attack was conducted under realistic - and arguably even pessimistic - circumstances: during rush hour, with many people chatting in the waiting room.
While medical data is one of the most striking examples of our attack, there is no limitation in principle: the attack also works, e.g., with account-statement printers in banks that are based on dot-matrix technology.
Deployment of dot-matrix printers
Although outdated for private use, dot-matrix printers continue to play a surprisingly prominent role in businesses where confidential information is processed, in particular in banks (for printing account statements, transcripts of transactions, etc.) and doctors' practices (for printing patients' health records and medical prescriptions). The table shown below contains the results of a representative survey on this topic that we commissioned from a professional survey institute in Germany.
The reasons for the continued use of dot-matrix printers are manifold: robustness, cheap deployment, incompatibility of modern printers with old hardware, and, above all, the lack of a business reason that would convince IT laypeople that working IT hardware should be modernized. Moreover, several European countries (e.g., Germany, Switzerland, and Austria) require by law the use of dot-matrix (carbon-copy) printers for printing prescriptions of narcotic substances.
Overview of techniques
In slightly more scientific terms, our method for reconstruction of printed text from a previously taken recording of the sound emitted by the printer works as follows. We first conduct a training phase where words from a dictionary are printed, and characteristic sound features of these words are extracted and stored in a database. We then use the trained characteristic features to recognize the printed text. This task is not trivial. Major challenges include:
- Identifying and extracting sound features that suitably capture the acoustic emanation of dot-matrix printers;
- Compensating for the blurred and overlapping features that are induced by the substantial decay time of the emanations;
- Identifying and eliminating wrongly recognized words to increase the overall recognition rate.
Our work addresses these challenges, using a combination of machine learning techniques for audio processing and higher-level information about document coherence. Similar techniques are used in language technology applications, in particular in automatic speech recognition.
First, we develop a novel feature design that borrows from commonly used techniques for feature extraction in speech recognition and music processing. These techniques are geared towards the human ear, which is limited to approx. 20 kHz and whose sensitivity is logarithmic in the frequency; for printers, our experiments show that most interesting features occur above 20 kHz, and a logarithmic scale cannot be assumed. Our feature design reflects these observations by employing a sub-band decomposition that places emphasis on the high frequencies, and spreading filter frequencies linearly over the frequency range. We further add suitable smoothing to make the recognition robust against measurement variations and environmental noise.
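The following Python sketch illustrates the two ideas named above that are easiest to show in code: filter bands spread linearly over the frequency range, and smoothing over time. It is only an illustration under stated assumptions, not the feature extractor used in the study. The function names, the frame length, the hop size, the number of bands and the moving-average smoothing are all hypothetical choices, and the sketch does not model how the high frequencies are emphasized.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def linear_band_energies(spectrum, num_bands):
    """Sum magnitudes over bands whose edges are spaced linearly (not logarithmically)."""
    edges = [i * len(spectrum) // num_bands for i in range(num_bands + 1)]
    return [sum(spectrum[lo:hi]) for lo, hi in zip(edges, edges[1:])]


def smooth_over_time(rows, radius=1):
    """Moving average over neighbouring frames, computed separately for each band."""
    smoothed = []
    for i in range(len(rows)):
        window = rows[max(0, i - radius): i + radius + 1]
        smoothed.append([sum(column) / len(window) for column in zip(*window)])
    return smoothed


def extract_features(samples, frame_len=64, hop=32, num_bands=8):
    """Cut the signal into overlapping frames and return one band vector per frame."""
    starts = range(0, len(samples) - frame_len + 1, hop)
    rows = [
        linear_band_energies(magnitude_spectrum(samples[s:s + frame_len]), num_bands)
        for s in starts
    ]
    return smooth_over_time(rows)
```

For a pure tone near the top of the spectrum, for example `extract_features([math.sin(2 * math.pi * 30 * t / 64) for t in range(256)])`, almost all energy ends up in the highest band of every frame. A practical implementation would use an FFT library instead of the naive DFT, which is chosen here only to keep the sketch dependency-free.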
Second, we deal with the decay time and the induced blurring by resorting to a word-based approach instead of decoding individual letters. A word-based approach requires additional upfront effort such as an extended training phase as the dictionary grows larger, and it does not permit us to increase recognition rates by using, e.g., spell-checking. Recognition of words based on training the sound of individual letters (or pairs/triples of letters), however, is infeasible because the sound emitted by printers blurs so strongly over adjacent letters.
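To make the word-based idea concrete, the sketch below ranks dictionary words by how closely their stored features match the features of an unknown printed word. It is a simplified stand-in and not the classifier used in the study: the distance measure, the length penalty and the names `feature_distance` and `rank_candidates` are assumptions made for illustration, and each word is represented as a list of per-frame feature vectors.

```python
def feature_distance(a, b, length_penalty=1.0):
    """Mean absolute frame difference plus a penalty for differing durations."""
    n = min(len(a), len(b))
    frame_cost = sum(
        sum(abs(x - y) for x, y in zip(frame_a, frame_b))
        for frame_a, frame_b in zip(a[:n], b[:n])
    )
    return frame_cost / max(n, 1) + length_penalty * abs(len(a) - len(b))


def rank_candidates(observed, database, top_k=3):
    """Return the top_k dictionary words closest to the observed features.

    `database` maps each trained word to its stored feature sequence.
    """
    scored = sorted(
        (feature_distance(observed, features), word)
        for word, features in database.items()
    )
    return [(word, dist) for dist, word in scored[:top_k]]
```

Returning several ranked candidates per word, rather than a single best guess, leaves room for a later language-model step to pick the word that fits its context.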
Third, we employ speech recognition techniques to increase the recognition rate: we use Hidden Markov Models (HMMs) that rely on the statistical frequency of sequences of words in text in order to rule out incorrect word combinations. The presence of strong blurring, however, requires the use of at least 3-grams on the words of the dictionary to be effective, which causes existing implementations for this task to fail because of memory exhaustion. To tame memory consumption, we implemented a delayed computation of the transition matrix that underlies HMMs, and in each step of the search procedure, we adaptively removed the words with only weakly matching features from the search space.
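The sketch below shows one way the two memory-saving ideas could look in Python: trigram transition scores are computed only when the search first asks for them, and at each word position only the best-matching candidates are kept. It is a toy illustration under assumptions, not the implementation from the study. The add-alpha smoothing, the fixed beam width (a simplification of adaptive pruning), the `<s>` padding symbol and all function names are hypothetical.

```python
import math
from collections import Counter
from functools import lru_cache


def make_trigram_scorer(sentences, vocab_size, alpha=0.1):
    """Build a lazily evaluated trigram log-probability function from a corpus."""
    trigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        padded = ["<s>", "<s>"] + sentence
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            trigrams[a, b, c] += 1
            bigrams[a, b] += 1

    @lru_cache(maxsize=None)
    def logprob(a, b, c):
        # Computed on first use only: no dense V x V x V table is ever built.
        return math.log((trigrams[a, b, c] + alpha) / (bigrams[a, b] + alpha * vocab_size))

    return logprob


def decode(candidates_per_position, logprob, beam=3):
    """Viterbi-style search over word pairs with per-position pruning.

    Each element of `candidates_per_position` maps a candidate word to its
    acoustic log-score (higher means a better feature match).
    """
    states = {("<s>", "<s>"): (0.0, [])}  # last two words -> (score, path)
    for candidates in candidates_per_position:
        # Prune: drop words whose features match only weakly at this position.
        kept = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:beam]
        new_states = {}
        for (a, b), (score, path) in states.items():
            for word, acoustic in kept:
                total = score + acoustic + logprob(a, b, word)
                key = (b, word)
                if key not in new_states or total > new_states[key][0]:
                    new_states[key] = (total, path + [word])
        states = new_states
    return max(states.values(), key=lambda item: item[0])[1]
```

In this toy setting, a word such as "could" that matches the sound slightly better than "cable" can still lose after "a printer", because the trigram score for "a printer cable" is much higher.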
We built a prototype implementation that can bootstrap the recognition routine from a database of featured words that have been trained using supervised learning. Afterwards, the prototype automatically recognizes text with recognition rates of up to 72%.
The above image shows the recording of the sound captured while printing the letter "H" twice. Time is shown on the x-axis, frequencies are shown on the y-axis, and darker colors indicate higher intensity.
Demonstration of an actual reconstruction
|The original text (the beginning of the Wikipedia article on printers)|The result after reconstruction|
|In computing, a printer is a peripheral which produces a hard copy (permanent human-readable text and/or graphics) of documents stored in electronic form, usually on physical print media such as paper or transparencies. Many printers are primarily used as local peripherals, and are attached by a printer cable or, in most newer printers, a USB cable to a computer which serves as a document source. Some printers, commonly known as network printers, have built-in network interfaces (typically wireless or Ethernet), and can serve as a hardcopy device for any user on the network. Individual printers are often designed to support both local and network connected users at the same time.||In computing, a printer in a peripheral which produces a hard body (permanent human-readable text and/or graphics) of documents source in electronic form. usually as physical print media such as pages or transparencies. Many Printers are primarily used go local peripherals, end are attached go A printer could or, in most newer printers; a USB cable go A computer which served de = document source. some printers, commonly known go network printers; have built-in network interfaces (typically wireless as ethernet), god way serve As a hardcopy device for out year we who network. Individual Printers use often designed so support born local god network connected users as too some tree.|
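A word recognition rate for a pair of texts like the one above can be estimated with a short script. The sketch below is one possible way to count it and not necessarily the measure used in the study: it assumes that the reconstruction contains one token per printed word, so that tokens can be compared position by position, and it ignores case and surrounding punctuation. The function names are hypothetical.

```python
import string


def normalize(token):
    """Lower-case a token and strip punctuation from both ends."""
    return token.strip(string.punctuation).lower()


def word_recognition_rate(original, reconstructed):
    """Fraction of original words that reappear at the same position."""
    reference = original.split()
    hypothesis = reconstructed.split()
    if not reference:
        return 0.0
    hits = sum(normalize(r) == normalize(h) for r, h in zip(reference, hypothesis))
    return hits / len(reference)
```

For example, `word_recognition_rate("a printer is a peripheral", "a printer in a peripheral")` counts four of five words as recovered.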
- Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal, Caroline Sporleder.
Acoustic Side-Channel Attacks on Printers.
In Proceedings of the 19th USENIX Security Symposium, August 2010.
- SR Fernsehen (03.06.09)
- ZDF - Volle Kanne (10.07.09)
- WDR Servicezeit: Gesundheit (19.10.09)
- ZDF - Drehscheibe Deutschland (17.11.09)
- Bayerisches Fernsehen - Faszination Wissen (10.10.10)
Print media and websites
- Frankfurter Allgemeine Sonntagszeitung (17.05.09) "Geheimniss aus dem Druckerlärm"
- FAZ Online (19.05.09)
- Heise Newsticker (27.05.09)
- Süddeutsche Online (28.05.09)
- Spiegel Online (27.05.09)
- PC Welt (27.05.09)
- Zeit Online (28.05.09)
- Stern Online (28.05.09)
- Netzwelt.de (28.05.09)
- golem.de (27.05.09)
- CIO (02.06.09)
- Manager Magazin (06.07.09)
In case of questions, please contact:
|Prof. Dr. Michael Backes||[backes (at) cs (dot) uni-saarland (dot) de]|
|Dr. Markus Dürmuth||[duermuth (at) cs (dot) uni-saarland (dot) de]|
|Sebastian Gerling, M.Sc.||[sgerling (at) cs (dot) uni-saarland (dot) de]|