Fingerprinting, CDI & How to Deal With It
Updated February 18, 2013.
This site is no longer being maintained so anything below could still be accurate, or very outdated.
Clientless device identification (CDI), or generally just fingerprinting, is a very tedious topic with many shades of gray. To set the tone, it needs to be put simply—you can’t spoof your way out of this one. You may have the greatest security system for your car but if someone really wants your new Range Rover, they’ll snatch it from you. In a nutshell, that is analogous to CDI.
The concept behind fingerprinting is that an attacker can use many individual properties available from your system to reliably identify it, things like your screen size, user agent string, browser plugins, fonts, etc. Alone, these properties are not significant, nor personally identifying but they reveal a lot when combined into what becomes your device fingerprint. This means you just went from being the blonde to the blonde in the red dress near the window with an espresso. Get it?
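To make the "combined properties" idea concrete, here is a minimal sketch of how individually harmless values could be rolled into one identifier. The property names and values are invented for illustration, and hashing a sorted JSON dump is just one simple way to do it, not a description of how any particular fingerprinting company works.

```python
import hashlib
import json


def device_fingerprint(properties: dict) -> str:
    """Hash a set of browser/system properties into one identifier."""
    # Sort keys so the same properties always serialize the same way.
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical visitor data, for illustration only.
visitor_a = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0",
    "screen": "1920x1080x24",
    "timezone_offset_minutes": -60,
    "plugins": ["Shockwave Flash", "IcedTea-Web Plugin"],
    "fonts": ["DejaVu Sans", "Liberation Serif"],
}
# Same machine profile, but with one extra font installed.
visitor_b = dict(visitor_a, fonts=["DejaVu Sans", "Liberation Serif", "Comic Sans MS"])

print(device_fingerprint(visitor_a))
print(device_fingerprint(visitor_b))  # differs from the line above
```

None of the five values says who you are, yet a single extra font is enough to produce a different identifier.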
The problem which makes fingerprinting so tricky to elude is that today’s browsers throw so much information onto the internet, and are so integrated with the rest of the operating system, that if you tried to catch all these pieces to keep Humpty Dumpty together, you’d fail miserably and hate life.
This shotgun blast of data ranges from the mundane HTTP header info to your time zone, to the inescapable concept of clock skew, to other methods (speculatively) unknown outside of specific organizations. According to Ric Richardson, founder of Uniloc, the company which gave birth to Bluecava, Inc., "There are literally hundreds of things you can measure."
While "hundreds" strikes me as an exaggeration, there are many traits unique to certain browsers and operating systems which cannot be changed. Here is just one example you definitely didn’t see coming. A report by Securityblog.st explains, "Looking at the number of requests we can see that Firefox will make a second request for /favicon.ico if it receives a 404 on the first try. In testing no other browser implementation seemed to do this."
Translation: Firefox is the only browser which acts this way, so a simple Page Not Found error can be what undoes all your effort in browser spoofing. Obscuring your IP address will not help you here, either. Andrew Lewman is the executive director of the Tor Project, and in an interview for Guardian News UK’s Tech Weekly series he talks a bit about fingerprinting. The good stuff starts around 23m 30s into the recording.
There’s a new breed of companies out there... They actually say they don’t record IP addresses ’cause they don’t need it anymore. They fingerprint your browser profile, your cookie profile, your language profile... All that sorta stuff correlated together will start building big pictures about you.
Take a Closer Look
Let's get ourselves a visual example. Take a look at Panopticlick, an experiment set up in 2010 by the Electronic Frontier Foundation to see how many unique users would be detected by the site over its duration.
While the site is still adding to its number of visitors, its data is not being analyzed as aggressively as the results of the initial 4 months were. So when you see on the site that you’re unique among 2+ million users, this is (directly from the EFF themselves), "...unrepresentatively high when you’re being compared to a lot of older browsers and fewer newer ones." So don’t get too worked up about your Panopticlick score because unless you’re on Tor, most other people will have a result similar to yours.
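Panopticlick reports results as "bits of identifying information," which is just the base-2 logarithm of the "one in N" figure. A rough sketch of the arithmetic, using the 2 million figure above; the two per-trait ratios further down are made-up numbers, and adding their bits assumes the traits are independent, which real traits often are not.

```python
import math


def bits_of_identifying_information(one_in_n: float) -> float:
    """Convert a 'one in N browsers' ratio into bits."""
    return math.log2(one_in_n)


# Unique among roughly 2 million visitors:
print(round(bits_of_identifying_information(2_000_000), 1))  # about 20.9 bits

# Hypothetical traits: a plugin list shared by 1 in 500 browsers and a
# font list shared by 1 in 4000. If independent, their bits simply add.
combined = bits_of_identifying_information(500) + bits_of_identifying_information(4000)
print(round(combined, 1))  # same as log2(500 * 4000)
```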
The largest points of identification are system fonts and browser plugins (not to be confused with browser extensions). Chrome’s PDF viewer is a plugin, and Microsoft Silverlight renders in the browser via a plugin. So do Windows Media Player, Quicktime, Flash, Java, VLC and other media players, Ubuntu’s Google search enhancer, etc. The more weird plugins and fonts you have, the more unique you become.
While it’s good practice to not keep plugins you don’t need because unused somethings are exploitable somethings, most people have a Flash plugin and many also have Java web browser plugins. Through these two plugins, an attacker can detect your operating system and architecture, your screen resolution, your GPU driver vendor, your system language and your Linux kernel, if you’re running one. As another example of something unchangeable within the operating system: Flash reports whether it’s installed on a Linux or Apple OS and there’s no way to spoof something different.
Flash also uses what are referred to as Flash cookies (LSOs) and Java can even show whether you’re likely to be on a laptop computer or a desktop by listing your network connection interface(s) identifier. Additionally, Flash and Java are only two of the most popular browser plugins. There are dozens of others to add to your fingerprint.
Adobe Flash is sometimes included with Apple and Windows computers, and some popular Linux distros. Java usually is not, but some Linux distributions package OpenJDK and IcedTea plugins by default. Your choices to combat all of this are to uninstall Flash and Java completely, disable those plugins, block them with an extension or use a browser which supports on-demand plugins.
Take a look at this screenshot, it’s a result of the JonDonym browser test. Both Flash and Java can create a list of every font on your computer.
Windows is notorious for having literally hundreds of fonts (which look exactly the same!). A 2011 study on fingerprinting by researchers at the Budapest University of Technology and Economics describes how Linux and OS X have the upper hand over Windows here, because they contain far fewer and more uniform font sets.
Flash can be set up to not allow font enumeration so your system fonts won’t be detectable by your browser’s Flash plugin. See Adobe’s Flash administrator guide for this and other ways you may want to limit Flash’s behavior. This applies to Adobe Flash only; it doesn’t apply to Google Chrome's Pepper Flash.
Browser User Agent
You have a finer degree of control over your browser’s user agent string than any other single part of your fingerprint. The UA string identifies your browser, operating system and some specific attributes about them. If you choose to change your user agent, it’s better to change the entire thing to the default of another browser rather than only changing certain parts of it. Then you avoid using an unnatural combination of properties. For example, with each Chrome version change, there is a corresponding Webkit version change. UA strings for the same browser can differ from one operating system to another, too. Take a look at these two Opera user agents.
Now, don’t think that just because you can change your user agent, you must. Obviously an unconventional OS like FreeBSD paired with an uncommon browser like Midori would be more conspicuous, but among an updated Firefox, Internet Explorer or Chrome, even Safari or Opera, you'll still have many peers to blend in with. If you’re using Windows and IE, Chrome or Firefox, I’d say there’s no need to change anything. From a fingerprinting standpoint, you’ve already got one of the most common user agents possible, as long as other applications you've installed haven't tampered with it.
The best place to find a totally default Internet Explorer or Safari user agent is from a fresh Windows or OS X installation, most readily available from showroom display computers at a retail store. Chrome stable release numbers can be found at the Chrome release blog, Firefox’s from Mozilla’s website and there is also useragentstring.com.
Different browsers treat HTTP headers differently. Firefox and its forks allow thorough editing of that information while the other major browsers are immutable without extensions or addons, and even they are limited in effectiveness. The order in which this header information is sent to websites and the information itself also differs with each browser. Here is a screenshot of the HTTP headers for Safari on OS X. This is a showroom-default setup with cookies enabled and no Flash installed.
REMOTE_ADDR is the computer’s IP address which any website sees, but can be changed with a proxy, Tor or a VPN. You can also see some Google Analytics cookies. Browserspy.dk and MNO Privacy Checker are two great sites for examining your header content. I do NOT recommend changing your headers aside from only one point, and in a very specific circumstance. I’ll explain why on the next page.
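Since the order of the headers differs between browsers, the order alone can act as a signature, whatever the values say. Below is a minimal sketch of how a server could extract that order from a raw request. The two requests are invented; they do not reproduce the real header order of any actual browser.

```python
def header_order_signature(raw_request: str) -> tuple:
    """Return the header names of a raw HTTP request, in the order sent."""
    head = raw_request.split("\r\n\r\n", 1)[0]   # drop any body
    header_lines = head.split("\r\n")[1:]        # skip the request line
    return tuple(
        line.partition(":")[0].strip().lower()
        for line in header_lines
        if line
    )


# Two hypothetical clients sending identical headers in a different order.
request_a = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Accept: */*\r\n"
    "Accept-Language: en-US\r\n"
    "\r\n"
)
request_b = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: */*\r\n"
    "Accept-Language: en-US\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "\r\n"
)

print(header_order_signature(request_a))
print(header_order_signature(request_b))
```

Both requests claim the same user agent, but the ordering still tells them apart, which is one reason a spoofed UA string on its own is easy to catch.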
Your time zone is taken from your operating system’s time settings, but don’t think you can get away with disabling all timekeeping in the OS and have no time zone. Doing this will simply default you to a time zone output of 0 in the Panopticlick test. This is an arbitrary number and doesn’t correspond to any regional time zone format like UTC +/- x hours, but instead signifies that you have no system time set at all.
Screen Size & Color Depth
Linux and OS X will show a 24-bit color depth while Windows Vista and above will usually show 32-bit. The reason for this is sometimes hardware, such as a display or GPU that only supports 24-bit color. But where the hardware supports it, 32-bit color depth comes from using the standard 8 bits for each RGB channel, plus an additional 8 bits for opacity effects (alpha transparency, this is RGBA), thus 32 bits total. Both 24 and 32-bit color depths will still show 16.7 million colors (256^3).
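The arithmetic behind those numbers, as a quick sketch: the extra 8 bits in 32-bit mode go to the alpha channel, so the count of distinct colors stays the same.

```python
BITS_PER_CHANNEL = 8
LEVELS_PER_CHANNEL = 2 ** BITS_PER_CHANNEL           # 256 shades per channel

rgb_bits = 3 * BITS_PER_CHANNEL                      # red + green + blue
rgba_bits = 4 * BITS_PER_CHANNEL                     # ...plus alpha

colors = LEVELS_PER_CHANNEL ** 3                     # alpha adds no new colors
print(rgb_bits, rgba_bits)                           # 24 32
print(f"{colors:,}")                                 # 16,777,216
```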
Cookies are a well-known adversary by now but fortunately they can be easily controlled by the browser alone. Panopticlick's supercookie test refers to general DOM storage and Internet Explorer user data, and ActiveX data is kept in DOM storage too. Disallowing all cookies and local data in Chrome and IE will prevent DOM storage from being used at all. Chrome, IE, Opera and Firefox all delete DOM contents when you clear the browser's user data (Ctrl+Shift+Delete). That includes some Flash cookies.
Other Data Points
So that covers the stuff in the Panopticlick test but there is much more that can still be done. The more obscure fingerprint data points reach deeper into your system than simply analyzing your browser headers or probing with some Flash calls. Here's a small taste, and even these would mostly reside at the non-exotic end of the scale.
...our method can efficiently track changes in the dynamic IP address...and distinguish between different PCs behind a NAT...
As with any other single point of CDI, removing only the IP address from the fingerprinting still leaves many more powerful data sources remaining. So much so that IP addresses have become more useful for geolocation than identifying returning users and/or devices. Besides, state-level actors always have the option of linking together dynamic public IP addresses and customer accounts with help from internet service providers directly.
MAC Address & UUID
MAC addresses are the identifying number sets of a device's network interfaces, be they wired or wireless. Though a MAC lives in firmware, it can be spoofed (see next page). A network's access point will always see your hardware's MAC address, and depending on the services running on your device, so will other devices on the network.
UUID stands for universally unique identifier. This is a 128-bit number, usually written in hexadecimal, assigned to a hardware device. For example, Apple gives a UDID to iStuff but is moving towards replacing it with a UUID instead. Linux assigns disk partitions and external media a UUID, and Microsoft uses a variation called a GUID (globally unique identifier).
UUIDs and GUIDs aren’t too much of an issue and there’s nearly nothing you can do about them. Sometimes UUIDs can be changed or temporarily spoofed, but outside of events like this, the biggest concern would be how apps use a UUID for various purposes.
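For a feel of what these identifiers look like, here is a small sketch using Python's standard uuid module. Note that uuid.getnode() is documented to return the hardware address as a 48-bit integer, but falls back to a random number if it can't find one, so the printed value is not guaranteed to be your real MAC. The format_mac helper is my own, for illustration.

```python
import uuid


def format_mac(node: int) -> str:
    """Format a 48-bit integer as a colon-separated MAC address."""
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# A freshly generated random (version 4) UUID: 32 hex digits = 128 bits.
example_uuid = uuid.uuid4()
print(example_uuid)
print(len(example_uuid.hex) * 4, "bits")

# What local software can read as this machine's MAC address
# (may be a random stand-in, see the note above).
print(format_mac(uuid.getnode()))
```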
Referrers show up in website logs and tell whether you clicked a link to get to that website (and if so, which site you came from) or likely accessed the URL straight from the address or bookmark bar. This video from the 2011 Black Hat USA conference demonstrates a simple, recent use of referrers to help identify a user. Skip about 5 minutes into the video.
Referrers can be manipulated or disabled from within the browser. Since disabling referrers will break sites which require them (this is uncommon though), doing this through an extension will allow you to add exceptions for problematic sites. Disabling directly through the browser is all on or all off.
There’s no way to moderate referrers in Internet Explorer without using an external application like Fiddler. For Firefox, you can use Referrer Control or an about:config setting (see tSc’s Firefox tweak guide). Chromium-based browsers also have a Referer Control extension (not connected to Firefox's) but you could also use the --no-referrers command line switch. For Opera, use the settings in opera:config.
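To show what a website can read from the referrer, here is a minimal sketch of the server side. The classify_visit function and the host names are hypothetical; the only real detail is that the HTTP header is spelled "Referer" and is simply absent when the browser withholds it.

```python
from typing import Optional
from urllib.parse import urlsplit


def classify_visit(referer_header: Optional[str], own_host: str) -> str:
    """Guess how a visitor arrived, based only on the Referer header."""
    if not referer_header:
        # Typed URL, bookmark, or a browser configured to send no referrer.
        return "direct or referrer withheld"
    source_host = urlsplit(referer_header).hostname
    if source_host == own_host:
        return "internal navigation"
    return f"arrived from {source_host}"


print(classify_visit(None, "example.org"))
print(classify_visit("http://example.org/page-one", "example.org"))
print(classify_visit("http://search.example.com/?q=fingerprinting", "example.org"))
```

The last case also shows why referrers leak more than a site name: the full URL, search terms included, is handed over.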
In 2005, researchers at the University of California, San Diego demonstrated how clock skew can fingerprint devices. The team used TCP and ICMP timestamps and the computer’s system time for client time readings. In summary, "...the clock skew estimates for any given machine are approximately constant over time, but that different machines have detectably different clock skews."
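The core idea can be sketched in a few lines: record pairs of (your clock, their clock), and the slope of the drift between them is the skew. This sketch uses an ordinary least-squares fit on synthetic data, which is a simplification I've swapped in for readability and not the estimation method from the paper; the 50 ppm figure is made up.

```python
def estimate_skew_ppm(samples):
    """Estimate clock skew in parts per million.

    samples: list of (local_seconds, remote_seconds) pairs, where
    remote_seconds is the timestamp reported by the observed machine.
    """
    local0, remote0 = samples[0]
    elapsed = [local - local0 for local, _ in samples]
    # Offset = how far the remote clock has drifted from ours so far.
    offsets = [(remote - remote0) - t for (_, remote), t in zip(samples, elapsed)]

    n = len(samples)
    mean_t = sum(elapsed) / n
    mean_o = sum(offsets) / n
    numerator = sum((t - mean_t) * (o - mean_o) for t, o in zip(elapsed, offsets))
    denominator = sum((t - mean_t) ** 2 for t in elapsed)
    return numerator / denominator * 1e6


# Synthetic data: a remote clock running 50 ppm fast, sampled once a
# minute for an hour (values are invented for illustration).
samples = [(float(t), 1000.0 + t * (1 + 50e-6)) for t in range(0, 3601, 60)]
print(round(estimate_skew_ppm(samples), 3))  # close to 50
```

Real measurements are noisy because of network delay, which is why it takes many samples over time to pin the slope down.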
To prevent ICMP timestamping, ICMP type 13 (along with the type 14 response) could be disabled in the system or blocked at a firewall. As for TCP, Windows XP and 7 (but not Vista) ship with TCP timestamps disabled, while Apple OS X, iOS and many Linux distributions come with timestamps enabled. You can verify your own settings with SpeedGuide's TCP/IP analyzer.
TCP Timestamping is not necessary on residential internet connections and can be disabled. You can use SpeedOf.Me to check for any negative effects and this wiki page from the University of Pennsylvania explains how to change TCP timestamp options for Linux, OS X and Windows.
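On Linux, the setting in question is the net.ipv4.tcp_timestamps sysctl, which is exposed as a file under /proc. A small sketch for checking it follows; it only reads the value and returns None on systems where the file doesn't exist (OS X, Windows). Treating every nonzero value as "enabled" is my simplification.

```python
from pathlib import Path

# Linux-only location of the net.ipv4.tcp_timestamps setting.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_timestamps")


def describe_tcp_timestamps(raw_value: str) -> str:
    """Interpret the sysctl value: 0 is off, anything else is on."""
    return "disabled" if raw_value.strip() == "0" else "enabled"


def read_tcp_timestamps(path: Path = SYSCTL_PATH):
    """Return 'enabled'/'disabled', or None if the setting can't be read."""
    try:
        return describe_tcp_timestamps(path.read_text())
    except OSError:
        return None


print(read_tcp_timestamps())
```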
An alternative to disabling TCP timestamps is to mask them as discussed in this paper by researchers at University of Massachusetts. They describe evading fingerprinting by TCP timestamp clock skews while preserving the functionality of keeping timestamps enabled. While this might be good for anti-forensics Linux distros, you must either compile your own kernel or create a module.
But even if you spoof or disable ICMP and TCP timestamps, clock skews can be used from CPU or GPU processes, among other places.
Network Traffic Analysis
In a study by researchers at the University of North Carolina and Carnegie Mellon University, it’s explained how network traffic analysis can be used to detect a specific browser with a minimum accuracy of 71%. They don’t even examine packet contents, only the coarse flow records and from there, the browser’s behavior during retrieval requests does the rest of the work.
Like the Securityblog Firefox 404 example mentioned above, this study reveals traffic flow characteristics which are exclusive to individual browsers and largely impossible to change. For example, during a website retrieval request to cnn.com, Safari sent the fewest packets overall and Firefox the most. The study gathered data from October to December of 2008 and the browsers tested were Firefox 2, Internet Explorer 7, Opera 9.51 and Safari 3.1. Chrome was not included because it hadn’t been officially released yet.
Again, these are just some of the many elements of a device fingerprint and how obscure or common they can be. The field of CDI is rather roughly cultivated at this point in time so its potential overshadows its current use. (...or does it?) So where do we go from here? Read on, young grasshopper.
Part II: Separating You From You