You are not paranoid – they really are watching you. Criminals, web companies and governments all have reason to spy on your online life, and the methods they use are becoming increasingly sophisticated.
This article is about what kind of information you share, who can find out what about you, and how to stop them. What you do with this information is, of course, up to you. Whether you are concerned about the scale of information gathered by web companies or you are hiding from a corrupt government, read on to learn how to keep your data yours.
Tools needed: Wireshark
With Wireshark you can find out in a few easy steps just how much information you reveal to the world. This tool captures all traffic passing through your network interfaces and lets you search and filter for particular patterns. In other words, it records whatever data flows through your network interface (wired or wireless), so you can see what (potentially malicious) people who are logged on to, or have hacked into, your network can see.
Once it is installed, run Wireshark with root privileges so that it can capture packets. Use the command sudo wireshark, or sudo ./wireshark from the install directory. You will then get a message warning that you have started it with ‘super user privileges’, which is fine for this tutorial. If you plan to keep using Wireshark, set it up properly later using the included guide.
Click on your network device in the interface list (probably eth0 for wired and wlan0 for wireless) to start a ‘capture’. As soon as you start using the network, the top part of the screen will fill with variously colored packets. The tool has a filter to help you make some sense of this multi-colored mess. For example, you can keep a prying eye on google.com searches using this filter: http.request.full_uri contains "google.com/search?q="
If you now do a search using http://google.com, it will appear in the list and the search term will be in the info column. A similar technique could be used on any of the popular search engines.
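To show how little effort it takes to turn a captured URL back into the words you typed, here is a minimal Python sketch. The URL is a made-up example rather than real capture output, and the function name search_terms is purely illustrative.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(url):
    """Return the values of the 'q' query parameter found in a URL."""
    query = urlsplit(url).query
    # parse_qs decodes '+' and %XX escapes, so the terms come out readable.
    return parse_qs(query).get("q", [])

# Hypothetical captured URL, for illustration only.
captured = "http://www.google.com/search?q=linux+privacy&hl=en"
print(search_terms(captured))
```

Anyone who can see the request can run the same few lines on it, which is the whole point of the exercise.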
You may not be concerned about people being able to read your search queries, but exactly the same technique can be used to pull out usernames and passwords that are sent in plain text. For example, most forums send passwords in plain text because a forum account is not seen as a serious security risk and secure certificates can be expensive.
To see how easily usernames and passwords can be sniffed, go to a forum of your choice, fire up Wireshark and start a packet capture using the filter: http.request.uri contains "login.php"
When you log in to the forum, the filter will catch the packet. The line-based text data will contain: Username=XXX&password=YYYY&login=Log+in
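The captured line is ordinary URL-encoded form data, so no cracking is needed to read it. As a small sketch (the helper name decode_form is an assumption for illustration), Python's standard library decodes it directly:

```python
from urllib.parse import parse_qsl

def decode_form(body):
    """Decode a URL-encoded form body into a dict of field names and values."""
    return dict(parse_qsl(body, keep_blank_values=True))

# The placeholder login line from the capture described above.
body = "Username=XXX&password=YYYY&login=Log+in"
fields = decode_form(body)
print(fields["Username"], fields["password"])
```

The password field comes out exactly as it was typed, because nothing encrypted it on the way.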
How many computers are you sharing this information with? Depending on your network set-up, probably every other computer on the LAN or wireless network, as well as every computer that sits on the route between you and the server of the website you’re communicating with.
To discover what these are, use traceroute to map the path the packets take. For example: traceroute www.google.com
If your computer’s behind a firewall, you may find that this just outputs a series of asterisks. In this case, you can use a web-based traceroute such as the ones indexed at www.traceroute.org. This list is a little out of date and not all the servers are still hosting traceroute, but you should be able to find one that works in your area.
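If you want to count the machines on the route rather than read them off the screen, the output can be parsed. The sketch below assumes the common Linux output layout of "hop number, host name, (address), timings"; the sample text uses reserved documentation addresses and is invented for illustration, not taken from a real trace.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")

def parse_hops(output):
    """Return (hop_number, address) pairs; address is None for '* * *' hops."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line, not a hop
        number, rest = int(match.group(1)), match.group(2)
        address = IPV4.search(rest)
        hops.append((number, address.group(1) if address else None))
    return hops

# Invented sample output, for illustration only.
sample = """traceroute to www.example.com (192.0.2.80), 30 hops max, 60 byte packets
 1  router.example (192.0.2.1)  1.2 ms  1.1 ms  1.0 ms
 2  * * *
 3  isp-gw.example (198.51.100.7)  9.8 ms  9.9 ms  10.1 ms
"""
for number, address in parse_hops(sample):
    print(number, address or "no reply")
```

Every hop that answers is a machine your plain-text traffic passes through.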
The most basic piece of the web privacy puzzle is the Secure Sockets Layer (SSL). This rather obscure-sounding protocol is a way of creating an encrypted channel between an application running on your computer and an application running on another computer. If you use services with unsecured passwords (and there’s no reason you shouldn’t as long as you understand the implications), then it’s important not to use the same password for a secure service. For each insecure network protocol, there’s a secure one that does the same basic task but through an encrypted channel.
Insecure Protocols: FTP, Telnet, RCP and HTTP (the most common). Anytime one of these protocols is used, an eavesdropper can read what you send.
Secure Protocols: SFTP, SCP, SSH and HTTPS
For web browsing, HTTPS is the most important. As you saw before, many computers can read what you send over HTTP, but if you perform the same test again using Google’s secure web page – https://www.google.com (note the ‘s’) – you will find that the information does not appear in Wireshark.
Some web browsers show a padlock icon (next to the URL) to indicate that you are connected to a secure website, but this can easily be spoofed using favicons. If you’re unsure, click on the icon. A legitimate padlock will open a pop-up telling you about the security on the page.
HTTPS/SSL ensures only that the information can’t be read as it’s being transmitted between your computer and the server. Once there, the organization running the server could pass it to third parties, or transmit it insecurely between their data centers. Once you send information, you lose control of it. Before hitting submit, always ask yourself, do you trust the organization receiving the data? If not, don’t send it.
HTTPS is a great way to keep your web browsing private. However, because of the way it’s bolted on top of HTTP it isn’t always easy to make sure you use it. For example, if you use https://www.google.com to search for ‘wikipedia’, it will direct you to the HTTP (non-secure) version and not the secure HTTPS version.
The Electronic Frontier Foundation (EFF), a non-profit dedicated to defending digital rights, has developed an extension for Firefox, HTTPS Everywhere, that forces the browser to use HTTPS wherever it is available. A Chrome version is currently in beta. Install the extension to keep your web usage away from eavesdroppers.
Like all forms of encryption, SSL has a weakness, and that is the keys, which are stored in certificates. Just as a hacker can easily get into your accounts if they know your password, they can easily eavesdrop on SSL-encrypted data – or spoof it – if they can trick your computer into using their certificates. The main point here is that certificates are stored on the computer, not in your memory like passwords. If an attacker can put files on your system, they can break SSL encryption. You are at particular risk when using a computer on which you haven’t personally installed the operating system, such as a work machine or one at an internet cafe.
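The idea of forcing HTTPS can be sketched in a few lines. This is a much simplified illustration, not the extension's actual implementation: the host list and the function name upgrade_to_https are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of hosts assumed to serve the same pages over HTTPS.
HTTPS_CAPABLE = {"www.google.com", "en.wikipedia.org"}

def upgrade_to_https(url, capable_hosts=HTTPS_CAPABLE):
    """Rewrite an http:// URL to https:// when the host is known to support it."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in capable_hosts:
        return urlunsplit(parts._replace(scheme="https"))
    return url  # leave everything else untouched

print(upgrade_to_https("http://en.wikipedia.org/wiki/Privacy"))
print(upgrade_to_https("http://forum.example/login.php"))
```

A rewrite like this only helps for sites that actually offer HTTPS, which is why the second URL is left as it is.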
You should be able to view the current certificates and authorities in your browser’s security settings, but it isn’t always easy to identify things that shouldn’t be there. Here, live distros come to the rescue, since you can carry a trusted operating system with you and use that whenever you are at a computer of dubious provenance.
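For a rough idea of how many authorities a machine trusts, you can ask Python's ssl module which certificate authorities it has loaded. Treat this as a sketch under two assumptions: it inspects the system store that Python uses, which is not necessarily the list your browser uses, and on some systems the list stays empty until a connection has been made. The function name trusted_authorities is illustrative.

```python
import ssl

def trusted_authorities():
    """Return sorted names of the CA certificates loaded in the default context."""
    context = ssl.create_default_context()
    names = set()
    for cert in context.get_ca_certs():
        # 'subject' is a tuple of one-item tuples such as (("organizationName", "X"),).
        subject = dict(item[0] for item in cert.get("subject", ()))
        names.add(subject.get("organizationName")
                  or subject.get("commonName")
                  or "(unnamed)")
    return sorted(names)

authorities = trusted_authorities()
print(len(authorities), "trusted authorities loaded")
```

An unfamiliar name in the output is not proof of tampering, but it is a reason to look more closely.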
Using SSL will keep your data safe from eavesdroppers, but what if the companies that you’re communicating with are spying on you? Google, Facebook, Twitter and others have built business models out of providing users with a free service in return for information about them. This information can then be used to target advertisements at you. Twitter has even gone a step further and sold users’ tweets to market researchers. Some people may consider this a fair trade, but privacy campaigners are becoming increasingly concerned about the sheer quantity of data these companies are holding about us. And this data goes way beyond what we voluntarily hand over to them. Both Google and Facebook have established relationships with literally millions of other websites to help them track your movements around the web using cookies – these are pieces of information stored on your computer to help sites identify you when your browser reaches them. To find out just how much these companies are tracking us, we can use Wireshark to monitor our network connection and watch for the cookie data being sent back.
Start Wireshark, capture on your main network interface, and enter http.cookie in the filter box. This will now show only packets in which cookies are being sent to web servers. To display a little more of the information that is being acquired, go to the middle pane and click on the arrow next to Hypertext Transfer Protocol. There are two fields in here that allow the web company to track us: the host and the referrer (which HTTP itself spells ‘Referer’). Right-click on each of these and select ‘Apply as Column’ to add them to the main view. Together, these two fields allow the host (the organization receiving the cookie) to monitor your activity on the referrer (the site you were visiting). In addition to this, the host uses a unique ID in the cookie to track your activity between sessions.
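The three pieces of information involved are plain header lines in the request. The following sketch pulls them out of a raw request; the request text and domain names are invented for illustration, and parse_request_headers is a hypothetical helper, not part of Wireshark.

```python
def parse_request_headers(raw_request):
    """Return the headers of a raw HTTP request as a dict with lower-case names."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line (GET ...)
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

# Invented request, for illustration only.
sample = (
    "GET /pixel.gif HTTP/1.1\r\n"
    "Host: ads.tracker.example\r\n"
    "Referer: http://news.example/story.html\r\n"
    "Cookie: id=abc123\r\n"
    "\r\n"
)
headers = parse_request_headers(sample)
print(headers["host"], "saw", headers["cookie"], "on", headers["referer"])
```

In this made-up case the tracker learns which story was being read and which returning visitor was reading it.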
Google uses its advertising services to monitor what we do, whereas Facebook uses its Like buttons. There’s no way of knowing exactly what these companies are doing with the data they collect – we can see only what they’re receiving.
Fortunately, most browsers allow you to control cookies. Depending on your personal feelings, you may choose to limit cookies to certain websites (where they can be useful for remembering preferences), or block them completely. In Firefox, go to Edit > Preferences > Privacy and set ‘Firefox will’ to ‘Use custom settings for history’. If you then untick ‘Accept cookies from sites’, Firefox will not store any cookies. In Safari, selecting Safari > Private Browsing is supposed to block the storage of cookies and other data, but this is not the whole story: to make sure nothing is left behind, you must also open Terminal and flush the directory cache manually with the dscacheutil utility. In Chromium, go to Preferences (the spanner by the address bar) > Under the Bonnet and change the Cookies setting to ‘Block sites from setting any data’. Both Firefox and Chromium also give you the option of blocking third-party cookies, which means blocking cookies from domains other than that of the current website. If you do this, websites can still store data about you such as your preferences, and can track your movements within the site, but other sites won’t be able to follow your movements once you leave the domain. This will stop companies from tracking your movements across the web.
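The test for whether a request counts as third-party comes down to comparing two domains. The sketch below uses a deliberately naive rule, treating the last two labels of a host name as the site; this breaks for suffixes such as co.uk, and real browsers consult a list of public suffixes instead. The function names and domains are assumptions for illustration.

```python
from urllib.parse import urlsplit

def site_of(hostname):
    """Naive 'site' of a host name: its last two labels (an assumption)."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])

def is_third_party(request_host, referer_url):
    """True when the request goes to a different site than the page that caused it."""
    page_host = urlsplit(referer_url).hostname
    if page_host is None:
        return False  # no usable referrer, so nothing to compare against
    return site_of(request_host) != site_of(page_host)

print(is_third_party("ads.tracker.example", "http://news.example/story.html"))
print(is_third_party("static.news.example", "http://www.news.example/"))
```

With third-party cookies blocked, only requests of the second kind carry a cookie.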
If you set this up and then run the cookie tracking in Wireshark as above, you will see that the referrer and the host are always the same domain. For many users, this will be a happy medium: cookies can serve their original purpose of letting sites recognize returning visitors, while organizations are blocked from following their online movements.
Cookies aren’t the only way that websites can track you. Even if you have browser cookies disabled, sites can still store tracking information on your computer using Local Shared Objects (LSOs). These function exactly like cookies, except that they’re accessed through Flash rather than directly through your browser. To view and control which websites are using these, go to Macromedia’s Website Storage Settings Panel at www.macromedia.com/support/documents/en/flashplayer/help/settings_manager07.html
Webmasters intent on tracking you can use a combination of techniques to create zombie cookies. These store the same information in more than one place so that when you destroy one copy, it is regenerated from the others. For example, if you delete all browser cookies, the website can recreate the cookie from an LSO and vice versa. As long as one of these remains, all the others can regenerate. Samy Kamkar has taken this to the extreme at samy.pl/evercookie, where he uses 12 different methods to resurrect the data!
Running the NoScript extension for Firefox should prevent this type of cookie from working, but because the test page itself relies on scripts, it also disables the means of testing it. Neither Private mode in Firefox nor Incognito mode in Chromium is able to prevent this. If you need to be sure your web browser isn’t being tracked across sessions, the best solution is to use a non-persistent system. This is a system that doesn’t carry any information over from one session to the next. You can still be tracked during a browsing session, but not between sessions.
For Linux users, the most obvious option is a live DVD. This doesn’t have to be a physical disc running live – an ISO running in a virtual machine will do the job. This means that all data that the website can use to track you is reset each time you restart the virtual machine. You can also run more than one virtual machine simultaneously to prevent anyone linking two sessions. If it ever comes into being, a live version of Boot to Gecko would be a particularly convenient way to do this, but this is still in development.
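The regeneration trick is easy to model. The class below is a toy simulation with made-up storage names, intended only to show why deleting a single copy achieves nothing; it is not how evercookie itself is written.

```python
class ZombieCookie:
    """Toy model of an identifier mirrored across several storage locations."""

    def __init__(self, store_names):
        self.stores = {name: None for name in store_names}

    def write(self, value):
        for name in self.stores:
            self.stores[name] = value

    def clear(self, name):
        self.stores[name] = None

    def read(self):
        """Return the identifier if any copy survives, restoring the others."""
        survivors = [v for v in self.stores.values() if v is not None]
        if not survivors:
            return None
        self.write(survivors[0])  # regenerate every deleted copy
        return survivors[0]

# Hypothetical storage names, for illustration only.
zombie = ZombieCookie(["http_cookie", "flash_lso"])
zombie.write("visitor-42")
zombie.clear("http_cookie")   # the user deletes browser cookies
print(zombie.read())          # the site still recognizes the visitor
print(zombie.stores)          # and the deleted copy is back
```

Only clearing every store before the next read makes the identifier disappear.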
There is one more, slightly more devious technique that websites can use to identify you: amalgamating all the information about the capabilities of your browser and system into a digital fingerprint. Because of the amount of information that your browser will, if asked, reveal about you, this fingerprint can often be used to uniquely identify you on a site.
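As a rough sketch of the idea, a site could combine whatever attributes it collects into a single stable identifier by hashing them. The attribute names and values below are invented, and hashing with SHA-256 is an assumption chosen for the example rather than a description of any particular tracker.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of browser attributes into one stable hexadecimal identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Invented attribute values, for illustration only.
browser = {
    "user_agent": "ExampleBrowser/1.0 (X11; Linux x86_64)",
    "screen": "1920x1080x24",
    "timezone_offset": -60,
    "fonts": ["DejaVu Sans", "Liberation Serif"],
}
print(fingerprint(browser))
```

No cookie is stored anywhere: the same browser simply produces the same value on every visit, and changing one attribute produces a different one.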
Once again the EFF is active in this area, and hosts a website at panopticlick.eff.org that shows you what your fingerprint is and how unique it makes you. If you are concerned about being tracked this way, the best way to prevent it is to stop scripts from running. This massively reduces the amount of information that a website can use to form the fingerprint. The NoScript extension for Firefox provides an easy way to control which scripts run, although blocking scripts will limit the function of many interactive websites.
Web pages are made up of a number of different elements that your browser reassembles into a single document. These elements may come from many different places, organizations and servers, and any of them could carry some degree of monitoring using a technique called web bugging (also known as web beacons or pixel tags). Web bugs use images to generate HTTP requests that log your activity with a different server from the one hosting the website. They could potentially track you using browser fingerprinting, but they’re also used more widely. They are not restricted to web pages and can be used in any HTML document. Most commonly they are used by spammers to identify active email addresses. If you open an email containing one of these images, the spammer learns that you are checking the address and that you can be persuaded to open spam emails. Fortunately, most email clients and webmail providers disable image loading by default.
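One common form of web bug is an image declared as a single pixel in size. The sketch below looks for such images in a piece of HTML; it is a heuristic under that assumption and will miss bugs that are hidden in other ways. The HTML snippet and URLs are invented for illustration.

```python
from html.parser import HTMLParser

class TinyImageFinder(HTMLParser):
    """Collect the src of every <img> declared as 0 or 1 pixel wide and high."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        tiny = ("0", "1")
        if attributes.get("width") in tiny and attributes.get("height") in tiny:
            self.suspects.append(attributes.get("src"))

def find_web_bugs(html):
    finder = TinyImageFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects

# Invented email body, for illustration only.
message = (
    '<p>Special offer!</p>'
    '<img src="http://tracker.example/open.gif?id=123" width="1" height="1">'
    '<img src="logo.png" width="200" height="50">'
)
print(find_web_bugs(message))
```

The identifier in the image URL is what ties the request back to one particular recipient.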
Source: LinuxFormat, EFF.org