Author: Scott Robbins
Bulk meta-data collection as a tactic for counter-terrorism agencies is widely used by liberal democracies. This paper provides some clarity regarding what bulk meta-data collection means, how it is regulated, and how the collected meta-data is used for counter-terrorism purposes. We focus on the US National Security Agency (NSA) and the UK’s Government Communications Headquarters (GCHQ), although many other intelligence agencies also collect meta-data in bulk. We also discuss how other agencies within the “Five Eyes” (Australia, Canada, the UK, the US, and New Zealand) share this collected meta-data. In section 3 the laws and policies surrounding bulk meta-data collection are discussed. We focus on US law because the US is the most discussed and analysed practitioner of bulk meta-data collection, and US practice plays a direct and indirect role in other nations’ bulk meta-data collection and use. We discuss how US law compares with EU law and how US law interacts with sharing agreements with other countries. Finally, we discuss how this bulk collected meta-data is used.
This section takes a look at the major concepts with regard to bulk meta-data collection: bulk, meta-data, and collection. Also discussed is the “value” of bulk data collection in terms of the capabilities it enables. We do not discuss the moral or social values around those capabilities. These are important issues – it may well be that these capabilities provide intelligence analysts with very little intelligence and therefore the practice of bulk data collection is an unnecessary intrusion into people’s privacy. However, these questions do not concern us here.
Meta-data is often described as “data about data”. This, at first glance, appears rather unhelpful; however, it does point to the main difference between data and meta-data: it is the relationship with other data that makes some piece of information meta-data. To take a common example, imagine I send an email to you at 10:13 AM today. Whatever is contained in the email, the content (the subject and message), would be the primary data. However, the time it was sent (10:13 AM), who sent it (me), who it was sent to (you), etc., would be the meta-data: that is, data about the email, rather than the email itself.
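To make the distinction concrete, the sketch below uses Python's standard `email` module to separate a message into meta-data and content. The addresses and date are hypothetical, and, following the example above, the subject line is treated as part of the content rather than as meta-data.

```python
from email import message_from_string

# A hypothetical message sent at 10:13 AM (addresses and date are illustrative).
RAW_MESSAGE = (
    "From: me@example.org\n"
    "To: you@example.org\n"
    "Date: Tue, 14 Mar 2017 10:13:00 +0000\n"
    "Subject: Lunch\n"
    "\n"
    "Shall we meet at noon?\n"
)


def split_email(raw: str) -> tuple[dict[str, str], dict[str, str]]:
    """Separate an email into meta-data (data about the email) and content."""
    msg = message_from_string(raw)
    # Meta-data: who sent it, to whom, and when.
    meta = {field: msg[field] for field in ("From", "To", "Date")}
    # Content: the subject and message, i.e. the primary data.
    content = {"Subject": msg["Subject"], "Body": msg.get_payload().strip()}
    return meta, content


if __name__ == "__main__":
    meta, content = split_email(RAW_MESSAGE)
    print(meta)
    print(content)
```

Note that the meta-data fields are short, structured values, while the content is free text; this asymmetry is what the next paragraph turns on.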
Meta-data is useful because it is easily readable by machines. Much work has been done to have machines understand languages and parse text; however, human language poses significant challenges for automated processing, and analyzing it is processor intensive. In contrast, the meta-data associated with that information is easy for machines to analyze. With telephony meta-data, for example, a machine can tell you a lot about someone’s life. Machine analysis can tell the user who the target talks to most often and for how long, whether or not they have medical problems (because they have been calling a doctor a lot), how often they order pizza, etc. For a machine to learn the same information from the complex primary data is extremely difficult and unreliable.
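The kind of analysis described above can be sketched in a few lines of Python over telephony meta-data. The record layout mirrors the telephony fields listed below (caller, receiver, time, duration), but the class name, field names, and placeholder identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One telephony meta-data record (hypothetical schema)."""
    caller: str
    receiver: str
    start: str        # ISO 8601 timestamp of the call
    duration_s: int   # call length in seconds


def contact_summary(records: list[CallRecord], target: str) -> list[tuple[str, int, int]]:
    """Rank the target's contacts by number of calls, with total talk time."""
    calls = Counter()
    seconds = defaultdict(int)
    for rec in records:
        if rec.caller == target:
            other = rec.receiver
        elif rec.receiver == target:
            other = rec.caller
        else:
            continue  # the call does not involve the target
        calls[other] += 1
        seconds[other] += rec.duration_s
    # Most frequent contacts first, each with their total duration in seconds.
    return [(other, n, seconds[other]) for other, n in calls.most_common()]


# Hypothetical records; no content of any call is needed.
CALLS = [
    CallRecord("target", "friend", "2017-03-01T09:00", 120),
    CallRecord("friend", "target", "2017-03-02T18:30", 300),
    CallRecord("target", "clinic", "2017-03-03T08:15", 60),
    CallRecord("target", "friend", "2017-03-04T12:00", 30),
    CallRecord("someone", "else", "2017-03-04T12:05", 45),
]

if __name__ == "__main__":
    for other, n, secs in contact_summary(CALLS, "target"):
        print(f"{other}: {n} calls, {secs} s")
```

Here the output shows "friend" as the most frequent contact and a call to "clinic", the kind of inference (such as a possible medical issue) the paragraph describes, derived purely from meta-data.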
It is because of its capacity for revelation that meta-data is used in counter-terrorism. Important examples of information and meta-data for counter-terrorism purposes include:
- Telephony meta-data: caller number, receiver number, duration of call, time of call
- Email meta-data: sender email address, receiver email address, IP address of email service
- Internet History: URL address, length of visit, IP address of visitor
- Financial transactions
In order for this data to be useful, however, it must be collected. While a discussion of the programs and methods for collection will occur in section 2, we must briefly say something about what it means for the data to have been collected by the government.
Intelligence agencies first must find and copy a signal. This signal could be carried by the fiber optic cables serving as the backbone of the internet, or by telephone wires. The signal is copied, extracted, and filtered. The filters are a set of discriminants, which are essentially search terms. The data that makes it through these filters is stored on government servers. Once the data reaches these servers and is available to intelligence analysts, it has been collected. This places a lot of importance on the filtering process.
Restricting the term “collected” to only those meta-data that make it to government storage servers means that although the government has access to much of the data that streams through the internet, it actually collects a very small amount. It also means that data stored on corporate servers (e.g. Facebook, Google), which the government can gain access to through a warrant, does not count as data collected by the government.
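The copy-then-filter step can be sketched as follows. This is a minimal illustration, assuming a hypothetical record format with `sender` and `receiver` fields and a discriminant set of email addresses; real collection systems are not publicly documented at this level of detail. Only records that pass the filter are kept, which is why so little of the copied traffic counts as collected.

```python
from typing import Iterable, Iterator


def collect(stream: Iterable[dict], discriminants: set[str]) -> Iterator[dict]:
    """Yield only the copied records that match a discriminant.

    Records that pass the filter are what would end up on government
    storage servers, i.e. what counts as 'collected' in the sense used here.
    """
    for rec in stream:
        if rec["sender"] in discriminants or rec["receiver"] in discriminants:
            yield rec


# Hypothetical copied traffic: one meta-data record per message.
COPIED = [
    {"sender": "alice@example.org", "receiver": "bob@example.org"},
    {"sender": "carol@example.org", "receiver": "dave@example.org"},
    {"sender": "erin@example.org", "receiver": "alice@example.org"},
    {"sender": "frank@example.org", "receiver": "grace@example.org"},
]
DISCRIMINANTS = {"alice@example.org"}

if __name__ == "__main__":
    stored = list(collect(COPIED, DISCRIMINANTS))
    print(f"copied {len(COPIED)} records, collected {len(stored)}")
```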
Targeted meta-data collection is the collection of meta-data associated with any person of interest to the intelligence community. A person of interest would be a “target”. The data is collected by creating a “discriminant” associated with a target or a group of targets. For example, Osama bin Laden was a person of interest. Any identifier (e.g. email, phone number, etc.) associated with Osama bin Laden would be a discriminant to filter that data into intelligence community servers. In contrast, bulk collection refers to untargeted collection, in which a large portion of the collected meta-data is not associated with current targets.
The collection of data associated with people like Osama bin Laden, then, is targeted collection. This example gives a very specific target; however, targeted vs. bulk is not a clear-cut distinction. Instead, there is a continuum going from extremely targeted to indiscriminate bulk meta-data collection. Using Osama bin Laden as a discriminant is extremely targeted collection. Collecting all data travelling through the internet sits at the opposite end of the spectrum as indiscriminate collection. In between there are discriminants like “all data associated with Al Qaeda” or “all data travelling to and from Pakistan”. Most would call the former targeted collection, because anyone associated with Al Qaeda is likely to be a person of interest. The latter, by contrast, is bulk collection, as much of the data will be about persons not of direct or immediate interest to the intelligence community.
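One way to make this continuum concrete is to measure, for a given discriminant, what share of the collected records actually involve a current target: the lower the share, the closer the collection sits to the bulk end. The sketch below is illustrative only; it assumes hypothetical records carrying a `country` field and treats each discriminant as a simple predicate.

```python
def share_on_target(collected: list[dict], targets: set[str]) -> float:
    """Fraction of collected records that involve a current target."""
    if not collected:
        return 0.0
    hits = sum(1 for r in collected if r["sender"] in targets or r["receiver"] in targets)
    return hits / len(collected)


# Hypothetical traffic; "t1" is the only current target.
TRAFFIC = [
    {"sender": "t1", "receiver": "x1", "country": "PK"},
    {"sender": "x2", "receiver": "x3", "country": "PK"},
    {"sender": "x4", "receiver": "x5", "country": "PK"},
    {"sender": "x6", "receiver": "x7", "country": "FR"},
]
TARGETS = {"t1"}

# Three discriminants, ordered from most targeted to indiscriminate.
DISCRIMINANTS = {
    "target identifiers": lambda r: r["sender"] in TARGETS or r["receiver"] in TARGETS,
    "all traffic to/from PK": lambda r: r["country"] == "PK",
    "everything": lambda r: True,
}

if __name__ == "__main__":
    for name, keep in DISCRIMINANTS.items():
        collected = [r for r in TRAFFIC if keep(r)]
        print(f"{name}: {len(collected)} collected, "
              f"{share_on_target(collected, TARGETS):.2f} on target")
```

As the discriminant broadens, more is collected but a smaller share of it concerns anyone of current interest, which matches the working definition of "bulk" used in this paper.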
Why not simply collect data on identifiers related to current targets? This issue was discussed at length by a committee formed by former president Barack Obama. He wanted to know if there were technical alternatives to bulk data collection. While the committee made no statements regarding the actual value that bulk data collection provides, it did come to the conclusion that no other options currently exist which provide the same capabilities to intelligence analysts that bulk data collection does.
The most important capability which bulk data collection provides is access to historical information. When US intelligence discovers a new target through human informants, bulk data collection allows them to look through vast amounts of data about the past regarding identifiers associated with this target. Without bulk data collection, US intelligence could only collect this target’s present and future electronic communications. If the suspect had sent many emails whose meta-data could reveal a terrorist cell, this information would not be in the intelligence database. However, if meta-data is already collected in bulk, these emails may have been collected through bulk intelligence gathering – revealing many new targets. Thus bulk collection presents a great opportunity for retrospective analysis; once a target has been identified, the bulk collection presents myriad intelligence opportunities for observation, analysis and subsequent targeted surveillance.
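The retrospective query described above can be sketched as a search over an already-collected store for records involving a newly identified identifier, restricted to the period before the target was identified. The store, field names, and dates below are hypothetical.

```python
from datetime import datetime


def past_contacts(store: list[dict], identifier: str, before: datetime) -> set[str]:
    """Identifiers that exchanged messages with `identifier` before `before`."""
    contacts = set()
    for rec in store:
        if rec["time"] >= before:
            continue  # only look at the historical record
        if rec["sender"] == identifier:
            contacts.add(rec["receiver"])
        elif rec["receiver"] == identifier:
            contacts.add(rec["sender"])
    return contacts


# Hypothetical bulk store, and the moment the new target is identified.
STORE = [
    {"sender": "suspect", "receiver": "c1", "time": datetime(2016, 5, 1)},
    {"sender": "c2", "receiver": "suspect", "time": datetime(2016, 9, 12)},
    {"sender": "c3", "receiver": "c4", "time": datetime(2016, 10, 2)},
    {"sender": "suspect", "receiver": "c5", "time": datetime(2017, 2, 1)},
]
IDENTIFIED = datetime(2017, 1, 1)

if __name__ == "__main__":
    print(sorted(past_contacts(STORE, "suspect", IDENTIFIED)))
```

Under purely targeted collection that began at `IDENTIFIED`, only the record involving "c5" would exist; the two historical contacts are available only because they were already in the bulk store.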
Bulk data collection also allows for the possibility of using big data analytics. This broadly refers to using statistical analysis to discover patterns about a set of data. Google has used its collected search data to predict flu outbreaks, and it may be possible for US intelligence to analyze its bulk data to predict terrorist attacks. There are significant differences between these two contexts, but the ability to predict using big data analytics on bulk collected data is the goal of both.
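As an illustration of the kind of statistical pattern-finding meant here, the sketch below applies one of the simplest techniques, a z-score test against a historical baseline, to hypothetical daily message counts. It is not a claim about what Google or any intelligence agency actually uses, and the threshold of three standard deviations is an arbitrary assumption.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag `today` if it lies more than `threshold` standard deviations
    from the mean of the historical baseline (a simple z-score test)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return today != mu  # flat baseline: any change is unusual
    return abs(today - mu) / sigma > threshold


# Hypothetical daily message counts for one identifier.
BASELINE = [10, 11, 9, 10, 12, 10, 11, 9, 10]

if __name__ == "__main__":
    print(is_anomalous(BASELINE, 11))  # an ordinary day
    print(is_anomalous(BASELINE, 60))  # a sharp spike
```

The baseline excludes the day being tested, so a single spike cannot inflate the standard deviation it is measured against. A flagged spike is only a statistical pattern; whether it means anything is a separate question.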
The National Security Agency (NSA) in the United States is the primary collector of bulk meta-data. The NSA’s primary mission is to collect, analyze, and distribute foreign intelligence to the US government. Although there are other agencies charged with the collection of foreign intelligence, the NSA specializes in signals intelligence (SIGINT). The CIA, by contrast, is the primary collector of human intelligence (HUMINT). This makes the NSA an especially important organization with regard to bulk meta-data collection.
There are many programs and partnerships which enable the NSA to collect meta-data in bulk. These programs and partnerships are not only with other government organizations but with corporate entities as well. For example, its Special Source Operations (SSO) program partners with companies to gain access to vast amounts of internet data. One SSO program, called UPSTREAM, partners with telecommunications companies to put splitters on the fiber optic cables coming into the US that form the backbone of the internet. This allows the NSA to gain access to a large percentage of total internet traffic, as the original infrastructure powering the internet was built in the United States. One SSO partner (believed to be AT&T) has put secret rooms into the stations that receive overseas data via fiber optic cables. These rooms have fiber optic splitters which are capable of copying the incoming data and storing it on NSA servers.
Since the Snowden revelations regarding these bulk data collection programs, many countries have attempted to ensure that their data does not go through the United States (or other major surveillance states). While there have been interesting developments in this area, studies have shown that the US remains difficult to avoid.
The NSA’s PRISM program gives the NSA access to the servers of some of the world’s top consumer technology companies. These companies are not allowed to disclose that a backdoor has been created to give the NSA access. A backdoor means that the NSA has been given a way into their servers. Known consumer technology companies cooperating with the PRISM program include Apple, Google, Facebook, and Skype. These backdoors provide an opportunity to collect meta-data in bulk.
In the UK, the primary institution collecting meta-data in bulk is GCHQ. GCHQ’s TEMPORA program intercepted communications from undersea fiber optic cables, and the SHELLTRUMPET program had processed 1 trillion meta-data records by December 2012. GCHQ also worked with the NSA to break common encryption techniques, paving the way for more intrusive surveillance.
GCHQ and other UK intelligence agencies have long-standing and extensive cooperation with the US intelligence community. This grew from collaboration in World War II through the UKUSA agreement in 1945. In relation to Europe, a series of multilateral intelligence exchanges began with the so-called ‘Club of Berne’ in the 1960s. What began with nine West European countries now includes 28 European countries, and is coordinated by the EU Intelligence Analysis Center (IntCen). These historical, formal and personal relations provide the basis for the sharing of intelligence and meta-data.
Underpinning coordination between the NSA and GCHQ is the so-called ‘Five Eyes’ agreement. This is a semi-formalized intelligence cooperation agreement between the US, the UK, Canada, New Zealand and Australia. The primary agencies tasked with collecting SIGINT are: US: NSA; UK: GCHQ; Canada: Communications Security Establishment (CSE); Australia: Australian Signals Directorate (ASD); New Zealand: Government Communications Security Bureau (GCSB).
The Five Eyes agreement is important for two main reasons. First, it gives a set of agreements around intelligence sharing. If Australia has relevant intelligence of interest to the US, then there are mechanisms in place to make the sharing of that intelligence with the US easier. Second, given the long-standing cooperation between the five countries, there is confidence in sharing intelligence between them; this enables easier cooperation using highly classified and top secret intelligence. Following Snowden’s releases, it thus became apparent that the meta-data gathered by the NSA could be accessed by other members of the Five Eyes, and vice versa.
While this cooperation is useful for legitimate national security purposes, it must be pointed out that this cooperation has implications for foreign/domestic meta-data gathering, use and surveillance more generally. The Five Eyes countries all have tight constraints on domestic surveillance and intelligence gathering on their own citizens. Yet these constraints are lessened when considering international surveillance and intelligence on non-citizens. For example, the FBI can gather, access and use meta-data on domestic citizens that the CIA cannot and vice versa (see below for more on this). The cooperation between nations makes this domestic/international division complicated. This is further complicated as the specific breadth and scope of the Five Eyes cooperation is murky. These intelligence sharing arrangements have special relevance to counter-terrorism as they can directly impact both the capacity for counter-terrorism operations and the willingness of nations to collaborate on counter-terrorism operations.
The NSA has collected billions of emails and phone records through its surveillance programs. The ability to do this is governed by a number of laws, executive orders, and policies. Furthermore, the courts have weighed in on whether the law actually grants the NSA the authority it believes it has to conduct all of this surveillance. Given the important role of the NSA in US meta-data collection, this section gives a snapshot of the laws and policies governing its tactics involving bulk meta-data collection. The tactics used by the NSA are also used in a number of other countries. As in the US, the laws of these countries are primarily concerned with ensuring that the citizens of the country conducting meta-data collection are protected. That is, common across many liberal democracies is the idea that citizens of a given country are afforded different legal protections from surveillance than non-citizens.
The 4th Amendment to the Constitution protects US persons from ‘unreasonable search and seizure’. The Supreme Court has ruled that electronic surveillance counts as a search and seizure and therefore the 4th Amendment applies. The 4th Amendment only applies to US persons, defined by the NSA as:
a citizen of the United States; an alien lawfully admitted for permanent residence; an unincorporated association with a substantial number of members who are citizens of the U.S. or are aliens lawfully admitted for permanent residence; or a corporation that is incorporated in the U.S.
Similar definitions can be found in various US laws. The rules governing this practice, therefore, are different for US and non-US persons. Important differences will be noted after a discussion of the legislative situation in the US.
Up until 2015, major US telecommunications companies like Verizon and AT&T bulk collected phone meta-data on US persons, which was then passed along to the NSA. The authority to do this, according to the NSA, was given by Section 215 of the USA Patriot Act. Section 215, which required meta-data collection to be “relevant” to investigations of terrorist groups, was interpreted by the government to allow the collection of all US person phone meta-data (in 2011 this included call records from AT&T for 1.1 billion domestic cellphone calls per day). “Relevant” was interpreted quite broadly by the government to include meta-data that could one day be useful for an investigation.
Section 215 expired in 2015 and was replaced by the USA Freedom Act. This effectively ended the NSA’s US person phone meta-data collection program. The USA Freedom Act requires telecommunications companies like Verizon and AT&T to store the meta-data themselves for a period of 18 months. These telecommunications companies are required to give this meta-data to the NSA when there is a FISA court warrant.
The FISA (Foreign Intelligence Surveillance Act) court (FISC) is a secret government court which approves and rejects requests for surveillance. Although it is no secret that the court exists, its deliberations and judgments are classified. Government requests can be individual requests, for example for US person data, or requests for bulk collection programs. The FISC approves collection programs provided that they have reasonable “US person minimization procedures.” The decisions of this court are not available to the public, although it is known that the FISC rarely rejects surveillance requests. The FISC can in effect make policy by interpreting relevant laws with regard to new collection techniques and technologies. The FISC, for example, agreed with the government’s broad interpretation of Section 215 of the Patriot Act. Some argue that the FISC’s classified rulings amount to a body of “secret law”. Privacy groups have sought to declassify some of these rulings, with some success.
The FISC is set up to ensure that any intentional data collection on US persons has a justifying reason. The intelligence community’s minimization procedures and policies regarding US person meta-data collection can be found in U.S. Signals Intelligence Directive 18 (USSID 18). The purpose of USSID 18 is to “balance the U.S. Government’s need for foreign intelligence information and the privacy interests of persons protected by the Fourth Amendment.” The document is a response to the need for the “minimization of U.S. person information collected, processed, retained or disseminated”. It reflects the language of Executive Order 12333 (issued by Ronald Reagan), which gives the intelligence community legal authority to conduct bulk surveillance, in that it is an attempt at U.S. person “minimization”. USSID 18 explicitly states that the target of signals intelligence collection is “foreign communications”. However, U.S. persons can be targeted under certain conditions: with the permission of the US Attorney General, with a FISC warrant, or in an emergency when the Attorney General’s permission cannot reasonably be obtained in a timely fashion.
Despite these restrictions and limitations, collection of meta-data on US persons still occurs. It is estimated that over 151 million records associated with US person phone calls were collected in 2016. These records were collected in connection with the investigation of only 42 terrorism suspects. The government is allowed to collect these 42 suspects’ records along with all the records associated with people one step away from these suspects (people who called or were called by the suspects). This collection, however, probably does not constitute “bulk” collection, as it is targeted at the 42 suspects and their contacts.
It is possible that the NSA and CIA receive intelligence on US persons through agreements with other countries which conduct bulk meta-data collection.
The FISA Amendments Act of 2008 requires the government to adopt minimization procedures which will minimize the collection of US person meta-data. The intelligence community submits its minimization procedures to the FISC, which approves them for one year at a time. This allows the government to collect as much meta-data as it can (from telecommunications companies, or the backbone of the internet) as long as it follows its approved minimization procedures.
The minimization requirements are derived from Executive Order 12333 (signed by Ronald Reagan). The NSA refers to EO 12333 as the primary legal authority for its operations. It says that as long as an intelligence operation does not “intentionally target a U.S. person” and is “conducted abroad”, the operation is free of 4th Amendment constraints on unreasonable search and seizure, because meeting these criteria means that the operation is targeted at non-U.S. persons.
Agreements with other countries regarding the sharing of intelligence information do somewhat constrain US collection of non-US person meta-data. The best known agreement is the Five Eyes agreement between the US, Canada, the UK, Australia, and New Zealand. This agreement allows these countries to share foreign intelligence information, which greatly expands the scope of what the NSA can collect, as data not entering the US may enter these other countries. It is supposed to prevent these countries, including the US, from spying on each other’s citizens. However, exceptions are made if it is in the US’s national interest. It is unclear how “national interest” is interpreted in practice.
From the Snowden disclosures it appears that there may be upwards of 30 other countries which have similar agreements with the US. These countries are considered as second or third tier and receive less cooperation in terms of data and intelligence sharing than the Five Eyes. The details of these agreements have not been disclosed.
The flip-side of these agreements is that the US can potentially receive intelligence on US persons from its Five Eyes partners. For instance, while GCHQ, the British SIGINT agency, cannot collect meta-data on British persons, it can collect meta-data on US persons, and can potentially share the meta-data and/or intelligence gleaned from analysis with its US partners. Part of the controversy around possible links between Russia and the Trump 2016 campaign team derives from the fact that GCHQ had intelligence on US persons, which it then shared with US intelligence agencies. Thus, the agreements mean that domestic protections are more permeable than might appear. This is particularly relevant when considering bulk collection of meta-data, as agencies like the NSA and GCHQ have been shown to be heavily engaged in bulk collection.
There are two executive orders currently standing which mention the privacy of non-US persons: Presidential Policy Directive 28 (PPD 28), issued by Barack Obama, and the Executive Order on Privacy issued by Donald Trump on January 25, 2017. PPD 28 extends privacy protections usually covering only US citizens to everyone “to the maximum extent feasible consistent with the national security.” Donald Trump’s executive order seems to directly contradict it by stating:
Agencies shall, to the extent consistent with applicable law, ensure that their privacy policies exclude persons who are not United States citizens or lawful permanent residents from the protections of the Privacy Act regarding personally identifiable information.
This may impact data sharing agreements with other countries.
In contrast to the US, the EU has taken a comprehensive approach to data privacy which requires member states to “protect the fundamental rights and freedoms of natural persons, and in particular their right to privacy, with respect to the processing of personal data” (EU Data Protection Directive 95/46/EC). Privacy in the EU is considered a fundamental right, and EU law gives citizens the “right to be forgotten”. The US has recently allowed Internet Service Providers to sell their customers’ data, including browsing history. This would not be allowed under current EU law.
The EU also has many laws, with no US equivalent, that protect personal data from misuse by private companies. However, EU laws regarding government surveillance regulate bulk data collection in a very similar way to the US. Member states are offered loopholes and exceptions when it comes to issues of national security, defense, and public security. Arguably, this leaves the EU in a position functionally equivalent to that of the United States with regard to non-citizens.
The UK’s Investigatory Powers Act of 2016 has generated much controversy and has been labeled the “Snooper’s Charter”. It details the powers that intelligence agencies within the UK have to conduct bulk surveillance. Four bulk powers are detailed:
- Bulk Interception
- Bulk Acquisition
- Bulk Equipment Interference
- Bulk Personal Data Sets
Bulk Interception allows agencies, under certain conditions, to collect meta-data streaming through the internet in bulk (similar to the NSA’s UPSTREAM program). Bulk Acquisition allows for the collection of meta-data and data from telecommunications and internet companies (similar to the NSA’s PRISM program). Bulk Equipment Interference allows for ‘hacking’ of devices in order to collect data which could not be collected by the other two powers above, for example because of encryption. Finally, Bulk Personal Data Sets allows for the collection of transactional data from airlines and financial institutions.
There are constraints put on these powers and an independent review has judged these powers to be necessary. This legislation makes transparent the powers which were already being used – which is in stark contrast to the US context where the powers are derived from abstract language in a diverse set of laws, policies, and executive orders.
With some restrictions, intelligence agencies like the NSA have collected and have access to a massive amount of meta-data. What do they do with it, and how effective is it for the purposes of counter-terrorism? Below are some of the tactics currently used on collected meta-data as well as proposals and hopes for how all of this data can be leveraged to prevent and fight terrorism in the future.
Contact chaining is the process of using information (in this case meta-data) about connections between identities to identify members of an organization. The bulk collection of meta-data allows intelligence agencies to identify networks of people associated with a known target. So if an analyst knows Person X to be a terrorist the analyst can use bulk collected meta-data to determine who is communicating with Person X.
Contact chaining can expose members of a terrorist organization who are not currently known, as well as help the intelligence community understand the organization of a terrorist network. Furthermore, in the event of a terrorist attack in which there is a known suspect, bulk meta-data can be used to quickly find others who may have been involved in the attack. Without the collection of bulk meta-data this would not be possible.
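Contact chaining amounts to a breadth-first search over a graph whose nodes are identifiers and whose edges are observed communications. The sketch below is a minimal version with hypothetical identifiers; the `max_hops` parameter caps how far the chain extends, so `max_hops=1` corresponds to the “one step away” collection described for the 42 suspects above.

```python
from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected contact graph from (identifier, identifier) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def contact_chain(graph: dict[str, set[str]], seed: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first search from a known target, returning each reachable
    identifier with its distance (in hops) from the seed."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical communications; "X" is the known target (Person X).
EDGES = [("X", "A"), ("X", "B"), ("A", "C"), ("C", "D")]

if __name__ == "__main__":
    print(contact_chain(build_graph(EDGES), "X", max_hops=2))
```

Each additional hop tends to multiply the number of identifiers swept in, which is why the hop limit matters so much in practice.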
Contact chaining can then be used as part of social network analysis (SNA) to recognize the main actors in, and possible structures of, terrorist organizations. These structures matter for counter-terrorism, as the way an organization’s structure is understood can determine counter-terrorism policy.
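A simple starting point for SNA is degree centrality, which scores each identifier by the share of other identifiers it is directly connected to. The sketch below computes it by hand on a hypothetical network, using the same normalization (degree divided by n − 1) as the `networkx` library's `degree_centrality` function. Real analyses typically combine several measures, such as betweenness, and the choice of measure is itself an analytic assumption.

```python
from collections import defaultdict


def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Normalized degree centrality: the share of other identifiers in the
    network that each identifier is directly connected to."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


# Hypothetical network uncovered by contact chaining.
EDGES = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B")]

if __name__ == "__main__":
    scores = degree_centrality(EDGES)
    for node in sorted(scores, key=scores.get, reverse=True):
        print(node, round(scores[node], 2))
```

In this toy network "H" is connected to everyone else and scores 1.0, making it a candidate "main actor"; a different measure could rank the nodes differently.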
Bulk meta-data may also be used to discover alternate identifiers used by a target. A target may switch email addresses or use an alternate Internet Service Provider (ISP) which would give him/her a new IP address. This makes it difficult for intelligence agencies to track the target. With bulk meta-data, however, intelligence agencies can discover new identifiers communicating in similar ways to known identifiers.
So if Person X switches email addresses – rendering the old email address silent – an intelligence analyst can use bulk meta-data to discover a different email address communicating with the same network of contacts.
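A minimal way to compare communication patterns is the Jaccard similarity of contact sets: the number of contacts two identifiers share, divided by the number of distinct contacts they have between them. The sketch below ranks candidate identifiers this way. The contact sets are hypothetical and the 0.5 cut-off is an arbitrary assumption, not a documented practice.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two contact sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_alternates(old_contacts: set[str], candidates: dict[str, set[str]],
                      min_score: float = 0.5) -> list[tuple[str, float]]:
    """Rank candidate identifiers by how closely their contacts match the old one's."""
    scored = [(ident, jaccard(old_contacts, contacts)) for ident, contacts in candidates.items()]
    matches = [(ident, score) for ident, score in scored if score >= min_score]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical contact sets: Person X's silent address and two new ones.
OLD = {"a", "b", "c", "d"}
CANDIDATES = {
    "new-address-1": {"a", "b", "c", "e"},
    "new-address-2": {"a", "x", "y"},
}

if __name__ == "__main__":
    print(likely_alternates(OLD, CANDIDATES))
```

Here "new-address-1" shares three of five distinct contacts with the old address (a score of 0.6) and is returned as a likely alternate; "new-address-2" falls below the cut-off.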
Meta-data collected in bulk makes possible the use of big data analytics to identify new targets and anticipate terrorist attacks. Although it is unclear exactly how big data analytics is being used to process bulk collected meta-data, insights can be gleaned from companies offering big data analytic services and intelligence officers and analysts touting the practice as the future of counter-terrorism.
The US does use data analytics in some cases, like figuring out who to place on its “No Fly” list.
The intelligence communities in the US and UK have also been partnering with companies like Palantir (funded by the CIA) to use big data analytics to help organize and analyze the large amounts of data collected by the intelligence community. Another company, SAS Institute, advertises on its website that:
The advantage of advanced analytics is that you don’t need to know what you’re looking for. The technology can spot behaviour in the right kind of area. It can be particularly helpful when you’re looking for the lone wolf…Advanced analytics lets you flag individuals who have disturbing behavior profiles – not just the ones who are connected to networks or groups that are already under suspicion
These companies directly market their big data analytics software to tackle the problem of the massive amounts of data contemporary intelligence agencies collect.
Big data analytics is being used by researchers to create profiles of those who are susceptible to radicalization. Combining these profiles with bulk collected meta-data could allow for closer monitoring of people identified as a recruiting target for a terrorist group. Going further, if an algorithm can cross-reference those thought to be susceptible to radicalization with data about who holds pilot licences, for example, there may be a way to predict that someone is planning an attack.
Academics have also proposed methods for using data for counter-terrorism. For example, there are proposals for the use of Twitter sentiment analysis to predict terrorist attacks. Another study claims that its “method can predict within a 1.5 miles radius incident that will occur in the next 24 hours”. Yet another study claims to use neural networks to detect suspicious behavior using GPS and communications data. Furthermore, academics have written articles and books to prepare security analysts and intelligence officers for the big data analytics transformation of their fields.
Bulk meta-data collection as a counter-terrorism tactic is clearly gaining in importance. In this paper we have distinguished bulk meta-data collection from targeted collection by saying that bulk means much of the data collected is not associated with current government targets. The value of this data is that analysts can use it to find new targets, either through their connections with current targets (contact chaining) or by algorithmic discovery of suspicious behavior (big data analytics).
The bulk collection of meta-data is conducted by large intelligence agencies like the NSA in the US and GCHQ in the UK; however, many countries have intelligence agencies and there is a complex web of cooperation between them. There is also a complex legal oversight system which regulates the collection of meta-data. In the US, for example, there must be mechanisms to ensure that US person data is ‘minimized’.
There is little evidence that bulk collected meta-data has led to the prosecution or prevention of terrorists and terrorist plots. There is much research and development into solutions which will turn this data into actionable intelligence. Companies are marketing their products directly to intelligence agencies and academics are publishing methods to use data to predict terrorist attacks.
 Executive orders are legally binding orders issued by the president of the United States. They can be overturned by future presidents or by the court system if deemed unconstitutional.