RSAC 2011: Data Security Wunderkind: Tokenization

Tokenization may still be the new kid on the block in the data security technology world, but it’s definitely here to stay. In fact, it just might be the Wunderkind of the data security industry for its ability to lower an organization’s risk. It does this by removing sensitive data from applications and databases, which has the added benefit of reducing scope for Payment Card Industry Data Security Standard (PCI DSS) compliance audits.

Over the past couple of years, the tokenization data security model has taken its rightful place alongside data encryption, and it is well on its way to becoming a commonplace solution for credit card protection. What’s more, a particular version of tokenization, Format Preserving Tokenization™, is equally adept at protecting personally identifiable information (PII) and electronic health records (EHR) to help organizations comply with data privacy laws like the EU Data Protection Directive and HIPAA.

Tokenization was introduced to the data security market five years ago. As pioneering companies began adopting it, the industry took notice, resulting in more vendors introducing on-premises tokenization solutions and managed tokenization services. The PCI Security Standards Council (PCI SSC), industry analysts and the media began to investigate tokenization further, moving it into the mainstream. While it initially drew the attention of companies wanting to secure credit card numbers and reduce scope for PCI DSS, tokenization is now recognized as a global standard for the protection of any type of consumer, patient, employee and corporate information.

What is Tokenization?

Simply put, tokenization is a data security model that generates surrogate values, called tokens, to replace sensitive data—credit card numbers, for example—in applications and database fields. The sensitive data is simultaneously encrypted and stored in a central data vault, where it can be unlocked only with proper authorization credentials.

A token can be safely passed around the network between applications, databases and business processes, leaving the encrypted data it represents securely stored in a central data vault. A token can be used by any file, application, database or backup medium throughout the organization, thus minimizing the risk of exposing the actual sensitive data while allowing business and analytical applications to work without modification.
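As a rough illustration of the substitution described above, the sketch below swaps a card number for a random surrogate of the same length. The function name `make_token`, the choice to keep the last four digits, and the sample card number are assumptions made for this example, not details of any particular product. Encryption of the vault is left out here to keep the focus on the substitution itself.

```python
import secrets

def make_token(card_number: str, keep_last: int = 4) -> str:
    """Return a random digit string with the same length as card_number.

    Keeping the trailing digits is an assumption made for illustration only.
    """
    head_len = len(card_number) - keep_last
    if head_len <= 0:
        raise ValueError("card_number must be longer than keep_last")
    while True:
        head = "".join(secrets.choice("0123456789") for _ in range(head_len))
        token = head + card_number[head_len:]
        if token != card_number:  # never hand back the real value as its own token
            return token

# The vault is the only place that links a token back to the real value.
vault = {}

card = "4111111111111111"  # widely used test number, not a real account
token = make_token(card)
vault[token] = card

# Applications and databases store only the token.
order_record = {"order_id": 1001, "card": token}
print(order_record)
```

Because the token has the same length and character set as the original, a field sized for a 16-digit card number can hold it without a schema change.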

Because tokenization reduces the number of points where sensitive data is stored by locking the ciphertext away in a central data vault, sensitive data is easier to manage and more secure. It’s much like storing the Crown Jewels in the Tower of London or the U.S. official gold reserves at Fort Knox. All are single repositories of important items, well-guarded and easily managed. This provides a greater level of security than traditional encryption, in which sensitive data values are encrypted and the ciphertext is returned to the original location in an application or database, with a greater level of exposure.

Tokenization in an Enterprise

The most effective token servers deliver an intelligent and flexible data security strategy. Under the tokenization model, data to be protected is passed to the token server, where it is encrypted and stored in the central data vault. The token server then issues a token, which is placed into applications or databases where required. When an application or database needs the original value, it makes a call to the token server, using the token to request the full value.
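The request flow in the paragraph above might look something like the following. This is a minimal in-memory sketch: the `TokenServer` class, its method names and the caller allow list are hypothetical, and the vault here holds values unencrypted purely to keep the call flow readable. It also issues a fresh token on every call, so consistency for repeated values is not handled.

```python
import secrets

class TokenServer:
    """Hypothetical in-memory token server; not a real product API."""

    def __init__(self, authorized_callers):
        self._authorized = set(authorized_callers)
        self._vault = {}  # token -> original value (a real vault would encrypt this)

    def tokenize(self, value: str) -> str:
        # Issue a random digit surrogate with the same length as the input.
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token != value and token not in self._vault:
                break
        self._vault[token] = value
        return token

    def detokenize(self, token: str, caller: str) -> str:
        # Only callers on the allow list may recover the original value.
        if caller not in self._authorized:
            raise PermissionError(f"{caller} may not detokenize")
        return self._vault[token]

server = TokenServer(authorized_callers={"billing"})
token = server.tokenize("4111111111111111")

print(server.detokenize(token, caller="billing"))  # the original value
try:
    server.detokenize(token, caller="reporting")
except PermissionError as exc:
    print("denied:", exc)
```

In this sketch the reporting application can still store, join and count on the token; it simply never sees the card number.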

Referential integrity can become a problem when applications (e.g. data warehouses) and databases use sensitive data values as primary or foreign keys to run queries and perform data analysis. Encrypting those fields often impedes these operations, because many encryption schemes are deliberately randomized: an unencrypted value (a credit card number, for instance) does not always produce the same encrypted value. While there are methods to make encryption consistent, there are risks associated with removing the “randomization” from encryption. A consistent, format-preserving token eliminates this issue.
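To make the referential integrity problem concrete, here is a small sketch. The `toy_encrypt` function is a stand-in written for this example (a random nonce plus an HMAC-SHA256 keystream); it is not a production cipher and not the scheme used by any specific product. The token value is likewise hypothetical.

```python
import hashlib
import hmac
import os

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized cipher for illustration only; do not use in production.

    A fresh random nonce is drawn on every call, and a keystream derived
    from it with HMAC-SHA256 is XORed with the plaintext (32 bytes at most).
    """
    if len(plaintext) > 32:
        raise ValueError("toy cipher handles at most 32 bytes")
    nonce = os.urandom(16)
    keystream = hmac.new(key, nonce, hashlib.sha256).digest()
    body = bytes(p ^ k for p, k in zip(plaintext, keystream))
    return nonce + body

key = os.urandom(32)
card = b"4111111111111111"

# Two systems encrypt the same card number independently.
orders_value = toy_encrypt(key, card)
warehouse_value = toy_encrypt(key, card)
print(orders_value == warehouse_value)  # False: the nonces differ

# With one consistent token per value, an equality join works.
token = "9283746501921111"  # hypothetical token issued once for this card
orders = {"card": token, "order_id": 1001}
warehouse = {"card": token, "segment": "loyalty"}
print(orders["card"] == warehouse["card"])  # True
```

The two ciphertexts protect the same card number but cannot be matched without decrypting both, which is exactly what breaks a key-based join.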

With Format Preserving Tokenization, the relationship between data and token is preserved, even when encryption keys are rotated. The central data vault contains a single encrypted version of each original plain text field. This is true even when encryption keys change over time, because there is only one instance of the encrypted value in the data vault. This means the returned tokens are always consistent whenever the same data value is tokenized anywhere in the enterprise. Since the token server can maintain a strict one-to-one relationship between the token and data value, tokens can be used as primary and foreign keys, and referential integrity is assured whenever the encrypted field is present across multiple data sets. And since records are created only once for each given data value (and token) within the data vault, storage space requirements are minimized.
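One way to picture a vault that keeps tokens stable across key rotation is sketched below. Everything in it is an assumption made for illustration: the `Vault` class, the keyed-hash lookup index used to find an existing record without storing the plain value, and the toy XOR keystream standing in for real encryption. It is not a description of how any particular product is built.

```python
import hashlib
import hmac
import os
import secrets

def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy keystream cipher (illustration only): HMAC-SHA256(key, nonce) XOR data.
    stream = hmac.new(key, nonce, hashlib.sha256).digest()
    return bytes(d ^ s for d, s in zip(data, stream))

class Vault:
    """Hypothetical vault: one record per distinct value, stable tokens."""

    def __init__(self):
        self._enc_key = os.urandom(32)
        self._index_key = os.urandom(32)  # lookup key; not rotated in this sketch
        self._token_by_index = {}   # HMAC(value) -> token
        self._record_by_token = {}  # token -> (nonce, ciphertext)

    def tokenize(self, value: str) -> str:
        data = value.encode()
        if len(data) > 32:
            raise ValueError("toy cipher handles at most 32 bytes")
        idx = hmac.new(self._index_key, data, hashlib.sha256).digest()
        if idx in self._token_by_index:  # same value, same token
            return self._token_by_index[idx]
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token != value and token not in self._record_by_token:
                break
        nonce = os.urandom(16)
        self._record_by_token[token] = (nonce, _xor_stream(self._enc_key, nonce, data))
        self._token_by_index[idx] = token
        return token

    def detokenize(self, token: str) -> str:
        nonce, ciphertext = self._record_by_token[token]
        return _xor_stream(self._enc_key, nonce, ciphertext).decode()

    def rotate_key(self) -> None:
        # Re-encrypt every record under a new key; the tokens do not change.
        new_key = os.urandom(32)
        for token, (nonce, ciphertext) in list(self._record_by_token.items()):
            data = _xor_stream(self._enc_key, nonce, ciphertext)
            new_nonce = os.urandom(16)
            self._record_by_token[token] = (new_nonce, _xor_stream(new_key, new_nonce, data))
        self._enc_key = new_key

vault = Vault()
before = vault.tokenize("4111111111111111")
vault.rotate_key()
after = vault.tokenize("4111111111111111")
print(before == after)           # True
print(vault.detokenize(before))  # 4111111111111111
```

Only the ciphertext inside the vault changes on rotation, so every application, database and backup that already holds the token is unaffected.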

Maintaining referential integrity is also useful for complying with the EU Data Protection Directive, which regulates the transfer of personal data, such as national insurance numbers, across international borders. Using tokens in place of encrypted values meets the requirement of the law, yet allows for data analysis across borders.

Why Tokenization is Becoming Even More Important

With most credit card data being locked down when at rest and in transit across wireless networks due to PCI DSS requirements, cybercriminals are going after this data in other places, such as in transit on corporate networks. They’re also focusing more on stealing PII, particularly those fields most commonly collected and useful such as Social Security, driver’s license, state ID, financial account and passport numbers, followed by more specialized PII such as biometric information and health records. State breach notification laws passed by 46 states address PII data theft and loss, and spell out requirements for companies to follow when such a breach occurs. Attacks aimed at stealing PII are expected to increase as credit card data becomes harder to steal, putting other types of organizations and companies at risk beyond those that collect and store credit card numbers.

Many North American and European retail, hospitality, financial services and insurance companies—both large and mid-sized—have already implemented tokenization to protect cardholder information and reduce scope for PCI DSS. Yet, as they gain experience with the full lifecycle of protection and compliance, they are realizing the need to explore protection for other types of customer and company data at risk. In fact, whether in response to privacy laws or the desire to better protect customers and employees, organizations of all types are moving beyond securing credit card numbers to seeking ways to guard the many types of PII under their care. This presents new data encryption and storage challenges because PII data can be harder to locate and lock down. It typically resides in many places throughout an enterprise, and trying to secure it can be a complex, resource-intensive exercise.

Frequent high-profile breaches of credit card numbers, social security numbers and other PII over the past few years are causing board members, shareholders and stakeholders, including consumers, to care deeply about data security. Now that data security has broadened from an IT issue to a corporate and customer issue, tokenization is becoming an important defense against data theft and misuse.

Tokenization in Practice

There are a number of instances where companies have implemented tokenization to reduce scope for PCI DSS compliance and audits. For example, a $500 million U.S. direct marketer implemented tokenization to protect cardholder information gathered by phone, mail order and online orders and to comply with PCI DSS. By substituting tokens for credit card numbers in applications, this e-tailer was able to take 80 systems—nearly 90 percent—out of scope for PCI DSS audits, leaving only 10 systems to be audited.

As part of the tokenization deployment, more than one million credit card numbers stored in the company’s SQL database were converted, securing the data in a segregated vault environment and replacing the sensitive data with tokens on all out-of-scope systems. The Format Preserving Tokenization solution the e-tailer chose plugged directly into the company’s architecture and applications with minimal change to the capturing applications, so business operations have not been impeded in any way. The company now estimates that its nearly 90 percent reduction in scope has the potential to save approximately $250,000 annually in staff and administrative overhead, and is in the process of reinvesting its cost savings to tokenize customer loyalty data and other PII.

In another example, one of the UK’s oldest, largest and most esteemed retailers recently implemented Format Preserving Tokenization across its heterogeneous retail, order management and data warehouse IT infrastructure, enabling it to meet PCI DSS encryption requirements without costly programming modifications and with no requirement for additional computing resources or staff, as would have been required with the other solutions it considered.

What’s in Store for Tokenization in 2011?

In October 2009, PricewaterhouseCoopers delivered the results of a pivotal study commissioned by the PCI SSC to review emerging data security technologies that could potentially help merchants comply with PCI DSS more quickly and cost-effectively. PricewaterhouseCoopers narrowed the field to 14 technologies. As a result, a special interest group (SIG) was formed to further investigate how the top four technologies, tokenization among them, can help reduce the scope of a PCI DSS audit.

The Scoping SIG’s Tokenization Working Group is in the process of writing a guidance document on tokenization. When released in April 2011, this document will advise merchants and service providers on best practices for the implementation of tokenization to reduce scope and therefore the cost of compliance. A subsequent document will be issued to provide guidance about validation of a tokenization solution and its implementation. The PCI SSC has not yet set a timeframe for its delivery.

In 2010, companies first began to use tokenization to protect PII such as social security, driver’s license and passport numbers to comply with existing privacy laws and to prepare for the passing of U.S. data privacy legislation. In 2011, we expect to see many more mid-sized to large enterprises adopt tokenization more broadly to protect many other types of sensitive information, including electronic health records (EHR).


The value of tokenization is indisputable for any company that wants to secure credit card numbers and reduce the cost of PCI DSS compliance and annual audits, or to protect any other type of customer, patient, employee or company-confidential information. Its appeal is quickly extending beyond the retail industry to financial services, insurance and hospitality industries and health care.

About the Author
Gary Palgon, CISSP®, is Vice President of Product Management for data protection software vendor nuBridges, Inc. He leads the Payment Card Industry’s Tokenization Working Group, one of four working groups in the PCI SSC’s Scoping Special Interest Group (SIG). Palgon is a frequent contributor to industry publications and a speaker at conferences on eBusiness security issues and solutions.
