Data Masking: Sharing, while protecting

19th October 2010
Take care! That data wall could be breached!

No one could have foreseen the volume of information that business and government organizations now create and store in today’s data-intensive environments. The average medium-to-large enterprise experiences data growth rates of over 50 percent a year.

IDC studies have shown that the total volume of digital information in 2011 is expected to be 1,773 exabytes (a sixfold increase compared to 281 exabytes in 2007). This means that every two years, the amount of information an enterprise needs to secure and manage more than doubles. (An exabyte is 1,000 petabytes, or 1 billion gigabytes.)

Indian enterprises across many verticals, like those in other geographies, are huge storehouses of confidential information (intellectual property, source code, customer information, and so on), and it is estimated that more than half of that data could be classified as confidential.

Sensitive personal, financial, and health information is protected by a web of industry and governmental data privacy regulations. If organizations fail to maintain data privacy, they face hefty financial and legal penalties as well as a total loss of consumer and market confidence.

However, they also need to use production data in development and testing to validate the functionality of business-critical applications, which often involves replicating real data to non-production systems or shipping production copies to outsourced testing and development facilities. As a result, they risk exposing sensitive data to unauthorized users and compromising data privacy, facing the potentially devastating consequences of noncompliance, and losing customers whose trust in their data privacy policies has been shaken.

The challenge in data privacy is sharing data while protecting personal information. This is where a new buzzword, and a new technology solution, has emerged: DATA MASKING

What does Data Masking do? It protects sensitive information by transforming it into de-identified, realistic-looking data that retains the properties of the original. The data remains relevant and meaningful, individual fields keep their shape and form, and intra-record relationships are preserved.
Data privacy and data masking solutions let you run application processes and view application screens without exposing original data. They obscure sensitive data in development, test, and training copies, while still letting teams work with realistic data sets. The result is data that complies with data privacy regulations, yet still retains referential integrity and is contextually correct.
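
One way to see how masked data can keep its referential integrity is to make the masking repeatable: the same original value always maps to the same substitute, so a customer name masked in one table still matches the same name masked in another. The Python sketch below is only an illustration under stated assumptions. It uses a keyed hash (HMAC) to choose the substitute instead of a purely random pick, and the name pools, the function `mask_name` and the secret key are all hypothetical, not taken from any particular product.

```python
import hashlib
import hmac

# Hypothetical replacement pools; a real tool would ship far larger dictionaries.
FIRST_NAMES = ["Glen", "Maria", "Ravi", "Anita", "Peter", "Leela"]
LAST_NAMES = ["Harrison", "Mehta", "Lopez", "Nair", "Kim", "Brown"]


def mask_name(full_name: str, secret: bytes) -> str:
    """Map a real name to a substitute name, repeatably for a given secret."""
    # Normalise first, so "John Smith" and " john smith" mask to the same value.
    normalised = full_name.strip().lower()
    # A keyed hash makes the mapping repeatable, but hard to reproduce
    # for anyone who does not hold the secret.
    digest = hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


if __name__ == "__main__":
    key = b"demo-key"  # assumption: in practice this would be a managed secret
    # The same person appearing in two tables masks to one substitute name.
    print(mask_name("John Smith", key))
    print(mask_name("john smith", key))
```

With pools this small, two different people will sometimes receive the same substitute name; larger dictionaries make that less likely, at the cost of more reference data to maintain.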

To take a few examples, here are some built-in rules found in any good data masking system:
Credit card numbers: Generate a random but valid credit card number using the Luhn algorithm, preserving the issuer identifier (Visa, Mastercard, etc.), which is the first 6 digits of the card number.
Phone numbers: Generate a random phone number but preserve the incoming phone format.
Email addresses: Generate a random email address in the correct format, with the @, the dots, and so on.
URLs: Generate a random URL value in the correct format.
IP addresses: Generate a random IP address within the same network range.
Names and addresses: Generate random but realistic-looking values. For name masking, for example, mask John Smith to Glen Harrison. For address masking, mask 100 Cardinal Way to 6 Meadows Pkwy.
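
The credit card and phone number rules above can be sketched in a few lines of Python. This is a minimal illustration, not how any specific masking product works: the function names (`luhn_check_digit`, `mask_card_number`, `mask_phone`) are hypothetical, and the sketch assumes that separators such as dashes and spaces should stay exactly where they were in the incoming value.

```python
import random


def luhn_check_digit(partial: str) -> int:
    """Return the digit that makes partial + digit pass the Luhn check."""
    total = 0
    # Walk right to left. The rightmost digit of `partial` sits next to the
    # check digit, so it is the first one to be doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check a string of digits against the Luhn algorithm."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def mask_card_number(card: str, rng: random.Random) -> str:
    """Keep the first 6 digits, randomise the rest, keep the number Luhn-valid."""
    digits = [c for c in card if c.isdigit()]
    if len(digits) < 8:
        raise ValueError("expected at least 8 digits in a card number")
    issuer = digits[:6]
    # Everything between the issuer identifier and the check digit is random.
    body = [str(rng.randrange(10)) for _ in range(len(digits) - 7)]
    partial = "".join(issuer + body)
    masked = iter(partial + str(luhn_check_digit(partial)))
    # Put the new digits back into the original layout (dashes, spaces, ...).
    return "".join(next(masked) if c.isdigit() else c for c in card)


def mask_phone(phone: str, rng: random.Random) -> str:
    """Replace every digit with a random one; leave the formatting untouched."""
    return "".join(str(rng.randrange(10)) if c.isdigit() else c for c in phone)


if __name__ == "__main__":
    rng = random.Random()
    print(mask_card_number("4111-1111-1111-1111", rng))
    print(mask_phone("+91 (80) 5555-0123", rng))
```

The phone rule here randomises every digit, including any country or area code. A production rule would more likely keep those prefixes and check the result against a numbering plan, which this sketch does not attempt.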

IndiaTechOnline spoke to Adam Wilson, General Manager, Information Lifecycle Management at Informatica, the leading global provider of enterprise data integration software, to get an update on the state of the art in Data Masking. The following documents will help readers better understand the challenge of ensuring data privacy while sharing data, and the solutions that the industry offers:

Data Privacy Best Practices for Data Protection in Nonproduction Environments: http://www.informatica.com/downloads/6993_data_sec_sap_wp_web.pdf

Julie Lockner, Founder, www.CentricInfo.com makes the case for data masking in a blog: http://blogs.informatica.com/perspectives/index.php/2010/09/24/a-data-masking-conversation/

You can find more white papers here: http://www.informatica.com/solutions/application_ilm/test_data_management_solution/Pages/index.aspx
