DATA MASKING

Introduction

Data breaches worldwide expose millions of people’s sensitive data each year, causing many business organizations to lose millions.

Many data breaches go undetected for months, and recovery often costs the victim organization millions of dollars. Some of the major consequences of a data breach include:

  • 4.35 million dollars: the average cost of a data breach globally in 2022, an all-time high.

  • 9.44 million dollars: the average cost of a data breach in the United States in 2022, the highest of any nation.

  • 277 days: The average time to identify and contain a data breach in 2022. Broken down, this was 207 days to identify the breach and 70 days to contain the breach.

  • Loss of customer trust and long-term damage to the reputation of the impacted organization.

  • An inability to conduct business, including severe delays and complete halts in operation.


Consequently, data protection has become the top priority of many organizations. That’s why data masking has become an essential technique many businesses need to protect their sensitive data.


What is Data Masking?

Data masking is a way to create a fake, but realistic version of organizational data. The goal is to protect sensitive data, while providing a functional alternative when real data is not needed - for example, in user training, sales demos, or software testing.

Data masking processes change the values of the data while using the same format. The goal is to create a version that cannot be deciphered or reverse-engineered. There are several ways to alter the data, including character shuffling, word or character substitution, and encryption.
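
As a minimal sketch of what "changing the values while using the same format" can look like, the Python snippet below replaces every digit except the last four with a random digit and leaves the separators in place. The function name `mask_digits` and the `keep_last` parameter are illustrative assumptions, not part of any particular masking product.

```python
import secrets


def mask_digits(value: str, keep_last: int = 4) -> str:
    """Replace digits with random digits, keeping separators and the last `keep_last` digits."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(total_digits - keep_last, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(str(secrets.randbelow(10)))  # random replacement digit
            digits_to_mask -= 1
        else:
            masked.append(ch)  # separators and the trailing digits pass through
    return "".join(masked)


card = "4111-1111-1111-1234"
masked_card = mask_digits(card)
print(masked_card)  # same layout and last four digits, random leading digits
```

The masked value still looks like a card number to a test system, but the leading digits no longer carry the original information.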



Why is Data Masking Important?

Here are several reasons data masking is essential for many organizations:

· Data masking solves several critical threats – data loss, data exfiltration, insider threats or account compromise, and insecure interfaces with third-party systems.

· Allows sharing data with authorized users, such as testers and developers, without exposing production data.

· Reduces risks associated with sharing the data with integrated third-party applications and cloud migrations.

· Avoids risks associated with outsourcing any project. Because most organizations merely rely on trust when dealing with outsourced persons, masking prevents data from being misused or stolen.

· Makes data useless to an attacker, while maintaining many of its inherent functional properties.

· Can be used for data sanitization – normal file deletion still leaves traces of data in storage media, while sanitization replaces the old values with masked ones.

· Helps companies stay compliant with the General Data Protection Regulation (GDPR) by reducing the risk of sensitive data exposure. Because of this, data masking offers a competitive advantage for many organizations.


Which Data Requires Data Masking?

Here are the most common data types that require data masking:


  • Personally identifiable information (PII)—data that can be used to identify certain individuals. This includes information like full name, passport number, driver’s license number, and social security number.

  • Protected health information (PHI)—data collected by healthcare service providers for the purpose of identifying appropriate care. This includes insurance information, demographic information, test and laboratory results, medical histories, and health conditions.

  • Payment card information—the Payment Card Industry Data Security Standard (PCI DSS) requires merchants that handle credit and debit card transactions to appropriately secure cardholder data.

  • Intellectual property (IP)—data related to creations of the mind, including inventions, business plans, designs, and specifications, have high value for an organization and must be protected from unauthorized access and theft.

Data Masking Types

Several types of data masking are commonly used to secure sensitive data.

Static Data Masking

Static data masking processes can help to create a sanitized copy of the database. The process alters all sensitive data until a copy of the database can be safely shared. Typically, the process involves creating a backup copy of a database in production, loading it to a separate environment, eliminating any unnecessary data, and then masking data while it is in stasis. The masked copy can then be pushed to the target location.
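
The backup, load, and mask steps above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `customers` table, its columns, and the masking rules are hypothetical, and both databases live in memory.

```python
import sqlite3

# Hypothetical "production" database, held in memory for the example.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
prod.executemany(
    "INSERT INTO customers (name, ssn) VALUES (?, ?)",
    [("John Smith", "123-45-6789"), ("Ana Lopez", "987-65-4321")],
)
prod.commit()

# Step 1: take a backup copy and load it into a separate environment.
masked_copy = sqlite3.connect(":memory:")
prod.backup(masked_copy)

# Step 2: mask the sensitive columns in the copy while it is at rest.
masked_copy.execute("UPDATE customers SET name = 'Customer ' || id")
masked_copy.execute("UPDATE customers SET ssn = 'XXX-XX-' || substr(ssn, -4)")
masked_copy.commit()

# Step 3: the masked copy is what gets shared; production is untouched.
print(masked_copy.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'Customer 1', 'XXX-XX-6789'), (2, 'Customer 2', 'XXX-XX-4321')]
```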

Deterministic Data Masking

Deterministic data masking maps two sets of data of the same type so that a given value is always replaced by the same substitute value. For example, the name “John Smith” is always replaced with “Jim Jameson” everywhere it appears in a database. This method is convenient for many scenarios but is inherently less secure, because a repeatable mapping can be reverse-engineered if it is discovered.

This is achieved with a mapping table (confidential and available only to a very limited set of users) that maps each actual value to its masked value. The table is created either manually or by some logic. Creating it manually is tedious and time-consuming for large datasets.
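
A minimal sketch of such a mapping, combining a manually maintained table with "some logic" for values that are not in it yet. The key, the list of replacement names, and the function `mask_name` are assumptions made for illustration. With such a short list, different real names can collide on the same fake name, so a real deployment would need a much larger list.

```python
import hashlib
import hmac

# Assumption: a confidential key and a list of replacement names, both kept
# away from the people who consume the masked data.
SECRET_KEY = b"example-key-keep-this-secret"
FAKE_NAMES = ["Jim Jameson", "Mary Clark", "Omar Haddad", "Lena Fischer"]

# Manually maintained mapping table: real value -> masked value.
mapping_table = {"John Smith": "Jim Jameson"}


def mask_name(real_name: str) -> str:
    """Return the same masked name every time for the same real name."""
    if real_name not in mapping_table:
        # Derive a stable index into FAKE_NAMES from a keyed hash of the name.
        digest = hmac.new(SECRET_KEY, real_name.encode("utf-8"), hashlib.sha256).digest()
        index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
        mapping_table[real_name] = FAKE_NAMES[index]
    return mapping_table[real_name]


print(mask_name("John Smith"))  # Jim Jameson (taken from the manual mapping)
```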


On-the-Fly Data Masking

On-the-fly data masking masks data while it is transferred from production systems to test or development systems, before the data is saved to disk. Organizations that deploy software frequently cannot create a backup copy of the source database and apply masking each time; they need a way to continuously stream data from production to multiple test environments.

On-the-fly masking sends smaller subsets of masked data as they are required. Each subset of masked data is stored in the dev/test environment for use by the non-production system.

It is important to apply on-the-fly masking to any feed from a production system to a development environment, at the very beginning of a development project, to prevent compliance and security issues.
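
One way to picture on-the-fly masking is a generator that masks each record in transit, so that only masked rows ever reach the test environment. The record fields and the functions `mask_record` and `stream_masked` below are hypothetical, and the in-memory list merely stands in for a real production feed.

```python
from typing import Dict, Iterable, Iterator


def mask_record(record: Dict[str, str]) -> Dict[str, str]:
    """Return a masked copy of one record; the input dict is not modified."""
    masked = dict(record)
    masked["email"] = "user{}@example.com".format(record["id"])
    masked["phone"] = "***-***-" + record["phone"][-4:]
    return masked


def stream_masked(source: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Mask each record in transit, before anything is written downstream."""
    for record in source:
        yield mask_record(record)


production_feed = [
    {"id": "1", "email": "john@corp.example", "phone": "555-010-1234"},
    {"id": "2", "email": "ana@corp.example", "phone": "555-010-9876"},
]

# Only masked rows are stored on the dev/test side.
test_environment = list(stream_masked(production_feed))
print(test_environment)
```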

Dynamic Data Masking

Dynamic data masking is similar to on-the-fly masking, except that data is never stored in a secondary data store in the dev/test environment. Rather, it is streamed directly from the production system and consumed by another system in the dev/test environment.
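
A common way to realize this is to apply masking at read time, depending on who is asking, so that no masked copy is ever persisted. The sketch below assumes a hypothetical `read_rows` function sitting between the production store and the consumer; the role names and masking rules are also assumptions.

```python
from typing import Dict, List

# Stand-in for the production store; it is never copied to disk or modified.
PRODUCTION_ROWS: List[Dict[str, str]] = [
    {"name": "John Smith", "card": "4111111111111234"},
]

# Assumption: roles that are allowed to see unmasked values.
PRIVILEGED_ROLES = {"billing_admin"}


def read_rows(role: str) -> List[Dict[str, str]]:
    """Return rows as the caller may see them; masking happens on every request."""
    if role in PRIVILEGED_ROLES:
        return [dict(row) for row in PRODUCTION_ROWS]
    return [
        {"name": row["name"][0] + "***", "card": "*" * 12 + row["card"][-4:]}
        for row in PRODUCTION_ROWS
    ]


print(read_rows("tester"))
# [{'name': 'J***', 'card': '************1234'}]
```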


Data Masking Techniques

Let’s review a few common ways organizations apply masking to sensitive data. When protecting data, IT professionals can use a variety of techniques.

Data Encryption

When data is encrypted, it becomes useless unless the viewer has the decryption key. Essentially, the data is masked by the encryption algorithm. This is the most secure form of data masking, but it is also complex to implement because it requires technology to perform ongoing data encryption and mechanisms to manage and share encryption keys. Some organizations instead use hashing algorithms such as MD5, SHA-1, and SHA-2 (for example, those supplied by IRI); unlike encryption, hashing is one-way and has no decryption key.
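
Python's standard library does not include a symmetric cipher, so the sketch below illustrates only the hashing approach mentioned above, using SHA-256 (a member of the SHA-2 family) from `hashlib`. The salt value and the function name `hash_value` are assumptions. Because a hash is one-way, there is no key that recovers the original value.

```python
import hashlib

# Assumption: a per-dataset salt, stored separately from the masked data.
SALT = b"example-salt"


def hash_value(value: str) -> str:
    """Return a salted SHA-256 digest (hex) that stands in for the original value."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


masked_email = hash_value("john.smith@example.com")
print(masked_email)  # 64 hex characters; the same input always gives the same digest
```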

Data Scrambling

Data scrambling is the process of obfuscating or removing sensitive data. The process is irreversible, so the original data cannot be derived from the scrambled data. Data scrambling can be used only during the cloning process. Many databases have a built-in data scrambling feature.

Nulling Out

Data appears missing or “null” when viewed by an unauthorized user. This makes the data less useful for development and testing purposes.

Value Variance

Original data values are replaced by a function, such as the difference between the lowest and highest value in a series. For example, if a customer purchased several products, the purchase price can be replaced with a range between the highest and lowest price paid. This can provide useful data for many purposes, without disclosing the original dataset.
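
The purchase-price example can be sketched in a few lines of Python. The customer keys, the prices, and the helper `to_price_range` are made up for illustration.

```python
from typing import Dict, List

purchases: Dict[str, List[float]] = {
    "customer_a": [19.99, 5.00, 42.50],
    "customer_b": [7.25],
}


def to_price_range(prices: List[float]) -> str:
    """Replace individual prices with the lowest-to-highest range."""
    return f"{min(prices):.2f}-{max(prices):.2f}"


masked_purchases = {cust: to_price_range(prices) for cust, prices in purchases.items()}
print(masked_purchases)
# {'customer_a': '5.00-42.50', 'customer_b': '7.25-7.25'}
```

Note that a customer with a single purchase still reveals the exact price, so a range only hides detail when it covers several values.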

Data Substitution

Data values are substituted with fake, but realistic, alternative values. For example, real customer names are replaced by a random selection of names from a phonebook.
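
A minimal substitution sketch, assuming a small built-in list in place of a real "phonebook" of fake names. The records and the function `substitute_names` are hypothetical.

```python
import random

# Assumption: a short list stands in for a large source of fake names.
FAKE_NAMES = ["Alex Morgan", "Sam Rivera", "Priya Nair", "Tom Becker", "Yuki Sato"]

customers = [
    {"name": "John Smith", "city": "Austin"},
    {"name": "Ana Lopez", "city": "Denver"},
]


def substitute_names(records, rng=None):
    """Return copies of the records with each name replaced by a random fake name."""
    rng = rng or random.Random()
    return [{**rec, "name": rng.choice(FAKE_NAMES)} for rec in records]


masked_customers = substitute_names(customers)
print(masked_customers)  # names come from FAKE_NAMES; other fields are unchanged
```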

Data Shuffling

Similar to substitution, except data values are switched within the same dataset. Data is rearranged in each column using a random sequence; for example, switching between real customer names across multiple customer records. The output set looks like real data, but it doesn’t show the real information for each individual or data record.
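
Shuffling a single column can be sketched as follows. The records and the helper `shuffle_column` are illustrative assumptions. A purely random shuffle can leave some values in their original row, especially in small datasets, so production tools usually add extra checks.

```python
import random

records = [
    {"id": 1, "name": "John Smith", "balance": 120.0},
    {"id": 2, "name": "Ana Lopez", "balance": 75.5},
    {"id": 3, "name": "Wei Chen", "balance": 310.0},
]


def shuffle_column(rows, column, rng=None):
    """Return copies of the rows with one column's values randomly reassigned."""
    rng = rng or random.Random()
    values = [row[column] for row in rows]
    rng.shuffle(values)  # in-place shuffle of the extracted column
    return [{**row, column: value} for row, value in zip(rows, values)]


shuffled = shuffle_column(records, "name")
print(shuffled)  # same set of names, possibly attached to different records
```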

Pseudonymization

The EU General Data Protection Regulation (GDPR) introduced the term pseudonymization to cover processes such as data masking, encryption, and hashing that protect personal data.

Pseudonymization, as defined in the GDPR, is any method that ensures data cannot be attributed to a specific person without the use of additional information. It requires removing direct identifiers and, preferably, avoiding multiple identifiers that, when combined, can identify a person.

In addition, encryption keys, or other data that can be used to revert to the original data values, should be stored separately and securely.
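
A minimal sketch of this separation: direct identifiers are swapped for random tokens, and the lookup table that can revert them is returned as a separate object to be stored elsewhere. The function `pseudonymize`, the field names, and the sample records are assumptions made for illustration.

```python
import secrets
from typing import Dict, List, Tuple


def pseudonymize(
    records: List[Dict[str, str]], id_field: str
) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Swap a direct identifier for a random token; return the data and the lookup table."""
    token_for: Dict[str, str] = {}
    output = []
    for rec in records:
        original = rec[id_field]
        if original not in token_for:
            token_for[original] = secrets.token_hex(8)  # random, not derived from the value
        output.append({**rec, id_field: token_for[original]})
    # Invert the table so the vault maps token -> original identifier.
    vault = {token: original for original, token in token_for.items()}
    return output, vault


patients = [
    {"name": "John Smith", "diagnosis": "asthma"},
    {"name": "John Smith", "diagnosis": "allergy"},
]
pseudonymized, vault = pseudonymize(patients, "name")
# `pseudonymized` can be shared; `vault` must be stored separately and securely.
print(pseudonymized)
```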

Data Masking Best Practices


Determine the Project Scope

In order to effectively perform data masking, companies should know what information needs to be protected, who is authorized to see it, which applications use the data, and where it resides, both in production and non-production domains. While this may seem easy on paper, due to the complexity of operations and multiple lines of business, this process may require a substantial effort and must be planned as a separate stage of the project.

Ensure Referential Integrity

Referential integrity means that each “type” of information coming from a business application must be masked using the same algorithm.

In a large organization, a single data masking tool for the entire organization is often not feasible. Each line of business may need to implement its own data masking because of budget requirements.

Ensure that different data masking tools and practices across the organization are synchronized when dealing with the same type of data. This will prevent challenges later when data needs to be used across business lines.
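
To see why synchronization matters, consider two tables owned by different teams that both contain a customer ID. If both teams mask the ID with the same algorithm and key, joins across the masked tables still work. The sketch below uses a keyed hash for this; the key, the function `mask_customer_id`, and the tables are hypothetical.

```python
import hashlib
import hmac

# Assumption: one key shared by every team that masks customer IDs.
SHARED_KEY = b"example-shared-masking-key"


def mask_customer_id(customer_id: str) -> str:
    """Mask a customer ID the same way wherever it appears."""
    digest = hmac.new(SHARED_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "C" + digest[:12]


orders = [{"customer_id": "1001", "total": 40.0}, {"customer_id": "1002", "total": 15.0}]
tickets = [{"customer_id": "1001", "issue": "late delivery"}]

masked_orders = [{**o, "customer_id": mask_customer_id(o["customer_id"])} for o in orders]
masked_tickets = [{**t, "customer_id": mask_customer_id(t["customer_id"])} for t in tickets]

# The join still works because both tables used the same algorithm and key.
joined = [
    (o["total"], t["issue"])
    for o in masked_orders
    for t in masked_tickets
    if o["customer_id"] == t["customer_id"]
]
print(joined)  # [(40.0, 'late delivery')]
```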

Secure the Data Masking Algorithms

Because only authorized users should have access to the real data, the masking algorithms themselves are very sensitive and must be protected. If someone learns which repeatable masking algorithms are being used, they can reverse engineer large blocks of sensitive information.

A data masking best practice is to ensure separation of duties. For example, IT security personnel determine what methods and algorithms will be used in general, but specific algorithm settings and data lists should be accessible only to the data owners in the relevant department.


Conclusion

In conclusion, data masking plays an important role in maintaining data security and privacy throughout the development and testing phases, without compromising the effectiveness of these processes. It is a valuable tool in today's data-driven world, where protecting sensitive information is essential.

Reference: https://en.wikipedia.org/wiki/Data_masking
