Data analysis is a key task within a data management lifecycle and helps businesses make data-driven decisions by discovering hidden trends and patterns within data. You can gain powerful insights and draw accurate conclusions when data is well-aligned to business objectives. Of late, it has become a top priority to ensure that characteristics like data quality, data integrity, data security, data completeness and data consistency are well thought through to support accurate data analysis.
As a data analyst, alignment is something you will need to judge. Good alignment means that the data is relevant and can help you solve a business problem or determine a course of action to achieve a given business objective. In this section we will talk through the details of some of these key data characteristics.
Data quality and integrity, combined with well-aligned business objectives, help teams reach accurate conclusions. On top of that, you will learn how new variables discovered during data analysis can lead you to set up data constraints so you can keep the data aligned to a business objective.
A good analysis depends on the integrity of the data, and data integrity usually depends on using a common format. So it is important to double-check how dates are formatted to make sure what you think is December 10, 2020 isn’t really October 12, 2020, and vice versa (both can be written as 10/12/2020).
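To make the ambiguity concrete, here is a minimal sketch using Python's standard `datetime` module. The sample string and variable names are illustrative assumptions, not part of any particular dataset.

```python
from datetime import datetime

raw = "10/12/2020"  # hypothetical value: month/day/year or day/month/year?

# The same text parses to two different dates depending on the assumed format.
month_first = datetime.strptime(raw, "%m/%d/%Y")  # October 12, 2020
day_first = datetime.strptime(raw, "%d/%m/%Y")    # December 10, 2020

# Storing dates in ISO 8601 (YYYY-MM-DD) removes the ambiguity.
iso_month_first = month_first.date().isoformat()
iso_day_first = day_first.date().isoformat()

print(iso_month_first)  # 2020-10-12
print(iso_day_first)    # 2020-12-10
```

Agreeing on one unambiguous format such as ISO 8601 is one common way to get everyone reading the same date; the right choice depends on the systems involved.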
For an accurate data analysis here are some error scenarios to watch out for:
Data replication compromising data integrity: Continuing with the example, imagine you ask your international counterparts to verify dates and stick to one format. One analyst copies a large dataset to check the dates. But because of memory issues, only part of the dataset is actually copied. The analyst would be verifying and standardizing incomplete data. That partial dataset would be certified as compliant but the full dataset would still contain dates that weren't verified. Two versions of a dataset can introduce inconsistent results. A final audit of results would be essential to reveal what happened and correct all dates.
Data transfer compromising data integrity: Another analyst checks the dates in a spreadsheet and chooses to import the validated and standardized data back to the database. But suppose the date field from the spreadsheet was incorrectly classified as a text field during the data import (transfer) process. Now some of the dates in the database are stored as text strings. At this point, the data needs to be cleaned to restore its integrity.
Data manipulation compromising data integrity: When checking dates, another analyst notices what appears to be a duplicate record in the database and removes it. But it turns out that the analyst removed a unique record and not a duplicate record for the company. Your dataset is now missing data and the data must be restored for completeness.
Fortunately, with a standard date format and compliance by all people and systems that work with the data, data integrity can be maintained. But no matter where your data comes from, always be sure to check that it is valid, complete, and clean before you begin any analysis.
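The three scenarios above can each be caught with a simple check before analysis starts. The sketch below is one possible approach using only the Python standard library; the sample rows, column layout, and helper names (`fingerprint`, `text_dates`, `exact_duplicates`) are hypothetical and chosen just for illustration.

```python
import hashlib
from collections import Counter
from datetime import date

def fingerprint(rows):
    """Return (row count, SHA-256 digest) so a copy can be compared to its source."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()

def text_dates(rows, column):
    """Return rows whose date column holds a text string instead of a date."""
    return [row for row in rows if isinstance(row[column], str)]

def exact_duplicates(rows):
    """Return rows that occur more than once with every field identical."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]

# Hypothetical sample: (order_id, customer, order_date)
source = [
    (1, "Acme", date(2020, 10, 12)),
    (2, "Acme", date(2020, 10, 12)),  # looks similar to row 1, but is a unique order
    (3, "Globex", "2020-12-10"),      # date imported as text
    (3, "Globex", "2020-12-10"),      # true duplicate of the row above
]
partial_copy = source[:2]  # simulates a copy that was cut short

copy_is_complete = fingerprint(partial_copy) == fingerprint(source)
bad_dates = text_dates(source, column=2)
dupes = exact_duplicates(source)

print(copy_is_complete)  # False: the copy is missing rows
print(len(bad_dates))    # 2 rows need their dates converted
print(dupes)             # only the fully identical row is reported
```

Comparing every field before deleting anything is a deliberately conservative choice here: rows 1 and 2 share a customer and a date but are not reported, which avoids removing a unique record by mistake.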
Teams need to find ways to balance their data security measures with their data access needs. There are a few security measures that can help companies do just that. The two we will talk about here are encryption and tokenization.
Encryption uses an algorithm together with a secret “key” to alter data and make it unusable by users and applications that don’t have the key. The key can be used to reverse the encryption; so if you have the key, you can still use the data in its original form.
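The role of the key can be shown with a toy example. The sketch below is a one-time-pad style XOR written with the standard library only; it is an illustration of the idea, not a recommendation. Real systems should rely on a vetted library and a standard algorithm such as AES. The function name `xor_bytes` and the sample record are assumptions made for this example.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy cipher, illustration only)."""
    if len(key) != len(data):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = "2020-10-12,Acme,4111".encode("utf-8")  # hypothetical record
key = secrets.token_bytes(len(plaintext))           # random secret key

ciphertext = xor_bytes(plaintext, key)  # unreadable without the key
recovered = xor_bytes(ciphertext, key)  # applying the same key reverses it

print(recovered.decode("utf-8"))
```

Anyone holding `ciphertext` alone learns nothing useful, while anyone holding the key can recover the original record, which is why protecting the key matters as much as encrypting the data.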
Tokenization replaces the data elements you want to protect with randomly generated data referred to as a “token.” The original data is stored in a separate location and mapped to the tokens. To access the complete original data, the user or application needs to have permission to use the tokenized data and the token mapping. This means that even if the tokenized data is hacked, the original data is still safe and secure in a separate location.
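As a rough sketch of that idea, the class below keeps the original values in a separate mapping and hands out random tokens in their place. `TokenVault`, its methods, and the `authorized` flag are hypothetical names for illustration; a real deployment would keep the mapping in a separately secured store with proper access control.

```python
import secrets

class TokenVault:
    """Toy token store: maps random tokens to the original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or create a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, authorized: bool = False) -> str:
        """Return the original value only for callers with permission."""
        if not authorized:
            raise PermissionError("caller may not access the token mapping")
        return self._token_to_value[token]

vault = TokenVault()
card = "4111-1111-1111-1111"  # hypothetical sensitive value
token = vault.tokenize(card)

# The analysis dataset holds only the token, never the card number.
dataset_row = {"customer": "Acme", "card": token}
original = vault.detokenize(dataset_row["card"], authorized=True)
```

Because the token is random rather than derived from the value, a leaked `dataset_row` reveals nothing about the card number without access to the vault.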
Encryption and tokenization are just some of the data security options out there. Many companies have teams dedicated to data security, or hire third-party companies that specialize in data security, to create these systems. But it is important to know that all companies have a responsibility to keep their data secure.
Reference: I recently completed the Google Data Analytics certification and wanted to share my learnings.