What is data?
Data science is about running experiments on raw or structured data. Data is a collection of facts, and every piece of information is data. It is the fuel that can steer a business in the right direction: it helps teams plan campaigns and provides valuable insights. Most data is created by our activity in the world, and we can also create data by deliberately collecting information. Data matters in many professions, including data analysts, data scientists and business owners, and they work with both raw and cleaned data. That is why handling data without errors has become so important, and why knowing the types of data is a necessity today. In the digital setting, data is a systematic record of facts and figures captured from digital interactions. Let’s take a deep dive into some commonly used types of data so that we can get the most out of our own.
Quantitative Data vs Qualitative Data
Quantitative Data: Quantitative data is numerical data. It can be measured or counted and then expressed as numbers, such as units, prices, proportions or rates of change. We can perform mathematical operations on it and use it for statistical analysis. This data answers questions like “how much” and “how many”. We can represent it on different kinds of graphs and charts. Quantitative data can be broken down further into discrete and continuous data.
Discrete data: Discrete data represents variables that you can count in a finite amount of time. Only specific, separate values are possible in this type of data. These variables are countable rather than measurable, and their values cannot be broken down into smaller parts. Typically we get whole numbers with no fractions or decimals. Discrete data can be visualized using bar charts or pie charts.
Example:
Number of people who visit a hospital each day (10, 50, 100)
Maximum capacity allowed in a room
Tickets sold in a day
The number of items we buy at the farmers’ market each week
Continuous data: Continuous data is numerical data that can assume any value within a specified range. It can be broken down into smaller parts, as precisely as the sensitivity of the measuring scale allows. Continuous data can have decimal points, which give an exact measurement between two defined data points. Its values often change over time.
Example:
Daily wind speed
Temperature
Runtime marker in a video
Weight of a student
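The difference between counting and measuring can be sketched in a few lines of Python. This is only an illustration: the sample values and the helper names (`is_discrete`, `frequency_table`, `bin_values`) are invented here, and the whole-number check is a rough heuristic rather than a formal test.

```python
# Hypothetical sample values, invented for illustration only.
daily_visitors = [10, 50, 100, 50]        # discrete: whole-number counts
wind_speed_kmh = [12.4, 8.75, 15.0, 9.9]  # continuous: any value in a range


def is_discrete(values):
    """Rough heuristic: True when every value is a whole number.

    A measurement can happen to be whole, so this is a hint, not proof.
    """
    return all(float(v).is_integer() for v in values)


def frequency_table(values):
    """Count how often each distinct value occurs (suits discrete data)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


def bin_values(values, width):
    """Group continuous values into equal-width bins, keyed by bin start."""
    bins = {}
    for v in values:
        start = (v // width) * width
        bins[start] = bins.get(start, 0) + 1
    return bins


print(is_discrete(daily_visitors), is_discrete(wind_speed_kmh))
print(frequency_table(daily_visitors))
print(bin_values(wind_speed_kmh, 5))
```

Discrete values can be tallied directly, which is why bar charts suit them. Continuous values usually need to be grouped into ranges first, as in a histogram.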
Qualitative data: Qualitative data describes the qualities or characteristics of an item. This type of data cannot be measured or counted, and it is not naturally expressed as a number. It comes in the form of text, images, audio, video etc. Qualitative data is collected through questionnaires, interviews, observations etc. It captures people’s perceptions, which helps in market research and in understanding customers’ tastes. It is usually recorded as a name, category, description etc. We can divide qualitative data into nominal data and ordinal data.
Nominal data: Nominal data is qualitative data that is categorized without any order or sequence. It cannot be ordered or measured; the categories are simply labels. It is the most basic level of measurement in statistical analysis. The mode is the only measure of central tendency that applies to nominal data.
Examples:
First-time customer, new job applicant
New reduced-price listing
Which state do you live in?
Ordinal data: Ordinal data represents categories with a distinct and meaningful order. The categories have a natural order or rank based on some hierarchical scale. This data is often classified as “in between” qualitative and quantitative data. Numbers can be assigned to the categories, but the gaps between them are not necessarily equal, so arithmetic operations on ordinal data are not meaningful.
Examples:
Movie rating (number of stars)
Rank in a race (1st, 2nd, 3rd)
Income level (low income, middle income, high income)
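To make the nominal/ordinal distinction concrete, here is a minimal Python sketch. The survey answers, the `INCOME_ORDER` list and the `ordinal_median` helper are all assumptions made up for this example. It shows that the mode works for nominal labels, while ordinal labels also support a median once we supply their rank order.

```python
from statistics import mode

# Hypothetical survey answers, invented for illustration only.
states = ["Texas", "Ohio", "Texas", "Utah"]           # nominal: no order
income = ["low", "high", "middle", "middle", "low"]   # ordinal: ranked

# The rank order is something we supply; the labels alone do not carry it.
INCOME_ORDER = ["low", "middle", "high"]


def ordinal_median(values, order):
    """Return the middle category after sorting the values by rank.

    For an even number of values this returns the lower of the two
    middle categories, since categories cannot be averaged.
    """
    ranked = sorted(values, key=order.index)
    return ranked[(len(ranked) - 1) // 2]


most_common_state = mode(states)                       # valid for nominal data
median_income = ordinal_median(income, INCOME_ORDER)   # needs an order

print(most_common_state)
print(median_income)
```

Note that neither result involves adding or averaging the labels themselves, which is consistent with the restriction on arithmetic described above.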
Key differences between qualitative data and quantitative data
| Qualitative data | Quantitative data |
Focus | Formulating hypotheses | Testing hypotheses |
Analysis | Categorizing/non-statistical analysis | More straightforward analysis based on math and statistical models |
Question format | Open-ended | Multiple-choice or closed-ended |
Sample | Small sample size | Big sample size |
Data format | Textual | Numerical |
Internal Data vs External Data
Internal data: Internal data is data that lives inside a company’s own systems. It is usually produced by the company’s departments, such as sales, finance and human resources, and helps the organization gain better insights. It is generally more reliable and easier to collect than external data.
Example:
Sales data by store location
Wages of employees across different business units tracked by HR
Forecasts of future sales
External data: External data is data that lives outside a company or organization. It is collected from external sources such as customers, partners and competitors. It allows us to see how the company fits into the wider market.
Example:
National average wages for the various positions throughout your organization
Credit reports for customers of an auto dealership.
Market research reports
Primary Data vs Secondary Data
Primary data: Primary data is information collected directly from the original source rather than from an existing dataset. It is mostly collected first-hand by a researcher for a specific research project. This kind of data is usually up to date, reliable and authentic because it is collected in real time and not taken from older sources.
Examples:
Data from an interview you conducted
Data from a survey returned by 20 participants
Data from questionnaires you got back from a group of workers
Primary data is collected less often because the cost is high and gathering new data is time-consuming. That is why secondary data is more widely used than primary data.
Secondary data: Secondary data is data that was collected by someone else in the past, usually for a different purpose. It was typically primary data for the original collector and becomes secondary when reused by a third party. It can come from various sources such as books, articles or research studies.
Example:
Demographic data collected by a university.
Census data gathered by the federal government.
Online magazines, press releases.
Secondary data is much more affordable than primary data because it is available on many platforms, often at little or no cost. However, it may be less reliable and authentic, since it was collected by someone else for a different purpose.
Structured Data vs Unstructured Data
Not all data is created equal: some is structured, but most is unstructured. The two kinds are sourced, collected and scaled in different ways, and each typically resides in a different type of database.
Structured data: Structured data is often categorized as quantitative data. It is organized in a predefined format, such as rows and columns, and is typically stored in a relational database (RDBMS). SQL, a query language, is used to organize and manage structured data. For example, when we rate our favorite brand online, we create structured data. Dates, names, addresses and credit card numbers are all examples of structured data.
Unstructured data: Unstructured data is not organized in any easily identifiable way. It may have a native, internal structure, but it does not follow a predefined data model; it is simply stored in its native format. For example, when we use Google Earth to check out a satellite image of a restaurant location, we are using unstructured data. Text, social media activity, video files, audio files, satellite imagery and PDF files all qualify as unstructured data.
Key differences between structured data and unstructured data
Here are six major differences between structured and unstructured data.
| Structured data | Unstructured data |
Defined | Follows a predefined model or schema | Unorganized or in raw form |
Types | Quantitative data | Qualitative data |
Forms | Consists of numbers and values | Consists of sensor data, text files, audio and video files |
Analysis | Easily analyzed using traditional statistical methods and data mining techniques | Requires advanced techniques like NLP and ML algorithms |
Storage | Tabular form like an Excel sheet or SQL database; needs less storage space | Stored as media files or in NoSQL databases and data lakes; needs more space |
Uses | Business intelligence, data analytics, financial reporting | Sentiment analysis, social media monitoring, text mining |
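The contrast in the table can be illustrated with a small Python sketch using the standard-library `sqlite3` module. The `ratings` table, the brand names and the review text are all hypothetical, made up for this example. The structured part is queried directly with SQL, while the free-text part needs extra processing before it yields anything countable.

```python
import sqlite3

# Structured: a predefined schema, so SQL can query the values directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (brand TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [("Acme", 5), ("Acme", 3), ("Globex", 4)],  # hypothetical ratings
)
# Each result row is a (brand, average) pair, so dict() can consume it.
avg_by_brand = dict(
    conn.execute("SELECT brand, AVG(stars) FROM ratings GROUP BY brand")
)
conn.close()

# Unstructured: free text with no schema. Even a simple question such as
# "which brands are mentioned?" needs custom processing first.
review = "Loved the Acme blender, but the Globex kettle broke in a week."
mentions = {brand: review.count(brand) for brand in ("Acme", "Globex")}

print(avg_by_brand)
print(mentions)
```

Counting brand names is a deliberately crude stand-in for the NLP techniques mentioned in the table. Real sentiment analysis or text mining would need considerably more than a substring count.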
In today's world, a basic understanding of data types and their uses is crucial. We have briefly discussed several data types and their differences, from qualitative vs quantitative data to primary vs secondary data. We hope you can recognize and apply these differences to unlock new insights from your data.