Module 1: The importance of integrity

Looking for answers to the ‘Process Data from Dirty to Clean Module 1 challenge’?

In this post, I provide answers and explanations for Module 1: The importance of integrity, from Course 4 (Process Data from Dirty to Clean) of the Google Data Analytics Professional Certificate.

Whether you’re preparing for quizzes or brushing up on your knowledge, these notes will help you master the concepts. Let’s dive into the correct answers, with explanations for the practice quizzes and for selected challenge questions.

Test your knowledge on data integrity and analytics objectives

Practice Quiz

1. Which of the following principles are key elements of data integrity? Select all that apply.

  • Trustworthiness ✅
  • Accuracy ✅
  • Consistency ✅
  • Selectivity

Explanation:
Data integrity refers to maintaining and ensuring the accuracy, consistency, and trustworthiness of data throughout its lifecycle. These principles ensure that data is reliable and usable for analysis. “Selectivity” is not a core principle of data integrity.

2. Which process do data analysts use to make data more organized and easier to read?

  • Data manipulation ✅
  • Data uniformity
  • Data transfer
  • Data replication

Explanation:
Data manipulation involves organizing, transforming, and structuring data to make it more comprehensible and suitable for analysis. Processes like sorting, filtering, and cleaning fall under this category. The other options (data uniformity, transfer, and replication) do not directly address making data easier to read.

3. Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?

  • Change all of the dates to the same format ✅
  • Organize the data by country
  • Remove data in an unfamiliar date format
  • Leave the dates in their current formats

Explanation:
Standardizing date formats ensures consistency and accuracy in the dataset, which is critical for reliable analysis. Leaving dates in their current format or removing data would compromise integrity, and organizing data by country does not address the format inconsistency issue.
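As a rough illustration of what “change all of the dates to the same format” can look like in practice, here is a minimal Python sketch. It assumes each record carries a country code and that the date format used by each country is known; the `COUNTRY_DATE_FORMATS` mapping, the `to_iso_date` helper, and the sample records are all hypothetical and not part of the course material.

```python
from datetime import datetime

# Hypothetical mapping from a record's country to the date format it uses.
COUNTRY_DATE_FORMATS = {
    "US": "%m/%d/%Y",  # month/day/year
    "GB": "%d/%m/%Y",  # day/month/year
    "JP": "%Y/%m/%d",  # year/month/day
}

def to_iso_date(raw_date: str, country: str) -> str:
    """Parse a date using the country's known format and return YYYY-MM-DD."""
    fmt = COUNTRY_DATE_FORMATS[country]
    return datetime.strptime(raw_date, fmt).date().isoformat()

# Made-up records: the same text "03/04/2024" means different days in US and GB.
records = [("US", "03/04/2024"), ("GB", "03/04/2024"), ("JP", "2024/03/04")]
standardized = [to_iso_date(raw, country) for country, raw in records]
print(standardized)
```

Note that the same string can mean two different days depending on its source, which is why the format has to be known per record rather than guessed.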

Test your knowledge on insufficient data

Practice Quiz

4. What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.

  • Create and use hypothetical data that aligns with analysis predictions.
  • Gather related data on a small scale and request additional time to find more complete data. ✅
  • Continue with the analysis using data from less reliable sources.
  • Perform the analysis by finding and using proxy data from other datasets. ✅

Explanation:
If the required data isn’t available, analysts should either gather related data on a small scale while requesting more time or use proxy data, which can serve as a substitute for missing information. Creating hypothetical data compromises the integrity of the analysis, and using data from unreliable sources introduces inaccuracies and bias.

5. Which of the following are limitations that might lead to insufficient data? Select all that apply.

  • Outdated data ✅
  • Duplicate data
  • Data that updates continually ✅
  • Data from a single source ✅

Explanation:

  • Outdated data: Data that is no longer current may not reflect the present situation or trends, making it less useful for analysis.
  • Data that updates continually: While it provides real-time insights, continually updating data can be challenging to manage and may result in incomplete datasets if not captured properly.
  • Data from a single source: Relying on one source of data limits diversity and comprehensiveness, which can lead to incomplete or biased analysis.

6. A data analyst wants to find out how many people in Utah have swimming pools. It’s unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?

  • Statistical significance
  • Confidence level
  • Margin of error
  • Sample ✅

Explanation:
A sample is a subset of a population that is surveyed or analyzed to infer information about the entire population. This approach allows analysts to gather insights without needing to collect data from every individual, making it a practical and efficient method in data analytics.
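To make the idea concrete, the sketch below draws a simple random sample from a list of resident IDs. The population of 10,000 IDs, the sample size of 500, and the `draw_simple_random_sample` helper are illustrative assumptions; a real survey would also have to decide how large the sample needs to be.

```python
import random

def draw_simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: IDs standing in for residents.
resident_ids = list(range(1, 10_001))
surveyed = draw_simple_random_sample(resident_ids, 500, seed=42)
print(len(surveyed), "residents selected out of", len(resident_ids))
```

Because every resident has the same chance of being chosen, a sample drawn this way is more likely to be representative than one picked by convenience.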

Test your knowledge on testing your data

Practice Quiz

7. A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?

  • Results that are real and not caused by random chance ✅
  • Results that are unlikely to occur again
  • Results that are hypothetical and in need of more testing
  • Results that are inaccurate and should be ignored

Explanation:
Statistical significance indicates that the results of an experiment are unlikely to have occurred due to random variation or chance. It ensures that the observed effect or difference is genuine and meaningful for decision-making.

8. In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?

  • The trends from other customer surveys
  • The entire population ✅
  • The most valuable members of the population
  • The predictions of stakeholders

Explanation:
A sample size must accurately reflect the entire population to ensure the survey results are representative. This minimizes bias and increases the reliability of conclusions drawn from the data.

9. A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%.

  • True
  • False ✅

Explanation:
The confidence level and margin of error measure different things. The confidence level is the probability that the true population value falls within the margin of error of the sample result, while the margin of error is the size of that range. Nothing requires them to sum to 100%; they are read together when interpreting survey accuracy.
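A quick calculation shows why the two numbers are not expected to add up to 100%. The sketch below uses the standard normal-approximation formula for the margin of error of a proportion; this formula is not taken from the course, and the `margin_of_error` function name, the sample size of 1,000, and the worst-case proportion of 0.5 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence_level, sample_size, proportion=0.5):
    """Normal-approximation margin of error for a proportion (large population)."""
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(0.95, 1000)
print(f"Confidence level: 95%, margin of error: {moe:.1%}")
```

With these inputs the margin of error comes out at roughly 3.1%, so the two percentages total about 98%, not 100%.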

Test your knowledge on margin of error

Practice Quiz

10. Fill in the blank: Margin of error is the _____ amount that the sample results are expected to differ from those of the actual population.

  • minimum
  • median
  • maximum ✅
  • average

Explanation:
Margin of error represents the maximum expected difference between the observed results in a sample and the true population parameter. It accounts for sampling variability: the actual value is expected to fall within this range at the specified confidence level.

11. In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population's true response?

  • Between 73% and 78%
  • Between 70% and 80% ✅
  • Between 70% and 75%
  • Between 75% and 80%

Explanation:
The margin of error (5%) is added to and subtracted from the survey result (75%) to calculate the range:

  • Lower bound: 75% − 5% = 70%
  • Upper bound: 75% + 5% = 80%

Thus, the true response of the population is expected to fall within 70% to 80% with the specified confidence level.
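The same arithmetic can be written as a tiny helper. The function name `margin_of_error_range` is made up for this example; the inputs are the 75% result and 5% margin from the question.

```python
def margin_of_error_range(result_pct, margin_pct):
    """Return (lower, upper) bounds around a survey result, in percent."""
    return (result_pct - margin_pct, result_pct + margin_pct)

low, high = margin_of_error_range(75, 5)
print(f"Between {low}% and {high}%")
```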

Module 1 challenge

Graded Quiz

12. Fill in the blank: If a data analyst is using data that has been _____, the data will lack integrity and the analysis will be faulty.

  • wide
  • compromised ✅
  • public
  • clean

13. A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?

  • Data analysis
  • Data gathering
  • Data manipulation
  • Data transfer ✅

14. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst would be able to determine the reasons behind a certain country's population increase from 2016 to 2017.

  • True
  • False ✅

15. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

Which of the following has duplicate data?

  • Data for Symteco on 2/21/2014
  • Data for Symteco on 5/20/2014
  • Data for Valando on 2/18/2014 ✅
  • Data for Valando on 1/1/2014

16. A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?

  • Outdated data
  • Data from only one source
  • Geographically limited data
  • Data that keeps updating ✅

17. When gathering data through a survey, companies can save money by surveying 100% of a population.

  • True
  • False ✅

18. Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.

  • the population as a whole ✅
  • a dataset about the population
  • a subset of the population
  • the population most affected by the data

19. Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.

  • Sampling bias ✅
  • Data integrity
  • Data visualization
  • Insufficient data ✅

20. Which of the following conditions are necessary to ensure data integrity? Select all that apply.

  • Privacy
  • Completeness ✅
  • Statistical power
  • Accuracy ✅

21. What is one potential problem associated with data manipulation that analysts must be aware of?

  • Data manipulation can separate a dataset among different locations.
  • Data manipulation can help organize a dataset.
  • Data manipulation can introduce errors. ✅
  • Data manipulation can make a dataset easier to read.

22. As a data analyst, you are working for a national pizza restaurant chain. You have a dataset with monthly order totals for each branch over the past year. With only this data, what questions can you answer?

  • Which region had the highest sales over the last two years?
  • Which branch will be the most profitable over the next year?
  • What was the most popular item on the menu?
  • Which branch had the most orders in the last month of last year? ✅

23. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

June 2014 Invoices - Sheet1

The data analyst is asked to find the average estimate for Symteco over the past three years. What limitation of the data makes this impossible?

  • The data uses the wrong currency.
  • The data is all from a single year. ✅
  • The data does not include Symteco.
  • The data does not include estimates.

24. A data analyst at a software company wants to learn more about industry competitors. Because the software industry has more mergers than any other field, the companies and their products are constantly evolving. The analyst has a dataset from three years ago, and they notice that many of the companies and products in the dataset have changed. What makes the analyst decide that the data is insufficient, so they should generate fresh data instead?

  • It is outdated data. ✅
  • It is geographically limited data.
  • It is data that keeps updating.
  • It is data from only one source.

Explanation:
Data that is outdated no longer reflects current conditions, especially in a fast-changing industry like software. Fresh data is required to ensure relevance and accuracy.

25. A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?

  • Random sampling
  • Unbiased sampling
  • Geographically limited sampling
  • Sampling bias ✅

26. Which of the following processes helps ensure a close alignment of data and business objectives?

  • Completing data replication
  • Transferring data multiple times
  • Maintaining data integrity ✅
  • Having data update automatically during analysis

27. What can jeopardize data integrity throughout its lifecycle? Select all that apply.

  • Insufficient data
  • Human error ✅
  • Malware ✅
  • System failures ✅

28. A healthcare company keeps copies of their data at several locations across the country. The data becomes compromised because each location creates a copy of the original at different times of day. Which of the following processes caused the compromise?

  • Data gathering
  • Data manipulation
  • Data transfer
  • Data replication ✅

29. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions would the analyst need more data to address?

  • Which country had the smallest population in 2017?
  • Which country had the greatest population in 2015?
  • What was the reason for the population increase in a certain country? ✅
  • What was the population of a certain country in 2020?

30. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

June 2014 Invoices - Sheet1

Which of the following are limitations of this dataset?

  • Identifying the most profitable clients between January and November of 2014 ✅
  • Identifying the least profitable clients between January and November of 2014 ✅
  • Identifying the worst paying client between March and December of 2014 ✅
  • Identifying the best paying client between January and November of 2014

Explanation:
The dataset is limited because it only covers June 2014 invoices, making it insufficient for analyzing clients’ profitability or payments across a broader time frame.

31. A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?

  • A sample of all electric car owners ✅
  • The entire population of electric car owners
  • A sample of car owners who have owned more than one electric car
  • A sample of car owners who most recently bought an electric car

Explanation:
A representative sample of all electric car owners ensures the survey captures diverse brand preferences across the entire population.

32. A candy manufacturer finds an even distribution of sales across all age ranges of customers who purchase their products. The manufacturer decides to conduct a survey to learn more about its customer base. Due to age requirements, they can only send the survey to customers who are 21 years or older. This scenario can be described as what?

  • Down sampling bias
  • Sampling bias ✅
  • Unbiased sampling
  • Upsampling bias

33. What best describes a sample size?

  • A subset of the population between the 25th and 50th percentile
  • A random subset of the population
  • A subset that is representative of the population as a whole ✅
  • A subset of the population excluding outliers

Explanation:
A representative sample accurately reflects the population’s characteristics, ensuring reliable analysis and conclusions.

34. Fill in the blank: In order to have a strong and thorough analysis, a data analyst must verify _____.

  • data replication
  • data manipulation
  • data engineering
  • data integrity ✅

35. Fill in the blank: _____ is the process of changing data to make it more organized and easier to read.

  • Data transfer
  • Data manipulation ✅
  • Data gathering
  • Data replication

36. You are working for a global technology company. You have a dataset with the company’s total cell phone sales by country from 2015 to present. Based on the data you have, what questions are you able to answer?

  • What was the effect on sales when a new phone model was launched?
  • What was the effect on sales when new phone features were introduced?
  • What countries have the most cell phone sales in the past three years? ✅
  • What are the mean cell phone sales for each country since 2010?

37. A data analyst, working for a publishing company, gathers a dataset which includes all books sold in the United Kingdom over the last three years. However, they decide to generate new data that represents global book sales. What type of insufficient data does this scenario describe?

  • Data that keeps updating
  • Data that is outdated
  • Data that is geographically limited ✅
  • Data from only one source

38. A company is trying to learn more about their customer base. They would like to conduct a survey to understand why their customers chose their brand. How should the company survey its customers?

  • Conduct a survey of customers who purchased a different brand
  • Conduct a survey of customers that live in high-income areas
  • Conduct a survey with a representative sample of their customer population ✅
  • Conduct a survey with customers who have purchased more than five products

39. Sometimes during analysis, an analyst discovers that it’s necessary to adjust the business objective. When this happens, the analyst should take the initiative to do so without involving others in order to be respectful of their time.

  • True
  • False ✅

40. A car dealership gathers data about their entire customer population. They decide to conduct a survey to understand why their customers chose their dealership. They send out an email to all customers who have purchased more than two vehicles in the past five years. What does this scenario describe?

  • Unbiased sampling
  • Geographically limited sampling
  • Random sampling
  • Sampling bias ✅

41. A data analyst needs to migrate data from a server located at their company's headquarters to a remote site. This can lead to what type of data integrity issue?

  • Data replication ✅
  • Data cleaning
  • Data transfer ✅
  • Data manipulation

Explanation:

  • Data transfer: Errors during transfer can cause data loss or corruption.
  • Data replication: Issues like duplicate or inconsistent records may arise when replicating data between systems.
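One common safeguard against transfer errors, which the course question does not cover, is to compare a checksum of the data before and after the move. The sketch below uses SHA-256 from Python's standard library on in-memory bytes; the sample CSV content and the variable names are hypothetical.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical file contents at the source.
original = b"invoice_id,amount\n1001,250.00\n1002,99.50\n"

# Simulate one clean transfer and one that was cut off partway through.
received_intact = original
received_truncated = original[:20]

transfer_ok = sha256_digest(received_intact) == sha256_digest(original)
transfer_failed = sha256_digest(received_truncated) != sha256_digest(original)
print("Intact copy matches:", transfer_ok)
print("Truncated copy detected:", transfer_failed)
```

The same comparison can be run between replicas to spot copies that have drifted apart.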

42. As a data analyst, you work with data about the life expectancy of sea turtles in the Coral Triangle. The dataset contains an estimated birthdate and deathdate for all tracked sea turtles. With the data you have, what questions are you able to answer?

  • What is the median age a sea turtle has lived in the Coral Triangle? ✅
  • Where is the most prevalent location sea turtles are being hatched in the Coral Triangle?
  • What is the largest sea turtle ever recorded?
  • Is the sea turtle population increasing throughout the world?

43. A clothing manufacturer wants to learn more about why their consumers have purchased the brand’s products. How should this manufacturer conduct their survey?

  • Send the survey to a representative sample of their customers ✅
  • Send the survey to customers who have purchased more than one product
  • Send the survey to their least frequent customers
  • Send the survey to random people who buy clothes

44. A data analyst wants to predict the production output of a factory using a dataset that covers the years 2020 to 2021. In 2022, the factory implemented major labor and facility changes. What limitation of the data means that the analyst needs to get new data?

  • The data keeps updating.
  • The data is outdated. ✅
  • The data is geographically limited.
  • The data is from only one source.

45. In the data analysis process, how does a sample relate to a population?

  • A sample is a duplicate selection of data that is taken from the population.
  • A sample is an ideal example taken from a population.
  • A sample is a part of a population that is representative of the population. ✅
  • A sample is an average of all the data that represents the population.

46. Fill in the blank: Data _____ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.

  • sampling
  • integrity ✅
  • analysis
  • replication

Explanation:
Data integrity ensures the data’s reliability and accuracy from collection to usage. It encompasses its completeness, consistency, and trustworthiness throughout its lifecycle.

47. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply.

  • What was the average population of a certain country from 2015 through 2020? ✅
  • What was the difference in population between two specific countries in 2018? ✅
  • What was the effect of migration on the population of a certain country?
  • What was the reason for the population increase in a certain country?

Explanation:
This dataset contains population data, making it suitable for calculating averages or differences in specific years. However, questions about migration effects or reasons for population changes require additional context beyond raw population data.

48. A high school principal is estimating the total number of students that will attend an upcoming event. She assumes that the older students are unlikely to attend and decides to only survey the first-year students. What issue will the principal face when calculating her estimation?

  • The sample is too small.
  • The sample should be the older students.
  • The sample exhibits sampling randomness.
  • The sample exhibits sampling bias. ✅

Explanation:
Surveying only first-year students introduces sampling bias, as it excludes older students who may also attend the event. A representative sample is necessary for accurate estimations.
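The effect of this kind of bias on an estimate can be shown with made-up numbers. In the sketch below, the enrollment and attendance figures per year group are entirely hypothetical; they are chosen so that first-year students attend at a higher rate than older students, as the principal assumes.

```python
# Hypothetical school: students per year group and how many would attend.
enrolled = {"first": 100, "second": 100, "third": 100, "fourth": 100}
attending = {"first": 80, "second": 60, "third": 40, "fourth": 20}

total_students = sum(enrolled.values())
true_attendance = sum(attending.values())

# Biased estimate: apply the first-year attendance rate to the whole school.
first_year_rate = attending["first"] / enrolled["first"]
biased_estimate = round(first_year_rate * total_students)

print("True attendance:", true_attendance)
print("Estimate from first-years only:", biased_estimate)
```

With these figures the first-year-only estimate is 320 against a true attendance of 200, because the surveyed group does not behave like the school as a whole.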
