Process Data from Dirty to Clean: Weekly Challenge 1 Answers
1. Fill in the blank: If a data analyst is using data that has been _____, the data will lack integrity and the analysis will be faulty.
- wide
- compromised
- public
- clean
2. A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?
- Data analysis
- Data gathering
- Data manipulation
- Data transfer
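One common way to notice that a transfer was interrupted is to compare a checksum of the source with a checksum of the received copy. The following is a minimal sketch of that idea, not part of the quiz itself; the function names and the sample payload are hypothetical and exist only for illustration.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def transfer_is_intact(source: bytes, received: bytes) -> bool:
    """Compare the digest of the source with the digest of the received copy."""
    return sha256_of(source) == sha256_of(received)


# Hypothetical payload standing in for the imported dataset.
original = b"client,amount\nA,100\nB,250\n"
# Simulate an interrupted connection by dropping the last few bytes.
truncated = original[:-8]

print(transfer_is_intact(original, original))   # True
print(transfer_is_intact(original, truncated))  # False
```

A matching digest only shows that the copy equals the source. It says nothing about whether the source data was accurate in the first place.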
3. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst would be able to determine the reasons behind a certain country's population increase from 2016 to 2017.
- True
- False
4. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
Which of the following has duplicate data?
- Data for Symteco on 2/21/2014
- Data for Symteco on 5/20/2014
- Data for Valando on 2/18/2014
- Data for Valando on 1/1/2014
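For readers working from the CSV download, duplicate rows can also be found programmatically instead of by eye. The sketch below assumes a simple three-column layout and uses made-up rows; it does not reproduce the contents of the June 2014 Invoices template, and the column names and `find_duplicate_rows` helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical invoice rows; NOT the contents of the June 2014 Invoices template.
SAMPLE_CSV = """client,date,amount
Client A,2014-01-05,120.00
Client B,2014-01-07,80.50
Client A,2014-01-05,120.00
Client C,2014-02-11,45.25
"""


def find_duplicate_rows(csv_text: str) -> list:
    """Return each data row that appears more than once, in first-seen order."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    counts = Counter(tuple(row) for row in reader)
    return [row for row, n in counts.items() if n > 1]


duplicates = find_duplicate_rows(SAMPLE_CSV)
print(duplicates)  # [('Client A', '2014-01-05', '120.00')]
```

This treats two rows as duplicates only when every field matches exactly, so differences in spacing or date format would need to be normalized first.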
5. A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?
- Outdated data
- Data from only one source
- Geographically limited data
- Data that keeps updating
6. When gathering data through a survey, companies can save money by surveying 100% of a population.
- True
- False
7. Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.
- the population as a whole
- a dataset about the population
- a subset of the population
- the population most affected by the data
8. Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.
- Sampling bias
- Data integrity
- Data visualization
- Insufficient data
9. Which of the following conditions are necessary to ensure data integrity? Select all that apply.
- Privacy
- Completeness
- Statistical power
- Accuracy
10. What is one potential problem associated with data manipulation that analysts must be aware of?
- Data manipulation can separate a dataset among different locations.
- Data manipulation can help organize a dataset.
- Data manipulation can introduce errors.
- Data manipulation can make a dataset easier to read.
11. As a data analyst, you are working for a national pizza restaurant chain. You have a dataset with monthly order totals for each branch over the past year. With only this data, what questions can you answer?
- Which region had the highest sales over the last two years?
- Which branch will be the most profitable over the next year?
- What was the most popular item on the menu?
- Which branch had the most orders in the last month of last year?
12. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
June 2014 Invoices - Sheet1
The data analyst is asked to find the average estimate for Symteco over the past three years. What limitation of the data makes this impossible?
- The data uses the wrong currency.
- The data is all from a single year.
- The data does not include Symteco.
- The data does not include estimates.
13. A data analyst at a software company wants to learn more about industry competitors. Because the software industry has more mergers than any other field, the companies and their products are constantly evolving. The analyst has a dataset from three years ago, and they notice that many of the companies and products in the dataset have changed. What makes the analyst decide that the data is insufficient, so they should generate fresh data instead?
- It is outdated data.
- It is geographically limited data.
- It is data that keeps updating.
- It is data from only one source.
14. A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?
- Random sampling
- Unbiased sampling
- Geographically limited sampling
- Sampling bias
15. Which of the following processes helps ensure a close alignment of data and business objectives?
- Completing data replication
- Transferring data multiple times
- Maintaining data integrity
- Having data update automatically during analysis
16. What can jeopardize data integrity throughout its lifecycle? Select all that apply.
- Insufficient data
- Human error
- Malware
- System failures
17. A healthcare company keeps copies of their data at several locations across the country. The data becomes compromised because each location creates a copy of the original at different times of day. Which of the following processes caused the compromise?
- Data gathering
- Data manipulation
- Data transfer
- Data replication
18. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions would the analyst need more data to address?
- Which country had the smallest population in 2017?
- Which country had the greatest population in 2015?
- What was the reason for the population increase in a certain country?
- What was the population of a certain country in 2020?
19. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
June 2014 Invoices - Sheet1
Which of the following are limitations of this dataset?
- Identifying the most profitable clients between January and November of 2014
- Identifying the least profitable clients between January and November of 2014
- Identifying the worst paying client between March and December of 2014
- Identifying the best paying client between January and November of 2014
20. A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?
- A sample of all electric car owners
- The entire population of electric car owners
- A sample of car owners who have owned more than one electric car
- A sample of car owners who most recently bought an electric car
21. A candy manufacturer finds an even distribution of sales across all age ranges of customers who purchase their products. The manufacturer decides to conduct a survey to learn more about its customer base. Due to age requirements, they can only send the survey to customers who are 21 years or older. This scenario can be described as what?
- Down sampling bias
- Sampling bias
- Unbiased sampling
- Upsampling bias
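The effect of restricting a survey to part of the population can be illustrated with a small simulation. This is a sketch under stated assumptions: the ages are hypothetical, spread evenly from 10 to 70, and the 21-or-older cutoff mirrors the scenario above.

```python
import random
from statistics import mean

# Hypothetical customer ages, spread evenly from 10 to 70.
population = list(range(10, 71))

# Restricted sample: only customers aged 21 or older can receive the survey.
biased_sample = [age for age in population if age >= 21]

# Simple random sample of the same size, drawn from the whole population.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, k=len(biased_sample))

print(mean(population))     # 40
print(mean(biased_sample))  # 45.5
print(mean(random_sample))  # depends on the seed; drawn from all ages
```

The restricted sample overstates the average age because every customer under 21 has zero chance of being selected, while the random sample gives each customer an equal chance.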
22. What best describes a sample size?
- A subset of the population between the 25th and 50th percentile
- A random subset of the population
- A subset that is representative of the population as a whole
- A subset of the population excluding outliers
23. Fill in the blank: In order to have a strong and thorough analysis, a data analyst must verify _____.
- data replication
- data manipulation
- data engineering
- data integrity
24. Fill in the blank: _____ is the process of changing data to make it more organized and easier to read.
- Data transfer
- Data manipulation
- Data gathering
- Data replication
25. You are working for a global technology company. You have a dataset with the company’s total cell phone sales by country from 2015 to present. Based on the data you have, what questions are you able to answer?
- What was the effect on sales when a new phone model was launched?
- What was the effect on sales when new phone features were introduced?
- Which countries have had the most cell phone sales in the past three years?
- What are the mean cell phone sales for each country since 2010?
26. A data analyst, working for a publishing company, gathers a dataset which includes all books sold in the United Kingdom over the last three years. However, they decide to generate new data that represents global book sales. What type of insufficient data does this scenario describe?
- Data that keeps updating
- Data that is outdated
- Data that is geographically limited
- Data from only one source
27. A company is trying to learn more about their customer base. They would like to conduct a survey to understand why their customers chose their brand. How should the company survey its customers?
- Conduct a survey of customers who purchased a different brand
- Conduct a survey of customers that live in high-income areas
- Conduct a survey with a representative sample of their customer population
- Conduct a survey with customers who have purchased more than five products
28. Sometimes during analysis, an analyst discovers that it’s necessary to adjust the business objective. When this happens, the analyst should take the initiative to do so without involving others in order to be respectful of their time.
- True
- False
29. A car dealership gathers data about their entire customer population. They decide to conduct a survey to understand why their customers chose their dealership. They send out an email to all customers who have purchased more than two vehicles in the past five years. What does this scenario describe?
- Unbiased sampling
- Geographically limited sampling
- Random sampling
- Sampling bias
30. A data analyst needs to migrate data from a server located at their company's headquarters to a remote site. This can lead to what type of data integrity issue?
- Data replication
- Data cleaning
- Data transfer
- Data manipulation
31. As a data analyst, you work with data about the life expectancy of sea turtles in the Coral Triangle. The dataset contains an estimated birth date and death date for all tracked sea turtles. With the data you have, what questions are you able to answer?
- What is the median age that sea turtles in the Coral Triangle have lived to?
- Where in the Coral Triangle are the most sea turtles being hatched?
- What is the largest sea turtle ever recorded?
- Is the sea turtle population increasing throughout the world?
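A dataset that holds only birth and death dates supports lifespan calculations and little else. As a rough sketch, the median lifespan could be computed as follows; the three records are invented for illustration, and the 365.25-days-per-year conversion is an approximation.

```python
from datetime import date
from statistics import median

# Hypothetical tracking records: (estimated birth date, estimated death date).
turtles = [
    (date(1950, 6, 1), date(2010, 6, 1)),
    (date(1962, 3, 15), date(2012, 3, 15)),
    (date(1975, 9, 30), date(2015, 9, 30)),
]


def lifespan_years(born: date, died: date) -> float:
    """Approximate lifespan in years, using 365.25 days per year."""
    return (died - born).days / 365.25


lifespans = [lifespan_years(born, died) for born, died in turtles]
print(round(median(lifespans), 1))  # 50.0
```

Nothing in these two columns records hatching location, body size, or worldwide population counts, so those questions would need additional data.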
32. A clothing manufacturer wants to learn more about why their consumers have purchased the brand’s products. How should this manufacturer conduct their survey?
- Send the survey to a representative sample of their customers
- Send the survey to customers who have purchased more than one product
- Send the survey to their least frequent customers
- Send the survey to random people who buy clothes
33. A data analyst wants to predict the production output of a factory using a dataset that covers the years 2020 to 2021. In 2022, the factory implemented major labor and facility changes. What limitation of the data means that the analyst needs to get new data?
- The data keeps updating.
- The data is outdated.
- The data is geographically limited.
- The data is from only one source.
34. In the data analysis process, how does a sample relate to a population?
- A sample is a duplicate selection of data that is taken from the population.
- A sample is an ideal example taken from a population.
- A sample is a part of a population that is representative of the population.
- A sample is an average of all the data that represents the population.
35. Fill in the blank: Data _____ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
- sampling
- integrity
- analysis
- replication
36. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply.
- What was the average population of a certain country from 2015 through 2020?
- What was the difference in population between two specific countries in 2018?
- What was the effect of migration on the population of a certain country?
- What was the reason for the population increase in a certain country?
37. A high school principal is estimating the total number of students that will attend an upcoming event. She assumes that the older students are unlikely to attend and decides to only survey the first-year students. What issue will the principal face when calculating her estimation?
- The sample is too small.
- The sample should be the older students.
- The sample exhibits sampling randomness.
- The sample exhibits sampling bias.