Module 2: Clean data for more accurate insights
Looking answers for ‘process data from dirty to clean module 2 challenge’?
In this post, I provide accurate answers and detailed explanations for Module 2: Clean data for more accurate insights of Course 4: Process Data from Dirty to Clean – Google Data Analytics Professional Certificate.
Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question.
Test your knowledge on clean versus dirty data
Practice Quiz
1. Describe the difference between a null and a zero in a dataset.
- A null represents a number with no significance. A zero represents the number zero.
- A null indicates that a value does not exist. A zero is a numerical response. ✅
- A null represents a value of zero. A zero represents an empty cell.
- A null signifies invalid data. A zero is missing data.
Explanation:
A null means the data is missing or undefined, while a zero represents a valid numerical value indicating “nothing” or “none” in a specific context.
2. What are the most common processes and procedures handled by data engineers? Select all that apply.
- Verifying results of data analysis
- Developing, maintaining, and testing databases and related systems ✅
- Transforming data into a useful format for analysis ✅
- Giving data a reliable infrastructure ✅
Explanation:
Data engineers focus on:
- Database management: Developing and maintaining databases.
- Data transformation: Preparing raw data for analysis.
- Infrastructure reliability: Ensuring systems can handle data storage and processing efficiently.
Verifying analysis results is typically a data analyst’s role, not a data engineer’s.
3. What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.
- Ensuring data is properly cleaned
- Ensuring data is backed up to prevent loss ✅
- Ensuring data is secure ✅
- Ensuring data is available ✅
Explanation:
Data warehousing specialists focus on:
- Data backups: Protecting against data loss.
- Data security: Ensuring only authorized access.
- Data availability: Making data accessible for reporting and analysis.
Cleaning data is typically the responsibility of data analysts or engineers.
4. A data analyst is cleaning a dataset. They want to confirm that users entered five-digit zip codes correctly by checking the data in a certain spreadsheet column. What would be most helpful as the next step?
- Using the field length tool to specify the number of characters in each cell in the column ✅
- Formatting the cells in the column as number
- Changing the column width to fit only five digits
- Using the MAX function to determine the maximum value in the cells in the column
Explanation:
The field length tool checks whether each zip code entry contains exactly five characters, ensuring accuracy.
- Formatting cells as numbers or changing column width doesn’t validate input.
- The MAX function is irrelevant for confirming zip code lengths.
Test your knowledge on data-cleaning techniques
Practice Quiz
5. Fill in the blank: Every database has its own formatting, which can cause the data to seem inconsistent. Data analysts use the _____ tool to create a clean and consistent visual appearance for their spreadsheets.
- spellcheck
- conditional formatting
- clear formats ✅
- autocorrect
Explanation:
The clear formats tool is used to remove all formatting from cells, ensuring that the data is displayed in a clean and consistent manner. This includes removing fonts, colors, borders, and other formatting styles, which can sometimes cause the data to appear inconsistent across a spreadsheet.
6. What is the process of combining two or more datasets into a single dataset?
- Data validation
- Data transferring
- Data composition
- Data merging ✅
Explanation:
Data merging is the process of combining datasets into a unified structure, enabling better analysis and integration. Other options like data validation or transferring are distinct tasks unrelated to merging.
7. Fill in the blank: In data analytics, _____ describes how well two or more datasets are able to work together.
- alignment
- compatibility ✅
- suitability
- agreement
Explanation:
Compatibility refers to the ability of datasets to align and work together effectively, such as matching formats, structures, or content for seamless integration and analysis.
Test your knowledge on cleaning data in spreadsheets
Practice Quiz
8. Describe the relationship between a text string and a substring.
- A text string is a group of characters within a cell. A substring is a smaller subset of that text string. ✅
- A text string is a row of data within a table. A substring is one cell within that row.
- A text string is the list of attributes at the top of columns within a table. A substring is a single attribute within that list.
- A text string is a column of data within a table. A substring is one cell within that column.
Explanation:
A text string refers to a sequence of characters, such as letters, numbers, or symbols, that are treated as text in a cell. A substring is a part or smaller portion of that larger text string. For example, in the string “Data Analysis,” the substring “Data” is a smaller part of the whole string.
9. A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax?
- =COUNTIF(A2:A100,>5)
- =COUNTIF(A2:A100,”>5″)
- =COUNTIF(A2:A100,<5)
- =COUNTIF(A2:A100,”<5″) ✅
Explanation:
The COUNTIF function requires a range and a condition. The condition must be enclosed in quotation marks when using comparison operators like <
, >
, or =
. The correct syntax ensures that the function counts cells where the value is less than 5.
10. Fill in the blank: To remove leading, trailing, and repeated spaces in data, analysts use the ____ function.
- LEFT
- TRIM ✅
- MID
- RIGHT
Explanation:
The TRIM function removes all extra spaces from text, leaving only single spaces between words. It is particularly useful for cleaning data with unnecessary spaces that can cause errors or inconsistencies in analysis.
Module 2 challenge
Practice Quiz
11. As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. This is an example of data visualization.
- True
- False ✅
Explanation:
Highlighting empty cells visually represents missing data, which is a simple form of data visualization. It helps analysts quickly identify gaps in the dataset.
12. A data analyst at a nonprofit organization is working with the following spreadsheet, which contains member name data in column C. They want to divide this data using the underscore as a delimiter, so that first names are stored in one column and last names in another. Which tool should the analyst use?
- Conditional formatting
- Pivot table
- SPLIT function ✅
- MID function
Explanation:
The SPLIT function divides text into separate components based on a specified delimiter, such as an underscore (_
). This allows analysts to separate first and last names into different columns.
13. Fill in the blank: When describing a SUM function, the _____ is =SUM(value 1 through value 2).
- syntax ✅
- standard
- structure
- script
14. You are working with the following selection of a spreadsheet:
In order to extract the five-digit postal code from Burlington, MA, what is the correct function?
- =RIGHT(B3,5) ✅
- =RIGHT(5,B3)
- =LEFT(5,B3)
- =LEFT(B3,5)
Explanation:
The RIGHT function extracts a specified number of characters starting from the right side of a text string. Here, it retrieves the last 5 characters of cell B3
.
15. A data analyst in a human resources department is working with the following selection of a spreadsheet:
They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?
- =CONCATENATE(A5!B5)
- =CONCATENATE(A5*B5)
- =CONCATENATE(A5+B5)
- =CONCATENATE(A5,B5) ✅
16. A data analyst at an e-commerce company is working with a spreadsheet containing last month's sales. The most expensive product their company sells costs $49.99, so they want to quickly confirm that all of the data in the Sales column is $49.99 or less. What function can they use?
- SUMIF
- COUNTIF ✅
- COUNT
- SUM
17. A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?
- VALUE
- VLOOKUP ✅
- MATCH
- FIND
18. A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible?
- Use a data visualization
- Map the data ✅
- Apply a data structure
- Merge the data
19. In their spreadsheet, a data analyst makes cells stand out for more efficient analysis. What spreadsheet tool is used to do this?
- Cell filtering
- Conditional ranking
- Conditional formatting ✅
- Cell querying
20. A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?
- Unit
- Delimiter ✅
- Partition
- Substring
21. A data analyst is using a function in a spreadsheet. For the function to work correctly, they follow the function’s syntax. What does this entail?
- It is the function’s name and placement.
- It is how the function can be used in a program.
- It is the function’s required information and its proper placement. ✅
- It is the purpose of the function and its use.
Explanation:
A function’s syntax specifies how to structure the function, including the order and type of arguments needed for the function to work correctly.
22. In a spreadsheet, what is the correct function for extracting the first two characters of the string located in cell A7?
- =LEFT(A7,2) ✅
- =LEFT(2,A7)
- =RIGHT(A7,2)
- =RIGHT(2,A7)
23. Fill in the blank: In a spreadsheet, the function VLOOKUP is used to _____ information in a column based on a specified data value.
- return ✅
- replace
- transform
- delete
24. What describes syntax?
- It is the function’s required information and its proper placement. ✅
- It is how the function can be used in a program.
- It is the purpose of the function and its use.
- It is the function’s name and placement.
25. A data analyst in a human resources department is working with the following selection of a spreadsheet:
They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number(SS#) plus the year hired. What function will create the ID 19392020 for the employee in row 4?
- =CONCATENATE(B4+A4)
- =CONCATENATE(B4,A4) ✅
- =CONCATENATE(A4+B4)
- =CONCATENATE(A4!B4)
26. An analyst is cleaning a new dataset. They want to determine how many of the cells in column F have a value of 0. However, they only want rows 7 to 120 to be considered. Which COUNTIF function syntax can be used to answer this question?
- =COUNTIF(F2:F1250, 0)
- =COUNTIF(F7:F120, =0)
- =COUNTIF(F7:F120,”0″) ✅
- =COUNTIF(F7:F120,”=0”)
27. A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible prior to analyzing the data?
- Use a data visualization
- Map the data ✅
- Spot check for null values
- Apply a data structure
28. A data analyst is working on a spreadsheet in which one of the columns contains name data. This data is formatted as lastname_firstname. The analyst splits this data at the underscore so that each piece—firstname and lastname—are contained in their own columns.
In this context, what is the underscore acting as?
- Partition
- Delimiter ✅
- Substring
- MID function
29. A data analyst is using a function in a spreadsheet. When they input the function, they follow a predetermined structure that includes all required information for the function and its proper placement. What aspect of a function does this describe?
- The specified value of the function
- The syntax of the function ✅
- The length of the function
- The number of characters in the function
30. You are working with the following selection of a spreadsheet:
In order to extract the five-digit postal code from Brandon, FL, what is the correct function?
- =RIGHT(5,B4)
- =RIGHT(B4,5) ✅
- =LEFT(B4,5)
- =LEFT(5,B4)
31. A data analyst in a human resources department is working with the following selection of a spreadsheet:
They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number(SS#) plus the year hired. What function will create the ID 32082009 for the employee in row 5?
- =CONCATENATE(B5,A5) ✅
- =CONCATENATE(A5!B5)
- =CONCATENATE(A5+B5)
- =CONCATENATE(B5+A5)
32. Before analyzing a dataset, an analyst maps the data. What is the reason for doing this?
- The analyst wants to know what attributes the data has.
- The analyst thinks the dataset might have some null values.
- The dataset has no visualizations.
- The dataset contains data from different sources. ✅
33. A data analyst suspects that there are many blank cells in their spreadsheet corresponding to missing information. What spreadsheet tool can they use to identify only those cells containing the null values?
- Conditional ranking
- Conditional formatting ✅
- Cell querying
- Cell filtering
34. A data analyst is working on a spreadsheet in which one of the columns is name data. This data is formatted as lastname, firstname. The analyst chooses to divide this data into two new columns, one containing the firstname data and the other containing the lastname data. What spreadsheet tool would they use to do this?
- The MID function
- The SPLIT function ✅
- Substring formatting
- Conditional formatting
35. Fill in the blank: The function _____ is used to return information in a column that contains a specified value.
- VALUE
- MATCH
- VLOOKUP ✅
- FIND
Explanation:
The VLOOKUP function searches for a value in the first column of a range and returns data from the corresponding row in a specified column.
36. In a spreadsheet, what function would you use to extract the last three characters of the string located in row 4, column C?
- =RIGHT(3,C4)
- =LEFT(C4,3)
- =LEFT(3,C4)
- =RIGHT(C4,3) ✅
37. Fill in the blank: In order to make your spreadsheet easier to analyze, you choose to alter the way cells appear if their values meet certain conditions. The spreadsheet tool that you use to do this is called _____.
- conditional formatting ✅
- cell querying
- cell filtering
- conditional ranking
38. An analyst is cleaning a new dataset. They want to make sure the data contained from cell B2 through cell B100 does not contain a number smaller than 10. Which COUNTIF function syntax can be used to answer this question?
- =COUNTIF(B2:B100,”<9″)
- =COUNTIF(B2:B100,”>=10”) ✅
- =COUNTIF(B2:B100,>50)
- =COUNTIF(B2:B200, ”<=50”)
39. A data analyst is adding new data to the dataset. This data comes from a different source than the current data. What should the data analyst do to ensure the compatibility of the current and new data?
- Use a data visualization
- Map the data ✅
- Spot check for null values
- Apply a data structure
40. A delimiter is a character that indicates the beginning or end of a data item. The split text to columns tool uses a delimiter to accomplish what task?
- To specify where to split a text string ✅
- To split duplicate substrings
- To format a string to numeric
- To change the format of a column of text
41. You are working with the following selection of a spreadsheet:
In order to extract the five-digit postal code from North Wales, PA, what is the correct function?
- =LEFT(5,B2)
- =LEFT(B2,5)
- =RIGHT(5,B2)
- =RIGHT(B2,5) ✅
42. A data analyst in a human resources department is working with the following selection of a spreadsheet:
They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number(SS#). What function will create the ID 20142683 for the employee in row 3?
- =CONCATENATE(A3*B3)
- =CONCATENATE(A3,B3) ✅
- =CONCATENATE(A3+B3)
- =CONCATENATE(A3!B3)
43. To evaluate how well two or more data sources work together, data analysts use data mapping.
- True ✅
- False
44. An analyst is cleaning a new dataset containing 500 rows. They want to make sure the data contained from cell B2 through cell B300 does not contain a number greater than 50. Choose the statement that includes the correct syntax for this COUNTIF function?
- =COUNTIF(B2:B300,”>50″) ✅
- =COUNTIF(B2:B300,”<50”)
- =COUNTIF(B2:B300,>50)
- =COUNTIF(B2:B300,<=50)
Explanation:
The COUNTIF function counts cells within a range that meet a specified condition. The condition must be enclosed in quotation marks when using operators like >
or <
.
45. Fill in the blank: Data mapping is the process of _____ fields from one data source to another.
- matching ✅
- extracting
- inserting
- transforming
Explanation:
Data mapping involves linking or matching fields from one dataset to corresponding fields in another, ensuring proper alignment for analysis or integration.
46. A data analyst in a human resources department is working with the following selection of a spreadsheet:.
They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20201939 for the employee in row 4?
- =CONCATENATE(A4*B4)
- =CONCATENATE(A4+B4)
- =CONCATENATE(A4!B4)
- =CONCATENATE(A4,B4) ✅
Explanation:
The CONCATENATE function combines text from multiple cells. Here, it merges the year from A4
with the last four digits of the SS# in B4
.
Related contents:
Module 1: The importance of integrity
Module 3: Data cleaning with SQL
Module 4: Verify and report on cleaning results
Module 6: Course wrap-up
You might also like:
Course 1: Foundations: Data, Data, Everywhere
Course 2: Ask Questions to Make Data-Driven Decisions
Course 3: Prepare Data for Exploration
Course 5: Analyze Data to Answer Questions
Course 6: Share Data Through the Art of Visualization
Course 7: Data Analysis with R Programming
Course 8: Google Data Analytics Capstone: Complete a Case Study