Module 3: Data cleaning with SQL
Looking answers for ‘process data from dirty to clean module 3 challenge’?
In this post, I provide accurate answers and detailed explanations for Module 3: Data cleaning with SQL of Course 4: Process Data from Dirty to Clean – Google Data Analytics Professional Certificate.
Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question.
Test your knowledge on SQL
Practice Quiz
1. Which of the following are benefits of using SQL? Select all that apply.
- SQL can be used to program microprocessors on database servers.
- SQL can handle huge amounts of data. ✅
- SQL offers powerful tools for cleaning data. ✅
- SQL can be adapted and used with multiple database programs. ✅
Explanation:
- SQL is designed to manage large datasets efficiently, which is why it’s widely used in data analytics and database management.
- SQL has tools like
TRIM()
,REPLACE()
, and conditional logic that make it useful for data cleaning. - SQL is versatile and works across different database systems (e.g., MySQL, PostgreSQL, Oracle) with minor adaptations.
- Incorrect option: SQL is not used for programming microprocessors; it is a query language for databases.
2. Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply.
- Join data ✅
- Perform arithmetic ✅
- Use formulas ✅
- Process huge amounts of data efficiently
Explanation:
- Both spreadsheets (e.g., Excel) and SQL can join data through merging tables or using lookup functions.
- Arithmetic operations such as addition, subtraction, and aggregation are possible in both.
- Formulas or functions (e.g.,
SUM()
in Excel,SUM()
in SQL) can be used in both tools. - Incorrect option: While SQL excels at handling large datasets, spreadsheets struggle with huge data due to memory limitations.
3. SQL is a language used to communicate with databases. Like most languages, SQL has dialects. What are the advantages of learning and using standard SQL? Select all that apply.
- Standard SQL is automatically translated by databases to other dialects.
- Standard SQL works with a majority of databases. ✅
- Standard SQL requires a small number of syntax changes to adapt to other dialects. ✅
- Standard SQL is much easier to learn than other dialects.
Explanation:
- Standard SQL is compatible with most relational databases, which makes it a valuable skill for analysts.
- Adapting standard SQL to dialects like T-SQL (Microsoft SQL Server) or PL/SQL (Oracle) usually requires minor syntax adjustments.
- Incorrect options:
- Standard SQL is not automatically translated by databases into other dialects.
- Learning standard SQL is not necessarily easier than learning other dialects but provides a strong foundational understanding.
Test your knowledge on SQL queries
Practice Quiz
4. Which of the following SQL functions can data analysts use to clean string variables? Select all that apply.
- LENGTH
- SUBSTR ✅
- TRIM ✅
- COUNTIF
Explanation:
The SUBSTR and TRIM functions are used to clean string variables in SQL. SUBSTR extracts a substring from a string, while TRIM removes leading and trailing spaces.
5. You are working with a database table that contains data about playlists for different types of digital media. The table includes columns for playlist_id and name. You want to remove duplicate entries for playlist names and sort the results by playlist ID.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the name column.
NOTE: The three dots (...) indicate where to add the clause.
What playlist name appears in row 6 of your query result?
- Music Videos ✅
- Audiobooks
- Movies
- TV Shows
Explanation:
The DISTINCT
clause removes duplicate entries from the name
column, ensuring each playlist name appears only once. The query will return the sorted list, with “Music Videos” appearing in row 6.
6. You are working with a database table that contains data about music albums. The table includes columns for album_id, title, and artist_id. You want to check for album titles that are less than 4 characters long.
You write the SQL query below. Add a LENGTH function that will return any album titles that are less than 4 characters long.
What album ID number appears in row 3 of your query result?
- 239
- 236
- 182 ✅
- 131
Explanation:
The LENGTH(title) < 4
condition checks for album titles that are less than 4 characters long. The album with ID 182 appears in row 3 of the result.
7. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, and country. You want to retrieve the first 3 letters of each country name. You decide to use the SUBSTR function to retrieve the first 3 letters of each country name, and use the AS command to store the result in a new column called new_country.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 letters of each country name and store the result in a new column as new_country.
NOTE: The three dots (...) indicate where to add the statement.
What customer ID number appears in row 2 of your query result?
- 47
- 55 ✅
- 3
- 28
Explanation:
The SUBSTR(country, 1, 3)
function extracts the first 3 characters of the country name, which is stored in a new column called new_country
. Customer ID 55 appears in row 2 of the result.
Module 3 challenge
Graded Quiz
8. Fill in the blank: Data analysts usually use _____ to deal with very large datasets.
- web browsers
- spreadsheets
- SQL ✅
- word processors
9.What are some of the benefits of using SQL for analysis? Select all that apply.
- SQL interacts with database programs. ✅
- SQL tracks changes across a team. ✅
- SQL has built-in functionalities.
- SQL can pull information from different database sources. ✅
10. A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables?
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS ✅
- UPDATE
- INSERT INTO
11. You are working with a database table that contains invoice data. The table includes columns for invoice_id and customer_id. You want to remove duplicate entries for customer ID and sort the results by invoice ID.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What customer ID number appears in row 12 of your query result?
- 23
- 42
- 16 ✅
- 8
12. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for postal codes that are greater than 7 characters long.
You write the SQL query below. Add a LENGTH function that will return any postal codes that are greater than 7 characters long.
What is the last name of the customer that appears in row 10 of your query result?
- Rocha
- Brooks
- Hughes ✅
- Ramos
13. A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe?
- Visualizing
- Processing
- Calculating
- Typecasting ✅
14. The CAST function can be used to convert the DATE datatype to the DATETIME datatype.
- True ✅
- False
15. Fill in the blank: The _____ function can be used to return non-null values in a list.
- TRIM
- COALESCE ✅
- CAST
- CONCAT
16. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You want to retrieve the first 3 characters of each postal code. You decide to use the SUBSTR function to retrieve the first 3 characters of each postal code, and use the AS command to store the result in a new column called new_postal_code.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each postal code and store the result in a new column as new_postal_code.
NOTE: The three dots (...) indicate where to add the statement.
What employee ID number appears in row 5 of your query result?
NOTE: The query index starts at 1 not 0.
- 3 ✅
- 1
- 8
- 7
17. Why do data analysts choose to work with SQL? Select all that apply.
- SQL can handle huge amounts of data. ✅
- SQL is a powerful software program.
- SQL is a well-known standard in the professional community. ✅
- SQL is a programming language that can also create web apps.
Explanation:
SQL is a powerful tool widely used in data analysis and is well-known for its ability to handle large datasets and serve as a standard across different databases.
18. A team of data analysts is working on a large project that will take months to complete and contains a huge amount of data. They need to document their process and communicate with multiple databases. The team decides to use a SQL server as the main analysis tool for this project and SQL for the queries. What makes this the most efficient tool? Select all that apply.
- SQL efficiently handles large amounts of data. ✅
- SQL records queries and changes throughout a project. ✅
- SQL contains commands that build visualizations.
- SQL allows you to connect to multiple databases. ✅
19. Fill in the blank: _____ refers to the process of converting data from one type to another.
- Formatting
- Cleaning
- Typecasting ✅
- Querying
20. A data analyst is working with product sales data. They import new data into a database. The database recognizes the data for product price as text strings. What SQL function can the analyst use to convert text strings to floats?
- LENGTH
- TRIM
- SUBSTR
- CAST ✅
Explanation: The CAST
function is used to convert data from one type to another, such as converting a text string to a float.
21. Fill in the blank: The _____ function can be used to join strings to create a new column.
- CAST
- COALESCE
- TRIM
- CONCAT ✅
22. As a data analyst, you are working on a quick project containing a small amount of data. As the data was emailed to you, there is no need to query the data. What tool should you use to perform your analysis?
- Spreadsheet ✅
- SQL
- word process
- CSV
23. A data analyst has added a massive table to their database on accident and needs to remove the table. What command can the analyst use to correct their mistake?
- DROP TABLE IF NOT EXISTS
- INSERT INTO
- REMOVE TABLE IF EXISTS
- DROP TABLE IF EXISTS ✅
24. You are working with a database table that contains invoice data. The table includes a column for customer_id. You want to remove duplicate entries for customer_id and get a count of total customers in the database.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What is the total number of customers in the database?
- 84
- 105
- 43
- 59 ✅
25. In SQL databases, what data type refers to a number that does not contain a decimal?
- String
- Integer ✅
- Boolean
- Float
26. After joining multiple tables you find your data contains a significant amount of null values. What function can you use to return only the non-null values in a list ?
- CAST
- COALESCE ✅
- TRIM
- CONCAT
27. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, and country. The state names are abbreviated. You want to retrieve the first 2 letters of each state name. You decide to use the SUBSTR function to retrieve the first 2 letters of each state name, and use the AS command to store the result in a new column called new_state.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 2 letters of each state name and store the result in a new column as new_state.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes in three arguments being column, starting_index, ending_index
What customer ID number is in row 9 of your query result?
NOTE: The query index starts at 1 not 0.
- 3
- 55
- 47 ✅
- 10
Explanation:
The SUBSTR(state, 1, 2)
function extracts the first 2 characters of the state
column and stores them in the new_state
column. Customer ID 47 appears in row 9 after sorting by state
.
28. A junior data analyst joins a new company. The analyst learns that SQL is heavily utilized within the organization. Why would the organization choose to invest in SQL? Select all that apply.
- SQL is a programming language that can also create web apps.
- SQL can handle huge amounts of data. ✅
- SQL is a powerful software program.
- SQL is a well-known standard in the professional community. ✅
29. You are working with a database table that contains invoice data. The table includes columns for invoice_id and billing_state. You want to remove duplicate entries for billing state and sort the results by invoice ID.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_state column.
NOTE: The three dots (...) indicate where to add the clause.
What billing state appears in row 17 of your query result?
NOTE: The query index starts at 1 not 0.
- AZ ✅
- NV
- CA
- WI
30. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for city names that are greater than 9 characters long.
You write the SQL query below. Add a LENGTH function that will return any city names that are greater than 9 characters long.
What is the first name of the customer that is in row 7 of your query result?
NOTE: The query index starts at 1 not 0.
- Diego
- Kara
- Julia ✅
- Roberto
31. In SQL databases, what data type refers to a number that contains a decimal?
- Boolean
- Float ✅
- Integer
- String
32. You’re working with a dataset that contains a float column with a significant amount of decimal places. This level of granularity is not needed for your current analysis. How can you convert the data in the float column to be integer data?
- CAST ✅
- COALESCE
- TRIM
- CONCAT
33. What SQL function lets you add strings together to create new text strings that can be used as unique keys?
- CAST
- COALESCE
- TRIM
- CONCAT ✅
Explanation:
The CONCAT
function is used to join or add strings together, which is useful for creating unique keys from multiple text columns.
34. What are some of the benefits of using SQL for analysis? Select all that apply.
- SQL interacts with database programs. ✅
- SQL has better user management than spreadsheets.
- SQL can pull information from different database sources. ✅
- SQL tracks changes across a team. ✅
35. A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database?
- UPDATE
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS
- INSERT INTO ✅
Explanation:
The INSERT INTO
command is used to add new rows of data into an existing table.
36. In SQL databases, True/False values refers to what data type?
- String
- Float
- Integer
- Boolean ✅
Explanation:
In SQL, TRUE
/FALSE
values are represented as the Boolean data type.
37. A data analyst is tasked with identifying what orders are still in transit. The current list of orders contains trillions of rows. What is the best tool for the analyst to use?
- Spreadsheets
- CSV
- SQL ✅
- Word processor
38. Your manager tasks you with analyzing a dataset and visually inspecting the data. Upon initial inspection you realize that this is a small dataset. What tool should you use to analyze the data?
- CSV
- Spreadsheet ✅
- SQL
- Word processor
39. A data analyst creates a database to store information on the company's customer data. When completing the initial import the analyst notices that they forgot to add a few customers into the table. What command can the analyst use to add these missed customers?
- ADD
- APPEND
- INSERT INTO ✅
- DROP
40. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to find what state names are greater than 3 characters.
You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 3 characters long.
What state is in row 1 of your query result?
NOTE: The query index starts at 1 not 0.
- India
- Chile
- Dublin ✅
- Ireland
41. In SQL databases, what function can be used to convert data from one datatype to another?
- CAST ✅
- LENGTH
- TRIM
- SUBSTR
42. After a company merger, a data analyst receives a dataset with billions of rows of data. They need to leverage this data to identify insights for upper management. What tool would be most efficient for the analyst to use?
- Spreadsheet
- Word processor
- SQL ✅
- CSV
43. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. The state names are abbreviated. You want to check for state names that are greater than 2 characters long.
You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 2 characters long.
What country is in row 1 of your query result?
NOTE: The query index starts at 1 not 0.
- Ireland ✅
- India
- France
- Chile
44. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You use the SUBSTR function to retrieve the first 3 characters of each last_name, and use the AS command to store the result in a new column called new_last_name.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each last_name and store the result in a new column as new_last_name.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes in three arguments being column, starting_index, ending_index
What employee ID number is in row 8 of your query result?
NOTE: The query index starts at 1 not 0.
- 7
- 3
- 1 ✅
- 8
45. A data analyst runs a SQL query to extract some data from a database for further analysis. How can the analyst save the data? Select all that apply.
- Download the data as a spreadsheet. ✅
- Use the UPDATE query to save the data.
- Create a new table for the data. ✅
- Run a SQL query to automatically save the data.
46. A data analyst notices their Boolean column is incorrectly storing True/False values as strings. What function can the analyst use to convert the data type from a string to Boolean?
- CAST ✅
- TRIM
- LENGTH
- SUBSTR
47. A data analyst is cleaning survey data. The results for an optional question contain many nulls. What function can the analyst use to eliminate the null values from the results?
- COALESCE ✅
- CONCAT
- CAST
- LENGTH
48. A data analyst is analyzing medical data for a health insurance company. The dataset contains billions of rows of data. Which of the following tools will handle the data most efficiently?
- A word processor
- CSV
- A spreadsheet
- SQL ✅
49. You are working with a database table that contains invoice data. The table includes columns for invoice_id and billing_city. You want to remove duplicate entries for billing_city and sort the results by invoice_id.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_city column.
NOTE: The three dots (...) indicate where to add the clause.
What billing city appears in row 15 of your query result?
NOTE: The query index starts at 1 not 0.
- London
- Reno ✅
- Santiago
- Oslo
50. You are working with a database table that contains invoice data. The table includes columns about billing location such as billing_city, billing_state, and billing_postal_code. You use the SUBSTR function to retrieve the first 4 numbers of each billing_postal_code, and use the AS command to store the result in a new column called new_postal_code.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 4 numbers of each billing postal code and store the result in a new column as new_postal_code.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes in three arguments being column, starting_index, ending_index
What invoice id is located in row 4?
NOTE: The query index starts at 1 not 0.
- 32
- 104
- 52
- 206 ✅
51. In which of the following situations would a data analyst use SQL instead of a spreadsheet? Select all that apply.
- When quickly pulling information from many different sources in a database ✅
- When recording queries and changes throughout a project ✅
- When working with a huge amount of data ✅
- When using the COUNTIF function to find a specific piece of information
Explanation:
SQL is used in situations where data is stored in databases, particularly for handling large datasets, complex queries, and when tracking changes over time. The COUNTIF function is specific to spreadsheets, not SQL.
52. You are working with a database table named invoice that contains invoice data. The table includes columns for customer_id and total. You want to remove duplicate customers and identify which unique customers have a total greater than 5.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What customer_id number is located in row 5?
NOTE: The query index starts at 1 not 0.
- 57
- 14
- 52 ✅
- 4
Explanation:
The DISTINCT
clause removes duplicate entries from the customer_id
column, and the query returns the unique customer IDs where the total is greater than 5.
53. You are working with a database table named customer that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check what country names are greater than 7 characters.
You write the SQL query below. Add a LENGTH function that will return any country names that are greater than 7 characters long.
NOTE: The three dots (...) indicate where to add the clause.
What is the value of postal_code in row 1 of your query result? (Hint: you will have to scroll to the right with your mouse or track pad to locate the indicated column.)
NOTE: The query index starts at 1 not 0.
- 14700 ✅
- None
- 2010
- N1 5LH
Explanation:
The LENGTH(country) > 7
condition filters countries with names longer than 7 characters. The postal code for Ireland (in this case) is N1 5LH
in row 1.
Related contents:
Module 1: The importance of integrity
Module 2: Clean data for more accurate insights
Module 4: Verify and report on cleaning results
Module 6: Course wrap-up
You might also like:
Course 1: Foundations: Data, Data, Everywhere
Course 2: Ask Questions to Make Data-Driven Decisions
Course 3: Prepare Data for Exploration
Course 5: Analyze Data to Answer Questions
Course 6: Share Data Through the Art of Visualization
Course 7: Data Analysis with R Programming
Course 8: Google Data Analytics Capstone: Complete a Case Study