Week 3 – Cleaning data with SQL

process data from dirty to clean weekly challenge 3 answers

1. Fill in the blank: Data analysts usually use _____ to deal with very large datasets.

  • web browsers
  • spreadsheets
  • SQL
  • word processors

2.What are some of the benefits of using SQL for analysis? Select all that apply.

  • SQL interacts with database programs.
  • SQL tracks changes across a team.
  • SQL has built-in functionalities.
  • SQL can pull information from different database sources.

3. A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables?

  • CREATE TABLE IF NOT EXISTS
  • DROP TABLE IF EXISTS
  • UPDATE
  • INSERT INTO

4. You are working with a database table that contains invoice data. The table includes columns for invoice_id and customer_id. You want to remove duplicate entries for customer ID and sort the results by invoice ID.

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.

NOTE: The three dots (...) indicate where to add the clause.

What customer ID number appears in row 12 of your query result?

  • 23
  • 42
  • 16
  • 8

5. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for postal codes that are greater than 7 characters long.

You write the SQL query below. Add a LENGTH function that will return any postal codes that are greater than 7 characters long.

What is the last name of the customer that appears in row 10 of your query result?

  • Rocha
  • Brooks
  • Hughes
  • Ramos

6. A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe?

  • Visualizing
  • Processing
  • Calculating
  • Typecasting

7. The CAST function can be used to convert the DATE datatype to the DATETIME datatype.

  • True
  • False

8. Fill in the blank: The _____ function can be used to return non-null values in a list.

  • TRIM
  • COALESCE
  • CAST
  • CONCAT

9. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You want to retrieve the first 3 characters of each postal code. You decide to use the SUBSTR function to retrieve the first 3 characters of each postal code, and use the AS command to store the result in a new column called new_postal_code.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each postal code and store the result in a new column as new_postal_code.

NOTE: The three dots (...) indicate where to add the statement.

What employee ID number appears in row 5 of your query result?

NOTE: The query index starts at 1 not 0.

  • 3
  • 1
  • 8
  • 7

Shuffle Q/A 1

10. Why do data analysts choose to work with SQL? Select all that apply.

  • SQL can handle huge amounts of data.
  • SQL is a powerful software program.
  • SQL is a well-known standard in the professional community.
  • SQL is a programming language that can also create web apps.

11. A team of data analysts is working on a large project that will take months to complete and contains a huge amount of data. They need to document their process and communicate with multiple databases. The team decides to use a SQL server as the main analysis tool for this project and SQL for the queries. What makes this the most efficient tool? Select all that apply.

  • SQL efficiently handles large amounts of data.
  • SQL records queries and changes throughout a project.
  • SQL contains commands that build visualizations.
  • SQL allows you to connect to multiple databases.

Devendra Kumar

Project Management Apprentice at Google

Leave a Reply