Module 4: Perform data calculations

Looking answers for ‘analyze data to answer questions module 4 challenge’?

In this post, I provide accurate answers and detailed explanations for Module 4: Perform data calculations of Course 5: Analyze Data to Answer Questions Google Data Analytics Professional Certificate.

Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question.

Test your knowledge on data calculations

Practice Quiz

1. What is the correct spreadsheet formula for multiplying 50 and 233?

  • 50*233
  • 50×233
  • =50×233
  • =50*233 ✅

Explanation:
In spreadsheets, formulas always start with an equals sign (=). The asterisk (*) is used for multiplication. Any other format, such as 50*233 (without =) or using x for multiplication, is invalid.

2. The following is a selection of a spreadsheet:

You are trying to determine what percentage of your monthly income is spent on big-ticket items, such as rent and groceries. To add together only the values from Column B that cost more than $150, what is the correct syntax?

  • =SUMIF(B2:B12,”<150″)
  • =SUMIF(B2:B12,<150)
  • =SUMIF(B2:B12,”>150″) ✅
  • =SUMIF(B2:B12,>150)

Explanation:
The SUMIF function adds up values in a range that meet a specific condition.

  • B2:B12: The range of values to evaluate.
  • ">150": The condition to include only values greater than 150.
    Using <150 would sum values less than 150, which is incorrect for this question.

3. A data analyst is working with a spreadsheet from a cosmetics company.

You may click the link to create a copy of the dataset: Cosmetics Inc.

Or, if you don’t have a Google account, download the template directly from the attachment below.

Which of the following is an example of an array in this spreadsheet?

  • Cells D7 and D14
  • All cells with values greater than 100
  • All cells with number values
  • The values in cells B2 through B31 ✅

Explanation:
An array in a spreadsheet refers to a range of cells that can be used together as a group.

  • B2 through B31 is a range of cells representing an array.
  • Individual cells (e.g., D7 and D14) or conditions (e.g., “values greater than 100”) are not arrays.
  • “All cells with number values” is too broad and not a specific array.

Test your knowledge on using pivot tables

Practice Quiz

4. The following is a sample pivot table from a furniture company's spreadsheet:

What is the purpose of the pivot table in this spreadsheet?

  • To summarize data about each product
  • To organize all of the data into a smaller format
  • To find the average price of each product
  • To calculate the sum of individual prices for each product type ✅
Explanation:
The pivot table is specifically designed to sum up the purchase prices for each product type (e.g., bed, chair) and provides a grand total for all products. It does not provide averages or a general summary of data.

5.

How could the pivot table be adjusted to show the same data, but only for products categorized as beige?

  • Add a new column labeled beige
  • Add a filter to show only beige products ✅
  • Summarize the values by product
  • Sort the current row by product color

Explanation:
By applying a filter to the pivot table, you can restrict the data to display only beige products. Adding a new column labeled “beige” or sorting the rows by color does not directly filter the pivot table.

6.

Which spreadsheet tool should you use if you want to find an average value using values generated within a pivot table?

  • A calculated field ✅
  • Data validation
  • A filter
  • Conditional formatting

Explanation:
A calculated field allows you to perform operations such as calculating averages, sums, or ratios using data summarized in the pivot table. This is the appropriate tool for generating custom calculations.

Test your knowledge on SQL calculations

Practice Quiz

7. You are working with a database table that contains invoice data. The table includes columns for invoice_line_id (line items for each invoice), invoice_id, unit_price, and quantity (the number of purchases in each line item). Each invoice contains multiple line items. You want to know the total price for each of the first 5 line items in the table. You decide to multiply unit price by quantity to get the total price for each line item, and use the AS command to store the total in a new column called line_total.

Add a statement to your SQL query that calculates the total price for each line item and stores it in a new column as line_total.

NOTE: The three dots (...) indicate where to add the statement.

What total appears in row 1 of your query result?

  • 3.96
  • 7.92
  • 1.98
  • 0.99 ✅

Explanation:
The query calculates the total price for each line item by multiplying unit_price and quantity, storing the result as line_total. For the first row:

unit_price×quantity=0.99×1=0.99\text{unit\_price} \times \text{quantity} = 0.99 \times 1 = 0.99

8. In a SQL query, which calculation does the modulo (%) operator perform?

  • It converts a decimal to a percent
  • It finds the square root of a number
  • It applies an exponent to a value
  • It returns the remainder of a division calculation ✅

Explanation:
The modulo operator % calculates the remainder when one number is divided by another.

For example:
10%3=1(because 10÷3=3 remainder 1).10 \% 3 = 1 \quad \text{(because } 10 \div 3 = 3 \text{ remainder } 1\text{)}.

9. You are working with a dataset with the column name “firstquarterexpenses.” How can you rename this column to make it more readable?

  • first+quarter+expenses
  • first quarter expenses
  • first_quarter_expenses ✅
  • Firstquarterexpenses

Explanation:
Renaming columns to include underscores improves readability while maintaining compatibility with SQL naming conventions, as spaces or special characters may cause errors.

Test your knowledge on data validation

Practice Quiz

10. What are the goals of checking and rechecking the quality of your data during data validation? Select all that apply.

  • Data is secure ✅
  • Data is consistent ✅
  • Data is sorted and filtered
  • Data is complete and accurate ✅

Explanation:
The primary goals of data validation are to ensure that:

  • Data is consistent (no logical contradictions within the dataset).
  • Data is complete and accurate (no missing or incorrect information).
  • Data is secure (protected from unauthorized access or corruption).

Sorting and filtering data, while useful for analysis, are not goals of validation.

11. You’re analyzing patient data for a health care company. During the data-validation process, you notice that the first date of service for some of the patients is later than the most recent date of service. Which type of data-validation check are you completing?

  • Data type
  • Data structure
  • Data consistency ✅
  • Data range

Explanation:
A data consistency check verifies that values in a dataset logically align. In this case, the first date of service should always occur before or at the same time as the most recent date. Identifying such discrepancies ensures the dataset makes logical sense.

12. During analysis, you complete a data-validation check for errors in customer identification (ID) numbers. Customer IDs must be eight characters and can contain numbers only. Which of the following customer ID errors will a data-type check help you identify?

  • IDs with text ✅
  • IDs in the wrong column
  • IDs with more than eight characters
  • IDs that are repeated

Explanation:
A data-type check ensures that data in a column adheres to the expected format. If customer IDs are expected to be numeric, a data-type check will catch entries containing text.

  • IDs in the wrong column would require a positional or schema validation.
  • IDs with more than eight characters would need a length validation.
  • Duplicate IDs would be identified by a uniqueness validation.

Test your knowledge on using SQL with temporary tables

Practice Quiz

13. When are temporary tables automatically deleted?

  • After running a report from the table
  • After running a query in your SQL database
  • After ending the session in a SQL database ✅
  • After completing all calculations in the table

Explanation: Temporary tables are created within a specific session and are automatically deleted when that session ends. These tables are not stored permanently in the database. This feature helps optimize resources and avoid leaving unnecessary data behind.

  • Why not the others?
    • After running a query in your SQL database: Temporary tables are not deleted after running a query. They remain in the session until the session ends.
    • After completing all calculations in the table: The table doesn’t get deleted automatically after calculations are completed. It is deleted when the session ends.
    • After running a report from the table: The report being run does not affect the deletion of the table. It’s the session’s end that triggers deletion.

14. The following SQL query contains information about bike trips:

What data will appear in the temporary table created through this query?

  • The total number of bike trips
  • Bike trips equal to or more than one hour
  • A random subset of bike trips
  • Bike trips that lasted exactly 60 minutes ✅

Explanation: The query specifies a condition WHERE tripduration = 60. This means the data in the temporary table will only include rows where the duration of the bike trip is exactly 60 minutes. The WITH clause creates a temporary table named 1_hr_trips that will store this filtered data.

  • Why not the others?
    • Bike trips equal to or more than one hour: This is incorrect because the query specifically filters for trips with a duration of exactly 60 minutes, not trips that are longer than or equal to one hour.
    • The total number of bike trips: The query does not aggregate data to give a total count; it only selects trips that match the duration condition.
    • A random subset of bike trips: The query isn’t creating a random sample; it’s filtering trips based on a specific condition.

15. What benefit does a CREATE TABLE statement add to a temporary table?

  • Specific naming conventions
  • Access for anyone to use the table ✅
  • Metadata about the data in the table
  • Automated calculations

Explanation: The CREATE TABLE statement is used to define a table and make it accessible to other users (based on permissions). It ensures the table is persistent, meaning it will be available for others to access during the session and beyond. For a temporary table, this is typically used when you need more control over the table’s access and schema.

  • Why not the others?
    • Specific naming conventions: While CREATE TABLE defines the table, it doesn’t enforce specific naming conventions. It’s up to the user to decide on the name.
    • Metadata about the data in the table: Metadata is important, but it’s not directly provided by CREATE TABLE. The table itself can have metadata like column types, but CREATE TABLE isn’t for adding extra metadata.
    • Automated calculations: The CREATE TABLE statement doesn’t perform calculations. It’s used for creating a structure to hold data. Calculations are performed using other SQL commands.

Module 4 challenge

Graded Quiz

16. A data analyst is working with a spreadsheet from a furniture company. To use the template for this spreadsheet, click the link below and select “Use Template.”

Link to template: Sample Transaction Table.

Or, if you don’t have a Google account, download the file directly from the attachment below.

The syntax of which of the following formulas would allow the analyst to count purchase sizes of two or more?

  • =COUNTIF(G2:G30, “>=2”)
  • =COUNTIF(H2:H30, “>=2”) ✅ 
  • =SUMIF(H2:H30, “=4”)
  • =SUMIF(G2:G30, “<=1”)

17. You are working in a spreadsheet and use the SUMIF function in the formula below as part of your analysis.

=SUMIF(A1:A25, ”<10”, C1:C25)

Which part of this formula is the criteria or condition?

  • ”<10” ✅
  • =SUMIF
  • C1:C25
  • A1:A25

Explanation: In the formula =SUMIF(A1:A25,”<10”,C1:C25), the condition or criteria is "<10". This tells Excel to sum the values in C1:C25 where the corresponding value in A1:A25 is less than 10.

  • Why not the others?

    • A1:A25: This is the range that is evaluated against the criteria.
    • C1:C25: This is the sum range where the values are added based on the condition.
    • SUMIF: This is the function itself, not the condition.

18. A data analyst is working in a spreadsheet and uses the SUMPRODUCT function in the formula below as part of their analysis.

=SUMPRODUCT(A2:A10,B2:B10)

How does the SUMPRODUCT function calculate the cell ranges identified in the parentheses?The analyst wants to figure out the value of all of the items in the spreadsheet. Which formula will calculate the total price of all of the items?

  • It multiplies the values in the first range, then multiplies the values in the second range .
  • It adds the values in the first range, then adds the values in the second range.
  • It multiplies the ranges, then adds the sum of the products of the two ranges. ✅
  • It adds the ranges, then multiplies them by the last value in the second array.

19. You create a pivot table in a spreadsheet containing movie data. To use the template for this spreadsheet, click the link below and select “Use Template.”

Link to template: Movie Data Project.

Or, if you don’t have a Google account, download the file directly from the attachment below.

If you want to summarize the data using the AVERAGE function in the Values menu, which spreadsheet columns could you add data from? Select all that apply.

  • Box Office Revenue ✅
  • Movie Title
  • Genre
  • Budget ✅

20. A data analyst uses the following SQL query to perform basic calculations on their data. Which types of operators is the analyst using in this SQL query? Select all that apply.

  • Multiplication
  • Addition ✅
  • Subtraction
  • Division ✅

21. You are working with a database table that contains data about music. The table includes columns for track_id, track_name, composer, and milliseconds (duration of the music track). You are only interested in data about the classical musician Johann Sebastian Bach. You want to know the duration of each Bach track in seconds. You decide to divide milliseconds by 1000 to get the duration in seconds, and use the AS command to store the result in a new column called secs.

Add a statement to your SQL query that calculates the duration in seconds for each track and stores it in a new column as secs.

NOTE: The three dots (...) indicate where to add the statement.

What is the duration in seconds of the track with Id number 3408?

  • 307
  • 120 ✅
  • 153 
  • 193

22. You are working with a database table that contains data about music. The table includes columns for album_id and milliseconds (duration of the music tracks on each album). You want to find out the total duration for each album in milliseconds, and store the result in a new column named total_duration.

You write the SQL query below. Add a GROUP BY clause that will group the data by album Id number.

What is the total duration of the album with Id number 2?

  • 257252
  • 959711
  • 342562 ✅
  • 858088

23. You are working with a database table that contains invoice data. The table includes columns for billing_state, billing_country, and total. You want to know the average total price for the invoices billed to the state of Wisconsin. You decide to use the AVG function to find the average total, and use the AS command to store the result in a new column called average_total.

Add a statement to your SQL query that calculates the average total and stores it in a new column as average_total.

NOTE: The three dots (...) indicate where to add the statement.

What is the average total for Wisconsin?

  • 5.54
  • 5.78
  • 6.08 ✅
  • 5.37

24. A data analyst wants to calculate the number of rows that have a value of “shipped”. Which function could they use?

  • =MAX(G2:G30,”=shipped”)
  • =SUM(G2:G30,”=shipped”)
  • =COUNT(G2:G30,”=shipped”)
  • =COUNTIF(G2:G30,”=shipped”) ✅

25. You are working in a spreadsheet and use the SUMIF function in the following formula as part of your analysis.

=SUMIF(D2:D10,”>=50”,E2:E10)

Which part of this formula indicates the range of values to be added?

  • E2:E10 ✅
  • >=50
  • D2:D10
  • =SUMIF

26. You create a pivot table and want to add up the total of all cells for each row and column value in the pivot table. Which function in the values menu would you use to summarize the data?

  • AVERAGE
  • SUM ✅
  • PRODUCT
  • COUNTA

27. What column is set as a value in the following pivot table?

  • Direction
  • Duration ✅
  • MAX
  • Date

28. In the following SQL query, which column is part of an addition operation that creates a new column?

SELECT
Yes_Responses,
No_Responses,
Total_Surveys,
Yes_Responses + No_Responses AS Responses_Per_Survey
FROM
Survey_1

  • Total_Surveys
  • Responses_Per_Survey
  • Yes_Responses ✅
  • Survey_1

Explanation:
In the query SELECT Yes_Responses, No_Responses, Total_Surveys, Yes_Responses + No_Responses AS Responses_Per_Survey, the addition operation involves Yes_Responses and No_Responses to create a new column called Responses_Per_Survey.

  • Why not the others?

    • Responses_Per_Survey: This is the result of the addition operation, not a column involved in the operation.
    • Total_Surveys: Not part of the addition operation.
    • Survey_1: This is the table, not a column.

29. What SQL operator is used to return the remainder of a division operation?

  • /
  • !=
  • <>
  • % ✅

Explanation:
The % operator in SQL is used to return the remainder of a division operation (modulo).

  • Why not the others?

    • <> and !=: These are inequality operators used for comparisons, not for division operations.
    • /: This is the division operator, not for returning the remainder.

30. What is the purpose of using data validation during your analysis process?

  • To ensure that you are able to use every piece of data from your raw data
  • To guarantee that all of your stakeholders will be happy with your results
  • To ensure that all data is complete, accurate, secure, and consistent ✅
  • To guarantee that visualizations are visually pleasing

31. What is the purpose of the operator in SQL?

  • To add two values
  • To return the remainder of a division operation
  • To check if two values are not equal ✅
  • To set a value equal to another

32. What is a reason to use a temporary table instead of a standard table in SQL?

  • A temporary table allows functions that are unavailable to standard tables.
  • A temporary table calculates formulas using less memory than standard tables.
  • A temporary table calculates formulas faster than standard tables.
  • A temporary table allows analysts to repeatedly work with the same subset of data. ✅

Explanation:
Temporary tables are often used when analysts need to work with a subset of data during a session without permanently storing it in the database.

  • Why not the others?

    • A temporary table allows functions that are unavailable to standard tables: This is incorrect because temporary tables do not grant special functions.
    • A temporary table calculates formulas using less memory than standard tables: Temporary tables don’t necessarily use less memory.
    • A temporary table calculates formulas faster than standard tables: Speed improvements are not the primary reason to use temporary tables.

33. Which of the following SQL queries adds a table into the database?

  • SELECT * FROM table GROUP BY columnA ORDER BY columnB;
  • CREATE TABLE my_table AS (SELECT * FROM other_table); ✅
  • SELECT * FROM table;
  • WITH my_table AS (SELECT * FROM other_table WHERE x = 0);

34. What is the purpose of using pivot tables?

  • To multiply two arrays and add the results
  • To allow quick copying from one table to another
  • To view data in multiple ways to find insights and trends ✅
  • To allow the use of SQL in spreadsheets

Explanation:
Pivot tables are used to summarize, analyze, explore, and present data in different ways, helping to uncover trends, patterns, and insights.

  • Why not the others?

    • To allow quick copying from one table to another: This is not the purpose of pivot tables.
    • To allow the use of SQL in spreadsheets: Pivot tables do not use SQL; they are based on the data already in the spreadsheet.
    • To multiply two arrays and add the results: This is a mathematical operation, not related to the purpose of pivot tables.

35. How many different columns have been added to the values section of the pivot table editor?

  • 3
  • 2 ✅
  • 6
  • 1

36. What SQL keyword is used to define a name for a calculated column?

  • SELECT
  • AS ✅
  • FROM
  • WITH

37. A data analyst uses the following formula to calculate a new row in a SQL query. What best describes the result of the formula?

(colA + colB) / colC = new_col

  • colB is added to colA then the result is multiplied by colC.
  • colB is subtracted from colA then the result is multiplied by colC.
  • colB is added to colA then the result is divided by colC. ✅
  • colB is divided by colC then the result is added to colA.

38. What is the process of checking and rechecking the quality of your data so that it is complete, accurate, secure, and consistent?

  • Data-driven development
  • Data visualization
  • Data augmentation
  • Data validation ✅

39. A data analyst finds some data that seems inconsistent. What is the first thing they should do?

  • Remove the inconsistent values.
  • Convert the inconsistent values to JSON.
  • Fill the odd values with filler values.
  • Determine if the inconsistent values are valid. ✅

Explanation:
The first step is to verify whether the inconsistent data points are valid or represent errors. This ensures the accuracy of the data before taking any further actions.

  • Why not the others?

    • Convert the inconsistent values to JSON: This is not relevant in this context.
    • Remove the inconsistent values: Before removing them, it’s important to check their validity.
    • Fill the odd values with filler values: This is not a good practice without proper validation of the data.

40. What is a reason to use a WITH AS clause in a SQL statement?

  • The result is temporary. ✅
  • The result is a pivot table.
  • The result calculates faster.
  • The result is a visualization.

41. Which of the following SQL statements can be used to create temporary tables in SQL?

  • WITH my_table FROM (SELECT * FROM other_table);
  • WITH my_table AS (SELECT * FROM other_table WHERE x = 0); ✅
  • CREATE TABLE my_table AS (SELECT * FROM other_table);
  • SELECT * FROM table;

42. A data analyst wants to calculate the number of rows that have a SKU value of “K102145”. Which function could they use?

  • =COUNTIF(G2:G30,K102145)
  • =COUNTIF(K102145=G2:G30)
  • =COUNTIF(G2:G30,“=K102145”)
  • =COUNTIF(G2:G30,“K102145”) ✅

Explanation:
The COUNTIF function counts the number of cells in a range that meet a specific condition. The correct formula should have the value “K102145” within the quotes, ensuring that the function is checking for exact matches.

  • Why not the others?

    • =COUNTIF(G2:G30,“=K102145”): This is incorrect because the COUNTIF function does not require an equals sign within the quotes.
    • =COUNTIF(K102145=G2:G30): This is incorrect syntax for the COUNTIF function.
    • =COUNTIF(G2:G30,K102145): This is incorrect because the value should be in quotes to be recognized as a string.

43. A data analyst wants to use a single function to multiply two ranges and then add the multiplied values. What single function can they use to accomplish this?

  • SUM
  • SUMPRODUCT ✅
  • SUMIF
  • SUMIFS

44. Which values of Date and Direction are used to calculate the value 450 in the following pivot table?

  • 2/3 and Down
  • 2/4 and Up
  • 2/5 and Down ✅
  • 2/4 and Down

Explanation:
The value 450 comes from the MAX of A for 2/5 and the Direction “Down” in the pivot table, which sums up to 450.

  • Why not the others?

    • The other combinations (such as 2/4 and Down, etc.) result in different sums (like 100 or 12), not 450.

45. When writing custom calculations in SQL, what characters can be used to group calculations to change the order of calculation?

  • Parentheses – ( ) ✅
  • Curly Braces – { }
  • Quotation Marks – “ “
  • Square Brackets – [ ]

46. A data analyst is trying to manually recalculate a column that was present in their dataset. They want to find rows where the values in their column do not match the values in the original column. Which of the following SQL clauses could they use?

  • WHERE original_column !! recalcualted_column
  • WHERE original_column NOT EQUALS recalcualted_column
  • WHERE original_column <> recalcualted_column ✅
  • WHERE original_column ~= recalcualted_column

47. When working with a new dataset, how can you ensure that your data is valid?

  • Personally collect all data that you use in your analysis.
  • Manually check the calculations of calculated columns. ✅
  • Convert all data to JavaScript Object Notation (JSON).
  • Fill in missing values with values that will favor your initial hypothesis.

Explanation:
One way to ensure that data is valid is by manually checking the calculations, especially for derived or calculated columns, to ensure they follow the correct logic.

  • Why not the others?

    • Personally collect all data: While important, this is not the only method for ensuring data validity.
    • Fill in missing values with values that will favor your initial hypothesis: This could introduce bias, and it’s not a recommended approach.
    • Convert all data to JavaScript Object Notation (JSON): This is not necessary to validate data.

48. Which of the following statements about temporary tables is correct?

  • They must be created using the WITH AS SQL clause.
  • They must be created using the WITH AS SQL clause. ✅
  • They are declared by enclosing a FROM statement between ##.
  • They are a special feature of BigQuery unavailable in other RDBMS.

49. A data analyst wants to calculate the number of rows that have a value less than 150. Which function could they use?

  • =COUNTIF(”<150”,G2:G30)
  • =SUMIF(“<150”,G2:G30)
  • =COUNTIF(G2:G30,”<150”) ✅
  • =SUMIF(G2:G30,“<150”)

50. What is the purpose of the EXTRACT function in SQL?

  • Calculate using data extracted from other tables
  • Return a specific key-value pair from a JSON object
  • Return a specific portion of a date ✅
  • Calculate the mathematical extract operation

51. Which portion of a pivot table do you change if you want to use a different calculation to combine the results?

  • Filter
  • Columns
  • Values ✅
  • Rows

52. Which of the following statements about temporary tables is correct?

  • They must be created using the WITH AS SQL clause.
  • They are automatically deleted when the SQL database session ends. ✅
  • They are declared by enclosing a FROM statement between ##.
  • They are a special feature of BigQuery unavailable in other RDBMS.

53. Which of the following SQL queries adds a table into the database? Select all that apply.

  • CREATE TABLE my_table AS (SELECT * FROM other_table); ✅
  • WITH my_table AS (SELECT * FROM other_table WHERE x = 0); ✅
  • SELECT * FROM table;
  • SELECT * FROM table GROUP BY columnA ORDER BY columnB;

54. Which of the following SQL statements correctly implements a WITH AS clause?

  • WITH my_table AS (SELECT * FROM other_table); ✅
  • WITH AS my_table AS (SELECT * FROM other_table);
  • WITH AS my_table AS SELECT * FROM other_table;
  • WITH my_table AS SELECT * FROM other_table;

Explanation:
The correct syntax for a WITH AS clause is WITH my_table AS (SELECT * FROM other_table);, where you define a temporary table (my_table) that holds the results of the query (SELECT * FROM other_table).

  • Why not the others?

    • The incorrect answers have either missing parts or incorrect placement of keywords.

Leave a Reply