Week 3 – Working with data in R

Data Analysis with R Programming: Weekly Challenge 3

1. A data analyst is working with a dataset in R that has more than 50,000 observations. Why might they choose to use a tibble instead of the standard data frame? Select all that apply.

  • Tibbles can create row names
  • Tibbles automatically only preview the first 10 rows of data
  • Tibbles can automatically change the names of variables
  • Tibbles automatically only preview as many columns as fit on screen
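
The printing behaviour described in the options can be seen with a minimal sketch. It assumes the tibble package is installed; the data frame `big` and its columns `id` and `value` are hypothetical and only serve to produce more than 50,000 rows.

```r
library(tibble)

# Hypothetical data frame with 50,000 rows (names and values are illustrative only)
big <- data.frame(id = 1:50000, value = rnorm(50000))

# Convert to a tibble; the data are unchanged, only the class and printing differ
big_tbl <- as_tibble(big)

# Printing a tibble shows only the first 10 rows and the columns that fit on screen,
# whereas printing `big` would scroll through far more output
print(big_tbl)

# All rows are still present in the object
nrow(big_tbl)
```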

2. A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?

  • print()
  • preview()
  • head()
  • colnames()

 

3. You are working with the ToothGrowth dataset. You want to use the head() function to get a preview of the dataset. Write the code chunk that will give you this preview.

What are the names of the columns in the ToothGrowth dataset?

  • VC, supp, dose
  • len, supp, dose
  • len, supp, VC
  • len, VC, dose
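
A short sketch of previewing ToothGrowth, which ships with base R in the datasets package, so no extra packages are assumed. The variable name `preview` is only illustrative.

```r
# ToothGrowth is a built-in dataset; data() makes it explicit in the workspace
data("ToothGrowth")

# head() returns the first six rows by default
preview <- head(ToothGrowth)
print(preview)

# colnames() lists the column names without printing any rows
colnames(ToothGrowth)
```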

4. A data analyst is working with a data frame named sales. They write the following code:

sales %>%

The data frame contains a column named q1_sales. What code chunk does the analyst add to change the name of the column from q1_sales to quarter1_sales?

  • rename(quarter1_sales = q1_sales)
  • rename(q1_sales <- "quarter1_sales")
  • rename(quarter1_sales <- "q1_sales")
  • rename(q1_sales = quarter1_sales)
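
As a sketch of how rename() orders its arguments, assuming the dplyr package is installed. Only the column names q1_sales and quarter1_sales come from the question; the `region` column and all values are made up.

```r
library(dplyr)

# Hypothetical sales data; the values are illustrative only
sales <- data.frame(region = c("north", "south"), q1_sales = c(120, 95))

# rename() uses the pattern new_name = old_name
sales_renamed <- sales %>%
  rename(quarter1_sales = q1_sales)

colnames(sales_renamed)
```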

5. A data analyst is working with the penguins data. They write the following code:

penguins %>%

The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?

  • filter(species == "Gentoo")
  • filter(species <- "Gentoo")
  • filter(Gentoo == species)
  • filter(species == "Adelie")
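
A minimal sketch of row filtering with dplyr, assuming the package is installed. To stay self-contained it uses a small stand-in data frame, `penguins_toy`, whose measurements are invented; it is not the real penguins data.

```r
library(dplyr)

# Toy stand-in for the penguins data; the body masses are made up
penguins_toy <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo", "Gentoo"),
  body_mass_g = c(3700, 3800, 5000, 5200)
)

# == compares each row's species with the string; filter() keeps rows where it is TRUE
gentoo_only <- penguins_toy %>%
  filter(species == "Gentoo")

print(gentoo_only)
```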

6. You are working with the penguins dataset. You want to use the summarize() and max() functions to find the maximum value for the variable flipper_length_mm. You write the following code:

penguins %>%
  drop_na() %>%
  group_by(species) %>%

Add the code chunk that lets you find the maximum value for the variable flipper_length_mm.

In a variant of this question, you start from the same code and add the code chunk that lets you find the minimum value for the variable bill_depth_mm: What is the minimum bill depth in mm for the Chinstrap species?

What is the maximum flipper length in mm for the Gentoo species?

  • 200
  • 212
  • 210
  • 231
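
The grouped summary pattern can be sketched as follows, assuming dplyr and tidyr are installed. The data frame `penguins_toy` and every number in it are invented for illustration, so the results below say nothing about the real penguins dataset; the output column names `max_flipper` and `min_bill_depth` are also assumptions.

```r
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr

# Toy stand-in for the penguins data with one missing value
penguins_toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  flipper_length_mm = c(181, 190, 217, NA, 195),
  bill_depth_mm = c(18.7, 17.4, 14.1, 15.0, 17.9)
)

flipper_summary <- penguins_toy %>%
  drop_na() %>%            # remove rows containing any NA
  group_by(species) %>%    # one summary row per species
  summarize(
    max_flipper = max(flipper_length_mm),
    min_bill_depth = min(bill_depth_mm)
  )

print(flipper_summary)
```

Dropping missing values first matters here: max() and min() return NA if any input is NA, unless na.rm = TRUE is passed.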

7. A data analyst is working with a data frame called salary_data. They want to create a new column named total_wages that adds together data in the standard_wages and overtime_wages columns. What code chunk lets the analyst create the total_wages column?

  • mutate(salary_data, standard_wages = total_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages * overtime_wages)
  • mutate(total_wages = standard_wages + overtime_wages)
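
A small sketch of adding a computed column with mutate(), assuming dplyr is installed. The column names follow the question; the `employee` column and the wage values are hypothetical.

```r
library(dplyr)

# Hypothetical salary data; the values are illustrative only
salary_data <- data.frame(
  employee = c("a", "b"),
  standard_wages = c(1000, 1200),
  overtime_wages = c(150, 0)
)

# The data frame is the first argument; new_column = expression follows
salary_data <- mutate(salary_data, total_wages = standard_wages + overtime_wages)

print(salary_data)
```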

8. A data analyst is working with a data frame named stores. It has separate columns for city (city) and state (state). The analyst wants to combine the two columns into a single column named location, with the city and state separated by a comma. What code chunk lets the analyst create the location column?

  • unite(stores, "location", city, state, sep=",")
  • unite(stores, "location", city, sep=",")
  • unite(stores, city, state, sep=",")
  • unite(stores, "location", city, state)
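
A sketch of combining two columns with unite(), assuming the tidyr package is installed. The `store_id` column and the city and state values are made up; `stores_loc` is just an illustrative name for the result.

```r
library(tidyr)

# Hypothetical stores data
stores <- data.frame(
  store_id = 1:2,
  city = c("Austin", "Denver"),
  state = c("TX", "CO")
)

# Arguments: data, name of the new column, columns to combine, then the separator
stores_loc <- unite(stores, "location", city, state, sep = ",")

print(stores_loc)
```

By default unite() removes the input columns, so `city` and `state` no longer appear in the result.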

9. A data analyst writes the following code chunk to return a statistical summary of their dataset: quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x, y))

Which function will return the average value of the y column?

  • mean(y)
  • mean(x)
  • cor(x, y)
  • sd(x)
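
To see what each function in such a summary returns, here is a sketch on a tiny invented data frame, assuming dplyr is installed. `quartet_toy` is a hypothetical stand-in and not the quartet dataset from the question; the output columns are named explicitly here for readability.

```r
library(dplyr)

# Toy data with two sets; the values are made up
quartet_toy <- data.frame(
  set = c(1, 1, 1, 2, 2, 2),
  x = c(1, 2, 3, 2, 4, 6),
  y = c(2, 4, 6, 6, 4, 2)
)

summary_tbl <- quartet_toy %>%
  group_by(set) %>%
  summarize(
    mean_x = mean(x),   # average of x within each set
    sd_x = sd(x),       # standard deviation of x
    mean_y = mean(y),   # average of y within each set
    sd_y = sd(y),       # standard deviation of y
    cor_xy = cor(x, y)  # correlation between x and y
  )

print(summary_tbl)
```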

10. A data analyst uses the bias() function to compare the actual outcome with the predicted outcome to determine if the model is biased. They get a score of 0.8. What does this mean?

  • Bias cannot be determined
  • The model is biased
  • Bias can be determined
  • The model is not biased
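
The idea behind such a score can be sketched in base R. The helper `mean_difference` below is a hypothetical stand-in that averages the differences between actual and predicted values; it is not the bias() function from the question, and the numbers are invented.

```r
# Hypothetical helper: average difference between actual and predicted values
mean_difference <- function(actual, predicted) {
  mean(actual - predicted)
}

# Made-up outcomes where every prediction is one unit too low
actual <- c(10, 12, 14, 16)
predicted <- c(9, 11, 13, 15)

score <- mean_difference(actual, predicted)
print(score)
```

Under this reading, a result near zero suggests little systematic bias, while a result further from zero suggests the predictions are consistently off in one direction.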

 

Shuffle Q/A 1

11. What is an advantage of using data frames instead of tibbles?

  • Data frames allow you to create row names
  • Data frames make printing easier
  • Data frames allow you to use column names
  • Data frames never change variable names
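
The differences the options refer to can be checked with a short sketch, assuming the tibble package is installed. All object names and values here are hypothetical.

```r
library(tibble)

# A base data frame can carry row names
df <- data.frame(score = c(90, 85), row.names = c("alice", "bob"))
rownames(df)

# Converting to a tibble drops the row names; only default row numbers remain
tbl <- as_tibble(df)
rownames(tbl)

# data.frame() repairs non-syntactic names by default, tibble() keeps them as typed
df_names <- data.frame(`total sales` = 1)
tbl_names <- tibble(`total sales` = 1)
colnames(df_names)
colnames(tbl_names)
```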

12. A data analyst is examining a new dataset for the first time. They load the dataset into a data frame to learn more about it. What function(s) will allow them to review the names of all of the columns in the data frame? Select all that apply.

  • colnames()
  • head()
  • str()
  • library()
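
A brief sketch comparing what the inspection functions show, using the built-in iris dataset so that no extra packages are assumed. The variable name `cols` is only illustrative.

```r
# iris ships with base R in the datasets package
data("iris")

# colnames() returns only the column names
cols <- colnames(iris)
print(cols)

# head() shows the first rows, with the column names as the header
head(iris, n = 3)

# str() lists each column with its type and first few values
str(iris)
```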

Devendra Kumar

Project Management Apprentice at Google
