Instructions

Upload a PDF file, named with your UC Davis email ID and homework number (e.g., xjw18_hw6.pdf), to Gradescope (accessible through Canvas). Give the commands that answer each question in their own code block; the output they produce is embedded automatically in the knitted file. When asked, support your answers with written statements as well as the code used.
All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.
Students may choose to collaborate with each other on the homework, but must clearly indicate with whom they collaborated. Every student must upload their own submission.
Start working on it as early as possible. Finishing this homework can help you prepare for Midterm 1.
When a vector result is too long to show in full, show only its first 10 elements. When a result is a data frame, use head() on it. Failure to do so may lead to a point deduction.
Knitting the Rmd file directly produces an HTML file. Open that file in your browser, and then print it to a PDF file.
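
As a minimal sketch of the display rule in the instructions above (the objects long_vec and toy_df are made-up placeholders, not part of the assignment):

```r
# Hypothetical objects, used only to illustrate the display rule
long_vec <- seq(2, 200, by = 2)                    # a vector with 100 elements
toy_df <- data.frame(id = 1:50, value = (1:50)^2)  # a data frame with 50 rows

long_vec[1:10]   # show only the first 10 elements of a long vector
head(toy_df)     # show only the first 6 rows of a data frame
```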

# Load necessary libraries
library(tidyverse)

Problem 1: Book problems (40 points)

1.1 (10 points)

Consider the following research topic: The goal is to determine whether there is a statistically significant increase in the average weight gain of anorexic patients under a new treatment ($\mu_N$) compared to a standard treatment ($\mu_S$).

State the null hypothesis and the alternative hypothesis.
Interpret a Type I error in terms of the problem.
Interpret a Type II error in terms of the problem.
If we wanted to minimize the probability of a Type I error, what action should we take? (Hint: Think about $\alpha$)

1.2 (12 points)

Answer the following questions.

What is a null hypothesis? What is the alternative hypothesis?
What is the significance level?
What is the decision to reject or fail to reject the null hypothesis based on?
What is the p-value?

1.3 (10 points)

To evaluate the policy of routine vaccination of infants for whooping cough, adverse reactions were monitored in 340 infants who received their first injection of the vaccine. Reactions were noted in 68 of the infants.

Find the 95% confidence interval for the true probability of an adverse reaction to the vaccine.
Interpret the confidence interval you found in terms of the problem.
Does your interval suggest that fewer than 25% of infants have an adverse reaction?
If we made many, many 95% confidence intervals, what percentage would we expect to cover the true proportion?

1.4 (8 points)

Jane has just begun her new job on the sales force of a very competitive company. In a sample of 16 sales calls, she closed contracts with an average value of 108 dollars and a standard deviation of 12 dollars. Company policy requires that new members of the sales force exceed an average of 100 dollars per contract during the trial employment period. Assuming the population of contract values is normally distributed, test at the $5\%$ significance level the null hypothesis that the population mean is at most 100 dollars against the alternative that it exceeds 100 dollars. Can we conclude that Jane has met this requirement at the $5\%$ significance level?

Read the extra examples on CI.

Link

Problem 2: Gacha simulator (60 points)

Honkai Star Rail is a turn-based RPG similar to old-school JRPGs. The game features a gacha system through which players can obtain new characters and weapons. In this problem we will study the probabilities in the gacha system, using the current Butterfly on Swordtip banner. Information about this banner is provided below:

One 5★ Character and three 4★ Characters are featured on any given Character Oriented Warp.
The rate for pulling a 5★ Character from this banner is currently set at 0.6%.
The rate for pulling a 4★ Character from this banner is currently set at 5.1%.
Otherwise, the result is a 3★ light cone.

Once you pull a 5★ Character, there is a 50% chance that it is the rate-up one.
If you lose the 50/50, the next 5★ Character you pull is guaranteed to be the rate-up one.
You are guaranteed to obtain a 5★ Character within 90 pulls (this is the "pity").
So, in the unluckiest case, it takes at most 180 pulls to get the rate-up character.
Once you pull a 4★ Character, there is a 50% chance that it is one of the rate-up ones.

4★ Characters: Arlan, Asta, Dan Heng, Herta, Hook, March 7th, Natasha, Pela, Qingque, Sampo, Serval, Sushang, Tingyun

5★ Characters: Bailu, Bronya, Clara, Gepard, Himeko, Seele, Welt, Yanqing

Rate-up: 5★ Seele; 4★ Natasha, Hook, Pela

Part 0: Gacha simulator

Here I provide the Gacha simulator in R for you.

starRail_pull = function(n_pulls = 1000, print = FALSE){
  # Gacha parameters
  five_star_rate <- 0.006
  four_star_rate <- 0.051
  rate_up_five_star <- 0.5
  rate_up_four_star <- 0.5
  
  # Define characters
  four_star_characters <- c("Arlan", "Asta", "Dan Heng", "Herta", "Hook", 
                            "March 7th", "Natasha", "Pela", "Qingque", "Sampo", 
                            "Serval", "Sushang", "Tingyun")
  five_star_characters <- c("Bailu", "Bronya", "Clara", "Gepard", "Himeko", "Seele", "Welt", "Yanqing")
  rate_up_five_star_char <- "Seele"
  rate_up_four_star_chars <- c("Natasha", "Hook", "Pela")
  
  # Initialize counters
  pity_counter_five_star <- 0 # number of pulls since the last 5 star character
  pity_counter_four_star_plus <- 0 # number of pulls since the last 4 or 5 star character
  last_five_star_not_up <- FALSE # TRUE if the last 5 star character was not the up character
  pull_results <- vector(length = n_pulls)
  star_results <- vector(length = n_pulls)
  
  for(pull in 1:n_pulls) {
    # generate a random number to determine the pull result
    rand <- runif(1)
    pity_counter_four_star_plus <- 1 + pity_counter_four_star_plus
    pity_counter_five_star <- 1 + pity_counter_five_star
    # condition for a 5 star character
    # either this pull achieves 5 star rate or you haven't got 5 star in last 89 pulls
    if (rand <= five_star_rate || pity_counter_five_star >= 90) {
      # Lucky, got 5 star, so reset the counter.
      pity_counter_five_star <- 0
      pity_counter_four_star_plus <- 0
      # condition for a 5 star up character
      # either the last 5 star was not the up character (guaranteed up), or a
      # fresh 50/50 draw succeeds; a fresh draw is used so that pity pulls
      # (where rand > five_star_rate) also get the 50% chance
      if (last_five_star_not_up || runif(1) <= rate_up_five_star) {
        char <- rate_up_five_star_char
        star <- 5
        # the guarantee has been used, so the next 5 star is a 50/50 again
        last_five_star_not_up <- FALSE
      } else {
        # randomly choose 1 non-up 5 star character
        char <- sample(five_star_characters[five_star_characters != rate_up_five_star_char], 1)
        star <- 5
        # This 5 star is not the up character
        last_five_star_not_up <- TRUE
      }
    }
    # When it's not 5 star character, but it's 4 star character 
    # (five_star_rate < rand <= (five_star_rate + four_star_rate))
    else if (rand <= (five_star_rate + four_star_rate) || pity_counter_four_star_plus >= 10) {
      pity_counter_four_star_plus <- 0
      # condition for a 4 star up character: a fresh 50/50 draw, so that pity
      # pulls (where rand > five_star_rate + four_star_rate) also get the 50% chance
      if (runif(1) <= rate_up_four_star) {
        char <- sample(rate_up_four_star_chars, 1)
        star <- 4
      } else {
        char <- sample(four_star_characters[!four_star_characters %in% rate_up_four_star_chars], 1)
        star <- 4
      }
    } else {
      char <- "3★ light cone"
      star <- 3
    }
    pull_results[pull] <- char
    star_results[pull] <- star
  }
  if(print == TRUE){
    print(table(pull_results))
  }
  return(data.frame(result = pull_results, star = star_results, index = 1:n_pulls))
}
# Save pull results to a CSV file
#write.csv(starRail_pull(80), "gacha_pull_results.csv", row.names = FALSE)

Here is an example that uses this function to simulate 80 draws.

df = starRail_pull(80)
head(df)

         result star index
1 3★ light cone    3     1
2 3★ light cone    3     2
3 3★ light cone    3     3
4 3★ light cone    3     4
5 3★ light cone    3     5
6 3★ light cone    3     6

print(table(df$star))


 3  4 
71  9

Optional: Try to understand how this function works.

Part 1.1: For a new account, simulate the draws required to obtain 1 Seele. (5 points)

Hint: The output of starRail_pull contains 3 columns: result, star, index. From the banner information, in the unluckiest case you will get Seele by draw 180, so df = starRail_pull(180) will contain at least one Seele in the result column. For one simulation, you need to find the index of the first Seele. Some functions you can use: filter, pull, min.
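
As a rough illustration of how filter, pull, and min fit together, here is a sketch on a made-up data frame with the same three columns as the simulator output. The object names toy and first_hook, and the values in the table, are assumptions for illustration only.

```r
library(dplyr)

# Toy data frame with the same column names as the simulator output (values are made up)
toy <- data.frame(
  result = c("light cone", "Hook", "light cone", "Bailu", "Hook"),
  star   = c(3, 4, 3, 5, 4),
  index  = 1:5
)

# Index of the first row whose result is "Hook"
first_hook <- toy %>%
  filter(result == "Hook") %>%  # keep only the matching rows
  pull(index) %>%               # extract the index column as a plain vector
  min()                         # the smallest index is the first occurrence

first_hook
```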

Part 1.2: Now wrap the previous code in a function that simulates the number of draws needed to obtain the first Seele. This function does not need to take any input. (5 points)

sim_first_Seele_draws = function(){

}

Part 1.3: Now simulate n = 50, 500, and 2000 results of the number of draws needed to obtain the first Seele for a new account. Store the results in 3 vectors (don't print them). (5 points)

set.seed(123)
n = 50
P13_50 = replicate(?, ?)
n = 500
P13_500 = replicate(?, ?)
n = 2000
P13_2000 = replicate(?, ?)

Hint: Look up how to use replicate() (for example, with ?replicate). You may need to wait for some time while simulating 2000 results.

Hint 2: Call set.seed(123) at the beginning of the code chunk so that the simulated samples are reproducible. You may change the seed to another number.
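
The interplay of replicate() and set.seed() can be sketched on an unrelated toy simulation. The function sim_first_six below is a hypothetical stand-in (the number of rolls of a fair die until the first six), not part of the assignment.

```r
# Hypothetical stand-in for a simulation function: the number of rolls of a
# fair die needed to see the first six
sim_first_six <- function() {
  rolls <- 0
  repeat {
    rolls <- rolls + 1
    if (sample(1:6, 1) == 6) return(rolls)
  }
}

set.seed(123)                            # fix the random number stream
run_a <- replicate(5, sim_first_six())   # evaluate the call 5 times, collect a vector

set.seed(123)                            # resetting to the same seed ...
run_b <- replicate(5, sim_first_six())   # ... reproduces the same 5 values

identical(run_a, run_b)                  # TRUE
```

The first argument of replicate() is the number of repetitions and the second is the expression to re-evaluate each time.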

Part 1.4: Now construct a confidence interval for the average number of draws needed to get the first Seele. Be sure to answer these questions: (3+4+3+12+3 = 25 points)

What is the shape of the distribution of the samples? Is it bell-shaped? You can use a histogram of the n=2000 sample.
Which confidence interval should we construct? List the assumptions for this question.
Do you think the confidence interval is a good method for this situation? Why or why not?
Write a function to calculate the $95\%$ confidence interval. Use that function to show the confidence intervals. You can refer to slide 9 of the extra examples on CI, but you may need to change the t.score used in that example.
Interpret the confidence interval for the n=2000 sample.

Part 2: I got Seele at exactly 180 pulls. How unlucky am I? (10 points)

Construct a $95\%$ confidence interval for the proportion of accounts that get Seele at exactly 180 pulls. Use the n=2000 sample P13_2000. Based on the confidence interval, are you really that unlucky?

Hint: sample1 = P13_2000==180 makes sample1 a logical vector, where TRUE indicates getting Seele at exactly 180 pulls.

Hint 2: Refer to Lecture 20 slide 32.
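
For reference, one common large-sample interval for a proportion (the Wald interval) is sketched below. Whether this is the interval intended here is an assumption; the lecture slide takes precedence if it differs. Here $\hat{p}$ denotes the sample proportion (the mean of the logical vector), $n$ the sample size, and $z_{0.975} \approx 1.96$ the standard normal quantile for a $95\%$ interval:

$$\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$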

Part 3: I want to get full eidolons for Seele, how much should I pay? (10 points)

In the gacha system, if you get a duplicate character, the excess copies are converted into eidolons, which can be used to buff the original character. One character can be buffed by eidolons 6 times. So here we are asking how many draws we should expect to need to obtain Seele 7 times. Use n=500 for the simulation.

Once you have used up all the free resources, you can pay $99.99 for approximately 60 draws. Based on the number of draws, how much should you expect to pay?

Hint: First write the function sim_7_Seele_draws to obtain the index of the 7th Seele. Then generate n=500 samples and use the one_sample_mean_CI function on that sample vector.
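
The conversion from a number of draws to a dollar amount can be sketched as follows. The value of expected_draws is a made-up placeholder rather than a simulation result, and both pricing rules shown are assumptions about how to read "approximately 60 draws".

```r
# Hypothetical number of draws, chosen only to illustrate the arithmetic
expected_draws <- 310

price_per_bundle <- 99.99   # dollars, from the problem statement
draws_per_bundle <- 60      # approximate draws per purchase, from the problem statement

# Assumption 1: the price is proportional to the number of draws
cost_proportional <- expected_draws / draws_per_bundle * price_per_bundle

# Assumption 2: only whole bundles can be bought
cost_whole_bundles <- ceiling(expected_draws / draws_per_bundle) * price_per_bundle

c(proportional = cost_proportional, whole_bundles = cost_whole_bundles)
```

The same conversion can be applied to both endpoints of a confidence interval for the mean number of draws to get an interval for the expected cost.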

List the people you collaborated with:

Names:

Appendix

sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 
[2] LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.5    
[5] readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.3.6  
[9] tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0    xfun_0.34           bslib_0.4.0        
 [4] haven_2.5.1         gargle_1.2.1        colorspace_2.0-3   
 [7] vctrs_0.5.0         generics_0.1.3      htmltools_0.5.3    
[10] yaml_2.3.6          utf8_1.2.2          rlang_1.0.6        
[13] jquerylib_0.1.4     pillar_1.9.0        withr_2.5.0        
[16] glue_1.6.2          DBI_1.1.3           dbplyr_2.2.1       
[19] readxl_1.4.1        modelr_0.1.9        lifecycle_1.0.3    
[22] munsell_0.5.0       gtable_0.3.1        cellranger_1.1.0   
[25] rvest_1.0.3         evaluate_0.17       knitr_1.40         
[28] tzdb_0.3.0          fastmap_1.1.1       fansi_1.0.3        
[31] broom_1.0.1         backports_1.4.1     scales_1.2.1       
[34] googlesheets4_1.0.1 cachem_1.0.6        jsonlite_1.8.4     
[37] fs_1.5.2            hms_1.1.2           digest_0.6.30      
[40] stringi_1.7.8       grid_4.1.1          cli_3.4.1          
[43] tools_4.1.1         magrittr_2.0.3      sass_0.4.2         
[46] crayon_1.5.2        pkgconfig_2.0.3     ellipsis_0.3.2     
[49] xml2_1.3.3          reprex_2.0.2        googledrive_2.0.0  
[52] lubridate_1.9.0     timechange_0.1.1    assertthat_0.2.1   
[55] rmarkdown_2.17      httr_1.4.4          rstudioapi_0.14    
[58] R6_2.5.1            compiler_4.1.1

STA 032 Homework 6

CHANGE YOUR NAME HERE

DUE: May 28 2023, 12PM