Upload a PDF file, named with your UC Davis email ID and homework number (e.g., xjw18_hw6.pdf), to Gradescope (accessible through Canvas). Give the commands that answer each question in its own code block; the output of each block will be embedded automatically in the output file. When asked, support your answer with written statements as well as the code used.
All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.
Students may choose to collaborate with each other on the homework, but must clearly indicate with whom they collaborated. Every student must upload their own submission.
Start working on it as early as possible. Finishing this homework will help you prepare for Midterm 1.
When a result is a vector that is too long to display, show only its first 10 elements. When a result is a data frame, use head() on it. Failure to do so may lead to a point deduction.
Knitting the Rmd file directly produces an HTML file. Open that file in your browser, then print it to a PDF file.
# Load necessary libraries
library(tidyverse)
Consider the following research topic: The goal is to determine whether there is a statistically significant increase in the average weight gain of anorexic patients under a new treatment (\(\mu_N\)) compared with a standard treatment (\(\mu_S\)).
State the null hypothesis and the alternative hypothesis.
Interpret a Type I error in terms of the problem.
Interpret a Type II error in terms of the problem.
If we wanted to minimize the probability of a Type I error, what action should we take? (Hint: Think about \(\alpha\))
Answer the following questions.
What is a null hypothesis? What is the alternative hypothesis?
What is the significance level?
On what is the decision rule for rejecting or failing to reject the null hypothesis based?
What is the p-value?
To evaluate the policy of routine vaccination of infants for whooping cough, adverse reactions were monitored in 340 infants who received their first injection of the vaccine. Reactions were noted in 68 of the infants.
Find the 95% confidence interval for the true probability of an adverse reaction to the vaccine.
Interpret the confidence interval from (a) in terms of the problem.
Does your interval suggest that under 25% of infants had an adverse reaction?
If we made many, many 95% confidence intervals, what percentage would we expect to cover the true proportion?
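As a general illustration of the kind of interval asked for above, here is a minimal sketch of a normal-approximation (Wald) confidence interval for a proportion. The helper name `prop_ci` and the counts passed to it are made up for illustration and are not the homework data; whether the normal approximation is appropriate should be checked separately.

```r
# Wald (normal-approximation) confidence interval for a proportion.
# `prop_ci` is a hypothetical helper; x = number of "successes", n = sample size.
prop_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n                               # sample proportion
  z     <- qnorm(1 - (1 - conf_level) / 2)     # critical value, about 1.96 for 95%
  se    <- sqrt(p_hat * (1 - p_hat) / n)       # estimated standard error
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

# Made-up example: 30 successes out of 200 trials
prop_ci(x = 30, n = 200)
```

The built-in prop.test() reports a related interval, but it uses a different (score-based) construction, so its endpoints will not match this sketch exactly.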
Jane has just begun her new job on the sales force of a very competitive company. In a sample of 16 sales calls, she closed contracts with an average value of 108 dollars and a standard deviation of 12 dollars. Company policy requires that new members of the sales force exceed an average of 100 dollars per contract during the trial employment period. Assume the population of contract values is normally distributed. At the \(5\%\) significance level, test the null hypothesis that the population mean is at most 100 dollars against the alternative that it exceeds 100 dollars. Can we conclude that Jane has met this requirement at the \(5\%\) significance level?
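When only summary statistics are available, t.test() cannot be called directly, because it needs the raw data. The sketch below computes an upper-tailed one-sample t statistic and p-value from a mean, standard deviation, and sample size. The function name `t_test_summary` and all numbers in the example call are hypothetical and deliberately differ from the problem above.

```r
# Upper-tailed one-sample t test from summary statistics.
# `t_test_summary` is a hypothetical helper, not a base R function.
t_test_summary <- function(xbar, s, n, mu0) {
  t_stat  <- (xbar - mu0) / (s / sqrt(n))                  # test statistic
  p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)    # P(T >= t_stat) under H0
  list(t = t_stat, df = n - 1, p_value = p_value)
}

# Made-up numbers: sample mean 52, sd 8, n = 25, null value 50
res <- t_test_summary(xbar = 52, s = 8, n = 25, mu0 = 50)
res$t        # 1.25
res$p_value  # compare with the chosen significance level
```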
Honkai Star Rail is a turn-based RPG similar to old-school JRPGs. The game features a gacha system through which players can obtain new characters and weapons. In this problem we will study probabilities in the gacha system, using the current Butterfly on Swordtip banner. Information about this banner is provided below:
One 5★ character and three 4★ characters are featured on any given Character Oriented Warp. The rate for pulling a 5★ character from this banner is currently set at 0.6%. The rate for pulling a 4★ character from this banner is currently set at 5.1%. Otherwise the result is a 3★ light cone.
Once you pull a 5★ character, there is a 50% chance that it is the rate-up one. If you lose the 50/50, the next 5★ character you pull is guaranteed to be the rate-up one. You are guaranteed to obtain a 5★ character within 90 pulls (that's the pity). So if you're unlucky, it will take you at most 180 pulls to get the rate-up character. Once you pull a 4★ character, there is a 50% chance that it is one of the rate-up ones.
4★ Characters: Arlan, Asta, Dan Heng, Herta, Hook, March 7th, Natasha, Pela, Qingque, Sampo, Serval, Sushang, Tingyun
5★ Characters: Bailu, Bronya, Clara, Gepard, Himeko, Seele, Welt, Yanqing
Rate-up: 5★ Seele; 4★ Natasha, Hook, Pela
Here I provide the Gacha simulator in R for you.
starRail_pull = function(n_pulls = 1000, print = FALSE){
  # Gacha parameters
  five_star_rate <- 0.006
  four_star_rate <- 0.051
  rate_up_five_star <- 0.5
  rate_up_four_star <- 0.5
  # Define characters
  four_star_characters <- c("Arlan", "Asta", "Dan Heng", "Herta", "Hook",
                            "March 7th", "Natasha", "Pela", "Qingque", "Sampo",
                            "Serval", "Sushang", "Tingyun")
  five_star_characters <- c("Bailu", "Bronya", "Clara", "Gepard", "Himeko",
                            "Seele", "Welt", "Yanqing")
  rate_up_five_star_char <- "Seele"
  rate_up_four_star_chars <- c("Natasha", "Hook", "Pela")
  # Initialize counters
  pity_counter_five_star <- 0       # pulls since the last 5★ character
  pity_counter_four_star_plus <- 0  # pulls since the last 4★-or-better result
  last_five_star_not_up <- FALSE    # TRUE if the most recent 5★ was not the rate-up character
  pull_results <- character(n_pulls)
  star_results <- numeric(n_pulls)
  for (pull in 1:n_pulls) {
    # Generate a random number to determine the rarity of this pull
    rand <- runif(1)
    pity_counter_four_star_plus <- 1 + pity_counter_four_star_plus
    pity_counter_five_star <- 1 + pity_counter_five_star
    # Condition for a 5★ character:
    # either this pull hits the 5★ rate or there was no 5★ in the last 89 pulls
    if (rand <= five_star_rate || pity_counter_five_star >= 90) {
      # Lucky, got a 5★, so reset the counters
      pity_counter_five_star <- 0
      pity_counter_four_star_plus <- 0
      star <- 5
      # Condition for the rate-up 5★ character:
      # either the previous 5★ was not the rate-up one (guarantee) or this pull wins the 50/50.
      # A fresh uniform draw is used so that pity pulls also get a fair 50/50.
      if (last_five_star_not_up || runif(1) <= rate_up_five_star) {
        char <- rate_up_five_star_char
        # The guarantee has been used (or was not needed), so clear the flag
        last_five_star_not_up <- FALSE
      } else {
        # Randomly choose one non-rate-up 5★ character
        char <- sample(five_star_characters[five_star_characters != rate_up_five_star_char], 1)
        # This 5★ is not the rate-up character
        last_five_star_not_up <- TRUE
      }
    } else if (rand <= (five_star_rate + four_star_rate) || pity_counter_four_star_plus >= 10) {
      # Not a 5★ but a 4★ character:
      # five_star_rate < rand <= five_star_rate + four_star_rate,
      # or no 4★-or-better result in the last 9 pulls
      pity_counter_four_star_plus <- 0
      star <- 4
      # Condition for a rate-up 4★ character (fresh draw, so pity pulls also get a fair 50/50)
      if (runif(1) <= rate_up_four_star) {
        char <- sample(rate_up_four_star_chars, 1)
      } else {
        char <- sample(four_star_characters[!four_star_characters %in% rate_up_four_star_chars], 1)
      }
    } else {
      char <- "3★ light cone"
      star <- 3
    }
    pull_results[pull] <- char
    star_results[pull] <- star
  }
  if (print == TRUE) {
    print(table(pull_results))
  }
  return(data.frame(result = pull_results, star = star_results, index = 1:n_pulls))
}
# Save pull results to a CSV file
#write.csv(starRail_pull(80), "gacha_pull_results.csv", row.names = FALSE)
Here is an example that uses this function to simulate 80 draws.
df = starRail_pull(80)
head(df)
         result star index
1 3★ light cone    3     1
2 3★ light cone    3     2
3 3★ light cone    3     3
4 3★ light cone    3     4
5 3★ light cone    3     5
6 3★ light cone    3     6
print(table(df$star))
 3  4
71  9
Optional: Try to understand how this function works.
Hint: The output of starRail_pull contains 3 columns: result, star, and index. From the banner information, in the unluckiest case you will get Seele on draw 180, so df=starRail_pull(180) will contain at least one Seele in the result column. For one simulation, you need to find the index of the first Seele. Some functions you can use: filter, pull, min.
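To illustrate how filter, pull, and min fit together, here is a small sketch on a made-up data frame. The data frame `toy` and the value "B" are hypothetical and unrelated to the simulator output; only the pattern carries over.

```r
library(dplyr)

# Made-up data: five rows with a label and a running index
toy <- data.frame(result = c("A", "B", "A", "C", "B"), index = 1:5)

first_B <- toy %>%
  filter(result == "B") %>%   # keep only the rows whose label is "B"
  pull(index) %>%             # extract the index column as a plain vector
  min()                       # smallest index = first occurrence

first_B  # 2
```

Note that min() on an empty vector returns Inf with a warning, so this pattern assumes the value occurs at least once.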
sim_first_Seele_draws = function(){
}
set.seed(123)
n = 50
P13_50 = replicate(?, ?)
n = 500
P13_500 = replicate(?, ?)
n = 2000
P13_2000 = replicate(?, ?)
Hint: How do you use replicate()? Simulating 2000 results may take some time.
Hint 2: Call set.seed(123) at the beginning of the code chunk so that the simulated samples are reproducible. You can change the seed to another number.
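The two hints can be seen together in a minimal sketch. The function `roll_until_six` is a made-up toy simulation (rolling a die until a six appears), not part of the assignment; replicate(n, expr) evaluates expr n times and collects the results, and resetting the seed reproduces the same draws.

```r
# Toy simulation: number of die rolls needed to see the first six.
# `roll_until_six` is a hypothetical function used only for illustration.
roll_until_six <- function() {
  rolls <- 0
  repeat {
    rolls <- rolls + 1
    if (sample(1:6, 1) == 6) return(rolls)
  }
}

set.seed(123)
a <- replicate(5, roll_until_six())  # five independent runs, returned as a vector

set.seed(123)
b <- replicate(5, roll_until_six())  # same seed, so the same five results

identical(a, b)  # TRUE
```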
What is the shape of the distribution of the samples? Is it bell-shaped? You can plot a histogram of the n=2000 sample.
Which confidence interval should we construct? List the assumptions for this question.
Do you think the confidence interval is a good method for this situation? Why or why not?
Write a function to calculate the \(95\%\) confidence interval. Use that function to show the confidence intervals. You can refer to Extra example on CI slide 9, but you may need to change the t.score used in that example.
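For orientation, one common form of a confidence interval for a mean uses a t critical value with n - 1 degrees of freedom. The sketch below is an assumption about what such a function might look like, not the function from the slides; the name `mean_ci` and the exponential test data are made up.

```r
# t-based confidence interval for a population mean.
# `mean_ci` is a hypothetical helper; the critical value comes from qt().
mean_ci <- function(x, conf_level = 0.95) {
  n    <- length(x)
  crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # t critical value
  se   <- sd(x) / sqrt(n)                           # standard error of the mean
  c(lower = mean(x) - crit * se, upper = mean(x) + crit * se)
}

set.seed(1)
x <- rexp(100, rate = 1 / 50)  # made-up, right-skewed data
mean_ci(x)
t.test(x)$conf.int             # built-in check: same endpoints
```

For large n the t critical value is close to qnorm(0.975), so the choice between the two matters mostly for small samples.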
Interpret the confidence interval of the n=2000 sample.
Construct a \(95\%\) confidence interval for the proportion of runs in which Seele is first obtained at exactly 180 pulls. Use the n=2000 sample P13_2000. Based on the confidence interval, are you really that unlucky?
Hint: sample1 = P13_2000==180 makes sample1 a logical vector, where TRUE indicates getting Seele at exactly 180 pulls.
Hint 2: Refer to Lecture 20 slide 32.
In the gacha system, if you get a duplicate character, the excess copies are converted into eidolons that can be used to buff the original character. One character can be buffed by eidolons 6 times. So here we ask how many draws you should expect to need in order to draw Seele 7 times. Use n=500 for the simulation.
Once you have used up all the free resources, you can pay $99.99 for approximately 60 draws. Based on the expected number of draws, how much should you expect to pay?
Hint: First write the function sim_7_Seele_draws to obtain the index of the 7th Seele. Then generate n=500 samples and apply the one_sample_mean_CI function to that sample vector.
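Converting a number of draws into a cost is simple arithmetic, but it depends on an assumption about how draws are sold. The sketch below shows two readings of "$99.99 for approximately 60 draws": a proportional price per draw, and whole packs only. The value of `draws_needed` is a made-up placeholder, not a simulation result.

```r
price_per_pack <- 99.99
draws_per_pack <- 60

draws_needed <- 150  # hypothetical number of draws, for illustration only

# Reading 1: cost scales proportionally with the number of draws
cost_proportional <- draws_needed / draws_per_pack * price_per_pack

# Reading 2: only whole packs can be bought, so round the pack count up
cost_whole_packs <- ceiling(draws_needed / draws_per_pack) * price_per_pack

c(proportional = cost_proportional, whole_packs = cost_whole_packs)
```

The same arithmetic can be applied to both endpoints of a confidence interval for the mean number of draws to obtain an interval for the expected cost.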
Names:
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.5
[5] readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.3.6
[9] tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 xfun_0.34 bslib_0.4.0
[4] haven_2.5.1 gargle_1.2.1 colorspace_2.0-3
[7] vctrs_0.5.0 generics_0.1.3 htmltools_0.5.3
[10] yaml_2.3.6 utf8_1.2.2 rlang_1.0.6
[13] jquerylib_0.1.4 pillar_1.9.0 withr_2.5.0
[16] glue_1.6.2 DBI_1.1.3 dbplyr_2.2.1
[19] readxl_1.4.1 modelr_0.1.9 lifecycle_1.0.3
[22] munsell_0.5.0 gtable_0.3.1 cellranger_1.1.0
[25] rvest_1.0.3 evaluate_0.17 knitr_1.40
[28] tzdb_0.3.0 fastmap_1.1.1 fansi_1.0.3
[31] broom_1.0.1 backports_1.4.1 scales_1.2.1
[34] googlesheets4_1.0.1 cachem_1.0.6 jsonlite_1.8.4
[37] fs_1.5.2 hms_1.1.2 digest_0.6.30
[40] stringi_1.7.8 grid_4.1.1 cli_3.4.1
[43] tools_4.1.1 magrittr_2.0.3 sass_0.4.2
[46] crayon_1.5.2 pkgconfig_2.0.3 ellipsis_0.3.2
[49] xml2_1.3.3 reprex_2.0.2 googledrive_2.0.0
[52] lubridate_1.9.0 timechange_0.1.1 assertthat_0.2.1
[55] rmarkdown_2.17 httr_1.4.4 rstudioapi_0.14
[58] R6_2.5.1 compiler_4.1.1