Title: | Multistage Sampling Allocation and Sample Selection |
---|---|
Description: | Multivariate optimal allocation for different domains in one- and two-stage stratified sample designs. 'R2BEAT' extends the Neyman (1934) – Tschuprow (1923) allocation method to the case of several variables, adopting a generalization of Bethel's proposal (1989). 'R2BEAT' develops this methodology further: it determines the sample allocation in the multivariate and multi-domain case of estimates for two-stage stratified samples. It also performs the selection of both Primary Stage Units and Secondary Stage Units. This package requires 'ReGenesees', which can be installed from <https://github.com/DiegoZardetto/ReGenesees>. |
Authors: | Andrea Fasulo,Giulio Barcaroli,Ilaria Bombelli,Stefano Falorsi,Alessio Guandalini,Marco Dionisio Terribili |
Maintainer: | Andrea Fasulo <[email protected]> |
License: | EUPL |
Version: | 1.0.6 |
Built: | 2025-03-09 05:44:15 UTC |
Source: | https://github.com/barcaroli/r2beat |
Function to increase or decrease the precision constraints in order to obtain the desired sample size
adjust_CVs(target_size, strata, errors, adj_rate = 0.01)
target_size |
desired sample size. |
strata |
the 'strata' dataset. |
errors |
the 'errors' dataset containing the current precision constraints |
adj_rate |
the rate of adjustment (default=0.01): smaller values reach the target size more precisely; larger values make the adjustment quicker |
the new 'errors' dataset containing the modified precision constraints
data(beat.example)
errors
a <- beat.1st(strata, errors)
sum(a$alloc$ALLOC[-nrow(a$alloc)])
errors_new <- adjust_CVs(9000, strata, errors, adj_rate=0.005)
errors_new
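The idea of stepping the precision constraints until a target size is reached can be illustrated with a toy, base-R sketch. This is not the code of adjust_CVs: the helper names `size_for_cv` and `adjust_cv_sketch`, the single-variable simple random sampling size formula and all numeric inputs are assumptions made only for illustration.

```r
# Toy sample size for a target CV of a mean under simple random sampling
# (with finite population correction). Hypothetical helper, not part of R2BEAT.
size_for_cv <- function(cv, N, mean_y, sd_y) {
  ceiling(sd_y^2 / (cv^2 * mean_y^2 + sd_y^2 / N))
}

# Multiply the CV by (1 + adj_rate) or (1 - adj_rate) until the sample size
# reaches or crosses the target size.
adjust_cv_sketch <- function(target_size, cv, N, mean_y, sd_y,
                             adj_rate = 0.01, max_steps = 10000) {
  n <- size_for_cv(cv, N, mean_y, sd_y)
  too_large <- n > target_size
  step <- if (too_large) 1 + adj_rate else 1 - adj_rate
  for (i in seq_len(max_steps)) {
    # stop as soon as the target is hit or crossed
    if (n == target_size || (n > target_size) != too_large) break
    cv <- cv * step
    n <- size_for_cv(cv, N, mean_y, sd_y)
  }
  list(cv = cv, size = n)
}

res <- adjust_cv_sketch(target_size = 500, cv = 0.01,
                        N = 100000, mean_y = 50, sd_y = 40)
res
```

A smaller `adj_rate` takes more steps but stops closer to the target, which mirrors the trade-off described for the adj_rate argument.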
Function to aggregate the information from a set of strata
aggrStrata(strata, nvar, vett, censiti, dominio)
strata |
name of the dataframe containing the strata to be aggregated. |
nvar |
number of target variables Y |
vett |
vector of integers, with one element per row of the 'strata' dataframe, indicating how the strata must be aggregated. |
censiti |
flag indicating if the strata are take-all (=1) or not (=0) |
dominio |
variable in the strata indicating the domain level |
Be aware that this function is applicable only to strata of the same domain level
a dataframe containing the aggregated strata
data(beat.example)
vett <- c(rep(1,5), rep(2,5), rep(3,7))
R2BEAT:::aggrStrata(strata, vett, nvar=2, dominio="DOM1", censiti=0)
Example data frame containing a given allocation.
data(beat.example)
The allocation data frame contains a row per each stratum with the following variable:
SIZE: Stratum sample size (numeric)
Note: the names of the variables must be the ones indicated above.
# Load example data
data(beat.example)
allocation
str(allocation)
The function returns a dataframe with planned and actual coefficients of variation (CV) in a multivariate multi-domain allocation problem.
beat.1cv(stratif, errors, alloc, minnumstrat = 2)
stratif |
A dataframe with strata information. |
errors |
A dataframe with planned CV. |
alloc |
A vector with an allocation into strata. |
minnumstrat |
Optional. A number indicating the lower bound for the number of allocated units in each stratum (default is 2). |
A dataframe with planned and actual CV for each combination of domain, domain category and auxiliary variable.
Compute multivariate optimal allocation for different domains in one stage stratified sample design
beat.1st(stratif, errors, minnumstrat=2, maxiter=200, maxiter1=25, epsilon=10^(-11))
stratif |
Data frame of survey strata, for more details see, e.g., strata. |
errors |
Data frame of expected coefficients of variation (CV) for each domain, for more details see, e.g., errors. |
minnumstrat |
Minimum number of elementary units per stratum (default=2). |
maxiter |
Maximum number of iterations (default=200) of the general procedure. Such iterations may be needed because, when the number of units allocated to a stratum is greater than or equal to its population, that stratum is set as a "census stratum" and the whole procedure is re-initialised. |
maxiter1 |
Maximum number of iterations in Chromy algorithm (default=25). |
epsilon |
Tolerance for the maximum absolute difference between the expected CV and the CV realised with the allocation obtained in the last iteration, over all domains. The default is 10^(-11). |
The methodology is a generalization of the Bethel multivariate allocation (1989), which extended the Neyman (1934) - Tschuprow (1923) allocation to multi-purpose and multi-domain surveys. The generalized Bethel algorithm determines the optimal sample size for each stratum in a stratified sample design. The overall sample size and its allocation among the different strata are determined starting from the accuracy constraints imposed in the survey on the estimates of interest.
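As a point of reference, the classical univariate Neyman allocation that the Bethel approach generalizes can be sketched in a few lines of base R. This is only the textbook single-variable case with a fixed total sample size, not the algorithm implemented in beat.1st; the function name `neyman_allocation` and the stratum figures are hypothetical.

```r
# Univariate Neyman allocation for a fixed total sample size n:
# n_h is proportional to N_h * S_h (stratum size times stratum standard deviation).
neyman_allocation <- function(N_h, S_h, n) {
  w <- N_h * S_h
  n * w / sum(w)
}

N_h <- c(1000, 3000, 6000)   # hypothetical stratum population sizes
S_h <- c(20, 10, 5)          # hypothetical stratum standard deviations
n_h <- neyman_allocation(N_h, S_h, n = 400)
n_h
```

In the multivariate, multi-domain setting there is one precision constraint per variable and domain, so no single closed-form allocation of this kind applies and an iterative procedure is used instead.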
Object of class list. The list contains 4 objects:
n |
Vector with the optimal sample size for each stratum. |
file_strata |
Data frame corresponding to the input data.frame |
alloc |
Data frame with optimal ( |
sensitivity |
Data frame with a summary of expected coefficients of variation ( |
Developed by Stefano Falorsi, Andrea Fasulo, Alessio Guandalini, Daniela Pagliuca, Marco D. Terribili.
Bethel, J. (1989) Sample allocation in multivariate surveys. Survey Methodology, 15(1): 47-57.
Cochran, W. (1977) Sampling Techniques. John Wiley & Sons, Inc., New York
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4): 558-625.
Tschuprow, A. A. (1923). On the mathematical expectation of the moments of frequency distributions in the case of correlated observation. (Chapters 4-6). Metron, 2: 646-683.
# Load example data
data(beat.example)

## Example 1
# Allocate the sample
allocation_1 <- beat.1st(stratif=strata, errors=errors)
# The total sample size is
sum(allocation_1$n)

## Example 2
# Assume 5700 units is the maximum sample size to stick to our budget.
# Looking at allocation_1$sensitivity we can see that most of the
# sensitivity is in DOM1 for REG1 and REG2 due to V1.
allocation_1$sensitivity
# We can relax the constraints increasing the expected coefficients of variation for X1 by 10%
errors1 <- errors
errors1[1,2] <- errors[1,2]+errors[1,2]*0.1
# Try the new allocation
allocation_2 <- beat.1st(stratif=strata, errors=errors1)
sum(allocation_2$n)

## Example 3
# On the contrary, if we tighten the constraints decreasing the expected coefficients of variation
# for X1 by 10%
errors2 <- errors
errors2[1,2] <- errors[1,2]-errors[1,2]*0.1
# The new allocation leads to a larger sample than the first example
allocation_3 <- beat.1st(stratif=strata, errors=errors2)
sum(allocation_3$n)
Compute multivariate optimal allocation for different domains, corrected to account for a two-stage stratified sample design
beat.2st(stratif, errors, des_file, psu_file, rho, deft_start = NULL, effst = NULL, epsilon1 = 5, mmdiff_deft = 1,maxi = 20, epsilon = 10^(-11), minPSUstrat = 2, minnumstrat = 2, maxiter = 200, maxiter1 = 25)
stratif |
Data frame of survey strata, for more details see, e.g., strata. |
errors |
Data frame of coefficients of variation for each domain, for more details see, e.g., errors. |
des_file |
Data frame containing information on sampling design variables, for more details see, e.g., design. |
psu_file |
Data frame containing information on primary stage units stratification, for more details see, e.g., PSU_strat. |
rho |
Data frame of intraclass correlation coefficients in each stratum, for more details see, e.g., rho. |
deft_start |
Data frame with the starting values of the design effect for each variable in each stratum, for more details see, e.g., deft_start. |
effst |
Data frame with the estimator effect for each variable in each stratum, for more details see, e.g., effst. |
epsilon1 |
First stop condition: difference in sample sizes between two iterations; iteration continues as long as the maximum difference in sample sizes exceeds this value. The default is 5. |
mmdiff_deft |
Second stop condition: difference in defts between two iterations; iteration continues as long as the largest difference in defts exceeds this value. The default is 1. |
maxi |
Third stop condition: maximum number of allowed iterations. The default is 20. |
epsilon |
The same as in function beat.1st. |
minPSUstrat |
Minimum number of non-self-representative PSUs to be selected in each stratum. |
minnumstrat |
The same as in function beat.1st. |
maxiter |
The same as in function beat.1st. |
maxiter1 |
The same as in function beat.1st. |
The methodology is a generalization of the Bethel multivariate allocation (1989), which extended the Neyman (1934) - Tschuprow (1923) allocation to multi-purpose and multi-domain surveys. The generalized Bethel algorithm determines the optimal sample size for each stratum in a stratified sample design. The overall sample size and its allocation among the different strata are determined starting from the accuracy constraints imposed in the survey on the estimates of interest. The optimal allocation is obtained through a procedure that converges in a few iterations:
The first iteration computes an initial allocation with the multivariate optimal allocation for different domains in a one-stage stratified sample design (the methodology is a generalization of the Bethel multivariate allocation, 1989, to multi-domain and multi-stage designs).
The initial allocation is then corrected with an iterative method that calculates new allocations by inflating the strata variances with the design effect (Ganninger, 2010).
Object of class list. The list contains 8 objects:
iteractions |
Data frame that for each iteration provides a summary with the number of Primary Stage Units ( |
file_strata |
Input data frame in |
alloc |
Data frame with optimal ( |
planned |
Data frame with a summary of expected coefficients of variation for each variable in each domain. |
expected |
Data frame with a summary of realized coefficients of variation with the given optimal allocation for each variable in each domain. |
sensitivity |
Data frame with a summary of the sensitivity at 10% for each domain and each variable. Sensitivity can be a useful tool to help in finding the best allocation, because it provides a hint of the expected sample size variation for a 10% change in planned CVs. |
deft_c |
Data frame with the design effect for each variable in each domain in each iteration. Note that |
param_alloc |
A vector with a summary of all the parameters given for the allocation. |
Developed by Stefano Falorsi, Andrea Fasulo, Alessio Guandalini, Daniela Pagliuca, Marco D. Terribili.
Cochran, W. (1977) Sampling Techniques. John Wiley & Sons, Inc., New York
Ganninger, M. (2010). Design effects: model-based versus design-based approach. Vol. 3, p. 174. DEU.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558-625.
Tschuprow, A. A. (1923). On the mathematical expectation of the moments of frequency distributions in the case of correlated observation. (Chapters 4-6). Metron, 2: 646-683.
## Not run: 
# Load example data
data(beat.example)

## Example 1
# Allocate the sample
allocation2st_1 <- beat.2st(stratif=strata, errors=errors,
                            des_file=design, psu_file=PSU_strat, rho=rho)
# The total sample size is 191 PSUs (36 SR + 155 NSR) and 15147 SSUs.

## Example 2
# Assume 13000 SSUs is the maximum sample size to stick to our budget.
# Look at the sensitivity in DOM1 for REG1 and REG2 due to V1.
allocation2st_1$sensitivity
# We can relax the constraints increasing the expected coefficients of variation for X1 by 10%
errors1 <- errors
errors1[1,2] <- errors[1,2]+errors[1,2]*0.1
# Try the new allocation
allocation2st_2 <- beat.2st(stratif=strata, errors=errors1,
                            des_file=design, psu_file=PSU_strat, rho=rho)

## Example 3
# On the contrary, if we tighten the constraints decreasing the expected coefficients of variation
# for X1 by 10%
errors2 <- errors
errors2[1,2] <- errors[1,2]-errors[1,2]*0.1
# The new allocation leads to a larger sample than the first example (around 18000)
allocation2st_3 <- beat.2st(stratif=strata, errors=errors2,
                            des_file=design, psu_file=PSU_strat, rho=rho)

## Example 4
# Sometimes budget constraints concern the number of PSUs involved in the survey.
# The number of PSUs can be tuned by modifying the MINIMUM in des_file.
# Assume to increase the MINIMUM from 48 to 60
design1 <- design
design1[,4] <- 60
allocation2st_4 <- beat.2st(stratif=strata, errors=errors2,
                            des_file=design1, psu_file=PSU_strat, rho=rho)
# The number of PSUs decreases, while the number of SSUs increases
# due to the cluster intra-correlation effect.
# Under the same expected errors, to offset a slight reduction of PSUs (from 221 to 207)
# an increase of the SSUs involved is observed.
allocation2st_3$expected
allocation2st_4$expected

## Example 5
# On the contrary, assume to decrease the MINIMUM from 48 to 24.
# The number of SSUs strongly decreases in the face of an increase of PSUs,
# always under the same expected errors.
design2 <- design
design2[,4] <- 24
allocation2st_5 <- beat.2st(stratif=strata, errors=errors2,
                            des_file=design2, psu_file=PSU_strat, rho=rho)
allocation2st_4$expected
allocation2st_5$expected

## Example 6
# Assume that the SSUs are in turn clusters, for instance households composed of individuals.
# In the previous examples we always derived optimal allocations
# for samples of SSUs (i.e. households, because DELTA = 1).
design
design1
design2
# To obtain a sample in terms of the elements composing the SSUs
# (i.e., individuals) it is sufficient to modify the DELTA in des_file.
design3 <- design
design3$DELTA <- 2.31   # DELTA_IND=2.31, the average size of household in Italy.
allocation2st_6 <- beat.2st(stratif=strata, errors=errors,
                            des_file=design3, psu_file=PSU_strat, rho=rho)

## Example 7
# Complete workflow
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM=c("DOM1","DOM2"),
                         CV1=c(0.02,0.03),
                         CV2=c(0.03,0.06),
                         CV3=c(0.03,0.06),
                         CV4=c(0.05,0.08)))
cv
samp_frame <- pop
samp_frame$one <- 1
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh","active","inactive","unemployed")
deff_var <- "stratum"
domain_var <- "region"
delta = 1         # households = survey units
minimum <- 50     # minimum number of SSUs to be interviewed in each selected PSU
deff_sugg <- 1.5  # suggestion for the deff value
inp <- prepareInputToAllocation1(samp_frame, id_PSU, id_SSU, strata_var,
                                 target_vars, deff_var, domain_var,
                                 minimum, delta, deff_sugg)
inp$des_file$MINIMUM <- 50
alloc <- beat.2st(stratif = inp$strata,
                  errors = cv,
                  des_file = inp$des_file,
                  psu_file = inp$psu_file,
                  rho = inp$rho,
                  deft_start = NULL,
                  effst = inp$effst,
                  minPSUstrat = 2,
                  minnumstrat = 50)
## End(Not run)
Compute the coefficients of variation considering a given multivariate optimal allocation.
beat.cv(n_file, stratif, errors, des_file, psu_file, rho, epsilon)
n_file |
Data frame containing the sample size allocated in each stratum, for more details see, e.g., allocation. |
stratif |
Data frame of survey strata, for more details see, e.g., strata. |
errors |
Data frame of expected coefficients of variation (CV) for each domain, for more details see, e.g., errors. |
des_file |
Data frame containing information on sampling design variables, for more details see, e.g., design. |
psu_file |
Data frame containing information on primary stage units stratification, for more details see, e.g., PSU_strat. |
rho |
Data frame of intraclass correlation coefficients in each stratum, for more details see, e.g., rho. |
epsilon |
The same as in function beat.1st. |
This function derives the expected coefficients of variation (CV) from a given allocation.
The function beat.cv returns the expected accuracy of the estimates, in terms of coefficient of variation, for several variables in different domains, given a certain allocation among the different strata.
Object of class list. The list contains a set of data.frame objects, one for each combination of domain and variable of interest, containing total estimates, population, variance and expected coefficient of variation for every domain category.
For each domain and each variable, the following are defined:
Tot1 |
Total estimate |
N |
Measure of size |
Varfin |
The sample variance of the total estimate |
CV |
The coefficient of variation of the |
Developed by Stefano Falorsi, Andrea Fasulo, Alessio Guandalini, Daniela Pagliuca, Marco D. Terribili.
## Not run: 
# Load example data
data(beat.example)

## Example 1
# Calculate coefficients of variation, for two variables in two domains,
# given an allocation among the different strata.
allocation
cv1 <- beat.cv(n_file=allocation, stratif=strata, errors=errors,
               des_file=design, psu_file=PSU_strat, rho=rho)

## Example 2
# Take the example 1 in beat.2st.
allocation2st_1 <- beat.2st(stratif=strata, errors=errors,
                            des_file=design, psu_file=PSU_strat, rho=rho)
# The allocation obtained is
allocation2st_1$alloc
# with these precision constraints
errors
# and these expected coefficients of variation
allocation2st_1$expected
# Now, fit the output of beat.2st to allocation, that is
SIZE <- allocation2st_1$alloc[-18,c(2)]
allocation1 <- data.frame(SIZE)
# If beat.cv is applied, the same errors as in allocation2st_1$expected should be obtained.
# In fact
cv2 <- beat.cv(n_file=allocation1, stratif=strata, errors=errors,
               des_file=design, psu_file=PSU_strat, rho=rho)
cv2
# Please note that some very slight differences may occur.
## End(Not run)
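The relation between the reported quantities can be shown with a minimal base-R sketch: the coefficient of variation of a total estimate is the square root of its sampling variance divided by the estimate. The column names Tot1, Varfin and CV follow the value table above, while the DOMAIN column and all figures are invented for illustration; this is not output of beat.cv.

```r
# Hypothetical domain-level results: total estimates and their sampling variances.
res <- data.frame(
  DOMAIN = c("REG1", "REG2"),
  Tot1   = c(50000, 120000),
  Varfin = c(1.0e6, 9.0e6)
)

# CV of each total estimate: standard error divided by the estimate.
res$CV <- sqrt(res$Varfin) / res$Tot1
res
```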
Function to derive dummy variables from the target variables: it is useful when it is necessary to differentiate the precision constraints in the different domains of interest
build_dummy_variables(frame, domain_var, initial_target_vars, cv)
frame |
the sampling frame. |
domain_var |
the name of the variable that identifies the domains of interest. |
initial_target_vars |
a vector containing the names of the current target variables. |
cv |
the set of the current precision constraints |
a list containing: (a) the new frame with the values of the dummy variables; (b) the new target variables; (c) the new set of precision constraints
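The general idea of domain-specific dummy variables can be sketched in base R as follows. This is an illustration under assumptions, not the implementation of build_dummy_variables: the toy frame, the variable names `region` and `income`, and the naming scheme of the new columns are all hypothetical.

```r
# Toy sampling frame with one domain variable and one target variable.
frame <- data.frame(
  region = c("north", "north", "south", "south"),
  income = c(10, 20, 30, 40)
)

# For each domain, create a copy of the target variable that is zero
# outside that domain, so a separate precision constraint can refer to it.
for (d in unique(frame$region)) {
  frame[[paste0("income_", d)]] <- ifelse(frame$region == d, frame$income, 0)
}
frame
```

Each new column sums to the total of the target variable within its domain, which is why a distinct CV can then be attached to each of them.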
Checks the coherence between the population in the strata dataset and the population calculated by the PSUs dataset
check_input(strata,des,strata_var_strata,strata_var_des)
strata |
strata dataset |
des |
design dataset |
strata_var_strata |
variable identifying stratum in strata dataset |
strata_var_des |
variable identifying stratum in design dataset |
Giulio Barcaroli
## Not run: 
library(R2BEAT)
load("R2BEAT_ReGenesees.RData")  # ReGenesees design and calibration objects plus PSU data
RGdes <- des                     # ReGenesees design object
RGcal <- cal                     # ReGenesees calibrated object
strata_vars <- c("stratum")      # variables of stratification
target_vars <- c("income_hh", "active", "inactive", "unemployed")  # target variables
deff_vars <- "stratum"           # stratification variables for calculating deff and effst
                                 # (n.b: must coincide or be a subset of variables of stratification)
id_PSU <- c("municipality")      # identification variable of PSUs
id_SSU <- c("id_hh")             # identification variable of SSUs
domain_vars <- c("region")       # domain variables
inp1 <- input_to_beat.2st_1(RGdes, RGcal, id_PSU, id_SSU,
                            strata_vars, target_vars, deff_vars, domain_vars)
head(inp1$strata)
head(psu)
psu_id="municipality"   # Identifier of the PSU
stratum_var="stratum"   # Identifier of the stratum
mos_var="ind"           # Variable to be used as 'measure of size'
delta=1                 # Average number of SSUs for each selection unit
minimum <- 50           # Minimum number of SSUs to be selected in each PSU
inp2 <- input_to_beat.2st_2(psu, psu_id, stratum_var, mos_var, delta, minimum)
head(inp2$psu_file)
head(inp2$des_file)
newstrata <- check_input(strata=inp1$strata,
                         des=inp2$des_file,
                         strata_var_strata="STRATUM",
                         strata_var_des="STRATUM")
## End(Not run)
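A coherence check of this kind can be sketched in base R by comparing the population recorded for each stratum with the population obtained by summing a measure of size over the PSUs of that stratum. The data frames and the column names `N` and `PSU_MOS` below are assumptions for illustration; the sketch does not reproduce the internals of check_input.

```r
# Hypothetical strata populations and PSU-level measures of size.
strata <- data.frame(STRATUM = c(1, 2), N = c(1000, 2500))
psu    <- data.frame(STRATUM = c(1, 1, 2, 2),
                     PSU_MOS = c(400, 600, 1500, 900))

# Population per stratum as implied by the PSU data.
pop_psu <- aggregate(PSU_MOS ~ STRATUM, data = psu, FUN = sum)

# Join on the stratum identifier and report the discrepancy.
chk <- merge(strata, pop_psu, by = "STRATUM")
chk$DIFF <- chk$N - chk$PSU_MOS
chk
```

A non-zero DIFF flags a stratum whose population differs between the two sources and should be reconciled before running the allocation.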
Function to propose a convenient set of precision constraints, given the different variability of target variables Ys in the strata
CVs_hint(strata, cv)
strata |
the 'strata' dataset. |
errors |
the 'errors' dataset containing the current precision constraints. |
The function requires as input the 'strata' dataset, plus an initial set of precision constraints. It is suggested to define a 'neutral' set, i.e. equal CVs for all variables, possibly differentiated only with regard to the different domain levels
the new 'errors' dataset containing the changed precision constraints
data(beat.example)
errors[1,c(2:3)] <- c(0.03,0.03)
errors[2,c(2:3)] <- c(0.03,0.03)
errors
cv <- CVs_hint(strata, errors)
cv
Example data frame containing the starting values for the design effect (deft).
data(beat.example)
The Design Effect data frame contains a row per each stratum with the following variables:
Identifier of the stratum (numeric).
Starting values for the Design Effect in the stratum of the first variable (numeric).
Starting values for the Design Effect in the stratum of the j-th variable (numeric).
Starting values for the Design Effect in the stratum of the last variable (numeric).
Note: the names of the variables must be the ones indicated above.
This is an optional input.
The function beat.2st independently computes and updates the design effect.
However, it is possible to set the starting values of the design effect for each variable in each stratum.
The design effect (deft) is the square root of the ratio of the actual sampling variance to the variance expected under simple random sampling (SRS) with the same sample size.
Under SRS the design effect is equal to 1.
Usually the design effect increases with the number of selection stages, because it takes into account the clustering of sampling units and the sample size in Self Representative (SR) and Non Self Representative (NSR) strata.
In practice, the higher the intraclass correlation, the higher the design effect, and the larger the sample size needed to satisfy the precision constraints compared with SRS.
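The definition above can be made concrete with a small base-R sketch. The two variances and the SRS sample size are invented figures; the inflation of the sample size by the squared deft relies on the usual approximation that sampling variance is inversely proportional to sample size.

```r
# Hypothetical sampling variances of the same estimate at equal sample size.
var_design <- 2.25e6   # variance under the actual (clustered) design
var_srs    <- 1.00e6   # variance under simple random sampling

# Design effect as defined above: square root of the variance ratio.
deft <- sqrt(var_design / var_srs)

# Approximate sample size needed under the actual design to match
# the precision that n_srs units would give under SRS.
n_srs    <- 1000
n_design <- ceiling(n_srs * deft^2)

c(deft = deft, n_design = n_design)
```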
## Not run: 
# Load example data
data(beat.example)
deft_start
str(deft_start)
## End(Not run)
Example data frame containing variables for describing the sampling design.
data(beat.example)
The design data frame contains a row per each stratum with the following variables:
Identifier of the stratum (numeric)
Measure of size of the stratum (numeric)
The average size of the Secondary Stage Units (SSUs) in the stratum. Depending on the units in which the sample is expressed, it can be equal to or greater than 1 (numeric). See Details for an in-depth explanation.
The minimum number of SSUs to be selected in each PSU. It can differ across strata (numeric)
Note: the names of the variables must be the ones indicated above.
The sample design can be defined through a measure of size of the stratum, the average size of each SSU (>=1) and the minimum number of SSUs to be selected in each PSU. In particular, if the SSUs are not clusters, DELTA=1 and the resulting sample size is expressed in terms of SSUs. When the SSUs are, in turn, clusters (for instance, households composed of individuals), setting DELTA equal to the average size of the SSUs makes it possible to derive a sample in terms of individuals.
Furthermore, by modifying the MINIMUM it is possible to tune the number of PSUs in the sample (see the examples in beat.2st). For a given sample size, increasing the MINIMUM involves fewer PSUs in the sample but yields worse estimates in terms of expected coefficients of variation. On the contrary, decreasing the MINIMUM involves more PSUs and yields better estimates. If instead the expected errors are held fixed, increasing the MINIMUM requires fewer PSUs but many more SSUs. The contrary occurs when decreasing the MINIMUM.
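The trade-off at fixed expected errors can be illustrated with Kish's textbook approximation of the design effect, deff = 1 + rho * (b - 1), where b is the number of SSUs taken per PSU. This is a simplified sketch, not the formula used inside beat.2st; the intraclass correlation `rho`, the SRS sample size `n_srs` and the MINIMUM values are hypothetical.

```r
rho     <- 0.05            # hypothetical intraclass correlation
n_srs   <- 1000            # hypothetical sample size that would suffice under SRS
minimum <- c(24, 48, 60)   # SSUs selected in each PSU

# Kish approximation: clustering inflates the variance by 1 + rho * (b - 1).
deff  <- 1 + rho * (minimum - 1)

# SSUs needed to keep the same precision, and PSUs needed to host them.
n_ssu <- ceiling(n_srs * deff)
n_psu <- ceiling(n_ssu / minimum)

data.frame(MINIMUM = minimum, deff, n_ssu, n_psu)
```

As the MINIMUM grows, the required number of SSUs rises while the number of PSUs falls, which is the pattern described in the paragraph above.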
## Not run: 
# Load example data
data(beat.example)
design
str(design)
## End(Not run)
Example data frame containing the estimator effect (effst) in each stratum for each variable.
data(beat.example)
The estimator effect data frame contains one row per stratum with the following variables:
Identifier of the stratum (numeric).
Estimator effect in the stratum of the first variable (numeric).
Estimator effect in the stratum of the j-th variable (numeric).
Estimator effect in the stratum of the last variable (numeric).
Note: the names of the variables must be the ones indicated above.
The estimator effect (effst) measures the variance inflation or reduction due to the use of an estimator other than the Horvitz-Thompson (HT) estimator (Horvitz and Thompson, 1952).
It is equal to the ratio between the sampling variance of the estimator planned to be used and the sampling variance of the HT estimator.
Hence, when the HT estimator is used, effst is equal to 1.
However, other estimators, such as the calibration estimator (Deville and Särndal, 1992) or the generalized regression estimator GREG (Fuller, 2002 and references therein), are increasingly used.
These estimators usually exploit auxiliary variables that increase the accuracy of the estimates, that is, they reduce their errors (CV). Their effst is therefore usually lower than 1.
Therefore, taking the estimator effect into account when planning the survey can help to save sample size, or at least to evaluate the allocation more properly.
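A minimal sketch of the ratio described above, using made-up variances for three strata. The variable names and values are hypothetical; in practice both variances would come from a previous round of the survey.

```r
# Hypothetical sampling variances of the same total in three strata
var_ht  <- c(4.0e6, 2.5e6, 9.0e6)   # Horvitz-Thompson estimator
var_cal <- c(3.2e6, 2.0e6, 8.1e6)   # calibration estimator

# Estimator effect: variance of the planned estimator over variance of the HT estimator
effst <- var_cal / var_ht

# Since CV is proportional to the square root of the variance,
# the expected CV is scaled by sqrt(effst)
cv_ratio <- sqrt(effst)

data.frame(stratum = 1:3, effst, cv_ratio)
```

Values of `effst` below 1 indicate that the planned estimator is expected to be more efficient than the HT estimator in that stratum.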
Deville, J.C., Särndal, C.E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418): 376-382.
Fuller, W.A. (2002). Regression estimation for survey samples. Survey Methodology, 28(1): 5-23.
Horvitz, D.G., Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260): 663-685.
## Not run: 
# Load example data
data(beat.example)
effst
str(effst)
## End(Not run)
Example data frame containing precision levels (expressed in terms of acceptable CV's).
data(beat.example)
The constraint data frame (errors) contains one row per type of domain with the following variables:
Type of domain code (factor).
Planned coefficient of variation for first variable (numeric).
Planned coefficient of variation for j-th variable (numeric).
Planned coefficient of variation for last variable (numeric).
Note: the names of the variables must be the ones indicated above.
The coefficient of variation (CV) is a standardized measure of variance. It is often expressed as a percentage and is defined as the ratio between the standard deviation of the estimate and the estimate (or its absolute value).
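As a small illustration of this definition, the snippet below computes the CV of two hypothetical estimates and compares them with hypothetical precision targets. The helper name `cv_of_estimate` and all numbers are made up for the example.

```r
# CV = standard error of the estimate / absolute value of the estimate
cv_of_estimate <- function(estimate, std_error) {
  std_error / abs(estimate)
}

est <- c(income = 25000, unemployed = 1200)   # hypothetical estimates
se  <- c(income = 500,   unemployed = 60)     # hypothetical standard errors

cv_values <- cv_of_estimate(est, se)
round(100 * cv_values, 1)                     # CVs expressed as percentages

# Compare with hypothetical planned CVs
target <- c(income = 0.03, unemployed = 0.04)
meets_target <- cv_values <= target
meets_target
```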
## Not run: 
# Load example data
data(beat.example)
errors
str(errors)
## End(Not run)
The user can indicate the number of samples to be selected from the optimized frame. First, the true values of the parameters are calculated from the frame. Then, for each sample, the sampling estimates are calculated, together with the differences between them and the true values of the parameters. At the end, an estimate of the CV is produced for each target variable, so that it can be compared with the precision constraints set at the beginning of the optimization process. If the flag 'writeFiles' is set to TRUE, boxplots of the distribution of the CVs in the different domains are produced for each Y variable ('cv.pdf'), together with boxplots of the distribution of the differences between the estimates and the values of the parameters in the population ('differences.pdf').
eval_2stage(df, PSU_code, SSU_code, domain_var, target_vars, PSU_sampled, nsampl = 100, writeFiles = TRUE)
df |
The sampling frame. |
PSU_code |
In the sampling frame, the identifier of the PSU. |
SSU_code |
In the sampling frame, the identifier of the SSU. |
domain_var |
In the sampling frame, the identifier of the domain of interest for the estimates. |
target_vars |
In the sampling frame, the variables used to produce the target estimates. |
PSU_sampled |
The set of selected PSUs. |
nsampl |
The number of samples to be drawn from the frame. |
writeFiles |
A flag to write in the work directory the outputs of the function. Default is TRUE. |
A list containing (i) the CV distribution in the domains, (ii) the bias distribution in the domains, (iii) the dataframe containing the sampling estimates by domain
Giulio Barcaroli
## Not run: 
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.02, 0.03),
                         CV2 = c(0.03, 0.05),
                         CV3 = c(0.03, 0.05),
                         CV4 = c(0.05, 0.08)))
samp_frame <- pop
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed") # more than one
deff_var <- "stratum"
domain_var <- "region"
delta <- 1        # households = survey units
minimum <- 50     # minimum number of SSUs to be interviewed in each selected PSU
deff_sugg <- 1.5  # suggestion for the deff value
inp <- prepareInputToAllocation1(samp_frame, id_PSU, id_SSU, strata_var, target_vars,
                                 deff_var, domain_var, minimum, delta, deff_sugg)
inp$des_file$MINIMUM <- 50
alloc <- beat.2st(stratif = inp$strata,
                  errors = cv,
                  des_file = inp$des_file,
                  psu_file = inp$psu_file,
                  rho = inp$rho,
                  deft_start = NULL,
                  effst = inp$effst,
                  minPSUstrat = 2,
                  minnumstrat = 50)
sample_1st <- select_PSU(alloc, type = "ALLOC", pps = TRUE)
df <- pop
df$one <- 1
PSU_code <- "municipality"
SSU_code <- "id_ind"
target_vars <- c("income_hh", "active", "inactive", "unemployed")
PSU_sampled <- sample_1st$sample_PSU
eval <- eval_2stage(df, PSU_code, SSU_code, domain_var, target_vars, PSU_sampled,
                    nsampl = 10, writeFiles = TRUE)
eval$coeff_var
eval$rel_bias
## End(Not run)
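The evaluation logic described for eval_2stage (true values from the frame, repeated sampling, CV and bias of the estimates) can be sketched in base R on a synthetic frame. This is not the package's implementation: for brevity it draws simple random samples instead of two-stage samples, and the frame, the helper `domain_means` and all parameters are hypothetical.

```r
set.seed(1)

# Synthetic frame with two domains and one target variable (made-up data)
frame <- data.frame(
  domain = factor(rep(c("A", "B"), each = 500)),
  y = c(rnorm(500, mean = 100, sd = 20), rnorm(500, mean = 150, sd = 30))
)

# Mean of y in each domain
domain_means <- function(d) sapply(split(d$y, d$domain), mean)

# Step 1: true values of the parameters, computed on the whole frame
true_values <- domain_means(frame)

# Step 2: draw nsampl samples and compute the estimates for each of them
nsampl <- 200
n <- 100
est <- t(replicate(nsampl, domain_means(frame[sample(nrow(frame), n), ])))

# Step 3: empirical CV and relative bias of the estimates, by domain
cv_hat   <- apply(est, 2, sd) / colMeans(est)
rel_bias <- (colMeans(est) - true_values) / true_values

round(cv_hat, 4)
round(rel_bias, 4)
```

The empirical CVs obtained this way are the quantities to be compared with the precision constraints.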
Function to report the expected coefficients of variation for target variables Ys in a 'strata' dataset given an allocation 'alloc' and the current set of precision constraints
expected_CV(strata, errors, alloc)
strata |
name of the dataframe containing information in the sampling strata. |
errors |
name of the dataframe containing the current precision constraints. |
alloc |
vector containing the allocation of sampling units. |
a dataframe containing the maximum expected coefficients of variation in each domain level for each target variable
load("./data/sample.RData")
target_vars <- c("active", "inactive", "unemployed", "income_hh")
strata <- R2BEAT:::prepareInputToAllocation_beat.1st(samp_frame = samp,
                                                     ID = "id_hh",
                                                     stratum = "stratum_label",
                                                     dom = "region",
                                                     target = target_vars)
strata$CENS <- as.numeric(strata$CENS)
strata$COST <- as.numeric(strata$COST)
strata$CENS <- 0
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.05, 0.10),
                         CV2 = c(0.05, 0.10),
                         CV3 = c(0.05, 0.10),
                         CV4 = c(0.05, 0.10)))
allocation <- beat.1st(strata, cv)
alloc <- allocation$alloc$ALLOC[-nrow(allocation$alloc)]
exp_cv <- expected_CV(strata, cv, alloc)
exp_cv
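For intuition, the expected CV of an estimated mean under stratified simple random sampling can be computed with the textbook formula, as in the sketch below. This is an assumption-laden illustration, not the internals of expected_CV: it handles a single variable and a single domain, ignores design and estimator effects, and the function name `expected_cv_srs` and the input values are hypothetical.

```r
# Expected CV of the stratified mean under SRS without replacement in each stratum
# N: stratum sizes, M: stratum means, S: stratum standard deviations, n: allocation
expected_cv_srs <- function(N, M, S, n) {
  W <- N / sum(N)                                   # stratum weights
  v <- sum(W^2 * S^2 / n * (1 - n / N))             # variance of the stratified mean
  sqrt(v) / abs(sum(W * M))
}

N <- c(5000, 3000, 2000)            # hypothetical stratum sizes
M <- c(0.10, 0.20, 0.30)            # hypothetical proportions of a binary variable
S <- sqrt(M * (1 - M))              # corresponding standard deviations
n <- c(250, 150, 100)               # hypothetical allocation

cv_base    <- expected_cv_srs(N, M, S, n)
cv_doubled <- expected_cv_srs(N, M, S, 2 * n)       # a larger sample lowers the CV
c(cv_base = cv_base, cv_doubled = cv_doubled)
```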
Prepares the following input dataframes for the R2BEAT two-stage sample design, starting from ReGenesees design and/or calibrated objects: 1. strata 2. deff 3. effst 4. rho
input_to_beat.2st_1(RGdes, RGcal, id_PSU, id_SSU, strata_vars, target_vars, deff_vars, domain_vars)
RGdes |
ReGenesees design object. |
RGcal |
ReGenesees calibrated object. |
id_PSU |
variable identifying the PSUs in the ReGenesees objects. |
id_SSU |
variable identifying the SSUs in the ReGenesees objects. |
strata_vars |
stratification variables used in ReGenesees objects. |
target_vars |
target variables. |
deff_vars |
stratification variables to be used when calculating deff. |
domain_vars |
the variables used to identify the domain of interest. |
Giulio Barcaroli
## Not run: 
library(readr)
des <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/des.rds?raw=true")
cal <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/cal.rds?raw=true")
library(R2BEAT)
RGdes <- des                  # ReGenesees design object
RGcal <- cal                  # ReGenesees calibrated object
strata_vars <- c("stratum")   # variables of stratification
target_vars <- c("income_hh", "active", "inactive", "unemployed") # target variables
deff_vars <- "stratum"        # stratification variables for calculating deff and effst
                              # (n.b: must coincide or be a subset of variables of stratification)
id_PSU <- c("municipality")   # identification variable of PSUs
id_SSU <- c("id_hh")          # identification variable of SSUs
domain_vars <- c("region")    # domain variables
inp1 <- input_to_beat.2st_1(RGdes, RGcal, id_PSU, id_SSU, strata_vars,
                            target_vars, deff_vars, domain_vars)
inp1$strata
inp1$deff
inp1$effst
inp1$rho
## End(Not run)
Prepares the design file for two-stage sample design on the basis of a dataset containing information on each PSU
input_to_beat.2st_2(psu,psu_id,stratum_var,mos_var,delta,minimum)
psu |
Dataframe containing information on each PSU. |
psu_id |
Identifier of each PSU in PSU dataframe. |
stratum_var |
Identifier of stratum in PSU dataframe. |
mos_var |
Variable containing the number of selection units in each PSU. |
delta |
Average number of final SSUs per selection unit. |
minimum |
Minimum number of selection units to be interviewed in each PSU. |
Giulio Barcaroli
## Not run: 
library(readr)
psu <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/psu.rds?raw=true")
head(psu)
library(R2BEAT)
psu_id <- "municipality"   # Identifier of the PSU
stratum_var <- "stratum"   # Identifier of the stratum
mos_var <- "ind"           # Variable to be used as 'measure of size'
delta <- 1                 # Average number of SSUs for each selection unit
minimum <- 50              # Minimum number of SSUs to be selected in each PSU
inp2 <- input_to_beat.2st_2(psu, psu_id, stratum_var, mos_var, delta, minimum)
head(inp2$psu_file)
head(inp2$des_file)
## End(Not run)
This function plots the results of the simulation carried out with the function 'sensitivity_min_SSU'.
plot_sens ( x, min, max)
x |
The result of the 'sensitivity_min_SSU' function. |
min |
minimum value of the parameter. |
max |
maximum value of the parameter. |
A list containing (i) the vector of allocated PSUs in the iterations and (ii) the vector of allocated SSUs in the iterations
Giulio Barcaroli
## Not run: 
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.02, 0.03),
                         CV2 = c(0.03, 0.05),
                         CV3 = c(0.03, 0.05),
                         CV4 = c(0.04, 0.08)))
cv
# parameters
samp_frame <- pop
errors <- cv
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed") # more than one
deff_var <- "stratum"
domain_var <- "region"
minimum <- 50  # minimum number of SSUs to be interviewed in each selected PSU
# average dimension of the SSU in terms of elementary survey units
# delta = nrow(pop) / length(unique(pop$id_hh))
delta <- 1
deff_sugg <- 1.5
min <- 30
max <- 80
sens <- sensitivity_min_SSU(samp_frame, errors, id_PSU, id_SSU, strata_var,
                            target_vars, deff_var, domain_var, minimum, delta,
                            deff_sugg, min, max)
plot_sens(sens, min, max)
## End(Not run)
The function returns a dataframe, starting from the sampling frame (either universe or sample of a previous survey) with strata information.
prepareInputToAllocation_beat.1st( samp_frame, ID, stratum, dom, target, samp_weight = NULL )
samp_frame: |
A dataframe with the sampling frame (either universe or sample of another survey). |
ID: |
name of the variable which is the identifier of the units in the sampling frame. |
stratum: |
either the name of the variable in samp_frame to be used as the stratum, or the names of the variables to be concatenated to obtain the stratum. In the latter case, the variables used to build the stratum are retained. |
dom: |
name of the variable(s) in samp_frame which are the domain(s) of the estimates. |
target: |
name of the variable(s) in samp_frame which are the target(s) of the estimates. |
samp_weight: |
name of the variable in the sample of another survey specifying the sampling weights. Default is NULL, meaning that population is taken as the reference. |
A dataframe with strata information. The output dataframe can be used as input dataframe stratif for R2BEAT one-stage sample design (beat.1st)
# ----------------
# from population
# ----------------
data("pop")
samp_frame <- pop
ID <- "id_ind"
stratum <- "stratum"
dom <- c("region", "province")
target <- c("unemployed")
a <- prepareInputToAllocation_beat.1st(samp_frame = samp_frame,
                                       ID = ID,
                                       stratum = stratum,
                                       dom = dom,
                                       target = target)
# ----------------
# from sample
# ----------------
data("sample")
samp_frame <- samp
ID <- "id_ind"
stratum <- "stratum"
dom <- c("region", "province")
target <- c("unemployed")
samp_weight <- "weight"
b <- prepareInputToAllocation_beat.1st(samp_frame = samp_frame,
                                       ID = ID,
                                       stratum = stratum,
                                       dom = dom,
                                       target = target,
                                       samp_weight = samp_weight)
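To give an idea of what "strata information" means here, the sketch below aggregates a synthetic frame into one row per stratum with its size, and the mean and standard deviation of a target variable. It is only an approximation of the idea in base R: the frame is made up, and the output column names (STRATUM, N, M1, S1, DOM1, DOM2) are assumptions chosen for the illustration, not a guaranteed description of what prepareInputToAllocation_beat.1st returns.

```r
set.seed(42)

# Synthetic sampling frame (hypothetical data)
frame <- data.frame(
  id         = 1:600,
  stratum    = rep(c("s1", "s2", "s3"), times = c(300, 200, 100)),
  region     = rep(c("north", "north", "south"), times = c(300, 200, 100)),
  unemployed = rbinom(600, size = 1, prob = 0.1)
)

# One row per stratum: size, mean and standard deviation of the target variable
strata <- do.call(rbind, lapply(split(frame, frame$stratum), function(d) {
  data.frame(
    STRATUM = d$stratum[1],
    N       = nrow(d),               # with a sample, this would be the sum of the weights
    M1      = mean(d$unemployed),
    S1      = sd(d$unemployed),
    DOM1    = "Total",               # overall domain
    DOM2    = d$region[1]            # domain of interest
  )
}))
rownames(strata) <- NULL
strata
```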
In scenario 1 (no previous round of the survey available), prepares the following input dataframes for the R2BEAT two-stage sample design, starting from the sampling frame: 1. strata 2. deff 3. effst 4. rho 5. PSU_file 6. des_file
prepareInputToAllocation1 ( samp_frame, id_PSU, id_SSU, strata_var, target_vars, deff_var, domain_var, minimum, delta, deff_sugg)
samp_frame |
The dataframe containing sampling units in the reference population. |
id_PSU |
variable identifying the PSUs in the sampling frame. |
id_SSU |
variable identifying the SSUs in the sampling frame. |
strata_var |
stratification variable used in sampling frame. |
target_vars |
target variables. |
deff_var |
stratification variable to be used when calculating deff. |
domain_var |
the variable used to identify the domain of interest. |
minimum |
minimum number of SSU to be selected from a PSU. |
delta |
average number of analysis units per sampling unit. |
deff_sugg |
suggested value of the deff. |
A list containing: (1) strata, (2) deff, (3) effst, (4) rho, (5) PSU_file, (6) des_file
Giulio Barcaroli
## Not run: 
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
# parameters
samp_frame <- pop
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed")
deff_var <- "stratum"
domain_var <- "region"
minimum <- 50  # minimum number of SSUs to be interviewed in each selected PSU
# average dimension of the SSU in terms of elementary survey units
# delta = nrow(pop) / length(unique(pop$id_hh))
delta <- 1
deff_sugg <- 1.5
# prepare inputs
inp <- prepareInputToAllocation1(samp_frame, id_PSU, id_SSU, strata_var, target_vars,
                                 deff_var, domain_var, minimum, delta, deff_sugg)
## End(Not run)
In scenario 2 (at least one previous round of the survey available), prepares the following input dataframes for the R2BEAT two-stage sample design, starting from the sampling frame: 1. strata 2. deff 3. effst 4. rho 5. PSU_file 6. des_file
prepareInputToAllocation2 ( samp_frame, RGdes, RGcal, id_PSU, id_SSU, strata_var, target_vars, deff_var, domain_var, delta, minimum)
samp_frame |
The dataframe containing sampling units in the reference population. |
RGdes |
The 'design' ReGenesees object. |
RGcal |
The 'calibration' ReGenesees object. |
id_PSU |
variable identifying the PSUs in the sampling frame. |
id_SSU |
variable identifying the SSUs in the sampling frame. |
strata_var |
stratification variable used in sampling frame. |
target_vars |
target variables. |
deff_var |
stratification variable to be used when calculating deff. |
domain_var |
the variable used to identify the domain of interest. |
delta |
average number of analysis units per sampling unit. |
minimum |
minimum number of SSU to be selected from a PSU. |
A list containing: (1) strata, (2) deff, (3) effst, (4) rho, (5) PSU_file, (6) des_file
Giulio Barcaroli
## Not run: 
library(readr)
samp <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/sample.rds?raw=true")
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
str(samp)
## Sample design description
library(ReGenesees)
samp$stratum_2 <- as.factor(samp$stratum_2)
sample.des <- e.svydesign(samp,
                          ids = ~ municipality + id_hh,
                          strata = ~ stratum_2,
                          weights = ~ weight,
                          self.rep.str = ~ SR,
                          check.data = TRUE)
## Find and collapse lonely strata
ls <- find.lon.strata(sample.des)
if (!is.null(ls)) sample.des <- collapse.strata(sample.des)
## Calibration with known totals
totals <- pop.template(sample.des,
                       calmodel = ~ sex : cl_age,
                       partition = ~ region)
totals <- fill.template(pop, totals, mem.frac = 10)
sample.cal <- e.calibrate(sample.des,
                          totals,
                          calmodel = ~ sex : cl_age,
                          partition = ~ region,
                          calfun = "logit",
                          bounds = c(0.3, 2.6),
                          aggregate.stage = 2,
                          force = FALSE)
samp_frame <- pop
RGdes <- sample.des
RGcal <- sample.cal
strata_var <- c("stratum")
target_vars <- c("income_hh", "active", "inactive", "unemployed")
weight_var <- "weight"
deff_var <- "stratum"
id_PSU <- c("municipality")
id_SSU <- c("id_hh")
domain_var <- c("region")
delta <- 1
minimum <- 50
inp <- prepareInputToAllocation2(samp_frame, RGdes, RGcal, id_PSU, id_SSU, strata_var,
                                 target_vars, deff_var, domain_var, delta, minimum)
head(inp$strata)
head(inp$deff)
head(inp$effst)
head(inp$rho)
head(inp$psu_file)
head(inp$des_file)
## End(Not run)
Example data frame containing information on Primary Stage Units (PSUs) stratification.
data(beat.example)
The PSU_strat data frame contains one row for each Primary Stage Unit (PSU) with the following variables:
Identifier of the stratum (numeric)
Measure of size of the primary stage unit (numeric)
Identifier of the primary stage unit (numeric)
Note: the names of the variables must be the ones indicated above.
## Not run: 
# Load example data
data(beat.example)
PSU_strat
str(PSU_strat)
## End(Not run)
Example data frame containing intraclass correlation in Self Representative (SR) and Non Self Representative (NSR) strata.
data(beat.example)
The intraclass correlation coefficient (rho) data frame contains one row per stratum with the following variables:
Identifier of the stratum (numeric)
intraclass correlation of the elementary units for each primary stage unit of the self representing area belonging to the stratum for the first variable.
intraclass correlation of the elementary units for each primary stage unit of the self representing area belonging to the stratum for the j-th variable.
intraclass correlation of the elementary units for each primary stage unit of the self representing area belonging to the stratum for the n-th variable.
intraclass correlation of the elementary units for each primary stage unit of the non self representing area belonging to the stratum for the first variable.
intraclass correlation of the elementary units for each primary stage unit of the non self representing area belonging to the stratum for the j-th variable.
intraclass correlation of the elementary units for each primary stage unit of the non self representing area belonging to the stratum for the n-th variable.
Note: the names of the variables must be the ones indicated above.
The intraclass correlation (rho) provides a measure of cluster heterogeneity and has a direct impact on the design effect (see design).
It can be indirectly computed from the design effect and the average minimum number of interviews in the Primary Stage Units (PSUs).
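One common way to make this computation explicit is Kish's approximation, where $b$ is the average number of interviews per PSU. This is stated here as an assumption for illustration; the package may use a refined version of the relationship.

```latex
\mathit{deff} = 1 + \rho\,(b - 1)
\quad\Longrightarrow\quad
\rho = \frac{\mathit{deff} - 1}{b - 1}, \qquad b > 1
```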
The ideal situation is when all the clusters into which the population is divided are as internally heterogeneous as possible.
In the limit, if each cluster were a reduced copy of the population, it would be sufficient to select just one cluster to obtain the same information as a complete survey.
Hence, the more similar the units within a cluster are, the larger the sample size must be (Cochran, 1977, Chapter 8).
By definition, in SR strata rho is equal to 1, because there is just a single PSU in SR strata.
In NSR strata, it is usually higher than 1, because two stages of selection are needed.
Cochran, W. (1977) Sampling Techniques. John Wiley & Sons, Inc., New York.
## Not run: 
# Load example data
data(beat.example)
rho
str(rho)
## End(Not run)
Select sample of primary stage units (PSU) on the basis of the PSU allocated in the allocation step.
select_PSU(alloc, type="ALLOC", pps=TRUE, plot=TRUE)
alloc |
Output of the allocation step (beat.2st) |
type |
Type of SSU allocation ("ALLOC" = optimal, "PROP" = proportional to population size, "EQUAL" = equal size in each stratum) |
pps |
Type of PSU selection in strata (pps = TRUE –> proportional to size, pps= FALSE–> simple random sampling) |
plot |
If TRUE, a plot of the distribution of PSUs and SSUs is produced |
A list including: universe_PSU, sample_PSU
Alessio Guandalini
## Not run: 
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.02, 0.03),
                         CV2 = c(0.03, 0.06),
                         CV3 = c(0.03, 0.06),
                         CV4 = c(0.05, 0.08)))
cv
samp_frame <- pop
samp_frame$one <- 1
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed")
deff_var <- "stratum"
domain_var <- "region"
delta <- 1        # households = survey units
minimum <- 50     # minimum number of SSUs to be interviewed in each selected PSU
deff_sugg <- 1.5  # suggestion for the deff value
inp <- prepareInputToAllocation1(samp_frame, id_PSU, id_SSU, strata_var, target_vars,
                                 deff_var, domain_var, minimum, delta, deff_sugg)
inp$des_file$MINIMUM <- 50
alloc <- beat.2st(stratif = inp$strata,
                  errors = cv,
                  des_file = inp$des_file,
                  psu_file = inp$psu_file,
                  rho = inp$rho,
                  deft_start = NULL,
                  effst = inp$effst,
                  minPSUstrat = 2,
                  minnumstrat = 50)
sample_1st <- select_PSU(alloc, type = "ALLOC", pps = TRUE, plot = TRUE)
sample_1st$PSU_stats
## End(Not run)
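The difference between the "PROP" and "EQUAL" allocation types can be illustrated with a few lines of arithmetic. The stratum sizes and the total sample size below are hypothetical, and the snippet only shows the general idea of proportional and equal allocation, not how select_PSU computes them internally.

```r
N_h <- c(A = 5000, B = 3000, C = 2000)   # hypothetical stratum population sizes
n_total <- 600                           # hypothetical total number of SSUs

# "PROP": sample size proportional to the population size of each stratum
alloc_prop <- round(n_total * N_h / sum(N_h))

# "EQUAL": same sample size in every stratum
alloc_equal <- rep(n_total / length(N_h), length(N_h))
names(alloc_equal) <- names(N_h)

rbind(PROP = alloc_prop, EQUAL = alloc_equal)
```

In general, rounding a proportional allocation may not reproduce the total exactly, so a real implementation needs a rule for distributing the remainder.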
Selects a sample of primary stage units (PSUs) on the basis of the number of PSUs determined in the allocation step. This function differs from 'select_PSU' in that PSUs are not organized in sub-strata, but are sampled directly with probability proportional to size in each sampling stratum. It also allows an implicit stratification through a set of ordering variables for each PSU.
select_PSU2(alloc, type="ALLOC", var_ord=NULL, des_file=des_file, psu_file = psu_file)
alloc |
Output of the allocation step (beat.2st) |
type |
Type of SSU allocation ("ALLOC" = optimal, "PROP" = proportional to population size, "EQUAL" = equal size in each stratum) |
var_ord |
dataframe containing, for every PSU, the set of variables used to order the PSUs |
des_file |
dataframe containing information on sampling strata |
psu_file |
dataframe containing the list of Primary Sampling Units |
A list containing: (i) information on the sampled PSUs; (ii) the list of selected PSUs
Giulio Barcaroli
## Not run:
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.02, 0.03),
                         CV2 = c(0.03, 0.06),
                         CV3 = c(0.03, 0.06),
                         CV4 = c(0.05, 0.08)))
cv
samp_frame <- pop
samp_frame$one <- 1
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed")
deff_var <- "stratum"
domain_var <- "region"
delta <- 1        # households = survey units
minimum <- 50     # minimum number of SSUs to be interviewed in each selected PSU
deff_sugg <- 1.5  # suggestion for the deff value
inp <- prepareInputToAllocation1(samp_frame, id_PSU, id_SSU, strata_var,
                                 target_vars, deff_var, domain_var,
                                 minimum, delta, deff_sugg)
inp$des_file$MINIMUM <- 50
alloc <- beat.2st(stratif = inp$strata,
                  errors = cv,
                  des_file = inp$des_file,
                  psu_file = inp$psu_file,
                  rho = inp$rho,
                  deft_start = NULL,
                  effst = inp$effst,
                  minPSUstrat = 2,
                  minnumstrat = 50)
# pass the design and PSU files prepared above rather than relying on globals
sample_1st <- select_PSU2(alloc, type = "ALLOC", var_ord = NULL,
                          des_file = inp$des_file, psu_file = inp$psu_file)
head(sample_1st)
## End(Not run)
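The idea of sampling PSUs with probability proportional to size inside each stratum, after sorting them to obtain an implicit stratification, can be sketched in base R. This is only an illustration of the general technique (systematic PPS selection), not the code used by select_PSU2; the function name pps_systematic and the columns stratum, psu_id, size and ord are hypothetical.

```r
# Systematic PPS selection within strata (illustrative, not R2BEAT's code).
# psu: data frame with hypothetical columns stratum, psu_id, size, ord.
# n_by_stratum: named vector, number of PSUs to draw in each stratum.
pps_systematic <- function(psu, n_by_stratum) {
  picked <- lapply(names(n_by_stratum), function(s) {
    d <- psu[psu$stratum == s, ]
    d <- d[order(d$ord), ]                 # implicit stratification: sort first
    n <- n_by_stratum[[s]]
    step <- sum(d$size) / n                # sampling interval on the size scale
    points <- runif(1, 0, step) + step * (seq_len(n) - 1)
    upper <- cumsum(d$size)                # upper bound of each PSU's segment
    idx <- findInterval(points, upper, left.open = TRUE) + 1
    d$pik <- pmin(1, n * d$size / sum(d$size))  # first-stage inclusion prob.
    # A PSU larger than the interval can be hit twice; it is kept once here.
    d[unique(idx), ]
  })
  do.call(rbind, picked)
}

set.seed(1)
psu <- data.frame(stratum = rep(c("A", "B"), each = 10),
                  psu_id  = 1:20,
                  size    = rep(c(10, 20), 10),
                  ord     = 20:1)
sel <- pps_systematic(psu, c(A = 3, B = 2))
sel
```

Sorting by the ordering variable before taking the systematic draw spreads the selected PSUs along that variable, which is what implicit stratification is meant to achieve.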
Selects a sample of secondary stage units (SSUs) from the population frame on the basis of the number of SSUs allocated to each selected PSU
select_SSU(df,PSU_code,SSU_code,PSU_sampled)
df |
Dataframe containing sampling units (SSUs) |
PSU_code |
Identifier of each PSU in dataframe containing sampling units |
SSU_code |
Identifier of each SSU in dataframe containing sampling units |
PSU_sampled |
Dataframe containing selected PSUs |
A dataframe containing the units selected in the sample
Giulio Barcaroli
## Not run:
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.02, 0.03),
                         CV2 = c(0.03, 0.06),
                         CV3 = c(0.03, 0.06),
                         CV4 = c(0.05, 0.08)))
cv
samp_frame <- pop
samp_frame$one <- 1
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed")
deff_var <- "stratum"
domain_var <- "region"
delta <- 1        # households = survey units
minimum <- 50     # minimum number of SSUs to be interviewed in each selected PSU
deff_sugg <- 1.5  # suggestion for the deff value
inp <- prepareInputToAllocation1(samp_frame, id_PSU, id_SSU, strata_var,
                                 target_vars, deff_var, domain_var,
                                 minimum, delta, deff_sugg)
inp$des_file$MINIMUM <- 50
alloc <- beat.2st(stratif = inp$strata,
                  errors = cv,
                  des_file = inp$des_file,
                  psu_file = inp$psu_file,
                  rho = inp$rho,
                  deft_start = NULL,
                  effst = inp$effst,
                  minPSUstrat = 2,
                  minnumstrat = 50)
sample_1st <- select_PSU(alloc, type = "ALLOC", pps = TRUE, plot = TRUE)
samp <- select_SSU(df = pop,
                   PSU_code = "municipality",
                   SSU_code = "id_ind",
                   PSU_sampled = sample_1st$sample_PSU)
## End(Not run)
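As a rough picture of what second-stage selection involves, the sketch below draws a simple random sample of SSUs inside each sampled PSU. It is a minimal illustration under stated assumptions, not the implementation of select_SSU: the function srs_within_psu, the columns psu and n_ssu of the sampled-PSU table, and the toy frame are all hypothetical.

```r
# Simple random sampling of SSUs within each sampled PSU (illustrative only).
# frame: one row per SSU; psu_sampled: hypothetical columns psu and n_ssu.
srs_within_psu <- function(frame, psu_col, ssu_col, psu_sampled) {
  stopifnot(!anyDuplicated(frame[[ssu_col]]))       # SSU identifiers are unique
  parts <- lapply(seq_len(nrow(psu_sampled)), function(i) {
    units <- frame[frame[[psu_col]] == psu_sampled$psu[i], ]
    if (nrow(units) == 0) return(NULL)              # PSU absent from the frame
    n <- min(psu_sampled$n_ssu[i], nrow(units))     # cannot exceed the PSU size
    take <- units[sample.int(nrow(units), n), ]
    take$weight_2nd <- nrow(units) / n              # second-stage weight only
    take
  })
  do.call(rbind, parts)
}

set.seed(42)
frame <- data.frame(municipality = rep(1:4, each = 25), id_ind = 1:100)
psu_sampled <- data.frame(psu = c(2, 4), n_ssu = c(5, 10))
samp_toy <- srs_within_psu(frame, "municipality", "id_ind", psu_sampled)
table(samp_toy$municipality)
```

The weight computed here covers the second stage only; a full design weight would also divide by the first-stage inclusion probability of the PSU.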
This function gives a more readable presentation of the sensitivity information, i.e. one labelled with the names of the variables and of the domains
sens_names(s,target_vars,strata)
s |
The sensitivity output of the allocation step. |
target_vars |
target variables. |
strata |
strata dataframe used in the allocation step. |
A dataframe containing Sensitivity information
Giulio Barcaroli
This function analyses how the first stage size (number of PSUs) and the second stage size (number of SSUs) change when the minimum number of SSUs per PSU varies. The minimum and maximum values of this parameter have to be given. On the basis of these minimum and maximum values, 10 different values are used to carry out the allocation. The output is graphical. To be used only in the scenario where no previous rounds of the survey are available and a frame complete with the values of the target variables is available.
sensitivity_min_SSU (samp_frame, errors, id_PSU, id_SSU, strata_var, target_vars, deff_var, domain_var, delta, deff_sugg, min, max, plot)
samp_frame |
The dataframe containing sampling units in the reference population. |
errors |
Precision constraints. |
id_PSU |
variable identifying the PSUs in the sampling frame. |
id_SSU |
variable identifying the SSUs in the sampling frame. |
strata_var |
stratification variable used in sampling frame. |
target_vars |
target variables. |
deff_var |
stratification variable to be used when calculating deff. |
domain_var |
the variable used to identify the domain of interest. |
delta |
average number of analysis units per sampling unit. |
deff_sugg |
suggestion for deff value. |
min |
minimum value of the parameter. |
max |
maximum value of the parameter. |
plot |
plot (TRUE/FALSE) the final result. |
A list containing (i) the vector of allocated PSUs in the iterations and (ii) the vector of allocated SSUs in the iterations
Giulio Barcaroli
## Not run:
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.03, 0.04),
                         CV2 = c(0.06, 0.08),
                         CV3 = c(0.06, 0.08),
                         CV4 = c(0.06, 0.08)))
cv
# parameters
samp_frame <- pop
errors <- cv
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh", "active", "inactive", "unemployed") # more than one
deff_var <- "stratum"
domain_var <- "region"
# average dimension of the SSU in terms of elementary survey units
# delta = nrow(pop) / length(unique(pop$id_hh))
delta <- 1        # average dimension of the SSU in terms of elementary survey units
deff_sugg <- 1.5  # deff (suggested)
sensitivity_min_SSU(samp_frame,
                    errors,
                    id_PSU,
                    id_SSU,
                    strata_var,
                    target_vars,
                    deff_var,
                    domain_var,
                    delta,
                    deff_sugg,
                    min = 1,
                    max = 2)
## End(Not run)
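The trade-off explored by this kind of sensitivity analysis can be pictured with a toy calculation. The sketch below is not what sensitivity_min_SSU runs: toy_allocation is a hypothetical stand-in for the allocation step, it assumes the textbook design effect deff = 1 + rho * (m - 1) for m SSUs per PSU, and the 10 evenly spaced values between the minimum and maximum are an assumption about how the grid might be built.

```r
# Toy stand-in for the allocation step: NOT beat.2st.
# n_srs: sample size that would suffice under simple random sampling.
# rho:   assumed intraclass correlation within PSUs.
toy_allocation <- function(m, n_srs = 2000, rho = 0.05) {
  deff  <- 1 + rho * (m - 1)       # clustering inflates the variance
  n_ssu <- ceiling(n_srs * deff)   # SSUs needed to keep the same precision
  n_psu <- ceiling(n_ssu / m)      # PSUs needed with m SSUs in each
  c(PSU = n_psu, SSU = n_ssu)
}

# 10 values between a minimum and a maximum (evenly spaced by assumption)
grid <- round(seq(10, 100, length.out = 10))
res <- t(sapply(grid, toy_allocation))
sens_toy <- data.frame(min_SSU = grid, res)
sens_toy
```

Under these assumptions, raising the number of SSUs per PSU lowers the number of PSUs to visit but raises the total number of SSUs, which is the pattern the graphical output is meant to help judge.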
This function is similar to sensitivity_min_SSU: it analyses how the first stage size (number of PSUs) and the second stage size (number of SSUs) change when the minimum number of SSUs per PSU varies. The minimum and maximum values of this parameter have to be given. Unlike sensitivity_min_SSU, it takes as input the outputs already prepared by prepareInputToAllocation1 or prepareInputToAllocation2.
sensitivity_min_SSU2 (strata, des_file, psu_file, rho, effst, errors, min, max, plot)
strata |
Data frame of survey strata, for more details see, e.g., strata. |
des_file |
Data frame containing information on sampling design variables, for more details see, e.g., design. |
psu_file |
Data frame containing information on primary stage units stratification, for more details see, e.g., PSU_strat. |
rho |
Data frame of survey strata, for taking into account the intraclass correlation coefficients, for more details see, e.g., rho. |
effst |
Data frame of survey strata, for taking into account the estimator effect on each variable, for more details see, e.g., effst. |
errors |
Data frame of coefficients of variation for each domain, for more details see, e.g., errors. |
min |
starting value for the minimum number of SSUs per PSU. |
max |
ending value for the minimum number of SSUs per PSU. |
plot |
plot (TRUE/FALSE) the final result. |
A list containing (i) the vector of allocated PSUs in the iterations and (ii) the vector of allocated SSUs in the iterations
Giulio Barcaroli
## Not run:
library(readr)
samp <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/sample.rds?raw=true")
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
str(samp)
## Sample design description
library(ReGenesees)
samp$stratum_2 <- as.factor(samp$stratum_2)
sample.des <- e.svydesign(samp,
                          ids = ~ municipality + id_hh,
                          strata = ~ stratum_2,
                          weights = ~ weight,
                          self.rep.str = ~ SR,
                          check.data = TRUE)
## Find and collapse lonely strata
ls <- find.lon.strata(sample.des)
if (!is.null(ls)) sample.des <- collapse.strata(sample.des)
## Calibration with known totals
totals <- pop.template(sample.des,
                       calmodel = ~ sex : cl_age,
                       partition = ~ region)
totals <- fill.template(pop, totals, mem.frac = 10)
sample.cal <- e.calibrate(sample.des,
                          totals,
                          calmodel = ~ sex : cl_age,
                          partition = ~ region,
                          calfun = "logit",
                          bounds = c(0.3, 2.6),
                          aggregate.stage = 2,
                          force = FALSE)
samp_frame <- pop
RGdes <- sample.des
RGcal <- sample.cal
strata_var <- c("stratum")
target_vars <- c("income_hh", "active", "inactive", "unemployed")
weight_var <- "weight"
deff_var <- "stratum"
id_PSU <- c("municipality")
id_SSU <- c("id_hh")
domain_var <- c("region")
delta <- 1
minimum <- 50
inp <- prepareInputToAllocation2(samp_frame,
                                 RGdes,
                                 RGcal,
                                 id_PSU,
                                 id_SSU,
                                 strata_var,
                                 target_vars,
                                 deff_var,
                                 domain_var,
                                 delta,
                                 minimum)
cv <- as.data.frame(list(DOM = c("DOM1", "DOM2"),
                         CV1 = c(0.02, 0.03),
                         CV2 = c(0.03, 0.06),
                         CV3 = c(0.03, 0.06),
                         CV4 = c(0.05, 0.08)))
sens <- sensitivity_min_SSU2(strata = inp$strata,
                             des_file = inp$des_file,
                             psu_file = inp$psu_file,
                             rho = inp$rho,
                             effst = inp$effst,
                             errors = cv,
                             min = 15,
                             max = 25,
                             plot = TRUE)
## End(Not run)
Example data frame containing information on strata characteristics.
data(beat.example)
The strata data frame contains one row per stratum, with the following variables:
Identifier of the stratum (numeric).
Stratum population size (numeric).
Mean in the stratum of the first variable (numeric).
Mean in the stratum of the j-th variable (numeric).
Mean in the stratum of the last variable (numeric).
Standard deviation in the stratum of the first variable (numeric).
Standard deviation in the stratum of the j-th variable (numeric).
Standard deviation in the stratum of the last variable (numeric).
Flag (1 indicates a take-all stratum, 0 a sampling stratum; usually 0) (numeric).
Cost per interview in each stratum, usually 0 (numeric).
Domain value to which the stratum belongs for the first type of domain (factor or numeric).
Domain value to which the stratum belongs for the a-th type of domain (factor or numeric).
Domain value to which the stratum belongs for the k-th type of domain (factor or numeric).
Note: the names of the variables must be the ones indicated above.
## Not run:
# Load example data
data(beat.example)
strata
str(strata)
## End(Not run)
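When a frame with unit-level values is at hand, a table with this layout can be put together by summarising each stratum. The sketch below is illustrative only: the toy frame is simulated, and the column names STRATUM, N, M1, S1, CENS, COST and DOM1 are assumptions made for the example, so the names in the shipped dataset should be checked with str(strata) before relying on them.

```r
# Toy sampling frame; every name and value here is an illustrative assumption.
set.seed(7)
frame <- data.frame(
  stratum = rep(1:3, times = c(200, 300, 500)),
  region  = rep(c("north", "north", "south"), times = c(200, 300, 500)),
  income  = rlnorm(1000, meanlog = 10, sdlog = 0.5)
)

# One row per stratum: size, mean and standard deviation of one variable,
# the take-all flag, the unit cost, and the domain the stratum belongs to.
by_stratum <- split(frame, frame$stratum)
strata_toy <- data.frame(
  STRATUM = as.numeric(names(by_stratum)),
  N       = sapply(by_stratum, nrow),
  M1      = sapply(by_stratum, function(d) mean(d$income)),
  S1      = sapply(by_stratum, function(d) sd(d$income)),  # n - 1 denominator
  CENS    = 0,                                             # no take-all strata
  COST    = 0,
  DOM1    = sapply(by_stratum, function(d) d$region[1]),
  row.names = NULL
)
str(strata_toy)
```

With several target variables, the mean and standard deviation columns would be repeated once per variable, and one domain column would be added per type of domain.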