Title: Data Simulation Based on Latent Factors
Description: Generates data based on latent factor models. Data can be continuous, polytomous, dichotomous, or mixed. Skews, cross-loadings, wording effects, population errors, and local dependencies can be added. All parameters can be manipulated. Data categorization is based on Garrido, Abad, and Ponsoda (2011) <doi:10.1177/0013164410389489>.
Authors: Alexander Christensen [aut, cre], Luis Eduardo Garrido [aut], Maria Dolores Nieto Canaveras [aut], Hudson Golino [aut], Marcos Jimenez [aut], Francisco Abad [ctb], Eduardo Garcia-Garzon [ctb], Vithor Franco [aut]
Maintainer: Alexander Christensen <[email protected]>
License: GPL (>= 3.0)
Version: 0.0.7
Built: 2024-10-28 05:39:31 UTC
Source: https://github.com/alexchristensen/latentfactor
Generates data based on latent factor models. Data can be continuous, polytomous, dichotomous, or mixed. Skew, cross-loadings, and population error can be added. All parameters can be manipulated. Data categorization is based on Garrido, Abad, and Ponsoda (2011).
Alexander P. Christensen <[email protected]>, Maria Dolores Nieto Canaveras <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Christensen, A. P., Garrido, L. E., & Golino, H. (2022). Unique variable analysis: A network psychometrics method to detect local dependence. PsyArXiv.
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2011). Performance of Velicer's minimum average partial factor retention method with categorical variables. Educational and Psychological Measurement, 71(3), 551-570.
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., ... & Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292-320.
Add Substantial Cross-Loadings to simulate_factors Data

Intended to add substantial cross-loadings to simulated data from simulate_factors. See examples to get started.
add_cross_loadings(
  lf_object,
  proportion_cross_loadings,
  proportion_cross_loadings_range = NULL,
  magnitude_cross_loadings,
  magnitude_cross_loadings_range = NULL,
  leave_cross_loadings = FALSE
)
lf_object: Data object from simulate_factors.

proportion_cross_loadings: Numeric (length = 1 or factors). Proportion of variables that should be cross-loaded randomly onto one other factor. Accepts number of variables to cross-load onto one other factor as well.

proportion_cross_loadings_range: Numeric (length = 2). Range of proportion of variables that should be cross-loaded randomly onto one other factor. Accepts number of variables to cross-load onto one other factor as well. Defaults to NULL.

magnitude_cross_loadings: Numeric. The magnitude or size of the cross-loadings.

magnitude_cross_loadings_range: Numeric (length = 2). The range of the magnitude or size of the cross-loadings. Defaults to NULL.

leave_cross_loadings: Boolean. Should cross-loadings be kept? Defaults to FALSE.
Returns a list containing the same parameters as the original lf_object but with updated data, population_correlation, and parameters (specifically, the loadings matrix). Also returns the original lf_object in original_results.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Christensen, A. P., Garrido, L. E., & Golino, H. (2022). Unique variable analysis: A network psychometrics method to detect local dependence. PsyArXiv
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Add substantial cross-loadings
two_factor_CL <- add_cross_loadings(
  lf_object = two_factor,
  proportion_cross_loadings = 0.25,
  magnitude_cross_loadings = 0.35
)

# Randomly vary proportions
two_factor_CL <- add_cross_loadings(
  lf_object = two_factor,
  proportion_cross_loadings_range = c(0, 0.25),
  magnitude_cross_loadings = 0.35
)

# Randomly vary magnitudes
two_factor_CL <- add_cross_loadings(
  lf_object = two_factor,
  proportion_cross_loadings = 0.25,
  magnitude_cross_loadings_range = c(0.35, 0.45)
)

# Set number of cross-loadings per factor (rather than proportion)
two_factor_CL <- add_cross_loadings(
  lf_object = two_factor,
  proportion_cross_loadings = 2,
  magnitude_cross_loadings = 0.35
)
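The added cross-loadings can be verified directly. A minimal sketch, assuming only the structure described in the Value section above (the loadings matrix under parameters$loadings and the pre-modification object under original_results):

# Compare loadings after and before adding cross-loadings
round(two_factor_CL$parameters$loadings, 2)
round(two_factor_CL$original_results$parameters$loadings, 2)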
Add Local Dependence to simulate_factors Data

Adds local dependence to simulated data from simulate_factors. See examples to get started.
add_local_dependence(
  lf_object,
  method = c("correlate_residuals", "minor_factors", "threshold_shifts"),
  proportion_LD,
  proportion_LD_range = NULL,
  add_residuals = NULL,
  add_residuals_range = NULL,
  allow_multiple = FALSE
)
lf_object: Data object from simulate_factors.

method: Character (length = 1). Method to generate local dependence between variables. Options: "correlate_residuals", "minor_factors", and "threshold_shifts".

proportion_LD: Numeric (length = 1 or factors). Proportion of variables that should be locally dependent. Accepts number of locally dependent variables as well.

proportion_LD_range: Numeric (length = 2). Range of proportion of variables that are randomly selected from a random uniform distribution. Accepts number of locally dependent values as well. Defaults to NULL.

add_residuals: Numeric. Residuals to add to the correlation matrix for locally dependent variables. Defaults to NULL.

add_residuals_range: Numeric (length = 2). Range of the residuals to add to the correlation matrix; values are randomly selected from a random uniform distribution. Defaults to NULL.

allow_multiple: Boolean. Whether a variable should be allowed to be locally dependent with more than one other variable. Defaults to FALSE.
Returns a list containing:

data: Simulated data from the specified factor model.

population_correlation: Population correlation matrix with local dependence added.

original_correlation: Original population correlation matrix before local dependence was added.

correlated_residuals: A data frame with the first two columns specifying the variables that are locally dependent and the third column specifying the magnitude of the added residual for each locally dependent pair.

original_results: Original lf_object.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Christensen, A. P., Garrido, L. E., & Golino, H. (2023). Unique variable analysis: A network psychometrics method to detect local dependence. Multivariate Behavioral Research, 1–18.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Add local dependence
two_factor_LD <- add_local_dependence(
  lf_object = two_factor,
  proportion_LD = 0.25,
  add_residuals = 0.20,
  allow_multiple = FALSE
)

# Randomly vary proportions
two_factor_LD <- add_local_dependence(
  lf_object = two_factor,
  proportion_LD_range = c(0.10, 0.50),
  add_residuals = 0.20,
  allow_multiple = FALSE
)

# Randomly vary residuals
two_factor_LD <- add_local_dependence(
  lf_object = two_factor,
  proportion_LD = 0.25,
  add_residuals_range = c(0.20, 0.40),
  allow_multiple = FALSE
)

# Randomly vary proportions, residuals, and allow multiple
two_factor_LD <- add_local_dependence(
  lf_object = two_factor,
  proportion_LD_range = c(0.10, 0.50),
  add_residuals_range = c(0.20, 0.40),
  allow_multiple = TRUE
)
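The added local dependence can be checked against the returned objects. A minimal sketch, assuming only the correlated_residuals and correlation matrix outputs described in the Value section above:

# Pairs of locally dependent variables and their added residuals
two_factor_LD$correlated_residuals

# Largest change between the original and updated population correlations
max(abs(two_factor_LD$population_correlation - two_factor_LD$original_correlation))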
Add Methods Factors to simulate_factors Data

Adds methods factors to simulated data from simulate_factors. See examples to get started.
add_method_factors(
  lf_object,
  proportion_negative = 0.5,
  proportion_negative_range = NULL,
  methods_factors,
  methods_loadings,
  methods_loadings_range = 0,
  methods_correlations,
  methods_correlations_range = NULL
)
lf_object: Data object from simulate_factors.

proportion_negative: Numeric (length = 1 or factors). Proportion of variables that should be negatively worded. Defaults to 0.5.

proportion_negative_range: Numeric (length = 2). Range of proportion of variables that are randomly selected from a uniform distribution. Accepts number of variables as well. Defaults to NULL.

methods_factors: Numeric.

methods_loadings: Numeric.

methods_loadings_range: Numeric. Defaults to 0.

methods_correlations: Numeric.

methods_correlations_range: Numeric. Defaults to NULL.
Returns a list containing:

data: Biased data simulated from the specified factor model.

unbiased_data: The corresponding unbiased data prior to replacing values to generate the (biased) data.

parameters: Bias-adjusted parameters of the lf_object.

original_results: Original lf_object.
Alexander P. Christensen <[email protected]>, Luis Eduardo Garrido <[email protected]>
Garcia-Pardina, A., Abad, F. J., Christensen, A. P., Golino, H., & Garrido, L. E. (2024). Dimensionality assessment in the presence of wording effects: A network psychometric and factorial approach. Behavior Research Methods.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000, # number of cases = 1000
  variable_categories = 5 # 5-point Likert scale
)

# Add methods factors
two_factor_methods_effect <- add_method_factors(
  lf_object = two_factor,
  proportion_negative = 0.50,
  methods_loadings = 0.20,
  methods_loadings_range = 0.10
)
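A quick check of how much the method factors altered the responses. A minimal sketch, assuming only the data and unbiased_data outputs described in the Value section above:

# Proportion of responses changed by the method factors
mean(two_factor_methods_effect$data != two_factor_methods_effect$unbiased_data)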
Add Population Error to simulate_factors Data

Adds population error to simulated data from simulate_factors. See examples to get started.
add_population_error(
  lf_object,
  cfa_method = c("minres", "ml"),
  fit = c("cfi", "rmsea", "rmsr", "raw"),
  misfit = c("close", "acceptable"),
  error_method = c("cudeck", "yuan"),
  tolerance = 0.01,
  convergence_iterations = 10,
  leave_cross_loadings = FALSE
)
lf_object: Data object from simulate_factors.

cfa_method: Character (length = 1). Method to generate population error. Defaults to "minres". Options: "minres" and "ml".

fit: Character (length = 1). Fit index to control population error. Defaults to "cfi". Options: "cfi", "rmsea", "rmsr", and "raw".

misfit: Character or numeric (length = 1). Magnitude of error to add. Defaults to "close". Options: "close" and "acceptable". While numbers can be used to specify misfit, they are not recommended: the level of misfit will vary depending on the factor structure.

error_method: Character (length = 1). Method to control population error. Defaults to "cudeck". Options: "cudeck" and "yuan".

tolerance: Numeric (length = 1). Tolerance of the SRMR difference between the population error correlation matrix and the original population correlation matrix. Ensures that appropriate population error was added. Similarly, verifies that the MAE of the loadings is not greater than the specified amount, ensuring proper convergence. Defaults to 0.01.

convergence_iterations: Numeric (length = 1). Number of iterations to reach parameter convergence within the specified tolerance. Defaults to 10.

leave_cross_loadings: Boolean. Should cross-loadings be kept? Defaults to FALSE.
Returns a list containing:

data: Simulated data from the specified factor model.

population_correlation: Population correlation matrix with population error added.

population_error: A list containing the parameters used to generate population error.

original_results: Original lf_object.
Authors of bifactor: Marcos Jimenez, Francisco J. Abad, Eduardo Garcia-Garzon, Vithor R. Franco, Luis Eduardo Garrido <[email protected]>

Authors of latentFactoR: Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>, Marcos Jimenez, Francisco J. Abad, Eduardo Garcia-Garzon, Vithor R. Franco
Cudeck, R., & Browne, M.W. (1992). Constructing a covariance matrix that yields a specified minimizer and a specified minimum discrepancy function value. Psychometrika, 57, 357–369.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Add small population error using the Cudeck method
two_factor_Cudeck <- add_population_error(
  lf_object = two_factor,
  cfa_method = "minres",
  fit = "rmsr",
  misfit = "close",
  error_method = "cudeck"
)

# Add small population error using the Yuan method
two_factor_Yuan <- add_population_error(
  lf_object = two_factor,
  cfa_method = "minres",
  fit = "rmsr",
  misfit = "close",
  error_method = "yuan"
)
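The amount of population error actually introduced can be checked against the original population correlation matrix. A minimal sketch, assuming only the outputs described in the Value section above:

# Largest absolute difference between the error-perturbed and original
# population correlation matrices
max(abs(
  two_factor_Cudeck$population_correlation -
    two_factor_Cudeck$original_results$population_correlation
))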
Add Wording Effects to simulate_factors Data

Adds wording effects to simulated data from simulate_factors. See examples to get started.
add_wording_effects(
  lf_object,
  method = c("acquiescence", "difficulty", "random_careless", "straight_line", "mixed"),
  proportion_negative = 0.5,
  proportion_negative_range = NULL,
  proportion_biased_cases = 0.1,
  proportion_biased_variables = 1,
  proportion_biased_variables_range = NULL,
  proportion_biased_person = 1,
  proportion_biased_person_range = NULL
)
lf_object: Data object from simulate_factors.

method: Character. Method(s) to generate wording effects to add to the data. Options: "acquiescence", "difficulty", "random_careless", "straight_line", and "mixed". Multiple methods can be supplied (see examples).

proportion_negative: Numeric (length = 1 or factors). Proportion of variables that should be negatively worded. Defaults to 0.5.

proportion_negative_range: Numeric (length = 2). Range of proportion of variables that are randomly selected from a uniform distribution. Accepts number of variables as well. Defaults to NULL.

proportion_biased_cases: Numeric (length = 1). Proportion of cases that should be biased with wording effects. Also accepts number of cases to be biased. The first n cases, up to the corresponding proportion, will be biased. Defaults to 0.1.

proportion_biased_variables: Numeric (length = 1 or factors). Proportion of variables that should be biased with wording effects. Defaults to 1.

proportion_biased_variables_range: Numeric (length = 2). Range of proportion of variables that should be biased with wording effects. Values are drawn randomly from a uniform distribution. Defaults to NULL.

proportion_biased_person: Numeric. Person-specific proportion of bias applied to each biased case. Defaults to 1.

proportion_biased_person_range: Numeric (length = 2). Range to randomly draw bias from a uniform distribution. Allows random person-specific bias to be obtained. Defaults to NULL.
Returns a list containing:

data: Biased data simulated from the specified factor model.

unbiased_data: The corresponding unbiased data prior to replacing values to generate the (biased) data.

biased_sample_size: The number of cases that have biased data.

adjusted_results: Bias-adjusted lf_object.

original_results: Original lf_object.
Alexander P. Christensen <[email protected]>, Luis Eduardo Garrido <[email protected]>
Garcia-Pardina, A., Abad, F. J., Christensen, A. P., Golino, H., & Garrido, L. E. (2022). Dimensionality assessment in the presence of wording effects: A network psychometric and factorial approach. PsyArXiv.
Garrido, L. E., Golino, H., Christensen, A. P., Martinez-Molina, A., Arias, V. B., Guerra-Pena, K., ... & Abad, F. J. (2022). A systematic evaluation of wording effects modeling under the exploratory structural equation modeling framework. PsyArXiv.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000, # number of cases = 1000
  variable_categories = 5 # 5-point Likert scale
)

# Add wording effects using the acquiescence method
two_factor_acquiescence <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = "acquiescence"
)

# Add wording effects using the difficulty method
two_factor_difficulty <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = "difficulty"
)

# Add wording effects using the random careless method
two_factor_random_careless <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = "random_careless"
)

# Add wording effects using the straight line method
two_factor_straight_line <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = "straight_line"
)

# Add wording effects using the mixed method
two_factor_mixed <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = "mixed"
)

# Add wording effects using the acquiescence and straight line methods
two_factor_multiple <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = c("acquiescence", "straight_line")
)
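Because the first n cases are the ones biased, biased and unbiased responses can be compared case by case. A minimal sketch, assuming only the outputs described in the Value section above:

# Number of biased cases (should be 0.10 * 1000 = 100)
two_factor_acquiescence$biased_sample_size

# Biased versus unbiased responses for the first (biased) case
rbind(
  biased = two_factor_acquiescence$data[1, ],
  unbiased = two_factor_acquiescence$unbiased_data[1, ]
)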
Categorizes continuous data based on Garrido, Abad, and Ponsoda (2011; see references). Categorical data with 2 to 6 categories can include skews between -2 and 2 in increments of 0.05.
categorize(data, categories, skew_value = 0)
data: Numeric (length = n). A vector of continuous data with n values. For matrices, use apply.

categories: Numeric (length = 1). Number of categories to create. Between 2 and 6 categories can be used with skew.

skew_value: Numeric (length = 1). Value of skew. Ranges between -2 and 2 in increments of 0.05. Skews not in this sequence will be converted to the nearest value in the sequence. Defaults to 0.
Returns a numeric vector of the categorized data.
Maria Dolores Nieto Canaveras <[email protected]>, Luis Eduardo Garrido <[email protected]>, Hudson Golino <[email protected]>, Alexander P. Christensen <[email protected]>
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2011). Performance of Velicer's minimum average partial factor retention method with categorical variables. Educational and Psychological Measurement, 71(3), 551-570.
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., ... & Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292-320.
# Dichotomous data (no skew)
dichotomous <- categorize(
  data = rnorm(1000),
  categories = 2
)

# Dichotomous data (with positive skew)
dichotomous_skew <- categorize(
  data = rnorm(1000),
  categories = 2,
  skew_value = 1.25
)

# 5-point Likert scale (no skew)
five_likert <- categorize(
  data = rnorm(1000),
  categories = 5
)

# 5-point Likert scale (negative skew)
five_likert <- categorize(
  data = rnorm(1000),
  categories = 5,
  skew_value = -0.45
)
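The core idea behind categorization is splitting the continuous values at a set of thresholds. A minimal sketch of the no-skew case, assuming equally probable categories under the standard normal; the package's actual thresholds come from its internal skew tables:

# Threshold-based categorization with equally probable categories
categorize_sketch <- function(data, categories) {
  # Quantile thresholds; the endpoints qnorm(0) and qnorm(1) are -Inf and Inf
  thresholds <- qnorm(seq(0, 1, length.out = categories + 1))
  cut(data, breaks = thresholds, labels = FALSE)
}

# Five roughly equal categories
table(categorize_sketch(rnorm(1000), categories = 5))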
Transform simulate_factors Data to Zipf's Distribution

Zipf's distribution is commonly found for text data. Closely related to the Pareto and power-law distributions, Zipf's distribution produces highly skewed data. This transformation is intended to mirror the data generating process of Zipf's law seen in semantic network and topic modeling data.
data_to_zipfs(lf_object, beta = 2.7, alpha = 1, dichotomous = FALSE)
lf_object: Data object from simulate_factors.

beta: Numeric (length = 1). Sets the shift in rank. Defaults to 2.7.

alpha: Numeric (length = 1). Sets the power of the rank. Defaults to 1.

dichotomous: Boolean (length = 1). Whether data should be dichotomized rather than left as frequencies (e.g., for semantic network analysis). Defaults to FALSE.
The formula used to transform data is (Piantadosi, 2014):

f(r) proportional to 1 / (r + beta)^alpha

where f(r) is the frequency of the rth most frequent value, r is the rank-order of the data, beta is a shift in the rank (following Mandelbrot, 1953, 1962), and alpha is the power of the rank, with greater values suggesting greater differences between the largest frequency and the next, and so forth.
The function will transform continuous data output from simulate_factors. See examples to get started.
Returns a list containing:

data: Simulated data that have been transformed to follow Zipf's distribution.

RMSE: A vector of root mean square errors for the transformed data against data assumed to follow the theoretical Zipf's distribution, and for Spearman's correlation matrix of the transformed data against the original population correlation matrix.

spearman_correlation: Spearman's correlation matrix of the transformed data.

original_correlation: Original population correlation matrix before the data were transformed.

original_results: Original lf_object.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Mandelbrot, B. (1953). An informational theory of the statistical structure of language. Communication Theory, 84, 486–502.
Mandelbrot, B. (1962). On the theory of word frequencies and on related Markovian models of discourse. Structure of Language and its Mathematical Aspects, 190–219.
Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112-1130.
Zipf, G. (1936). The psychobiology of language. London, UK: Routledge.
Zipf, G. (1949). Human behavior and the principle of least effort. New York, NY: Addison-Wesley.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Transform data to Mandelbrot's Zipf's
two_factor_zipfs <- data_to_zipfs(
  lf_object = two_factor,
  beta = 2.7,
  alpha = 1
)

# Transform data to Mandelbrot's Zipf's (dichotomous)
two_factor_zipfs_binary <- data_to_zipfs(
  lf_object = two_factor,
  beta = 2.7,
  alpha = 1,
  dichotomous = TRUE
)
Estimates the number of dimensions in data using Empirical Kaiser Criterion (Braeken & Van Assen, 2017). See examples to get started
EKC(data, sample_size)
data: Matrix or data frame. Either a dataset with all numeric values (rows = cases, columns = variables) or a symmetric correlation matrix.

sample_size: Numeric (length = 1). If the input to data is a correlation matrix, then the sample size must be provided.
Returns a list containing:

dimensions: Number of dimensions identified.

eigenvalues: Eigenvalues.

reference: Reference values compared against the eigenvalues.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Braeken, J., & Van Assen, M. A. (2017). An empirical Kaiser criterion. Psychological Methods, 22(3), 450–466.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Perform Empirical Kaiser Criterion
EKC(two_factor$data)
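For intuition, the reference values can be sketched directly from the eigenvalues. A minimal sketch of the reference-value logic in Braeken and Van Assen (2017), assuming raw data with cases in rows; prefer EKC() for actual use:

ekc_sketch <- function(data) {
  n <- nrow(data)
  p <- ncol(data)
  lambda <- eigen(cor(data), only.values = TRUE)$values
  asymptote <- (1 + sqrt(p / n))^2 # sample-size-adjusted eigenvalue bound
  reference <- numeric(p)
  for (j in seq_len(p)) {
    # Scale by the variance left after the first j - 1 eigenvalues
    remaining <- (p - sum(lambda[seq_len(j - 1)])) / (p - j + 1)
    reference[j] <- max(remaining * asymptote, 1)
  }
  # Retain dimensions until an eigenvalue no longer exceeds its reference
  which(lambda <= reference)[1] - 1
}

ekc_sketch(two_factor$data)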
A general function to estimate an Exploratory Structural Equation Model (ESEM) using the lavaan package. With latentFactoR objects, the function requires fewer inputs.
ESEM(
  data,
  factors,
  variables,
  estimator = c("MLR", "WLSMV"),
  fit_measures = NULL,
  variable_polarity = NULL,
  wording_factor = c("none", "CTCM1", "CTCM1_each", "RI", "RI_each"),
  CTCM1_polarity = c("negative", "positive"),
  ...
)
data: Numeric matrix, data frame, or latentFactoR object.

factors: Numeric (length = 1). Number of ESEM factors to estimate.

variables: Numeric (length = 1 or factors). Number of variables per factor.

estimator: Character. Estimator to be used in lavaan. Options: "MLR" and "WLSMV".

fit_measures: Character. Fit measures to be computed using lavaan's fitMeasures. If scaled values are available (not NA), then scaled fit measures are used. Defaults to NULL.

variable_polarity: Numeric/character (length = 1 or total variables). Whether all (length = 1) or each variable (length = total variables) is positively or negatively keyed. Defaults to NULL.

wording_factor: Character (length = 1). Whether wording factor(s) should be estimated. Defaults to "none". Options: "none", "CTCM1", "CTCM1_each", "RI", and "RI_each".

CTCM1_polarity: Character. Polarity of the CTCM1 wording factor(s). Defaults to "negative".

...: Additional arguments to be passed on to lavaan.
Returns a list containing:

model: Estimated ESEM model.

fit: Fit measures of the estimated ESEM model.
Alexander P. Christensen <[email protected]>, Luis Eduardo Garrido <[email protected]>
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000, # number of cases = 1000
  variable_categories = 5 # 5-point Likert scale
)

## Not run:
# Estimate ESEM model with no wording effects
esem_no_wording_effects <- ESEM(
  data = two_factor,
  estimator = "WLSMV"
)

# Add wording effects using the acquiescence method
two_factor_acquiescence <- add_wording_effects(
  lf_object = two_factor,
  proportion_negative = 0.50,
  proportion_biased_cases = 0.10,
  method = "acquiescence"
)

# Estimate ESEM model with wording effects
esem_wording_effects <- ESEM(
  data = two_factor_acquiescence,
  estimator = "WLSMV",
  wording_factor = "RI"
)
## End(Not run)
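Once estimated, the model and its fit can be pulled from the returned list. A minimal sketch, assuming only the model and fit outputs described in the Value section above:

## Not run:
# Fit measures of the wording-effects model
esem_wording_effects$fit

# The underlying estimated model object
esem_wording_effects$model
## End(Not run)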
Estimates dimensions using Exploratory Graph Analysis (EGA), Empirical Kaiser Criterion (EKC), Factor Forest (factor_forest), Exploratory Factor Analysis with out-of-sample prediction (fspe), Next Eigenvalue Sufficiency Test (NEST), and parallel analysis (fa.parallel).
estimate_dimensions(
  data,
  sample_size,
  EGA_args = list(
    corr = "auto", uni.method = "louvain", model = "glasso",
    consensus.method = "most_common", plot.EGA = FALSE
  ),
  FF_args = list(maximum_factors = 8, PA_correlation = "cor"),
  FSPE_args = list(maxK = 8, rep = 1, method = "PE", pbar = FALSE),
  NEST_args = list(
    iterations = 1000, maximum_iterations = 500,
    alpha = 0.05, convergence = 0.00001
  ),
  PA_args = list(
    fm = "minres", fa = "both", cor = "cor",
    n.iter = 20, sim = FALSE, plot = FALSE
  )
)
data: Matrix or data frame. Either a dataset with all numeric values (rows = cases, columns = variables) or a symmetric correlation matrix.

sample_size: Numeric (length = 1). If the input to data is a correlation matrix, then the sample size must be provided.

EGA_args: List. List of arguments to be passed along to EGA.

FF_args: List. List of arguments to be passed along to factor_forest.

FSPE_args: List. List of arguments to be passed along to fspe.

NEST_args: List. List of arguments to be passed along to NEST.

PA_args: List. List of arguments to be passed along to fa.parallel.
Returns a list containing:

dimensions: Dimensions estimated from each method.

A list of each method's output is also returned (see their respective functions for their outputs).
Maria Dolores Nieto Canaveras <[email protected]>, Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

## Not run:
# Estimate dimensions
estimate_dimensions(two_factor$data)
## End(Not run)
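The per-method estimates live in the dimensions output. A minimal sketch, assuming only the outputs described in the Value section above:

## Not run:
# Compare the number of dimensions suggested by each method
results <- estimate_dimensions(two_factor$data)
results$dimensions
## End(Not run)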
Estimates the number of dimensions in data using the pre-trained Random Forest model from Goretzko and Buhner (2020, 2022). See examples to get started
factor_forest(
  data,
  sample_size,
  maximum_factors = 8,
  PA_correlation = c("cor", "poly", "tet")
)
data: Matrix or data frame. Either a dataset with all numeric values (rows = cases, columns = variables) or a symmetric correlation matrix.

sample_size: Numeric (length = 1). If the input to data is a correlation matrix, then the sample size must be provided.

maximum_factors: Numeric (length = 1). Maximum number of factors to search over. Defaults to 8.

PA_correlation: Character (length = 1). Type of correlation used in parallel analysis. Options: "cor", "poly", and "tet".
Returns a list containing:

dimensions: Number of dimensions identified.

probabilities: Probability that each number of dimensions is most likely.
Authors of Factor Forest: David Goretzko and Markus Buhner

Authors of latentFactoR: Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Goretzko, D., & Buhner, M. (2022). Factor retention using machine learning with ordinal data. Applied Psychological Measurement, 01466216221089345.
Goretzko, D., & Buhner, M. (2020). One model to rule them all? Using machine learning algorithms to determine the number of factors in exploratory factor analysis. Psychological Methods, 25(6), 776-786.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

## Not run:
# Perform Factor Forest
factor_forest(two_factor$data)
## End(Not run)
Estimates the number of dimensions in data using NEST (Achim, 2017). See examples to get started
NEST(
  data,
  sample_size,
  iterations = 1000,
  maximum_iterations = 500,
  alpha = 0.05,
  convergence = 0.00001
)
data: Matrix or data frame. Either a dataset with all numeric values (rows = cases, columns = variables) or a symmetric correlation matrix.

sample_size: Numeric (length = 1). If the input to data is a correlation matrix, then the sample size must be provided.

iterations: Numeric (length = 1). Number of iterations to estimate rank. Defaults to 1000.

maximum_iterations: Numeric (length = 1). Maximum number of iterations to obtain convergence of eigenvalues. Defaults to 500.

alpha: Numeric (length = 1). Significance level for determining sufficient eigenvalues. Defaults to 0.05.

convergence: Numeric (length = 1). Value that the difference between eigenvalues must be less than or equal to when establishing convergence. Defaults to 0.00001.
Returns a list containing:

dimensions: Number of dimensions identified.

loadings: Loading matrix.

converged: Whether estimation converged.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Achim, A. (2017). Testing the number of required dimensions in exploratory factor analysis. The Quantitative Methods for Psychology, 13(1), 64–74.
Brandenburg, N., & Papenberg, M. (2022). Reassessment of innovative methods to determine the number of factors: A simulation-based comparison of Exploratory Graph Analysis and Next Eigenvalue Sufficiency Test. Psychological Methods.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

## Not run:
# Perform NEST
NEST(two_factor$data)
## End(Not run)
Zipf's distribution is commonly found for text data. Closely related to the Pareto and power-law distributions, Zipf's distribution produces highly skewed data. This function obtains the best-fitting parameters to Zipf's distribution.
obtain_zipfs_parameters(data)
data: Numeric vector, matrix, or data frame. Numeric data to determine Zipf's distribution parameters.
The best parameters are optimized by minimizing the absolute difference between the original frequencies and the frequencies obtained by the beta and alpha parameters in the following formula (Piantadosi, 2014):

f(r) proportional to 1 / (r + beta)^alpha

where f(r) is the frequency of the rth most frequent value, r is the rank-order of the data, beta is a shift in the rank (following Mandelbrot, 1953, 1962), and alpha is the power of the rank, with greater values suggesting greater differences between the largest frequency and the next, and so forth.
Returns a vector containing the estimated beta and alpha parameters. Also contains zipfs_sse, which corresponds to the sum of squared error between the frequencies based on the estimated parameter values and the original data frequencies.
Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Mandelbrot, B. (1953). An informational theory of the statistical structure of language. Communication Theory, 84, 486–502.
Mandelbrot, B. (1962). On the theory of word frequencies and on related Markovian models of discourse. Structure of Language and its Mathematical Aspects, 190–219.
Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112-1130.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Transform data to Mandelbrot's Zipf's
two_factor_zipfs <- data_to_zipfs(
  lf_object = two_factor,
  beta = 2.7,
  alpha = 1
)

# Obtain Zipf's distribution parameters
obtain_zipfs_parameters(two_factor_zipfs$data)
Simulates data from a latent factor model based on many manipulable parameters. The core parameters (factors, variables, loadings, cross-loadings, correlations, and sample size) have no default values and must each be set. See examples to get started.
simulate_factors(
  factors,
  variables,
  variables_range = NULL,
  loadings,
  loadings_range = NULL,
  cross_loadings,
  cross_loadings_range = NULL,
  correlations,
  correlations_range = NULL,
  sample_size,
  variable_categories = Inf,
  categorical_limit = 7,
  skew = 0,
  skew_range = NULL
)
factors: Numeric (length = 1). Number of factors.

variables: Numeric (length = 1 or factors). Number of variables per factor.

variables_range: Numeric (length = 2). Range of variables to randomly select from a random uniform distribution. Minimum three variables per factor. Defaults to NULL.

loadings: Numeric or matrix. Loadings for each variable on its dominant factor. General effect sizes range from small (0.40), moderate (0.55), to large (0.70).

loadings_range: Numeric (length = 2). Range of loadings to randomly select from a random uniform distribution. General effect sizes range from small (0.40), moderate (0.55), to large (0.70). Defaults to NULL.

cross_loadings: Numeric or matrix. Standard deviation of a normal distribution with a mean of zero from which cross-loadings are drawn (e.g., cross_loadings = 0.05 draws from N(0, 0.05)).

cross_loadings_range: Numeric (length = 2). Range of cross-loadings to randomly select from a random uniform distribution. Defaults to NULL.

correlations: Numeric. Correlation(s) between factors. General effect sizes range from orthogonal (0.00), small (0.30), moderate (0.50), to large (0.70).

correlations_range: Numeric (length = 2). Range of correlations to randomly select from a random uniform distribution. Defaults to NULL.

sample_size: Numeric (length = 1). Number of cases to generate from a random multivariate normal distribution.

variable_categories: Numeric (length = 1 or total variables). Number of categories for each variable. Defaults to Inf (variables are left continuous).

categorical_limit: Numeric (length = 1). Values greater than the input value are considered continuous. Defaults to 7.

skew: Numeric (length = 1 or number of categorical variables). Skew to be included in categorical variables. It is randomly sampled from the provided values. Can be a single value or as many values as there are (total) variables. The current skew implementation is between -2 and 2 in increments of 0.05. Skews that are not in this sequence will be converted to their nearest value in the sequence. Not recommended to use with skew_range. Defaults to 0.

skew_range: Numeric (length = 2). Randomly selects skews within the range. Somewhat redundant with skew; not recommended to use both. Defaults to NULL.
Returns a list containing:

data: Simulated data from the specified factor model.

population_correlation: Population correlation matrix.

parameters: A list containing the parameters used to generate the data.
Maria Dolores Nieto Canaveras <[email protected]>, Alexander P. Christensen <[email protected]>, Hudson Golino <[email protected]>, Luis Eduardo Garrido <[email protected]>
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2011). Performance of Velicer's minimum average partial factor retention method with categorical variables. Educational and Psychological Measurement, 71(3), 551-570.
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., ... & Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292-320.
# Generate factor data
two_factor <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Randomly vary loadings
two_factor_loadings <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings_range = c(0.30, 0.80), # loadings between = 0.30 to 0.80
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000 # number of cases = 1000
)

# Generate dichotomous data
two_factor_dichotomous <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000, # number of cases = 1000
  variable_categories = 2 # dichotomous data
)

# Generate dichotomous data with skew
two_factor_dichotomous_skew <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000, # number of cases = 1000
  variable_categories = 2, # dichotomous data
  skew = 1 # all variables will have a skew of 1
)

# Generate dichotomous data with variable skew
two_factor_dichotomous_skew <- simulate_factors(
  factors = 2, # factors = 2
  variables = 6, # variables per factor = 6
  loadings = 0.55, # loadings between = 0.45 to 0.65
  cross_loadings = 0.05, # cross-loadings N(0, 0.05)
  correlations = 0.30, # correlation between factors = 0.30
  sample_size = 1000, # number of cases = 1000
  variable_categories = 2, # dichotomous data
  skew_range = c(-2, 2) # skew = -2 to 2 (increments of 0.05)
)
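The returned list can be inspected to confirm the simulated structure. A minimal sketch, assuming only the outputs described in the Value section above; the exact location of the loadings inside parameters is an assumption:

# 1000 cases by 12 variables (2 factors x 6 variables)
dim(two_factor$data)

# Population correlation matrix used to generate the data
round(two_factor$population_correlation, 2)

# Generating parameters (loadings location assumed)
round(two_factor$parameters$loadings, 2)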
Tables for skew based on the number of categories (2, 3, 4, 5, or 6) in the data
data(skew_tables)
A list (length = 5)
data("skew_tables")
data("skew_tables")