| Title: | Smooth L0 Penalty Approximations for Gaussian Graphical Models |
|---|---|
| Description: | Provides smooth approximations to the L0 norm penalty for estimating sparse Gaussian graphical models (GGMs). Network estimation is performed using the Local Linear Approximation (LLA) framework (Fan & Li, 2001 <doi:10.1198/016214501753382273>; Zou & Li, 2008 <doi:10.1214/009053607000000802>) with five penalty functions: arctangent (Wang & Zhu, 2016 <doi:10.1155/2016/6495417>), EXP (Wang, Fan, & Zhu, 2018 <doi:10.1007/s10463-016-0588-3>), Gumbel, Log (Candes, Wakin, & Boyd, 2008 <doi:10.1007/s00041-008-9045-x>), and Weibull. Adaptive penalty parameters for EXP, Gumbel, and Weibull are estimated via maximum likelihood, and model selection uses information criteria including AIC, BIC, and EBIC (Extended BIC). Simulation functions generate multivariate normal data from GGMs with stochastic block model or small-world (Watts-Strogatz) network structures. |
| Authors: | Alexander Christensen [aut, cre] (ORCID: <https://orcid.org/0000-0002-9798-7037>), Jeongwon Choi [ctb] (ORCID: <https://orcid.org/0000-0001-6087-2124>), John Fox [cph, ctb] (Original implementation of polyserial correlations in auto_correlate.R), Yves Rosseel [cph, ctb] (Original implementation of rmsea_ci in network_fit.R), Alexander Robitzsch [cph, ctb] (C++ implementation of Drezner-Wesolowsky bivariate normal CDF in polychoric_matrix.c), David Blackman [ctb] (Original xoshiro.c implementation), Sebastiano Vigna [ctb] (Original xoshiro.c implementation), John Burkardt [cph, ctb] (Original ziggurat.c implementation) |
| Maintainer: | Alexander Christensen <[email protected]> |
| License: | AGPL (>= 3.0) |
| Version: | 0.1.0 |
| Built: | 2026-05-29 19:54:38 UTC |
| Source: | https://github.com/alexchristensen/l0ggm |
Implements L0 norm regularization penalties for Gaussian graphical models
Alexander P. Christensen <[email protected]>
Useful links:
Report bugs at https://github.com/AlexChristensen/L0ggm/issues
Automatically computes the proper correlations between
continuous and categorical variables. NA values are not treated
as categories
auto_correlate( data, corr = c("kendall", "pearson", "spearman"), ordinal_categories = 7, forcePD = TRUE, na_data = c("pairwise", "listwise"), empty_method = c("none", "zero", "all"), empty_value = c("none", "point_five", "one_over"), forceReturn = FALSE, verbose = FALSE, ... )auto_correlate( data, corr = c("kendall", "pearson", "spearman"), ordinal_categories = 7, forcePD = TRUE, na_data = c("pairwise", "listwise"), empty_method = c("none", "zero", "all"), empty_value = c("none", "point_five", "one_over"), forceReturn = FALSE, verbose = FALSE, ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
corr |
Character (length = 1).
The standard correlation method to be used.
Defaults to |
ordinal_categories |
Numeric (length = 1).
Up to the number of categories before a variable is considered continuous.
Defaults to |
forcePD |
Boolean (length = 1).
Whether positive definite matrix should be enforced.
Defaults to |
na_data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
empty_method |
Character (length = 1).
Method for empty cell correction in
|
empty_value |
Character (length = 1).
Value to add to the joint frequency table cells in
|
forceReturn |
Boolean (length = 1).
Whether correlation matrix should be forced to return.
Defaults to |
verbose |
Boolean (length = 1).
Whether messages should be printed.
Defaults to |
... |
Not actually used but makes it easier for general functionality in the package |
A symmetric numeric matrix of dimension , where
is the number of variables in data, with values in
and ones on the diagonal. The correlation method used for
each pair of variables depends on their types when corr = "pearson"
(the default): polychoric for ordinal-ordinal pairs, polyserial for
ordinal-continuous pairs, and Pearson's for continuous-continuous pairs,
where ordinal means having at most ordinal_categories unique
values. When corr is "kendall" or "spearman",
cor is used for all pairs regardless of variable
type. Row and column names are inherited from data. When
forcePD = TRUE (default) and the resulting matrix is not positive
definite, the nearest positive definite matrix is returned via
nearPD.
Alexander P. Christensen <[email protected]>
# Obtain correlations R <- auto_correlate(basic_smallworld)# Obtain correlations R <- auto_correlate(basic_smallworld)
A toy small-world network example (n = 500)
data(basic_smallworld)data(basic_smallworld)
A 500 20 continuous response matrix
# Generated using: # basic_smallworld <- simulate_smallworld( # nodes = 20, # 20 nodes in the network # density = 0.30, # moderate initial lattice connectivity # rewire = 0.20, # 20% rewiring probability # sample_size = 500 # number of cases = 500 # )$data data("basic_smallworld")# Generated using: # basic_smallworld <- simulate_smallworld( # nodes = 20, # 20 nodes in the network # density = 0.30, # moderate initial lattice connectivity # rewire = 0.20, # 20% rewiring probability # sample_size = 500 # number of cases = 500 # )$data data("basic_smallworld")
Categorizes continuous data based on Garrido, Abad and Ponsoda (2011; see references). Categorical data with 2 to 6 categories can include skew between -2 to 2 in increments of 0.05
categorize(data, categories, skew_value = 0)categorize(data, categories, skew_value = 0)
data |
Numeric (length = n).
A vector of continuous data with n values.
For matrices, use |
categories |
Numeric (length = 1). Number of categories to create. Between 2 and 6 categories can be used with skew |
skew_value |
Numeric (length = 1).
Value of skew.
Ranges between -2 to 2 in increments of 0.05.
Skews not in this sequence will be converted to
the nearest value in this sequence.
Defaults to |
An integer vector of length n with values from 1 to
categories, giving the category assignment for each observation.
Category 1 corresponds to the lowest values of data and
category categories to the highest. When categories > 6,
cut is used and skew is not applied.
Maria Dolores Nieto Canaveras <[email protected]>, Luis Eduardo Garrido <[email protected]>, Hudson Golino <[email protected]>, Alexander P. Christensen <[email protected]>
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2011).
Performance of Velicer’s minimum average partial factor retention method with categorical variables.
Educational and Psychological Measurement, 71(3), 551-570.
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., ... & Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292-320.
# Dichotomous data (no skew) dichotomous <- categorize( data = rnorm(1000), categories = 2 ) # Dichotomous data (with positive skew) dichotomous_skew <- categorize( data = rnorm(1000), categories = 2, skew_value = 1.25 ) # 5-point Likert scale (no skew) five_likert <- categorize( data = rnorm(1000), categories = 5 ) # 5-point Likert scale (negative skew) five_likert <- categorize( data = rnorm(1000), categories = 5, skew_value = -0.45 )# Dichotomous data (no skew) dichotomous <- categorize( data = rnorm(1000), categories = 2 ) # Dichotomous data (with positive skew) dichotomous_skew <- categorize( data = rnorm(1000), categories = 2, skew_value = 1.25 ) # 5-point Likert scale (no skew) five_likert <- categorize( data = rnorm(1000), categories = 5 ) # 5-point Likert scale (negative skew) five_likert <- categorize( data = rnorm(1000), categories = 5, skew_value = -0.45 )
Computes many commonly used confusion matrix metrics
edge_confusion( base, comparison, metric = c("all", "sen", "spec", "ppv", "npv", "fdr", "fom", "ba", "f1", "csi", "mcc"), full.names = FALSE )edge_confusion( base, comparison, metric = c("all", "sen", "spec", "ppv", "npv", "fdr", "fom", "ba", "f1", "csi", "mcc"), full.names = FALSE )
base |
Matrix or data frame.
Network that will be treated as the "ground truth" such that
a false positive represents an edge that is present in |
comparison |
Matrix or data frame.
Network that will be treated as the estimator such that a
false positive represents an edge that is present in this network
but not in |
metric |
Character vector.
Defaults to
|
full.names |
Boolean (length = 1).
Whether full or abbreviated names should be used.
Defaults to |
A named numeric vector of the requested confusion matrix metrics.
Values are generally in , with the exception of "mcc"
(Matthews Correlation Coefficient), which ranges from to
where 1 indicates perfect agreement, 0 indicates chance-level performance,
and -1 indicates perfect disagreement. The vector contains only the elements
specified by metric (all ten metrics when metric = "all",
which is the default). Names are abbreviated (e.g., "sen") when
full.names = FALSE (default), or expanded
(e.g., "Sensitivity") when full.names = TRUE. Any metric
whose denominator is zero (e.g., sensitivity when no edges exist in
base) is returned as NA.
Alexander P. Christensen <[email protected]>
# Set split split <- sample( 1:nrow(basic_smallworld), round(nrow(basic_smallworld) / 2) ) # Estimate networks split1 <- network_estimation(basic_smallworld[split,]) split2 <- network_estimation(basic_smallworld[-split,]) # Estimate metrics edge_confusion(split1, split2)# Set split split <- sample( 1:nrow(basic_smallworld), round(nrow(basic_smallworld) / 2) ) # Estimate networks split1 <- network_estimation(basic_smallworld[split,]) split2 <- network_estimation(basic_smallworld[-split,]) # Estimate metrics edge_confusion(split1, split2)
A general function to estimate Gaussian graphical models using L0 penalty approximations. All penalties are implemented using either single-pass or full Local Linear Approximation (LLA: Fan & Li, 2001; Zou & Li, 2008)
network_estimation( data, n = NULL, corr = c("auto", "pearson", "spearman"), na_data = c("pairwise", "listwise"), penalty = c("atan", "exp", "gumbel", "log", "weibull"), gamma = NULL, adaptive = TRUE, nlambda = 50, lambda_min_ratio = 0.01, penalize_diagonal = TRUE, ic = c("AIC", "AICc", "BIC", "BIC0", "EBIC", "MBIC"), ebic_gamma = 0.5, fast = TRUE, LLA = FALSE, LLA_threshold = 0.001, LLA_iter = 10000, network_only = TRUE, verbose = FALSE, ... )network_estimation( data, n = NULL, corr = c("auto", "pearson", "spearman"), na_data = c("pairwise", "listwise"), penalty = c("atan", "exp", "gumbel", "log", "weibull"), gamma = NULL, adaptive = TRUE, nlambda = 50, lambda_min_ratio = 0.01, penalize_diagonal = TRUE, ic = c("AIC", "AICc", "BIC", "BIC0", "EBIC", "MBIC"), ebic_gamma = 0.5, fast = TRUE, LLA = FALSE, LLA_threshold = 0.001, LLA_iter = 10000, network_only = TRUE, verbose = FALSE, ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
n |
Numeric (length = 1).
Sample size must be provided if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na_data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
penalty |
Character (length = 1).
Defaults to
|
gamma |
Numeric (length = 1). Adjusts the shape of the penalty. Defaults:
|
adaptive |
Boolean (length = 1).
Whether data-adaptive (gamma) parameters should be used.
Defaults to
When |
nlambda |
Numeric (length = 1).
Number of lambda values to test.
Defaults to |
lambda_min_ratio |
Numeric (length = 1).
Ratio of lowest lambda value compared to maximal lambda.
Defaults to |
penalize_diagonal |
Boolean (length = 1).
Should the diagonal be penalized?
Defaults to |
ic |
Character (length = 1). What information criterion should be used for model selection? Available options include:
Term definitions:
Defaults to |
ebic_gamma |
Numeric (length = 1)
Value to set gamma parameter in EBIC (see above).
Defaults to Only used if |
fast |
Boolean (length = 1).
Whether the The fast results may differ by less than floating point of the original
GLASSO implemented by |
LLA |
Boolean (length = 1).
Should Local Linear Approximation be used to find optimal minimum?
Defaults to |
LLA_threshold |
Numeric (length = 1).
When performing the Local Linear Approximation, the maximum threshold
until convergence is met.
Defaults to |
LLA_iter |
Numeric (length = 1).
Maximum number of iterations to perform to reach convergence.
Defaults to |
network_only |
Boolean (length = 1).
Whether the network only should be output.
Defaults to |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
... |
Additional arguments to be passed on to |
When network_only = TRUE (default), returns a
numeric matrix of partial correlations representing the estimated Gaussian
graphical model. Off-diagonal entry is the partial correlation
between variables and controlling for all other variables,
with values in ; a value of zero indicates the absence of an
edge. Diagonal entries are zero. Row and column names are inherited from
data.
When network_only = FALSE, returns a named list with the following
elements:
networkThe partial correlation matrix
described above
KThe estimated inverse covariance
(precision) matrix at the optimal lambda. Diagonal entries are the
conditional precisions; off-diagonal entries are proportional to
partial covariances
RThe regularized covariance matrix
returned by GLASSO at the optimal lambda (the w component of the
GLASSO solution)
penaltyCharacter string naming the penalty function used
(one of "atan", "exp", "gumbel", "log",
"weibull")
lambdaNumeric scalar giving the regularization parameter value selected by the information criterion. Larger values correspond to sparser networks
gammaNumeric scalar giving the shape parameter of the
penalty actually used. For adaptive penalties (adaptive = TRUE),
this is the data-derived value; otherwise it is the default or
user-supplied value
correlationThe empirical correlation
matrix computed from data, used as input to the GLASSO
criterionCharacter string naming the information criterion
used for model selection (e.g., "bic")
ICNumeric scalar giving the value of the information criterion at the optimal lambda
Alexander P. Christensen <[email protected]>
Log penalty
Candes, E. J., Wakin, M. B., & Boyd, S. P. (2008).
Enhancing sparsity by reweighted l1 minimization.
Journal of Fourier Analysis and Applications, 14(5), 877–905.
BIC0
Dicker, L., Huang, B., & Lin, X. (2013).
Variable selection and estimation with the seamless-L0 penalty.
Statistica Sinica, 23(2), 929–962.
Local Linear Approximation
Fan, J., & Li, R. (2001).
Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American Statistical Association, 96(456), 1348–1360.
EXP penalty
Wang, Y., Fan, Q., & Zhu, L. (2018).
Variable selection and estimation using a continuous approximation to the L0 penalty.
Annals of the Institute of Statistical Mathematics, 70(1), 191–214.
Atan penalty
Wang, Y., & Zhu, L. (2016).
Variable selection and parameter estimation with the Atan regularization method.
Journal of Probability and Statistics, 2016, 1–12.
Seminal simulation in network psychometrics
Williams, D. R. (2020).
Beyond lasso: A survey of nonconvex regularization in Gaussian graphical models.
PsyArXiv.
One-step Local Linear Approximation
Zou, H., & Li, R. (2008).
One-step sparse estimates in nonconcave penalized likelihood models.
Annals of Statistics, 36(4), 1509–1533.
# Obtain default estimator (adaptive Weibull) weibull_network <- network_estimation( data = basic_smallworld, LLA = TRUE ) # Obtain Atan network atan_network <- network_estimation( data = basic_smallworld, penalty = "atan", LLA = TRUE ) # Obtain static EXP network exp_network <- network_estimation( data = basic_smallworld, penalty = "exp", adaptive = FALSE, LLA = TRUE )# Obtain default estimator (adaptive Weibull) weibull_network <- network_estimation( data = basic_smallworld, LLA = TRUE ) # Obtain Atan network atan_network <- network_estimation( data = basic_smallworld, penalty = "atan", LLA = TRUE ) # Obtain static EXP network exp_network <- network_estimation( data = basic_smallworld, penalty = "exp", adaptive = FALSE, LLA = TRUE )
Computes several traditional fit metrics for networks including
chi-square ()
root mean square error of approximation (RMSEA) with confidence intervals
confirmatory fit index (CFI)
Tucker-Lewis index (TLI)
standardized root mean residual (SRMR)
log-likelihood
Akaike's information criterion (AIC)
Bayesian information criterion (BIC)
network_fit(network, n, S, ci = 0.95)network_fit(network, n, S, ci = 0.95)
network |
Matrix or data frame. A p by p square network matrix |
n |
Numeric (length = 1). Sample size |
S |
Matrix or data frame.
A p by p square zero-order correlation matrix corresponding
with the input |
ci |
Numeric (length = 1).
Confidence interval for RMSEA.
Defaults to |
A named numeric vector of traditional and likelihood-based fit indices. The vector always contains the following elements:
chisqChi-square statistic
(), where is the maximum
likelihood discrepancy function between the model-implied and empirical
correlation matrices
dfDegrees of freedom: total number of unique off-diagonal
correlations minus the number of non-zero edges in network
chisq.p.valuep-value for the chi-square test of exact fit (H0: model-implied covariance equals the population covariance)
RMSEARoot mean square error of approximation. Values
0.05 indicate close fit; values 0.08 indicate
acceptable fit
RMSEA.XX.lower, RMSEA.XX.upper
Lower and upper
bounds of the ci-level confidence interval for RMSEA, where
XX is the integer percentage (e.g., RMSEA.95.lower and
RMSEA.95.upper for a 95% CI)
RMSEA.p.valuep-value for the one-sided test of close fit
(H0: RMSEA 0.05)
CFIComparative fit index, comparing the target model to
an independence (null) baseline. Values 0.95 indicate
acceptable fit
TLITucker-Lewis index (non-normed fit index). Values
0.95 indicate acceptable fit; can fall outside
for severely misspecified models
SRMRStandardized root mean residual: the root mean squared
difference between the model-implied and observed correlation matrices.
Values 0.08 indicate acceptable fit
logLikGaussian log-likelihood of the model-implied correlation matrix, assuming zero mean structure (means are not estimated)
AICAkaike's information criterion:
, where is the number of
non-zero edges in network
BICBayesian information criterion:
Alexander P. Christensen <[email protected]>
Epskamp, S., Rhemtulla, M., & Borsboom, D. (2017). Generalized network psychometrics: Combining network and latent variable models. Psychometrika, 82(4), 904–927.
# Obtain correlation matrix S <- auto_correlate(basic_smallworld) # Obtain Weibull network weibull_network <- network_estimation(data = basic_smallworld, LLA = TRUE) # Obtain fit (expects continuous variables!) network_fit(network = weibull_network, n = nrow(basic_smallworld), S = S) # Scaled metrics are not yet available for # dichotomous or polytomous data!# Obtain correlation matrix S <- auto_correlate(basic_smallworld) # Obtain Weibull network weibull_network <- network_estimation(data = basic_smallworld, LLA = TRUE) # Obtain fit (expects continuous variables!) network_fit(network = weibull_network, n = nrow(basic_smallworld), S = S) # Scaled metrics are not yet available for # dichotomous or polytomous data!
A fast implementation of polychoric correlations in C. Uses the Beasley-Springer-Moro algorithm (Boro & Springer, 1977; Moro, 1995) to estimate the inverse univariate normal CDF, the Drezner-Wesolosky approximation (Drezner & Wesolosky, 1990) to estimate the bivariate normal CDF, and Brent's method (Brent, 2013) for optimization of rho
polychoric_matrix( data, na_data = c("pairwise", "listwise"), empty_method = c("none", "zero", "all"), empty_value = c("none", "point_five", "one_over"), ... )polychoric_matrix( data, na_data = c("pairwise", "listwise"), empty_method = c("none", "zero", "all"), empty_value = c("none", "point_five", "one_over"), ... )
data |
Matrix or data frame.
A dataset with all ordinal values
(rows = cases, columns = variables).
Data are required to be between |
na_data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
empty_method |
Character (length = 1). Method for empty cell correction. Available options:
|
empty_value |
Character (length = 1). Value to add to the joint frequency table cells. Accepts numeric values between 0 and 1 or specific methods:
|
... |
Not used but made available for easier argument passing |
A symmetric numeric matrix of dimension , where
is the number of variables (columns) in data. Each
off-diagonal entry is the polychoric correlation between
variables and — the estimated Pearson correlation between
the latent continuous variables assumed to underlie the observed ordinal
categories — with values in . Diagonal entries are .
Row and column names are inherited from data. If any variable has
zero variance, its corresponding row and column are set to NA and
a warning is issued.
Alexander P. Christensen <[email protected]> with assistance from GPT-4
Beasley-Moro-Springer algorithm
Beasley, J. D., & Springer, S. G. (1977).
Algorithm AS 111: The percentage points of the normal distribution.
Journal of the Royal Statistical Society. Series C (Applied Statistics), 26(1), 118-121.
Moro, B. (1995). The full monte. Risk 8 (February), 57-58.
Brent optimization
Brent, R. P. (2013).
Algorithms for minimization without derivatives.
Mineola, NY: Dover Publications, Inc.
Drezner-Wesolowsky bivariate normal approximation
Drezner, Z., & Wesolowsky, G. O. (1990).
On the computation of the bivariate normal integral.
Journal of Statistical Computation and Simulation, 35(1-2), 101-107.
# Categorize data basic_categorized <- apply( basic_smallworld, 2, categorize, categories = 5 ) # Compute polychoric correlation matrix correlations <- polychoric_matrix(basic_categorized) # Randomly assign missing data basic_categorized[sample(1:length(basic_categorized), 1000)] <- NA # Compute polychoric correlation matrix with pairwise missing pairwise_correlations <- polychoric_matrix( basic_categorized, na_data = "pairwise" ) # Compute polychoric correlation matrix with listwise missing pairwise_correlations <- polychoric_matrix( basic_categorized, na_data = "listwise" )# Categorize data basic_categorized <- apply( basic_smallworld, 2, categorize, categories = 5 ) # Compute polychoric correlation matrix correlations <- polychoric_matrix(basic_categorized) # Randomly assign missing data basic_categorized[sample(1:length(basic_categorized), 1000)] <- NA # Compute polychoric correlation matrix with pairwise missing pairwise_correlations <- polychoric_matrix( basic_categorized, na_data = "pairwise" ) # Compute polychoric correlation matrix with listwise missing pairwise_correlations <- polychoric_matrix( basic_categorized, na_data = "listwise" )
Converts a network matrix into a connected ring lattice whose
degree sequence exactly matches the original, while maximizing the average
local clustering coefficient. The resulting lattice is intended as a null
model for small-world network analyses. An adjacency matrix is derived
automatically from network (non-zero entries become edges), so the
function accepts any weighted or binary network directly.
Each pass randomly permutes the degree sequence onto ring positions, then
greedily assigns edges in strictly increasing ring-distance order (proximity
construction). Any residual degree deficit is resolved by a swap-repair
phase. Passes that yield a disconnected graph or an unsatisfied degree
sequence are discarded; among valid passes the one with the highest average
clustering coefficient is returned. shuffles independent passes are
attempted in total.
proxswap_lattice(network, weighted = FALSE, shuffles = 100)proxswap_lattice(network, weighted = FALSE, shuffles = 100)
network |
Matrix.
A square, symmetric numeric matrix representing a network (e.g., partial
correlations). Non-zero off-diagonal entries are treated as edges; the
binary adjacency is derived internally as |
weighted |
Logical (length = 1).
Whether to return a weighted ring lattice. When |
shuffles |
Numeric (length = 1).
Number of independent random permutation passes to attempt. Each pass
assigns the observed degree sequence to ring positions in a new random
order, runs the proximity construction, and applies swap repair if needed.
Only passes producing a connected graph with zero degree error are
retained; the one with the highest clustering coefficient is returned.
Defaults to |
## Algorithm
Pair precomputation. A ring distance matrix is computed as
, giving true circular
distances bounded by . All unique unordered pairs
at each distance are extracted
once (retaining only entries where ) and cached for reuse
across all passes.
Proximity construction. The observed degree sequence is randomly
permuted onto ring positions. Starting from an empty graph, edges are added
in distance order. At each distance band the eligible pairs (both endpoints
have remaining degree budget and are not yet connected) are sorted by
descending so that high-need
pairs receive short connections first. Pairs are then assigned sequentially
with per-pair budget re-checks. The pass exits early once all budgets reach
zero. This phase is implemented in compiled C via proximity_pass_c.
Swap repair. If any degree deficit remains after the proximity
pass, the highest-deficit node is connected to its nearest
available ring neighbour, scanning clockwise and counter-clockwise
positions in interleaved distance order. When no direct partner with
remaining budget exists, an edge swap is performed: a nearby unconnected
node is found, one of 's existing edges to is
removed, and a new edge is added. Node recovers its
budget for resolution in a subsequent iteration. This repeats until all
deficits are resolved or the iteration cap () is reached. This
phase is implemented in compiled C via swapping_pass_c.
Weight assignment. When weighted = TRUE, edge weights are
reassigned after the binary topology is finalised. The observed weights are
sorted by absolute value in descending order (largest magnitude first)
and mapped onto lattice edges ranked by ascending ring distance (i.e.,
), so that shorter (more local)
connections receive the largest-magnitude weights. Original signs are
preserved: the sorted weight vector — not its absolute values — is assigned
to the lattice edges. This directly implements the distance-weight principle
of Muldoon, Bridgeford, & Bassett (2016), using the ring's structural
distances rather than any network-derived proxy. Ties in ring distance are
broken at random.
Pass selection. A pass is valid only if the resulting graph is connected and has zero residual degree error. Among valid passes, the one with the highest average clustering coefficient is returned.
Empirical fallback. If no valid pass is found, or if the best
lattice clustering coefficient is lower than that of the original network,
the empirical adjacency (or weighted matrix, if weighted = TRUE) is
returned with a warning.
A square symmetric matrix of the same dimension as network,
with row and column names preserved, representing the resulting ring
lattice. When weighted = FALSE (default), entries are binary integers
(0L/1L); when weighted = TRUE, entries contain the
reassigned edge weights from network with original signs preserved.
The average clustering coefficient of the returned graph is attached as the
attribute "CC" and can be retrieved with attr(result, "CC").
When the empirical fallback is triggered (see Details), "CC" reflects
the empirical network's clustering coefficient rather than a lattice's.
Alexander P. Christensen <[email protected]>
Logic for weight assignments
Muldoon, S. F., Bridgeford, E. W., & Bassett, D. S. (2016).
Small-world propensity and weighted brain connectivity.
Scientific Reports, 6, 22057.
# Get network network <- network_estimation(basic_smallworld) # Construct binary ring lattice L <- proxswap_lattice(network) # Retrieve the attached clustering coefficient attr(L, "CC") # Degree sequences should match exactly cbind(target = colSums(network != 0), achieved = colSums(L)) # Construct weighted ring lattice L_weighted <- proxswap_lattice(network, weighted = TRUE) # Retrieve the attached clustering coefficient attr(L_weighted, "CC")# Get network network <- network_estimation(basic_smallworld) # Construct binary ring lattice L <- proxswap_lattice(network) # Retrieve the attached clustering coefficient attr(L, "CC") # Degree sequences should match exactly cbind(target = colSums(network != 0), achieved = colSums(L)) # Construct weighted ring lattice L_weighted <- proxswap_lattice(network, weighted = TRUE) # Retrieve the attached clustering coefficient attr(L_weighted, "CC")
Simulates data from a Gaussian Graphical Model (GGM) with a stochastic
block model (SBM) structure. Nodes are partitioned into communities, with
edge density controlled separately within and between communities via a
blocks x blocks density matrix. Absolute edge weights are drawn from
a Weibull distribution whose parameters are predicted from the network size,
sample size, and signal-to-noise ratio using a Seemingly Unrelated
Regression (SUR) model fitted to 194 empirical psychometric networks
(Huth et al., 2025). The resulting population partial correlation matrix is
used to generate multivariate normal (or skewed) data.
A diffusion parameter controls the minimum proportion of the
strongest edges that are reserved for within-community positions. Because
the remaining edges are shuffled randomly, the effective within-community
advantage will generally exceed 1 - diffusion; see Details.
Parameters do not have default values (except negative_proportion,
diffusion, target_condition, max_correlation, and
max_iterations) and must each be set. See Details and Examples to
get started.
simulate_sbm( nodes, blocks, density_matrix, snr = 1, diffusion = 0.3, diffusion_range = NULL, negative_proportion, sample_size, skew = 0, skew_range = NULL, target_condition = 30, max_correlation = 0.8, max_iterations = 100 )simulate_sbm( nodes, blocks, density_matrix, snr = 1, diffusion = 0.3, diffusion_range = NULL, negative_proportion, sample_size, skew = 0, skew_range = NULL, target_condition = 30, max_correlation = 0.8, max_iterations = 100 )
nodes |
Numeric (length = 1 or |
blocks |
Numeric (length = 1). Number of community blocks. |
density_matrix |
Matrix (dim = |
snr |
Numeric (length = 1).
Signal-to-noise ratio of the absolute partial correlation weights,
defined as |
diffusion |
Numeric (length = 1).
Controls the minimum proportion of the top-ranked edges (by absolute
weight) that are guaranteed to be placed within communities rather than
distributed freely across the network. Specifically, |
diffusion_range |
Numeric (length = 2).
If provided, overrides |
negative_proportion |
Numeric (length = 1).
Proportion of between-community edges assigned a negative sign (inhibitory
connections). Each between-community edge is independently signed negative
with this probability (i.e., a Bernoulli draw per edge), so the realized
proportion will vary around the specified value. Must be between 0 and 1.
Applies only to between-community edges; all within-community edges are
positive. If not provided, a value is sampled from a truncated normal
distribution reflecting the empirical distribution of true negative partial
correlations across 194 psychometric networks: mean = 0.34, SD = 0.086,
bounded to |
sample_size |
Numeric (length = 1).
Number of observations to generate from the population multivariate
distribution. Also influences the predicted Weibull scale parameter via
|
skew |
Numeric (length = 1 or |
skew_range |
Numeric (length = 2).
If provided, overrides |
target_condition |
Numeric (length = 1).
Target condition number (computed via |
max_correlation |
Numeric (length = 1).
Maximum allowed absolute pairwise correlation in the population correlation
matrix |
max_iterations |
Numeric (length = 1).
Maximum number of attempts to find a connected network with valid edge
weights before stopping with an error. The error message reports a
frequency table of rejection reasons to assist with diagnosing convergence
failures. Defaults to |
Constructing density_matrix
The density_matrix is a blocks x blocks symmetric matrix
where entry gives the within-block edge probability for
community , and entry (for ) gives the
between-block edge probability for communities and . The
simplest construction uses a uniform off-diagonal density with
block-specific diagonals:
# Uniform within (0.90) and between (0.20) density for 3 blocks dm <- matrix(0.20, nrow = 3, ncol = 3) diag(dm) <- 0.90
For asymmetric community structure, each diagonal entry can differ:
# Varying within-block density per community dm <- matrix(0.20, nrow = 3, ncol = 3) diag(dm) <- c(0.85, 0.90, 0.95)
For full pairwise control over between-block densities, specify the complete symmetric matrix directly:
dm <- matrix(c( 0.90, 0.20, 0.10, 0.20, 0.85, 0.30, 0.10, 0.30, 0.95 ), nrow = 3, ncol = 3)
Diffusion and within-community weight concentration
The diffusion parameter does not exactly fix the proportion of
top-ranked edges placed within communities. Instead, 1 - diffusion
of the within-community edge slots are filled deterministically from the
highest-weight draws. The remaining edge slots (within- and
between-community alike) are filled from a randomly shuffled pool of
lower-ranked weights, meaning some additional high-weight edges will land
within communities by chance. The realized within-community weight
advantage will therefore always be at least as large as implied by
1 - diffusion, and typically larger. The Q field in the
returned parameters list (Newman-Girvan modularity) provides a
post-hoc summary of the actual community contrast achieved.
A named list with four elements:
data |
Numeric matrix of dimension |
parameters |
A list of input, derived, and estimated parameters:
|
population |
Population-level network parameters:
|
convergence |
Iteration and conditioning diagnostics:
|
Alexander P. Christensen <[email protected]>
Seminal introduction to Stochastic Block Models
Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983).
Stochastic blockmodels: First steps.
Social Networks, 5(2), 109–137.
Empirical network data used to fit the Weibull SUR model
Huth, K. B. S., Haslbeck, J. M. B., Keetelaar, S., Van Holst, R. J., &
Marsman, M. (2025). Statistical evidence in psychological networks.
Nature Human Behaviour.
Maximum ridge shrinkage bound
Peeters, C. F., van de Wiel, M. A., & van Wieringen, W. N. (2020).
The spectral condition number plot for regularization parameter evaluation.
Computational Statistics, 35(2), 629–646.
# Construct density matrix for 3 blocks with uniform densities dm <- matrix(0.20, nrow = 3, ncol = 3) diag(dm) <- 0.90 # Basic 3-block simulation with equal-sized communities result <- simulate_sbm( nodes = 6, # 6 nodes per block = 18 total blocks = 3, sample_size = 500, density_matrix = dm ) # Unequal block sizes result <- simulate_sbm( nodes = c(4, 6, 8), # 18 total nodes blocks = 3, sample_size = 500, density_matrix = dm ) # Varying within-block density per community dm_varying <- matrix(0.20, nrow = 3, ncol = 3) diag(dm_varying) <- c(0.85, 0.90, 0.95) result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm_varying ) # Full pairwise between-block density control dm_pairwise <- matrix(c( 0.90, 0.20, 0.10, 0.20, 0.85, 0.30, 0.10, 0.30, 0.95 ), nrow = 3, ncol = 3) result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm_pairwise ) # Fix the proportion of negative between-community edges result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm, negative_proportion = 0.20 ) # Introduce variability in diffusion across replications result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm, diffusion_range = c(0.30, 0.70) )# Construct density matrix for 3 blocks with uniform densities dm <- matrix(0.20, nrow = 3, ncol = 3) diag(dm) <- 0.90 # Basic 3-block simulation with equal-sized communities result <- simulate_sbm( nodes = 6, # 6 nodes per block = 18 total blocks = 3, sample_size = 500, density_matrix = dm ) # Unequal block sizes result <- simulate_sbm( nodes = c(4, 6, 8), # 18 total nodes blocks = 3, sample_size = 500, density_matrix = dm ) # Varying within-block density per community dm_varying <- matrix(0.20, nrow = 3, ncol = 3) diag(dm_varying) <- c(0.85, 0.90, 0.95) result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm_varying ) # Full pairwise between-block density control dm_pairwise <- matrix(c( 0.90, 0.20, 0.10, 0.20, 0.85, 0.30, 0.10, 0.30, 0.95 ), nrow = 3, ncol = 3) result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm_pairwise ) # Fix the proportion of negative between-community edges result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm, negative_proportion = 0.20 ) # Introduce variability in diffusion across replications result <- simulate_sbm( nodes = 6, blocks = 3, sample_size = 500, density_matrix = dm, diffusion_range = c(0.30, 0.70) )
Simulates data from a Gaussian Graphical Model (GGM) with a small-world
network structure. The generative process proceeds in three stages. First,
a ring lattice is constructed with neighbors + 1 nearest-neighbor
connections per node, where neighbors is derived from nodes
and density. Second, the lattice is randomly pruned to the target
density, introducing degree heterogeneity from the outset. Third,
edges are rewired with probability rewire, where rewired edges are
placed preferentially on node pairs with higher combined degree (degree-
weighted rewiring). This approach produces more realistic degree
distributions than standard Watts-Strogatz rewiring while preserving
the local clustering structure of the lattice. Edge weights are assigned
by structural priority: edges retained from the pruned lattice receive
larger partial correlation weights based on their original neighbor
distance, grounded in the empirical observation that shorter-distance
connections tend to carry stronger weights in psychometric networks.
The resulting network is used to generate multivariate normal (or skewed)
data. Parameters do not have default values (except
negative_proportion, snr, target_condition,
max_correlation, and max_iterations) and must each be set.
See Details and Examples to get started.
simulate_smallworld( nodes, density, rewire, snr = 1, negative_proportion, sample_size, skew = 0, skew_range = NULL, target_condition = 30, max_correlation = 0.8, max_iterations = 100 )simulate_smallworld( nodes, density, rewire, snr = 1, negative_proportion, sample_size, skew = 0, skew_range = NULL, target_condition = 30, max_correlation = 0.8, max_iterations = 100 )
nodes |
Numeric (length = 1).
Number of nodes in the network.
Minimum of three nodes. The total number of nodes should be between 8
and 54 to remain within the range of the empirical networks used to fit
the Weibull parameter model; values outside this range are accepted but
will trigger extrapolation and may produce a warning from
|
density |
Numeric (length = 1).
Target edge density of the network after pruning the initial ring lattice.
Controls the number of nearest neighbors |
rewire |
Numeric (length = 1). Probability of rewiring each edge. Unlike the standard Watts-Strogatz model, rewired edges are placed using degree-weighted random selection: node pairs with higher combined degree have a greater probability of receiving a rewired edge (see Details). Values near 0 preserve the pruned lattice structure; values near 1 produce approximately random networks. The small-world regime typically occurs at intermediate values (roughly 0.01 to 0.30). Must be between 0 and 1. |
snr |
Numeric (length = 1).
Signal-to-noise ratio of the absolute partial correlation weights,
defined as |
negative_proportion |
Numeric (length = 1).
Proportion of edges that are negative (inhibitory).
Must be between 0 and 0.50. The upper bound of 0.50 is a mathematical
constraint of the sign-flipping procedure used to assign negative edges
(see Details). If not provided, a value is sampled from a truncated
normal distribution reflecting the empirical distribution of true
negative partial correlations across 194 psychometric networks:
mean = 0.34, SD = 0.086, bounded to |
sample_size |
Numeric (length = 1).
Number of observations to generate from the population multivariate
normal distribution. Also influences the predicted Weibull scale
parameter via |
skew |
Numeric (length = 1 or |
skew_range |
Numeric (length = 2).
If provided, a skew value is drawn uniformly from this range for each
variable, overriding |
target_condition |
Numeric (length = 1).
Target condition number (using |
max_correlation |
Numeric (length = 1).
Maximum allowed absolute pairwise correlation in the population
correlation matrix |
max_iterations |
Numeric (length = 1).
Maximum number of attempts to find (1) a connected network structure,
(2) a network satisfying the small-world screening criterion, and (3) a
valid set of edge weights. The error message reports a frequency table
of rejection reasons to assist with diagnosing convergence failures.
Defaults to |
Lattice generation and pruning
The generative process begins by constructing a ring lattice with
neighbors + 1 nearest-neighbor connections per node via
sample_smallworld with p = 0. The
+1 overshoot ensures the lattice always has more edges than the
target density, guaranteeing that pruning can proceed. The lattice is
then randomly pruned to the target density by removing edges
uniformly at random, subject to the constraint that the pruned graph
remains connected. Because removal is random, different nodes lose
different numbers of edges, producing a heterogeneous degree distribution
before any rewiring occurs. This heterogeneity provides a non-uniform
prior for the subsequent degree-weighted rewiring step.
Degree-weighted rewiring
Each edge in the pruned lattice is independently selected for rewiring
with probability rewire. For each selected edge ,
node is kept fixed and the endpoint is redirected to a
new target node. Valid targets are restricted to node pairs involving
node that (1) are not currently connected, and (2) have never
been occupied during the current rewiring pass (i.e., were absent in
the original pruned lattice). Among valid targets, the new endpoint is
selected with probability proportional to , where
and are the current degrees of the two nodes in
the candidate pair. The square-root transformation moderates the
rich-get-richer tendency of linear preferential attachment, producing
degree heterogeneity consistent with the empirical range of psychometric
networks without generating extreme hubs. Node degrees are updated
incrementally after each rewire so that subsequent rewiring steps
reflect the current graph state.
Edge weight assignment
Edge weights are assigned by ranking edges according to their distance
in the pruned lattice before rewiring. Edges retained from the pruned
lattice receive their original neighbor distance (1 = nearest neighbor,
2 = second nearest, etc.); rewired edges are assigned a distance of
neighbors + 1, placing them at the bottom of the priority order.
Absolute edge weights are drawn from a Weibull distribution whose
parameters are predicted from the network size, sample size, and signal-
to-noise ratio using a Seemingly Unrelated Regression (SUR) model fitted
to 194 empirical psychometric networks (Huth et al., 2025). The sorted
(descending) Weibull weights are then mapped onto the distance ranking
so that the largest weights go to the shortest-distance edges. This
assignment is grounded in the empirical observation that shorter-distance
local connections tend to carry larger partial correlation weights in
psychometric networks.
Negative edge assignment
Negative edges are introduced by flipping the signs of a subset of
nodes — multiplying all edges incident to those nodes by -1. The number
of nodes to flip is derived from negative_proportion via the
inversion formula , where is the
target proportion of negative edges. This formula assumes that an edge
is negative if and only if exactly one of its endpoints is flipped,
giving an expected negative proportion of for flipped nodes out of total. The formula
requires , which is why negative_proportion is
bounded at 0.50.
A named list with four elements:
data |
Numeric matrix of dimension |
parameters |
A list of input, derived, and estimated parameters:
|
population |
Population-level network parameters:
|
convergence |
Iteration and conditioning diagnostics:
|
Alexander P. Christensen <[email protected]>
Seminal introduction to the Watts-Strogatz small-world model
Watts, D. J., & Strogatz, S. H. (1998).
Collective dynamics of 'small-world' networks.
Nature, 393(6684), 440–442.
Logic for weight assignments
Muldoon, S. F., Bridgeford, E. W., & Bassett, D. S. (2016).
Small-world propensity and weighted brain networks.
Scientific Reports, 6(1), 22057.
Analytical approximation of random-graph average path length
Newman, M. E. J., Strogatz, S. H., & Watts, D. J. (2001).
Random graphs with arbitrary degree distributions and their applications.
Physical Review E, 64(2), 026118.
Empirical network data used to fit the Weibull SUR model
Huth, K. B. S., Haslbeck, J. M. B., Keetelaar, S., Van Holst, R. J., &
Marsman, M. (2025). Statistical evidence in psychological networks.
Nature Human Behaviour.
Maximum ridge shrinkage bound
Peeters, C. F., van de Wiel, M. A., & van Wieringen, W. N. (2020).
The spectral condition number plot for regularization parameter evaluation.
Computational Statistics, 35(2), 629–646.
# Basic small-world network (moderate density, moderate rewiring) result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.20, sample_size = 500 ) # Lattice-like structure (low rewiring preserves local connectivity) result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.01, sample_size = 500 ) # Random-like structure (high rewiring destroys lattice regularity) result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.80, sample_size = 500 ) # Fix the proportion of negative edges result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.20, sample_size = 500, negative_proportion = 0.20 )# Basic small-world network (moderate density, moderate rewiring) result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.20, sample_size = 500 ) # Lattice-like structure (low rewiring preserves local connectivity) result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.01, sample_size = 500 ) # Random-like structure (high rewiring destroys lattice regularity) result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.80, sample_size = 500 ) # Fix the proportion of negative edges result <- simulate_smallworld( nodes = 20, density = 0.30, rewire = 0.20, sample_size = 500, negative_proportion = 0.20 )
Tables for skew based on the number of categories (2, 3, 4, 5, or 6) in the data
data(skew_tables)data(skew_tables)
A list (length = 5)
data("skew_tables")data("skew_tables")
Computes the small-worldness of a network using one of five
methods. All simulation-based methods generate degree-preserving random
graphs as the null baseline. The lattice reference network (where required)
is constructed using proxswap_lattice, which produces a
degree-preserving ring lattice that maximizes the clustering coefficient
smallworldness( network, lattice, method = c("analytical", "omega", "S", "SWI", "SWP"), weighted = FALSE, iter = 100 )smallworldness( network, lattice, method = c("analytical", "omega", "S", "SWI", "SWP"), weighted = FALSE, iter = 100 )
network |
Matrix or data frame. A square, symmetric numeric matrix representing a network (e.g., partial correlations). Absolute values are taken internally, so signed weights are handled automatically. Non-zero off-diagonal entries are treated as edges. |
lattice |
Matrix (optional).
A pre-computed lattice adjacency matrix to use as the regular-network
reference for the |
method |
Character (length = 1).
The method used to compute small-worldness.
Defaults to
|
weighted |
Logical (length = 1).
Whether to compute small-worldness on the weighted network. When
|
iter |
Numeric (length = 1).
Number of random graphs to generate when estimating the null baseline.
Defaults to |
Numeric (length = 1).
The small-worldness value computed using the chosen method:
"analytical" and "S" — indicates small-world structure
"omega" — Values near indicate small-world structure
(ranges between -1 and 1)
"SWI" — Values close to 1 indicate strong small-world structure;
values near 0 indicate lattice-like or random-like structure
(ranges between 0 and 1)
"SWP" — Values indicate strong small-world structure
(ranges between 0 and 1)
Alexander P. Christensen <[email protected]>
omega
Telesford, Q. K., Joyce, K. E., Hayasaka, S., Burdette, J. H., & Laurienti, P. J. (2011).
The ubiquity of small-world networks.
Brain Connectivity, 1(5), 367–375.
S and analytical
Humphries, M. D., & Gurney, K. (2008).
Network 'small-world-ness': A quantitative method for determining canonical network equivalence.
PLoS ONE, 3(4), e0002051.
SWI
Neal, Z. P. (2015).
Making big communities small: Using network science to understand the ecological and behavioral requirements for community social capital.
American Journal of Community Psychology, 55(3), 369–380.
SWP
Muldoon, S. F., Bridgeford, E. W., & Bassett, D. S. (2016).
Small-world propensity and weighted brain networks.
Scientific Reports, 6(1), 22057.
# Get network network <- network_estimation(basic_smallworld) # Compute SWP (default) swp <- smallworldness(network) # Compute omega omega <- smallworldness(network, method = "omega") # Compute analytical S S <- smallworldness(network, method = "analytical") # Compute simulated S S_sim <- smallworldness(network, method = "S") # Compute SWI swi <- smallworldness(network, method = "SWI") # Compute weighted SWP swp_w <- smallworldness(network, weighted = TRUE) # Compute weighted omega omega_w <- smallworldness(network, method = "omega", weighted = TRUE)# Get network network <- network_estimation(basic_smallworld) # Compute SWP (default) swp <- smallworldness(network) # Compute omega omega <- smallworldness(network, method = "omega") # Compute analytical S S <- smallworldness(network, method = "analytical") # Compute simulated S S_sim <- smallworldness(network, method = "S") # Compute SWI swi <- smallworldness(network, method = "SWI") # Compute weighted SWP swp_w <- smallworldness(network, weighted = TRUE) # Compute weighted omega omega_w <- smallworldness(network, method = "omega", weighted = TRUE)
Predicts the shape and scale parameters of a Weibull distribution that characterizes the absolute partial correlation edge weights of a psychometric network, given its number of nodes and sample size. Parameter estimates are derived from a Seemingly Unrelated Regression (SUR) model fitted to empirical network data from Huth et al. (2025), where absolute partial correlations were found to follow a Weibull distribution more consistently than Beta, Gamma, or log-normal alternatives.
weibull_parameters(nodes, sample_size, snr = 1, bootstrap = FALSE)weibull_parameters(nodes, sample_size, snr = 1, bootstrap = FALSE)
nodes |
Integer. The number of nodes (variables) in the network. Must be between 8 and 54, reflecting the range of the empirical networks used to fit the underlying SUR model. |
sample_size |
Integer. The sample size of the dataset from which the network is estimated. |
snr |
Numeric (length = 1).
Signal-to-noise ratio of partial correlations ( |
bootstrap |
Logical. If |
The shape and scale parameters are predicted from derived network descriptors that differ between the two SUR equations:
rlpThe reciprocal log of the number of nodes,
, capturing the diminishing marginal effect of network
size on edge weight distributions. Used in both the shape and scale
equations.
scalingThe standard error of partial correlations defined as
, where is the sample size and
is the number of nodes. Larger values indicate greater sampling
uncertainty in the partial correlation estimates. Used in the scale
equation only.
The two SUR equations have an asymmetric structure reflecting different
theoretical roles for sampling precision. Shape — which governs the
concentration of the edge weight distribution — is determined solely by
signal characteristics of the network via snr and rlp.
Scale — which governs the typical magnitude of edge weights — additionally
depends on scaling, as the expected size of partial correlations is
directly affected by estimation precision. This asymmetry is both
empirically supported (dropping scaling from the shape equation
costs ) and theoretically coherent.
These predictors enter two SUR equations whose coefficients are stored in
the internal weibull_weights dataset. SUR was used to account for the
correlated residuals between the shape and scale equations across networks
(residual correlation = 0.261). Model fit was strong: the shape equation
achieved (RMSE = 0.048) and the scale equation achieved
(RMSE = 0.011). Shape residuals were normally distributed
(Shapiro-Wilk W = 0.990, p = 0.173). Scale residuals showed a modest
departure from normality (Shapiro-Wilk W = 0.973, p < 0.001), consistent
with slight right skew in the scale outcome and test sensitivity at n = 194
rather than a substantive violation. Heteroskedasticity was detected in both
the shape equation (Breusch-Pagan p < 0.001) and the scale equation
(Breusch-Pagan p < 0.001); robust standard errors (HC3) were used for
inference. Multicollinearity among predictors in the scale equation was
negligible (VIF 1.60); the shape equation contains only two
predictors with no multicollinearity concern.
nodes influences predicted shape and scale via rlp.
sample_size influences predicted scale via scaling, and
both parameters via their joint contribution to snr when it is
estimated from data rather than supplied directly.
Empirically, shape values ranged from approximately 0.72 to 1.63 (M = 1.07, SD = 0.14) and scale values from approximately 0.03 to 0.19 (M = 0.10, SD = 0.03) across the 194 networks used to fit the model. Shape values near 1 indicate approximately exponential edge weight distributions; values above 1 indicate a rising hazard (mode-bearing distribution).
When bootstrap = TRUE, residuals are drawn via shuffle() –
a random sampling without replacement – from the empirical SUR residuals,
preserving the observed marginal residual distribution.
A named numeric vector of length 2:
shapeThe predicted Weibull shape parameter ().
scaleThe predicted Weibull scale parameter ().
Alexander P. Christensen <[email protected]>
Huth, K. B. S., Haslbeck, J. M. B., Keetelaar, S., Van Holst, R. J., & Marsman, M. (2025). Statistical evidence in psychological networks. Nature Human Behaviour.
# Predict parameters for a 10-node network with n = 500 weibull_parameters(nodes = 10, sample_size = 500) # With bootstrapped residuals for use in simulation weibull_parameters(nodes = 10, sample_size = 500, bootstrap = TRUE)# Predict parameters for a 10-node network with n = 500 weibull_parameters(nodes = 10, sample_size = 500) # With bootstrapped residuals for use in simulation weibull_parameters(nodes = 10, sample_size = 500, bootstrap = TRUE)
A named list encoding the Seemingly Unrelated Regression (SUR) model fitted
to Weibull shape and scale parameters derived from 194
empirical networks. The object is consumed internally by weibull_parameters
to generate data-driven Weibull parameter estimates given a network's
number of nodes and sample size.
data(weibull_weights)data(weibull_weights)
A named list with two elements, shape and scale,
each containing:
coefficientsA named numeric vector of regression
coefficients from the SUR model. The equations are asymmetric: the shape
equation includes an intercept and two predictors (snr, rlp);
the scale equation includes an intercept and three predictors
(snr, rlp, scaling).
Predictors are defined as follows:
snrSignal-to-noise ratio of the absolute partial correlations, computed as the mean divided by the standard deviation of the absolute edge weights. Used in both equations.
rlpReciprocal log of node count, ,
where is the number of nodes. Used in both equations.
scalingStandard error of partial correlations,
, where is the sample size and
is the number of nodes. Used in the scale equation only.
residualsA numeric vector of residuals from the fitted SUR
equation, used by weibull_parameters to introduce
empirically grounded variability when bootstrap = TRUE.
Absolute partial correlations from 222 deduplicated empirical networks (Huth et al., 2025) were fitted to Beta, Gamma, log-normal, and Weibull distributions via maximum likelihood. Weibull provided the best fit most consistently: it outperformed each alternative by more than 2 log-likelihood units far more often than the reverse (vs. Beta: 56–0; vs. Gamma: 15–2; vs. log-normal: 155–13; vs. Exponential: 54–0).
The resulting Weibull shape and scale parameters were then jointly modelled
as a function of network descriptors using Seemingly Unrelated Regression
(systemfit), which accounts for correlated residuals between the
shape and scale equations across networks (residual correlation = 0.261).
Prior to fitting, networks with fewer than eight nodes (), more
than 300,000 observations, or fewer than one observation per edge
() were excluded, as Huth et al. (2025) demonstrated
that networks in this regime show the weakest statistical evidence for edge
presence or absence, yielding unstable parameter estimates (n = 28 excluded).
This left n = 194 networks for analysis. Shape and scale parameters were
modelled on their original scales.
The two equations have an asymmetric structure. Shape — which governs the
concentration of the edge weight distribution — is predicted from snr
and rlp only, reflecting that the shape of the distribution is a
property of the network's signal structure independent of sampling precision.
Scale — which governs typical edge weight magnitude — additionally includes
scaling, as the expected size of partial correlations is directly
affected by estimation precision. Dropping scaling from the shape
equation costs and yields cleaner inference; all
shape predictors are significant under HC3-robust standard errors
(both ).
Variance inflation factors for the scale equation ( 1.60)
confirmed the absence of problematic multicollinearity; the shape equation
contains only two predictors with no multicollinearity concern.
Breusch-Pagan tests indicated statistically significant heteroskedasticity
in both equations; however, all predictors remained significant under
HC3-robust standard errors, indicating no material effect on inference.
Shape residuals were normally distributed (Shapiro-Wilk W = 0.990,
p = 0.173). Scale residuals showed a modest departure from normality
(Shapiro-Wilk W = 0.973, p < 0.001), consistent with slight right skew
in the scale outcome and test sensitivity at n = 194 rather than a
substantive violation, as confirmed by visual inspection of the residual
histogram. Model fit was strong: shape (RMSE = 0.048);
scale (RMSE = 0.011).
Huth, K. B. S., Haslbeck, J. M. B., Keetelaar, S., Van Holst, R. J., & Marsman, M. (2025). Statistical evidence in psychological networks. Nature Human Behaviour.
data("weibull_weights") # Inspect SUR coefficients for each equation weibull_weights$shape$coefficients weibull_weights$scale$coefficients # Predict Weibull parameters for a new network weibull_parameters(nodes = 12, sample_size = 500)data("weibull_weights") # Inspect SUR coefficients for each equation weibull_weights$shape$coefficients weibull_weights$scale$coefficients # Predict Weibull parameters for a new network weibull_parameters(nodes = 12, sample_size = 500)