With the binary response matrix, semantic networks can now be estimated. In the last few years, various computational approaches have been proposed to estimate semantic networks from verbal fluency data (Goñi et al., 2011; Kenett et al., 2013; Lerner, Ogrocki, & Thomas, 2009; Zemla & Austerweil, 2018). Moreover, there are a number of packages in R that are capable of estimating semantic networks [e.g., corpustools; Welbers & van Atteveldt (2018)] and networks more generally [e.g., igraph; Csárdi & Nepusz (2006) and qgraph; Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom (2012)]. As described earlier, this tutorial follows the approach developed by Kenett and colleagues to estimate semantic networks based on correlations of the association profiles of verbal fluency responses across the sample (Borodkin, Kenett, Faust, & Mashal, 2016; Kenett, Beaty, Silvia, Anaki, & Faust, 2016; Kenett et al., 2013).
The SemNetCleaner, SemNeT, and NetworkToolbox packages in R will be used to execute this stage of the pipeline. The SemNetCleaner package will be used to further process the binary response matrix into a finalized format for network estimation. The SemNeT package (Christensen, 2019) contains several functions for the analysis of semantic networks, including a function to compute the association profiles of verbal fluency responses. The NetworkToolbox package (Christensen, 2019) contains functions to estimate and analyze networks more generally. This package will be used to estimate the semantic networks from the association profile matrices.
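If the packages are not already loaded, they can be attached with the following code (assuming they have been installed):
# Load the packages used in this stage of the pipeline
library(SemNetCleaner)
library(SemNeT)
library(NetworkToolbox)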
Kenett and colleagues’ approach begins by splitting the binary response matrix into groups. Next, for each group, only responses that are provided by two or more participants are retained (e.g., Borodkin et al., 2016). This is done to minimize spurious associations driven by idiosyncratic responses in the sample. Finally, the binary response matrices are “equated”: responses are matched across groups so that each group retains only the responses that are also given by every other group (Kenett et al., 2013).
This step is particularly important because groups may have different numbers of responses (i.e., nodes), which can introduce confounding factors [e.g., biased comparison of network parameters; van Wijk, Stam, & Daffertshofer (2010)]. By equating the binary response matrices, the networks can be compared using the same set of nodes, ruling out the alternative explanation that differences in network structure are merely due to differences in the number of nodes (Borodkin et al., 2016). Once this process is complete, the networks can be estimated using a network estimation method.
We continue with the example dataset analyzed by Christensen, Kenett, Cotter, Beaty, & Silvia (2018), who estimated and compared the semantic networks of two groups: people low and high in openness to experience. While we focus on estimating and comparing two groups, the functions in our R packages are capable of handling more than two groups.
The binary response matrix (i.e., corr.clean$binary) from the preprocessing step contains the responses for both the low and high openness to experience groups. To continue with our pipeline, we need to separate the binary response matrix into two groups. This can be done using the Group variable with the following code:
# Attach 'Group' variable to the binary response matrix
behav <- cbind(open.animals$Group, corr.clean$binary)
# Create low and high openness to experience response matrices
low <- behav[which(behav[,1]==1),-1]
high <- behav[which(behav[,1]==2),-1]
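Before continuing, a quick check (a sketch using the objects just created) confirms that the split matches the group sizes in the data:
# Group sizes in the original data
table(open.animals$Group)
# Number of participants (rows) in each group's matrix
nrow(low)
nrow(high)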
The resulting matrices are the binary response matrices for the low and high openness to experience groups. For users who would like to use other network estimation methods that are not included in R, these binary response matrices can be exported using the following code:
# Save binary response matrices
write.csv(low, "low_BRM.csv", row.names = TRUE)
write.csv(high, "high_BRM.csv", row.names = TRUE)
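If these files need to be read back into R later, the row names can be restored (a sketch assuming the files are in the current working directory):
# Re-import a saved binary response matrix
low.brm <- as.matrix(read.csv("low_BRM.csv", row.names = 1, check.names = FALSE))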
Continuing with our pipeline, we next remove idiosyncratic responses to minimize the number of spurious associations in the network. This can be done with the following code:
# Finalize matrices so that each response
# has been given by at least two participants
final.low <- finalize(low, minCase = 2)
final.high <- finalize(high, minCase = 2)
The finalize function removes responses (columns) that are not given by a minimum number of participants. This minimum can be set with the minCase argument, which defaults to 2, consistent with our approach; however, users may wish to set a higher minimum to further guard against spurious associations. Next, the responses are equated to control for differences in the number of nodes. To do this, the following code can be used:
# Equate the responses across the networks
eq <- equate(final.low, final.high)
equate.low <- eq$final.low
equate.high <- eq$final.high
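As a quick verification (a sketch using the objects just created), the equated matrices should now contain identical sets of responses (columns):
# The equated matrices should share the same response columns
identical(colnames(equate.low), colnames(equate.high))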
The equate function will match the responses across any number of groups. If there are more than two groups, they simply need to be entered into the function, separated by commas (see the sketch following this paragraph). The output of equate consists of binary response matrices that have been matched across groups. Each group’s matrix is nested in the output and labeled with the name of the object used as input (e.g., input = final.low and output = eq$final.low).
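For example, with a hypothetical third group (the object final.mid below is not part of this dataset), the call would simply include it:
# Hypothetical example with three groups ('final.mid' is illustrative only)
eq3 <- equate(final.low, final.mid, final.high)
# Each group's equated matrix is labeled after its input object
equate3.mid <- eq3$final.mid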
Now that the binary response matrix has been separated into two groups based on our behavioral measure and the responses have been equated between the two groups, the networks can be estimated.
The network estimation approach that Kenett and colleagues apply produces what are called correlation-based networks (Zemla & Austerweil, 2018). They are called correlation-based networks because the network is estimated from how often responses co-occur across the group (Borodkin et al., 2016; Kenett et al., 2013). Common association measures that have been used with this approach are Pearson’s pairwise correlation (e.g., Kenett et al., 2013) and cosine similarity (e.g., Christensen, Kenett, Cotter, et al., 2018). Thus, the nodes in these networks represent verbal fluency responses and the edges represent the associations between them.
In our example of the work by Christensen, Kenett, Cotter, et al. (2018), the cosine similarity was used to compute the association profiles of the responses. We can apply this similarity measure with the following code:
# Compute cosine similarity for the 'low' and
# 'high' equated binary response matrices
cosine.low <- similarity(equate.low, method = "cosine")
cosine.high <- similarity(equate.high, method = "cosine")
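To get a sense of the output (a sketch using the objects just created), the dimensions and a corner of the association matrix can be inspected:
# One row and one column per equated response
dim(cosine.low)
cosine.low[1:4, 1:4]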
The similarity function in the SemNeT package computes an association matrix from the equated response matrices. The method argument selects the association measure that is used. Here, we use the "cosine" similarity measure; however, a number of other similarity measures, such as Pearson’s correlation (method = "cor"), can be applied (see ?similarity for more options). With these association matrices, a network estimation method can be applied.
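As a sketch of the alternative mentioned above, the same step using Pearson’s correlation would be:
# Alternative association measure: Pearson's correlation
cor.low <- similarity(equate.low, method = "cor")
cor.high <- similarity(equate.high, method = "cor")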
To further minimize spurious relations, we apply a filter over each association matrix. The purpose of applying a network filtering method is to minimize spurious associations and retain the most relevant information in the network (Tumminello, Aste, Di Matteo, & Mantegna, 2005). Network estimation methods have certain criteria for retaining edges (e.g., statistical significance), which creates a more parsimonious model (Barfuss, Massara, Di Matteo, & Aste, 2016). For Kenett and colleagues’ approach, a family of network estimation methods known as Information Filtering Networks (Barfuss et al., 2016; Christensen, Kenett, Aste, Silvia, & Kwapil, 2018) has been applied.
The Information Filtering Network methods apply various geometric constraints to the associations in the data to identify the most relevant connections (i.e., edges) between nodes in a network (Christensen, Kenett, Aste, et al., 2018). Common Information Filtering Network approaches are the minimal spanning tree (Mantegna, 1999), planar maximally filtered graph (Tumminello et al., 2005), triangulated maximally filtered graph (Massara, Di Matteo, & Aste, 2016), and maximally filtered clique forest (Massara & Aste, 2019).
In Christensen, Kenett, Cotter, et al. (2018), the triangulated maximally filtered graph (TMFG) method was applied. The TMFG algorithm identifies the most important edges in a network by first connecting the four nodes that have the highest sum of edge weights (i.e., associations) across all nodes. Next, the algorithm iteratively adds the node that maximizes the sum of its edge weights to three nodes already connected in the network. The algorithm continues until every node is connected in the network (Massara et al., 2016; Golino et al., 2018).
The resulting network has 3n − 6 edges (where n is the number of nodes) and is a planar network [i.e., it could be depicted on a theoretical plane without any edges crossing; Tumminello et al. (2005)]. Because the number of edges is a function of the number of nodes, networks with the same number of nodes will have the same number of edges. This is advantageous for comparing network structures because it reduces the confound of differences between networks being due to differences in the number of edges (Christensen, Kenett, Aste, et al., 2018; van Wijk et al., 2010). The TMFG method can be implemented, using the NetworkToolbox package in R,1 with the following code:
# Estimate 'low' and 'high' openness to experience networks
net.low <- TMFG(cosine.low)$A
net.high <- TMFG(cosine.high)$A
The output of these functions is a TMFG filtered semantic network for the low and high openness to experience groups. To save these networks outside of R so that other programs can be applied, the following code can be used:
# Save the networks
write.csv(net.low, "network_low.csv", row.names = FALSE)
write.csv(net.high, "network_high.csv", row.names = FALSE)
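Because the TMFG retains exactly 3n − 6 edges, a quick check (a sketch using the objects above) is to compare the number of nonzero entries in the upper triangle of each network against this value:
# Number of edges in the 'low' openness network
sum(net.low[upper.tri(net.low)] != 0)
# Expected number of edges: 3n - 6
3 * ncol(net.low) - 6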
These networks are weighted, meaning that the edges correspond to the magnitude of association between nodes. It’s common, however, for the edges to be converted to binary values [i.e., 1 = edge present and 0 = edge absent; Abbott, Austerweil, & Griffiths (2015); Kenett et al. (2013); Kenett, Anaki, & Faust (2014)]. To convert a weighted network into an unweighted one, the binarize function can be used.
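A minimal sketch, assuming the binarize function from NetworkToolbox takes the weighted adjacency matrix as its first argument:
# Convert the weighted networks to unweighted (binary) networks
# (assumes binarize() from NetworkToolbox with its default arguments)
binary.low <- binarize(net.low)
binary.high <- binarize(net.high)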
It’s worth noting that, despite differences in edge weights, weighted and unweighted semantic networks have been shown to typically correspond to one another (Abbott et al., 2015). When computing network measures in SemNeT, the edges are binarized by default, meaning that the statistics are computed on the unweighted networks. There are options, however, to compute weighted measures when the networks are left weighted; therefore, it’s often preferable to keep the networks weighted.
In this section, we discussed and applied one approach for estimating group-based semantic networks using functions in SemNetCleaner, SemNeT, and NetworkToolbox. In this process, the binary response matrix was split into groups, idiosyncratic responses were removed, and group binary response matrices were equated (using SemNetCleaner). Then, a similarity measure was applied to these group matrices (using SemNeT) and a network estimation method was applied (using NetworkToolbox).
Notably, there are other approaches for estimating semantic networks (e.g., Zemla & Austerweil, 2018). These other approaches fit seamlessly into our SemNA pipeline. For example, the binary response matrix from the preprocessing step can be used in another network estimation procedure. The output of the network estimation step is one or more networks that are ready to be analyzed in the statistical analysis step of the pipeline. Effectively, this makes the network estimation step in the pipeline exchangeable with any other network estimation procedure.
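For instance (a sketch with a hypothetical file name), a network estimated outside of R can be read back in as an adjacency matrix and passed along to the statistical analysis step:
# Hypothetical example: import an externally estimated network as an adjacency matrix
ext.net <- as.matrix(read.csv("external_network.csv", row.names = 1, check.names = FALSE))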
Note that other filtering methods can also be applied using the NetworkToolbox package, including the minimal spanning tree, maximally filtered clique forest, and several thresholding methods.