Multilevel Approaches to Modeling Growth and Change in Latent Variables
Rich Jones
2024-01-27
Multilevel Curve of Factors
Sometimes there is too much longitudinal data to keep the data “wide”. Growth can be modeled more easily if we use a “long” data framework and a multilevel approach to growth.
But how to specify the model?
Note: This presentation treats factor indicators as continuous indicators.
I will work through an example using CESD responses collected from the New Haven Site of the EPESE (Established Populations for the Epidemiologic Study of the Elderly) study. The four-category responses of six CESD questions will be treated as continuous indicators of a common underlying trait.
These data are public and can be obtained from ICPSR.
Descriptives
psych::describe(df, type = 3, skew = FALSE, ranges = TRUE, na.rm = TRUE)
For my first model, I’m going to use an outcome yz which is a baseline normalized z-score based on the observed items. It’s not a latent variable. But by normalizing at the baseline, it will give an approximation of the results we should expect when we move to the latent variable modeling framework.
yz0, yz3, and yz6 are the normalized scores for the mean of 6 CESD questions observed at the baseline and 3 and 6 year follow-up of the EPESE.
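A minimal sketch of how a baseline-normalized score like yz could be computed, assuming it is the mean of the six items standardized with the baseline mean and standard deviation. The data frame `toy`, its column names, and the simulated responses are hypothetical; the actual construction of yz is not shown in these slides.

```r
# Hypothetical toy data: 3 waves, 6 four-category items (NOT the EPESE data)
set.seed(1)
n <- 200
toy <- expand.grid(id = seq_len(n), wave = c(0, 3, 6))
item_names <- paste0("y", 1:6)
for (v in item_names) {
  toy[[v]] <- sample(1:4, nrow(toy), replace = TRUE)
}

# Mean of the six items at each occasion
toy$ymean <- rowMeans(toy[, item_names], na.rm = TRUE)

# Standardize every occasion using the BASELINE mean and SD only
base_mean <- mean(toy$ymean[toy$wave == 0])
base_sd   <- sd(toy$ymean[toy$wave == 0])
toy$yz    <- (toy$ymean - base_mean) / base_sd

# Baseline scores have mean 0 and SD 1; later waves are free to drift
round(tapply(toy$yz, toy$wave, mean), 3)
```

Because only the baseline moments are used, changes at years 3 and 6 are expressed in baseline standard deviation units.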
Next I’ll look at a multilevel modeling approach to measuring change in CESD symptoms. This means using the data in LONG format (one row per person per occasion) rather than the WIDE format used for the LGCM. I am still looking at a standardized z-score for CESD symptoms: no latent variables yet for CESD.
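The move from WIDE to LONG can be sketched in base R. The data frame `wide` and its values are made up for illustration. The time codes 0, 0.3, and 0.6 are an assumption based on the range of td in the Mplus output (0 to 0.6), which suggests time in decades. Dropping rows with a missing yz is also an assumption about how the long file was built.

```r
# Hypothetical wide data: one row per person, NA where a wave was missed
wide <- data.frame(
  id  = 1:3,
  yz0 = c(0.1, -0.5, 1.2),
  yz3 = c(0.3,   NA, 1.0),
  yz6 = c(0.2, -0.4,  NA)
)

# Stack yz0, yz3, yz6 into a single yz column indexed by td
long <- reshape(
  wide,
  direction = "long",
  varying   = c("yz0", "yz3", "yz6"),
  v.names   = "yz",
  timevar   = "td",
  times     = c(0, 0.3, 0.6),
  idvar     = "id"
)

# Keep only observed person-occasions and sort by person, then time
long <- long[!is.na(long$yz), ]
long <- long[order(long$id, long$td), ]
rownames(long) <- NULL
long
```

In this layout the number of rows is the number of observed person-occasions, not the number of persons.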
TITLE: Your title goes here
DATA: FILE = "cesd.dat";
VARIABLE:
NAMES = id td yz female agec70;
MISSING=.;
Mplus Multilevel model
TITLE: MLM CESD score
DATA: FILE = cesd.dat ;
VARIABLE: NAMES = id td yz female agec70;
MISSING = . ;
WITHIN = td ;
BETWEEN = agec70 female ;
CLUSTER = id ;
ANALYSIS: TYPE = TWOLEVEL RANDOM ;
OUTPUT: TECH1;
MODEL: %WITHIN%
s | yz on td ;
%BETWEEN%
yz on agec70 female ;
s on agec70 female ;
yz with s ;
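The input above can be read as the following two-level model. This is my reading of the Mplus statements, written with person index \(p\) and occasion index \(t\); the Greek symbols are labels introduced here, not Mplus names.

```latex
\begin{aligned}
\text{Within:}\quad
  & yz_{tp} = \beta_{0p} + s_p \, td_{tp} + e_{tp},
    \qquad e_{tp} \sim N(0, \sigma^2_e) \\
\text{Between:}\quad
  & \beta_{0p} = \gamma_{00} + \gamma_{01}\, agec70_p + \gamma_{02}\, female_p + u_{0p} \\
  & s_p = \gamma_{10} + \gamma_{11}\, agec70_p + \gamma_{12}\, female_p + u_{1p} \\
  & \operatorname{Var}(u_{0p}) = \psi_{00}, \quad
    \operatorname{Var}(u_{1p}) = \psi_{11}, \quad
    \operatorname{Cov}(u_{0p}, u_{1p}) = \psi_{01}
\end{aligned}
```

Counting parameters under this reading gives one within-level residual variance, two intercepts, four regression coefficients, and three between-level (co)variances, for a total of 10.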
Mplus VERSION 8.11 (Mac)
MUTHEN & MUTHEN
04/10/2025 12:32 PM
INPUT INSTRUCTIONS
TITLE: MLM CESD score
DATA: FILE = cesd.dat ;
VARIABLE: NAMES = id td yz female agec70;
MISSING = . ;
WITHIN = td ;
BETWEEN = agec70 female ;
CLUSTER = id ;
ANALYSIS: TYPE = TWOLEVEL RANDOM ;
OUTPUT: TECH1;
MODEL: %WITHIN%
s | yz on td ;
%BETWEEN%
yz on agec70 female ;
s on agec70 female ;
yz with s ;
*** WARNING
One or more individual-level variables have no variation within a
cluster for the following clusters.
Variable Cluster IDs with no within-cluster variation
YZ 762 765 769 50 776 1497 1510 792 1538 1559 425 96 1590 836 1665 1666 1682 853
860 1716 457 889 899 1773 1777 1778 1783 1800 930 1843 1846 255 1872 1874 257
1883 259 1899 1923 501 969 1939 1969 990 1991 1992 1997 1999 1004 513 2024 1015
524 2070 2090 2092 2109 2159 2162 2165 564 2236 2242 25 2318 2330 2338 329 601
2371 2374 1179 603 2382 2393 611 2398 2402 2411 331 2452 8 159 2471 1241 653
163 2542 1270 1283 1293 1295 1301 2598 668 2617 2620 1316 2623 2625 1322 1323
2632 1328 2648 2650 2679 2684 173 2686 2690 361 2707 1356 363 709 2750 180 2783
1399 1404 195 740 396 1419 1948 2634 346 347 1477 1484 747 1561 781 384 1636
1641 812 1651 1663 1684 1692 1694 1696 399 843 1724 1737 1744 1745 857 1760
865 414 418 895 924 1869 23 1903 218 108 111 997 498 2023 2034 2045 2061 2067
2082 2086 1051 1062 526 2173 2179 260 122 2199 2216 1120 2252 1149 2290 2291
1156 1157 276 2321 1169 2325 278 279 2353 2357 284 588 2392 132 598 2432 2444
2469 615 2483 298 1258 1265 1278 2525 2526 630 2539 2547 2554 2577 639 2586
648 1325 1342 34 2680 675 2694 1383 1393 1396 703 2793 704 2798 73 1445
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
MLM CESD score
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 6165
Number of dependent variables 1
Number of independent variables 3
Number of continuous latent variables 1
Observed dependent variables
Continuous
YZ
Observed independent variables
TD FEMALE AGEC70
Continuous latent variables
S
Variables with special functions
Cluster variable ID
Within variables
TD
Between variables
FEMALE AGEC70
Estimator MLR
Information matrix OBSERVED
Maximum number of iterations 100
Convergence criterion 0.100D-05
Maximum number of EM iterations 500
Convergence criteria for the EM algorithm
Loglikelihood change 0.100D-02
Relative loglikelihood change 0.100D-05
Derivative 0.100D-03
Minimum variance 0.100D-03
Maximum number of steepest descent iterations 20
Maximum number of iterations for H1 2000
Convergence criterion for H1 0.100D-03
Optimization algorithm EMA
Input data file(s)
cesd.dat
Input data format FREE
SUMMARY OF DATA
Number of missing data patterns 1
Number of clusters 2762
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value 0.100
PROPORTION OF DATA PRESENT
Covariance Coverage
YZ TD FEMALE AGEC70
________ ________ ________ ________
YZ 1.000
TD 1.000 1.000
FEMALE 1.000 1.000 1.000
AGEC70 1.000 1.000 1.000 1.000
UNIVARIATE SAMPLE STATISTICS
UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS
Variable/ Mean/ Skewness/ Minimum/ % with Percentiles
Sample Size Variance Kurtosis Maximum Min/Max 20%/60% 40%/80% Median
YZ 0.028 1.339 -0.908 28.34% -0.908 -0.377 -0.112
6165.000 1.022 1.522 3.867 0.44% -0.112 0.684
TD 0.236 0.395 0.000 44.25% 0.000 0.000 0.300
6165.000 0.057 -1.306 0.600 23.07% 0.300 0.600
FEMALE 0.584 -0.342 0.000 41.56% 0.000 0.000 1.000
2762.000 0.243 -1.883 1.000 58.44% 1.000 1.000
AGEC70 0.364 0.673 -0.300 34.65% -0.300 0.200 0.200
2762.000 0.419 -0.681 1.700 8.76% 0.200 1.200
THE MODEL ESTIMATION TERMINATED NORMALLY
MODEL FIT INFORMATION
Number of Free Parameters 10
Loglikelihood
H0 Value -8447.120
H0 Scaling Correction Factor 1.2863
for MLR
Information Criteria
Akaike (AIC) 16914.241
Bayesian (BIC) 16981.507
Sample-Size Adjusted BIC 16949.730
(n* = (n + 2) / 24)
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
Within Level
Residual Variances
YZ 0.595 0.028 21.362 0.000
Between Level
S ON
AGEC70 0.158 0.083 1.891 0.059
FEMALE -0.013 0.091 -0.142 0.887
YZ ON
AGEC70 0.113 0.030 3.813 0.000
FEMALE 0.217 0.036 5.951 0.000
YZ WITH
S -0.046 0.075 -0.609 0.543
Intercepts
YZ -0.177 0.027 -6.598 0.000
S 0.196 0.069 2.847 0.004
Residual Variances
YZ 0.400 0.040 9.943 0.000
S 0.328 0.220 1.493 0.135
QUALITY OF NUMERICAL RESULTS
Condition Number for the Information Matrix 0.230E-02
(ratio of smallest to largest eigenvalue)
TECHNICAL 1 OUTPUT
PARAMETER SPECIFICATION FOR WITHIN
NU
YZ TD
________ ________
0 0
LAMBDA
YZ TD
________ ________
YZ 0 0
TD 0 0
THETA
YZ TD
________ ________
YZ 0
TD 0 0
ALPHA
YZ TD
________ ________
0 0
BETA
YZ TD
________ ________
YZ 0 0
TD 0 0
PSI
YZ TD
________ ________
YZ 1
TD 0 0
PARAMETER SPECIFICATION FOR BETWEEN
NU
YZ FEMALE AGEC70
________ ________ ________
0 0 0
LAMBDA
S YZ FEMALE AGEC70
________ ________ ________ ________
YZ 0 0 0 0
FEMALE 0 0 0 0
AGEC70 0 0 0 0
THETA
YZ FEMALE AGEC70
________ ________ ________
YZ 0
FEMALE 0 0
AGEC70 0 0 0
ALPHA
S YZ FEMALE AGEC70
________ ________ ________ ________
2 3 0 0
BETA
S YZ FEMALE AGEC70
________ ________ ________ ________
S 0 0 4 5
YZ 0 0 6 7
FEMALE 0 0 0 0
AGEC70 0 0 0 0
PSI
S YZ FEMALE AGEC70
________ ________ ________ ________
S 8
YZ 9 10
FEMALE 0 0 0
AGEC70 0 0 0 0
STARTING VALUES FOR WITHIN
NU
YZ TD
________ ________
0.000 0.000
LAMBDA
YZ TD
________ ________
YZ 1.000 0.000
TD 0.000 1.000
THETA
YZ TD
________ ________
YZ 0.000
TD 0.000 0.000
ALPHA
YZ TD
________ ________
0.000 0.000
BETA
YZ TD
________ ________
YZ 0.000 0.000
TD 0.000 0.000
PSI
YZ TD
________ ________
YZ 0.511
TD 0.000 0.028
STARTING VALUES FOR BETWEEN
NU
YZ FEMALE AGEC70
________ ________ ________
0.000 0.000 0.000
LAMBDA
S YZ FEMALE AGEC70
________ ________ ________ ________
YZ 0.000 1.000 0.000 0.000
FEMALE 0.000 0.000 1.000 0.000
AGEC70 0.000 0.000 0.000 1.000
THETA
YZ FEMALE AGEC70
________ ________ ________
YZ 0.000
FEMALE 0.000 0.000
AGEC70 0.000 0.000 0.000
ALPHA
S YZ FEMALE AGEC70
________ ________ ________ ________
0.000 0.028 0.000 0.000
BETA
S YZ FEMALE AGEC70
________ ________ ________ ________
S 0.000 0.000 0.000 0.000
YZ 0.000 0.000 0.000 0.000
FEMALE 0.000 0.000 0.000 0.000
AGEC70 0.000 0.000 0.000 0.000
PSI
S YZ FEMALE AGEC70
________ ________ ________ ________
S 1.000
YZ 0.000 0.511
FEMALE 0.000 0.000 0.120
AGEC70 0.000 0.000 0.000 0.192
Beginning Time: 12:32:28
Ending Time: 12:32:29
Elapsed Time: 00:00:01
MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA 90066
Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com
Copyright (c) 1998-2024 Muthen & Muthen
Collect fits
model1Results <- MplusAutomation::readModels(target = "model1.out")
model2Results <- MplusAutomation::readModels(target = "model2.out")
Fits <- MplusAutomation::SummaryTable(
  list(model1Results, model2Results),
  keepCols = c("Title", "Parameters", "NDependentVars", "NContinuousLatentVars",
               "Observations", "LL", "AIC", "BIC", "ChiSqM_Value", "ChiSqM_DF",
               "RMSEA_Estimate", "SRMR", "CFI")
)
FitsT <- as.data.frame(t(Fits))
# Add the variable names as a column
FitsT$Feature <- rownames(FitsT)
# Adjust the column names
colnames(FitsT) <- c("Model 1", "Model 2", "Feature")
# Reset the rownames
rownames(FitsT) <- NULL
# Reorder so that Feature comes first
FitsT <- FitsT[c("Feature", "Model 1", "Model 2")]
Compare fits
| Feature | Model 1 | Model 2 |
|---|---|---|
| Title | LGCM CESD score | MLM CESD score |
| Parameters | 10 | 10 |
| NDependentVars | 3 | 1 |
| NContinuousLatentVars | 2 | 1 |
| Observations | 2762 | 6165 |
| LL | -8447.12 | -8447.12 |
| AIC | 16914.24 | 16914.24 |
| BIC | 16973.48 | 16981.51 |
| ChiSqM_Value | 6.378 | NA |
| ChiSqM_DF | 5 | NA |
| RMSEA_Estimate | 0.01 | NA |
| SRMR | 0.013 | NA |
| CFI | 0.998 | NA |
These models are equivalent by LL. The BICs are discussed on the next slide.
About fits
If you looked only at the BIC, you might think there was a small advantage for the LGCM over the MLM. But the loglikelihood (\(LL\)) values show that these models are equivalent. The BIC difference comes entirely from how the sample size is counted.
The AIC is \(-2LL + 2r\), where \(r\) is the number of free parameters. The BIC is \(-2LL + r \ln(N)\), where \(N\) is the sample size. The BIC and the AIC are equal only when \(\ln(N) = 2\), that is, when \(N = e^2 \approx 7.4\). Moreover, because of how the data are structured for the MLM analysis, Mplus takes the sample size to be 6165 (observed person-occasions) for the MLM and only 2762 (persons) for the LGCM. With \(r = 10\) in both models, the BICs differ by \(10 \ln(6165/2762) \approx 8.03\), which is exactly the gap between 16981.51 and 16973.48. Therefore, the BICs are not comparable between the LGCM and MLM models.
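As a check, the information criteria in the fit table can be reproduced from the loglikelihood using the standard definitions AIC \(= -2LL + 2r\) and BIC \(= -2LL + r \ln(N)\). The object names below are mine; the numbers are copied from the Mplus output and the fit table.

```r
# Values taken from the Mplus output and the fit table above
LL <- -8447.120   # loglikelihood, identical for the LGCM and the MLM
r  <- 10          # number of free parameters in both models

aic <- -2 * LL + 2 * r
bic <- function(N) -2 * LL + r * log(N)   # log() is the natural log in R

aic                      # AIC does not depend on N
bic(2762)                # LGCM: N = number of persons
bic(6165)                # MLM:  N = number of person-occasions
bic(6165) - bic(2762)    # equals r * log(6165 / 2762); not a difference in fit
```

The only input that changes between the two BIC values is \(N\), which is why the gap says nothing about relative fit.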
Let’s compare parameter estimates and see what’s the same and what’s different.
If the column BetweenWithin is NA, that means the result is from the LGCM.
Means and Intercepts parameters
| paramHeader | param | est | se | est_se | pval | PARAMETER | BetweenWithin |
|---|---|---|---|---|---|---|---|
| Intercepts | YZ0 | 0.000 | 0.000 | 999.000 | 999.000 | Intercepts YZ0 | NA |
| Intercepts | YZ3 | 0.000 | 0.000 | 999.000 | 999.000 | Intercepts YZ3 | NA |
| Intercepts | YZ6 | 0.000 | 0.000 | 999.000 | 999.000 | Intercepts YZ6 | NA |
| Intercepts | I | -0.177 | 0.030 | -5.871 | 0.000 | Intercepts I | NA |
| Intercepts | S | 0.196 | 0.075 | 2.607 | 0.009 | Intercepts S | NA |
| Intercepts | YZ | -0.177 | 0.027 | -6.598 | 0.000 | Intercepts YZ | Between |
| Intercepts | S | 0.196 | 0.069 | 2.847 | 0.004 | Intercepts S | Between |
The intercepts for the LEVEL (Intercept I for the LGCM and Intercept YZ for the MLM) are the same, but the standard errors are smaller for the MLM. The same holds for the intercepts of the SLOPES.
Variances parameters
| paramHeader | param | est | se | est_se | pval | PARAMETER | BetweenWithin |
|---|---|---|---|---|---|---|---|
| Residual.Variances | YZ0 | 0.595 | 0.021 | 28.943 | 0.000 | Residual.Variances YZ0 | NA |
| Residual.Variances | YZ3 | 0.595 | 0.021 | 28.943 | 0.000 | Residual.Variances YZ3 | NA |
| Residual.Variances | YZ6 | 0.595 | 0.021 | 28.943 | 0.000 | Residual.Variances YZ6 | NA |
| Residual.Variances | I | 0.400 | 0.031 | 12.916 | 0.000 | Residual.Variances I | NA |
| Residual.Variances | S | 0.327 | 0.183 | 1.788 | 0.074 | Residual.Variances S | NA |
| Residual.Variances | YZ | 0.595 | 0.028 | 21.362 | 0.000 | Residual.Variances YZ | Within |
| Residual.Variances | YZ | 0.400 | 0.040 | 9.943 | 0.000 | Residual.Variances YZ | Between |
| Residual.Variances | S | 0.328 | 0.220 | 1.493 | 0.135 | Residual.Variances S | Between |
By constraining the residual variances to be equal across occasions in the LGCM, we obtain essentially the same estimates from the LGCM as from the MLM (the slope variance differs only in the third decimal place). For these parameters the standard errors are smaller for the LGCM.
Multilevel factor model
Now I’ll turn to latent variable modeling for depression.
First I’ll use a multilevel confirmatory factor analysis model (MLCFA). Many examples of this kind of a model can be found in the literature. However, this model comes with some difficulties. A factor measurement model must be specified at the WITHIN and BETWEEN level. These need not be the same, but I am ignorant regarding guidance on whether, how, and why these measurement models should be different. I keep them the same.
Additionally, the MLCFA model allows for residual variances at the factor indicator level to be distributed at the WITHIN and BETWEEN levels. I am just as confused about what to do about that. As I will describe, I don’t allow for factor indicator residual variances at the BETWEEN level in my parameterization.
Modeling considerations: MLCFA
This is the model set-up for the multilevel factor analysis with regression on time in study as a random effect. Observed indicators of depression (y1-y6) are both within- and between-level variables (and so are not listed under WITHIN or BETWEEN). A within-level factor (fw) is specified and identified by fixing the first factor loading. (I run a series of preliminary models to find the loading that returns a common latent variable variance of 1.0 in a total, single-level, baseline-only model, and fix the loading to that value.) The fixed parameters are shown in purple with an “@” label. fw is regressed on time, and this regression is declared a random effect. If we had within-level (i.e., time-varying) covariates, we would include them as illustrated with the dashed box and the regressions labeled “xw”.
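One way to see why such a loading value exists: in a single-factor model, a solution with the first loading fixed at 1 and a free factor variance implies the same covariance matrix as a solution with unit factor variance and every loading multiplied by the factor standard deviation. The sketch below uses made-up numbers, not estimates from the EPESE data, and it illustrates the rescaling idea rather than the exact preliminary models that were run.

```r
# Hypothetical single-factor solution (NOT estimates from the EPESE data)
lambda_free <- c(1.00, 0.95, 1.20, 1.15, 1.05, 0.80)  # first loading fixed at 1
psi_free    <- 0.26                                    # factor variance under that scaling
theta       <- diag(c(0.29, 0.28, 0.28, 0.62, 0.56, 1.23))  # residual variances

# Rescale so that the factor variance is 1
lambda_std <- lambda_free * sqrt(psi_free)
psi_std    <- 1

# Model-implied covariance matrix: psi * lambda lambda' + theta
implied <- function(lambda, psi, theta) psi * tcrossprod(lambda) + theta

# Both scalings imply the same covariance matrix
all.equal(implied(lambda_free, psi_free, theta),
          implied(lambda_std,  psi_std,  theta))

# The value the first loading would be fixed to under the unit-variance scaling
lambda_std[1]
```

Fixing the first loading at `lambda_std[1]` and freeing the factor variance would then return a variance of 1 in the model used to set the scale.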
At the between level, I have a between level common latent variable (fb). We assume the factor loadings at the between level and within level are equal. (This assumption is not necessary but I don’t have a reason to do anything else. I am not sure of the implications of having the measurement slopes be equal versus different at the BETWEEN and WITHIN levels). The item intercepts are modeled at the BETWEEN level. All but one are freely estimated. I fix one so that a mean for fb will be identified. The value to which the first item intercept is fixed is derived from the same preliminary models described previously for setting the metric of the factor loadings. The first item intercept is fixed to a value that returns a latent variable mean of 0 at baseline in a single level baseline only model.
The indicators have residual variances at the WITHIN and BETWEEN levels. I will fix these to 0 at the BETWEEN level because I am concerned that not doing so will rob the latent slope S of variance. But, we may have to play around with that.
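Written out, the set-up described above amounts to the following equations, with item index \(j\), occasion index \(t\), and person index \(p\). This is a sketch of my parameterization as described in the text (equal loadings across levels, between-level item residual variances fixed to 0). The Greek symbols are labels introduced here.

```latex
\begin{aligned}
y_{jtp} &= \nu_j + \lambda_j \,\bigl(fb_p + fw_{tp}\bigr) + \varepsilon_{jtp},
  \qquad j = 1, \dots, 6 \\
fw_{tp} &= s_p \, t_{tp} + \zeta_{tp} \\
fb_p &= \alpha_{fb} + \gamma_{1}\, agec70_p + \gamma_{2}\, female_p + u_{fb,p} \\
s_p  &= \alpha_{s} + \gamma_{3}\, agec70_p + \gamma_{4}\, female_p + u_{s,p}
\end{aligned}
```

With \(\lambda_1\) and \(\nu_1\) fixed, the free parameters under this reading are 5 loadings, 6 within-level item residual variances, 1 within-level factor residual variance, 5 item intercepts, 2 latent intercepts, 4 regression coefficients, and 3 between-level (co)variances, which gives 26.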
Data setup
library(dplyr)
items <- c("sad", "blues", "depress", "happy", "enjoy", "hopeful")
df.cesd <- df %>%
  select(which(names(df) %in% c("id", "td", items, "agec70", "female")))
# Create new variables y1-y6 that correspond to the variables in items
for (i in seq_along(items)) {
  df.cesd[[paste0("y", i)]] <- df.cesd[[items[i]]]
}
df.cesd <- df.cesd %>%
  select(-all_of(items))
Descriptives for MLFA data
psych::describe(df.cesd, type = 3, skew = FALSE, ranges = TRUE, na.rm = TRUE)
FW FEMALE AGEC70
FW 1.000 NA NA
FEMALE 0.068 0.243 NA
AGEC70 0.050 0.013 0.418
Loading and Intercept for item 1
Here I use a custom-made function, readSvalues, that ChatGPT wrote for me. It’s in the guts of this QMD document. It’s better to use SVALUES because the estimates are reported to six decimal places of precision.
FW FEMALE AGEC70
FW 1.000 NA NA
FEMALE 0.068 0.243 NA
AGEC70 0.050 0.013 0.418
MLFA
TITLE: MLFA CESD EPESE
DATA: FILE = cesd.dat ;
VARIABLE: NAMES = id t female agec70 y1 y2 y3 y4 y5 y6 ;
MISSING = . ;
WITHIN = t ;
BETWEEN = agec70 female ;
CLUSTER = id ;
ANALYSIS: TYPE = TWOLEVEL RANDOM ;
MODEL: %WITHIN%
fw BY y1@0.51150 (l1);
fw by y2-y6* (l2-l6);
s | fw on t ;
%BETWEEN%
fb BY y1@0.51150 (l1);
fb by y2-y6* (l2-l6);
y1-y6@0;
fb on agec70 female ;
s on agec70 female ;
fb with s ;
[y1@1.35775];
[fb*];
The multilevel curve of factors model, which I call the multilevel growth of factors (MLGOF) model in what follows, might be totally new. But, as I show in the very last slide (spoiler alert), it is equivalent to the MLCFA model as I have parameterized these models.
The MLGOF model works by using a “trick” to get the common factor intercept at the between level without having to specify a between-level measurement model for the factor indicators. It’s an old trick that was used back in the day to access the mean structure of structural equation models before mean structures were readily available to the programmer: regress on a constant.
Modeling considerations: MLGOF
This is the model set-up for the multilevel growth-of-factors model. This model is conceptually similar to the MLFA model, although we remove the BETWEEN level factor model for the items, and we “trick” Mplus into estimating a random intercept for the latent factor at the between level by regressing it on a constant (k) and declaring that regression to be random.
As with the MLFA model, observed indicators of depression (y1-y6) are both within- and between-level variables (and so are not listed under WITHIN or BETWEEN in the Mplus input). A within-level factor (fw) is specified and identified by fixing the first factor loading (as described in the MLFA model; fixed parameters are shown in purple). fw is regressed on time, and this regression is declared a random effect and assigned the label “s”. fw is also regressed on a constant variable “k”, and this regression is likewise declared random and assigned the label “i”.
At the between level, we model the item intercepts (fixing the first item’s intercept as described in the MLFA model to identify the mean of i) and the between-level residual variances of y1-y6 are fixed to 0, for reasons described in MLFA model setup.
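In equation form, the “regress on a constant” trick looks like this, with item index \(j\), occasion index \(t\), and person index \(p\). This is a sketch of my reading of the set-up; the Greek symbols are labels introduced here.

```latex
\begin{aligned}
y_{jtp} &= \nu_j + \lambda_j \, fw_{tp} + \varepsilon_{jtp},
  \qquad j = 1, \dots, 6 \\
fw_{tp} &= i_p \, k_{tp} + s_p \, td_{tp} + \zeta_{tp},
  \qquad k_{tp} = 1 \text{ for every record} \\
i_p &= \alpha_{i} + \gamma_{1}\, agec70_p + \gamma_{2}\, female_p + u_{i,p} \\
s_p &= \alpha_{s} + \gamma_{3}\, agec70_p + \gamma_{4}\, female_p + u_{s,p}
\end{aligned}
```

Because \(k\) is always 1, the random coefficient \(i_p\) is simply a person-specific intercept for the factor. Substituting the second line into the first gives \(y_{jtp} = \nu_j + \lambda_j (i_p + s_p \, td_{tp} + \zeta_{tp}) + \varepsilon_{jtp}\), so \(i_p\) plays the role that the between-level factor fb plays in the MLFA set-up.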
Data setup
Use MplusAutomation to prepare data set for Mplus. You actually have to compute the constant and add it to the data set output to Mplus.
# add constant to data frame
df.cesd$k <- 1
MplusAutomation::prepareMplusData(df.cesd, "cesdlongk.dat")
TITLE: Your title goes here
DATA: FILE = "cesdlongk.dat";
VARIABLE:
NAMES = id td female agec70 y1 y2 y3 y4 y5 y6 k;
MISSING=.;
Mplus MLGOF model
Multilevel Growth of Factors is what I’m calling this model
TITLE: Growth of Factors MLM
DATA: FILE = cesdlongk.dat ;
VARIANCES = NOCHECK ; ! this is critical b/c of k
VARIABLE: NAMES = id td female agec70 y1 y2 y3 y4 y5 y6 k ;
MISSING = . ;
WITHIN = td k ;
BETWEEN = agec70 female ;
CLUSTER = id ;
ANALYSIS: TYPE = TWOLEVEL RANDOM ;
MODEL: %WITHIN%
fw BY y1@0.51150 (l1);
fw by y2-y6* (l2-l6);
s | fw on td ; ! random slope with respect to td
i | fw on k ; ! random intercept (constant)
%BETWEEN%
i s on agec70 female ;
i with s ;
y1-y6@0 ; ! force all residual variance to within or i, s
[y1@1.35775]; ! needed for identification of intercept(i)
[y2-y6*];
Mplus VERSION 8.11 (Mac)
MUTHEN & MUTHEN
04/10/2025 12:32 PM
INPUT INSTRUCTIONS
TITLE: Growth of Factors MLM
DATA: FILE = cesdlongk.dat ;
VARIANCES = NOCHECK ; ! this is critical b/c of k
VARIABLE: NAMES = id td female agec70 y1 y2 y3 y4 y5 y6 k ;
MISSING = . ;
WITHIN = td k ;
BETWEEN = agec70 female ;
CLUSTER = id ;
ANALYSIS: TYPE = TWOLEVEL RANDOM ;
MODEL: %WITHIN%
fw BY y1@0.51150 (l1);
fw by y2-y6* (l2-l6);
s | fw on td ; ! random slope with respect to td
i | fw on k ; ! random intercept (constant)
%BETWEEN%
i s on agec70 female ;
i with s ;
y1-y6@0 ; ! force all residual variance to within or i, s
[y1@1.35775]; ! needed for identification of intercept(i)
[y2-y6*];
*** WARNING
One or more individual-level variables have no variation within a
cluster for the following clusters.
Variable Cluster IDs with no within-cluster variation
Y1 154 319 484 815 1046 1528 1791 2164 2451 557 98
Y2 687 793 1071 1618 1669 2096 2104 2369 557 2205
Y3 154 2369
Y4 154 319 484 605 815 1012 1046 1528 1786 1849 2549 557 98
Y5 154 223 319 484 1046 1528 1741 2164 2451 2528 2557 557 886 98
Y6 29 67 143 154 239 341 360 373 448 510 553 605 687 707 933 938 979 1046 1115
1191 1282 1381 1406 1417 1479 1667 1687 1742 1756 1762 1764 1786 1809 1878 1895
1956 1987 2012 2096 2266 2292 2369 2375 2404 2428 2496 2523 2528 2756 557 802
2205 50 444 1783 905 952 1994 2452 180 2576 1325 1342
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
Growth of Factors MLM
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 6165
Number of dependent variables 6
Number of independent variables 4
Number of continuous latent variables 3
Observed dependent variables
Continuous
Y1 Y2 Y3 Y4 Y5 Y6
Observed independent variables
TD FEMALE AGEC70 K
Continuous latent variables
FW S I
Variables with special functions
Cluster variable ID
Within variables
TD K
Between variables
FEMALE AGEC70
Estimator MLR
Information matrix OBSERVED
Maximum number of iterations 100
Convergence criterion 0.100D-05
Maximum number of EM iterations 500
Convergence criteria for the EM algorithm
Loglikelihood change 0.100D-02
Relative loglikelihood change 0.100D-05
Derivative 0.100D-03
Minimum variance 0.100D-03
Maximum number of steepest descent iterations 20
Maximum number of iterations for H1 2000
Convergence criterion for H1 0.100D-03
Optimization algorithm EMA
Input data file(s)
cesdlongk.dat
Input data format FREE
SUMMARY OF DATA
Number of missing data patterns 32
Number of clusters 2762
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value 0.100
PROPORTION OF DATA PRESENT
Covariance Coverage
Y1 Y2 Y3 Y4 Y5
________ ________ ________ ________ ________
Y1 0.993
Y2 0.984 0.991
Y3 0.989 0.986 0.994
Y4 0.982 0.977 0.981 0.985
Y5 0.981 0.976 0.981 0.975 0.983
Y6 0.920 0.918 0.921 0.915 0.913
TD 0.993 0.991 0.994 0.985 0.983
K 0.993 0.991 0.994 0.985 0.983
FEMALE 0.993 0.991 0.994 0.985 0.983
AGEC70 0.993 0.991 0.994 0.985 0.983
Covariance Coverage
Y6 TD K FEMALE AGEC70
________ ________ ________ ________ ________
Y6 0.923
TD 0.923 1.000
K 0.923 1.000 1.000
FEMALE 0.923 1.000 1.000 1.000
AGEC70 0.923 1.000 1.000 1.000 1.000
UNIVARIATE SAMPLE STATISTICS
UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS
Variable/ Mean/ Skewness/ Minimum/ % with Percentiles
Sample Size Variance Kurtosis Maximum Min/Max 20%/60% 40%/80% Median
Y1 1.472 1.794 1.000 64.36% 1.000 1.000 1.000
6119.000 0.564 3.045 4.000 4.13% 1.000 2.000
Y2 1.353 2.330 1.000 76.49% 1.000 1.000 1.000
6109.000 0.552 4.888 4.000 4.35% 1.000 2.000
Y3 1.522 1.704 1.000 63.41% 1.000 1.000 1.000
6130.000 0.676 2.312 4.000 5.76% 1.000 2.000
Y4 1.697 1.058 1.000 63.05% 1.000 1.000 1.000
6071.000 1.009 -0.352 4.000 7.15% 1.000 3.000
Y5 1.543 1.474 1.000 71.24% 1.000 1.000 1.000
6063.000 0.885 0.751 4.000 6.28% 1.000 2.000
Y6 1.944 0.721 1.000 56.44% 1.000 1.000 1.000
5693.000 1.412 -1.136 4.000 17.13% 2.000 3.000
TD 0.236 0.395 0.000 44.25% 0.000 0.000 0.300
6165.000 0.057 -1.306 0.600 23.07% 0.300 0.600
K 1.000 0.000 1.000 100.00% 1.000 1.000 1.000
6165.000 0.000 0.000 1.000 100.00% 1.000 1.000
FEMALE 0.584 -0.342 0.000 41.56% 0.000 0.000 1.000
2762.000 0.243 -1.883 1.000 58.44% 1.000 1.000
AGEC70 0.364 0.673 -0.300 34.65% -0.300 0.200 0.200
2762.000 0.419 -0.681 1.700 8.76% 0.200 1.200
WARNING: THE SAMPLE VARIANCE OF K IS 0.000.
THE MODEL ESTIMATION TERMINATED NORMALLY
MODEL FIT INFORMATION
Number of Free Parameters 26
Loglikelihood
H0 Value -42123.918
H0 Scaling Correction Factor 1.6585
for MLR
Information Criteria
Akaike (AIC) 84299.835
Bayesian (BIC) 84474.728
Sample-Size Adjusted BIC 84392.107
(n* = (n + 2) / 24)
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
Within Level
FW BY
Y1 0.512 0.000 999.000 999.000
Y2 0.504 0.015 33.521 0.000
Y3 0.608 0.016 38.118 0.000
Y4 0.606 0.023 26.600 0.000
Y5 0.554 0.022 24.956 0.000
Y6 0.413 0.023 17.916 0.000
Residual Variances
Y1 0.287 0.011 25.740 0.000
Y2 0.283 0.012 23.964 0.000
Y3 0.283 0.013 21.314 0.000
Y4 0.619 0.019 31.867 0.000
Y5 0.562 0.019 29.718 0.000
Y6 1.232 0.022 55.136 0.000
FW 0.527 0.046 11.351 0.000
Between Level
I ON
AGEC70 0.103 0.033 3.100 0.002
FEMALE 0.263 0.041 6.396 0.000
S ON
AGEC70 0.206 0.095 2.166 0.030
FEMALE 0.039 0.100 0.393 0.695
I WITH
S 0.000 0.101 -0.002 0.999
Means
Y1 1.358 0.000 999.000 999.000
Y2 1.240 0.008 150.954 0.000
Y3 1.384 0.010 145.026 0.000
Y4 1.561 0.014 114.172 0.000
Y5 1.421 0.012 117.583 0.000
Y6 1.854 0.017 107.374 0.000
Intercepts
S 0.184 0.074 2.478 0.013
I -0.005 0.031 -0.145 0.885
Variances
Y1 0.000 0.000 999.000 999.000
Y2 0.000 0.000 999.000 999.000
Y3 0.000 0.000 999.000 999.000
Y4 0.000 0.000 999.000 999.000
Y5 0.000 0.000 999.000 999.000
Y6 0.000 0.000 999.000 999.000
Residual Variances
S 0.178 0.296 0.602 0.547
I 0.500 0.058 8.671 0.000
QUALITY OF NUMERICAL RESULTS
Condition Number for the Information Matrix 0.369E-05
(ratio of smallest to largest eigenvalue)
Beginning Time: 12:32:31
Ending Time: 12:32:33
Elapsed Time: 00:00:02
This is the model set-up for the multiple indicators LGCM. It’s too much of a model for the Mplus DEMO version, as in our case it will involve 18 dependent variables (y1-y6 at each of 3 observations). These models are challenging to specify properly, and can take a long time to estimate with maximum likelihood methods because of the relatively large number of latent variables over which to integrate.
But, with only 3 observation time points, it is not too onerous to specify this model, and provides a useful comparison to our multilevel approaches.
I will use constraints on y1 (intercepts and measurement slopes) as in previous multilevel models to place the results on a comparable scale.
Data setup
library(dplyr)
library(tidyr)
items <- c("sad", "blues", "depress", "happy", "enjoy", "hopeful")
df.cesd <- df %>%
  select(which(names(df) %in% c("id", "t", items, "agec70", "female")))
# Create new variables y1-y6 that correspond to the variables in items
for (i in seq_along(items)) {
  df.cesd[[paste0("y", i)]] <- df.cesd[[items[i]]]
}
df.cesd <- df.cesd %>%
  select(-all_of(items))
# Convert 't' into a factor to prevent it from being treated as a function
df.cesd$t <- as.factor(df.cesd$t)
# Reshape the dataframe to long format
long_df <- df.cesd %>%
  pivot_longer(
    cols = starts_with("y"),
    names_to = "y",
    values_to = "y_values"
  ) %>%
  # Ensure each combination of id and t has a unique row for each y
  group_by(id, t) %>%
  mutate(y_number = row_number()) %>%
  ungroup()
# Create a time variable that interacts with 'y_number' and 't'
long_df <- long_df %>%
  mutate(y_time = paste0("y", t, y_number))
# Then, reshape back to wide format with new names for 'y' variables
wide_df <- long_df %>%
  pivot_wider(
    id_cols = c(id, female, agec70),
    names_from = y_time,
    values_from = y_values,
    names_prefix = ""
  )
In the remainder of this presentation I will tabulate comparable parameter estimates from the various models we have considered.
Table 1. Growth model levels and slopes: intercept estimates
| Model | Level | se (Level) | Slope | se (Slope) |
|---|---|---|---|---|
| MLM(Yz) | -0.177 | 0.027 | 0.196 | 0.069 |
| MLCFA | -0.005 | 0.031 | 0.184 | 0.074 |
| MLGOF | -0.005 | 0.031 | 0.184 | 0.074 |
| MILGCM | -0.008 | 0.036 | 0.187 | 0.083 |
Table 2. Growth model levels and slopes: residual variances
| Model | Level | se (Level) | Slope | se (Slope) |
|---|---|---|---|---|
| MLM(Yz) | 0.4 | 0.04 | 0.328 | 0.22 |
| MLCFA | 0.5 | 0.058 | 0.177 | 0.296 |
| MLGOF | 0.5 | 0.058 | 0.178 | 0.296 |
| MILGCM | 0.482 | 0.042 | 0.146 | 0.231 |
MLM(Yz) residual variances are from the BETWEEN model part.
Table 3. Growth model LEVEL parameter: effect of agec70 and female sex
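| Model | agec70 | se (agec70) | female | se (female) |
|---|---|---|---|---|
| MLM(Yz) | 0.113 | 0.03 | 0.217 | 0.036 |
| MLCFA | 0.103 | 0.033 | 0.263 | 0.041 |
| MLGOF | 0.103 | 0.033 | 0.263 | 0.041 |
| MILGCM | 0.104 | 0.032 | 0.265 | 0.042 |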
Table 4. Growth model SLOPE parameter: effect of agec70 and female sex
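| Model | agec70 | se (agec70) | female | se (female) |
|---|---|---|---|---|
| MLM(Yz) | 0.158 | 0.083 | -0.013 | 0.091 |
| MLCFA | 0.206 | 0.095 | 0.039 | 0.1 |
| MLGOF | 0.206 | 0.095 | 0.039 | 0.1 |
| MILGCM | 0.205 | 0.087 | 0.038 | 0.104 |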
Table 5. Residual covariance of LEVEL and SLOPE
| Model | est | se |
|---|---|---|
| MLM(Yz) | -0.046 | 0.075 |
| MLCFA | 0 | 0.101 |
| MLGOF | 0 | 0.101 |
| MILGCM | 0.009 | 0.078 |
Table 6. Fit information
| | MLFA | MLGOF | MILGCM no rescov | MILGCM |
|---|---|---|---|---|
| Observations | 6165 | 6165 | 2762 | 2762 |
| Parameters | 26 | 26 | 26 | 38 |
| LL | -42123.918 | -42123.918 | -42124.312 | -42032.198 |
| LLCorrectionFactor | 1.6585 | 1.6585 | NA | NA |
| AIC | 84299.835 | 84299.835 | 84300.624 | 84140.396 |
| BIC | 84474.728 | 84474.728 | 84454.64 | 84365.497 |
| elapsed_time | 00:00:02 | 00:00:02 | 00:00:00 | 00:00:00 |
The multilevel confirmatory factor analysis (MLCFA) and multilevel growth of factors (MLGOF) models, as I have parameterized them, are equivalent. The MILGCM (no rescov) [the MILGCM model without the item-level residual covariances over time] has a very similar LL to these two models as well. The MILGCM provides the best fit according to loglikelihood (LL). The \(\chi^2\) difference test on LL for the MILGCM (no rescov) relative to MILGCM model indicates the residual covariances provide significant improvement in model fit (\(\chi^2 = 184, \text{df} = 12, P < .001\)).
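The \(\chi^2\) difference test can be reproduced from the loglikelihoods and parameter counts in Table 6. This is a plain (unscaled) likelihood ratio test; Table 6 lists no scaling correction factor for the two MILGCM fits, so no correction is applied here. The object names are mine.

```r
# Loglikelihoods and parameter counts copied from Table 6
LL_restricted <- -42124.312   # MILGCM without residual covariances
k_restricted  <- 26
LL_full       <- -42032.198   # MILGCM with residual covariances
k_full        <- 38

# Likelihood ratio (chi-square difference) test
chisq <- 2 * (LL_full - LL_restricted)
df    <- k_full - k_restricted
p     <- pchisq(chisq, df = df, lower.tail = FALSE)

round(chisq)   # test statistic, rounded to match the text
df             # degrees of freedom
p < 0.001      # TRUE
```

The 12 degrees of freedom are the 12 additional parameters in the model with residual covariances.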
Notes: (1) As discussed previously, BIC from “wide” and “long” data layouts are not comparable. (2) The multilevel model for the observed composite CESD score, MLM(Yz), is not comparable to the other models in terms of fit, and therefore is not tabulated.