Background: Obesity results from a chronic imbalance between energy
intake and energy expenditure. Total energy expenditure for all physiological
functions combined can be measured approximately by calorimeters. These devices
assess energy expenditure frequently (e.g., in 60-second epochs), resulting in
massive complex data that are nonlinear functions of time. To reduce the
prevalence of obesity, researchers often design targeted therapeutic
interventions to increase daily energy expenditure. Methods: We analyzed
previously collected data on the effects of oral interferon tau supplementation
on energy expenditure, as assessed with indirect calorimeters, in an animal model
for obesity and type 2 diabetes (Zucker diabetic fatty rats). In our statistical
analyses, we compared parametric polynomial mixed effects models and more
flexible semiparametric models involving spline regression. Results: We
found no effect of interferon tau dose (0 vs. 4
As sedentary lifestyles spread globally, obesity has become an increasing public health concern [1]. Sedentary lifestyles are growing rapidly with developing technology [2]. Obesity results from a chronic imbalance between food intake and energy expenditure and is also influenced by genetic predisposition, consumption of high-fat diets, and inflammation [3]. Additionally, obesity contributes to adverse health outcomes such as insulin resistance, type 2 diabetes, obstructive sleep apnea, osteoarthritis, stroke, hypertension, and cancer [4]. As obesity becomes more prevalent, researchers seek to better understand the causal pathways leading to it. Energy expenditure is a key factor in these pathways and refers to the amount of energy used by the body for all physiological functions, such as movement, respiration, and digestion [5, 6, 7]. Energy expenditure has three components: resting metabolism, the thermic effect of feeding, and the thermic effect of physical activity [5, 7]. Resting metabolism makes up 60% to 70% of an individual’s daily energy expenditure [5]. The thermic effect of feeding, including digestion, accounts for up to 10% of daily energy expenditure [5]. Finally, the thermic effect of physical activity comprises 20% to 30% of daily energy expenditure [5].
Measuring energy expenditure accurately requires sensitive and sophisticated
instruments. One commonly used instrument is the open circuit calorimeter, such
as the computer-controlled Oxymax metabolic chamber for research animals
(Columbus Instruments, Ohio, USA). This instrument measures energy expenditure in
epochs of 60 seconds to five minutes during an observation period. The instrument
calculates an animal’s energy expenditure from its volumetric carbon dioxide
production (VCO

Plots of heat production (kcal/kg BW/h) against time (in minutes and hours) over a 24-hour period for the 10-week-old ZDF rats. (a) shows the animal-specific trajectories of untransformed heat production in minutes. (b) shows the animal-specific trajectories of untransformed heat production in minutes by treatment group. (c) shows the animal-specific trajectories of untransformed heat production in hours. (d) shows the animal-specific trajectories of untransformed heat production in hours by treatment group. The blue lines in (a,c) are based on smoothing of the lines. In (b,d), C refers to the control group, L refers to the low dose group, and H refers to the high dose group.
Because energy expenditure affects the development of obesity [12, 13, 14], some researchers have sought to manipulate energy expenditure and its physiological effects as a way to prevent or reduce obesity. Interferon tau, an anti-inflammatory cytokine, is one proposed intervention for achieving this aim [15, 16]. In a previous study, we evaluated the impact of interferon tau on obesity-related outcomes in Zucker diabetic fatty (ZDF) rats [11]. The ZDF animal model has deficiencies in its leptin receptors and therefore researchers often use it for obesity and type 2 diabetes studies. The objective of this study is to provide an introduction to more flexible approaches to assessing intervention effects on high dimensional data frequently collected in biomedical studies such as the device-based measures of energy expenditure.
We obtained 18 male 23-day-old ZDF rats from Charles River Laboratories and fed them a Purina 5008 diet throughout the study. The Purina 5008 diet consisted of 23.5% crude protein, 6.0% fat, 34.9% starch, 2.6% sucrose, 0.5% glucose plus fructose, 6.8% minerals, and 3.8% fiber, yielding 17,364 kJ gross energy/kg [11]. We kept the study animals in a temperature- and humidity-controlled facility on a 12-h light: 12-h dark cycle. The Texas A&M University Animal Use and Care Committee approved the study (#2010-251).
At 28 days of age, we randomly assigned the rats to receive drinking water
(distilled and deionized H
Linear mixed effects models (LMEMs) can be used to analyze repeated measures data [17]. These models extend classical linear regression to correlated data. They provide powerful techniques for analyzing correlated data with complex variance structures, handling missing data, and incorporating nonlinear trends with log or higher order polynomial transformations. LMEMs take the following form:
where
In assessing the impact of oral interferon tau supplementation on energy expenditure, we estimated 12 separate LMEMs resulting from all combinations of energy expenditure transformation (raw or log transformed), unit of time (minutes or hours), and time term (linear, quadratic, or cubic). We intended the log transformations to make the data approximately normal. We performed all analyses using R software, version 4.2.0 (R Core Team, Vienna, Austria) [18].
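The 12 models are the full cross of two transformations, two time units, and three time terms (2 × 2 × 3). The sketch below only enumerates that grid in Python and builds an illustrative R-style fixed-effects formula for each cell. The variable names (`heat`, `dose`, `time_minutes`, `time_hours`) are assumptions made for illustration, and the random-effects part of each model is omitted because the text does not spell it out.

```python
from itertools import product

# The three design factors named in the text; the labels are illustrative.
transformations = ["raw", "log"]
time_units = ["minutes", "hours"]
time_terms = {"linear": 1, "quadratic": 2, "cubic": 3}


def fixed_effects_formula(transformation, unit, degree):
    """Build a hypothetical R-style fixed-effects formula string."""
    outcome = "heat" if transformation == "raw" else "log(heat)"
    if unit == "hours":
        # Hourly models use the within-hour mean of the outcome.
        outcome = f"mean({outcome})"
    time = f"time_{unit}"
    # Linear term first, then higher powers up to the requested degree.
    powers = [time] + [f"I({time}^{p})" for p in range(2, degree + 1)]
    return f"{outcome} ~ dose + " + " + ".join(powers)


model_grid = [
    (tr, unit, term, fixed_effects_formula(tr, unit, degree))
    for tr, unit, (term, degree) in product(
        transformations, time_units, time_terms.items()
    )
]

for row in model_grid:
    print(row)
```

Each tuple in `model_grid` corresponds to one row block in the LMEM tables, for example the log-transformed, hourly, cubic model.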
Penalized spline regression is a flexible semiparametric approach to estimating mean functions in mixed effects models [19]. Mean functions represented by splines can be expressed easily as the best linear unbiased predictors of the mixed effects model [20]. Semiparametric mixed effects models (SMEMs) are also specified as in Eqn. 1. However, the elements of the random components matrices differ from LMEMs. SMEMs include spline basis functions as random effects in addition to subject-specific random effects. Thus, SMEMs can be written as classical mixed effects models that include nonparametric terms for curve smoothing.
We used two kinds of semiparametric functions in our SMEMs: truncated power basis functions (TPBFs) and cubic B-spline functions.
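As a rough illustration of the first of these, the sketch below uses the textbook definition of a truncated power function, (t − κ)₊ᵖ, which is zero to the left of the knot κ and a p-th power to the right. The knot positions and function names are hypothetical and are not taken from the study. In the mixed-model formulation, the coefficients of the polynomial terms would be fixed effects and the coefficients of the truncated terms would be treated as random effects.

```python
def truncated_power(t, knot, degree):
    """Return (t - knot)_+^degree: zero left of the knot, a power to the right."""
    diff = t - knot
    return diff ** degree if diff > 0 else 0.0


def tpbf_design_row(t, knots, degree):
    """One design-matrix row: polynomial terms, then one truncated term per knot."""
    polynomial = [t ** p for p in range(degree + 1)]  # 1, t, ..., t^degree
    truncated = [truncated_power(t, k, degree) for k in knots]
    return polynomial + truncated


# Hypothetical knots every 6 hours across a 24-hour observation period.
knots = [6.0, 12.0, 18.0]
row = tpbf_design_row(13.0, knots, degree=3)
print(row)  # [1.0, 13.0, 169.0, 2197.0, 343.0, 1.0, 0.0]
```

At hour 13 only the knots at 6 and 12 have been passed, so the last truncated term is still zero.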
Truncated power basis functions are simple semiparametric functions that
approximate curves. We define a truncated power function at a given knot
where
where
Our cubic TPBF model of energy expenditure is
B-splines allow flexible approaches to analyzing data [21, 25]. B-splines are
piece-wise polynomial functions of order
B-spline basis functions are nonzero over the interval
In our analyses, we specified the B-spline models as
where
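For readers unfamiliar with B-splines, the following sketch evaluates B-spline basis functions with the standard Cox-de Boor recursion. It is not the authors' code: the knot sequence (interior knots every 6 hours on a 24-hour window) and all names are assumptions chosen for illustration. It shows the two properties that make B-splines convenient, namely local support and basis values that sum to one inside the knot range.

```python
def bspline_basis(i, order, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function.

    order = degree + 1, so order 4 gives a cubic B-spline.
    """
    if order == 1:
        # Indicator of the half-open interval [knots[i], knots[i + 1]).
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + order - 1] - knots[i]
    right_den = knots[i + order] - knots[i + 1]
    left = 0.0
    right = 0.0
    # Terms with a zero denominator (repeated knots) are defined as zero.
    if left_den > 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, order - 1, t, knots)
    if right_den > 0:
        right = (knots[i + order] - t) / right_den * bspline_basis(i + 1, order - 1, t, knots)
    return left + right


# Hypothetical setup: cubic B-splines on [0, 24) hours, interior knots at 6, 12, 18.
order = 4
interior = [6.0, 12.0, 18.0]
knots = [0.0] * order + interior + [24.0] * order
n_basis = len(knots) - order  # 7 basis functions

t = 13.0
values = [bspline_basis(i, order, t, knots) for i in range(n_basis)]
print(values)
print(sum(values))
```

Because each cubic basis function spans only four adjacent knot intervals, exactly four of the seven values are nonzero at any interior time point.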
One assumption of classical regression models is that covariates are
independent. However, polynomial splines in regression models are not independent
because they are piece-wise functions used to approximate curves. Therefore, the
standard errors and confidence intervals for parameters in classical regression
models are not applicable in models involving splines. For inference in spline
regression models, nonparametric bootstrap methods can be used [28]. The
nonparametric bootstrap involves resampling the data to estimate variances of
model parameters without any distributional assumptions. To implement the
nonparametric bootstrap, we first resampled the original data with replacement
for each animal at different time points in the study. Next, we estimated model
coefficients with the resampled data, and then repeated the resampling and
estimation process b = 500 times. We computed the 95% bootstrap
confidence intervals by the percentile approach using
We also calculated the corresponding p-values for the estimated coefficients
under the null hypothesis of
We selected the models with the smallest Akaike information criterion (AIC) [29, 30] values as the best fitting.
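The nonparametric bootstrap described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the data are synthetic, a pooled least-squares slope stands in for a mixed-model coefficient, observations are resampled within each animal (one reading of the resampling scheme), and the percentile is a simple nearest-rank rule. All function names are hypothetical.

```python
import random


def ols_slope(points):
    """Least-squares slope of heat on time for a list of (time, heat) pairs."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, for q in [0, 1]."""
    index = round(q * (len(sorted_values) - 1))
    return sorted_values[index]


def bootstrap_ci(data, estimator, b=500, level=0.95, seed=0):
    """Percentile bootstrap interval; returns (lower, upper, sorted estimates)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(b):
        resampled = []
        for observations in data.values():
            # Resample each animal's time points with replacement.
            resampled.extend(rng.choices(observations, k=len(observations)))
        estimates.append(estimator(resampled))
    estimates.sort()
    alpha = 1.0 - level
    return percentile(estimates, alpha / 2), percentile(estimates, 1 - alpha / 2), estimates


# Synthetic stand-in data: 6 hypothetical animals, 24 hourly readings each.
rng = random.Random(42)
data = {
    animal: [(hour, 2.6 - 0.015 * hour + rng.gauss(0, 0.1)) for hour in range(24)]
    for animal in range(6)
}

lower, upper, estimates = bootstrap_ci(data, ols_slope)
print(f"slope 95% percentile CI: ({lower:.4f}, {upper:.4f})")
```

The interval endpoints are simply the 2.5th and 97.5th percentiles of the 500 re-estimated coefficients, so no distributional assumption about the estimator is needed.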
Summarizing heat production by minute increased variability and random noise in the data relative to summarizing by hour (Fig. 1a–d). Heat production also varied nonlinearly over time. Other device-based measures of energy expenditure showed patterns similar to those of heat production. Therefore, we focus our report on modeling heat production.
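Summarizing minute-level readings by hour amounts to averaging all readings that fall in the same hour. A minimal sketch, assuming each reading is a hypothetical `(minute, heat)` pair for a single animal:

```python
from collections import defaultdict
from statistics import mean


def hourly_means(readings):
    """Average (minute, heat) readings within each hour of the observation period."""
    by_hour = defaultdict(list)
    for minute, heat in readings:
        by_hour[minute // 60].append(heat)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Tiny hypothetical series: three readings in hour 0, two in hour 1.
readings = [(0, 2.0), (1, 3.0), (2, 4.0), (60, 1.0), (61, 2.0)]
print(hourly_means(readings))  # {0: 3.0, 1: 1.5}
```

Averaging within the hour smooths epoch-to-epoch noise at the cost of temporal resolution, which is the trade-off visible between the minute and hour panels of Fig. 1.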
We estimated twelve separate LMEMs (Tables 1,2,3). The low and high dose groups did not differ significantly from the control group in heat production in any model. Models with a cubic term for time fit the data better than models with a quadratic term, which fit the data better than models with a linear term (see also Figs. 2a–c,3a–c). Also, models with log-transformed heat production fit much better than those with untransformed heat production. Furthermore, models of hourly mean heat production fit better than models of raw heat production at the scale of minutes, although the parameter estimates of paired models (differing only in time units) were very similar. Because coefficients for time terms and their standard errors were often close to the lower bound of zero, inference for these parameters may be inaccurate.
| Model | Outcome | Variable | Estimate | S.E. | p | AIC |
|---|---|---|---|---|---|---|
| LMEM (raw) | Heat | Intercept | 2.589 | 0.058 | | 4420.8 |
| | | L vs. C | –0.141 | 0.098 | 0.170 | |
| | | H vs. C | 0.005 | 0.098 | 0.96 | |
| | | Time (mins) | –0.0003 | 0.00 | | |
| LMEM (log transformed) | log(Heat) | Intercept | 0.933 | 0.027 | | –1921.06 |
| | | L vs. C | –0.06 | 0.05 | 0.21 | |
| | | H vs. C | 0.011 | 0.05 | 0.81 | |
| | | Time (mins) | –0.0001 | 0.00 | | |
| LMEM mean(Heat)/hr | mean(Heat) | Intercept | 2.584 | 0.063 | | 328.575 |
| | | L vs. C | –0.141 | 0.097 | 0.169 | |
| | | H vs. C | 0.015 | 0.097 | 0.877 | |
| | | Time (hr) | –0.015 | 0.002 | | |
| LMEM mean{log(Heat)}/hr | mean{log(Heat)} | Intercept | 0.935 | 0.029 | | –398.918 |
| | | L vs. C | –0.061 | 0.045 | 0.201 | |
| | | H vs. C | 0.015 | 0.045 | 0.749 | |
| | | Time (hr) | –0.006 | 0.001 | | |
p is the parametric p-value obtained based on Z-score, which
is given by
C, control; L, Low dose; H, High dose; hr, hour; min, minutes
| Model | Outcome | Variable | Estimate | S.E. | p | AIC |
|---|---|---|---|---|---|---|
| LMEM (raw) | Heat | Intercept | 2.992 | 0.06 | | 3819.95 |
| | | L vs. C | –0.14 | 0.098 | 0.17 | |
| | | H vs. C | 0.012 | 0.098 | 0.91 | |
| | | Time (mins) | –0.0002 | 0.0001 | | |
| | | Time² | –0.000 | 0.000 | | |
| LMEM (log transformed) | log(Heat) | Intercept | 1.110 | 0.028 | | –2554.46 |
| | | L vs. C | –0.06 | 0.05 | 0.21 | |
| | | H vs. C | –0.014 | 0.05 | 0.76 | |
| | | Time (mins) | –0.0001 | 0.000 | | |
| | | Time² | 0.000 | 0.000 | | |
| LMEM mean(Heat)/hr | mean(Heat) | Intercept | 3.052 | 0.069 | | 174.539 |
| | | L vs. C | –0.141 | 0.097 | 0.17 | |
| | | H vs. C | 0.016 | 0.097 | 0.87 | |
| | | Time (hr) | –0.0124 | 0.008 | | |
| | | Time² | 0.004 | 0.0003 | | |
| LMEM mean{log(Heat)}/hr | mean{log(Heat)} | Intercept | 1.133 | 0.031 | | –548.795 |
| | | L vs. C | –0.061 | 0.045 | 0.20 | |
| | | H vs. C | 0.015 | 0.045 | 0.75 | |
| | | Time (hr) | –0.052 | 0.003 | | |
| | | Time² | 0.002 | 0.0001 | | |
p is the parametric p-value obtained based on Z-score, which
is given by
Time², quadratic term for time.
C, control; L, Low dose; H, High dose; hr, hour; min, minutes.
| Model | Outcome | Variable | Estimate | S.E. | p | AIC |
|---|---|---|---|---|---|---|
| LMEM (raw) | Heat | Intercept | 2.87 | 0.063 | | 3814.23 |
| | | L vs. C | –0.14 | 0.098 | 0.173 | |
| | | H vs. C | 0.014 | 0.098 | 0.889 | |
| | | Time (mins) | –0.0001 | 0.0003 | | |
| | | Time² | –0.000 | 0.000 | | |
| | | Time³ | –0.000 | 0.000 | | |
| LMEM (log transformed) | log(Heat) | Intercept | 1.036 | 0.029 | | –2583.84 |
| | | L vs. C | –0.06 | 0.05 | 0.21 | |
| | | H vs. C | –0.015 | 0.05 | 0.74 | |
| | | Time (mins) | –0.0003 | 0.000 | | |
| | | Time² | 0.000 | 0.000 | | |
| | | Time³ | –0.000 | 0.000 | | |
| LMEM mean(Heat)/hr | mean(Heat) | Intercept | 2.872 | 0.08 | | 176.975 |
| | | L vs. C | –0.141 | 0.097 | 0.17 | |
| | | H vs. C | 0.016 | 0.097 | 0.87 | |
| | | Time (hr) | –0.045 | 0.02 | | |
| | | Time² | –0.003 | 0.0002 | | |
| | | Time³ | –0.0002 | 0.000 | | |
| LMEM mean{log(Heat)}/hr | mean{log(Heat)} | Intercept | 1.045 | 0.036 | | –550.11 |
| | | L vs. C | –0.061 | 0.045 | 0.202 | |
| | | H vs. C | 0.015 | 0.045 | 0.749 | |
| | | Time (hr) | –0.014 | 0.008 | 0.105 | |
| | | Time² | –0.002 | 0.0001 | 0.013 | |
| | | Time³ | 0.0001 | 0.000 | | |
p is the parametric p-value obtained based on Z-score, which
is given by
Time², quadratic term for time.
Time³, cubic term for time.
C, control; L, Low dose; H, High dose; hr, hour; min, minutes.

Boxplots of the residuals for heat production (kcal/kg BW/h) over time (hours) for 10-week-old ZDF rats. (a) LMEM with a linear term for time. (b) LMEM with a quadratic term for time. (c) LMEM with a cubic term for time. (d) TPBF SMEM with a linear smoothing spline. (e) TPBF SMEM with a quadratic smoothing spline. (f) TPBF SMEM with a cubic smoothing spline. (g) B-spline SMEM with a linear smoothing spline. (h) B-spline SMEM with a quadratic smoothing spline. (i) B-spline SMEM with a cubic smoothing spline. The quadratic and cubic terms for time in the LMEMs fit the nonlinear trends in the data better than the LMEM with a linear term for time.

Predicted values of mean heat production (kcal/kg BW/h) against observed mean heat production (kcal/kg BW/h) in 10-week-old ZDF rats. (a) LMEM with a linear term for time. (b) LMEM with a quadratic term for time. (c) LMEM with a cubic term for time. (d) TPBF SMEM with a linear smoothing spline. (e) TPBF SMEM with a quadratic smoothing spline. (f) TPBF SMEM with a cubic smoothing spline. (g) B-spline SMEM with a linear smoothing spline. (h) B-spline SMEM with a quadratic smoothing spline. (i) B-spline SMEM with a cubic smoothing spline. The quadratic and cubic terms for time in the LMEMs fit the nonlinear trends in the data better than the LMEM with a linear term for time.
The TPBF models fit the data substantially better than the LMEMs (Table 4; Figs. 2d–f,3d–f). The linear spline TPBF model fit raw heat production at the scale of minutes best, while the quadratic spline model fit hourly mean heat production best. As in the LMEMs, there were no statistically significant treatment effects in any TPBF model. Also, TPBF models of hourly mean heat production had lower AIC values than the corresponding models fitted at the minute level. For the cubic spline models, the higher order terms for time (quadratic and cubic) were not statistically significant. However, both the linear and quadratic terms for time were statistically significant in the quadratic spline models, in the paired models that differed only in time scale.
| Model | Outcome | Variable | Estimate | p* | CI* | AIC |
|---|---|---|---|---|---|---|
| TPBF (linear spline) | Heat | Intercept | 2.883 | 0.552 | (–0.298, 3.247) | 3447.53 |
| | | L vs. C | –0.108 | 0.508 | (–0.352, 0.256) | |
| | | H vs. C | 0.088 | 0.588 | (–0.255, 0.317) | |
| | | Time (mins) | –0.002 | 0.648 | (–0.004, 3.189) | |
| | mean(Heat) | Intercept | 2.706 | | (2.576, 3.371) | 179.369 |
| | | L vs. C | –0.106 | 0.300 | (–0.387, 0.148) | |
| | | H vs. C | 0.045 | 0.944 | (–0.248, 0.318) | |
| | | Time (hr) | –0.017 | | (–0.004, –0.001) | |
| TPBF (quadratic spline) | Heat | Intercept | 2.918 | 0.560 | (–0.346, 3.233) | 3505.01 |
| | | L vs. C | –0.051 | 0.456 | (–0.406, 0.257) | |
| | | H vs. C | –0.010 | 0.608 | (–0.237, 0.314) | |
| | | Time (mins) | –0.002 | 0.648 | (–0.003, 0.000) | |
| | | Time² | 0.000 | | (0.000, 3.18) | |
| | mean(Heat) | Intercept | 2.981 | | (2.404, 3.274) | 176.249 |
| | | L vs. C | –0.106 | 0.344 | (–0.393, 0.253) | |
| | | H vs. C | 0.079 | 0.896 | (–0.263, 0.358) | |
| | | Time (hr) | –0.087 | | (–0.003, –0.001) | |
| | | Time² | 0.003 | 0.004 | (0.000, 0.000) | |
| TPBF (cubic spline) | Heat | Intercept | 1.908 | 0.520 | (–0.322, 3.277) | 3601.67 |
| | | L vs. C | –0.250 | 0.512 | (–0.393, 0.274) | |
| | | H vs. C | –0.030 | 0.600 | (–0.242, 0.35) | |
| | | Time (mins) | –0.003 | 0.584 | (–0.005, 0.000) | |
| | | Time² | 0.000 | 0.712 | (–0.000, 0.000) | |
| | | Time³ | 0.000 | 0.900 | (–0.000, 3.218) | |
| | mean(Heat) | Intercept | 2.743 | 0.004 | (1.738, 3.454) | 186.497 |
| | | L vs. C | –0.115 | 0.312 | (–0.418, 0.165) | |
| | | H vs. C | 0.066 | 0.988 | (–0.275, 0.359) | |
| | | Time (hr) | –0.034 | 0.008 | (–0.006, –0.000) | |
| | | Time² | –0.003 | 0.216 | (–0.000, 0.000) | |
| | | Time³ | 0.000 | 0.340 | (–0.000, 0.000) | |
p* is the non-parametric p-value calculated as
Time², quadratic term for time.
Time³, cubic term for time.
C, control; L, Low dose; H, High dose; hr, hour; min, minutes.
The B-spline SMEMs (Table 5; Figs. 2g–i,3g–i) fit the data better than the TPBF models and LMEMs. As with all of the other models, there were no statistically significant treatment effects in the B-spline models. Also, B-spline models of hourly mean heat production fit better and had lower standard errors of coefficients than models of raw heat production at the time scale of minutes, although the patterns of coefficients for paired models were similar. The quadratic B-spline model performed the best of all models we estimated, both for hourly mean untransformed heat production and for untransformed heat production at the time scale of minutes. Time was not statistically significant in the linear spline models for the analyses performed at the hourly and minute levels. However, the linear and quadratic terms for time were statistically significant in the quadratic spline model fitted at the hourly time scale. The quadratic and cubic terms for time were not statistically significant in the cubic spline models.
| Model | Outcome | Variable | Estimate | p* | CI* | AIC |
|---|---|---|---|---|---|---|
| B-spline SMEM (linear spline) | Heat | Intercept | 2.754 | 0.560 | (–0.292, 3.147) | 3379.22 |
| | | L vs. C | –0.170 | 0.596 | (–0.325, 0.295) | |
| | | H vs. C | 0.037 | 0.784 | (–0.222, 0.309) | |
| | | Time (mins) | 0.000 | 0.648 | (–0.001, 3.09) | |
| | mean(Heat) | Intercept | 2.704 | | (2.604, 3.209) | 162.114 |
| | | L vs. C | –0.159 | 0.268 | (–0.359, 0.149) | |
| | | H vs. C | 0.082 | 0.744 | (–0.246, 0.342) | |
| | | Time (hr) | –0.016 | 0.004 | (–0.001, –0.000) | |
| B-spline SMEM (quadratic spline) | Heat | Intercept | 3.064 | 0.568 | (–0.272, 3.315) | 3344.63 |
| | | L vs. C | –0.176 | 0.620 | (–0.344, 0.263) | |
| | | H vs. C | 0.093 | 0.832 | (–0.222, 0.319) | |
| | | Time (mins) | –0.002 | 0.648 | (–0.003, 0.000) | |
| | | Time² | 0.000 | | (0.000, 3.297) | |
| | mean(Heat) | Intercept | 2.899 | | (2.722, 3.364) | 142.136 |
| | | L vs. C | –0.127 | 0.316 | (–0.367, 0.179) | |
| | | H vs. C | 0.100 | 0.672 | (–0.214, 0.36) | |
| | | Time (hr) | –0.115 | | (–0.003, –0.001) | |
| | | Time² | 0.004 | | (0.000, 0.0000) | |
| B-spline SMEM (cubic spline) | Heat | Intercept | 2.923 | 0.552 | (–0.276, 3.447) | 3376.26 |
| | | L vs. C | –0.106 | 0.636 | (–0.327, 0.321) | |
| | | H vs. C | 0.065 | 0.852 | (–0.206, 0.36) | |
| | | Time (mins) | –0.003 | 0.552 | (–0.004, 0.000) | |
| | | Time² | 0.000 | 0.728 | (–0.000, 0.000) | |
| | | Time³ | 0.000 | 0.640 | (–0.000, 3.402) | |
| | mean(Heat) | Intercept | 2.909 | | (2.741, 3.427) | 164.561 |
| | | L vs. C | –0.121 | 0.364 | (–0.38, 0.174) | |
| | | H vs. C | 0.126 | 0.620 | (–0.224, 0.368) | |
| | | Time (hr) | –0.083 | 0.028 | (–0.004, 0.000) | |
| | | Time² | 0.001 | 0.472 | (–0.000, 0.000) | |
| | | Time³ | 0.001 | 1.000 | (–0.000, 0.000) | |
p* is the non-parametric p-value calculated as
Time², quadratic term for time.
Time³, cubic term for time.
(C, control; L, Low dose; H, High dose; hr, hour; min, minutes)
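The AIC-based comparison across model families can be sketched in a few lines. The values below are the AICs reported in Tables 1–5 for untransformed hourly mean heat production; the dictionary layout and variable names are illustrative choices. The sketch deliberately stays within a single outcome and time scale so that every AIC refers to the same response.

```python
# AIC values for untransformed hourly mean heat production, as reported in Tables 1-5.
aic_hourly = {
    ("LMEM", "linear"): 328.575,
    ("LMEM", "quadratic"): 174.539,
    ("LMEM", "cubic"): 176.975,
    ("TPBF", "linear"): 179.369,
    ("TPBF", "quadratic"): 176.249,
    ("TPBF", "cubic"): 186.497,
    ("B-spline", "linear"): 162.114,
    ("B-spline", "quadratic"): 142.136,
    ("B-spline", "cubic"): 164.561,
}

# The preferred model is the one with the smallest AIC.
best_model = min(aic_hourly, key=aic_hourly.get)
best_aic = aic_hourly[best_model]

# Differences from the best model show how far each candidate trails it.
delta_aic = {model: round(aic - best_aic, 3) for model, aic in aic_hourly.items()}

for model, delta in sorted(delta_aic.items(), key=lambda item: item[1]):
    print(model, delta)
```

With these values the quadratic B-spline model has the smallest AIC, in line with the ranking described in the text.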
Mixed effects models are useful for analyzing repeated measures data. However, with relatively noisy data such as device-based heat production data, variance parameters might not be well estimated. The semiparametric models, especially those with B-splines, approximated the nonlinear patterns in the untransformed heat production data better and thus had substantially higher predictive power than the parametric models. Another advantage of the semiparametric mixed effects modeling approach is that it does not require transforming the outcome variable (e.g., log transformation of heat production in our LMEMs) to make the data approximately normal and improve model fit. In analyzing energy expenditure data collected by devices, the first step is to evaluate plots of energy expenditure against time in minutes. If there appears to be a considerable amount of random noise in the plots, summarizing the data into longer time periods, such as hours, will reduce the random variation due to the frequency of data collection. If the data represent a high dimensional curve over time, rather than a linear function, we recommend semiparametric mixed effects models with smoothing splines for analysis.
In this manuscript, we demonstrated the use of semiparametric models to analyze noisy high dimensional data frequently collected by devices in epochs of 60 seconds over multiple days. A common approach to analyzing these data is to summarize the data into an overall summary such as overall heat production observed over a given week. In our previous analysis of these data [11], we summarized the data to the hourly level and used parametric linear mixed effects models to assess the effects of oral supplementation of interferon tau on device-based measures of energy expenditure. Our analysis included an interaction term between time and the treatment levels. The overall test for the interaction between treatments and time was not statistically significant at the 5% significance level. However, when separate analyses were conducted for each hour of observation to determine the treatment effects at each hour, we observed that the relationship between interferon tau treatment and measures of energy expenditure such as heat production depended on time and that the differences between the animals on the higher doses of interferon and the lower doses depended on time. A limitation of the current study is the sample size. The use of semiparametric methods in assessing treatment effects requires larger sample sizes. Our findings have important implications for statistically analyzing data from experimental and clinical studies regarding effects of nutrition (e.g., dietary intakes of amino acids [31]) on improving metabolic profiles and health in animals and humans.
With the rise in complex data frequently collected from devices such as the Oxymax instrument, we recommend summarizing the data from units of time in minutes to hourly or half-hourly measures to reduce the noise associated with the frequency of data collection. Semiparametric regression methods provide more flexible modeling approaches to analyzing these data than parametric methods based on polynomial mixed effects models.
AIC, Akaike information criterion; LMEM, linear mixed effects model; SMEM, semiparametric mixed effects model; TPBF, truncated power basis function; ZDF, Zucker diabetic fatty.
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
HK drafted the manuscript and performed the initial data analysis as part of her master’s thesis under the guidance of the last author while at Texas A&M University. YL re-did some of the data analysis and simulations, and assisted with drafting the manuscript. RSZ assisted with developing the concept and data analysis. GW and CDT designed and conducted the animal experiments at Texas A&M University. CDT developed the concept for and assisted with the statistical data analysis, drafting and editing of the manuscript.
This study (Animal Use Protocol # 2010-251) was approved by The Institutional Animal Care and Use Committee of Texas A&M University. No consent to participate was applicable.
The authors would like to thank Dr. Devon Brewer and Dr. Lisa Giles for their assistance in editing this manuscript.
This research was supported by National Institute of Diabetes and Digestive and Kidney Diseases award number 1R01DK132385-01. The animal experiment of this work was supported by grants from the American Heart Association (10GRNT4480020 and 11GRNT7930004).
The authors declare no conflict of interest. GW is serving as Editorial Board Member of this journal. We declare that GW had no involvement in the peer review of this article and has no access to information regarding its peer review. Full responsibility for the editorial process for this article was delegated to YC.
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.