Parametric and Semiparametric Approaches to Analyzing Device-Based Measures of Energy Expenditure in Zucker Diabetic Fatty Rats
Hyunkyoung Kim, Yuanyuan Luan, Roger S. Zoh, Guoyao Wu, Carmen D. Tekwe

TL;DR
This paper compares statistical methods to analyze energy expenditure data in diabetic rats and finds that flexible models work best.
Contribution
The study introduces a recommended approach using semiparametric models and data summarization for analyzing frequent energy expenditure measurements.
Findings
No effect of interferon tau on energy expenditure was observed in the study.
A B-spline semiparametric model performed best for modeling nonlinear energy expenditure patterns.
Summarizing data into 30-60 minute epochs is recommended to reduce noise in high-dimensional energy expenditure data.
Abstract
Obesity results from a chronic imbalance between energy intake and energy expenditure. Total energy expenditure for all physiological functions combined can be measured approximately by calorimeters. These devices assess energy expenditure frequently (e.g., in 60-second epochs), resulting in massive complex data that are nonlinear functions of time. To reduce the prevalence of obesity, researchers often design targeted therapeutic interventions to increase daily energy expenditure. We analyzed previously collected data on the effects of oral interferon tau supplementation on energy expenditure, as assessed with indirect calorimeters, in an animal model for obesity and type 2 diabetes (Zucker diabetic fatty rats). In our statistical analyses, we compared parametric polynomial mixed effects models and more flexible semiparametric models involving spline regression. We found no effect of…
Figure 1
Figure 2
Figure 3
Introduction
As sedentary lifestyles spread globally, obesity has become an increasing public health concern [1]. Sedentary lifestyles are growing rapidly with developing technology [2]. Obesity results from a chronic imbalance between food intake and energy expenditure, genetic predisposition, consumption of high fat diets, and inflammation [3]. Additionally, obesity contributes to adverse health outcomes such as insulin resistance, type 2 diabetes, obstructive sleep apnea, osteoarthritis, stroke, hypertension, and cancer [4]. As obesity becomes more prevalent, researchers seek to better understand the causal pathways leading to it. Energy expenditure is a key factor on these pathways and refers to the amount of energy used by the body for all physiological functions, such as movement, respiration, and digestion [5–7]. Energy expenditure has three components: resting metabolism, the thermic effect of feeding, and the thermic effect of physical activity [5,7]. Resting metabolism makes up 60% to 70% of an individual’s daily energy expenditure [5]. The thermic effect of feeding, including digestion, accounts for up to 10% of daily energy expenditure [5]. Finally, the thermic effect of physical activity comprises 20% to 30% of daily energy expenditure [5].
Measuring energy expenditure accurately requires sensitive and sophisticated instruments. One commonly used instrument is the open circuit calorimeter, such as the computer-controlled Oxymax metabolic chamber for research animals (Columbus Instruments, Ohio, USA). This instrument measures energy expenditure in epochs of 60 seconds to five minutes during an observation period. The instrument calculates an animal’s energy expenditure from its volumetric carbon dioxide production (VCO₂) and volumetric oxygen consumption (VO₂). The device also records the animal’s total heat production (heat) and respiratory quotient (RQ). The resulting data are repeated measures that appear as curves, that is, complex high dimensional nonlinear functions of time (Fig. 1). Researchers are often unsure which method is most appropriate for analyzing these data. A common approach is to compute a summary measure, such as the overall mean energy expenditure for the whole observation period, or to categorize individuals by their intensity of activity [8–11]. These approaches are limited because they do not capture variation in energy expenditure or its pattern over time.
Because energy expenditure affects the development of obesity [12–14], some researchers have sought to manipulate energy expenditure and its physiological effects as a way to prevent or reduce obesity. Interferon tau, an anti-inflammatory cytokine, is one proposed intervention for achieving this aim [15, 16]. In a previous study, we evaluated the impact of interferon tau on obesity-related outcomes in Zucker diabetic fatty (ZDF) rats [11]. The ZDF animal model has deficiencies in its leptin receptors and therefore researchers often use it for obesity and type 2 diabetes studies. The objective of this study is to provide an introduction to more flexible approaches to assessing intervention effects on high dimensional data frequently collected in biomedical studies such as the device-based measures of energy expenditure.
Materials and Methods
We obtained 18 male 23-day-old ZDF rats from Charles River Laboratories and fed them a Purina 5008 diet throughout the study. The Purina 5008 diet consisted of 23.5% crude protein, 6.0% fat, 34.9% starch, 2.6% sucrose, 0.5% glucose plus fructose, 6.8% minerals, and 3.8% fiber, yielding 17,364 kJ gross energy/kg [11]. We kept the study animals in a temperature- and humidity-controlled facility on a 12-h light: 12-h dark cycle. The Texas A&M University Animal Use and Care Committee approved the study (#2010–251).
At 28 days of age, we randomly assigned the rats to receive drinking water (distilled and deionized H₂O) with 0 (control), 4 (low dose), or 8 μg (high dose) of interferon tau/kg body weight per day (6 rats per condition). The rats had free access to food and drinking water during the 8-week study. To maintain assigned interferon tau dosages, we adjusted concentrations of interferon tau in the drinking water daily based on the volume of water the animals consumed. We changed their drinking water every other day. When the rats were 10 weeks old (week 6 of the interferon tau treatment), we placed each in an Oxymax chamber for 24 hours to assess energy expenditure. Approximately every five minutes, the instrument measured several indicators of energy expenditure: volumetric O₂ consumption (VO₂; L/h/kg body weight [BW]), volumetric CO₂ production (VCO₂; L/h/kg BW), respiratory quotient (RQ; CO₂ production/O₂ consumption), and heat production (Heat; kcal/h). We focused our analyses on heat production. Our original report has further details on the experiment [11].
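The daily adjustment of drinking-water concentration described above is simple arithmetic: the target dose times body weight, divided by the volume the animal drinks. The sketch below is illustrative only. The function name, the example body weight, and the example water intake are hypothetical and are not values from the study.

```python
def ifn_tau_concentration(dose_ug_per_kg, body_weight_kg, water_intake_ml):
    """Concentration (ug/mL) at which one day's water intake delivers the target dose."""
    if water_intake_ml <= 0:
        raise ValueError("water intake must be positive")
    daily_dose_ug = dose_ug_per_kg * body_weight_kg  # total ug required per day
    return daily_dose_ug / water_intake_ml

# Hypothetical animal: 0.25 kg body weight, drinking 40 mL of water per day
low = ifn_tau_concentration(4, 0.25, 40)   # low-dose group (4 ug/kg BW/day)
high = ifn_tau_concentration(8, 0.25, 40)  # high-dose group (8 ug/kg BW/day)
```

Because the concentration depends on measured intake, it has to be recomputed whenever consumption changes, which is why the adjustment was made daily.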
Models Considered
Linear Mixed Effects Models
Linear mixed effects models (LMEMs) can be used to analyze repeated measures data [17]. These models extend classical linear regression to correlated data. They provide powerful techniques for analyzing correlated data with complex variance structures, handling missing data, and incorporating nonlinear trends with log or higher order polynomial transformations. LMEMs take the following form:
$$Y_i = X_i\beta + Z_i b_i + \epsilon_i, \qquad (1)$$
where $Y_i$ is the response for the $i$th subject, $\beta$ is a vector of fixed coefficients, $X_i$ is a vector of fixed variables, $b_i$ is a vector for the random effects, and $Z_i$ is a vector for the random variables. The random error terms $\epsilon_i$ represent the random variation associated with the response. These models rely on the assumptions that $b_i \sim N(0, D)$ and $\epsilon_i \sim N(0, \sigma_\epsilon^2 I)$, where $D$ is the variance-covariance matrix for $b_i$. The mean response of $Y_i$ is $X_i\beta$, the fixed component of the model, while $Z_i b_i$ is the random component of the model, representing individual variation from the overall sample mean and allowing description of individual-specific trajectories.
In assessing the impact of oral interferon tau supplementation on energy expenditure, we estimated 12 separate LMEMs resulting from all combinations of energy expenditure transformation (raw or log transformed), unit of time (minutes or hours), and time term (linear, quadratic, or cubic). We intended the log transformations to make the data approximately normal. We performed all analyses using R software, version 4.2.0 (R Core Team, Vienna, Austria) [18].
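The count of 12 models follows from crossing two transformations, two time units, and three time terms (2 × 2 × 3). A minimal sketch of that grid is given below. The labels and helper names are hypothetical and are not taken from the authors' R scripts.

```python
from itertools import product

TRANSFORMS = ("raw", "log")
TIME_UNITS = ("minutes", "hours")
TIME_TERMS = {"linear": 1, "quadratic": 2, "cubic": 3}  # highest power of time

def model_grid():
    """Every combination of transformation, time unit, and time term."""
    return [
        {"transform": tr, "time_unit": tu, "time_term": name, "degree": deg}
        for tr, tu, (name, deg) in product(TRANSFORMS, TIME_UNITS, TIME_TERMS.items())
    ]

def polynomial_time_terms(t, degree):
    """Fixed-effect time columns t, t^2, ..., t^degree for one observation."""
    return [t ** d for d in range(1, degree + 1)]

grid = model_grid()
cubic_terms = polynomial_time_terms(2.0, 3)
```

Each grid entry identifies one candidate model. Fitting the models themselves would require a mixed-model routine, which this sketch deliberately leaves out.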
Semiparametric Mixed Effects Models
Penalized spline regression is a flexible semiparametric approach to estimating mean functions in mixed effects models [19]. Mean functions represented by splines can be expressed easily as the best linear unbiased predictors of the mixed effects model [20]. Semiparametric mixed effects models (SMEMs) are also specified as in Eqn. 1. However, the elements of the random components matrices differ from LMEMs. SMEMs include spline basis functions as random effects in addition to subject-specific random effects. Thus, SMEMs can be written as classical mixed effects models that include nonparametric terms for curve smoothing.
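As a reminder of why a penalized spline fits into the mixed model framework, the following is the standard textbook correspondence rather than a derivation taken from this paper. The notation is generic and assumed for illustration: $X$ holds the polynomial terms, $Z$ holds the spline basis functions, $u$ holds the spline coefficients, and $\lambda$ is the smoothing parameter.

```latex
\min_{\beta,\,u}\; \lVert y - X\beta - Zu \rVert^{2} + \lambda \lVert u \rVert^{2}
\quad\Longleftrightarrow\quad
y = X\beta + Zu + \epsilon,\qquad
u \sim N(0, \sigma_u^{2} I),\quad
\epsilon \sim N(0, \sigma_\epsilon^{2} I),\quad
\lambda = \sigma_\epsilon^{2} / \sigma_u^{2}.
```

For fixed variance components, the penalized least squares solution coincides with the best linear unbiased predictor of the mixed model, so estimating $\sigma_u^2$ and $\sigma_\epsilon^2$ amounts to choosing the amount of smoothing from the data.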
We used two kinds of semiparametric functions in our SMEMs: truncated power basis functions (TPBFs) and cubic B-spline functions.
Truncated Power Basis Functions
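Truncated power basis functions are simple semiparametric functions that approximate curves. We define a truncated power function at a given knot $\kappa_k$ as
$$(x - \kappa_k)_+^p = \begin{cases} (x - \kappa_k)^p, & x > \kappa_k \\ 0, & \text{otherwise}, \end{cases} \qquad k = 1, \dots, K,$$
where $p$ is the order of the polynomial function, and $K$ represents the number of knots [21]. The functions are differentiable up to $p - 1$ times [20–23]. In modeling mean functions, TPBFs approximate curves based on polynomial expansions. A mixed effects model based on the truncated power basis is
$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \cdots + \beta_p x_{ij}^p + \sum_{k=1}^{K} u_k (x_{ij} - \kappa_k)_+^p + b_{0i} + \epsilon_{ij}, \qquad j = 1, \dots, n_i, \qquad (2)$$
where $Y_i = (y_{i1}, \dots, y_{in_i})^\top$ is the vector of responses for the $i$th subject, $n_i$ represents the total number of responses per subject, $\beta_0 + \beta_1 x_{ij} + \cdots + \beta_p x_{ij}^p$ is the fixed part of the model, and $b_{0i}$ is the subject specific random intercept. The term $(x_{ij} - \kappa_k)_+^p$ is the $k$th truncated power basis function of degree $p$, with $\kappa_k$ representing the $k$th knot [20]. Eqn. 2 is a polynomial piece-wise regression model with separate slopes, $u_k$, fit to different partitions of the predictor variable. Thus, $(x_{ij} - \kappa_k)_+$ indicates the partition where $x_{ij} - \kappa_k$ is positive. Knots are the points where adjacent partitions meet. For effective estimation, the TPBF approach requires an adequate number of knots or penalization [21].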
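The construction of a truncated power basis can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation. The function names and the knot locations (6, 12, and 18 hours) are assumptions chosen for the example.

```python
def truncated_power(x, knot, degree):
    """(x - knot)_+^degree: zero at or below the knot, a polynomial above it."""
    return float((x - knot) ** degree) if x > knot else 0.0

def tpbf_row(x, knots, degree):
    """One design-matrix row: 1, x, ..., x^degree, then one truncated term per knot."""
    polynomial = [float(x ** d) for d in range(degree + 1)]
    spline = [truncated_power(x, k, degree) for k in knots]
    return polynomial + spline

knots = [6.0, 12.0, 18.0]              # hypothetical knots, in hours
row = tpbf_row(14.0, knots, degree=3)  # observation taken at hour 14
```

At hour 14 only the knots at 6 and 12 contribute nonzero spline terms, because the observation lies below the knot at 18. This is the sense in which each truncated term switches on within its own partition of the time axis.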
Our cubic TPBF model of energy expenditure is
$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \beta_4\,\text{Low}_i + \beta_5\,\text{High}_i + \sum_{k=1}^{K} u_k (t_{ij} - \kappa_k)_+^3 + b_{0i} + \epsilon_{ij},$$
where $t_{ij}$ is time, $\beta_1$, $\beta_2$, and $\beta_3$ are the fixed coefficients for the linear, quadratic, and cubic terms for time, respectively, and $\beta_4$ and $\beta_5$ represent the low interferon tau and high interferon tau groups’ contrasts, respectively, with the control group. The term $(t_{ij} - \kappa_k)_+^3$ is the cubic spline basis. We treat the truncated cubic basis splines and the intercept, $b_{0i}$, as random and assume $u_k \sim N(0, \sigma_u^2)$ and $b_{0i} \sim N(0, \sigma_b^2)$. When $u_k = 0$ for all $k$, Eqn. 2 reduces to a mixed effects model. The random effects $u_k$, which we model as normal random curves with mean zero [23], are not present in the LMEM in Eqn. 1. The smoothness of the spline regression rises with increasing degree of the polynomial [23]. The smoothing parameter, $\lambda$, controls the smoothness of the curve, while the mean square error of the model grows with increasing $\lambda$ [22,23]. Although easy to construct, models based on the TPBF can be numerically unstable due to correlations between the basis functions. When the range for $x$ in Eqn. 2 is wide, the basis functions increase rapidly as $p$ rises. To resolve this issue, the range for $x$ may be re-scaled to [0, 1]. These disadvantages make the models prone to computational difficulties [21]. B-splines allow analysts to avoid these problems [21,24,25].
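The effect of re-scaling the predictor to [0, 1] is easy to see numerically. The sketch below compares the largest cubic truncated power term on a raw minute scale with the same term after re-scaling. The 24-hour record at 5-minute spacing and the single knot at the midpoint are hypothetical choices made for illustration.

```python
def rescale_unit(values):
    """Map values linearly onto [0, 1] using their observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]

minutes = list(range(0, 1441, 5))  # hypothetical 24 h record, one reading per 5 min
scaled = rescale_unit(minutes)

knot_raw = 720                     # hypothetical knot at the midpoint, in minutes
knot_scaled = (knot_raw - min(minutes)) / (max(minutes) - min(minutes))

# Largest value taken by the cubic truncated power term on each scale
raw_max = max((t - knot_raw) ** 3 for t in minutes if t > knot_raw)
scaled_max = max((s - knot_scaled) ** 3 for s in scaled if s > knot_scaled)
```

On the raw scale the cubic term reaches several hundred million, while on the unit scale it stays below one. Re-scaling changes only the magnitude of the columns, not the correlation between basis functions, so it addresses only part of the instability.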
B-spline Basis Functions
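B-splines allow flexible approaches to analyzing data [21,25]. B-splines are piece-wise polynomial functions of order $p$ connected at their inner knots [19,21,24,26]. While B-splines are equivalent to TPBFs on any given interval $[a, b]$, they are more numerically stable [20,21,27]. B-splines are transformations of TPBFs [20,21]. To illustrate their equivalence, let $X_T$ and $X_B$ be design matrices for the TPBF and the B-spline basis functions of the same degree and same knot locations, respectively. Then $X_B = X_T L$, where $L$ is a square invertible matrix [20].
The $k$th B-spline basis function of degree $p$ is nonzero only over the interval $[\kappa_k, \kappa_{k+p+1})$. Next, let $\kappa_0 \le \kappa_1 \le \cdots \le \kappa_{K+1}$ be a set of non-decreasing knots. The domain for B-splines is $[\kappa_0, \kappa_{K+1}]$, with $\kappa_0 = a$ and $\kappa_{K+1} = b$ typically representing the two boundary knots [24]. We define the B-spline basis function of degree $p$ recursively as
$$B_{k,0}(x) = \begin{cases} 1, & \kappa_k \le x < \kappa_{k+1} \\ 0, & \text{otherwise}, \end{cases} \qquad B_{k,p}(x) = \frac{x - \kappa_k}{\kappa_{k+p} - \kappa_k}\, B_{k,p-1}(x) + \frac{\kappa_{k+p+1} - x}{\kappa_{k+p+1} - \kappa_{k+1}}\, B_{k+1,p-1}(x),$$
where any term with a zero denominator is taken to be zero.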
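The recursive definition of B-spline basis functions (the Cox-de Boor recursion) can be sketched directly. This is a minimal plain-Python illustration, not the authors' code. The function names, the clamped knot sequence that repeats each boundary knot degree + 1 times, and the inner knots at 6, 12, and 18 hours are assumptions made for the example.

```python
def bspline_basis(k, degree, x, knots):
    """Value at x of the k-th B-spline basis function of the given degree."""
    if degree == 0:
        return 1.0 if knots[k] <= x < knots[k + 1] else 0.0
    left_den = knots[k + degree] - knots[k]
    right_den = knots[k + degree + 1] - knots[k + 1]
    left = right = 0.0  # a term with a zero denominator is taken to be zero
    if left_den > 0:
        left = (x - knots[k]) / left_den * bspline_basis(k, degree - 1, x, knots)
    if right_den > 0:
        right = ((knots[k + degree + 1] - x) / right_den
                 * bspline_basis(k + 1, degree - 1, x, knots))
    return left + right

def bspline_row(x, inner_knots, lower, upper, degree):
    """All basis functions evaluated at x, using a clamped knot sequence."""
    knots = [lower] * (degree + 1) + list(inner_knots) + [upper] * (degree + 1)
    n_basis = len(knots) - degree - 1
    return [bspline_basis(k, degree, x, knots) for k in range(n_basis)]

# Hypothetical setup: time in hours on [0, 24) with inner knots at 6, 12, 18
row = bspline_row(14.0, [6.0, 12.0, 18.0], 0.0, 24.0, degree=3)
```

Unlike truncated power terms, the basis values are bounded between 0 and 1 and sum to 1 at any point inside the domain, which is one reason B-splines behave better numerically. With the half-open intervals used in this sketch, the upper boundary itself would need separate handling.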
In our analyses, we specified the B-spline models as
where is the vector of responses for the subject, and are the fixed effects, and and are the subject-specific random intercepts and random slopes for the B-spline basis functions, respectively.
Inference and Model Selection
One assumption of classical regression models is that covariates are independent. However, polynomial splines in regression models are not independent because they are piece-wise functions used to approximate curves. Therefore, the standard errors and confidence intervals for parameters in classical regression models are not applicable in models involving splines. For inference in spline regression models, nonparametric bootstrap methods can be used [28]. The nonparametric bootstrap involves resampling the data to estimate variances of model parameters without any distributional assumptions. To implement the nonparametric bootstrap, we first resampled the original data with replacement for each animal at different time points in the study. Next, we estimated model coefficients with the resampled data, and then repeated the resampling and estimation process $B = 500$ times. We computed the 95% bootstrap confidence intervals with the percentile approach as $(\hat\beta^*_{(\alpha/2)}, \hat\beta^*_{(1-\alpha/2)})$, where $\alpha = 0.05$. The terms $\hat\beta^*_{(\alpha/2)}$ and $\hat\beta^*_{(1-\alpha/2)}$ represent the $\alpha/2$ and $1 - \alpha/2$ quantiles of the bootstrap distributions for the estimated coefficients.
We also calculated the corresponding $p$-values for the estimated coefficients under the null hypothesis of $\beta = 0$, following [28].
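The resampling procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' analysis. A difference in group means stands in for the mixed-model coefficient, the data are made-up values, and the function names and the quantile interpolation rule are choices made for the example.

```python
import random
from statistics import mean

def percentile(sorted_values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def bootstrap_ci(data, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval; data maps animal id -> list of observations."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample observations with replacement within each animal
        resampled = {a: rng.choices(obs, k=len(obs)) for a, obs in data.items()}
        estimates.append(estimator(resampled))
    estimates.sort()
    return percentile(estimates, alpha / 2), percentile(estimates, 1 - alpha / 2)

def group_difference(d):
    """Stand-in estimator: treated minus control mean of per-animal means."""
    treated = mean(mean(v) for k, v in d.items() if k.startswith("t"))
    control = mean(mean(v) for k, v in d.items() if k.startswith("c"))
    return treated - control

# Made-up heat values (kcal/h) for two control and two treated animals
data = {
    "c1": [2.1, 2.4, 2.2, 2.6, 2.3], "c2": [2.0, 2.5, 2.3, 2.2, 2.4],
    "t1": [2.2, 2.3, 2.5, 2.4, 2.1], "t2": [2.3, 2.6, 2.2, 2.5, 2.4],
}
lower, upper = bootstrap_ci(data, group_difference)
```

An interval that covers zero is consistent with no treatment effect. In a full analysis the estimator would refit the spline mixed model on each resampled data set.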
We selected models with the smallest Akaike information criterion (AIC) values [29,30] as the best fitting.
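Model selection by AIC can be illustrated with a short sketch. It uses the standard definition AIC = 2k − 2 log L, where k is the number of estimated parameters and log L is the maximized log-likelihood. The candidate values below are hypothetical and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) pairs for three models
candidates = {
    "linear": (-520.4, 5),
    "quadratic": (-498.1, 6),
    "cubic": (-497.9, 7),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this made-up example the cubic model has a slightly higher likelihood than the quadratic one, but the gain does not offset the penalty for the extra parameter, so the quadratic model is selected.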
Results
Summarizing heat production by minute increased variability and random noise in the data relative to summarizing by hour (Fig. 1a–d). Heat production also varied nonlinearly over time. Other device-based measures of energy expenditure showed patterns similar to those of heat production. Therefore, we focus our report on modeling heat production.
Linear Mixed Effects Models
We estimated twelve separate LMEMs (Tables 1,2,3). The low and high dose groups did not differ significantly from the control group in heat production in any model. Models with a cubic term for time fit the data better than models with a quadratic term, which fit the data better than models with a linear term (see also Figs. 2a–c,3a–c). Also, models with log-transformed heat production fit much better than those with untransformed heat production. Furthermore, models of hourly mean heat production fit better than models of raw heat production at the scale of minutes, although the parameter estimates of paired models (differing only in time units) were very similar. Because coefficients for time terms and their standard errors were often close to the lower bound of zero, inference for these parameters may be inaccurate.
Semiparametric Mixed Effects Models
Truncated Power Basis Functions
The TPBF models fit the data substantially better than the LMEMs (Table 4; Figs. 2d–f,3d–f). The linear spline TPBF model fit raw heat production at the scale of minutes best, while the quadratic spline model fit hourly mean heat production best. As in the LMEMs, there were no statistically significant treatment effects in any TPBF model. Also, TPBF models of hourly mean heat production fit better, with lower AIC values, than the corresponding models at the minute level. For the cubic spline models, the higher order terms for time (quadratic and cubic) were not statistically significant. However, both the linear and quadratic terms for time were statistically significant in the quadratic spline models, in both of the paired models that differed only in time scale.
B-spline Basis Functions
The B-spline SMEMs (Table 5; Figs. 2g–i,3g–i) fit the data better than the TPBF models and LMEMs. As with all of the other models, there were no statistically significant treatment effects in the B-spline models. Also, B-spline models of hourly mean heat production fit better and had lower standard errors of coefficients than models of raw heat production at the time scale of minutes, although the patterns of coefficients for paired models were similar. The quadratic B-spline model analyzed at the hourly level performed the best of all models we estimated for both mean hourly untransformed heat production and untransformed heat production at the time scale of minutes. Time was not statistically significant in the linear spline models for the analyses performed at the hourly and minute levels. However, the linear and quadratic terms for time were statistically significant in the quadratic spline model fit at the hourly time scale. The quadratic and cubic terms for time were not statistically significant in the cubic spline models.
Discussion
Mixed effects models are useful for analyzing repeated measures data. However, with relatively noisy data such as device-based heat production data, variance parameters might not be well estimated. The semiparametric models, especially those with B-splines, approximated the nonlinear patterns in the untransformed heat production data better and thus had substantially higher predictive power than the parametric models. Another advantage of the semiparametric mixed effects modeling approach is that it does not require transforming the outcome variable (e.g., log transformation of heat production in our LMEMs) to make the data approximately normal and improve model fit. In analyzing energy expenditure data collected by devices, the first step is to evaluate plots of energy expenditure against time in minutes. If there appears to be a considerable amount of random noise in the plots, summarizing the data into longer time periods, such as hours, will reduce the random variation due to the frequency of data collection. If the data represent a high dimensional curve over time, rather than a linear function, we recommend semiparametric mixed effects models with smoothing splines for analysis.
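The summarization step recommended above, collapsing frequent readings into longer epochs, can be sketched as follows. This is a minimal illustration with made-up readings. The function name and the record layout of (minute since start, heat) pairs are assumptions, not the format exported by the instrument.

```python
from collections import defaultdict
from statistics import mean

def summarize_epochs(records, epoch_minutes=60):
    """Average heat within consecutive epochs; records are (minute, heat) pairs."""
    bins = defaultdict(list)
    for minute, heat in records:
        bins[int(minute // epoch_minutes)].append(heat)
    return {epoch: mean(values) for epoch, values in sorted(bins.items())}

# Made-up readings every 5 minutes over 2 hours: a slow upward trend plus an
# alternating +/- 0.05 wobble standing in for measurement noise
records = [
    (m, 2.0 + 0.1 * (m // 60) + (0.05 if (m // 5) % 2 else -0.05))
    for m in range(0, 120, 5)
]
hourly = summarize_epochs(records, epoch_minutes=60)
half_hourly = summarize_epochs(records, epoch_minutes=30)
```

Averaging within each epoch cancels the alternating wobble and leaves the slower trend, which is the intended effect of summarizing to hourly or half-hourly values before modeling.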
In this manuscript, we demonstrated the use of semiparametric models to analyze noisy high dimensional data frequently collected by devices in epochs of 60 seconds over multiple days. A common approach to analyzing these data is to reduce them to an overall summary, such as the overall heat production observed over a given week. In our previous analysis of these data [11], we summarized the data to the hourly level and used parametric linear mixed effects models to assess the effects of oral supplementation of interferon tau on device-based measures of energy expenditure. Our analysis included an interaction term between time and the treatment levels. The overall test for the interaction between treatments and time was not statistically significant at the 5% significance level. However, when separate analyses were conducted for each hour of observation to determine the treatment effects at each hour, we observed that the relationship between interferon tau treatment and measures of energy expenditure such as heat production depended on time and that the differences between the animals on the higher doses of interferon and the lower doses depended on time. A limitation of the current study is the sample size. The use of semiparametric methods in assessing treatment effects requires larger sample sizes. Our findings have important implications for statistically analyzing data from experimental and clinical studies regarding effects of nutrition (e.g., dietary intakes of amino acids [31]) on improving metabolic profiles and health in animals and humans.
Conclusions
With the rise in complex data frequently collected from devices such as the Oxymax instrument, we recommend summarizing the data from units of time in minutes to hourly or half-hourly measures to reduce the noise associated with the frequency of data collection. The use of semiparametric regression methods provides more flexible modeling approaches to analyzing these data compared to parametric methods based on polynomial mixed effects models.
Supplementary Material
EE_analysis_K_AIC_Github
EE_analysis_Github
References
- 1. Wyatt SB, Winters KP, Dubbert PM. Overweight and obesity: prevalence, consequences, and causes of a growing public health problem. The American Journal of the Medical Sciences. 2006; 331: 166–174. PMID: 16617231. doi: 10.1097/00000441-200604000-00002
- 2. Hamilton MT, Hamilton DG, Zderic TW. Role of low energy expenditure and sitting in obesity, metabolic syndrome, type 2 diabetes, and cardiovascular disease. Diabetes. 2007; 56: 2655–2667. PMID: 17827399. doi: 10.2337/db07-0882
- 3. Fernández-Sánchez A, Madrigal-Santillán E, Bautista M, Esquivel-Soto J, Morales-González A, Esquivel-Chirino C, et al. Inflammation, oxidative stress, and obesity. International Journal of Molecular Sciences. 2011; 12: 3117–3132. PMID: 21686173. doi: 10.3390/ijms12053117. PMCID: PMC3116179
- 4. Abelson P, Kennedy D. The obesity epidemic. Science. 2004; 304: 1413. PMID: 15178768. doi: 10.1126/science.304.5676.1413
- 5. Poehlman ET. A review: exercise and its influence on resting energy metabolism in man. Medicine and Science in Sports and Exercise. 1989; 21: 515–525. PMID: 2691813
- 6. Levine JA. Measurement of energy expenditure. Public Health Nutrition. 2005; 8: 1123–1132. PMID: 16277824. doi: 10.1079/phn2005800
- 7. Donahoo WT, Levine JA, Melanson EL. Variability in energy expenditure and its components. Current Opinion in Clinical Nutrition and Metabolic Care. 2004; 7: 599–605. PMID: 15534426. doi: 10.1097/00075197-200411000-00003
- 8. Tudor-Locke C, Leonardi C, Johnson WD, Katzmarzyk PT, Church TS. Accelerometer steps/day translation of moderate-to-vigorous activity. Preventive Medicine. 2011; 53: 31–33. PMID: 21295063. doi: 10.1016/j.ypmed.2011.01.014
