A novel algorithm for detecting human circadian rhythms using a thoracic temperature sensor

Article history: Received: 07 May, 2017 Accepted: 29 June, 2017 Online: 19 July, 2017 Circadian rhythms undergo high perturbations due to cancer progression and worsening of metabolic diseases. This paper proposes an original method for detecting such perturbations using a novel thoracic temperature sensor. Such an infrared sensor records the skin temperature every five minutes, although some data might be missing. In this pilot study, five control subjects were evaluated over four days of recordings. In order to overcome the problem of missing data, first four different interpolation methods were compared. Using interpolation helps covering the gaps and extending the recordings frequency, subsequently prolonging sensor battery life. Afterwards, a Cosinor model was proposed to characterize circadian rhythms, and extract relevant parameters, with their confidence limits. A divergence study is then performed to detect changes in these parameters. The results are promising, supporting the enlargement of the sample size and warranting further assessment in cancer patients.


Introduction
Most biological processes follow circadian rhythms, with the standard period of about 24 hours. These 24-hour rhythms are driven by circadian clocks, which have been observed in plants, fungi, cyanobacteria, and humans [1,2]. In mammals, each cell contains molecular clocks, whose coordination is ensured by the suprachiasmatic nuclei in the hypothalamus through the generation of rhythmic physiology. Such a Circadian Timing System (CTS) regulates rhythmically cellular metabolism and proliferation over the 24 hours.
The body temperature follows circadian rhythms, that are tightly related to cancer development [3] [4]. Indeed, they play an important role in the coordination of molecular circadian clocks in various peripheral organs, such as lung, liver, kidney, and intestine, and could have an effect on tumors through their regulatory effects on Heat Shock Factor (HSF), Heat Shock Proteins (HSPs), and Cold-Induced Proteins [4,5]. Moreover, it has been demonstrated that giving chemotherapy at an accurate circadian timing improves tolerance up to fivefold and to almost doubles antitumor efficacy, compared to constant rates or wrongly timed administrations, in both rodent models and cancer patients [6] [7]. In the same way, the destruction of the suprachiasmatic nuclei suppressed any circadian rhythm in body temperature in mice, which causes a 2-3 fold acceleration of experimental cancer progression [8]. On the other hand, the temperature circadian rhythms could be disrupted by anticancer medications, and molecular clocks are impaired, as a function of dose and circadian timing in mice [9,10]. Therefore, the circadian rhythm of body temperature is a useful biomarker of CTS function [11] [12].
The relevance of monitoring core body temperature for CTS assessment has been approved in previous works, however they all suffer from a lack of non-invasive screening tools, which has limited testing in cancer patients [13,14]. To overcome such a problem, one could use the skin temperature, which is correlated to core temperature. However, changes in skin temperature display an opposite pattern compared to that in core body temperature, but with a similar circadian rhythm. sampling frequency with this sensor, that is, one point every five minutes. Using such a low frequency is possible since temperature varies slowly over time. Also, a minimal memory is required and the battery life is extended, due to reduced energy consumption for the measurement device. However, having a low signal frequency produces more difficult complications. Moreover, sensor malfunction or a subject not wearing it for a period of time causes missing data in the signals.
This paper proposes an original algorithm to analyze the circadian rhythms in skin temperature signals. It first develops an interpolation algorithm to set the signals to a higher frequency such as one sample per minute, and to fill any gap in the signals [15]. For this reason, a kernel-based machine learning algorithm is presented and compared to other classic interpolation methods. Afterwards, a rhythmometric modeling, using Cosinor models, is proposed. It aims to extract parameters such as the MESOR, the amplitude, the orthophase and the bathyphase from interpolated signals. The paper then proposes a divergence study over these features to detect changes caused by chemotherapy. Five healthy subjects' temperature signals are involved in this study, as a prerequisite for subsequent investigations involving a larger number in healthy controls and cancer patients.

Subjects and materiel
In this study, three female and two male control subjects, aged 45.2 ±13.6 years, are considered. They were given a detailed description of the objectives and requirements of the study before the experiment, and they read and signed an informed consent prior to testing. The infrared Movisens (GmbH -move III) sensor was positioned on the thorax of the subjects to monitor their skin temperature for four days. It measures 5.0 x 3.6 x1.7 cm 3 , and weighs 32 g. The sensor is composed of a tri-axial acceleration sensor (adxl345, Analog Devices; range: ±8 g; sampling rate: 64 Hz; resolution: 12 bit) embedded with a temperature sensor (MLX90615 high resolution 16bit ADC; resolution of 0.02°C). The recorded data is saved on a memory chip inside the sensor and transmitted to a server via GPRS. A hypoallergenic patch has been used to maintain the sensor in the upper right anterior thoracic area of the subjects.

Interpolation of temperature signals
The temperature signals have a small frequency, with occasional missing data. Let ( ) 1≤ ≤ denote the temperature samples collected for a certain subject using the sensor, with in minutes. The aim of interpolation is to estimate a function ̂( . ), that computes the temperature at any time , and which verifies the following: Different algorithms for interpolation such as linear, polynomial or cubic splines techniques exist in the literature [15,16]. In the linear interpolation, temperature is represented by a straight line between any two consecutive collected measurements. Having ∈ [ , +1 ], for ∈ {1, … , − 1}, . This technique is easy to implement, but it yields several functions, one per interval between two consecutive measurements. Moreover, it is disadvantageous for large timeintervals due to the non-specification of the linear estimation.
In the polynomial interpolation, temperature is represented by a single polynomial function, which fits the measured data. By using Lagrange polynomials, one obtains the following function: where ′ ≤ is the degree of the obtained polynomial and ′ ⊆ {1, … , }. By taking ′ close to , the obtained function is specific, but with highly complex computations. Cubic splines is the most commonly used interpolation technique. It computes a set of piece-wise polynomial functions that maximize the smoothness of the whole curve. The -th splines function defined over the interval [ , +1 [, for ∈ {1, … , }, is given as follows: where: where ̂′ (. ) is the derivative of the temperature model ̂( . ). The values of the coefficients are then computed iteratively as shown in [2]. This technique is efficient, but proposes piecewise functions, that need iterative computations.
This paper proposes a kernel-based regression approach that generates a single function [17,18]. A training database is first constructed using the collected temperature measurements ( , ( )) =1,…, . Then a model is computed using this database, taking time as input and yielding temperature as output. This model is defined using the kernel-based ridge regression technique. The model is afterwards applied to other times, where temperature values are unknown, for interpolation. Consider a reproducing kernel defined from ℝ 2 to ℝ and denote ℋ its reproducing kernel Hilbert space. Some commonly used reproducing kernels are given in Table I. where the kernel parameters and are positive and is a positive integer. Then the temperature model is defined by minimizing the regularized mean quadratic error between the model's outputs and the measured data of the learning database: where is a regularization parameter that controls the tradeoff between the training error and the complexity of the solution and ‖. ‖ ℋ 2 is the norm in the reproducing kernel Hilbert space [19]. According to the Representer theorem, that all machine learning algorithms share, the minimization problem could be reduced to a more computation-friendly problem. Hence, the temperature model could be written as follows: where the coefficients are to be determined. Let denote the column coefficients vector whose -th entry is . By injecting the model expression in the minimization problem, one obtains a dual optimization problem whose solution is given by: where is a -by-matrix whose ( , )-th entry is given by ( , ), is the -by-identity matrix and is the temperature column vector, whose -th entry is given by ( ). Now that the model is defined for each temperature signal, the temperature value at a given time is estimated by ̂( ) = ( ).
The main advantage of the kernel-based approach remains in the fact that a single-function model is obtained, unlike the linear and cubic spline interpolations, where a piece-wise expression is obtained. In the following, and for simplicity, the notation ( ) is used for the estimated signal, having a value at each minute, obtained after interpolation.

Detection of rhythmicity
The detection of rhythmicity is usually performed in the frequency domain . The spectral analysis using the "Fourier transform" is a well-known study to do this . In this analysis, any signal, regardless of its shape and properties, can be represented by a complex function of frequency that highlights the frequencies that make it up. By applying the inverse Fourier transform, the signal is then decomposed into an infinite sum of sine and cosine functions of infinite frequencies [22]. The signals could be deterministic such as periodic/non-periodic or random such as stationary/nonstationary. A similar analysis for periodic signals is the Fourier series analysis, which represents a function as a sum of sine and cosine functions of different frequencies.
In this paper, an algorithm based on Fourier analysis is first proposed for frequency and harmonic detection. This algorithm starts by estimating the fundamental frequency, the fundamental amplitude and phase, and the harmonic amplitudes and phases, to evaluate the periodogram. The term periodogram was introduced by Schuster in December 1934 when Fourier analysis was used to estimate periodicity in meteorological phenomena . The technique was evaluated for the first time when inspecting circadian rhythms in the 1950s to measure circadian rhythms of mice after blinding . The periodogram showed that periodic signals have a frequency spectrum consisting of harmonics. For instance, if the time domain repeats at f, the frequency spectrum will contain a first harmonic at f, a second harmonic at 2f, a third harmonic at 3f, and so forth. The first harmonic, which is the frequency at which the time domain repeats itself, is called the fundamental frequency, and has the highest amplitude. Periodograms and spectral density were originally used in chronobiology in the 1960s [25].
In order to set the periodogram, a suitable window must be applied to the signal, to reduce side-lobes. For the proposed algorithm, a normalized Hanning window has been chosen [26] since this window does not disturb the position of spectral peaks in the spectral density, although the amplitude is decreased and the peak is larger. Having the periodogram and thus the fundamental and harmonic frequencies, the temperature signal is then modeled, using the Fourier series as follows: where is the angular frequency i.e. = 2 , i.e. is the fundamental period (duration of one cycle) and is the number of the considered harmonics with the fundamental frequency.
With respect to circadian rhythms, the rhythm persists in constant conditions with a period of around 24-hours, i.e. = 24 * 60 = 1440 minutes. takes values from 1 to infinity. The higher is, the better the model ̃( ) fits the observed value ( ). The parameters , ℎ and ℎ could be computed using the computations of Fourier series. The main advantage of this technique is that one is able to determine the exact fundamental frequency, with its following ones, by analyzing the periodogram, obtained with the Fourier transform. However, it needs the data to be equidistant and to cover more than a single cycle, otherwise the analysis would be erroneous.
One interesting method used for analyzing unequidistant and time-limited observations is the single Cosinor procedure. This study was developed to evaluate rhythmicity of unequidistant data series [27] [28], and is frequently used in the analysis of biologic time series that have expected rhythms. Cosinor uses the least squares method to fit a sum of sine functions to a time series, as least squares procedures do not have an equidistant data limitation. Practically, it considers the Fourier series model of (8) with a precise fundamental frequency, which corresponds to 24-hours, and number of harmonics and then computes the parameters , ℎ and ℎ so as to minimize the error between the signal ( ) and the model ̃( ). Let ( ) be the residual corresponding to the value ( ), that is, ( ) = ( ) −̃( ), and consider the modeling error as the sum of the squared residuals ( ) for all the data, that is, The parameters , ℎ and ℎ are then obtained using least squares by setting the derivatives of the over each parameter equal to zero. Since the temperature follows a circardian rhythm, then the fundamental frequency corresponds to 24h. Figure 1 shows an example of a Cosinor model obtained with = 4. The obtained model will subsequently be used to compute significant rhythmometric parameters, as shown in the following paragraph.
Once the temperature signals are modeled, using either Fourier series or Cosinor, some features are extracted from their sinusoidal representation, as will be shown in the following paragraph.

Features selection and divergence study
Once the temperature signal is decomposed into cumulative sine functions, using Fourier series or Cosinor, the objective here is to determine the sinusoidality of the data. This requires the extraction of some features from the models by inspecting their graph plotted against time, as shown in Figure 1. When the model is composed of more than one trigonometric function, that is, the fundamental period and some harmonics, four features could be extracted [29]:  the MESOR (M), for "Midline Estimating Statistic Of Rhythm", that is the mean of the model,  the amplitude (A) that is defined as the half of the difference between the maximum and the minimum of the model in one fundamental period,  and, finally, the phases of the maximum and the minimum of the composite model including harmonic terms, which are called the orthophase ΦO and bathyphase ΦB respectively. Figure 1 illustrates these features. For a control subject, these rhythmometric parameters vary slightly over time; whereas chemotherapy could produce significant modification in the circadian rhythm, which yields a divergence of one or more rhythmometric parameters. In order to detect this divergence, a sliding window algorithm is considered over the temperature signal ( ) of a period of several days.
Then, for each window, the signal is modeled using the cumulative sine functions, and the four rhythmometric parameters are extracted. A statistical test, such as an exact Fisher test, Wilcoxon test or another, is then applied to check whether the values through the window diverge from their previous values. This study is motivated by the perturbation of the circadian rhythm due to chemotherapy, which induce a divergence of the statistical distributions of the extracted rhythmometric parameters.

Results
This section starts with the illustration of the effectiveness of the interpolation techniques. To this end, the collected temperature signals of two subjects out of the five are considered. These two have a frequency of one sample per minute over four days, whereas the remaining are measured with a rate of one sample every five minutes. An example of a skin temperature signal while wearing the IR sensor for four days, for a control pattern with an expected pattern is shown in Figure 2.  In order to compare the interpolation approaches, the signals of the two subjects were divided into segments of 1440 minutes, i.e. one-day length, leading to eight segments. Then, these segments are resampled by taking one point every five minutes, leading to segments of 288 points. Thereafter, the linear, polynomial, cubic spline and kernel-based interpolation techniques were applied over these segments and the results are compared to the original observed signal. For the kernel-based interpolation, a Gaussian kernel was used, using a crossvalidation algorithm to select the optimal values of the bandwidth σ and the regularization parameter η according to the learning data . The mean errors for the interpolations are shown in Table II. These were computed by averaging the absolute difference between the observed and the estimated signals for the eight segments. In order to simulate the missing data, a 30-min segment was subsequently removed from the oneday length segment. This 30-min segment corresponds to six consecutive points among the 288-point segments. Then, two and three 30-min segments are removed. These segments were randomly selected within each one-day length segment. Table II shows the results for segments with no missing data (0 segment removed), one 30-min segment removed, two segments removed, and three segments removed. It is worth noting that simulations are performed 50 times and errors are averaged over all results, since the removed segments are selected randomly. The table shows that the Kernel-based algorithm yields better results with less estimation error in all cases, which is expected due its malleability and its adaptation to the curve, even under non-linear conditions. The measured temperature signal for a typical subject for a five-hour period is shown in Figure 3. The figure also shows the interpolated signal obtained with the kernel-based interpolation. The plot shows that the computed signal is close to the measured one, with a smooth curve produced.
Then, the objective is to illustrate the rhythmometric modeling techniques, which use the Fourier series or Cosinor. To this end, we consider the 5 temperature signals over a 4-day period, having passed the interpolation phase, i.e. the signals have no gaps, with a rate of one sample per minute. We start by applying the Fourier series, taking the fundamental period to be equal to 24 hours, i.e. 1440 minutes, followed by 3 harmonics, i.e. 12 hours, 8 hours and 6 hours. We consider that the modeling error is the average of the absolute differences between the 4-day temperature signals and their modeled ones for the 5 signals. In this case, the modeling error of the Fourier series technique with a 24-hour period is equal to 0.77. We then apply the Fourier transform to the signals to obtain their periodograms. These computations showed that the local maxima of the spectral energy are not obtained exactly at the frequencies 1/24, 1/12, 1/8 and 1/6, but very close to them. is very close to 24 hours. If we take and the following three local maxima, and perform the inverse Fourier transform, the modeling error decreases to 0.73, which was expected since this way more information is considered in the modeling. Having the fundamental period not equal to 24h is related to the fact that only four cycles (4 days) are considered, which is not enough for Fourier analysis. Afterwards, by applying Cosinor computations, while taking the fundamental period equal to 24h, and considering the following 3 harmonics, the modeling error is the least, being equal to 0.53. This shows the power of such a method with limited-duration signals.   Table III shows their mean values in degrees with the standard deviations over the 5 signals.
Here the parameters are extracted from the rhythmometric models obtained for the whole 4-days signals. As expected, considering the fundamental period from the periodogram, modeling leads to a difference in the characteristics equivalent to 30 minutes, whereas both Fourier series and Cosinor lead to close values. This study shows that the Cosinor computations have promising results, to be validated with more signals later on.

Discussion
This paper proposed a longitudinal evaluation of skin temperature measurement, collected using a thoracic sensor. The collected signals have a sampling frequency of one point every five minutes, with a length of four days for control subjects. At a first phase, we performed interpolation, using a kernel-based machine learning technique, to resample the signals to a higher frequency of one point every minute and to fill in the gaps in the measured signals. This step is crucial for the following data processing techniques. A rhythmometric study is then followed up on resampled signals, using either a Fourier representation or Cosinor. A Fourier analysis with the Fourier transform and the Fourier series is first proposed to model the signals, then a Cosinor model is constructed using Fourier analysis and least squares computations. The advantage of Cosinor remains in its robustness against un-equidistant and low-duration data, which is not the case for Fourier analysis. Rhythmometric parameters such as MESOR, amplitude, orthophase and bathyphase, are then extracted using the obtained model. For a given temperature signal of a given subject, such computations are performed for each sliding window of the signal, leading to a set of random variables that are the rhythmometric parameters. Then a divergence study is applied to detect any perturbation of the rhythm. In fact, chemotherapy can alter the circadian rhythm of the patients, therefore it is expected in that case to observe a divergence of the distributions of the rhythmometric parameters between the time span on chemotherapy and that before or after it. To detect such divergence, we can separate each parameter values into two sets (on or off chemotherapy), then apply a statistical test to detect the divergence between their distributions. This study is promising since chemotherapy-induced perturbations have been documented for over 15 anticancer drugs in experimental models, and for several drug combination protocols in cancer patients [6 14 15].
The kernel-based interpolation phase was validated using different cases of missing data in the signals, and by comparing them to well-known interpolation methods such as linear, polynomial and cubic splines. A comparison between the rhythmometric modeling techniques is then conducted, showing that Cosinor analysis leads to fewer modeling errors. Having only five signals with a 4-day duration is not enough to draw conclusions, however the results are promising for future works. Thus a larger study involving more healthy controls and  cancer patients is planned to further determine the relevance of the methodology developed here and the minimum number of harmonics. It should be possible to evaluate the effect of harmonics such as 12h, 8h, and 6h, and even to use only specific harmonics of varying frequency in order to determine the exact moment when the circadian rhythm is modified. Advanced processing such as the goodness of fit and the rhythm detection will also be applied on the signals using the F-test. Moreover, an evaluation of the IR sensor is also to be done, to verify if the observed temperature with such a sensor remains a mirror of the central temperature.