The specific procedures were as follows. In the average spectrum, the minimum absorbance value was slightly above 0, and the maximum absorbance value was slightly below 5.
Therefore, the A min and A max values were set as 0 and 5, respectively. The corresponding waveband combination was 1,—1, and 1,—1, nm. Sketch map of the relationship between wavelength and absorbance in the average spectrum. The NIR spectra of all human serum samples in the entire scanning region —2, nm are shown in Figure 3A. The saturation region with high absorption was mainly located near nm, whereas the low absorption region was mainly located on the left side of nm.
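To illustrate how bounds on the average absorbance translate into a waveband combination, the following is a minimal sketch. The function name, the wavelength grid, and the absorbance values are all made up for illustration; only the bounds 0 and 5 come from the text.

```python
def wavebands_within_bounds(wavelengths, mean_absorbance, a_min, a_max):
    """Return (start, end) wavelength pairs of contiguous runs whose
    average absorbance lies strictly between a_min and a_max."""
    bands = []
    start = None
    prev = None
    for wl, a in zip(wavelengths, mean_absorbance):
        inside = a_min < a < a_max
        if inside and start is None:
            start = wl                     # a new waveband begins here
        elif not inside and start is not None:
            bands.append((start, prev))    # close the band at the previous point
            start = None
        prev = wl
    if start is not None:                  # band still open at the end of the spectrum
        bands.append((start, prev))
    return bands


# Hypothetical average spectrum with a saturated region in the middle.
wl = [1000, 1002, 1004, 1006, 1008, 1010]
absorb = [0.4, 1.2, 5.6, 5.9, 2.1, 0.8]
bands = wavebands_within_bounds(wl, absorb, a_min=0.0, a_max=5.0)
print(bands)
```

Tightening `a_max` or raising `a_min` removes more of the high or low absorption regions, which is the kind of search over bounds that the procedure describes.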
The full-PLS models based on the entire scanning region —2, nm were established. The R P,Ave values were 0. The results showed a low correlation between the NIR predicted values and the measured values of the conventional method using the spectroscopy data without pretreatment. NIR spectra of all human serum samples A raw spectra, and B 1st derivative spectra. TABLE 1. The spectral data were preprocessed with SG smoothing and then the modeling was performed. The parameters of SG smoothing include order of derivatives d , degree of polynomial p , and number of smoothing points m , odd.
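The roles of the three SG parameters can be seen in a small NumPy sketch. This is an illustrative implementation written for this note, not the authors' code; the function names are assumptions, and edge points are simply dropped rather than padded. SciPy's `scipy.signal.savgol_filter` offers a maintained equivalent.

```python
import math
import numpy as np


def savgol_coefficients(m, p, d):
    """Weights of a Savitzky-Golay filter.

    m: number of smoothing points (odd), p: polynomial degree,
    d: order of derivative (0 means smoothing only).
    """
    if m % 2 == 0 or p >= m or d > p:
        raise ValueError("need odd m, p < m and d <= p")
    half = m // 2
    x = np.arange(-half, half + 1)
    # Columns are x^0, x^1, ..., x^p for the local least-squares fit.
    V = np.vander(x, p + 1, increasing=True)
    # Row d of the pseudo-inverse estimates the polynomial coefficient c_d;
    # the d-th derivative at the window centre is d! * c_d.
    return math.factorial(d) * np.linalg.pinv(V)[d]


def savgol_apply(spectrum, m, p, d, step=1.0):
    """Apply the filter to a 1-D spectrum sampled every `step` units."""
    w = savgol_coefficients(m, p, d)
    out = np.correlate(np.asarray(spectrum, dtype=float), w, mode="valid")
    return out / step**d


# A sloping baseline has a constant first derivative, so a baseline
# offset disappears entirely from the first-derivative spectrum.
x = np.arange(20, dtype=float)
baseline = 2.0 + 0.5 * x
d1 = savgol_apply(baseline, m=5, p=2, d=1)
```

The example shows why derivative pretreatment reduces baseline drift: an additive offset contributes nothing to the first derivative, and a linear drift contributes only a constant.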
The corresponding first derivative spectra are shown in Figure 3B , wherein the baseline drifts of the spectra significantly decreased. The R P,Ave values were improved to 0. The corresponding parameters I , N , and F and the prediction effects are summarized in Table 2.
The corresponding wavebands were 1,—1, nm for TC and 1,—1, nm for TG. R p ,Ave greatly increased to 0. TABLE 2. The corresponding prediction effect and parameters for the PLS models are summarized in Table 3. TABLE 3. The transmittances ranged from The corresponding waveband combinations based on the SG derivative spectra were 1,—1, and 1,—1, nm for TC and 1,—1, and 1,—1, nm for TG.
TC and TG avoid extremely high or low absorption wavebands of the spectra, which correspond to a high quality of information content and a low level of noise. TABLE 4. It was observed that the optimal waveband combinations for TC and TG were basically the same, and the combination of TG 1,—1, and 1,—1, nm completely covered the combination of TC 1,—1, and 1,—1, nm. Therefore, the optimal waveband combination of TG can be used for the high-precision analysis of the two indicators simultaneously.
These local optimal solutions corresponded to the case where only the saturation region with high absorption was eliminated. Similar works can be found in previous studies [ 12 , 17 , 21 , 23 ].
However, compared with the global optimal solution, the predictive performance of the local optimal solution was poor. This outcome showed that optimizing only the absorbance upper bound is insufficient. These local optimal solutions corresponded to the case where only the low absorption region was eliminated.
However, compared with the global optimal solution, the predictive performance of the local optimal solution was even poorer, which showed that optimizing only the absorbance lower bound is also insufficient. The local optimal models can nevertheless serve as valuable references. The instrument design typically involves some restrictions in the position and number of wavelengths e.
In some instances, the demand of the actual conditions cannot be satisfied by the optimal model. Therefore, some local optimal models with prediction effects close to those of the global optimal model remain a viable option. The corresponding selections of waveband combinations were also determined easily; the modeling effects were close to the optimal model. Figures 4C,D present similar results for TG, but the relevant discussion was omitted due to the limitation in article length.
The PLS regression coefficients were determined using the SG derivative spectra and measured reference values of the modeling samples depending on the corresponding parameters. The three methods for the two indicators demonstrated acceptable prediction accuracy and high correlation for the clinically measured values.
The prediction effect of NIR analysis was then evaluated using the criteria of sensitivity and specificity. Using SPA, the numbers of true positive (a), false negative (b), false positive (c), and true negative (d) samples were 45, 6, 3, and 48, respectively; the sensitivity, a/(a+b), and specificity, d/(c+d), were calculated from these counts. Figure 6 shows the 2D diagram of NIR predicted values of the validation samples classified as negative and positive for hyperlipidemia using the two methods.
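For reference, sensitivity and specificity follow directly from the four counts. The sketch below uses the standard definitions with the counts reported above for SPA; the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Standard definitions: sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts reported for SPA: a = 45, b = 6, c = 3, d = 48.
sens, spec = sensitivity_specificity(tp=45, fn=6, fp=3, tn=48)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# prints: sensitivity = 88.2%, specificity = 94.1%
```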
The results confirmed the feasibility of hyperlipidemia screening with NIR spectroscopy. The high water content of the serum samples can lead to saturated absorption and noise interference. TC and TG are lipid compounds. The results show that the prediction of TC and TG is not adversely affected, and is in fact evidently improved, after eliminating the saturated absorption bands of water.
If the water content of samples was measured, then a shorter optical path length could be used to avoid saturated absorption. It is meaningful that the cooperativity model can detect two indicators at the same time. Figure 2, below, is a plot of the water absorption on a linear scale. The lower plot is a magnification of the region between nm and nm. There is a small peak at , but it goes up to a maximum of 0. The absorption is dominated by melanin in the skin.
Both of these wavelengths are in the therapeutic or optical window and are effective for laser therapy. Since melanin absorption dominates at lower wavelengths, nm is better for maximal penetration for skin with melanin. Figure 2 Data generated from Hale and Querry. Figure 3 shows a plot of the various absorption coefficients as a function of wavelength on a linear scale. This shows the dominance of melanin absorption compared to hemoglobin, oxyhemoglobin, water and fat. Therein, the spectra covered the wavelength range of — nm and were collected in transmittance mode.
Three blood samples were collected from each of the five participating subjects during an exercise study. Each sample was spiked twice with lactate to produce a wider range of lactate variations in the dataset, yielding 45 samples with different lactate concentrations. Following this, the in-vivo application of this method was carried out within the — nm spectral range. Forty samples were collected from 10 subjects in an exercise study and their spectra were used for training and cross-validation.
This clearly highlights the importance of baseline differences between subjects, both in blood composition and in the optical properties of the skin. While the aforementioned studies clearly demonstrate the potential of multivariate modelling and optical spectroscopy in achieving a reagent-free and potentially non-invasive measurement of lactate, to date, a direct and accurate optical lactate sensor has not yet been developed.
The authors believe that a comprehensive analysis of the optical properties of lactate and the identification of wavelengths that are most indicative of its concentration can play a crucial role in achieving this aim. A data-driven approach to fulfil this objective is the use of variable wavelength selection methods.
These methods offer two clear benefits. Firstly, it has been shown that the inclusion of uninformative wavelengths in the training process negatively affects the accuracy of predictions and model interpretability 19 , 20 , 21 , Secondly, from a more practical point of view, the identification of a few wavelengths, or regions of the optical spectrum, that contain information about chemical species, significantly reduces the time and cost associated with their measurement and enables the development of portable and high-speed optical sensors.
In spite of these benefits, no study in the literature has investigated the application of wavelength selection methods for the estimation of lactate. To this end, this study conducts a comprehensive analysis of the application of wavelength selection methods on a set of spectra that were obtained by controlled variation of Na-lactate in a Phosphate Buffer Solution (PBS). PBS was chosen firstly to ensure minimal chemical changes that may be correlated with the concentration of lactate, and secondly to facilitate the interpretation of results.
The rest of this article is organised as follows. Section 2 highlights the importance of lactate as a fundamental biomarker. The section then concludes with a brief overview of spectroscopic data and quantitative methods for their analysis.
Section 3 describes the lactate and PBS dataset. Section 4 provides a summary of three classes of wavelength selection methods, describes the four methods investigated in this study, outlines their limitations, and presents the new GA algorithm. Finally, Sect. Highlighting the relationship between lactate and adenosine triphosphate (ATP), known as the universal energy currency of cells, helps demonstrate the importance of lactate as a remarkable biomarker.
Transcending human biology, ATP is found in all living organisms. It provides cells with easily accessible energy to fuel various processes such as biosynthesis, metabolism, deoxyribonucleic acid (DNA) synthesis, muscle contraction, transport of ions, and impulse transmission in the nervous system. The storage of easily accessible bioenergy in ATP is the outcome of cellular respiration. In this process, carbohydrates are broken down into simple sugars and these simple sugars are further processed to produce ATP.
In animals, including humans, glucose is the most common type of such sugars that can be metabolised by cells. Glycolysis is an important ATP producing system in living organisms. In glycolysis, cells break down a molecule of glucose and extract a net of two ATP and two molecules of pyruvate.
This process does not require oxygen (it is anaerobic) and can produce ATP at a very fast rate. For instance, in an intense maximal workout, glycolysis may become an important energy production mechanism to fuel the body, but only for a few minutes.
In the absence of a sufficient supply of oxygen, pyruvate is converted to lactate to help maintain glycolysis for longer periods. In the presence of oxygen, pyruvate, together with fatty acids and amino acids, is eventually broken down to hydrogen and carbon dioxide in a chain of reactions that produce around 28 to 30 additional ATP per molecule of glucose.
The whole system is collectively referred to as aerobic respiration and includes four stages: glycolysis, the link reaction, the Krebs cycle, and oxidative phosphorylation. The completion of an aerobic respiration cycle takes a longer time than glycolysis alone, up to times.
But of course, among other things, aerobic respiration requires sufficient supplies of oxygen. Therefore, higher than normal levels of glycolysis, and consequently, higher levels of lactate may be observed in conditions where the oxygen supply to cells is restricted or when cells need faster than normal deposits of energy relative to what aerobic respiration can supply.
This brief introduction highlights the importance of lactate as a fundamental biomarker that sheds light on the energy consumption patterns of the body. What follows provides a brief overview of the role of lactate as a prognostic and diagnostic measure in a wide range of diseases. Lactate is a valuable biomarker in understanding diseases both at the cellular and physiological levels.
From a cellular perspective, as described in Sect. From a physiological perspective, some organs are mainly lactate consuming; for instance, it has been suggested that lactate may be an efficient fuel for the brain. Some organs are lactate clearing; for instance, the liver processes excess, non-metabolised lactate and converts it back to glucose in the Cori cycle.
Finally, some organs are lactate producing, such as skeletal muscles, and some are lactate clearing, such as the kidney. The kidney helps clear excess lactate in hyperlactatemia. The lactate shuttle theory describes a dynamic, pH-dependent flow of lactate between organs. Therefore, a major, unexpected disruption in these dynamics can be an early sign of disease.
While such dynamics are not fully understood, the relationship between lactate and many diseases is well-documented. Insufficient delivery of oxygen to tissue (hypoxia) or insufficient perfusion of the tissue (ischemia) can have drastic consequences. The most significant tissues that are commonly affected by hypoxia or ischemia are cardiac and cerebral tissues.
Severe cardiac or cerebral ischemia can cause irreversible and potentially fatal conditions, namely myocardial infarction and acute ischemic stroke. As mentioned in Sect. Therefore, monitoring lactate levels can contribute to improved diagnosis and prognosis in these conditions 3, 5. Similarly, different types of shock, caused by inadequate cardiac pump function (cardiogenic shock), severe infection (septic shock), obstruction of the vessels (obstructive shock), and decrease in blood volume (hypovolemic shock), are associated with hyperlactatemia 6. Increased lactate levels have been observed in patients with lung injury 8, increased white blood cell activity 27, and reduced lactate clearance capability by the liver and kidney. Lactate has been shown to be an important diagnostic marker of generalised tonic-clonic seizures. Increased lactate levels have been linked to the progression of insulin resistance in diabetic patients 9.
Lactate has also been described as a key regulatory element in response to stress 30. Cancer is another life-threatening disease that causes elevated lactate levels. It is known that cancer cell metabolism differs from that of normal cells and exhibits an abnormal conversion of glucose to lactate, even in the presence of oxygen (aerobic glycolysis) 9, 10, 11. A sudden change in the balance between glycolysis and oxidative metabolism can also take place as a result of important metabolic changes. Given this overview, it is not surprising that there is overwhelming evidence that underlines the relationship between increased lactate levels and increased morbidity and mortality in critically ill patients.
This further highlights the importance of monitoring lactate in the early resuscitation of critically ill patients 1, 2. These subjects have been more extensively explored in the literature 26, 33, 34, 35.
In spectroscopic applications, one is commonly interested in analysing the optical properties of a molecule over a wide wavelength interval. In quantitative analysis, the target may be finding a mathematical model that maps such spectra to the concentration of an analyte. Firstly, many of the wavelengths in the spectra often contain collinear, redundant or uninformative information, and both Partial Least Squares (PLS) and Principal Component Regression (PCR) can adequately eliminate the negative effects of such variables.
The resulting variables are known as latent variables. Secondly, based on the Beer-Lambert law, there is a linear relationship between the optical absorbance of an absorbing species and its molar concentration when the concentration is low. However, PLS usually achieves the same level of accuracy with fewer latent variables. Moreover, apart from the first two investigated methods, i.
PLS can be described using Eqs. The choice of the number of latent variables can determine overfitting or under-fitting. While different criteria for the selection of the number of latent variables have been proposed in the literature, a common approach is to plot the Predicted Residual Error Sum of Squares (PRESS) against the number of latent variables and choose the point where the PRESS plateaus.
This section describes the spectra collection procedure, which has previously been comprehensively described. The lactate increments are 0.
The lactate solutions were run at random to prevent any potential temporal bias. Figure 1 a shows the raw absorbance spectra, Fig. Moreover, from a more practical point of view, the identification of the specific wavelengths, or regions of the optical spectrum, that contain information about chemical species, significantly reduces the time and cost associated with monitoring them and facilitates the development of portable, high-speed sensors.
What follows provides a brief overview of some wavelength selection methods. These methods are briefly described below. Subsequently, some of the well-known methods from each group are applied to the lactate dataset and the results are compared. Throughout this article, in order to ensure that validation sets are sufficiently representative of the dataset, they are chosen in the following way. The validity of models is assessed using cross-validation and an independent test set.
Due to the need for manual intervention in some cases as well as the computationally demanding nature of some of the investigated wavelength selection methods, it is impractical to perform wavelength selection within each cross-validation loop. Therefore, cross-validation is performed on the models with selected wavelengths. This is expected to bias the cross-validation results and underestimate RMSE due to data leakage.
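RMSEP is given by

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted concentrations of the $i$-th of the $n$ prediction samples.

Garrido Frenich et al. Moreover, a large coefficient can also represent a variable with small absolute value and large variance.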
In order to mitigate the error that could be caused by such variables, each variable is scaled with the inverse of its standard deviation. In particular, these values can exhibit drastic changes in successive training with resampled data. To demonstrate this, the histograms of three coefficients obtained with MCCV for the lactate dataset are depicted in Fig.
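One way to quantify how much a coefficient varies across resampled fits is the ratio of its mean to its standard deviation over the resampling runs. The sketch below assumes that definition, which is a common stability measure but is not confirmed by the text as the exact criterion used here; the function name and the toy coefficient values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_reliability(coef_runs):
    """coef_runs holds one regression-coefficient vector per resampled fit.

    Returns |mean| / standard deviation for each variable: large values
    indicate coefficients that keep their size and sign across runs.
    """
    n_vars = len(coef_runs[0])
    scores = []
    for j in range(n_vars):
        values = [run[j] for run in coef_runs]
        s = stdev(values)
        scores.append(abs(mean(values)) / s if s > 0 else float("inf"))
    return scores


# Toy example: variable 0 is stable, variable 1 flips sign between runs,
# variable 2 hovers around zero.
runs = [
    [0.9, 0.5, -0.02],
    [1.1, -0.4, 0.03],
    [1.0, 0.1, -0.01],
]
scores = coefficient_reliability(runs)
```

Variables whose score falls below a chosen threshold would be candidates for removal; the threshold itself is a tuning choice that this sketch leaves open.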
Centner et al. These estimates are obtained using a resampling method, for instance, leave-one-out. The reliability values below a certain threshold may be deemed uninformative and, therefore, unselected. The optical spectra have locality features. In other words, the information about the concentration of chemical species is focused on certain regions of the optical spectra.
The incorporation of this assumption into the search strategies yields a different class of wavelength selection techniques. A simple way to achieve this is to split the spectra into equidistant intervals and treat each interval as a variable. As a result, the dimensionality of the space of variables can be significantly reduced, from thousands of variables to tens of intervals.
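Splitting the spectrum into equidistant intervals amounts to partitioning the variable indices. A minimal sketch follows; the function name is an assumption, and any leftover variables are spread over the first intervals so that sizes differ by at most one.

```python
def equidistant_intervals(n_variables, n_intervals):
    """Partition indices 0..n_variables-1 into n_intervals contiguous ranges."""
    if not 0 < n_intervals <= n_variables:
        raise ValueError("need 0 < n_intervals <= n_variables")
    base, extra = divmod(n_variables, n_intervals)
    intervals = []
    start = 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)  # first `extra` intervals get one more
        intervals.append(range(start, start + size))
        start += size
    return intervals


# Ten variables split into three intervals of sizes 4, 3 and 3.
intervals = equidistant_intervals(10, 3)
print([list(r) for r in intervals])
```

Each interval can then be treated as a single candidate in a forward-selection or backward-elimination search, with a model fitted on the columns it contains.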
This has the additional benefit that many classical techniques such as Forward Selection FS and Backward Elimination BE can be applied to these intervals. The number of intervals is a key parameter that could have major implications on the outcome. In this study, two values for the number of intervals were used, 20 and While iPLS can provide a good representation of the informative regions of the optical spectrum, the two requirements of equidistant intervals and the predetermined number of intervals are restrictive.
As a result, there is very little chance that the optimal interval can be found using this approach. Moving Window PLS (MWPLS) relaxes these requirements by selecting an interval with a predetermined length at one end of the spectrum and then moving the centre of the window one wavelength at a time. Multiple models with different complexities (numbers of PLS components) are developed for every interval.
If the SSR is relatively high or if it continues to significantly improve with the addition of PLS components, this suggests that these components are used to model inherent uncertainties, and therefore, the interval is not informative. In summary, according to this criterion, a good interval is an interval that obtains low SSRs with only a few components and the addition of extra components does not significantly improve the results.
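The moving-window criterion can be sketched as two small helpers: one that enumerates the windows and one that applies the "low SSR with few components, little further gain" rule to a list of SSR values. The thresholds `ssr_threshold`, `max_components` and `min_gain` are hypothetical tuning parameters introduced for illustration, and the SSR values would come from PLS models fitted on each window, which this sketch does not include.

```python
def moving_windows(n_variables, width):
    """All contiguous windows of `width` variables, shifted one variable at a time."""
    return [range(s, s + width) for s in range(n_variables - width + 1)]


def is_informative(ssr_by_components, ssr_threshold, max_components, min_gain=0.05):
    """ssr_by_components[k] is the SSR of the model with k + 1 components.

    A window counts as informative if some model with at most
    `max_components` components reaches the SSR threshold and no larger
    model improves on it by more than `min_gain` (relative).
    """
    for k, ssr in enumerate(ssr_by_components[:max_components]):
        if ssr <= ssr_threshold:
            if ssr == 0:
                return True
            later = ssr_by_components[k + 1:]
            return all((ssr - s) / ssr <= min_gain for s in later)
    return False


windows = moving_windows(n_variables=8, width=3)

# Made-up SSR curves: the first plateaus after two components,
# the second keeps improving as components are added.
plateau = is_informative([0.8, 0.2, 0.198, 0.196], ssr_threshold=0.25, max_components=3)
drifting = is_informative([0.9, 0.7, 0.5, 0.3], ssr_threshold=0.25, max_components=3)
```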
A different approach is the use of heuristic global optimisation methods to find a suboptimal combination of variables. Genetic algorithms, a class of biologically-inspired evolutionary algorithms, have found many applications in engineering, biomedicine, chemometrics, genomics, and spectroscopy due to their ability to solve complex, nonlinear optimisation problems.
Many variations and different implementations of this method can be found in the literature 49, 50, 51, 52, 53, 54. Genetic algorithms are inspired by Darwinian selection, i.
The algorithm begins with a random population of chromosomes or candidate solutions. In a variable selection problem, each chromosome can be coded as a bit-string, where a one represents selection and a zero represents deselection.
Each chromosome is then assigned a fitness value (or an unfitness value in a minimisation problem). Candidate solutions (chromosomes) that produce better results will have a higher likelihood of passing on their genes to the next generation (selection), recombining their likely good genes with other fit genomes (crossover), and finally passing on their genes to the next generation of genomes with some random mutation.
The last operation is necessary to ensure that the optimisation landscape is explored and to reduce the likelihood of getting stuck in local minima. This process continues for a predetermined number of generations or until a convergence criterion is met. In the lactate dataset, this ratio is greater than 80, i. However, there are some measures that can mitigate this issue, such as using intervals rather than variables (similar to iPLS) or using a separate validation set to stop the operations when the RMSEV starts to increase (early stopping).
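The selection, crossover and mutation steps described above can be sketched as a generic bit-string genetic algorithm. This is an illustrative toy, not the method proposed in this study: tournament selection, single-point crossover, elitism, and all parameter values are assumptions, and the unfitness function is a stand-in (distance to a hidden mask) for a model error such as a cross-validated MSE.

```python
import random


def run_ga(n_bits, unfitness, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Minimise `unfitness` over bit-strings; 1 = variable selected, 0 = deselected."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=unfitness)
        history.append(unfitness(pop[0]))
        next_pop = [pop[0][:]]                    # elitism: keep the best unchanged
        while len(next_pop) < pop_size:
            # Selection: the better of three random chromosomes becomes a parent.
            p1 = min(rng.sample(pop, 3), key=unfitness)
            p2 = min(rng.sample(pop, 3), key=unfitness)
            # Crossover: head of one parent joined to the tail of the other.
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=unfitness)
    return best, history


# Stand-in objective: number of bits that differ from a hidden "informative" mask.
target = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]


def toy_unfitness(chromosome):
    return sum(a != b for a, b in zip(chromosome, target))


best, history = run_ga(len(target), toy_unfitness)
```

Because the best chromosome is copied unchanged into each new generation, the recorded best unfitness can never increase from one generation to the next.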
Therefore, in the proposed method we use the bit-string representation of wavelengths. The unfitness function is defined as MSEC; however, in order to reduce the likelihood of over-fitting, three strategies are incorporated in the GA method:
Firstly, in each run of GA, the training set is randomly split into a validation set and a training set. Secondly, the evaluation of the fitness function for each chromosome is done using MCCV with resampling iterations.
This layered design minimises the likelihood of overfitting since it ensures that the solution has produced good results across thousands of resampled datasets (a) in every fitness evaluation step and (b) along different generations. In the current study, only the average estimate of MSEC is used in the objective function, but a comprehensive analysis of different objective functions will be carried out in the future. Finally, the genetic algorithm is run times, and the importance of each variable is calculated as the average number of times that the variable is selected.
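The importance measure in the final step, the average number of times a variable is selected across runs, reduces to a column-wise mean over the best bit-strings. A minimal sketch, with a hypothetical function name and made-up run results:

```python
def selection_frequency(best_chromosomes):
    """Fraction of GA runs in which each variable was selected.

    best_chromosomes holds one final bit-string per independent run.
    """
    n_runs = len(best_chromosomes)
    n_bits = len(best_chromosomes[0])
    return [sum(run[j] for run in best_chromosomes) / n_runs for j in range(n_bits)]


# Four hypothetical runs over three variables.
runs = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]
importance = selection_frequency(runs)
print(importance)  # prints: [1.0, 0.25, 0.5]
```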
Figure 3 depicts the results for one run of the proposed GA-based wavelength selection on the lactate dataset.