#### Document Type

Dataset

#### Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

#### Start Date

2021

#### End Date

2022

#### Abstract

Carotenoids are naturally abundant fat-soluble pigmented compounds, with dietary, antioxidant and vision protection advantages. The dietary carotenoids, Beta Carotene, Lutein and Zeaxanthin, complexed with bovine serum albumin (BSA) in aqueous solution, were explored using Raman spectroscopy to differentiate and quantify their spectral signatures. UV-visible absorption spectroscopy was employed to confirm the linearity of responses over the concentration range employed (0.05-1mg/ml) and, of the four source wavelengths (785nm, 660nm, 532nm and 473nm), 532nm was chosen to provide the optimal response. After preprocessing to remove water and BSA contributions, and to correct for self-absorption, a partial least squares model with R^{2} of 0.9995 yielded a Root Mean Squared Error of Prediction for Beta Carotene of 0.0032mg/ml and a Limit of Detection of 0.0106mg/ml. Principal Components Analysis clearly differentiated solutions of the three carotenoids, based primarily on small shifts of the main peak at ~1520cm^{-1}. Least squares fitting analysis of the spectra of admixtures of the carotenoid:protein complexes showed reasonable correlation between nominal% and fitted%, yielding 100% contribution when fitted with individual carotenoid complexes and variable contributions with multiple ratios of admixtures. The results indicate the technique can potentially be used to quantify carotenoid content of human serum and to identify their differential contributions for application in clinical analysis.

#### Recommended Citation

Udensi, J., Loskutova, E., Loughman, J., & Byrne, H. (2022). Quantitative Raman Analysis of Carotenoid Protein Complexes in Aqueous Solution [Data set]. Technological University Dublin. DOI: 10.21427/CJS0-Y159

#### DOI

https://doi.org/10.21427/cjs0-y159

#### Methodology

*4.1 Sample preparation*

Beta Carotene, Lutein and Zeaxanthin powders were all purchased from Sigma Aldrich (Arklow, Ireland). Powders were dissolved in 40mg/ml BSA (Sigma Aldrich, Arklow, Ireland) to give a final concentration of 1mg/ml. The stock solutions were prepared using ultra-pure water (Millipore) and used immediately to prevent oxidation under light and air.

The carotenoids (Beta Carotene in particular) were seen to be poorly soluble in BSA solution, and mild sonication using a Sonics VCX-750 Vibra-Cell ultrasonic processor (Sonics & Materials Inc., USA), equipped with a model CV33 sonication probe (20% amplitude for 10 seconds), was used to disperse the carotenoid solids more evenly in the BSA solution. To confirm linearity of the solubilisation of the carotenoids in BSA, a range of different concentrations of Beta Carotene was generated. Several dilutions were prepared in BSA with concentrations of Beta Carotene ranging from 0.05mg/ml to 2.0mg/ml while keeping the concentration of BSA constant.

Admixtures of the three different carotenoids were prepared by mixing BSA solutions of Beta Carotene, Lutein and Zeaxanthin in (B:L:Z) ratios reflective of physiological relevance [66,72] as follows: 100:40:20, 100:30:30, 100:50:10 and 100:20:40. The concentration of BSA was kept constant at 40mg/ml.
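As a quick illustration of how the B:L:Z ratios above translate into nominal percentage contributions, the following minimal sketch normalises each ratio to 100%. It assumes the ratios describe relative amounts of stock solutions of equal carotenoid concentration; the function and variable names are illustrative only and are not part of the dataset.

```python
def nominal_percentages(ratio):
    """Convert a B:L:Z mixing ratio into nominal percentage contributions."""
    total = sum(ratio)
    return [100.0 * part / total for part in ratio]


# Admixture ratios (Beta Carotene : Lutein : Zeaxanthin) quoted in the text.
ADMIXTURES = [(100, 40, 20), (100, 30, 30), (100, 50, 10), (100, 20, 40)]

for ratio in ADMIXTURES:
    b, l, z = nominal_percentages(ratio)
    print(f"{ratio[0]}:{ratio[1]}:{ratio[2]} -> B {b:.1f}%, L {l:.1f}%, Z {z:.1f}%")
```

Because every listed ratio sums to 160 parts, Beta Carotene has the same nominal share in all four admixtures and only the Lutein and Zeaxanthin shares vary.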

To make the carotenoid paste (carotenoid reference for Section 4.2.4), 20µl of ultrapure water was added to 1g of Beta Carotene powder and mixed until a thick paste was formed.

*4.2 Absorbance measurement*

UV-VIS absorption spectra were recorded in the visible range of 400 – 700nm using a SpectraMax M3 plate reader (Molecular Devices). The carotenoid/BSA complexes and admixtures were all measured in a 96-well plate at a fixed concentration to ascertain their absorbance. The control (BSA) was also measured. To examine the concentration dependence, various concentrations (0.05mg/ml to 2mg/ml) of Beta Carotene in BSA solution were measured, while keeping the concentration of BSA constant throughout. Absorbance of the solution at 540nm was plotted against concentration.
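The linearity check described above amounts to a straight-line fit of absorbance against concentration. The sketch below shows one way to do this with ordinary least squares and to report R^{2}. The absorbance values are invented purely for illustration and are not taken from the dataset.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Concentrations span the range quoted in the text (mg/ml).
concentration = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
# HYPOTHETICAL absorbance readings at 540nm, for illustration only.
absorbance = [0.041, 0.059, 0.121, 0.219, 0.421, 0.819]

slope, intercept, r_squared = linear_fit(concentration, absorbance)
print(f"slope={slope:.4f} per mg/ml, intercept={intercept:.4f}, R^2={r_squared:.5f}")
```

An R^{2} close to 1 with a small intercept would be consistent with Beer-Lambert behaviour over the range measured.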

*4.3 Raman Analysis*

Raman spectral measurements were carried out using a Horiba Jobin-Yvon LabRam HR800 spectrometer with a 16-bit Peltier cooled CCD detector, coupled to an Olympus BX41 upright microscope. The laser lines used were 473nm, 532nm, 660nm and 785nm, in each case with a 300 lines/mm grating. The spectral range employed was 400 – 3500cm^{-1} and the back scattered Raman signal was typically accumulated for 5 x 4 seconds. Depending on the measurement, 3-9 spectra were acquired per sample.

Raman measurements of the pure compounds were obtained by measuring a wet paste of the compound at room temperature with x60 objective and at the four different laser wavelengths. For the BSA complexes, measurements were performed by focussing into the solutions contained in a polystyrene 96-well plate, using a x10 objective.

To examine the concentration dependence, various concentrations of the Beta Carotene in BSA solution were further measured at 532nm while keeping the concentration of BSA constant.

*4.4 Raman Spectral Pre-processing*

Pre-processing techniques were applied to the raw Raman spectra within the MATLAB platform to correct for excess noise and remove inherent background signals. Smoothing and noise correction were done using the Savitzky-Golay algorithm [74], with a polynomial order of 5 and a window of 9 points. Further processing was done using the extended multiplicative signal correction (EMSC) algorithm [42,49]. This step was necessary to remove the interferent water and BSA spectra, which made up the solvent in which the carotenoids were dissolved. The reference for the EMSC was obtained for each source wavelength by adding a few drops of distilled water to a known amount of the carotenoid powder and making a thick paste. Using the modification of the process by Parachalil et al., the corrected spectra can be normalised by the coefficient of the subtracted water, as an internal standard [49,50].
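To make the smoothing step concrete, the sketch below implements a Savitzky-Golay filter with the stated settings (polynomial order 5, window of 9 points) using only the Python standard library. It is an illustrative reimplementation, not the MATLAB code used for the dataset, and it leaves the first and last four points unsmoothed, which is an assumption made for brevity.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def savgol_coefficients(window=9, order=5):
    """Weights that give the fitted polynomial's value at the window centre."""
    half = window // 2
    z = range(-half, half + 1)
    # Normal-equation matrix (A^T A)[j][k] = sum_i z_i^(j+k) for A[i][j] = z_i^j.
    ata = [[float(sum(zi ** (j + k) for zi in z)) for k in range(order + 1)]
           for j in range(order + 1)]
    u = solve(ata, [1.0] + [0.0] * order)
    return [sum(u[j] * zi ** j for j in range(order + 1)) for zi in z]


def savgol_smooth(y, window=9, order=5):
    """Smooth y; edge points are left unchanged in this sketch."""
    coeffs = savgol_coefficients(window, order)
    half = window // 2
    out = list(y)
    for i in range(half, len(y) - half):
        out[i] = sum(c * y[i + k - half] for k, c in enumerate(coeffs))
    return out


print([round(c, 4) for c in savgol_coefficients()])
```

A filter of polynomial order 5 passes any polynomial of degree 5 or lower unchanged, which is why a relatively high order helps preserve narrow Raman bands while still suppressing point-to-point noise.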

*4.5 Self Absorption Correction*

In the case where the wavelength of the Raman source is resonant with the sample absorption, the source intensity is reduced as it propagates through the sample, and the Raman scattered light itself can be similarly attenuated by the sample absorption [24,75]. This self-absorption process can result in a deviation from linearity of the concentration dependence of the measured Raman signal [76].

The correction method described by Lu and colleagues in 2018 [24] was employed to correct for self-absorption. In accordance with Beer’s law, the depth (Z) profile of the Raman scattering intensity I_{R}(Z), including self-absorption is described by:

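$$I_{R}(Z) = I_{R0}\,\exp\left[-(\alpha_{L} + \alpha_{R})Z\right] \tag{1}$$

where I_{R0} is the intensity of Raman scattering without absorption; α_{L} and α_{R} are extinction coefficients at the incident laser and Raman scattering wavelengths, respectively [24,77,78]. The measured Raman scattering, I_{Rm}, can be expressed as:

$$I_{Rm} = \int_{0}^{d} I_{R}(Z)\,dZ \tag{2}$$

$$= \frac{I_{R0}\left[1 - \exp\left(-(\alpha_{L} + \alpha_{R})d\right)\right]}{\alpha_{L} + \alpha_{R}} \tag{3}$$

where d is the thickness of the solution layer. Correcting for self-absorption, the original intensity of Raman scattering (without absorption) is described by:

$$I_{R0} = \frac{I_{Rm}\,(\alpha_{L} + \alpha_{R})}{1 - \exp\left[-(\alpha_{L} + \alpha_{R})d\right]} \tag{4}$$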

The Raman spectrum of a compound can therefore be corrected, knowing the (concentration dependent) absorption at the source wavelength, and across the Raman spectral range. In the measurements reported here, the measurement pathlengths for UV/visible absorption and Raman spectroscopy are different, and therefore the exponent in equation 3 is amended to α_{L}d_{1} + α_{R}d_{2}.
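The self-absorption model can be sketched numerically as follows. The code assumes the single-pathlength Beer's law form, in which the depth profile is I_{R0}·exp[−(α_{L} + α_{R})Z], the measured signal is its integral from 0 to d, and the correction inverts that integral. The function names and the numerical values (extinction per unit concentration, pathlength) are hypothetical and chosen only to show the saturation effect. For the two-pathlength variant mentioned above, the exponent would be replaced by α_{L}d_{1} + α_{R}d_{2}.

```python
import math


def raman_depth_profile(i_r0, alpha_l, alpha_r, z):
    """Raman intensity from depth z, attenuated at laser and Raman wavelengths."""
    return i_r0 * math.exp(-(alpha_l + alpha_r) * z)


def measured_intensity(i_r0, alpha_l, alpha_r, d):
    """Depth-integrated (measured) signal for a layer of thickness d."""
    a = alpha_l + alpha_r
    if a == 0.0:
        return i_r0 * d  # no absorption: plain integral over the layer
    return i_r0 * (1.0 - math.exp(-a * d)) / a


def corrected_intensity(i_rm, alpha_l, alpha_r, d):
    """Invert the measured signal to recover the absorption-free intensity."""
    a = alpha_l + alpha_r
    if a == 0.0:
        return i_rm / d
    return i_rm * a / (1.0 - math.exp(-a * d))


# HYPOTHETICAL parameters: extinction per (mg/ml) at each wavelength, pathlength.
EPS_L, EPS_R, D = 3.0, 2.0, 0.3

for conc in (0.05, 0.25, 1.0, 2.0):
    a_l, a_r = EPS_L * conc, EPS_R * conc
    i_rm = measured_intensity(conc, a_l, a_r, D)   # I_R0 taken proportional to conc
    i_r0 = corrected_intensity(i_rm, a_l, a_r, D)
    print(f"c={conc:.2f}  measured/c={i_rm / conc:.4f}  corrected/c={i_r0 / conc:.4f}")
```

In this toy setting the measured signal per unit concentration falls as concentration rises, while the corrected signal per unit concentration stays constant, which is the linearisation the correction is intended to achieve.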

*4.6 Partial Least Squares Regression Analysis and Cross Validation*

Following pre-processing and the elimination of inherent background from the spectra, multivariate regression analysis of concentration dependent Raman responses was carried out using partial least squares regression (PLSR) to confirm the linear concentration dependent responses, and to demonstrate the predictive capacity of the technique.

The PLSR algorithm looks at the variation in spectral data or predictors (X matrix), as they relate to the associated factors or responses (Y matrix), according to the linear equation Y = XB + E, where B is the regression coefficient matrix and E is the residual matrix [73,79]. The Y matrix, or “target” variable, is usually a quantifiable or systematically varied external factor, in this case carotenoid concentration. The algorithm attempts to maximise the covariance of X (the Raman spectra) and the target, Y, described according to Latent Variables (LVs) in a systematic model [73,79]. It can reduce the number of predictors to a much smaller set of uncorrelated components or latent variables which, when summed, cumulatively and progressively (LV1 > LV2 etc.) account for the covariance. Least squares regression is therefore carried out on the latent variables, rather than on the original data [79,80]. PLSR can construct a predictive model, which can be used, for example, to predict the value of the target variable based on the spectrum of an unknown sample, or vice versa.

The loading of the LV reveals the spectral features which contribute to that LV, and therefore to the co-variance. The Regression Co-efficient is the weighted sum of all the contributing LVs, and in spectral analysis, for a good correlation, should yield the spectrum of the constituent components which vary systematically as a function of the target variable.
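The latent-variable construction described above can be illustrated with a small single-response PLS (PLS1) routine based on the NIPALS scheme. This is a minimal sketch in plain Python, not the MATLAB implementation used for the dataset. The synthetic spectra (one Gaussian band scaled by concentration on a constant background) are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns means and (weight, loading, q) per LV."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred spectra
    f = [yi - y_mean for yi in y]                              # centred target
    components = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]  # X^T y
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break  # no covariance left to model
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in E]                               # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the target for one spectrum by sequential projection."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# HYPOTHETICAL data: one band scaled by concentration plus a constant background.
PURE = [math.exp(-((k - 20) / 4.0) ** 2) for k in range(40)]
CONCS = [0.05, 0.1, 0.25, 0.5, 1.0]
SPECTRA = [[c * s + 0.2 for s in PURE] for c in CONCS]
MODEL = pls1_fit(SPECTRA, CONCS, n_components=1)
unknown = [0.3 * s + 0.2 for s in PURE]
print(round(pls1_predict(MODEL, unknown), 6))
```

With noise-free, single-component data like this, one latent variable captures all of the covariance, and its weight vector is proportional to the band shape, mirroring the statement that the loadings and regression coefficient should resemble the spectrum of the varying constituent.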

In the protocol employed, the number of LVs used to construct the model was chosen by identifying the point at which the cumulative %Variance Explained reached ~100%. The model was then subjected to a 10-fold cross validation process, repeated 100 times, to establish the Root Mean Squared Error of Cross Validation (RMSECV).

The *K* fold cross validation technique was used in this study to validate the model created. This is a non-exhaustive method of cross validation, whereby the original dataset is divided randomly into *K* subsamples of equal size. One of the *K* subsamples is used as the validation data for testing the model and the remaining *K* - 1 subsamples are used as training data. Cross validation is repeated *K* times, so that each of the subsamples is used exactly once as the validation data. To optimise the number of latent variables used to create the model, the number corresponding to the minimum of the root mean square error of cross validation (RMSECV) was identified, together with the percent variance explained. RMSECV assesses the predictive performance of the model, while the percent variance indicates how many components are needed to capture the variation in the data. When *K* is equal to the number of observations (n), *K* fold cross validation is the same as leave one out cross validation (LOOCV).

In this study, a 10-fold cross validation (*K* = 10) was carried out. Here, the observation set is divided by random selection into 10 subsets of equal size. The cross validation process is then carried out 100 times. Within each repetition, each observation is used for testing exactly once and for training in all the other folds. The results are averaged to produce a single estimate.
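The repeated *K* fold procedure can be sketched as below. To keep the example short and dependency-free, a univariate straight-line fit stands in for the PLSR model, which is a deliberate simplification. The data, the seed and the function names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares slope and intercept (stand-in for the PLSR model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def repeated_kfold_rmsecv(x, y, k=10, repeats=100, seed=0):
    """Average RMSECV over `repeats` random K-fold partitions."""
    rng = random.Random(seed)
    n = len(x)
    rmse_values = []
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
        squared_error = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in indices if i not in held_out]
            slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
            squared_error += sum((y[i] - (slope * x[i] + intercept)) ** 2
                                 for i in test_idx)
        rmse_values.append(math.sqrt(squared_error / n))
    return sum(rmse_values) / repeats


# HYPOTHETICAL data: peak intensity (x) against concentration in mg/ml (y).
noise = random.Random(1)
y_conc = [0.05 * (i + 1) for i in range(20)]
x_peak = [1000.0 * c + noise.gauss(0.0, 10.0) for c in y_conc]
print(f"RMSECV = {repeated_kfold_rmsecv(x_peak, y_conc):.4f} mg/ml")
```

Setting `k` equal to the number of observations makes every fold a single sample, which reproduces leave one out cross validation as noted in the text.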

*4.7 Principal Components Analysis*

Principal Components Analysis (PCA) is a multivariate analysis technique frequently used in analysing multi-dimensional data sets. It can reduce the number of variables in a multi-dimensional data set without altering the major variations within the data [81]. The order of the principal components (PCs) reflects their relevance to the data set: PC1 describes the highest variation in the data, followed by PC2, PC3 and so on. The first 3 PCs will generally account for up to 99% of the variance in the data set, giving the best visualisation of the differentiation of the cluster sets [81,82].

In this study, PCA was used to explore the differentiation of the carotenoid spectral data sets of Beta Carotene, Lutein and Zeaxanthin. Data which had been previously corrected for interferents, noise, self-absorption and normalised for water content was used. The most prominent PC loadings were used to highlight the differences in the data set for the carotenoids.
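As an illustration of how PCA can separate spectra that differ mainly by a small peak shift, the sketch below extracts the first principal component by power iteration on mean-centred data. It is a minimal standard-library example rather than the analysis code used for the dataset, and the peak positions and widths are hypothetical.

```python
import math


def first_principal_component(data, iterations=500):
    """Return (loading, scores, explained variance fraction) for PC1."""
    n, m = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(m)]
    centred = [[row[j] - mean[j] for j in range(m)] for row in data]
    # Start from the largest centred row so the start vector lies in the row space.
    v = max(centred, key=lambda r: sum(c * c for c in r))
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("all spectra are identical; no variance to analyse")
    v = [c / norm for c in v]
    for _ in range(iterations):
        # Power iteration on X^T X without forming it: v <- X^T (X v).
        scores = [sum(r[j] * v[j] for j in range(m)) for r in centred]
        v_new = [sum(centred[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(c * c for c in v_new))
        v = [c / norm for c in v_new]
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centred]
    total = sum(c * c for r in centred for c in r)
    return v, scores, sum(s * s for s in scores) / total


# HYPOTHETICAL spectra: the same band shape with slightly shifted centres (cm^-1).
AXIS = [1480 + k for k in range(81)]
CENTRES = {"A": 1519.0, "B": 1522.0, "C": 1525.0}
SPECTRA = {name: [math.exp(-((x - c) / 8.0) ** 2) for x in AXIS]
           for name, c in CENTRES.items()}

loading, pc1_scores, explained = first_principal_component(list(SPECTRA.values()))
for name, score in zip(SPECTRA, pc1_scores):
    print(f"{name}: PC1 score = {score:+.3f}")
print(f"PC1 explains {100 * explained:.1f}% of the variance")
```

For shifted bands of equal height, the PC1 loading has a derivative-like shape centred on the band, which is the signature usually associated with a peak shift rather than an intensity change.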

*4.8 Nonlinear least squares curve fitting*

Nonlinear least squares curve fitting using a problem-based workflow in the MATLAB environment was employed to undertake the self-absorption correction of the concentration dependence of the Raman spectra of Beta Carotene/BSA complexes, and to fit weighted sums of the constituent spectra to admixture spectra of different ratios, correcting for the self-absorption of each of the constituent components.

In the first case, the function was defined by equation 2, and the problem was solved by fitting the concentration dependence of the measured Raman signal at ~1519 cm^{-1} by optimising the parameters for d_{1} and d_{2}. Equation 3 was then employed to correct the concentration dependent spectra of Beta Carotene, as well as the 1mg/ml spectra of Lutein and Zeaxanthin for self-absorption.

In the second case, the problem was defined by equation 3, and was solved by fitting weighted sums of the Raman spectra of the three constituent carotenoid/BSA complexes, corrected for self-absorption, to the measured spectrum of the admixture, by optimising the weighting parameters, A(1), A(2), A(3), for each constituent component.
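The idea of fitting a weighted sum of constituent spectra to an admixture spectrum can be sketched in a simplified form. The example below uses ordinary linear least squares (normal equations) and omits the self-absorption term, so it is a stand-in for, not a reproduction of, the nonlinear MATLAB fit described above. The component spectra are hypothetical Gaussian bands, and the weights play the role of A(1), A(2), A(3).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_weights(components, mixture):
    """Least-squares weights A such that sum_k A[k] * components[k] ~ mixture."""
    k = len(components)
    gram = [[sum(x * y for x, y in zip(components[i], components[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(components[i], mixture)) for i in range(k)]
    return solve(gram, rhs)


def to_percent(weights):
    total = sum(weights)
    return [100.0 * w / total for w in weights]


# HYPOTHETICAL component spectra: Gaussian bands with shifted centres (cm^-1).
AXIS = [1480 + k for k in range(81)]
COMPONENTS = [[math.exp(-((x - centre) / 8.0) ** 2) for x in AXIS]
              for centre in (1515.0, 1520.0, 1527.0)]
NOMINAL = [0.625, 0.25, 0.125]  # the 100:40:20 ratio expressed as fractions
MIXTURE = [sum(a * comp[i] for a, comp in zip(NOMINAL, COMPONENTS))
           for i in range(len(AXIS))]

fitted = fit_weights(COMPONENTS, MIXTURE)
print([round(p, 2) for p in to_percent(fitted)])
```

The more the component bands overlap, the closer the normal-equation matrix is to singular, so with real, noisy spectra the fitted percentages can deviate noticeably from the nominal ones even when the overall fit looks good.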

#### Language

Eng

#### File Format

.xls, .txt

#### Viewing Instructions

MATLAB, Windows

#### Data Owner

no

#### Funder

Technological University Dublin

#### Included in

Biological and Chemical Physics Commons, Investigative Techniques Commons, Optics Commons