Applications of Pattern Recognition Entropy (PRE) and Informatics to Data Analysis

Shiladitya Chatterjee, Brigham Young University


The primary focus of my work is the application of informatics methods to materials science and analytical chemistry. The statistical analysis of data has become increasingly important for understanding the properties of materials and analytes. Statistical methods such as principal component analysis (PCA) and multivariate curve resolution (MCR) are widely used in chemistry and other fields because they can categorize spectra in an unsupervised way. PCA is relatively easy to apply and has appealing mathematical properties. However, its results can be challenging to interpret, even for experienced users. In contrast, MCR results can be more interpretable because the factors resemble real spectra and do not have negative scores or loadings. This interpretability, however, comes at the cost of the useful orthogonality properties of the scores and loadings in PCA. Other statistical analysis methods, such as cluster analysis and partial least squares regression (PLS-R), present their own challenges. Pattern recognition entropy (PRE) is a novel application of Shannon’s information theory for understanding the underlying complexity in spectra. Unlike PCA and MCR, PRE is a summary statistic: it applies the mathematical quantification of information to chemometric analysis. PRE values reflect the shape and complexity of spectra. Chapter 1 describes (i) X-ray photoelectron spectroscopy (XPS) and time-of-flight secondary ion mass spectrometry (ToF-SIMS) and (ii) liquid chromatography-mass spectrometry (LC-MS) and capillary electrophoresis (CE), the analytical methods that provided the data I analyzed by PRE and other informatics tools; (iii) some of the commonly used statistical analysis tools, such as PCA, MCR, cluster analysis, and PLS-R; and (iv) PRE itself. Chapter 2 describes in much greater detail the theory associated with PRE and the statistical tools I used.
Chapter 3 describes the PRE and informatics analysis of depth profiles through thin films by XPS and ToF-SIMS. Chapter 4 introduces the concept of the ‘reordered spectrum’ as an intuitive, visual representation of spectra that addresses the abstraction associated with PRE results. Total ion current chromatograms (TICCs) generated using LC-MS are often extremely complex and ‘noisy’. Chapter 5 describes the application of PRE as a variable reduction method for producing higher quality TICCs. Chapter 6 discusses the limitations associated with the application of PRE to TICCs and presents a new method that uses cross-correlation (CC) in conjunction with a PRE analysis. Chapter 7 discusses a new methodology that uses CE and PRE to detect autologous blood doping (ABD). Chapter 8 presents the conclusions of this work and discusses the scope of future work on PRE. The thesis also contains several appendices. Appendix 1 introduces polyallylamine (PAAm) as a simple, easy-to-apply adhesion promoter for the widely used photoresist SU-8. Appendices 2, 3, and 4 contain articles I wrote on trends in modern XPS instrumentation, and Appendices 5-8 contain supplemental information relating to Chapters 3, 4, 5, and 7, respectively.