For centuries, scientists have dreamed of creating materials by design. Rather than discovery by accident, bespoke materials could be tailored to fulfill specific technological needs. Quantum theory and computational methods are essentially equal to the task, and computational power is the new bottleneck. Machine learning has the potential to solve that problem by approximating material behavior at multiple length scales. A full end-to-end solution must allow us to approximate the quantum mechanics, microstructure and engineering tasks well enough to be predictive in the real world. In this dissertation, I present algorithms and methodology to address some of these problems at various length scales. In the realm of enumeration, systems with many degrees of freedom such as high-entropy alloys may contain prohibitively many unique possibilities so that enumerating all of them would exhaust available compute memory. One possible way to address this problem is to know in advance how many possibilities there are so that the user can reduce their search space by restricting the occupation of certain lattice sites. Although tools to calculate this number were available, none performed well for very large systems and none could easily be integrated into low-level languages for use in existing scientific codes. I present an algorithm to solve these problems. Testing the robustness of machine-learned models is an essential component in any materials discovery or optimization application. While it is customary to perform a small number of system-specific tests to validate an approach, this may be insufficient in many cases. In particular, for Cluster Expansion models, the expansion may not converge quickly enough to be useful and reliable. Although the method has been used for decades, a rigorous investigation across many systems to determine when CE "breaks" was still lacking. This dissertation includes this investigation along with heuristics that use only a small training database to predict whether a model is worth pursuing in detail. To be useful, computational materials discovery must lead to experimental validation. However, experiments are difficult due to sample purity, environmental effects and a host of other considerations. In many cases, it is difficult to connect theory to experiment because computation is deterministic. By combining advanced group theory with machine learning, we created a new tool that bridges the gap between experiment and theory so that experimental and computed phase diagrams can be harmonized. Grain boundaries in real materials control many important material properties such as corrosion, thermal conductivity, and creep. Because of their high dimensionality, learning the underlying physics to optimizing grain boundaries is extremely complex. By leveraging a mathematically rigorous representation for local atomic environments, machine learning becomes a powerful tool to approximate properties for grain boundaries. But it also goes beyond predicting properties by highlighting those atomic environments that are most important for influencing the boundary properties. This provides an immense dimensionality reduction that empowers grain boundary scientists to know where to look for deeper physical insights.



College and Department

Physical and Mathematical Sciences; Physics and Astronomy



Date Submitted


Document Type





materials discovery, machine learning, grain boundaries, derivative structure enumeration, Polya enumeration theorem, monte carlo structure identification, order parameters