"Post-Irradiation Fault Injection for Complex FPGA Designs" by Nathan G. Baker

Abstract

Fault injection is a common technique for estimating the impact of single-event upsets (SEUs) within the configuration memory (CRAM) of an FPGA design. Fault injection is typically conducted prior to a radiation test to evaluate the effectiveness of SEU mitigation techniques, estimate the FPGA design's behavior at the radiation test, and prepare the radiation test infrastructure. This thesis explores how fault injection can be used after radiation testing to reveal additional insight into an FPGA's behavior during a radiation test. A unique methodology, referred to as post-irradiation fault injection (PIRFI), was developed in which the locations and timestamps of CRAM upsets are recorded during a radiation test. The timestamps of design failures are also recorded. After the radiation test, the record of CRAM upsets is "played back" using fault injection to see how the design behaves during these same CRAM upsets outside of the radiation beam. A variety of other PIRFI experiments can also be performed to gain additional insight into an FPGA's radiation test behavior. This methodology was used to conduct two case studies on two designs: a Linux SoC FPGA design and an FPGA switch design. PIRFI successfully reproduced 92% of the failures observed at a radiation test of the Linux design. The results indicated that CRAM upsets are not always guaranteed to cause design failures. PIRFI also provided insight into board dependency, revealing that some failures occur only on certain boards. Additionally, some failures exhibited a dependence on hidden latent faults introduced by prior CRAM upsets. Another PIRFI experiment demonstrated that the longer a fault persists, the more likely it is to cause failure. Finally, PIRFI helped to identify the specific CRAM upsets responsible for design failure at a radiation test and showed instances where multiple-bit CRAM upsets were required to cause failure. PIRFI was also conducted on a switch FPGA design, and the results are presented. Due to a logging error in recording CRAM upsets, only 11% of radiation test failures were reproduced. However, it is estimated that proper logging would have allowed approximately 80% of failures to be reproduced.

Degree

MS

College and Department

Ira A. Fulton College of Engineering; Electrical and Computer Engineering

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2025-02-28

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd13518

Keywords

FPGA, fault injection, radiation testing, fault tolerance, SEU

Language

English

Included in

Engineering Commons
