Field-programmable gate arrays (FPGAs) are powerful integrated circuits for low-overhead custom computing needs and design prototyping. Due to the hardware nature of programming an FPGA, finding bugs in a design can be a very challenging process. Signals need to be physically probed and data recorded in real time. This is often done by dedicating some resources on the FPGA itself towards an embedded logic analyzer. This method is effective but can be time and resource consuming. Academic research projects have produced a variety of methods for reducing this difficulty. One option that has previously been unexplored is the use of distributed LUT memory for debug trace buffers, rather than dedicated FPGA BRAM. This dissertation presents a novel, lean embedded logic analyzer that leverages leftover LUT resources on the FPGA for this purpose. Distributed Memory Debug (abbreviated as "DIME Debug") provides some amount of signal visibility into very large (90\%+ LUT utilized) FPGA designs or designs where the programmer requires all available device BRAM, situations in which currently available embedded logic analyzers are likely to fail. The ubiquitous nature of LUTs on FPGAs provides opportunities to insert debug circuitry near signals of interest without disturbing placement of the user design. Using only leftover LUTs for trace buffers allows for effectively no area overhead. The DIME Debug system typically has a critical path delay in the 7-9ns range, which can force non-ideal slower timing constraints on the user design. A simulated annealing based placement algorithm and other optimizations are shown to improve timing closure results from 20-50\% depending on benchmark and probe count. DIME debug can be instrumented into a fully implemented design incrementally using the RapidWright CAD tool, resulting in debug iterations under 15 minutes even for very large benchmarks. Another interesting possibility introduced by the use of memory LUTs for debug trace buffers is preallocating these resources. Setting aside a certain number of LUTs before implementation of the user design leaves them available for incremental debug instrumentation. Experiments with a preallocation scheme show that, with virtually no penalty to the user design, debug critical paths are lowered by approximately 1ns and 2-3X the number of trace buffers can be instrumented into most benchmarks.



Date Submitted


Document Type





FPGA, debug, embedded logic analyzer, distributed memory



Included in

Engineering Commons