Effective Post-silicon Validation

Lecture

May 27 14:00
International Conference Center


Subhasish Mitra
Stanford University

Hardware failures are a growing concern as electronic systems become more complex, interconnected, and pervasive. The complexity challenge is further exacerbated by new ways of improving the energy efficiency of electronic systems in the absence of CMOS (Dennard) scaling: increasing numbers of cores, uncore components, and accelerators; increasing degrees of adaptivity; and increasing levels of heterogeneous integration. All these features and their complex interactions make future systems highly vulnerable to design flaws (bugs) that can jeopardize correct system operation and/or introduce security vulnerabilities.

Existing validation methods barely cope with today’s complexity. Traditional pre-silicon verification alone is no longer adequate because it is nearly impossible to detect and fix all bugs before manufacture. Post-silicon validation involves operating manufactured ICs in actual application environments to detect and fix bugs. Existing post-silicon practices are ad hoc, and their costs are rising faster than design costs.

Post-silicon validation involves four major steps:

1. Detecting a problem by running test programs, such as operating systems, games, or functional tests, until a system malfunction is observed.

2. Localizing the problem to a small region from the observed system malfunction.

3. Identifying the root cause of the problem.

4. Fixing or bypassing the problem.
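
As a rough illustration of steps 1 and 2 only, the toy sketch below runs the same operation sequence on a reference model and on a stand-in for the manufactured chip, then reports the first cycle at which they diverge and which block differs. Everything in it is hypothetical: the block names `alu` and `cache`, the injected bug, and above all the assumption that every block's state can be observed on every cycle, which real silicon does not offer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical observable state: one integer per block.
State = Dict[str, int]


def golden_step(state: State, op: int) -> State:
    """Reference model: the behavior the design is supposed to have."""
    return {
        "alu": (state["alu"] + op) & 0xFF,
        "cache": state["cache"] ^ op,
    }


def silicon_step(state: State, op: int) -> State:
    """Stand-in for the manufactured chip, with an injected ALU bug."""
    nxt = golden_step(state, op)
    if op == 7:  # hypothetical bug trigger, chosen only for illustration
        nxt["alu"] = (nxt["alu"] + 1) & 0xFF
    return nxt


def detect_and_localize(ops: List[int]) -> Optional[Tuple[int, List[str]]]:
    """Return (first failing cycle, diverging blocks), or None if no mismatch."""
    gold: State = {"alu": 0, "cache": 0}
    chip: State = dict(gold)
    for cycle, op in enumerate(ops):
        gold = golden_step(gold, op)
        chip = silicon_step(chip, op)
        # Step 1 (detection): any mismatch counts as an observed malfunction.
        diverged = sorted(name for name in gold if gold[name] != chip[name])
        if diverged:
            # Step 2 (localization): narrow the failure to the blocks that differ.
            return cycle, diverged
    return None


if __name__ == "__main__":
    print(detect_and_localize([1, 2, 7, 3]))
```

In practice a malfunction may surface long after the bug is activated and internal state is mostly hidden, which is why localization is a research problem rather than a per-cycle comparison.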

Effective post-silicon validation requires a radical departure from today’s ad hoc practices to structured techniques that are inspired by advances in design verification, formal methods, manufacturing testing, and robust system design with built-in resilience to failures. Another highly important aspect is the creation of benchmarks that can drive quantitative evaluation of new post-silicon validation techniques.

Prerequisites & suggested preliminary readings

Basic concepts in digital circuits, systems, computer architecture, and some knowledge of VLSI testing.

Learning outcomes

Post-silicon validation is fairly new as a “technical research” area, with many challenging problems spanning bug detection, localization, root-cause analysis and debug, bug fixing, and emulation. The audience will learn about open research problems and get an overview of techniques in the research literature (together with their drawbacks). The importance of benchmarking will be highlighted. Finally, an extensive bibliography will be provided.

Syllabus

Basic concepts; overview of challenges in bug detection, localization, debug, bug fixing, and emulation; importance of various coverage metrics; overview of various techniques in the research literature, and their pros and cons; overview of research problems; synergy with robust system design techniques with built-in resilience to failures.