NSA Awards Authors of Assessment of Trustworthiness in GPT Models

Source: National Security Agency (NSA)

The National Security Agency (NSA) Research Directorate recently selected “DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models” as the winner of its 12th Annual Best Scientific Cybersecurity Paper Competition.

The winning paper, authored by 19 researchers including professors Dawn Song of the University of California, Berkeley; Bo Li of the University of Illinois Urbana-Champaign; and Sanmi Koyejo of Stanford University, proposed a comprehensive trustworthiness evaluation framework for large language models (LLMs), with a focus on generative pre-trained transformer (GPT) models.

“The value of this paper to the foundational research community is the proposed evaluation framework for large language models,” said Dr. Adam Tagert, Technical Director of NSA’s Science of Security program. “This framework is a noteworthy foundational advancement in that other people can use it to rigorously evaluate future LLMs and scientifically compare them. It is essentially providing much needed metrics in this community.” 

NSA’s Laboratory for Advanced Cybersecurity Research established the competition in 2013 to encourage the development of scientific foundations in cybersecurity, and to support enhancement of cybersecurity within devices, computers, and systems through rigorous research, solid scientific methodology, documentation, and publishing. 

NSA’s Research Director Gil Herrera, along with NSA cybersecurity experts and external authorities in the field, selected this year’s winning entry from 42 papers nominated by the public.

The winning researchers considered diverse perspectives in their evaluation, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.

They discovered previously unpublished vulnerabilities to trustworthiness threats; for example, GPT models can easily be misled into generating toxic and biased outputs and into leaking private information from training data and conversation history. In doing so, the paper sheds light on gaps in model trustworthiness.
 
“This paper has high impact,” Tagert said. “A generative AI company has already adopted its research results and created a leaderboard on the framework.” 

According to Tagert, because of the high quality of papers nominated this year, two were selected for honorable mentions.

The first went to “Ethical Frameworks and Computer Security Trolley Problems: Foundations for Conversations,” by professors Tadayoshi Kohno, University of Washington; Yasemin Acar, Paderborn University (Germany)/Georgetown; and Wulf Loh, University of Tübingen (Germany). 

The second honorable mention was awarded for “SoK: I Have the (Developer) Power! Sample Size Estimation for Fisher’s Exact, Chi-Squared, McNemar’s, Wilcoxon Rank-Sum, Wilcoxon Signed-Rank and t-Tests in Developer-Centered Usable Security,” by PhD student Anna-Marie Ortloff, researcher Christian Tiefenau, and professor Matthew Smith, all of the University of Bonn in Germany. 

Nominations for the 13th annual Best Scientific Cybersecurity Paper Competition open on 15 January. NSA welcomes nominations of papers published in 2024 in peer-reviewed journals and technical conferences that show an outstanding contribution to cybersecurity science. Winners will be announced at the end of 2025.

Visit the Best Scientific Cybersecurity Paper Competition webpage for more information on the competition and to nominate a paper.