Abstract:
Large Language Models (LLMs) have revolutionized natural language processing, but they continue to face critical challenges related to security, privacy, and reliability. Our research aims to enhance the trustworthiness of LLMs by addressing key concerns such as hallucinations, glitch tokens, and jailbreaks, issues that undermine the reliability of LLM outputs and call for robust detection and mitigation mechanisms.
In this talk, we introduce a novel framework that leverages logic programming and metamorphic testing to automatically generate testing benchmarks and detect hallucinations in LLM responses. This structured, scalable method offers an effective means of uncovering and evaluating hallucinations. Furthermore, we briefly present our work on other crucial aspects of LLM trustworthiness, including glitch tokens and jailbreak exploits, and outline a roadmap for improving the reliability and robustness of LLMs.
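To make the general idea concrete, the sketch below (an illustration under assumptions, not the speaker's actual framework) derives ground-truth question/answer pairs from a small logic-programming-style rule base and applies a simple metamorphic relation: semantically equivalent questions should receive consistent answers, and disagreement flags a candidate hallucination. The function query_llm is a hypothetical placeholder for a real model call.

```python
# Minimal sketch: logic-rule-derived benchmarks + a metamorphic consistency check.
# Assumptions: `query_llm` stands in for a real LLM API call; the rule base and
# paraphrase templates are illustrative only.

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call; replace with a real client."""
    return "Charlie"  # canned answer for demonstration

# Ground facts and a Datalog-style rule:
# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
parents = {("Alice", "Bob"), ("Bob", "Charlie")}
grandparents = {(x, z) for (x, y1) in parents for (y2, z) in parents if y1 == y2}

def check_consistency(grandparent: str, expected_grandchild: str) -> bool:
    """Metamorphic relation: paraphrased questions about the same fact must agree."""
    q1 = f"Who is the grandchild of {grandparent}? Answer with a name only."
    q2 = f"{grandparent} has exactly one grandchild. Name that grandchild."
    a1, a2 = query_llm(q1).strip(), query_llm(q2).strip()
    consistent = a1.lower() == a2.lower() == expected_grandchild.lower()
    if not consistent:
        print(f"Potential hallucination: {a1!r} vs {a2!r}, expected {expected_grandchild!r}")
    return consistent

if __name__ == "__main__":
    # The derived facts serve as an automatically generated test benchmark.
    for gp, gc in grandparents:
        check_consistency(gp, gc)
```

Because the expected answers are derived by the logic rules rather than written by hand, the benchmark scales with the rule base, and the metamorphic relation lets inconsistencies be detected even without inspecting the model's internals.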
Bio:
Wang Kailong is an associate professor (with tenure) in the School of CSE at Huazhong University of Science and Technology (HUST). He earned his PhD from the National University of Singapore (NUS), supervised by Prof. Jin Song Dong. His research focuses on security analysis and testing of Large Language Models, as well as mobile and web security and privacy. He has published at top-tier conferences and in journals such as OOPSLA, NDSS, MobiCom, TSE, TOSEM, FSE, ASE, ISSTA, and WWW.