Analysis shows that ChatGPT produces incorrect answers to Stack Overflow questions more than 50% of the time

Our manual analysis shows that ChatGPT produces incorrect answers more than 50% of the time. Moreover, ChatGPT suffers from other quality issues such as verbosity and inconsistency. The in-depth manual analysis also points to a large number of conceptual and logical errors in ChatGPT answers. Additionally, our linguistic analysis shows that ChatGPT answers are very formal and rarely express negative sentiment. Although our user study shows higher user preference and quality ratings for human answers, users occasionally err by preferring incorrect ChatGPT answers because of ChatGPT’s articulate language style and its seemingly correct logic presented with positive assertions.

Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions
