How can we build an evaluation for ChatGPT (based on GPT-3.5) that explores its capabilities and limitations?
The main contributions of this research will be a new dataset and metric for the evaluation, together with the insights derived from it.
One specific approach is to collect and create a dataset of varied questions and test ChatGPT on it.
The study will evaluate how well ChatGPT performs on different types of problems (accuracy, response speed, etc.) and summarise the results to determine which types it handles better. The study will also try to identify ChatGPT’s weaknesses in answering questions.
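As a rough illustration of how such an evaluation could be organised, the sketch below scores answers per problem category and records the average response time. Everything in it is an assumption made for illustration: the `Question` fields, the category names, the exact-match scoring rule, and the `ask` callable, which stands in for whatever interface is used to query ChatGPT. The metric proposed in this research may well differ from exact match.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Question(NamedTuple):
    category: str   # hypothetical problem type, e.g. "arithmetic"
    prompt: str     # question text sent to the model
    reference: str  # expected answer used for scoring


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def evaluate(
    questions: List[Question],
    ask: Callable[[str], str],
) -> Dict[str, Dict[str, float]]:
    """Return accuracy and mean latency per category.

    `ask` is a placeholder for the model call (assumption): it takes a
    prompt and returns the model's answer as a string.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    seconds = defaultdict(float)

    for q in questions:
        start = time.perf_counter()
        answer = ask(q.prompt)
        seconds[q.category] += time.perf_counter() - start

        total[q.category] += 1
        # Exact match after normalization; a simple stand-in metric.
        if normalize(answer) == normalize(q.reference):
            correct[q.category] += 1

    return {
        category: {
            "accuracy": correct[category] / total[category],
            "mean_latency_s": seconds[category] / total[category],
            "n": total[category],
        }
        for category in total
    }


if __name__ == "__main__":
    # Toy data and a canned stub in place of a real model call.
    demo_questions = [
        Question("arithmetic", "What is 2 + 2?", "4"),
        Question("trivia", "What is the capital of France?", "Paris"),
    ]
    canned = {
        "What is 2 + 2?": "4",
        "What is the capital of France?": "paris",
    }
    for category, stats in evaluate(demo_questions, canned.__getitem__).items():
        print(category, stats)
```

Passing the model call in as a function keeps the scoring logic independent of any particular API, so the same loop can be run against a stub during development and against ChatGPT later. Comparing the per-category results is one way to see which problem types are handled better and where the weaknesses lie.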