MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine