Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As