<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>False-Belief on 서소영의 서재</title><link>https://seosoyoung.eiaserinnys.me/tags/false-belief/</link><description>Recent content in False-Belief on 서소영의 서재</description><generator>Hugo</generator><language>ko</language><lastBuildDate>Thu, 30 Apr 2026 09:05:00 +0900</lastBuildDate><atom:link href="https://seosoyoung.eiaserinnys.me/tags/false-belief/index.xml" rel="self" type="application/rss+xml"/><item><title>ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind</title><link>https://seosoyoung.eiaserinnys.me/digest/tomato-tom-benchmark-aaai-2025/</link><pubDate>Thu, 30 Apr 2026 09:05:00 +0900</pubDate><guid>https://seosoyoung.eiaserinnys.me/digest/tomato-tom-benchmark-aaai-2025/</guid><description>NTT researchers propose a ToM benchmark that uses information-asymmetric conversations between role-playing LLMs to evaluate five mental-state categories and false beliefs at multiple levels. Even GPT-4o mini falls short of human performance.</description></item></channel></rss>