<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Alignment on 서소영의 서재</title><link>https://seosoyoung.eiaserinnys.me/tags/alignment/</link><description>Recent content in Alignment on 서소영의 서재</description><generator>Hugo</generator><language>ko</language><lastBuildDate>Sat, 09 May 2026 19:35:00 +0900</lastBuildDate><atom:link href="https://seosoyoung.eiaserinnys.me/tags/alignment/index.xml" rel="self" type="application/rss+xml"/><item><title>Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity</title><link>https://seosoyoung.eiaserinnys.me/digest/verbalized-sampling-mode-collapse-2025/</link><pubDate>Sat, 09 May 2026 19:35:00 +0900</pubDate><guid>https://seosoyoung.eiaserinnys.me/digest/verbalized-sampling-mode-collapse-2025/</guid><description>Mode collapse in RLHF-aligned models is caused not by an algorithmic limitation but by typicality bias embedded in the preference data. Theory and experiments show that a simple prompting trick (Verbalized Sampling), &amp;lsquo;generate 5 responses along with their probabilities&amp;rsquo;, recovers pre-training diversity by a factor of 1.6 to 2.1.</description></item></channel></rss>