qtmuniao | 青藤木鸟

A programmer focused on large-scale data systems who enjoys reading, writing, sharing, badminton, and photography.

RSS Preview of qtmuniao | 青藤木鸟

Why Is the Loss Function of Large Models Cross-Entropy?

2026-03-29 15:31:35

Prologue

When you first enter the world of large models, gaps in foundational math such as linear algebra, probability theory, and information theory make it easy to get lost among the terms: logprob (log probability), likelihood, NLL (Negative Log Likelihood), cross entropy, perplexity. They turn up in every corner of papers and documentation, yet they all feel like acquaintances you know only by name: you see them often, but never truly understand them.

Then one day, after slowly catching up on some basic math and after enough immersion in the relevant context at work, I finally realized during a chat with ChatGPT that the concepts above are essentially different views of the same thing. Enter through the door of probability theory and it is called NLL; step through the door of information theory and it is called cross entropy; look through the door of PyTorch and it is F.cross_entropy. Different paths lead to the same destination: all of them try to characterize “how far the model’s current output is from the expected result.”
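To make this concrete, here is a minimal PyTorch sketch (the logits and the five-token vocabulary are made-up illustrative values, not from the post): the negative log-likelihood of the target token, computed from log-probabilities, is numerically the same quantity that F.cross_entropy computes directly from raw logits, and perplexity is just its exponential.

```python
import torch
import torch.nn.functional as F

# Toy setup: a single "next-token" prediction over a 5-token vocabulary.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.1, 0.3]])  # raw model outputs
target = torch.tensor([0])                           # the expected token id

# Door of probability theory: negative log-likelihood of the target token.
log_probs = F.log_softmax(logits, dim=-1)  # logprobs over the vocabulary
nll = F.nll_loss(log_probs, target)

# Door of PyTorch (and information theory): cross entropy on raw logits.
ce = F.cross_entropy(logits, target)

# Same mountain: the two values coincide up to floating-point noise.
assert torch.allclose(nll, ce)

# Perplexity is the exponential of this per-token loss.
ppl = torch.exp(ce)
print(nll.item(), ce.item(), ppl.item())
```

Because the target here is one-hot, the information-theoretic cross entropy H(p, q) collapses to just -log q(target token), which is exactly the negative log-likelihood: the three doors open onto one value.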

“Viewed from the side, a mountain looks like a ridge; viewed from the end, a single peak” — in a high-dimensional field like large models, this feeling of blind men touching an elephant is everywhere. But we three-dimensional creatures can only rely on long-term immersion, cross-verifying knowledge from different domains, until one day we suddenly have an epiphany — ah, so this is the same mountain.

What this article sets out to do is discuss the “many faces” of the most fundamental concept in the large model domain: cross entropy as the loss function.

(Figure: NLL-entropy.png)

20260120 Bilibili Live — Key Takeaways on Switching to LLMs

2026-01-25 16:13:26

I joined an LLM company in early 2024, having previously worked in the infra industry (databases, storage, etc.), so I have some very basic insights on switching careers. I haven’t shared on Bilibili for a long time; this live stream forced me to get back in gear. I answered some of your questions and bridged a bit of the information gap. This post is a slightly more organized summary of some points mentioned during the stream, with some materials I find valuable attached at the end.

Bilibili live stream: https://www.bilibili.com/video/BV1uckJBkEto

Cover image

2025 Year-End Summary — Inward Growth

2025-12-28 23:28:46

Since becoming distinctly self-aware, never have I clashed so intensely with the world and with myself as I have this year—yet the result is strangely magical: I have become even more peaceful. Many subconscious reactions, many habitual practices, when excavated inward, can be traced back to such ancient reinforcement chains. Just as Shi Tiesheng said—the bullet fired in youth strikes squarely between the brows at this age.

Thus, whether forced or spontaneous, this year has become an inevitable journey of inward growth—observing and tracing the subtle origins of my emotional shifts, as in the investigation of things to extend knowledge. Seeing heaven and earth, seeing all beings, ultimately serves to see oneself. Although old inertia will persist for some time, the beginning of awareness is the seed that shapes a different trajectory.

Foguang Temple's Sutra Pillar and East Main Hall
