Visualization of how GPT works. This is an impressive visualization, where you can even mouseover the matrices and it'll show you not just the value in that cell but how the value in that cell was calculated. It has accompanying text where animations are played as you move through the text (with the spacebar), and you can replay animations with little "play" buttons. It uses a simplified 85,000-parameter GPT called Nano-GPT.
The heart of it is the "self attention" chapter. Remember, "transformer" is the nonsensical name for the blocks in neural networks that handle the "attention" mechanism (and is the "T" in "GPT"). The "self attention" chapter shows how there are learned weights (the animation shows only inference, not training, so you have to assume the weights already have the correct values from a training process not shown) for "Q", "K", and "V". These are combined with the input to form Q vectors, K vectors, and V vectors. "Q" stands for "query", "K" stands for "key", and "V" stands for "value", and this is supposed to remind you of doing a lookup in a key-value table.
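As a rough sketch of that step (the names and dimensions here are illustrative, not taken from the visualization), the Q, K, and V vectors are just linear projections of the input embeddings through learned weight matrices:

```python
import numpy as np

# Illustrative sizes, not the actual Nano-GPT dimensions.
seq_len, d_model = 6, 16   # number of tokens, embedding width
d_head = 8                 # width of each Q/K/V vector

rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))   # input embeddings, one row per token

# Learned weight matrices (random here, standing in for trained values).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q = X @ W_q   # "query" vectors, one per token
K = X @ W_k   # "key" vectors
V = X @ W_v   # "value" vectors
```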
But while that may be the general idea, the visualization here shows you what actually happens in "transformer" neural networks like GPT. The Q and K vectors are combined in such a way that each position's Q is compared only against the K vectors of positions up to and including the current one, so the model is allowed to "see into the past" but not the future. Q and K are combined into an "attention matrix". After a "normalization" step (a softmax over each row), this attention matrix is combined with V to produce the output of the transformer block.
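Continuing the sketch above, the attention matrix, the causal mask that hides the future, and the softmax normalization look roughly like this (again illustrative, not the visualization's exact code):

```python
# Raw attention scores: how strongly each query matches each key.
scores = Q @ K.T / np.sqrt(d_head)          # shape (seq_len, seq_len)

# Causal mask: position i may only attend to positions j <= i,
# so the model can "see into the past" but not the future.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf

# Softmax normalization: each row becomes a probability distribution.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# Weighted sum of the value vectors is the block's output.
output = attn @ V                            # shape (seq_len, d_head)
```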
The text that accompanies the visualization explains the full context of this, including the "tokenization" at the beginning of the process (and the embedding step that turns tokens into vectors, the "embeddings") and the logits and softmax that are used to pick the tokens that are output at the end of the process.
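That final step, turning logits into a chosen token, can be sketched like this (the output projection, vocabulary size, and greedy selection are assumptions for illustration):

```python
vocab_size = 48                              # illustrative vocabulary size
W_out = rng.normal(size=(d_head, vocab_size))

logits = output[-1] @ W_out                  # scores for the next token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the vocabulary

next_token = int(np.argmax(probs))           # greedy pick; sampling also works
```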
There are visualizations for GPT-2 and GPT-3 as well, which are much larger, but there is no accompanying text to walk you through those visualizations.
LLM Visualization

#solidstatelife #ai #aieducation #llms #gpt