Gradient Descent on Token Input Embeddings

(lesswrong.com)

2 points | by kp1197 5 hours ago


  • kp1197 5 hours ago
    Does performing gradient descent on token input embeddings lead to interpretable results? And if not, why?
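A minimal sketch of the setup the question describes. It assumes a toy frozen linear classifier in place of a real language model. Every name in it is hypothetical: `E` (vocabulary embeddings), `W` (model weights), `optimize_embedding`, and `nearest_token`. The model weights stay fixed, and gradient descent updates only a copy of one token's input embedding. One common way to check whether the result is interpretable is to map the optimized vector back to its nearest vocabulary embedding, which the last step does.

```python
import math
import random

# Toy setup (hypothetical, for illustration only):
# - a vocabulary of VOCAB token embeddings, each a DIM-dimensional vector
# - a frozen linear "model" whose logits over N_CLASSES outputs are W @ x
VOCAB, DIM, N_CLASSES = 50, 4, 3
rng = random.Random(0)
E = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
W = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_CLASSES)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(x, target):
    """Cross-entropy for the target class and its gradient w.r.t. the input x."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[target])
    # d(loss)/d(logits) = p - onehot(target); chain rule through logits = W x
    delta = [p_j - (1.0 if j == target else 0.0) for j, p_j in enumerate(p)]
    grad = [sum(delta[j] * W[j][i] for j in range(N_CLASSES)) for i in range(DIM)]
    return loss, grad


def optimize_embedding(start_token, target, lr=0.05, steps=200):
    """Gradient descent on a copy of one token's input embedding; W is never updated."""
    x = list(E[start_token])
    initial_loss, _ = loss_and_grad(x, target)
    for _ in range(steps):
        _, grad = loss_and_grad(x, target)
        x = [x_i - lr * g_i for x_i, g_i in zip(x, grad)]
    final_loss, _ = loss_and_grad(x, target)
    return x, initial_loss, final_loss


def nearest_token(x):
    """Index of the vocabulary embedding closest to x (Euclidean), and that distance."""
    dists = [math.dist(x, e) for e in E]
    best = min(range(VOCAB), key=dists.__getitem__)
    return best, dists[best]


x_opt, before, after = optimize_embedding(start_token=0, target=2)
tok, dist = nearest_token(x_opt)
print(f"loss {before:.3f} -> {after:.3f}; nearest token {tok} at distance {dist:.3f}")
```

The optimized vector `x_opt` is free to move anywhere in the continuous embedding space. Mapping it to the nearest vocabulary entry is therefore a separate, lossy step. The reported distance shows how far the optimized input lies from any real token's embedding.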