“Interpretability”, “alignment”, and “superalignment” are buzzwords, not feasible safety goals for large language models