Justbee / Weesbij: Sam Bowman, Nicholas Schiefer

Friday, 2 February 2024

Sam Bowman, Nicholas Schiefer - "Intelligence Testing"

https://asteriskmag.com/issues/04/intelligence-testing

Amazing but scary article about intelligence testing of AI models.

Red teaming, or deliberately prompting the model to try to get it to do harmful things, is straightforwardly useful and in some sense does measure alignment. But if the model is capable enough that it might be able to make good guesses about when it's being evaluated, then you really can't be sure of anything. It's impossible to rule out the possibility that the model is just doing its best to look benign when it is being evaluated, and that contaminates any result you might have.

Justbee / Weesbij

Friday, 2 February 2024

Sam Bowman, Nicholas Schiefer - "Intelligence Testing"

Pages

Search This Blog

Blog Archive

Labels

Army of Slaves