https://asteriskmag.com/issues/04/intelligence-testing
Amazing but scary article about intelligence testing of AI models.
Red teaming, or deliberately prompting the model to try to get it to do harmful things, is straightforwardly useful and in some sense does measure alignment. But if the model is capable enough that it might be able to make good guesses about when it's being evaluated, then you really can't be sure of anything. It's impossible to rule out the possibility that the model is just doing its best to look benign when it is being evaluated, and that contaminates any result you might have.