Animating an image's vertices inherently ties the animation to that specific image. If the replacement image has the same mesh, the animation can still work as long as we store the vertex changes as relative offsets rather than absolute positions. If the mesh is different we can still apply the animation, but the new image is unlikely to deform how you want. Maybe this is just something the user needs to watch out for. I suppose this can lead to one animation per image, if your images are not similar and you need to deform them.
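To make the "relative" idea concrete, here is a rough sketch, not actual runtime code. It assumes "same mesh" means the same vertex count and order, and the names (`apply_relative_deform`, `mesh_a`, `mesh_b`) are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def apply_relative_deform(base: List[Point], offsets: List[Point]) -> List[Point]:
    """Add per-vertex offsets to a mesh's undeformed vertices.

    Hypothetical helper: assumes both lists use the same vertex order.
    """
    if len(base) != len(offsets):
        # A different vertex count means the key was made for another mesh.
        raise ValueError("deform key does not match this mesh's vertex count")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(base, offsets)]


# Offsets keyed while animating image A (a single triangle here).
offsets = [(0.0, 1.0), (0.5, 0.0), (0.0, 0.0)]

mesh_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Replacement image: same vertex count and order, different positions.
mesh_b = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0)]

deformed_a = apply_relative_deform(mesh_a, offsets)
deformed_b = apply_relative_deform(mesh_b, offsets)
```

Because only offsets are stored, the same key moves B's vertices by the same amounts even though B's vertices sit somewhere else. If absolute positions were stored instead, B would be snapped to A's shape.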
I'm open to ideas on how to make it more flexible.
Maybe an animation can have a timeline for each image instead of each slot. E.g., you have a single animation and images A and B, each in a different skin. You would animate with A visible and deform it. When you switch skins so B is visible, B does not deform at all. You could then add keys to the same animation to deform B. At runtime, if A is visible the animation uses A's deform keys and ignores B's, and vice versa.
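Roughly what I have in mind, as a sketch only. The class and method names (`DeformAnimation`, `add_key`, `sample`) are hypothetical, and linear interpolation between keys is an assumption to keep the example short.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Key = Tuple[float, List[float]]  # (time, flat list of vertex offsets)


class DeformAnimation:
    """One animation holding a separate deform timeline per image."""

    def __init__(self) -> None:
        self.timelines: Dict[str, List[Key]] = {}

    def add_key(self, image: str, time: float, offsets: List[float]) -> None:
        keys = self.timelines.setdefault(image, [])
        keys.append((time, offsets))
        keys.sort(key=lambda k: k[0])  # keep keys ordered by time

    def sample(self, visible_image: str, time: float) -> Optional[List[float]]:
        """Return offsets for the visible image only; other images' keys are ignored."""
        keys = self.timelines.get(visible_image)
        if not keys:
            return None  # no deform keys for this image, so it does not deform
        times = [k[0] for k in keys]
        i = bisect_right(times, time)
        if i == 0:
            return list(keys[0][1])   # before the first key
        if i == len(keys):
            return list(keys[-1][1])  # after the last key
        (t0, a), (t1, b) = keys[i - 1], keys[i]
        alpha = (time - t0) / (t1 - t0)
        return [x + (y - x) * alpha for x, y in zip(a, b)]


anim = DeformAnimation()
anim.add_key("A", 0.0, [0.0, 0.0])
anim.add_key("A", 1.0, [2.0, 4.0])
anim.add_key("B", 0.0, [1.0, 1.0, 1.0])  # B can have a different vertex count
```

Since each image has its own list of keys, A and B never need matching vertex counts, and an image with no keys simply stays undeformed.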
If you change an image during an animation, you should be able to manipulate the new image's vertices just fine. E.g., image A has 6 vertices and image B has 9. You key vertex changes for A, then switch to B, then key vertices for B. This could work whether we store deform keys in a timeline per slot or per image.
Yes, you can use a "region" attachment, which is what we have now, or a "mesh" attachment, which is new. A mesh can have 3 or more vertices, so you only pay the CPU cost for the vertices you need.
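As a loose illustration of the difference (hypothetical classes, not the real runtime types, and assuming a region is drawn as a plain 4-vertex quad):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegionAttachment:
    """A plain rectangular image; assumed here to always be a 4-vertex quad."""
    width: float
    height: float

    def vertex_count(self) -> int:
        return 4


@dataclass
class MeshAttachment:
    """An image with user-defined vertices and triangles."""
    vertices: List[Tuple[float, float]]
    triangles: List[int]  # indices into vertices, three per triangle

    def __post_init__(self) -> None:
        if len(self.vertices) < 3:
            raise ValueError("a mesh needs at least 3 vertices")
        if len(self.triangles) % 3 != 0:
            raise ValueError("triangle indices must come in groups of 3")
        if any(i < 0 or i >= len(self.vertices) for i in self.triangles):
            raise ValueError("triangle index out of range")

    def vertex_count(self) -> int:
        return len(self.vertices)


region = RegionAttachment(width=64.0, height=64.0)
triangle = MeshAttachment(vertices=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
                          triangles=[0, 1, 2])
```

The per-frame work scales with `vertex_count()`, so a mesh with only a few vertices costs about as much as a region, and you add vertices only where you need the deformation.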