It's just prerecorded animations, and it picks out keywords. Tell it to do anything to the sofa, for example, and it just sits on it. My "shake it like Beyonce" command works just as well as plain "shake it", etc.
Although I remember when I first saw it there were a few minutes where I thought it was live or something.