Vision-based approaches have the advantage that the user is not encumbered by complex devices (although the user may still have to wear a glove, it is far less cumbersome than the instrumented gloves used in device-based approaches). However, they have the significant disadvantage that an immense amount of computing power is typically required just to process the images and extract the hand position, before the data can even be analysed. This would not be a major disadvantage for virtual reality applications (the mainstay of the hand-interface device industry), since it is conceivable to have a dedicated virtual reality room (which some people have called ``holodecks'', in honour of Star Trek: The Next Generation). It would, however, pose major difficulties for applications that may require portability, such as sign language recognition. Also, while giving reasonable 2-D resolution, these systems typically do not handle the 3-D aspect of hand positioning well unless two separate cameras are used, which doubles the input data and further complicates the already complex algorithms.
Device-based approaches, in contrast, can suffer from the limitations of the devices used to measure hand movement. Different people have different hand sizes, and calibration of these devices is a problem that has only been addressed recently. The motion detectors are subject to physical noise (for example, any metal near a Polhemus tracker tends to have rather strange effects on its behaviour), so some software is still required for filtering. On the other hand, they consume little computing power (in fact, one of the first gloves, the Z-glove [ZL87], was attached to a Commodore 64, hardly a rival to a Cray). This keeps cost down. It also adds to their portability: some of the gloves (such as the CyberGlove [KL89]) have been road-tested while attached to small palm-top computers.
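To give a sense of how light the calibration and filtering software for a device-based system can be, the following is a minimal sketch only. It assumes a single flex sensor that reports a raw integer reading, a per-user calibration taken from one open-hand reading and one closed-fist reading, and a sliding median to suppress isolated noise spikes. The function names, the sample values, and the choice of a median filter are illustrative assumptions; they are not taken from any of the gloves or trackers mentioned above.

```python
from statistics import median


def calibrate(raw, open_hand, closed_fist):
    """Map a raw sensor reading onto [0, 1] for one user.

    open_hand and closed_fist are hypothetical readings recorded once
    per user, so that different hand sizes give comparable values.
    """
    span = closed_fist - open_hand
    if span == 0:
        raise ValueError("calibration readings must differ")
    scaled = (raw - open_hand) / span
    # Clamp, since noise can push a reading outside the calibrated range.
    return min(1.0, max(0.0, scaled))


def median_filter(samples, window=3):
    """Sliding median over a list of readings.

    The window must be odd; near the ends of the list the window
    shrinks to the samples that are actually available.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        median(samples[max(0, i - half): i + half + 1])
        for i in range(len(samples))
    ]


if __name__ == "__main__":
    noisy = [10, 11, 90, 12, 13]  # made-up readings with one spike
    smooth = median_filter(noisy, window=3)
    print(smooth)
    print([calibrate(s, open_hand=10, closed_fist=20) for s in smooth])
```

Both steps take constant work per sample for a fixed window, which is consistent with the modest computing requirements noted above. A real tracker would probably need a filter matched to its own noise characteristics.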