Using Cloud Recognition in combination with Video Drawables

Video drawables in general

Video drawables start playback as soon as their play() function is called, even if their ARObject has not yet received an onImageRecognized event. This is expected behaviour. If the video should only play while its target is visible to the user, use the onImageRecognized and onImageLost triggers of your AR.ImageTrackable or GeoObject to call the play(), resume() and pause() functions. You can find an example implementation in our Video - Playback States example.
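The trigger-driven playback described above can be sketched as follows. This is a minimal sketch assuming the Architect API (AR.VideoDrawable, AR.ImageTrackable); the tiny AR stub at the top only exists so the sketch runs standalone, and "myTracker" and the target name "targetA" are placeholders for your own setup. In a real Architect World the SDK provides the AR namespace and fires the triggers itself.

```javascript
// Stub of the AR namespace so this sketch runs outside the Wikitude SDK.
var AR = {
    VideoDrawable: function (uri, height, options) {
        this.uri = uri;
        this.playing = false;
        this.play = function () { this.playing = true; };
        this.pause = function () { this.playing = false; };
        this.resume = function () { this.playing = true; };
    },
    ImageTrackable: function (tracker, targetName, options) {
        this.onImageRecognized = options.onImageRecognized;
        this.onImageLost = options.onImageLost;
    }
};

var video = new AR.VideoDrawable("assets/video.mp4", 0.5);
var started = false;

var trackable = new AR.ImageTrackable("myTracker", "targetA", {
    drawables: { cam: [video] },  // used by the real SDK, ignored by the stub
    onImageRecognized: function () {
        if (!started) {
            started = true;
            video.play(-1);       // first recognition: start playback (loop)
        } else {
            video.resume();       // target seen again: continue where paused
        }
    },
    onImageLost: function () {
        video.pause();            // target lost: stop video and audio
    }
});

// Simulate the vision events the SDK would fire:
trackable.onImageRecognized();    // video starts playing
trackable.onImageLost();          // video pauses
```

With this pattern the video (and its audio) only runs while the vision trigger reports the target as visible.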

Another important point is the enabled property. The current implementation only disables the video rendering, while the audio track keeps playing in the background. So instead of only setting the enabled property to false, also call the pause() function of the AR.VideoDrawable.
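A small helper makes this hard to forget. The sketch below assumes the behaviour described above (enabled = false stops rendering but not audio); the stub mimics just enough of AR.VideoDrawable to run standalone.

```javascript
// Stub mimicking the relevant parts of AR.VideoDrawable.
function VideoDrawableStub() {
    this.enabled = true;
    this.audioPlaying = true;
    this.pause = function () { this.audioPlaying = false; };
}

// Hide the video AND silence it: disabling only stops the frame rendering,
// so pause() is needed to stop the audio track as well.
function hideVideo(videoDrawable) {
    videoDrawable.enabled = false;
    videoDrawable.pause();
}

var video = new VideoDrawableStub();
hideVideo(video);
// video.enabled === false && video.audioPlaying === false
```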

Video drawables in combination with cloud recognition

Cloud recognition adds a small layer of complexity to the paragraph above. It is important to know that cloud recognition uses content similar to what can be found in a .wtc file, and each successful server response replaces the existing internal .wtc content with a new one. If an AR.ImageTrackable was created with content A and new content B arrives at the client, that AR.ImageTrackable can no longer receive onImageRecognized and onImageLost events. This is why all existing AR.ImageTrackables should be deleted whenever a new server response is received. The Cloud Recognition - Basic Recognition OnClick example demonstrates this in line 90; it also deletes all augmentations (lines 82 and 84) every time a new server response is received. This ensures that no unusable augmentations or trackables are left over.
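The cleanup step can be sketched like this. It is a sketch only: the onServerResponse function stands in for the success callback of a cloud recognition request, and the Destroyable stub stands in for real trackables and drawables (which expose destroy() in the Architect API); neither name comes from the SDK.

```javascript
// Stub for any Architect object that can be destroyed.
function Destroyable() {
    this.destroyed = false;
    this.destroy = function () { this.destroyed = true; };
}

var currentTrackable = null;
var currentAugmentations = [];

// Would be called from the cloud recognition success callback.
function onServerResponse(response) {
    // First delete everything that belongs to the previous response,
    // because the new response invalidates the previous tracking data.
    if (currentTrackable !== null) {
        currentTrackable.destroy();
        currentTrackable = null;
    }
    currentAugmentations.forEach(function (aug) { aug.destroy(); });
    currentAugmentations = [];

    // ...then create the trackable/augmentations for this response.
    currentTrackable = new Destroyable();
    currentAugmentations.push(new Destroyable());
}

onServerResponse({});                     // first response: nothing to clean up
var firstTrackable = currentTrackable;
onServerResponse({});                     // second response: first set is destroyed
// firstTrackable.destroyed === true
```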

If augmentations and trackables are not deleted, they stay alive in the SDK, but since the trackable can no longer fire its events, its augmentations are never visible again, even if the reference image they belong to appears in the current camera frame. As a consequence, the video drawable can no longer be started or stopped based on the state of the vision trigger.

If the video drawable already received a play() call and internally started loading its content before the tracking data is replaced by the next server response, it will start playing back its audio track as soon as enough video data has been loaded. The pause() function could still be called, but no longer from the vision trigger (remember, the trackable has already been disabled by the new tracking data). If the video drawable is deleted as soon as the new tracking data arrives, this behaviour does not occur, because the deleted object no longer performs any loading or playback operation.
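The race described above can be made concrete with a small simulation. This is a sketch under the assumptions stated in the text, not SDK code: VideoStub and onLoadedEnough are invented stand-ins for a video drawable whose content finishes loading only after new tracking data has already arrived.

```javascript
// Stand-in for a video drawable that loads asynchronously after play().
function VideoStub() {
    this.destroyed = false;
    this.audioPlaying = false;
    this.loadRequested = false;
    this.play = function () { this.loadRequested = true; };   // loading starts
    this.onLoadedEnough = function () {                       // fires later
        if (!this.destroyed && this.loadRequested) {
            this.audioPlaying = true;                         // audio would start
        }
    };
    this.destroy = function () {
        this.destroyed = true;
        this.audioPlaying = false;
    };
}

var video = new VideoStub();
video.play();           // server response N: video starts loading
video.destroy();        // server response N+1 arrives: delete everything
video.onLoadedEnough(); // data finishes loading late: no audio, object is gone
// video.audioPlaying === false
```

Without the destroy() call, the late onLoadedEnough would have started the audio with no way to pause it from the (now dead) vision trigger.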

To summarise this paragraph: it is very important that all augmentations and trackables are deleted whenever a successful server response is received.

Attached is a sequence diagram that shows a situation where a video drawable would continue playing its audio track although its video frames are no longer rendered.

What's happening:

The sequence diagram shows the usage of an Architect World that sends a camera frame to the server every time the user taps on a button.

The first two requests recognize target A; target B is recognized when the third request is sent. Whenever target A is recognized, a video drawable and a trackable are created. Target B has no augmentation defined (for simplicity). Also noteworthy is that each response from the server invalidates the currently loaded tracking data (as described before).

The first client/server interaction simply starts a video drawable without any problems. Once the server response is processed, a video drawable is created and starts loading its content. As soon as enough video data has been received, the video drawable starts rendering the video and playing back the audio track.

The second client/server interaction is still fine. It simply replaces the previous video drawable with a new one and calls its play() function (and again starts loading the same video data, because deleting the first drawable completely removed all of its already loaded video data).

The critical client/server interaction is the third one. The user initiated the server communication before the second response arrived at the client, so while the second video drawable was still loading its content, the third server response arrived. This caused the trackable created with the second response to become unreachable: its attached video drawable will never be visible, even though the reference image might be in the current camera frame, because its reference image data has already been overwritten by the third response. Since the second video drawable had already started its loading procedure, it starts audio playback as soon as enough video data is received at the client. The result is that the user can hear the audio track of the second video drawable but cannot see its rendered frames, although the reference image for target A might be in the current camera frame.

To fix this problem, simply delete all augmentations and trackables as soon as a new server response is received.