
Modelling and Retrieving Video via Moving Objects

(Work in progress)

Mohammad Nabil

John Shepherd

Anne Ngu

[Diagram:pic/cselogo]

School of Computer Science and Engineering
University of New South Wales (UNSW), Sydney


The Goal

Provide mechanisms to answer the following kinds of queries:


... The Goal

Important characteristics of these queries:

Specific goal of this work:

Develop a model for describing spatiotemporal relationships among moving objects in video, animation, etc. that is useful as a basis for querying.

Needs to be: comprehensive, concise, efficient

Does not need to be: 100% accurate

We assume: objects have been identified and segmented in each frame


Remainder of Talk

  1. Model for static images   (2D-PIR image graphs)
  2. Model for video scenes   (TS-2D-PIR scene graphs)
  3. Scene graph operators
  4. Scene graph construction
  5. Scene graph similarity
  6. Prototype implementation
  7. Summary of progress, future work


2D-PIR Model for Static Images

Provides a concise structure for representing information about spatial relationships between objects in a static image.

Assumes we have a static image that has been pre-processed:

Approach draws on work by Egenhofer (topological relationships) and Allen (interval relationships).

Basic idea: define relationships between objects in terms of the topological relationship between them and the interval relationships between their axis projections.


Topological Relationships  (Egenhofer)

[Diagram:pic/toprels]


Interval Relationships  (Allen)

[Diagram:pic/intrels]


2D-PIR

2D-PIR  =  2-Dimensional Projection Interval Relationship

A 2D-PIR Rs(A,B) is a triple (t,xp,yp), where t is the topological relationship between A and B, and xp and yp are the interval relationships between their projections onto the x and y axes.

Notes:


Image Graphs

An image is modelled by a labelled digraph G(V,R). We include only one of Rs(A,B) or Rs(B,A), as determined by object ordering:

For all Rs(A,B) in R,   A < B.

All retrieval on spatial relationships uses image graphs rather than images.
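The image-graph structure above can be sketched in Python. This is an illustrative data layout, not the prototype's actual code; the class and field names are assumptions.

```python
# Sketch (not from the talk's prototype): one way to hold an image graph.
# A 2D-PIR is a (topological, x-interval, y-interval) triple; the graph
# keeps one triple per object pair, stored with A < B by convention.

from typing import NamedTuple

class PIR2D(NamedTuple):
    t: str    # topological relationship, e.g. "disjoint", "touches"
    xp: str   # interval relationship between x-projections, e.g. "before"
    yp: str   # interval relationship between y-projections

class ImageGraph:
    def __init__(self):
        self.objects = set()   # V: object labels
        self.rels = {}         # R: (A, B) -> PIR2D, with A < B

    def add_rel(self, a, b, rel):
        if a > b:              # enforce the A < B ordering convention
            raise ValueError("store relationships with A < B only")
        self.objects.update((a, b))
        self.rels[(a, b)] = rel

# Example #2 from the talk:
g = ImageGraph()
g.add_rel("A", "B", PIR2D("disjoint", "before", "~overlaps"))
g.add_rel("A", "C", PIR2D("disjoint", "before", "~meets"))
g.add_rel("B", "C", PIR2D("covers", "~finishes", "~starts"))
```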


2D-PIR Example #1

[Diagram:pic/2dpir2]

Rs(A,B) = (disjoint,~during,~during)

(or   Rs(B,A) = (disjoint,during,during) )

[Diagram:pic/igraph1]


2D-PIR Example #2

[Diagram:pic/2dpir1]

Rs(A,B) = (disjoint,before,~overlaps)

Rs(A,C) = (disjoint,before,~meets)

Rs(B,C) = (covers,~finishes,~starts)

[Diagram:pic/igraph2]


2D-PIR Image Retrieval

Two modes: text-based queries and sketch-based queries.

Example (text-based):

Find images containing a hawk perching on a stump

Query language:

  [ P | P <- ImageDB,  P.Rs(Hawk,Stump) sim (touches,~during,~meets) ]

  A sim B  =  Sim(A,B) > Threshold

(uses list-comprehension notation from FP and Obj.Method notation from OOP)
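The list-comprehension notation maps almost directly onto Python. A rough, hypothetical rendering of the hawk/stump query, with a stub standing in for the real Sim measure defined later in the talk:

```python
# Sketch: the talk's query rendered in Python. `sim_score` is a stub
# for the 2D-PIR distance-based similarity Sim(A, B); the database
# records and the fixed scores are purely illustrative.

THRESHOLD = 70  # similarity threshold on the 0..100 scale

def sim_score(rel_a, rel_b):
    # Placeholder: the real measure uses neighbourhood-graph distances.
    return 100 if rel_a == rel_b else 50

def sim(rel_a, rel_b):
    # A sim B  =  Sim(A, B) > Threshold
    return sim_score(rel_a, rel_b) > THRESHOLD

# "Find images containing a hawk perching on a stump":
image_db = [
    {"name": "p1", "rels": {("Hawk", "Stump"): ("touches", "~during", "~meets")}},
    {"name": "p2", "rels": {("Hawk", "Stump"): ("disjoint", "before", "after")}},
]
query_rel = ("touches", "~during", "~meets")
hits = [p for p in image_db
        if sim(p["rels"].get(("Hawk", "Stump")), query_rel)]
```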


... 2D-PIR Image Retrieval

Example sketch-based query:

[Diagram:pic/sketchq]


More Example Queries

Query:

   Find images containing a green apple on a rough wooden table

expressed as:

  [ P | P <- ImageDB,  P.Rs(Apple,Table) = (touches,?,~meets) ]

Query:

   Find images with Kate Winslett lying on a piece of wreckage in the middle of the North Atlantic

expressed as:

  [ P | P <- ImageDB,  P.Rs(KateWinslett,Wreckage) = (overlaps,?,?)
		       & P.Rs(Wreckage,NorthAtlantic) = (inside,?,?) ]


Image Distance

Image graph distance is based on distances of component 2D-PIRs.

For two graphs G1(V1, R1)  and  G2(V2, R2)

[Diagram:pic/eqDistGG1]

where

[Diagram:pic/eqDistGG2]

2D-PIR distance is based on distances of component relationships.

For two 2D-PIRs Rs,1 = (t1,xp1,yp1)  and  Rs,2 = (t2,xp2,yp2)

[Diagram:pic/eqDistRR]


Spatial Relationship Distances

How to define e.g. D(before,overlaps)?

We base these distance measures on the observation of what happens when we move two objects continuously together.

[Diagram:pic/relDist]

We treat relationships that occur adjacent in the time sequence as being closer.


... Spatial Relationship Distances

Based on this idea, we define relationship neighbourhood graphs.

If one relationship R1(A,B) transforms directly into another R2(A,B) by continuous movement of objects A and B, then R1 and R2 are neighbours.

Relationship distance is defined as the number of arcs in the neighbourhood graph between two relations.

Topological relationship neighbourhood graph:

[Diagram:pic/topneigh]

Interval relationship neighbourhood graph:

[Diagram:pic/intneigh]
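"Number of arcs in the neighbourhood graph" is a shortest-path computation. A sketch using breadth-first search; the adjacency list below is only a small fragment of the interval neighbourhood graph (the full graphs are in the diagrams):

```python
# Sketch: relationship distance as shortest-path length in a
# neighbourhood graph, via BFS. NEIGHBOURS is an illustrative fragment
# of the interval relationship neighbourhood graph, not the full diagram.

from collections import deque

NEIGHBOURS = {
    "before":   ["meets"],
    "meets":    ["before", "overlaps"],
    "overlaps": ["meets", "starts", "finishes", "equals"],
    "starts":   ["overlaps"],
    "finishes": ["overlaps"],
    "equals":   ["overlaps"],
}

def rel_distance(r1, r2):
    """Number of arcs between r1 and r2 in the neighbourhood graph."""
    if r1 == r2:
        return 0
    seen, frontier = {r1}, deque([(r1, 0)])
    while frontier:
        rel, d = frontier.popleft()
        for nb in NEIGHBOURS.get(rel, []):
            if nb == r2:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return None  # not connected in this fragment
```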


Image (Graph) Similarity

Based on our definition of relationship distance,  max(D(Rs,1, Rs,2)) = 20.8

We define image similarity as normalised, inverse distance:

[Diagram:pic/eqImgSim]

S=0 means totally different;   S=100 means identical
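The normalisation formula itself is in the diagram; one plausible reading consistent with the stated endpoints (S=0 totally different, S=100 identical, maximum distance 20.8) is a linear rescaling:

```python
# Sketch: one plausible normalisation consistent with the slide's
# endpoints. The exact formula is in the (unavailable) diagram, so
# treat this as an assumption, not the talk's definition.

MAX_DIST = 20.8  # stated maximum 2D-PIR distance

def similarity(dist):
    """S = 0 for totally different, S = 100 for identical."""
    return 100.0 * (1.0 - dist / MAX_DIST)
```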


Static Image Retrieval Prototype

[Diagram:pic/imgarch]


Image Retrieval Costs

Quantities involved:
n number of objects in image
N distinct objects in DB
K number of images in DB
Kq number of images with Common>0 for query q

Size of image graph: n²/2 2D-PIRs + n ObjectDescriptors

Cost in answering queries:


A Model for Video Retrieval

A video work is a sequence of scenes.

A scene (aka shot) is:

All access to the database occurs in "units" of scenes.

A video query:

The user is presented with a thumbnail of each matching scene, and then performs further manual filtering (a la QBIC).


TS-2D-PIR Model for Video Scenes

Aim is analogous to that for 2D-PIR ...

Develop a concise structure for representing spatiotemporal information in video scenes to allow efficient retrieval.

Assumes we have a pre-processed video:

Not (yet) feasible for general video sources, but possible for some limited domains (e.g. video surveillance, satellite tracking, computer-generated animation).


Time

We model time via instants and intervals.

An interval is defined by start and end instants: I = (ts, tf)

Operations on intervals:

start(I) = ts      finish(I) = tf      duration(I) = tf - ts      I1 + I2 = (start(I1), finish(I2))

All scene-time is measured relative to start of scene.

If duration of scene is T, then scene interval  IS = (0,T)

Database may store absolute start time elsewhere.
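The interval operations above can be written down directly; a minimal sketch (the class name is an assumption):

```python
# Sketch: the slide's interval operations as a small Python type.

from typing import NamedTuple

class Interval(NamedTuple):
    ts: float  # start instant
    tf: float  # finish instant

    def duration(self):
        return self.tf - self.ts

    def __add__(self, other):
        # I1 + I2 = (start(I1), finish(I2))
        return Interval(self.ts, other.tf)

scene = Interval(0, 90)   # a scene of duration T = 90, so IS = (0, T)
```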


Movement of Single Objects

Object movement is approximated by a sequence of linear movements.

Each movement represented as an iso-directional-interval   (d,r,I)

A track is: [ (d1,r1,I1),   (d2,r2,I2),   ...   (dn,rn,In) ]

where finish(I1) = start(I2),   finish(I2) = start(I3),   ...   (i.e. intervals are contiguous)
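The contiguity condition is easy to check mechanically. A sketch, with tracks held as (distance, direction, interval) tuples as in the examples that follow:

```python
# Sketch: checking the contiguity condition finish(I_k) = start(I_{k+1})
# on a track of (distance, direction, (ts, tf)) movements.

def is_contiguous(track):
    return all(track[k][2][1] == track[k + 1][2][0]
               for k in range(len(track) - 1))

track = [(1.4, "NW", (0, 1)), (1.0, "N", (1, 2)), (2.0, "E", (2, 3))]
```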


... Movement of Single Objects

Notes:


Track Example

[Diagram:pic/move1]

Track:    [ (d1, NW, I1),   (d2, N, I2),   (d3, E, I3),   (d4, NE, I4) ]

where    I1 = (t0,t1),   I2 = (t1,t2),   I3 = (t2,t3),   I4 = (t3,t4)


TS-2D-PIR

Tracks model individual objects; we also need to model changing spatial relationships between objects (i.e. spatiotemporal relationships)

Since spatial relationships typically persist over intervals, we again use an interval-based data structure.

TS-2D-PIR  =  Temporal Sequence of 2D-PIR

A TS-2D-PIR Rst(A,B) is a sequence of (t,xp,yp,I), where each (t,xp,yp) is a 2D-PIR and I is the interval over which it holds.

A TS-2D-PIR is also called an st-relationship.


TS-2D-PIR Example

[Diagram:pic/stexample]

Rst(B,G) =
	[ (disjoint, ~starts, after, (t0,t1)),
	  (disjoint, ~finishes, after, (t1,t2)),
	  (disjoint, before, after, (t2,t3)) ]

Rst(B,R) =
	[ (disjoint, ~meets, after, (t0,t1)),
	  (disjoint, ~overlaps, after, (t1,t2)),
	  (touches, equals, ~meets, (t2,t3)) ]

Rst(G,R) =
	[ (touches, ~meets, meets, (t0,t1)),
	  (disjoint, ~meets, before, (t1,t2)),
	  (disjoint, after, before, (t2,t3)) ]


Scene Graphs

Our model for video scenes is an extension of image graphs to incorporate temporal notions.

A scene is modelled by a labelled digraph S(M,R), where M is a set of object descriptors (object, track) and R is a set of edges labelled with st-relationships (TS-2D-PIRs).


Scene Graph Example

[Diagram:pic/stexample]

M = {
    (B, [(1.4,SW,(t0,t1)), (1,S,(t1,t2)), (0,?,(t2,t3))]),
    (G, [(1.4,SE,(t0,t3))]),
    (R, [(1,E,(t0,t3))])
}
R = {
    ((B,G),
	[ (disjoint, ~starts, after, (t0,t1)),
	  (disjoint, ~finishes, after, (t1,t2)),
	  (disjoint, before, after, (t2,t3)) ]
    ),
    ((B,R),
	[ (disjoint, ~meets, after, (t0,t1)),
	  (disjoint, ~overlaps, after, (t1,t2)),
	  (touches, equals, ~meets, (t2,t3)) ]
    ),
    ((G,R),
	[ (touches, ~meets, meets, (t0,t1)),
	  (disjoint, ~meets, before, (t1,t2)),
	  (disjoint, after, before, (t2,t3)) ]
    )
}


Scene Graph Construction

Algorithm: Scene graph construction
Input:     A pre-processed video scene (objects are identified,
                  object positions, pointsets and projections
		  are available for each frame)
Output:    A scene graph S(M,R) representing the scene 

begin
 (1)    M = { };   R = { };
 (2)    for each frame F do
 (3)       t = time of frame F relative to start of scene
 (4)       for each object X in F do
 (5)          if X is not already in M then
 (6)             include new object descriptor (X,[]) in M
 (7)          else
 (8)             compute (d,r,I) from current and previous positions
 (9)             append (d,r,I) to track for X
 (10)         endif
 (11)         save current position as previous
 (12)         for each object Y greater than X in F do
 (13)            compute current Rst(X,Y)
 (14)            if there is no edge for (X,Y) in R then
 (15)               include new edge (X,Y,[]) in R
 (16)               save current Rst(X,Y) as existing Rst(X,Y)
 (17)               set time for startOfRst(X,Y) to t
 (18)            else
 (19)               if F is the last frame or
                          current Rst(X,Y) is different to existing Rst(X,Y) then
 (20)                  compute (t,xp,yp,I) based on existing Rst(X,Y)
                          and interval from startOfRst(X,Y) until t
 (21)                  append (t,xp,yp,I) to TS-2D-PIR for edge (X,Y)
 (22)                  save current Rst(X,Y) as existing Rst(X,Y)
 (23)                  set time for startOfRst(X,Y) to t
 (24)               endif
 (25)            endif
 (26)         endfor
 (27)      endfor
 (28)   endfor
end.
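A compact Python rendering of the algorithm above. The input format and the two helper functions are assumptions: frames arrive as (time, {object: state}) pairs, `movement` computes a (d,r) step from two states, and `current_pir` computes the current 2D-PIR for a pair of states. Edges are stored as (2D-PIR, interval) pairs rather than flattened 4-tuples.

```python
# Sketch: the scene graph construction pseudocode in Python.
# `movement` and `current_pir` are stand-ins for the real geometric
# computations on object pointsets/projections.

def build_scene_graph(frames, movement, current_pir):
    M, R = {}, {}          # object descriptors and edges
    prev = {}              # previous state per object
    existing = {}          # (X, Y) -> (current 2D-PIR, its start time)
    last_t = frames[-1][0]
    for t, objs in frames:
        for X, state in objs.items():
            if X not in M:
                M[X] = []                              # steps (5)-(6)
            else:
                M[X].append(movement(prev[X], state))  # steps (8)-(9)
            prev[X] = state                            # step (11)
            for Y in objs:                             # step (12): Y > X only
                if Y <= X:
                    continue
                pir = current_pir(state, objs[Y])      # step (13)
                if (X, Y) not in R:                    # steps (14)-(17)
                    R[(X, Y)] = []
                    existing[(X, Y)] = (pir, t)
                elif t == last_t or pir != existing[(X, Y)][0]:
                    old, start = existing[(X, Y)]      # steps (20)-(23)
                    R[(X, Y)].append((old, (start, t)))
                    existing[(X, Y)] = (pir, t)
    return M, R

# Tiny demo with hypothetical 1-D positions and stub helpers:
frames = [(0, {"A": 0, "B": 5}), (1, {"A": 1, "B": 5}), (2, {"A": 4, "B": 5})]
move = lambda p, q: (abs(q - p), "E")                  # (distance, direction)
pir = lambda a, b: "disjoint" if b - a > 1 else "touches"
M, R = build_scene_graph(frames, move, pir)
```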


Scene Retrieval

As for image retrieval, there are two modes: text-based and sketch-based.

Wider range of operators than for static image retrieval.

Use OO notation for operators: Obj.Operator(Args)

For relevant operations, if Obj is a set (list) of objects, then Operator is applied to each element in the set.


Trajectory Extraction

If S is a scene graph containing an object descriptor:

(A,   [ (d1,r1,I1), (d2,r2,I2), ... (dn,rn,In) ])

then S.trajectory(A) is the sequence:

[ (d1,r1), (d2,r2), ... (dn,rn) ]


2D-PIR Extraction

If S is a scene graph containing:

Rst(A,B) = [ (t1,xp1,yp1,I1), (t2,xp2,yp2,I2),... (tn,xpn,ypn,In) ]

then S.st-seq(A,B) is the sequence:

[ (t1,xp1,yp1), (t2,xp2,yp2),... (tn,xpn,ypn) ]
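Both extraction operators simply drop the interval component of each element; a one-line sketch of each:

```python
# Sketch: trajectory and st-seq extraction drop the interval component.

def trajectory(track):
    """[(d, r, I), ...] -> [(d, r), ...]"""
    return [(d, r) for d, r, _ in track]

def st_seq(rst):
    """[(t, xp, yp, I), ...] -> [(t, xp, yp), ...]"""
    return [(t, xp, yp) for t, xp, yp, _ in rst]
```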


Temporal Extraction

If S is a scene graph and

ST = [ (t1,xp1,yp1), (t2,xp2,yp2),... (tn,xpn,ypn) ]

is a 2D-PIR sequence, then S.interval(A,B,ST) is an interval

I = I1+I2+...+In

such that there is a subsequence

[ (t1,xp1,yp1,I1), (t2,xp2,yp2,I2),... (tn,xpn,ypn,In) ]

in Rst(A,B).

If no such sequence exists, the result is the special value Never.
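A sketch of the interval operator, assuming the matching subsequence must be contiguous within Rst(A,B) (intervals of adjacent elements are contiguous, so the sum collapses to (start(I1), finish(In))):

```python
# Sketch: temporal extraction. Scans Rst(A,B) for a contiguous run whose
# (t, xp, yp) parts equal ST, and returns the summed interval; None
# stands in for the special value Never.

NEVER = None

def interval(rst, st):
    """rst: [(t, xp, yp, I), ...];  st: [(t, xp, yp), ...]."""
    n = len(st)
    for i in range(len(rst) - n + 1):
        window = rst[i:i + n]
        if [e[:3] for e in window] == list(st):
            # I1 + I2 + ... + In = (start(I1), finish(In))
            return (window[0][3][0], window[-1][3][1])
    return NEVER
```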


Selection and Predicates

Selection uses list-comprehension notation:

  [ S | S <- DB, Predicate1, Predicate2, ... ]

Assume that standard relational tests are available.

The notation Objs(S) denotes the set of objects in a scene.

The predicate has is defined as:   S.has(X) = X ∈ Objs(S)


Spatial Matching

Let ST be a 2D-PIR sequence of the form

[ (t1,xp1,yp1), (t2,xp2,yp2),... (tn,xpn,ypn) ]

Let P be a 2D-PIR pattern:

Let L* be the set of all subsequences of ST.

The spatial matching predicate is defined as:

[Diagram:pic/spmatch]
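The predicate in the diagram can be sketched as follows, assuming the pattern's "?" components match anything and the pattern must match a contiguous run of 2D-PIRs within ST:

```python
# Sketch: spatial matching with "?" wildcards. Assumes a pattern matches
# if it equals some contiguous subsequence of ST componentwise.

def pir_matches(pat, pir):
    """One pattern triple against one (t, xp, yp); "?" matches anything."""
    return all(p == "?" or p == v for p, v in zip(pat, pir))

def matches(st, pattern):
    n = len(pattern)
    return any(all(pir_matches(p, pir)
                   for p, pir in zip(pattern, st[i:i + n]))
               for i in range(len(st) - n + 1))
```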


Scene Construction Operators

Spatiotemporal projection: S.project(X1,X2,...,I)

Yields a new version of S containing only the specified objects and only information for the specified time interval.

Special cases:

Overlay: S1.overlay(S2, ts)

Combines scene graphs, with all S2 motion starting at time ts after the start of S1.


Example: Animal Tracking

Assume we have a fixed camera watching a region of country and animals tagged with GPS/radio-transmitters.

Monitor the region for several years, treating each day as a scene.

Animal movement is easy to track automatically.

Each animal is identified uniquely by its transmitter.

Assume image is oriented so that top=N, left=W, bottom=S, right=E.

This kind of data might be useful to biologists to monitor migration, breeding, ...


Example Animal Query #1

Query:

  Show the migration path of a specific animal named Elephant.

Strategy:

Expressed as:

  [ S | S <- AnimalDB, S.has(Elephant) ] . trajectory(Elephant)


Example Animal Query #2

Query:

  How long is animal A inside region R?

Strategy:

Expressed as:

  AinsideR = [ S | S <- AnimalDB, S.interval(A,R,[(~contains,?,?)]) != Never ]

  AinsideR.interval(A,R,[(~contains,?,?)]).duration
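A hypothetical Python rendering of this query, with the per-scene results of the interval operator pre-computed as stand-in data (None for Never), illustrating how the operator is applied to each element of the selected set:

```python
# Sketch: query #2 in Python. `db` holds hypothetical per-scene results
# of S.interval(A, R, [("~contains","?","?")]); None stands in for Never.

def duration(iv):
    return iv[1] - iv[0]

db = [("day1", (10, 40)), ("day2", None), ("day3", (5, 15))]

# AinsideR: scenes where the interval operator does not yield Never
a_inside_r = [(name, iv) for name, iv in db if iv is not None]

# duration is applied to each element of the resulting set
durations = [duration(iv) for _, iv in a_inside_r]
```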


Example Animal Query #3

Query:

  Find scenes where animal A enters region R from the west side and leaves from the north side.

Strategy:

Expressed as:

  EnterWest = [ (disjoint,before,?), (touches,meets,during),
                (overlaps,overlaps,during), (~contains,during,during) ]

  ExitNorth = [ (~contains,during,during), (overlaps,during,~overlaps),
                (touches,during,~meets), (disjoint,during,after) ]

  [ S | S <- AnimalDB, S.st-seq(A,R).matches(EnterWest + ExitNorth) ]

Would be simpler to do as sketch query.

[Diagram:pic/movsketchq]


Example Animal Query #4

Query:

  Find scenes where some animal passes through region R.

Strategy:

Expressed as:

  PassThrough = [ (disjoint,?,?) .. (~contains,?,?) .. (disjoint,?,?) ]

  [ S | S <- AnimalDB,
        exists [ A | A <- Objs(S), S.st-seq(A,R).matches(PassThrough) ]
  ]


Scene Similarity

Scene similarity is based on a combination of distances between scene components (tracks, TS-2D-PIRs).

For two scene graphs S1(M1,R1) and S2(M2,R2),

[Diagram:pic/eqSceneDist0]

[Diagram:pic/eqSceneDist1]

where

[Diagram:pic/eqSceneDist2]

wt and ws reflect weighting to be given to trajectories and 2D-PIR sequences respectively.


Track Similarity

Requires measuring distance between two trajectories for possible "alignments" and choosing minimum.

Trajectory distance was tested by:

Found: reasonable precision/recall performance for threshold 70%


Video Scene Retrieval Prototype

[Diagram:pic/movarch]


Conclusions

Have developed:

Still to do:


References/Further Reading

  1. J.F.Allen,
    Maintaining Knowledge about Temporal Intervals,
    Communications of the ACM, Nov 1983
    [original description of interval relationships; in this case the context was temporal, rather than spatial, intervals]
  2. S.K.Chang, Q.Y.Shi, C.W.Yan,
    Iconic Indexing by 2D Strings,
    IEEE Transactions on Pattern Recognition and Machine Intelligence, 1987
    [description of method to represent static images by using relationships between object projections on axes]
  3. M.J.Egenhofer
    Point-set Topological Spatial Relations,
    International Journal on Geographic Information Systems, 1991
    [original description of point-set topological relations between image regions, which forms the basis for the topological component of 2D-PIRs]
  4. M.Nabil, A.Ngu, J.Shepherd
    Picture Similarity Retrieval using the 2D Projection Interval Representation,
    IEEE Transactions on Knowledge and Data Engineering, 1996
    [detailed description of the 2D-PIR static image model from this talk]
  5. M.Nabil, A.Ngu, J.Shepherd
    Modelling and Retrieving Video via Moving Objects,
    Journal of Multimedia Technology and Applications (to appear)
    [detailed description of 2D-PIR-based moving object model from this talk]


Produced: 22 May 98