Modelling and Retrieving Video via Moving Objects
(Work in progress)
Mohammad Nabil
John Shepherd
Anne Ngu
School of Computer Science and Engineering
University of New South Wales (UNSW), Sydney
The Goal
Provide mechanisms to answer the following kinds of queries:
- Find an image containing a green apple on a rough wooden table
- Find a movie where Kate Winslet is lying on a piece of wreckage
  in the middle of the North Atlantic
- Find any scenes where Arnold Schwarzenegger crashes through a door
- Find the place in the 1990 World Cup Final where Maradona dribbles
  the ball into the penalty area and scores a goal
- Find the day when the elephant enters area A from the west and
  leaves to the north
... The Goal
Important characteristics of these queries:
- objects (e.g. Kate, Arnie, soccer ball, penalty area, ...)
- relationships between objects (e.g. Arnie and door)
- objects move (so relationships change over time)
Specific goal of this work:
Develop a model for describing spatiotemporal relationships among
moving objects in video, animation, etc. that is useful as a basis
for querying.
Needs to be: comprehensive, concise, efficient
Does not need to be: 100% accurate
We assume: objects have been identified and segmented in each frame
Remainder of Talk
- Model for static images (2D-PIR image graphs)
- Model for video scenes (TS-2D-PIR scene graphs)
- Scene graph operators
- Scene graph construction
- Scene graph similarity
- Prototype implementation
- Summary of progress, future work
2D-PIR Model for Static Images
Provides a concise structure for representing information about
spatial relationships between objects in a static image.
Assumes we have a static image that has been pre-processed:
- objects have been identified
(we use a unique identifier, but any set of
"distinguishing characteristics" would be OK)
- objects have been segmented
(set of pixels that comprise the object is available ...
gives object pointset/region)
Approach draws on work by Egenhofer (topology) and Allen (intervals).
Basic idea: define relationships between objects in terms of
- topological relationship (defined by pointsets (boundary,interior))
- interval relationships of object projections on x and y axes
Topological Relationships (Egenhofer)
Interval Relationships (Allen)
2D-PIR
2D-PIR = 2-Dimensional Projection Interval Relationship
A 2D-PIR Rs(A,B) is a triple (t,xp,yp), where
- t is a topological relationship
- xp is the interval relationship between x-axis projections of A and B
- yp is the interval relationship between y-axis projections of A and B
Notes:
- approach extends naturally to nD-PIR
- Rs(A,B) ≠ Rs(B,A)
(but there is a well-defined inverse)
Image Graphs
An image is modelled by a labelled digraph G(V,R) where
- V is a set of nodes labelled by object descriptors
(in current system, a descriptor is simply a unique identifier)
- R is a set of edges labelled by 2D-PIR relationships
(relationship between the objects in end-point nodes)
We include only one of Rs(A,B) or Rs(B,A),
as determined by object ordering:
for all Rs(A,B) in R, A < B.
All retrieval on spatial relationships uses image graphs rather than images.
2D-PIR Example #1
Rs(A,B) = (disjoint,~during,~during)
(or Rs(B,A) = (disjoint,during,during) )
2D-PIR Example #2
Rs(A,B) = (disjoint,before,~overlaps)
Rs(A,C) = (disjoint,before,~meets)
Rs(B,C) = (covers,~finishes,~starts)
2D-PIR Image Retrieval
Two modes:
- query language with 2D-PIR operators
- sketch-based similarity retrieval
(with user-specified threshold)
Example:
Find images containing a hawk perching on a stump
Query language:
[ P | P <- ImageDB, P.Rs(Hawk,Stump) sim (touches,~during,~meets) ]
A sim B = Sim(A,B) > Threshold
(uses list-comprehension notation from FP and Obj.Method
notation from OOP)
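The comprehension above can be sketched directly in Python. The Image class, the rs() accessor and the component-wise stand-in similarity below are illustrative assumptions, not the system's actual Sim function:

```python
# Minimal sketch of the list-comprehension query style.
# Image, rs() and sim() are illustrative stand-ins; the real system
# would use the neighbourhood-graph 2D-PIR distance of later slides.
class Image:
    def __init__(self, name, rels):
        self.name = name
        self._rels = rels          # {(A, B): (t, xp, yp)}

    def rs(self, a, b):
        return self._rels.get((a, b))

def sim(r1, r2):
    # Stand-in similarity: component-wise agreement, scaled to 0..100.
    if r1 is None or r2 is None:
        return 0
    return 100 * sum(x == y for x, y in zip(r1, r2)) // 3

def query(image_db, threshold=70):
    """[ P | P <- ImageDB, P.Rs(Hawk,Stump) sim (touches,~during,~meets) ]"""
    target = ("touches", "~during", "~meets")
    return [p for p in image_db
            if sim(p.rs("Hawk", "Stump"), target) > threshold]
```

The threshold plays the role of the user-specified similarity cutoff.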
... 2D-PIR Image Retrieval
Sketch-based query:
- user draws rough object outlines and identifies objects
- system generates image graph for sketch (query graph)
- searches database by matching query graph to image graphs
Example query:
More Example Queries
Query:
Find images containing a green apple on a rough wooden table
expressed as:
[ P | P <- ImageDB, P.Rs(Apple,Table) = (touches,?,~meets) ]
Query:
Find images with Kate Winslet lying on a piece of wreckage in the
middle of the North Atlantic
expressed as:
[ P | P <- ImageDB, P.Rs(KateWinslett,Wreckage) = (overlaps,?,?)
& P.Rs(Wreckage,NorthAtlantic) = (inside,?,?) ]
Image Distance
Image graph distance is based on distances of component 2D-PIRs:
for two graphs G1(V1,R1) and G2(V2,R2), we combine the distances
between the 2D-PIRs on their common edges.
2D-PIR distance is in turn based on distances of the component
relationships: for two 2D-PIRs Rs,1 = (t1,xp1,yp1) and
Rs,2 = (t2,xp2,yp2), we combine D(t1,t2), D(xp1,xp2) and D(yp1,yp2).
Spatial Relationship Distances
How to define, e.g., D(before,overlaps)?
We base these distance measures on observing what happens when
two objects are moved continuously relative to each other:
relationships that occur adjacent in the resulting time sequence
are treated as being closer.
... Spatial Relationship Distances
Based on this idea, we define relationship neighbourhood graphs.
If one relationship R1(A,B) transforms directly into
another R2(A,B) by continuous movement of objects
A and B, then R1 and R2
are neighbours.
Relationship distance is defined as the number of arcs in
the neighbourhood graph between two relations.
Topological relationship neighbourhood graph:
Interval relationship neighbourhood graph:
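The arc-counting distance can be sketched as a breadth-first search over hand-coded adjacency lists. The arcs below are one plausible encoding of the Egenhofer/Allen conceptual neighbourhoods; the authoritative arcs are the graphs on this slide:

```python
from collections import deque

# Assumed adjacency lists for the two neighbourhood graphs (check the
# arcs against the figures above before relying on specific distances).
TOPO_NEIGHBOURS = {
    "disjoint": ["touches"],
    "touches":  ["disjoint", "overlaps"],
    "overlaps": ["touches", "covers", "~covers"],
    "covers":   ["overlaps", "equals", "contains"],
    "~covers":  ["overlaps", "equals", "inside"],
    "equals":   ["covers", "~covers"],
    "contains": ["covers"],
    "inside":   ["~covers"],
}

INTERVAL_NEIGHBOURS = {
    "before":    ["meets"],
    "meets":     ["before", "overlaps"],
    "overlaps":  ["meets", "starts", "~finishes"],
    "starts":    ["overlaps", "equals", "during"],
    "~finishes": ["overlaps", "equals", "~during"],
    "equals":    ["starts", "~finishes", "finishes", "~starts"],
    "during":    ["starts", "finishes"],
    "~during":   ["~finishes", "~starts"],
    "finishes":  ["during", "equals", "~overlaps"],
    "~starts":   ["~during", "equals", "~overlaps"],
    "~overlaps": ["finishes", "~starts", "~meets"],
    "~meets":    ["~overlaps", "after"],
    "after":     ["~meets"],
}

def relationship_distance(graph, r1, r2):
    """Distance = number of arcs on the shortest path (BFS)."""
    if r1 == r2:
        return 0
    seen, frontier = {r1}, deque([(r1, 0)])
    while frontier:
        rel, hops = frontier.popleft()
        for nb in graph[rel]:
            if nb == r2:
                return hops + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, hops + 1))
    return None  # both graphs above are connected, so never reached
```

For example, before and overlaps are two arcs apart (via meets) under this encoding.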
Image (Graph) Similarity
Based on our definition of relationship distance,
max(D(Rs,1, Rs,2)) = 20.8.
We define image similarity S as normalised, inverse distance:
S=0 means totally different; S=100 means identical.
Static Image Retrieval Prototype
Image Retrieval Costs
Quantities involved:
n  : number of objects in an image
N  : number of distinct objects in the DB
K  : number of images in the DB
Kq : number of images with Common > 0 for query q
Size of image graph: n²/2 2D-PIRs + n object descriptors
Cost in answering queries:
- matching objects
(lookup on relation of size N)
- finding images containing objects
(lookup on relation of size K)
- computing similarity for candidates
(computation of Kq·n² 2D-PIR distances)
A Model for Video Retrieval
A video work is a sequence of scenes.
A scene (aka shot) is:
- a sequence of still images (frames)
- showing continuous action, from a single viewpoint
All access to the database occurs in "units" of scenes.
A video query:
- specifies properties of the required scenes
- returns a set of scene identifiers
The user is presented with a thumbnail of each matching scene,
and then performs further manual filtering (a la QBIC).
TS-2D-PIR Model for Video Scenes
Aim is analogous to that for 2D-PIR ...
Develop a concise structure for representing
spatiotemporal information in video scenes to allow efficient retrieval.
Assumes we have a pre-processed video:
- partitioned into scenes
- objects identified/segmented in each frame
- movement of single point in each object tracked
Not (yet) feasible for general video sources, but possible
for some limited domains
(e.g. video surveillance, satellite tracking, computer-generated animation).
Time
We model time via instants and intervals.
An interval is defined by start and end instants: I = (ts, tf)
Operations on intervals:
start(I) = ts
finish(I) = tf
duration(I) = tf - ts
I1 + I2 = (start(I1), finish(I2))
All scene-time is measured relative to start of scene.
If duration of scene is T, then scene interval
IS = (0,T)
Database may store absolute start time elsewhere.
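A minimal sketch of these interval operations in Python; the operation names follow the slide, while the dataclass wrapper is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    ts: float  # start instant
    tf: float  # finish instant

    def start(self):
        return self.ts

    def finish(self):
        return self.tf

    def duration(self):
        return self.tf - self.ts

    def __add__(self, other):
        # I1 + I2 = (start(I1), finish(I2)); sensible when the intervals
        # are contiguous, i.e. finish(I1) == start(I2)
        return Interval(self.ts, other.tf)
```

E.g. Interval(0, 3) + Interval(3, 7) yields the scene interval Interval(0, 7).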
Movement of Single Objects
Object movement is approximated by a sequence of linear movements.
Each movement represented as an iso-directional-interval
(d,r,I)
- I is a time interval
(during which object moves in constant direction)
- d is distance moved during I
(a real number)
- r is the direction moved
(one of N, NE, E, SE, S, SW, W, NW)
A track is:
[
(d1,r1,I1),
(d2,r2,I2),
...
(dn,rn,In)
]
where
finish(I1) = start(I2),
finish(I2) = start(I3),
...
(i.e. intervals are contiguous)
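The track conditions above can be sketched as a validity check, with each interval I written as a (start, finish) pair; the function name and representation are illustrative assumptions:

```python
# A track: a list of iso-directional intervals (d, r, I).  The check
# enforces the contiguity condition finish(I_k) = start(I_{k+1}) and
# basic well-formedness of each element (a stationary element may use
# a placeholder direction such as "?", as in the scene graph example).
DIRECTIONS = {"N", "NE", "E", "SE", "S", "SW", "W", "NW"}

def is_valid_track(track):
    for d, r, (ts, tf) in track:
        if d < 0 or ts > tf:
            return False
        if d > 0 and r not in DIRECTIONS:
            return False
    return all(track[k][2][1] == track[k + 1][2][0]
               for k in range(len(track) - 1))
```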
... Movement of Single Objects
Notes:
- assume that the object is "reasonably" rigid
(i.e. it consists of a single connected pointset
throughout the scene)
- this model only deals with translational motion
- simple motion => short track;
complex motion => long track
- minimum track length = 1;
maximum track length = #frames
Track Example
Track:
[
(d1, NW, I1),
(d2, N, I2),
(d3, E, I3),
(d4, NE, I4)
]
where
I1 = (t0,t1),
I2 = (t1,t2),
I3 = (t2,t3),
I4 = (t3,t4)
TS-2D-PIR
Tracks model individual objects;
we also need to model changing spatial relationships between objects
(i.e. spatiotemporal relationships)
Since spatial relationships typically persist over intervals,
we again use an interval-based data structure.
TS-2D-PIR = Temporal Sequence of 2D-PIR
A TS-2D-PIR Rst(A,B) is a sequence of (t,xp,yp,I), where
- Rs(A,B) = (t,xp,yp)
- I is the time interval over which Rs(A,B) holds
A TS-2D-PIR is also called an st-relationship.
TS-2D-PIR Example
Rst(B,G) =
[ (disjoint, ~starts, after, (t0,t1)),
(disjoint, ~finishes, after, (t1,t2)),
(disjoint, before, after, (t2,t3)) ]
Rst(B,R) =
[ (disjoint, ~meets, after, (t0,t1)),
(disjoint, ~overlaps, after, (t1,t2)),
(touches, equals, ~meets, (t2,t3)) ]
Rst(G,R) =
[ (touches, ~meets, meets, (t0,t1)),
(disjoint, ~meets , before, (t1,t2)),
(disjoint, after, before, (t2,t3)) ]
Scene Graphs
Our model for video scenes is an extension of image graphs
to incorporate temporal notions.
A scene is modelled by a labelled digraph S(M,R), where
- M is a set of nodes labelled by (object,track) pairs
- R is a set of edges labelled by TS-2D-PIR relationships
(spatiotemporal relationship between objects in end-point nodes)
Scene Graph Example
M = {
(B, [(1.4,SW,(t0,t1)), (1,S,(t1,t2)), (0,?,(t2,t3))]),
(G, [(1.4,SE,(t0,t3))]),
(R, [(1,E,(t0,t3))])
}
R = {
((B,G),
[ (disjoint, ~starts, after, (t0,t1)),
(disjoint, ~finishes, after, (t1,t2)),
(disjoint, before, after, (t2,t3)) ]
),
((B,R),
[ (disjoint, ~meets, after, (t0,t1)),
(disjoint, ~overlaps, after, (t1,t2)),
(touches, equals, ~meets, (t2,t3)) ]
),
((G,R),
[ (touches, ~meets, meets, (t0,t1)),
(disjoint, ~meets , before, (t1,t2)),
(disjoint, after, before, (t2,t3)) ]
)
}
Scene Graph Construction
Algorithm: Scene graph construction
Input: A pre-processed video scene (objects are identified,
object positions, pointsets and projections
are available for each frame)
Output: A scene graph S(M,R) representing the scene
begin
(1) M = { }; R = { };
(2) for each frame F do
(3) t = time of frame F relative to start of scene
(4) for each object X in F do
(5) if X is not already in M then
(6) include new object descriptor (X,[]) in M
(7) else
(8) compute (d,r,I) from current and previous positions
(9) append (d,r,I) to track for X
(10) endif
(11) save current position as previous
(12) for each object Y greater than X in F do
(13) compute current Rs(X,Y)
(14) if there is no edge for (X,Y) in R then
(15) include new edge (X,Y,[]) in R
(16) save current Rs(X,Y) as existing Rs(X,Y)
(17) set startOfRs(X,Y) to t
(18) else
(19) if F is the last frame or
current Rs(X,Y) differs from existing Rs(X,Y) then
(20) compute (t,xp,yp,I) from existing Rs(X,Y)
and the interval from startOfRs(X,Y) to t
(21) append (t,xp,yp,I) to the TS-2D-PIR for edge (X,Y)
(22) save current Rs(X,Y) as existing Rs(X,Y)
(23) set startOfRs(X,Y) to t
(24) endif
(25) endif
(26) endfor
(27) endfor
(28) endfor
end.
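The inner-loop bookkeeping (steps 12-25) amounts to run-length encoding an object pair's per-frame 2D-PIRs into a TS-2D-PIR. A compact sketch of just that encoding, assuming the per-frame relationships for one pair are given as (time, 2D-PIR) pairs:

```python
def build_ts2dpir(frames):
    """Collapse per-frame 2D-PIRs into a TS-2D-PIR: a list of
    (t, xp, yp, (start, finish)) entries, one per maximal run of
    frames sharing the same 2D-PIR."""
    ts2dpir = []
    existing, run_start = None, None
    for time, rel in frames:
        if existing is None:          # first frame for this object pair
            existing, run_start = rel, time
        elif rel != existing:         # relationship changed: close the run
            ts2dpir.append((*existing, (run_start, time)))
            existing, run_start = rel, time
    if existing is not None:          # close the final run at the last frame
        ts2dpir.append((*existing, (run_start, frames[-1][0])))
    return ts2dpir
```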
Scene Retrieval
As for image retrieval, there are two modes:
- query language with spatial/temporal/data operators
- sketch-based similarity retrieval (user-specified threshold)
Wider range of operators than for static image retrieval:
- extract information from within scenes (tracks, st-rels, ...)
- select scenes based on exact/inexact criteria (predicates)
- construct new scenes (filter/combine existing scenes)
Use OO notation for operators: Obj.Operator(Args)
For relevant operations, if Obj is a set (list) of objects,
then Operator is applied to each element in the set.
Trajectory Extraction
If S is a scene graph containing an object descriptor:
(A, [
(d1,r1,I1),
(d2,r2,I2), ...
(dn,rn,In) ])
then S.trajectory(A) is the sequence:
[
(d1,r1),
(d2,r2), ...
(dn,rn) ]
2D-PIR Extraction
If S is a scene graph containing:
Rst(A,B) = [
(t1,xp1,yp1,I1),
(t2,xp2,yp2,I2),...
(tn,xpn,ypn,In) ]
then S.st-seq(A,B) is the sequence:
[
(t1,xp1,yp1),
(t2,xp2,yp2),...
(tn,xpn,ypn) ]
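Both extraction operators simply drop the interval component from each element; written as free functions (an illustrative assumption), they are one-liners:

```python
# Tracks as (d, r, I) triples, TS-2D-PIRs as (t, xp, yp, I) quadruples.
def trajectory(track):
    return [(d, r) for d, r, _interval in track]

def st_seq(ts2dpir):
    return [(t, xp, yp) for t, xp, yp, _interval in ts2dpir]
```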
Temporal Extraction
If S is a scene graph and
ST = [
(t1,xp1,yp1),
(t2,xp2,yp2),...
(tn,xpn,ypn) ]
is a 2D-PIR sequence, then S.interval(A,B,ST) is an interval
I = I1+I2+...+In
such that there is a subsequence
[
(t1,xp1,yp1,I1),
(t2,xp2,yp2,I2),...
(tn,xpn,ypn,In) ]
in Rst(A,B).
If no such subsequence exists, the result is the special value Never.
Selection and Predicates
Selection uses list-comprehension notation:
[ S | S <- DB, Predicate1, Predicate2, ... ]
Assume that standard relational tests are available.
The notation Objs(S) denotes the set of objects in a scene.
The predicate has is defined as: S.has(X) = X ∈ Objs(S)
Spatial Matching
Let ST be a 2D-PIR sequence of the form
[
(t1,xp1,yp1),
(t2,xp2,yp2),...
(tn,xpn,ypn) ]
Let P be a 2D-PIR pattern:
- a 2D-PIR sequence is a pattern, matching exactly
- the symbol
?
matches any single component of
(ti,xpi,ypi)
- the symbol
..
matches any 2D-PIR sequence (shortest if multiple matches)
Let L* be the set of all subsequences of ST.
The spatial matching predicate ST.matches(P) holds iff
some sequence in L* matches the pattern P.
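A sketch of one way the predicate could be implemented, treating "subsequence" as contiguous subsequence; for a boolean predicate the shortest-match rule does not affect the result:

```python
# Pattern elements: a (t, xp, yp) triple whose components may be "?"
# (matches any single component), or the token ".." (matches any,
# possibly empty, run of elements).
def _component_match(pat_elem, elem):
    return all(p == "?" or p == e for p, e in zip(pat_elem, elem))

def _match_exact(pat, seq):
    """Does pat match all of seq?"""
    if not pat:
        return not seq
    if pat[0] == "..":
        # ".." absorbs zero or more leading elements of seq
        return any(_match_exact(pat[1:], seq[i:]) for i in range(len(seq) + 1))
    return (bool(seq) and _component_match(pat[0], seq[0])
            and _match_exact(pat[1:], seq[1:]))

def matches(st, pattern):
    """True iff pattern matches some contiguous subsequence of st."""
    return any(_match_exact(pattern, st[i:j])
               for i in range(len(st) + 1)
               for j in range(i, len(st) + 1))
```

The PassThrough pattern from the animal-tracking examples works directly with this matcher.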
Scene Construction Operators
Spatiotemporal projection:
S.project(X1,X2,...,I)
Yields a new version of S containing only the specified
objects and only information for the specified time interval.
Special cases:
- S.project(Objs(S),I) gives a clip from a scene
- S.project(objs...,IS) eliminates some objects
Overlay:
S1.overlay(S2, ts)
Combines scene graphs, with all S2 motion starting
at time ts after the start of S1.
Example: Animal Tracking
Assume we have a fixed camera watching a region of country
and animals tagged with GPS/radio-transmitters.
Monitor the region for several years, treating each day
as a scene.
Animal movement is easy to track automatically.
Each animal is identified uniquely by its transmitter.
Assume the image is oriented so that top=N, left=W, bottom=S, right=E.
This kind of data might be useful to biologists to
monitor migration, breeding, ...
Example Animal Query #1
Query:
Show the migration path of a specific animal named Elephant.
Strategy:
- find the scenes showing this particular animal
- extract the trajectory from each scene and display
Expressed as:
[ S | S <- AnimalDB, S.has(Elephant) ] . trajectory(Elephant)
Example Animal Query #2
Query:
How long is animal A inside region R?
Strategy:
- find the scenes where A is inside R
- extract the interval for which this is true
- compute the length of the interval
Expressed as:
AinsideR = [ S | S <- AnimalDB, S.interval(A,R,[(~contains,?,?)]) != Never ]
AinsideR.interval(A,R,[(~contains,?,?)]).duration
Example Animal Query #3
Query:
Find scenes where animal A enters region R from
the west side and leaves from the north side.
Strategy:
- define the notion of entering a region from west and leaving from north
- extract the 2D-PIR sequence for A and R
- match it against the defined movement
Expressed as:
EnterWest = [ (disjoint,before,?), (touches,meets,during),
(overlaps,overlaps,during), (~contains,during,during) ]
ExitNorth = [ (~contains,during,during), (overlaps,during,~overlaps),
(touches,during,~meets), (disjoint,during,after) ]
[ S | S <- AnimalDB, S.st-seq(A,R).matches(EnterWest + ExitNorth) ]
Would be simpler to do as sketch query.
Example Animal Query #4
Query:
Find scenes where some animal passes through region R.
Strategy:
- define operation for "passes through"
- select scenes that contain at least one animal doing this
Expressed as:
PassThrough = [ (disjoint,?,?) .. (~contains,?,?) .. (disjoint,?,?) ]
[ S | S <- AnimalDB,
exists [ A | A <- Objs(S), S.st-seq(A,R).matches(PassThrough) ]
]
Scene Similarity
Scene similarity is based on a combination of distances between
scene components (tracks and TS-2D-PIRs).
For two scene graphs S1(M1,R1) and S2(M2,R2), we combine track
distances and TS-2D-PIR distances, where wt and ws reflect the
weighting given to trajectories and 2D-PIR sequences respectively.
Track Similarity
Requires measuring distance between two trajectories for possible
"alignments" and choosing minimum.
Trajectory distance was tested by:
- creating 80 widely-varied trajectories
- choosing 10 of these as "queries"
- asking 10 users to rate the similarity of the other 79 trajectories against each query
- measuring distances using distance function
- comparing measured distance rank with user-assigned distance rank
Found: reasonable precision/recall performance for threshold 70%
Video Scene Retrieval Prototype
Conclusions
Have developed:
- model for spatiotemporal relationships in video data
- similarity/exact retrieval process based on this model
- prototype implementation for animated video sequences
Still to do:
- devise efficient access methods
- combine with other image/scene-matching techniques
- ...
References/Further Reading
-
J.F.Allen,
Maintaining Knowledge about Temporal Intervals,
Communications of the ACM, Nov 1983
[original description of interval relationships; in this case the context
was temporal, rather than spatial, intervals]
-
S.K.Chang, Q.Y.Shi, C.W.Yan,
Iconic Indexing by 2D Strings,
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987
[description of method to represent static images by using relationships
between object projections on axes]
-
M.J.Egenhofer
Point-set Topological Spatial Relations,
International Journal on Geographic Information Systems, 1991
[original description of point-set topological relations between image
regions, which forms the basis for the topological component of 2D-PIRs]
-
M.Nabil, A.Ngu, J.Shepherd
Picture Similarity Retrieval using the 2D Projection Interval Representation,
IEEE Transactions on Knowledge and Data Engineering, 1996
[detailed description of the 2D-PIR static-image model from this talk]
-
M.Nabil, A.Ngu, J.Shepherd
Modelling and Retrieving Video via Moving Objects,
Journal of Multimedia Technology and Applications (to appear)
[detailed description of 2D-PIR-based moving object model from this talk]
Produced: 22 May 98