<a> <b><c/><d/></b> <b><d/><c/></b> <b><d/><c/></b> </a>
tree2dag
is your executable and
"tiny.xml" is the above XML document, then
running
tree2dag -s tiny.xml
should generate exactly this output:
Tree nodes: 10
DAG nodes: 5
DAG edges: 7
Height: 3
Number of labels: 4
Max. sharing: 3 (node 1)
Max. size of sharing: 3 (node 4)
The "Max. sharing" means the maximal number of times that a node is
referenced. This number is not obtained
by simply counting the number of times that "1" appears
within a bracket [..] in the DAG table below! Instead, you need
to take care of the fact that node 4 is referenced twice, and
node 4 itself references node 1; thus, node 1 is referenced three
times: once by node 3, and twice through node 4.
If there are several nodes with this property, then take the one
with the lowest node number (in the example: both nodes 1 and 2
are shared 3 times, but node "1" has the lower node number and
therefore is printed).
b[1,2]
to mean
a b-node with first child at position 1 and second child
at position 2 (which are the c- and d-nodes)
You should print the following, when using "-p" option.
tree2dag -p tiny.xml
should generate exactly this output:
1:c
2:d
3:b[1,2]
4:b[2,1]
5:a[3,4,4]
As discussed in the lecture, your program should build the minimal
unique DAG during SAX parsing,
java.util.Hashtable
.
<a> <b><c/><c/><d/><d/><d/></b> <b><d/><c/></b> <b><d/><c/></b> <b><d/><c/></b> <b><d/><c/></b> <c/> </a>
b[1,2]
into
b[1:2,2:3]
.
The latter is a shorthand notation for
b[1,1,2,2,2]
.
The 2nd, 3rd, 4th, and 5th subtrees of the root a-node
are all identical (namely, the tree at pointer "4").
Hence, we get 4:4
at the second subtree
position of the a-node; such a "multi-edge"
is counted as one edge only (it is a labeled edge now,
by its multiplicity counter).
When using the option "-mp", you
should print the following table for the DAG with multiplicities.
tree2dag -mp tiny02.xml
should generate exactly this output:
1:c
2:d
3:b[1:2,2:3]
4:b[2,1]
5:a[3,4:4,1]
When using the option "-ms" , you should print
the following statistics about the DAG with multiplicities;
tree2dag -ms tiny02.xml
should generate exactly this output:
Tree nodes: 20
DAG nodes: 5
DAG edges: 7
Height: 3
Number of labels: 4
Multiplicities: 3
Max. Multiplicity: 4
Sharings wo Multiplicities: 3
Here, "Sharings wo Multiplicities" means the number of references
to shared nodes that have multiplicity 1, i.e., count how many times
a shared node appears within brackets [..] in the DAG table,
but is not followed by a multiplicity ":x" (x is always >2).
In the example, node 1 is a shared node and is referenced by
nodes 5 and 4, and node 2 is a shared node which is referenced by
node 4, thus in total 3 (all other shared nodes are with
multiplicities).
1:c
2:b[1]
3:a[2:2]
Here the node 1:c is shared since it is referenced by node 2
which is referenced twice in node 3. A simple way to see if a node is shared
is to maintain a counter for each node in your hash table. When you
introduce a new node (say c with ID 1), you create a counter to 1. And every time
you see a node, you increment its counter, so when you see c[] for the second time,
the counter becomes two. Now to display the "Sharing wo Multiplicities" you can apply the
simple algorithm:
for each row in the table
for each number i between brackets in the row
if i doesn't have a multiplicity and its counter is >=2
increment Sharing Wo Multiplicities.
<a> <b><c/><d/></b> <b><d/><c/></b> <b><d/><c/></b> </a>
<a><b><c><_/><d><_/><_/></d></c>
<b><d><_/><c><_/><_/></c></d>
<b><d><_/><c><_/><_/></c></d><_/>
</b></b></b><_/></a>
Here is a pictorial representation of this binary tree
(where two times a left _-child of a d-node was skipped)
a / \ b _ / \ c b / \ / \ _ d d b /\ \ /\ _ _ c d _ /\ \ _ _ c /\ _ _
1:_
2:d[1,1]
3:c[1,2]
4:c[1,1]
5:d[1,4]
6:b[5,1]
7:b[5,6]
8:b[3,7]
9:a[8,1]
which has the following statistics:
Tree nodes: 10
Binary nodes: 21
DAG nodes: 9
DAG edges: 16
Height: 7
Number of labels: 4
Max. sharing: 11 (node 1)
Max. size of sharing: 5 (node 5)
d[1,1]
as
d[1:2]
.
6:b[5,1]
7:b[5,6]
8:b[3,7]
becomes, when using multiplicities:
6:b[5,1]
8:b[3,6:2]
Hence, this is the correct DAG with multiplicities that you
should print.
tree2dag -bp tiny.xml
should generate exactly this output:
1:_
2:d[1,1]
3:c[1,2]
4:c[1,1]
5:d[1,4]
6:b[5,1]
7:b[3,6:2]
8:a[7,1]
and the "-bs" option should generate the corresponding
statistics. Thus, running
tree2dag -bs tiny.xml
should generate exactly this output:
Tree nodes: 10
Binary nodes: 21
DAG nodes: 8
DAG edges: 14
Height: 7
Number of labels: 4
Max. sharing: 11 (node 1)
Max. size of sharing: 7 (node 6)
Multiplicities: 1
Max. Multiplicity: 2
Remark The number of labels is 4, be careful not to count "_" as an actual label, since it is not present in the original document.
As another example consider the following XML document (named "tiny03.xml")<a> <b><c/></b> <b><c/></b> <b><c/></b> <b><c/></b> <b><c/></b> <b><c/></b> <b><c/></b> <b><c/></b> </a> The corresponding binary tree encoding is: a / \ b _ / \ c . /\ . _ _ b / \ c _ /\ _ _
tree2dag -bp file.xml
1:_
2:c[1,1]
3:b[2,1]
4:a[3:8,1]
and these are the statistics you should get
when running tree2dag -bs file.xml
Tree nodes: 17
Binary nodes: 35
DAG nodes: 4
DAG edges: 6
Height: 11
Number of labels: 3
Max. sharing: 18 (node 1)
Max. size of sharing: 5 (node 3)
Multiplicities: 1
Max. Multiplicity: 8
wagner % java tree2dag -s 1998statistics.xml Tree nodes: 28306 DAG nodes: 82 DAG edges: 1377 Height: 6 Number of labels: 46 Max. sharing: 1226 (node 6) Max. size of sharing: 23 (node 28) wagner % java tree2dag -ms 1998statistics.xml Tree nodes: 28306 DAG nodes: 82 DAG edges: 726 Height: 6 Number of labels: 46 Multiplicities: 305 Max. Multiplicity: 9 Sharings wo Multiplicities: 382 wagner % java tree2dag -bs 1998statistics.xml Tree nodes: 28306 Binary nodes: 56613 DAG nodes: 654 DAG edges: 1306 Height: 83 Number of labels: 46 Max. sharing: 28307 (node 1) Max. size of sharing: 2071 (node 502) Multiplicities: 289 Max. Multiplicity: 9 wagner %
wagner % java tree2dag -s s200.xml Tree nodes: 209 DAG nodes: 24 DAG edges: 208 Height: 7 Number of labels: 3 Max. sharing: 73 (node 3) Max. size of sharing: 1 (node 1) wagner % java tree2dag -ms s200.xml Tree nodes: 209 DAG nodes: 24 DAG edges: 146 Height: 7 Number of labels: 3 Multiplicities: 41 Max. Multiplicity: 5 Sharings wo Multiplicities: 85 wagner % java tree2dag -bs s200.xml Tree nodes: 209 Binary nodes: 419 DAG nodes: 129 DAG edges: 256 Height: 37 Number of labels: 3 Max. sharing: 210 (node 1) Max. size of sharing: 231 (node 123) Multiplicities: 41 Max. Multiplicity: 5 wagner % java tree2dag -p s200.xml 1:b 2:a 3:c 4:a[1,2,1,3,3,2,3,3,2,2,1,3,1,3,3,3] 5:b[1,1,2,2,2,2,1,2,1,1,3,2,4,1,2,3,3,1,3,3] 6:c[3,2,1,3,2,3,3] 7:c[1,2,1,3,3,1] 8:c[5,6,7,2,3,2,2,2,1,2,2,2,2,2,1,1,3,2,2,2] 9:a[1,1,8] 10:a[2] 11:c[9,1,1,10] 12:b[3,1] 13:c[2,2,1,3,2,12,3,2] 14:b[1,1,1,3,3,2,3,1,1,2] 15:b[2,3,3,2,3,3,2,3,3,3] 16:b[3,3,14,1,2,1,2,1,3,3,1,1,15,2,2,3,2] 17:a[3] 18:a[1,3,2,2,2,1,3,2] 19:b[3,1,2,3,2,3,17,3,3,2,18,3,2,3,2,2,2,1] 20:b[2,3,19,2,1,1,3] 21:a[1,2,3,2,1,3,3,3,1,1,1,3,3,2,2,2,1,1,1] 22:b[2,21,3,1,3,3,3,3] 23:a[16,20,22,3,1,2,3,3,3,3,2,3,3,3,2] 24:b[11,2,3,13,2,3,3,23] wagner % java tree2dag -mp s200.xml 1:b 2:a 3:c 4:a[1,2,1,3:2,2,3:2,2:2,1,3,1,3:3] 5:b[1:2,2:4,1,2,1:2,3,2,4,1,2,3:2,1,3:2] 6:c[3,2,1,3,2,3:2] 7:c[1,2,1,3:2,1] 8:c[5,6,7,2,3,2:3,1,2:5,1:2,3,2:3] 9:a[1:2,8] 10:a[2] 11:c[9,1:2,10] 12:b[3,1] 13:c[2:2,1,3,2,12,3,2] 14:b[1:3,3:2,2,3,1:2,2] 15:b[2,3:2,2,3:2,2,3:3] 16:b[3:2,14,1,2,1,2,1,3:2,1:2,15,2:2,3,2] 17:a[3] 18:a[1,3,2:3,1,3,2] 19:b[3,1,2,3,2,3,17,3:2,2,18,3,2,3,2:3,1] 20:b[2,3,19,2,1:2,3] 21:a[1,2,3,2,1,3:3,1:3,3:2,2:3,1:3] 22:b[2,21,3,1,3:4] 23:a[16,20,22,3,1,2,3:4,2,3:3,2] 24:b[11,2,3,13,2,3:2,23] wagner % java tree2dag -bp s200.xml 1:_ 2:c[1,1] 3:b[1,2:3] 4:c[1,3] 5:b[1,4] 6:a[1,5] 7:c[1,6:2] 8:a[1,7:2] 9:c[1,8] 10:b[1,9:2] 11:a[1,10] 12:b[1,11] 13:b[1,2:2] 14:c[1,13] 15:a[1,14:2] 16:b[1,15] 17:a[12,16] 18:a[1,17] 19:c[1,18] 20:b[1,19] 21:a[1,20:2] 22:b[1,21] 23:a[1,22] 24:b[1,23:4] 25:a[1,2:2] 26:c[1,25] 27:b[1,26] 28:a[1,27] 29:c[1,28] 30:b[1,1] 31:c[1,30] 32:b[1,31:2] 33:a[1,32] 34:b[1,33] 35:a[1,1] 36:c[1,35:3] 37:b[1,36] 38:a[1,37:2] 39:b[1,38:5] 40:a[1,39] 41:c[1,40:3] 42:a[1,41] 43:c[34,42] 44:c[29,43] 45:b[24:2,44] 46:c[45,1] 47:b[1,46] 48:a[35,1] 49:b[1,48] 50:a[47:2,49:2] 51:c[1,35] 52:b[31,51] 53:a[1,52] 54:c[1,53] 55:b[1,54] 56:a[1,55] 57:b[1,35] 58:c[1,57:2] 59:a[1,58] 60:c[1,59] 61:b[1,60:2] 62:a[1,2:3] 63:c[1,62] 64:a[1,63:2] 65:c[1,64] 66:a[1,65:2] 67:a[1,51] 68:b[66,67:2] 69:b[1,68] 70:c[1,69:2] 71:b[1,70:2] 72:a[1,71] 73:b[1,72] 74:a[1,73] 75:b[1,74] 76:b[61:3,75] 77:c[1,76] 78:b[1,51] 79:a[1,78] 80:c[1,79:3] 81:b[1,80] 82:a[1,30] 83:c[1,82:3] 84:a[1,83] 85:c[1,84] 86:a[81,85] 87:a[1,86] 88:c[1,87] 89:a[2,88:2] 90:c[1,89] 91:a[1,90] 92:c[1,91] 93:a[1,92] 94:b[1,93] 95:c[1,94] 96:b[1,2] 97:a[1,96:2] 98:b[95,97] 99:c[1,98] 100:a[1,99] 101:a[1,30:3] 102:c[1,101:3] 103:b[1,102:2] 104:c[1,103:3] 105:b[1,104:3] 106:a[1,105] 107:c[1,106] 108:a[1,107] 109:b[1,108] 110:b[1,2:4] 111:c[1,110] 112:a[109,111] 113:a[1,112] 114:a[1,51:3] 115:c[1,114] 116:a[1,115:4] 117:b[1,116] 118:c[1,117] 119:b[113,118] 120:b[100,119] 121:b[77:2,120] 122:a[121,1] 123:c[1,122] 124:a[1,123:2] 125:c[56:2,124] 126:c[1,125] 127:a[1,126] 128:c[50,127] 129:b[128,1]