Assignment 5

XPath to SQL Translation
Due Date: June 3rd, 2009



Update: (May 29th)

For a starting point, here is a sample solution to Assignment 3.

In this assignment we want to translate XPath queries that use /, //, preceding, following-sibling, and filters ([..]) to SQL queries.
In a filter there is only one path expression which starts with "." or with "/".
Node tests are restricted to either star ("*") or one letter (e.g., "a").
Your program will recieve as input the name of an XML file, and the XPath expression (written in quotes "/a//b/...").
The program should generate as output a sequence of SQL INSERT-statements, followed by one SQL query.
The INSERT-statements generate a table representation of the XML file. For instance, you can generate
a PRE/POST-table as in Assignment 3. However, it is probably more convenient to additionally generate
a "LEVEL" row, which holds for each node its depth in the XML tree. In this way, the child-axis is much easier:
simply pick all descendants which are at level+1.

Note that we do not care about attributes here. So you do not need to represent them in a table,
but may simply ignore them.

If your program works for // and preceding, then this is worth 4 Points.
If additionally / and following-sibling is supported then that is worth 4 Points.
Finally, if additionally filters are supported then that is worth another 4 Points.

Bonus If your program supports the attribute axis ("@"), then this is worth additional 2 Bonus Points. Note that
the attribute axis may only be used in a filter, and may appear either as existential test such as //a[./@b],
or with an equality test (as in the examples below).

As an example, consider the small "book.xml" document from Assignment 1:

<book isbn="1-2345-6789-0" year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>John</first></author>
<publisher>Addison-Wesley</publisher>
<price currency="USD">65.95</price>
</book>

As with Assignment 3, you should generate a table with name z1234567_book_tbl, where the number after the "z" is your student ID.
Don't forget to insert the additional root node with PRE-value 0.

Your program should behave as follows (assuming the executable is named "XPath2Rel"):

> XPath2Rel book.xml "//author"
CREATE TABLE z1234567_book_tbl (
  pre INTEGER PRIMARY KEY,
  post INTEGER,
  level INTEGER,
  tag VARCHAR(256),
  text VARCHAR(1000)
);

CREATE TABLE z1234567_book_attr (
  pre INTEGER REFERENCES z1234567_book_tbl (pre),
  attr VARCHAR(256),
  value VARCHAR(1000)
);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (0, 13, 0, null, null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (1, 12, 1, "book", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (2, 2, 2, "title", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (3, 1, 3, null, "TCP/IP Illustrated");

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (4, 7, 2, "author", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (5, 4, 3, "last", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (6, 3, 4, null, "Stevens");

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (7, 6, 3, "first", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (8, 5, 4, null, "John");

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (9, 9, 2, "publisher", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (10, 8, 3, null, "Addison-Wesley");

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (11, 11, 2, "price", null);

INSERT INTO z1234567_book_tbl (pre, post, level, tag, text)
VALUES (12, 10, 3, null, "65.95");

INSERT INTO z1234567_book_attr (pre, attr, value)
VALUES (1, "isbn", "1-2345-6789-0");

INSERT INTO z1234567_book_attr (pre, attr, value)
VALUES (1, "year", "1994");

INSERT INTO z1234567_book_attr (pre, attr, value)
VALUES (11, "currency", "USD");

SELECT DISTINCT r2.pre FROM z1234567_book_tbl r1, z1234567_book_tbl r2
WHERE r1.pre=0
AND r2.pre>r1.pre
AND r2.post<r1.post
AND r2.tag="author"
ORDER BY r2.pre;

DROP TABLE z1234567_book_attr;

DROP TABLE z1234567_book_tbl;

You can try out your outputs by running them inside of mysql. For instance, if you put your output in "test.sql", then on a CSE machine:
mysql -h cs4317.srvr -u stu1 -pstu1pass studentdb1 < test.sql
For the above example this will produce "pre 4", which is the correct query answer.
Use the following logins for mysql on CSE machines:

server: cs4317.srvr
database: studentdb1
user: stu1
password: stu1pass

More Examples

Note that the wildcard ("*") only selects element nodes, and NOT text nodes.
Thus, you have to explicitly check in the query that nodes slected by "*" are element nodes.
For instance, if we translate the query "//*" over the document of above, then we should obtain this query:
SELECT DISTINCT r2.pre FROM book_tbl r1, book_tbl r2
WHERE r1.pre=0
AND r2.pre>r1.pre
AND r2.post<r1.post
AND r2.tag IS NOT NULL
ORDER BY r2.pre;
It is not important that you produce exactly this query, but when executed then the query should produce the correct pre-values.
For instance, executing the above query gives:

> XPath2Rel book.xml "//*" | mysql -h cs4317.srvr -u stu1 -pstu1pass studentdb1
pre
1
2
4
5
7
9
11
>

Here is an example that uses // and preceding. Remember, you are allowed to generated SQL queries different from the one below, as long as the results are the same!

XPath: "//a//b//preceding::*//d//preceding::e"
SELECT DISTINCT r8.pre
FROM z12345678_book_tbl r1, z12345678_book_tbl r2, z12345678_book_tbl r3, z12345678_book_tbl r4, z12345678_book_tbl r5, z12345678_book_tbl r6, z12345678_book_tbl r7, z12345678_book_tbl r8
WHERE r1.pre = 0
AND r2.pre > r1.pre AND r2.post < r1.post AND r2.tag = "a"
AND r3.pre > r2.pre AND r3.post < r2.post AND r3.tag = "b"
AND r4.pre >= r3.pre AND r4.post <= r3.post
AND r5.pre < r4.pre AND r5.post < r4.post AND r5.tag IS NOT NULL
AND r6.pre > r5.pre AND r6.post < r5.post AND r6.tag = "d"
AND r7.pre >= r6.pre AND r6.post <= r7.post
AND r8.pre < r7.pre AND r8.post < r7.post AND r8.tag = "e"
ORDERED BY r8.pre;

Here are some examples on book.xml document; recall, however, that your program need only
support 1-letter node tests, that is, we will not use queries as below which are using longer element names.

XPath: "//book/author[./first]"
SELECT DISTINCT r3.pre
FROM z12345678_book_tbl r1, z12345678_book_tbl r2, z12345678_book_tbl r3
WHERE r1.pre = 0
AND r2.pre > r1.pre AND r2.post < r1.post AND r2.tag = "book"
AND r3.pre > r2.pre AND r3.post < r2.post AND r3.level = r2.level+1 AND r3.tag = "author"
AND EXISTS
(SELECT DISTINCT sr1.pre
FROM z12345678_book_tbl sr1
WHERE sr1.pre > r3.pre AND sr1.post < r3.post AND sr1.level = r3.level+1 AND sr1.tag = "first")
ORDER BY r3.pre;

XPath: "//*[/book/price]"
SELECT DISTINCT r2.pre
FROM z12345678_book_tbl r1, z12345678_book_tbl r2
WHERE r1.pre = 0
AND r2.pre > r1.pre AND r2.post < r1.post AND r2.tag IS NOT NULL
AND EXISTS
(SELECT DISTINCT sr2.pre
FROM z12345678_book_tbl sr1, z12345678_book_tbl sr2
WHERE sr1.pre > r1.pre AND sr1.post < r1.post AND sr1.level = r1.level+1 AND sr1.tag = "book"
AND sr2.pre > sr1.pre AND sr2.post < sr1.post AND sr2.level = sr1.level+1 AND sr2.tag = "price")
ORDER BY r2.pre;

XPath: "//*/text()"
SELECT DISTINCT r3.pre
FROM z12345678_book_tbl r1, z12345678_book_tbl r2, z12345678_book_tbl r3
WHERE r1.pre = 0
AND r2.pre > r1.pre AND r2.post < r1.post AND r2.tag IS NOT NULL
AND r3.pre > r2.pre AND r3.post < r2.post AND r3.level = r2.level+1 AND r3.tag IS NULL
ORDER BY r3.pre;

XPath: "//*[@currency='USD']/text()"
SELECT DISTINCT r3.pre
FROM z12345678_book_tbl r1, z12345678_book_tbl r2, z12345678_book_tbl r3
WHERE r1.pre = 0
AND r2.pre > r1.pre AND r2.post < r1.post AND r2.tag IS NOT NULL
AND EXISTS
(SELECT DISTINCT sr1.pre
FROM z12345678_book_attr sr1
WHERE sr1.pre = r2.pre AND sr1.attr = "currency" AND sr1.value = 'USD')
AND r3.pre > r2.pre AND r3.post < r2.post AND r3.level = r2.level+1 AND r3.tag IS NULL
ORDER BY r3.pre;

XPath: "//*[@currency]"
SELECT DISTINCT r2.pre
FROM z12345678_book_tbl r1, z12345678_book_tbl r2
WHERE r1.pre = 0
AND r2.pre > r1.pre AND r2.post < r1.post AND r2.tag IS NOT NULL
AND EXISTS
(SELECT DISTINCT sr1.pre
FROM z12345678_book_attr sr1
WHERE sr1.pre = r2.pre AND sr1.attr = "currency")
ORDER BY r2.pre;
CRICOS Provider Number: 00098G