Relational Extensions
Index
- Introduction
- Objectives
- 1.The Object-Relational Extension
- 1.1.Background
- 1.2.The Object-Relational Model
- 1.2.1.User Defined Types (UDTs)
- 1.2.2.References between Objects
- 1.2.3.Collections
- 2.The XML Extension
- 2.1.XML Fundamentals
- 2.1.1.Well-formed XML
- 2.1.2.Namespaces
- 2.1.3.Full XML Example
- 2.1.4.Storing XML Documents in Oracle XML DB
- 2.2.XML Schema
- 2.2.1.Basic Concepts
- 2.2.2.XML Schema Root
- 2.2.3.Complex Types
- 2.2.4.Example
- 2.2.5.Simple Types
- 2.2.6.Registering an XML Schema in Oracle XML DB
- 2.3.XQuery
- 2.3.1.XPath
- 2.3.2.Queries
- 2.3.3.Comments
- 2.3.4.XQuery in Oracle XML DB
- 2.1.XML Fundamentals
- Summary
- Self-evaluation
- Answer key
- Glossary
- Bibliography
Introduction
-
Data must be organized in tables and rows.
-
Relationships between tables must be expressed with the referential integrity constraint (foreign key – candidate key relationships).
Objectives
-
Explain the historical background / needs behind object data models (ODMs) for databases.
-
Enumerate the main features an object data model must provide; similarly, name the main standard object-oriented features from SQL:1999.
-
Discuss about advantages and disavantages of using a purely relational or a purely object-oriented approach. For example, elaborate on the dichotomy attributes vs. methods, OIDs vs. PKs, REFs vs. FKs, etc.
-
Discuss the object-oriented layer implementation in Oracle, with regard to the underlying relational technology.
-
Formulate simple (i.e., basic syntax), correct SQL statements (in Oracle syntax) for the following standard object-oriented features: Row type, UDTs, inheritance, object tables, REFs and collections.
-
Justify the suitability to store objects in columns, object tables or use object views, for Oracle.
-
For a given specification, justify the suitability (at least, four reasons) of using VARRAYs instead of NESTED TABLEs for modeling multi-valued attributes (i.e., collections) in Oracle.
-
Understand the fundamentals of XML, its syntax, its underlying structure and the namespaces mechanism that avoids naming clashes.
-
Know the way to add structure to XML documents using XML Schemas. They make possible to model a domain using XML Schemas and use them to constraint the way XML documents are created to capture data for that domain.
-
Create XQuery requests to retrieve data from XML documents and also to generate output XML documents meeting the requirements for those queries.
-
Comprehend the different clauses, functions and operators that constitute a XQuery, especially XPath to build paths across XML documents to select the relevant parts from them to the XQuery at hand.
-
Be aware of how XML data is stored in a particular database, Oracle, and how XML Schema and XQuery can be used in the context of that particular database.
1.The Object-Relational Extension
1.1.Background
1.1.1.Object-Oriented Database Systems
The Object-Oriented Database System Manifesto:
M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik (December 1989). “The Object-Oriented Database System Manifesto”. In Proceedings of the First International Conference on Deductive and Object-Oriented Databases (pages 223-240). Kyoto, Japan.
-
The Golden Rules, or core features, which were divided in two categories: those coming from the object-oriented data model and those from the DBMS field,
-
Optional features, which advocated for advisable characteristics and
-
Open issues, which embraced open problems not solved back then.
-
Golden Rules: These rules define an OODBS. A system that integrates the traditional database management with the object-oriented data model.
-
OO Data Model: Data must be internally organized according to the object-oriented data model; i.e., objects as means to represent data and universally identified by an object identifier (OID). Encapsulation, in terms of functions, is mandatory as well as traditional object-oriented features such as inheritance and extensibility, overriding / overloading operators and late binding. All in all, providing computational completeness (i.e., any algorithm can be implemented following this paradigm).
-
DBMS: A database management system should provide data persistence and therefore, it must deal with issues such as secondary storage management, concurrency, recovery and ad hoc query facilities.
-
-
Optional features: These features are considered to be appealing, but the manifesto acknowledges the difficulty to provide them so they are categorized as optional. It includes multiple inheritance, type checking and type inferring, data distribution (in the sense of distributed database management systems), transactions and versioning (any modification and not only the last version must be stored).
-
Finally, open features refer to those features still representing a challenge for the community, mainly because of the lack of standards. For example, several object-oriented languages co-existed or were developed back then. The manifesto claimed that it was not possible to deploy a different OODBS for each existing high-level language. Thus, some agreement should be reached to provide uniformity. In the manifesto, they identify three main issues on which to agree; a programming paradigm, an internal representation system and a type system. These open issues were never solved, however, as discussed in section 1.1.4.
The Golden Rules |
Optional |
Open |
|
---|---|---|---|
OO Data Model |
DBMS |
||
Complex Objects |
Persistence |
Multiple Inheritance |
Programming Paradigm |
Object Identity |
Secondary Storage Management |
Type Checking and Inferring |
Representation System |
Encapsulation |
Concurrency |
Distribution |
Type System |
Types / Classes |
Recovery |
Design Transactions |
Uniformity |
Inheritance |
Ad hoc Query Facility |
Versions |
|
Overriding, Overloading, Late Binding |
|||
Extensibility |
|||
Computational Completeness |
1.1.2.Object-Relational Database Systems
-
Traditional DBMS services and support for richer object structures and rules (i.e., integrity constraints),
-
Must subsume second generation DBMSs and
-
Must be open to other subsystems.
Object and Rule Management |
DBMS Function |
Towards an Open System |
---|---|---|
Rich Type System |
Non-procedural, high-level access language: SQL |
Accessible from multiple high-level languages |
Multiple Inheritance |
Suppport collections: enumeration of members or using the query language |
Enhancement of DBMS – programming language interfaces |
Encapsulation |
Queries and answers as the lowest level of communication (client / server) |
|
OIDs and PKs |
Updatable views |
|
Rules Enforcement |
Data independence |
-
Object and rule management: These features extend the relational model to deal with objects and rules (i.e., more expressive integrity constraints). Basically, they combine relational and object-oriented features. For example, they mainly talk about rich types, although objects are also supported and consequently, encapsulation. Note, however, that multiple inheritance is mandatory and, interestingly, object identifiers (OIDs) are allowed as an alternative to primary keys. Rule management was dealt as a first-class citizen in order to deal with richer semantics. Accordingly, additional means (to those already provided by the relational model) to enforce constraints are identified as mandatory (this point was the seed for triggers and procedures, which were not available for relational databases at that time).
-
DBMS Function: This set of features extends those principles upon which relational database systems were built. From the database point of view, a non-procedural high-level language should be available to access the database (i.e., SQL) instead of a procedural one (as claimed in the OODBSs manifesto). For better or worse, they say, a non-procedural high-level access language is needed, and cannot be replaced by object-oriented programming languages. Their ultimate objective behind this statement was preserving data independence. However, back then, SQL was heavily criticized for not being computationally complete. In this sense, the ORDBS manifesto proposed additional features to complement SQL; namely: updatable views (in their own words, dynamic views) and collections (to deal with procedural arrays).
-
Towards an open system: By an open system they meant that any application (i.e., high-level languages) should be able to access the database, but always through programming interfaces (this was the seed for what nowadays is known as high-level language connectors; e.g., ODBC or JDBC). SQL remains as the only access language and, again, they recall us that high-level languages should not directly access the physical structures of the database (to preserve data independence). Instead, they should query the database and receive an answer by means of SQL.
1.1.3.Object-Oriented Data Model vs. Object-Relational Data Model
-
Complex Objects, Type Classes, Extensibility etc. vs. Rich Type System: Concepts from the ORDM parallel those in OODBs, with the notions of user-defined data types, object tables formed from user-defined types, hierarchies of user-defined types and object tables, rows of object tables with internal object identifiers, and relationships between object tables that use object identifiers as references. However, in a object-relational database all these concepts are finally mapped to the relational structure. Thus, they are introduced as a relational extension.
-
Inheritance vs. Multiple Inheritance: Inheritance is essential for the OODM, but the ORDM goes even further by asking for multiple inheritance, whereas supporting multiple inheritance is not mandatory for OODBS (see section 1.1.1).
-
Encapsulation: Both models fully agree on the need for encapsulation, which refers to the fact that class methods must also be stored in the database, altogether with the object itself.
-
Object Identity (OID) vs. OIDs and PKs: Whereas the OODM claims for the necessity of OIDs and refer to them by means of object pointers, the ORDM claims for the co-existence of OIDs and object references with primary keys and foreign keys. It must be up to the database designer to decide which mechanism better suits their necessities.
-
DBMS main features (persistence, concurrency, recovery etc.): There is no argument about what a database management system means, and both models advocate for the same principles.
-
HLLs Computational Complexity vs. SQL, Constraints enforcement, updatable views and HLLs Interfaces: High-level languages (HLLs in short) are computationally complete (i.e., you can implement any algorithm), whereas SQL by itself was not. The ORDM proposes to complement SQL with constraints enforcement (basically, triggers and procedures), updatable views (to support dynamic views over data) and also HLLs interfaces to access data (but always through a query-answer manner). Furthermore, people behind the third generation database system manifesto proposed this solution because a non-procedural high-level language untied to physical structures (such as SQL) was mandatory to preserve data independence.
-
Open Systems: By this claim, both models argue that databases must be able to communicate with other systems or applications.
-
OODBSs vs. ORDBSs: All in all, the OODM is supported by OODBSs, whereas the ORDM is supported by ORDBSs, being incompatible with each other.
1.1.4.Object-Oriented Database Systems Today
-
They lacked a strong theoretical framework (like the set theory behind the relational model) behind the OODM,
-
And because OODBSs bloomed with an incredibly strong experimental activity, with many systems and proposals being around but with no de facto standard (instead, relational systems were supported by major relational vendors).
1.1.5.Object-Relational Database Systems Today
-
LOB (Binary Large Object) type: the LOB type was conceived as a “relational” feature to support object-oriented databases. Internally, LOB serializes objects as byte streams.
-
Row type (ROW): The row type allows a column to contain several attributes and, thus, violates the first normal form. These attributes are accessed by means of the dot notation (e.g., person.age). Relevantly, ROWs cannot be references not reused externally, which makes them useless.
-
User defined types (UDTs): UDTs were introduced in two different ways, as distinct types and structured types. Importantly, both are types, not objects. Distinct types attach semantics to already built-in datatypes (e.g., CREATE DISTINCT TYPE age AS Integer) but they do neither provide inheritance nor methods. Structured types (most commonly known as user defined types in databases) were thought to support objects, as discussed later in section 1.2.
-
Typed tables: Tables where each row is an object (i.e., an instance of a UDT).
-
Inheritance: Inheritance is allowed at UDT level. However, multiple inheritance (as claimed in the manifesto) is not supported.
-
REF Type and object pointers, as an alternative to primary and foreign keys.
-
Collection type (ARRAY): This collection type implements an array that could contain multiple values.
1.2.The Object-Relational Model
1.2.1.User Defined Types (UDTs)
-
An extra column to store the mandatory OID and
-
Extra space to store NULL pointers (see section 1.2.2.).
CREATE TYPE name AS OBJECT ( List of attributes, List of procedure specifications, List of function specifications );
CREATE TYPE fullname AS OBJECT ( given VARCHAR2(10), surname VARCHAR2(30), initials VARCHAR(5), MEMBER PROCEDURE generate_full_name (SELF IN OUT NOCOPY fullname), MAP MEMBER FUNCTION get_id RETURN VARCHAR2 );
CREATE TYPE BODY name AS List of subprogram declarations END;
CREATE TYPE BODY fullname AS MAP MEMBER FUNCTION get_id RETURN VARCHAR2 IS BEGIN RETURN given || ' ' || surname; END; MEMBER PROCEDURE generate_fullname (SELF IN OUT NOCOPY fullname) IS BEGIN DBMS_OUTPUT.PUT_LINE(given || ' ' || initials || ' ' || surname); END; END;
CREATE TYPE person AS OBJECT ( name VARCHAR2(20), realName fullname); CREATE TYPE release AS OBJECT ( artist person, title VARCHAR2(30) );
CREATE TABLE my_songs ( song release, rating NUMBER );
INSERT INTO my_songs VALUES ( release (person ('Adele', fullname ('Adele','Atkins','L.B.') ),'Chasing Pavements'), 10);
SELECT s.song.artist.realname.get_id(), s.rating FROM my_songs s WHERE s.rating > 8 ORDER BY s.rating;
Type inheritance
CREATE TYPE name UNDER parentType ( List of attributes, List of procedure specifications, List of function specifications ) [NOT FINAL, NOT INSTANTIABLE];
CREATE TYPE artist UNDER person ( country VARCHAR2(10), genre VARCHAR(20) );
Type evolution
-
Add / drop attributes or methods from an object type,
-
Modify a numeric attribute (length, precision or scale) or a varying length character attribute length or
-
Change a type's FINAL and INSTANTIABLE properties.
ALTER TYPE artist ADD ATTRIBUTE ( subgenre VARCHAR(20)) CASCADE;
Object tables
CREATE TABLE name OF type;
CREATE TYPE single UNDER release ( album VARCHAR2(20), releaseDate DATE, chartPosition NUMBER, MEMBER PROCEDURE display_details ); CREATE TABLE singles OF single; INSERT INTO singles VALUES ( single (artist ('Adele', fullname ('Adele','Atkins', 'L.B.')'UK','soul','none'),'Chasing Pavements', '19', '11-01-2008', 2) );
//SQL SELECT VALUE(s) FROM singles s WHERE s.artist.name = 'Adele'; // PL/SQL block DECLARE song single; BEGIN SELECT VALUE(s) INTO song FROM singles s WHERE s.person.name = 'Adele'; s.display_details; END;
Object views
CREATE TABLE songs ( artist VARCHAR2(30), title VARCHAR2(30), album VARCHAR2(30), length NUMBER(3,2), genre VARCHAR(10), subgenre VARCHAR(10) );
CREATE TYPE adele_song AS OBJECT ( title VARCHAR2(30), album VARCHAR2(30), length NUMBER(3,2) );
CREATE VIEW name OF type AS SELECT statement;
CREATE VIEW adele_songs OF adele_song WITH OBJECT IDENTIFIER (title, album) AS SELECT s.title, s.album, s.length FROM songs s WHERE s.artist = 'Adele';
Practical Issues on UDTs
-
You can benefit from object-oriented features while continuing to work with most of your data relationally (object views), or
-
You can entirely go over to an object-oriented approach (i.e., object tables and object views).
-
The OID is always 16 bytes long. The primary key has an ad hoc size depending on the built-in type used to implement it. For example, an integer is usually 4 bytes long.
-
The primary key provides semantics regarding the scenario. For example, an ID card has clear semantics. However, an OID is an artificial identifier, with no semantics at all.
-
The OID is guaranteed to be unique. The primary key, however, could just be unique in terms of our database. For example, the uniqueness of a given ID card from Spain cannot be guaranteed if other countries are considered (i.e., another country could have issued the same ID number).
1.2.2.References between Objects
-
The OID of the referenced object (16 bytes long) and
-
The OID of the table or view containing that object (16 bytes long).
CREATE TYPE artist UNDER person ( country VARCHAR2(10), genre VARCHAR(20) ); CREATE TABLE artists OF artist; CREATE TABLE single ( artist_ref REF artist SCOPE IS artists, title VARCHAR2(20), album VARCHAR2(20) ); INSERT INTO single SELECT REF(a),'Chasing Pavements','19' FROM artists a WHERE a.name = 'Adele';
DECLARE artist_ref REF artist; BEGIN SELECT REF(a) INTO artist_ref FROM artists a WHERE a.name = 'Adele'; END;
SELECT DEREF(s.artist_ref), s.title FROM single s;
SELECT s.artist_ref.name, s.title FROM singles s WHERE s.album = '19';
-
The referred object has been deleted from the database (this is possible because REFs only guarantee that the referenced table does exist) or
-
By revoking privileges.
CREATE TABLE single ( artist_ref REF artist REFERENCES artists, title VARCHAR2(20), album VARCHAR2(20) );
Practical issues on REFs
1.2.3.Collections
VARRAY
CREATE TYPE name AS VARRAY(limit) OF datatype | type;
CREATE TYPE list_of_songs AS VARRAY(100) OF VARCHAR2(30); CREATE TABLE adele_fans ( fan_name VARCHAR2(30), favourite_songs list_of_songs );
CREATE TYPE song AS OBJECT ( artist VARCHAR2(30), title VARCHAR2(30), rating NUMBER(2) ); CREATE TYPE array_of_songs AS VARRAY(100) OF song; CREATE TABLE adele_fans ( fan_name VARCHAR(30), favourite_songs array_of_songs );
INSERT INTO adele_fans VALUES ('John Smith', array_of_songs( song ('Adele', 'Chasing Pavements', 8), song ('Creedence Clearwater Revival ', 'Susie Q', 9), song ('The Eagles', 'Hotel California', 10)) );
ALTER TYPE array_of_songs MODIFY LIMIT 200 CASCADE;
Nested Tables
CREATE TYPE object table name AS TABLE OF type; CREATE TABLE name ( List of attributes, List of nested tables, ) NESTED TABLE nested table column STORE AS nested table name;
CREATE TYPE single AS OBJECT ( title VARCHAR2(30), album VARCHAR2(20), ); CREATE TYPE single_table AS TABLE OF single; CREATE TABLE singers ( name VARCHAR2(30), releasedSingles single_table ) NESTED TABLE releasedSingles STORE AS single_nt;
INSERT INTO singers (name) VALUES ('Adele'); UPDATE singers SET releasedSingles = single_table ( single('Chasing Pavements', '19'), single('Daydreamer', '19'), single('Cold Shoulder', '19') ) WHERE name = 'Adele';
CREATE TYPE single ( title VARCHAR2(30), album VARCHAR2(20), ); CREATE TYPE single_table AS TABLE OF single; CREATE TABLE singers ( name VARCHAR2(30), releasedSingles single_table DEFAULT single_table() ) NESTED TABLE releasedSingles STORE AS single_nt; INSERT INTO singers (name) VALUES ('Adele');
CREATE INDEX album_idx ON single_nt(album);
CREATE TYPE single ( title VARCHAR2(30), album VARCHAR2(20), ); CREATE TYPE single_table AS TABLE OF single; CREATE TYPE award ( name VARCHAR2(30), category VARCHAR2(5), ); CREATE TYPE award_table AS TABLE OF award; CREATE TABLE singer ( name VARCHAR2(25), singles single_table, awards award_table ) NESTED TABLE singles STORE AS singer_singles_nt NESTED TABLE awards STORE AS singer_awards_nt;
Practical Issues on Collections
-
The VARRAY size depends on the number of elements it can hold (specifically, number of elements * size + overhead, where overhead are NULL values) and it is always stored as RAW data.
-
According to the LIMIT value defined, the VARRAY is either stored inline or in LOBs.
-
If the whole collection is manipulated at once, it behaves much better than nested tables (it is fetched at once, unlike rows in a nested table).
-
It is exactly stored as a relation.
-
If the nested table has a primary key, it is organized as an index-organized table (IOT).
Relational |
Object-relational |
||
---|---|---|---|
Per column |
Per row |
VARRAY |
Nested Tables |
Fixed number of values |
Variable number of values |
Fixed number of values |
Variable number of values |
Few values |
Many values |
Many values (LIMIT required) |
Many values |
Generates nulls |
There are no null values |
One null |
There are no null values |
One I/O |
Many I/O |
One I/O |
Many I/O |
Global processing |
Partial processing |
Global processing |
Partial processing |
Natural PK |
Artificial PK |
Natural PK + Indexes |
OIDs |
Less space |
More space |
Less space |
More space |
Hard to aggregate |
Easy to aggregate |
Hard to aggregate (Methods/Extensibility) |
Easy to aggregate |
Many CHECKs |
One CHECK |
No CHECKs |
One CHECK (not for REFs) |
Lower concurrency |
Higher concurrency |
Lower concurrency |
Higher concurrency |
-
The first one tells us whether the chosen implementation accepts a variable number of values or not. For example, a column-oriented relational implementation and VARRAYs need to know, at design time, the number of elements to be stored as maximum. We can obviously modify it later (by means of an ALTER TABLE and an ALTER TYPE, respectively) but the other two options can handle this issue dynamically.
-
The second row tells us if, compared to the other approaches, that implementation can store a few or many values. The relational column-oriented solution is clearly the less flexible according to this criterion, because we need to define as many columns as elements in the collection. VARRAYs could eventually store large numbers of values, but the maximum number of values to store have to be expressed at definition time by means of the LIMIT keyword.
-
The third row reflects if NULLs are generated. The relational row-oriented implementation and nested tables do not generate NULLs at all, but the first one does, if we do not straightforward fill the whole collection.
-
The fourth row accounts for the number of I/O operations needed to retrieve the whole collection. Column-oriented solutions are able to read the whole collection with one I/O. If the collection is stored as row-oriented, we might need to read many blocks to retrieve all the rows. Note, however, that the VARRAY type is moved out of the table when it gets too large and needs to be stored as a LOB. Therefore, it is not true anymore that a single I/O operation retrieves the whole collection.
-
The fifth row tells us if this solution is appropriate to deal with all the elements at the same time or if it is better suited for dealing with partial collections. Column-oriented solutions perform better when reading the whole collection at once, and row-oriented solutions deal better with partial processing.
-
The sixth row tells us what kind of identifier we can use in each solution. In the first case, nothing prevents us to add a new column containing a natural primary key for such collection. Later, we can access each element according to its own name, which is, again a natural identifier. VARRAYs behave similarly, except for the use of indexes to access the elements. Implementing the collection per rows in a relational table means that we need to come up with an artificial primary key (for example, surrogates) to identify each element. Nested tables, however, use OIDs.
-
The seventh row tells us the amount of space to be used: inline solutions (i.e., column-oriented) are cheaper, because no extra space is needed.
-
The eighth row tells us whether performing aggregations over the collection is easy or not. Row-oriented solutions behave better in this case, because we can solve the aggregation by means of a SQL statement. In column-oriented ones we should iterate the collection over instead.
-
The ninth row focuses on enforcing constraints. Thus, it tells us how many CHECKs we do need to enforce a constraint over all the elements of the collection. In row-oriented approaches we can use a single CHECK to be enforced in all rows, whereas column-oriented ones require a CHECK for each element in the collection.
-
Finally, the last row tells us the degree of concurrency provided by such solution. In this sense, row-oriented solutions benefit from row-locking (thus, we only lock one element of the collection). Column-oriented ones block the whole collection when accessing a single element, because the whole collection is retrieved at once.
2.The XML Extension
2.1.XML Fundamentals
-
Elements with just text:
<a> Text </a>
-
Elements containing other elements and attributes:
<a atr="val"> <b> </b> </a>
-
Elements with attributes, text and subelements:
<a atr="val"> <b> </b> text </a>
-
Empty elements, which can be written down in compact form like:
<a/>
<artist type="Person" name="Adele"> </artist>
<artist> <type>Person</type> <name>Adele</name> </artist>
-
Attribute for values without their own identity, for instance age, or
-
Subelement for values with their own identity, for instance date of birth.
-
Attribute for metadata or descriptive information about content. For instance the length of a piece of content or its language, or
-
Subelement for content, for instance a title.
<artist type="Person"> <name>Adele</name> </artist>
-
A piece of XML is marked as such by starting with the following expression:
<?xml version="1.0" encoding="UTF-8"?>
-
In addition to the XML version, it also specifies the encoding used to codify characters, which is especially relevant for those specific to a particular language. XML supports the following codification schemes:
UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, from ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS and EUC-JP. ISO-8859-1 is the typical choice for “Western European” languages. However for broader coverage XML defaults to UTF-8, which is the common choice.
2.1.1.Well-formed XML
-
It contains only properly encoded legal Unicode characters.
-
None of the special syntax characters such as '<' and '&' appear except when performing their mark-up delineation roles.
When you need any of these special characters but you do not want them to be interpreted as part of the XML syntax, as part of the content you put between tags or in attributes, use the corresponding entities as shown in Code 3. They are replaced with their corresponding character after the XML syntax has been processed.
Code 3. XML entities for XML special characters
Character Entity & & < < > > ' ' " "
For instance, if you want to include a piece of XML inside the content of an element but it should not be interpreted as XML, you can replace all special characters with the corresponding entities, like in Code 4.
Code 4. Encoding XML syntax to include it in element content
<example> <artist type="Person"> Adele </artist> </example>
Another alternative is to use the <![CDATA[…]]> construct to mark a set of characters that should not be interpreted as XML markup. An alternative to Code 4 based on this option is shown in Code 5.
Code 5. Encoding XML syntax to include it in element content using CDATA region
<example> <![CDATA[ <artist type="Person"> Adele </artist>]> </example>
-
The beginning, end and empty-element tags that mark the elements are correctly nested, with none missing and none overlapping, i.e. not as shown in Code 6.
Code 6. Examples of missing name closing element and overlapping artist and name elements
<artist><name>Adele</artist> <artist><name>Adele</artist></name>
-
The element tags are case-sensitive; the beginning and end tags must match exactly.
-
Tag names cannot contain any of the characters " # $ % & ' ( ) * + , / ; < = > ? @ [ \ ] ^ ` { | } ~, nor a space character, and cannot start with - . or a numeral.
-
There is a single root element that contains all the other elements, which might be nested. Altogether, the XML syntax encodes an underlying tree data structure, as shown in Figure 2 for the XML syntax in Code 7.
Codi 7. A piece of XML syntax that corresponds to the tree structure in Figure 4
<metadata> <artist type="Person"> <name>Adele</name> <full-name> Adele<initials>L.B.</initials>Adkins </full-name> </artist> <release> <title>Rolling in the Deep</title> </release> </metadata>
2.1.2.Namespaces
<metadata xmlns="https://musicbrainz.org/ns/mmd-2.0#" xmlns:custom="https://www.uoc.edu/subjects/adb/ns/custom#"> <artist type="Person"> <name>Adele</name> <full-name> Adele<initials>L.B.</initials>Adkins </full-name> </artist> <release> <title>Rolling in the Deep</title> <custom:metadata>Must buy</custom:metadata> </release> </metadata>
2.1.3.Full XML Example
<?xml version="1.0" encoding="UTF-8"?> <metadata xmlns="https://musicbrainz.org/ns/mmd-2.0#" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://musicbrainz.org/ns/mmd-2.0# musicbrainz_mmd-2.0.xsd"> <release id="79ef6f41-51a3-3ee5-8b2f-7347de023e30"> <title>Rolling in the Deep</title> <status>Official</status> <quality>normal</quality> <text-representation> <language>eng</language> <script>Latn</script> </text-representation> <artist-credit> <name-credit> <artist id="cC2c9c3c-b7bc-4b8b-84d8-4fbd8779e493"> <name>Adele</name> <sort-name>Adele</sort-name> <disambiguation>UK Soul/Jazz singer</disambiguation> </artist> </name-credit> </artist-credit> <date>2011-01-17</date> <country>GB</country> <barcode>634904152123</barcode> <asin>B004DK49WI</asin> <medium-list count="1"> <medium> <position>1</position> <track-list count="2" offset="0"> <track> <position>1</position> <length>229706</length> <recording id="1a13c710-4b7e-4701-8968-cd61f2e58110"> <title>Rolling in the Deep</title> <length>229000</length> </recording> </track> <track> <position>2</position> <title>If It Hadn't Been For Love</title> <length>186933</length> <recording id="addd7af6-36c5-4626-a623-aC23b8bc3d2e"> <title>If It Hadn't Been for Love</title> <length>188000</length> </recording> </track> </track-list> </medium> </medium-list> </release> </metadata>
2.1.4.Storing XML Documents in Oracle XML DB
-
Unstructured storage provides highest throughput when inserting and retrieving entire XML documents. It also provides the greatest degree of flexibility in terms of the structure of the XML that can be stored. These throughput and flexibility benefits come at the expense of less performance when working with documents at a finer granularity level. There is little the database can do to optimize queries or updates on XML stored using a Character Large Object (CLOB), Binary Large Object (BLOB), Binary File (BFILE), or VARCHAR column.
-
Structured storage is based on the XMLType, a new datatype that makes the database aware that XML is being stored. It has a number of advantages, including optimized memory management, reduced storage requirements, B-tree indexing that optimize XPath queries and in-place updates. These advantages are at a cost of a greater processing overhead during ingestion and retrieval and reduced flexibility in terms of the structure of the XML.
DECLARE res BOOLEAN; BEGIN res := DBMS_XDB.createResource('adele-releasegroups.xml', HTTPURITYPE.createuri( 'https://musicbrainz.org/ws/2/artist/cC2c9c3c-b7bc-4b8b-84d8- 4fbd8779e493?inc=release-groups').getClob()); END; /
CREATE TABLE table1 OF XMLType; INSERT INTO table1 VALUES (XMLType(HTTPURITYPE.createuri( 'https://musicbrainz.org/ws/2/artist/cC2c9c3c-b7bc-4b8b-84d8- 4fbd8779e493?inc=release-groups').getXML()));
2.2.XML Schema
-
DTD (Document Type Definition): this is the first standardized way of doing so. It is simpler but less expressive so most of the modern standards based on XML, which define a XML vocabulary to build meaningful documents in the particular application domain of the standard, use the next alternative instead.
-
XML Schema: this second option is more complex than DTD but it is more expressive and thus allows defining vocabularies that better capture the particularities of the application domain. We will concentrate on this alternative, which is detailed in the next section.
2.2.1.Basic Concepts
2.2.2.XML Schema Root
<xsd:schema xmlns:xsd="https://www.w3.org/2001/XMLSchema" targetNamespace="https://musicbrainz.org/ns/mmd-2.0#" xmlns:ext="https://musicbrainz.org/ns/ext#-2.0" xmlns:mmd-2.0="https://musicbrainz.org/ns/mmd-2.0#">
<xsd:import schemaLocation="local.xsd"/> <xsd:import namespace="https://musicbrainz.org/ns/ext#-2.0" schemaLocation="extensions.xsd"/>
2.2.3.Complex Types
-
Sequence: the subelements for the complex type structure defined inside of a sequence should appear in the appearing order in instance XML documents.
-
All: the collection of subelements must appear, but the order is not relevant, they might appear in the instance XML documents in a order different from the order they appear in the XML Schema complex type.
-
Choice: from the collection of subelements in the choice just one will be chosen in the instance XML documents.
2.2.4.Example
<xsd:complexType name="name-creditType"> <xsd:sequence> <xsd:element minOccurs="0" name="name" type="xsd:string"/> ....<xsd:element ref="mmd-2.0:artist"/> </xsd:sequence> <xsd:attribute name="joinphrase" ....type="xsd:string" default=" & "/> </xsd:complexType>
<xsd:element name="artist-credit"> <xsd:complexType> ....<xsd:sequence> .... <xsd:element maxOccurs="unbounded" .... name="mmd-2.0:name-credit" .... type="mmd-2.0:name-creditType"/> ....</xsd:sequence> </xsd:complexType> </xsd:element>
2.2.5.Simple Types
Simple Types |
Examples (delimited by commas) |
Notes |
---|---|---|
string |
Confirm this is electric |
-- |
normalizedString |
Confirm this is electric |
Newline, tab and carriage-return characters are converted to space characters |
token |
Confirm this is electric |
As normalizedString, and adjacent space characters are collapsed to a single space character, and leading and trailing spaces are removed. |
byte |
-1, 126 |
|
unsignedByte |
0, 126 |
|
base64Binary |
GpM7 |
|
hexBinary |
0FB7 |
|
integer, int |
-126789, -1, 0, 1, 126789 |
|
positiveInteger |
1, 126789 |
|
negativeInteger |
-126789, -1 |
|
nonNegativeInteger |
0, 1, 126789 |
|
nonPositiveInteger |
-126789, -1, 0 |
|
unsignedInt |
0, 1267896754 |
|
long |
-1, 12678967543233 |
|
unsignedLong |
0, 12678967543233 |
|
short |
-1, 12678 |
|
unsignedShort |
0, 12678 |
|
decimal |
-1.23, 0, 123.4, 1000.00 |
|
float |
-INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN |
equivalent to single-precision 32-bit floating point, NaN is Not a Number |
double |
-INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN |
equivalent to double-precision 64-bit floating point |
Boolean |
true, false 1, 0 |
|
time |
13:20:00.000, 13:20:00.000-05:00 |
|
dateTime |
1999-05-31T13:20:00.000-05:00 |
May 31st 1999 at 1.20pm Eastern Standard Time which is 5 hours behind Co-Ordinated Universal Time |
duration |
P1Y2M3DT10H30M12.3S |
1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds |
date |
1999-05-31 |
|
gMonth |
--05-- |
May |
gYear |
1999 |
1999 |
gYearMonth |
1999-02 |
the month of February 1999, regardless of the number of days |
gDay |
---31 |
the 31st day |
gMonthDay |
--05-31 |
every May 31st |
Name |
shipTo |
XML 1.0 Name type |
QName |
po:USAddress |
XML namespace QName |
NCName |
USAddress |
XML namespace NCName, that is, QName without the prefix and colon |
anyURI |
https://example.com/doc.html#ID5 |
|
language |
en-GB, en-US, fr |
valid values for xml:lang as defined in XML 1.0 that specify the language |
ID, IDREF, IDREFS, |
XML 1.0 ID attribute type |
|
NMTOKEN |
US, Canada |
XML 1.0 NMTOKEN attribute type, a normalised string without spaces |
NMTOKENS |
US UK, Canada Mexique |
XML 1.0 NMTOKENS attribute type, that is, a whitespace separated list of NMTOKEN values |
-
Range: this restriction can be applied to simple types derived from numeric or temporal types. The lower limit of the range is defined using the minInclusive or minExclusive subelement and the value attribute, whose value is included in the range if inclusive or not if exclusive. For the upper limit the subelements are maxInclusive and maxExclusive.
For instance, to define a new simple type for the range of integers between 10 and 99, both included in the range, the new simple type is derived from integer and the minInclusive facet is set to 10 while the maxInclusive one is set to 99:
<xsd:simpleType name="myInteger"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="10"/> <xsd:maxInclusive value="99"/> </xsd:restriction> </xsd:simpleType>
-
Pattern: this restriction allows defining a regular expression that constraints the desired combinations of characters from the simple type string. The constraint is set using a facet called pattern in conjunction with the regular expression [A-Z]{2} that is read "from the range of upper-case letters between A and Z, take two of them".
<!-- A two-letter country code like 'DE', 'UK',... so it does not include 'SPA' or 'USA' --> <xsd:simpleType name="def_iso-3166"> <xsd:restriction base="xsd:string"> <xsd:pattern value="[A-Z]{2}"/> </xsd:restriction> </xsd:simpleType>
Another example of pattern restriction from the MusicBrainz schema is about defining a simple type for ISWC (the international standard for identifying music creations):
<!-- An ISWC code: C - single-letter prefix character NNN.NNN.NNN - 9-digits separated by "." grouped 3x3 C - check digit --> <xsd:simpleType name="def_iswc"> <xsd:restriction base="xsd:string"> <xsd:pattern value="[A-Z]-[0-9]{3}\.[0-9]{3}\.[0-9]{3}-[0-9]"/> </xsd:restriction> </xsd:simpleType>
-
Enumeration: it can be used to constrain the values of almost every simple type, except the boolean type. The enumeration facet limits the new type to a set of distinct values. For example, the quality levels, derived from NMTOKEN (a normalized string without spaces), whose value must be low, normal or high:
<xsd:simpleType name="def_quality"> <xsd:restriction base="xsd:NMTOKEN"> <xsd:enumeration value="low"/> <xsd:enumeration value="normal"/> <xsd:enumeration value="high"/> </xsd:restriction> </xsd:simpleType>
-
Length: another way to limit the set of possible values for a simple type is by restricting their length to a particular quantity or to define a minimum or maximum length. This is done using the length, minLength and maxLength restriction facet elements.
<xsd:element name="password"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:minLength value="5"/> <xsd:maxLength value="8"/> </xs:restriction> </xsd:simpleType> </xsd:element>
-
NMTOKENS
-
IDREFS
-
ENTITIES
<xsd:element name="listOfLengths"> <xsd:simpleType> <xsd:list itemType="myInteger"/> </xsd:simpleType> </xsd:element>
<listOfLengths>189 187 191</listOfMyInt>
<!-- Two-letter country code like 'DE', 'UK', 'FR' etc. --> <xsd:simpleType name="def_iso-3166"> <xsd:restriction base="xsd:string"> <xsd:pattern value="[A-Z]{2}"/> </xsd:restriction> </xsd:simpleType> <!-- Two-letter country code followed by a 3 letter subdivision --> <xsd:simpleType name="def_iso-3166-2"> <xsd:restriction base="xsd:string"> <xsd:pattern value="[A-Z]{2}\-[A-Z]{2}"/> </xsd:restriction> </xsd:simpleType> <!-- The union of the previous simple types --> <xsd:simpleType name="countryCodes"> <xsd:union memberTypes="def_iso-3166 def_iso-3166-2"/> </xsd:simpleType>
2.2.6.Registering an XML Schema in Oracle XML DB
BEGIN DBMS_XMLSCHEMA.registerSchema( SCHEMAURL => 'https://musicbrainz.org/ns/mmd-2.0#', SCHEMADOC => bfilename('examplesDir', 'musicbrainz_mmd-2.0.xsd') ); END; /
-
The database will ensure that only XML documents that validate against the XML schema can be stored in the column or table.
-
Since the contents of the table or column conform to a known XML structure, Oracle XML DB can use the information contained in the XML schema to provide more intelligent query and update processing of the XML.
-
Constraining the XMLType to an XML schema provides the option of storing the content of the document using structured-storage techniques. Structured storage decomposes the content of the XML document and stores it as a set of SQL objects rather than simply storing the document in an unstructured way, such as text in a CLOB.
2.3.XQuery
2.3.1.XPath
-
A syntax for defining the parts of an XML document.
-
A set of expressions that select parts of an XML document.
-
A set of standard functions for manipulating strings, numbers, dates etc.
-
A sequence of nodes
-
A boolean value, true or false
-
A number
-
A string of characters
Path Expressions
step1/step2/...
/step1/step2/...
axe::nodetest[predicate1][predicate2]...
$doc/child::mmd:metadata/child::mmd:artist/ child::mmd:name/child::text()
declare namespace mmd="https://musicbrainz.org/ns/mmd-2.0#"; declare variable $doc := doc("https://musicbrainz.org/ws/2/artist/ cC2c9c3c-b7bc-4b8b-84d8-4fbd8779e493?inc=release-groups+releases");
Axes
-
ancestor: selects all ancestors (parent, grandparent etc.) of the current node.
-
ancestor-or-self: selects all ancestors (parent, grandparent etc.) of the current node and the current node itself.
-
attribute: selects all attribute nodes of the current node.
-
child: selects all children of the current node.
-
descendant: selects all descendants (children, grandchildren etc.) of the current node.
-
descendant-or-self: selects all descendants (children, grandchildren etc.) of the current node and the current node itself.
-
following: selects everything in the document after the closing tag of the current node.
-
following-sibling: selects all siblings after the current node.
-
namespace: selects all namespace nodes of the current node.
-
parent: selects the parent of the current node.
-
preceding: selects everything in the document that is before the start tag of the current node.
-
preceding-sibling: selects all siblings before the current node.
-
self: selects the current node.
-
child: this is the default axis so it can be omitted.
For instance, to get just the text contained in all name elements inside an artist node inside the metadata root element of the specified XML documents, it is possible to just build the expression:
$doc/mmd:metadata/mmd:artist/mmd:name/text()
-
attribute: it can be abbreviated as '@'.
For instance, to get the attribute named type for all artists in the XML document:
$doc/mmd:metadata/mmd:artist/@type
-
self::node(): is equivalent to a point ('.').
-
parent::node(): can be replaced with two points ('..').
-
descendant-or-self::node(): is equivalent to '//'.
For instance, to get the attribute named count for the parent node of any release element, wherever in the XML document:
$doc//mmd:release/../@count
Node Tests
-
node-name: where node-name is the actual name of the nodes to be selected. For instance release-group will select all nodes names like that.
-
node(): matches any node. It can be abbreviated using '*', for instance child::*.
-
text(): matches any text node.
-
comment(): matches any comment node.
-
element(): matches any element node.
-
attribute(): matches any attribute node.
-
attribute(price): matches any attribute whose name is price.
Predicates
Operator |
Description |
Example |
---|---|---|
| |
Computes two node-sets |
//release | //release-group |
+ |
Addition |
6 + 4 |
- |
Subtraction |
6 – 4 |
* |
Multiplication |
6 * 4 |
div |
Division |
8 div 4 |
= |
Equal |
count = 9 |
!= |
Not equal |
count != 9 |
< |
Less than |
count < 9 |
<= |
Less than or equal to |
count <= 9 |
> |
Greater than |
count > 9 |
>= |
Greater than or equal to |
count >= 9 |
or |
Or |
count < 8 or count > 10 |
and |
And |
count > 8 and count < 10 |
mod |
Modulus (division remainder) |
5 mod 2 |
$doc//mmd:release-group[@type="Album"] $doc//mmd:release-list[@count>10]
2.3.2.Queries
-
for: it links one or more variables to expressions written in XPath, creating a flow of tuples (rows) where each tuple contains the values for all the variables defined by the for.
-
let: it binds a variable to the result of an expression, which might involve variables defined by a for. There might be more than one let in an XQuery. The variable values are added to the tuples generated by a for clause or, if there is no for clause, creating a unique tuple containing these links.
-
where: it filters the tuples by removing all those that do not meet the conditions.
-
order by: it sorts the tuple stream according to the given criterion.
-
return: it is evaluated for each tuple in previously filtered and reordered stream. It builds the result XML of the query by concatenating the results of all return evaluations. It can combine XML code with XPath expressions, the latter are surrounded with '{' and '}' if mixed with XML.
Simple For Expression Example
for $rg in $doc//mmd:release-group return <release-group id="{$rg/@id}"/>
-
The for clause iterates over all release-group elements in adele-releasegroups.xml, loaded in Section 1.4, binding variable $rg to the value of each such element, in turn. That is, it iterates over release-group elements, binding $rg to each release-group, independently of where they appear in the XML. The result at this stage of query evaluation is:
<release-group type='Single' id='29716e9a-4496-4d3f-8570- 2833001cdd9e'> <title>Cold Shoulder</title> <first-release-date>2008-04-21</first-release-date> </release-group> <release-group type='Live' id='36e41dc0-2a0c-4ff7-b043- 097534d52bf6'> <title>Adele Live at the Royal Albert Hall</title> <first-release-date>2011-11-28</first-release-date> </release-group> ... <release-group type='Album' id='e4174758-d333-4a8e-a31f- dd0edd51518e'> <title>21</title> <first-release-date>2011-01-19</first-release-date> </release-group>
-
The return clause constructs release-group elements, one for each tuple. Attribute id of these elements is constructed using attribute id from the input, resulting in the following output from the query:
<release-group id='29716e9a-4496-4d3f-8570-2833001cdd9e'/> <release-group id='36e41dc0-2a0c-4ff7-b043-097534d52bf6'/> ... <release-group id='e4174758-d333-4a8e-a31f-dd0edd51518e'/>
Conditional Expression Example
for $rg in $doc//mmd:release-group where $rg/mmd:first-release-date>="2011-07-01" and $rg/mmd:first-release-date<"2012-01-01" order by $rg/mmd:title return $rg/mmd:title
-
The for clause iterates over all release-group elements in adele-releasegroups.xml, loaded in Section 1.4, binding variable $rg to the value of each such element, in turn and independently of where they appear in the XML. The results is:
<release-group type='Single' id='29716e9a-4496-4d3f-8570- 2833001cdd9e'> <title>Cold Shoulder</title> <first-release-date>2008-04-21</first-release-date> </release-group> <release-group type='Live' id='36e41dc0-2a0c-4ff7-b043- 097534d52bf6'> <title>Adele Live at the Royal Albert Hall</title> <first-release-date>2011-11-28</first-release-date> </release-group> ... <release-group type='Album' id='e4174758-d333-4a8e-a31f- dd0edd51518e'> <title>21</title> <first-release-date>2011-01-19</first-release-date> </release-group>
-
The where clause filters the tuple stream of release groups, keeping only tuples with a subelement first-release-date greater or equal than 2011-07-01 and smaller than 2012-01-01. The result after filtering is:
<release-group type='Live' id='36e41dc0-2a0c-4ff7-b043- 097534d52bf6'> <title>Adele Live at the Royal Albert Hall</title> <first-release-date>2011-11-28</first-release-date> </release-group> <release-group type='Live' id='763f800f-4284-432b-b056- 7f6e0aa26bfe'> <title>iTunes Festival: London 2011</title> <first-release-date>2011-07-13</first-release-date> </release-group> <release-group type='Single' id='7C2071cb-598d-4a0c-b1d5- a53e2cb9b5f8'> <title>Set Fire to the Rain</title> <first-release-date>2011-07-04</first-release-date> </release-group>
-
The order by clause sorts the filtered tuple stream by the value of the subelement title in ascending order, the default, and upper-case before lower-case. The previous output is consequently ordered resuting in:
<release-group type='Live' id='36e41dc0-2a0c-4ff7-b043- 097534d52bf6'> <title>Adele Live at the Royal Albert Hall</title> <first-release-date>2011-11-28</first-release-date> </release-group> <release-group type='Single' id='7C2071cb-598d-4a0c-b1d5- a53e2cb9b5f8'> <title>Set Fire to the Rain</title> <first-release-date>2011-07-04</first-release-date> </release-group> <release-group type='Live' id='763f800f-4284-432b-b056- 7f6e0aa26bfe'> <title>iTunes Festival: London 2011</title> <first-release-date>2011-07-13</first-release-date> </release-group>
-
Finally, the return clause concatenates at the output the title elements for all the filtered tuples and in the appropriate order. The final output for the query is:
<title>Adele Live at the Royal Albert Hall</title> <title>Set Fire to the Rain</title> <title>iTunes Festival: London 2011</title>
Differences Between For and Let Clauses
for $t in $doc//mmd:release-group/mmd:title return <titles>{$t}</titles>
<titles><title>Cold Shoulder</title></titles> <titles><title>Adele Live at the Royal Albert Hall</title> </titles> ... <titles><title>Data on the Web</title></titles>
let $t:=$doc//mmd:release-group/mmd:title return <titles>{$t}</titles>
The result obtained is: <titles> <title>Cold Shoulder</title> <title>Adele Live at the Royal Albert Hall</title> ... <title>Data on the Web</title> </titles>
Combining For and Let
for $rg in $doc//mmd:release-group let $r:=//mmd:release[mmd:title=$rg/mmd:title] let $c:=count($r) where $c>7 return <release-count count="{$c}"> {$rg/mmd:title/text()} </release-count>
-
The for clause iterates over all release-group elements in adele-releasegroups.xml, loaded in Section 1.4, binding variable $rg to the value of each such element, in turn and independently of where they appear in the XML. The result at this stage of query evaluation is:
<release-group type='Single' id='29716e9a-4496-4d3f-8570-2833001cdd9e'> <title>Cold Shoulder</title> <first-release-date>2008-04-21</first-release-date> </release-group> ... <release-group type='Album' id='e4174758-d333-4a8e-a31f-dd0edd51518e'> <title>21</title> <first-release-date>2011-01-19</first-release-date> </release-group>
-
The let clause binds variable $r to the sequence of all of the releases whose title is equal to the title of the release group associated for that tuple to $rg (this is a join operation). Note that, unlike for, let does not iterate over values, $r is bound once per $rg value. Consequently, for each release-group in the output there is a set of release, as shown in the next table:
Together, for and let produce a stream of tuples ($rg, $r), where $rg represents a release-group and $r represents all of the releases with the same title than $rg, for each $rg.
-
The where clause filters this tuple stream, keeping only tuples with $r having 7 or more elements.
-
Finally, the return clause concatenates the output for each return clause, which combine XML code with XPath expression to mix it with the releases count and the release title. Just the release group for the album titled 19 has more than 7 releases with the same title, so the output from the previous query is:
<release-count count="8">19</release-count>
If the query contains more than one for clause or for clauses with more than one variable, the result is the Cartesian product of the involved variables. For instance, the following query explores all the combinations of release groups and releases, though just those whose titles do not match are picked.
for $rg in $doc//mmd:release-group, $r in $doc//mmd:release where $rg/mmd:title != $r/mmd:title return <different>{$rg/mmd:title, $r/mmd:title}</different>
The result contains all the combinations of titles among release groups and releases that do not have the same title:
<different> <title>Cold Shoulder</title> <title>Adele Live at the Royal Albert Hall</title> </different> <different> <title>Cold Shoulder</title> <title>Adele Live at the Royal Albert Hall</title> </different> ... <different> <title>21</title> <title>Make You Feel My Love</title> </different> <different> <title>21</title> <title>Chasing Pavements</title> </different>
Additional Conditional Expressions
for $rg in $doc//mmd:release-group return if ($rg/@type = "Album") then <album>{$rg/mmd:title/text()}</album> else <other>{$rg/mmd:title/text()}</other>
<other>Cold Shoulder</other> <other>Adele Live at the Royal Albert Hall</other> ... <album>21</album>
Existential Quantifiers
for $rg in $doc//mmd:release-group where every $r in $doc//mmd:release[mmd:title=$rg/mmd:title] satisfies ($r/mmd:status = "Official") return <all-official>{$rg/mmd:title/text()}</all-official>
<all-official>Adele Live at the Royal Albert Hall</all-official> <all-official>2011-02-25: Morning Becomes Eclectic, KCRW-FM, Santa Monica, CA, USA</all-official> ... <all-official>21</all-official>
for $rg in $doc//mmd:release-group where some $r in $doc//mmd:release[mmd:title=$rg/mmd:title] satisfies ($r/mmd:status = "Official") return <some-official>{$rg/mmd:title/text()}</some-official>
We get the following result: <some-official>Cold Shoulder</some-official> <some-official>Make You Feel My Love</some-official> <some-official>Chasing Pavements</some-official> <some-official>iTunes Live From SoHo</some-official> <some-official>19</some-official> <some-official>Hometown Glory</some-official>
Operators and Functions
-
Mathematical: +, –, ∗ , div(), mod(),…
-
Comparison: =, !=, <, >, <=, >=
-
Boolean: not(), true(), false()
-
Rounding: round(), floor(), ceiling(),…
-
Aggregate functions: count(), min(), max(), avg(), sum()
-
String functions: concat(), string-length (), startswith(), ends-with (), substring(), upper-case (), lower-case, string(),…
-
Context functions: position(), last(), current-time(),…
-
Date, time and duration: duration-equal(), time-equal(), hours-from-duration(), day-from-date(),…
-
Sequence: union (|), intersect, except, distinct-values…
-
Etc.
for $t in distinct-values($doc//mmd:release/mmd:title/text()) return <distinct-title>{$t}</distinct-title>
<distinct-title>19</distinct-title> <distinct-title>Hometown Glory</distinct-title> ... <distinct-title>iTunes Live From SoHo</distinct-title>
for $r in $doc//mmd:release return <release-without-barcode> {$r/@* } {$r/* except $r/mmd:barcode} </release-without-barcode>
<release-without-barcode id='1596501c-e332-366d-9ad5-b1923bab1005'> <title>19</title> <status>Official</status> <quality>high</quality> <text-representation> <language>eng</language> <script>Latn</script> </text-representation> <date>2008-11-17</date> <country>FR</country> </release-without-barcode>
2.3.3.Comments
(: this is a comment :)
2.3.4.XQuery in Oracle XML DB
SELECT XMLQuery(' for $rg in doc("adele-releasegroups.xml")//release-group return <release-group id="{$rg/@id}"/> 'RETURNING CONTENT) FROM DUAL;
Summary
-
For: links variables to expressions written in XPath and creates a tuple stream where each tuple is composed by variable values.
-
Let: binds a variable to the result of an expression.
-
Where: filters the tuples by removing all those that do not meet the conditions.
-
Order by: sorts the tuple stream according to the given criterion.
-
Return: builds the output XML of the query, combining XML templates and variable values.
Self-evaluation
CREATE TABLE airport ( IATA CHAR(3) PRIMARY KEY, city VARCHAR2(25) NOT NULL, country VARCHAR2(20) NOT NULL, region VARCHAR2(14) NOT NULL ); CREATE TABLE passenger ( id CHAR(10) PRIMARY KEY, EUcitizen CHAR NOT NULL CHECK (EUcitizen = 'Y' OR EUcitizen = 'N'), membership VARCHAR2(9) NOT NULL CHECK (membership IN ('none', 'frequent', 'business', 'gold', 'vip')) ); CREATE TABLE ticket ( IATA1 CHAR(3) CONSTRAINT ticket_FK_origin REFERENCES airport, IATA2 CHAR(3) CONSTRAINT ticket_FK_destination REFERENCES airport, id CHAR(10) CONSTRAINT ticket_FK_passenger REFERENCES passenger, price NUMBER(9,2) NOT NULL, discount INTEGER NOT NULL, PRIMARY KEY (IATA1, IATA2, id) );
CREATE TYPE travel AS OBJECT ( departureAirport CHAR(3), destinationAirport CHAR(3), idPassenger CHAR(10), passengerMembership VARCHAR2(9), price NUMBER(9,2), discount INTEGER);
CLASS centre Name VARCHAR2(20) Code INTEGER List of students CLASS student Name VARCHAR2(30), Surname1 VARCHAR2(30), Surname2 VARCHAR2(30), Course VARCHAR2(30), AverageMark INTEGER
-
First, think of a solution where the list of students is described as a VARRAY.
-
Then, provide a solution where the list of students is a NESTED TABLE.
-
Briefly discuss which solution would be better for this scenario.
declare namespace mmd="https://musicbrainz.org/ns/mmd-2.0#"; declare variable $doc := doc("https://musicbrainz.org/ws/2/artist/cc2c9c3c-b7bc-4b8b-84d8- 4fbd8779e493?inc=release-groups+releases");
<xsd:element name="release-group"> <xsd:complexType> <xsd:sequence> <xsd:element minOccurs="0" ref="mmd-2.0:title"/> <xsd:element minOccurs="0" ref="mmd-2.0:disambiguation"/> <xsd:element minOccurs="0" ref="mmd-2.0:comment"/> <xsd:element minOccurs="0" ref="mmd-2.0:first-release-date"/> <xsd:element minOccurs="0" ref="mmd-2.0:artist-credit"/> <xsd:element minOccurs="0" ref="mmd-2.0:release-list"/> <xsd:element minOccurs="0" maxOccurs="unbounded" ref="mmd-2.0:relation-list"/> <xsd:element minOccurs="0" ref="mmd-2.0:tag-list"/> <xsd:element minOccurs="0" ref="mmd-2.0:user-tag-list"/> <xsd:element minOccurs="0" ref="mmd-2.0:rating"/> <xsd:element minOccurs="0" ref="mmd-2.0:user-rating"/> <xsd:group ref="mmd-2.0:def_release-group-element_extension"/> </xsd:sequence> <xsd:attribute name="id" type="xsd:anyURI"/> <xsd:attribute name="type" type="xsd:anyURI"/> <xsd:attributeGroup ref="mmd-2.0:def_release-group- attribute_extension"/> </xsd:complexType> </xsd:element> <xsd:simpleType name="def_quality"> <xsd:restriction base="xsd:token"> <xsd:enumeration value="low"/> <xsd:enumeration value="normal"/> <xsd:enumeration value="high"/> </xsd:restriction> </xsd:simpleType>
Answer key
-
object methods provide up-to-date data (if the base attributes upon which the derived attribute is built change we must refresh the derived attribute and while it is not done, we would access and old value),
-
we save space, as we do not materialize this data but compute it on-the-fly whenever the method is called.
These exercises can be found in the exs_solution_3 and exs_solution_4 files, respectively.
-
The primary key has an ad hoc size depending on the built-in type used to implement it. For example, an integer is usually 4 bytes long. On the contrary, OIDs are always 16 bytes long. This can affect performance (for example, when building indexes).
-
The primary key provides semantics regarding the scenario. However, an OID is an artificial identifier, with no semantics at all.
-
The OID is guaranteed to be unique. The primary key, however, could just be unique in terms of our database.
CREATE VIEW travels OF travel WITH OBJECT IDENTIFIER (departureAirport, destinationAirport, idPassenger ID) AS SELECT t.IATA1, t.IATA2, t.id, p.membership, t.price, t.discount FROM ticket t, passenger p WHERE t.id = p.id;
CREATE TYPE student AS OBJECT ( Name VARCHAR2(30), Surname1 VARCHAR2(30), Surname2 VARCHAR2(30), Course VARCHAR2(30), AverageMark INTEGER); CREATE TYPE list_of_students AS VARRAY(5000) OF student; CREATE TYPE centre AS OBJECT ( Name VARCHAR2(20), Code INTEGER, Students list_of_students);
CREATE TYPE student AS OBJECT ( Name VARCHAR2(30), Surname1 VARCHAR2(30), Surname2 VARCHAR2(30), Course VARCHAR2(30), AverageMark INTEGER); CREATE TYPE students_t AS TABLE OF student; CREATE TABLE centre AS OBJECT ( Name VARCHAR2(20), Code INTEGER, Students students_t) NESTED TABLE Students STORE AS students_nt;
The solutions for these exercises are available as files Solution2a.txt, Solution2b.txt, Solution3.txt, Solution4a.txt, Solution4b.txt, Solution4c.txt, Solution5.txt and Solution6.txt.
<xsd:element name="quality-list"> <xsd:simpleType> <xsd:list itemType="mmd-2.0:def_quality"/> </xsd:simpleType> </xsd:element>
for $r in $doc//mmd:release where $r/mmd:date<"2008" order by $r/mmd:date return <release>{$r/mmd:title,$r/mmd:date}</release>
for $r in $doc//mmd:release where $r/mmd:date>"2008-01-01" order by $r/mmd:date return $r/mmd:title
<release> <title>Hometown Glory</title> <date>2007-10-29</date> </release> <release> <title>2008-09-22: BBC Radio 1's Live Lounge: London, UK </title> <date>2008</date> </release> <release> <title>Hometown Glory</title> <date>2008</date> </release>
for $c in distinct-values($doc//mmd:country) let $rc := $doc//mmd:release[mmd:country=$c and mmd:date< "2008-06-01"] let $nc := count($rc) where $nc > 0 order by $nc descending return <country name="{$c}">{$nc}</country>
<country name="GB">12</country> <country name="FR">1</country> <country name="JP">1</country> <country name="DE">1</country>
for $rg in $doc//mmd:release-group where not(exists($rg/mmd:first-release-date/text())) return $rg
for $rg in $doc//mmd:release-group where not($rg/mmd:first-release-date/text()) return $rg
$doc//mmd:release-group[not(mmd:first-release-date/text())]
<release-group type="Live" id="37367c39-1c91-4a8c-baa7-0a09c3df4b6b"> <title>2011-02-25: Morning Becomes Eclectic, KCRW-FM, Santa Monica, CA, USA</title> <first-release-date/> </release-group>
for $rg in $doc//mmd:release-group where not(matches($rg/mmd:first-release-date,"\d{4}-\d{2}-\d{2}")) return $rg
<release-group type="Live" id="11ce3c93-0325-439e-8de7-fab397ba839c"> <title>2008-09-22: BBC Radio 1's Live Lounge: London, UK </title> <first-release-date>2008</first-release-date> </release-group> <release-group type="Live" id="37367c39-1c91-4a8c-baa7-0a09c3df4b6b"> <title>2011-02-25: Morning Becomes Eclectic, KCRW-FM, Santa Monica, CA, USA</title> <first-release-date/> </release-group>
for $rg in $doc//mmd:release-group where not(matches($rg/mmd:first-release-date,"\d{4}-\d{2}-\d{2}")) return <mmd:release-group> {$rg/@* } {$rg/* except $rg/mmd:first-release-date} {if ($rg/mmd:first-release-date/text() and matches()) then <mmd:first-release-date> {$rg/mmd:first-release-date/text()}-01-01 </mmd:first-release-date> else ()} </mmd:release-group>
<mmd:release-group type="Live" id="11ce3c93-0325-439e-8de7-fab397ba839c"> <mmd:title>2008-09-22: BBC Radio 1's Live Lounge: London, UK </mmd:title> <mmd:first-release-date>2008-01-01</mmd:first-release-date> </mmd:release-group> <mmd:release-group type="Live" id="37367c39-1c91-4a8c-baa7-0a09c3df4b6b"> <mmd:title>2011-02-25: Morning Becomes Eclectic, KCRW-FM, Santa Monica, CA, USA</mmd:title> </mmd:release-group>
Glossary
- API
- Application Programming Interface.
- Attribute
- A characteristic or property of an element. Attributes are represented as name value pairs on an element tag.
- DBA
- Database Administrator.
- DBMS
- Database Management System.
- Document Object Model (DOM)
- An API that provides an object representation of an XML document. The DOM API represents an XML document as a tree of nodes. Nodes may be created, queried, updated and deleted.
- Document Type Definition (DTD)
- Describes the structure of XML documents.
- Document
- An XML structure containing a root element and its subelements.
- Element
- A component of the tree structure defined in a Document Type Definition (DTD) or Schema. An element may be composed of text, attributes and other elements.
- Final
- In Object-Oriented programming languages, a final class is that which cannot be extended.
- Instantiable
- In Object-Oriented programming languages, a class which can be instantiated and, therefore, creates an object of that kind.
- Item
- An item is either an atomic Value or a Node.
- JDBC
- Java Database Connectivity.
- Metadata
- Metadata are data about data or data that describe other data.
- Namespace
- A feature of XML for using multiple vocabularies in a single XML document and avoid name clashes.
- Node Type
- The types of nodes are, in addition to Element, Attribute and Text, Document, DocumentFragment, DocumentType, ProcessingInstruction, EntityReference, CDATASection, Comment, Entity and Notation.
- Node
- A node can be an element node, an attribute node, a text node or any other of the Node Types.
- ODBC
- Open Database Connectivity.
- OO
- Object-Oriented.
- OODBS
- Object-Oriented Database System.
- OODM
- Object-Oriented Data Model.
- OOPL
- Object-Oriented Programming Language.
- ORDBS
- Object-Relational Database System.
- ORDM
- Object-Relational Data Model.
- Parser
- A tool that reads XML data and breaks it up into elements and attributes, usually structured as a Document Object Model (DOM).
- PL/SQL
- Procedural Language/Structured Query Language.
- Root
- The outermost element in an XML document that contains all other elements. It is the top node in a tree structure.
- Schema
- Defines the structure of XML documents. Schemas address deficiencies in DTDs such as specifying data types.
- Sequence
- A sequence is an ordered collection of zero or more Items.
- Shred
- The process of mapping the data in an XML document to table rows and columns in a relational database.
- Surrogate
- An artificial primary key created and maintained by the system with no semantics extracted from the domain.
- Tag
- The markup language used to describe an XML element. An XML tag is represented by the element name enclosed by angle brackets.
- Tuple Stream
- A tuple stream is an ordered sequence of zero or more tuples.
- Tuple
- It is a set of zero or more named variables, each of which is bound to a Value.
- Value
- In the XQuery data model, a value is always a Sequence.
- Vertical Fragmentation
- The problem of breaking a relation into smaller pieces by grouping attributes.
- Vocabulary
- A dialect or set of XML tags used to describe a particular data structure. A vocabulary is defined using a DTD or Schema.