Document Type Definition (DTD)
5/13/2010, IJP Unit IV DTD with Example, VV
V. Vasantha M.E.,
Senior Lecturer,
Dept. Of Information Technology,
National Engineering College,
K. R. Nagar, Kovilpatti.
XML and DTDs
A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
Elements
Attributes, and
Entities
(We will discuss each of these in turn)
An XML document is well-formed if it follows certain simple syntactic rules
An XML document is valid if it also specifies a DTD and conforms to it
Building blocks of XML
XML documents (and HTML documents) are made up of the following building blocks: elements, attributes, entities, PCDATA and CDATA
What is a DTD?
A DTD is used to declare each of the building blocks used in an XML document
A DTD defines:
the structure of the XML document
the list of legal elements of the XML document
Motivation
A DTD adds syntactic requirements on top of the well-formedness requirement
It helps in eliminating errors when creating or editing XML documents
It clarifies the intended semantics
It simplifies the processing of XML documents
Why DTDs?
XML documents are designed to be processed by computer programs
If you can put just any tags in an XML document, it’s very hard to write a program that knows how to process the tags
A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have
A DTD allows the XML document to be verified (shown to be legal)
A DTD that is shared across groups allows the groups to produce consistent XML documents
Parsers
An XML parser is a software component that reads the content of an XML document and makes it available to programs through an API
Two widely used APIs are DOM (Document Object Model) and SAX (Simple API for XML)
A validating parser is an XML parser that compares the XML document to a DTD and reports any errors
Most browsers don’t use validating parsers
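The difference between the two APIs is easiest to see by running both on the same input. The Python sketch below counts the paragraph elements of a small novel document twice. Python and its standard library are our choice here, since the slides name no language. The first count uses the DOM API, which builds the whole tree first. The second uses the SAX API, which streams start-tag events to a handler. Neither standard-library parser used here is validating, so no DTD is involved.

```python
import xml.sax
from xml.dom import minidom

NOVEL = """<novel>
  <foreword><paragraph>This is the great American novel.</paragraph></foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>"""

# DOM: parse the whole document into a tree, then query the tree.
dom = minidom.parseString(NOVEL)
dom_count = len(dom.getElementsByTagName("paragraph"))


# SAX: the parser calls back into a handler as it reads; nothing is kept
# unless the handler keeps it.
class ParagraphCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "paragraph":
            self.count += 1


handler = ParagraphCounter()
xml.sax.parseString(NOVEL.encode("utf-8"), handler)

print(dom_count, handler.count)  # 3 3
```

The DOM version keeps the tree available for later queries. The SAX version uses constant memory but sees each element only once.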
Well-Formed vs. Valid Document
Well-formed document – the document that adheres to the XML syntax rules
Valid document – the document that adheres to the rules defined in the corresponding DTD document
Valid documents are the most valuable for sharing and retrieving information, because every party can rely on the same structure.
Well-Formed XML Documents
An XML document (with or without a DTD) is well-formed if
Tags are syntactically correct
Every tag has an end tag
Tags are properly nested
There is exactly one root element
A start tag does not have two occurrences of the same attribute
An XML document must be well-formed
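The well-formedness rules above are exactly what a non-validating parser enforces. As a small sketch in Python (our choice of language), `xml.etree.ElementTree` raises `ParseError` whenever one of these rules is broken. The helper name `is_well_formed` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "ok": '<novel><chapter number="1"/></novel>',
    "missing end tag": "<novel><foreword></novel>",
    "improper nesting": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "duplicate attribute": '<chapter number="1" number="2"/>',
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```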
Valid Documents
A well-formed XML document is valid if it conforms to its DTD, that is,
The element structure conforms to the regular-expression grammar (the content models) of the DTD,
The types of attributes are correct, and
The constraints on references (ID and IDREF values) are satisfied
Adding a DTD to the Document
A DTD can be internal
The DTD is part of the document file
or external
The DTD and the document are on separate files
An external DTD may reside
In the local file system (where the document is)
In a remote file system
Connecting a Document with its DTD
An internal DTD:
… ] >
...
A DTD from the local file system:
A DTD from a remote file system:
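The three ways of connecting a document with its DTD (internal, local file system, remote file system) differ only in the DOCTYPE declaration. The Python sketch below (our choice of language) parses one document of each kind with `xml.dom.minidom` and prints what the parser recorded. The element name `note`, the file name `note.dtd` and the URL are hypothetical placeholders. minidom neither fetches the external file nor validates against it, so the sketch runs offline.

```python
from xml.dom import minidom

# Placeholder documents: "note", "note.dtd" and the URL are hypothetical.
INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>The DTD is part of the document file</note>"""

LOCAL = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>The DTD is in the local file system</note>"""

REMOTE = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "http://www.example.com/dtd/note.dtd">
<note>The DTD is in a remote file system</note>"""

results = {}
for label, text in [("internal", INTERNAL), ("local", LOCAL), ("remote", REMOTE)]:
    doctype = minidom.parseString(text).doctype
    # name: root element named in the DOCTYPE; systemId: where an external DTD lives
    results[label] = (doctype.name, doctype.systemId)
    print(label, results[label])
```

A purely internal DTD has no system identifier. The two external forms differ only in whether the identifier is a relative file name or a URL.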
Internal vs. External DTD
What is wrong here?
Internal vs. External DTD
External DTDs are better because of:
the possibility of sharing definitions between XML documents
The documents that share the same DTD are more uniform and easier to retrieve
Linking in the DTD document
Ken Anderson
Lukasz Kurgan
Ok! We can see some progress
An XML example
<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>
An XML document contains (and the DTD describes):
Attributes, such as number="1", consisting of a name and a value
Elements, such as novel and paragraph, consisting of tags and content
Entities (not used in this example)
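A DTD example
<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (paragraph+)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST chapter number CDATA #REQUIRED>
]>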
A novel consists of a foreword and one or more chapters, in that order
A foreword consists of one or more paragraphs
A chapter also consists of one or more paragraphs
A paragraph consists of parsed character data (text that cannot contain any other elements)
Each chapter must have a number attribute
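The five rules above can be checked mechanically. Below is a minimal Python sketch, not a real validating parser. Each content model is hand-translated into a Python regular expression over the sequence of child tag names, and the required number attribute is checked separately. `CONTENT_MODELS`, `REQUIRED_ATTRS` and `validate` are names made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the novel rules. Children are matched as a string such
# as "foreword chapter " (each tag followed by one space).
CONTENT_MODELS = {
    "novel": r"foreword (chapter )+",  # a foreword, then one or more chapters
    "foreword": r"(paragraph )+",      # one or more paragraphs
    "chapter": r"(paragraph )+",       # one or more paragraphs
    "paragraph": r"",                  # (#PCDATA): text, no child elements
}
REQUIRED_ATTRS = {"chapter": ["number"]}


def validate(elem: ET.Element) -> list[str]:
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    else:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{elem.tag}> has invalid children: {children.split()}")
    for attr in REQUIRED_ATTRS.get(elem.tag, []):
        if attr not in elem.attrib:
            errors.append(f"<{elem.tag}> lacks required attribute {attr!r}")
    for child in elem:
        errors.extend(validate(child))
    return errors


novel = ET.fromstring("""<novel>
  <foreword><paragraph>This is the great American novel.</paragraph></foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>""")
broken = ET.fromstring("<novel><chapter><paragraph>Hi</paragraph></chapter></novel>")
print(validate(novel))   # []
print(validate(broken))  # two errors: no foreword, and no number attribute
```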
Declarations
A Document Type Declaration can include:
Element Declaration.
Attribute Declaration.
Comment Declaration.
Entity Declaration.
Notation Declaration.
Marked Section Declaration.
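Most of these declaration kinds show up directly as parser callbacks. As a sketch in Python (our choice), the standard `xml.parsers.expat` module reports comment, element, attribute-list, entity and notation declarations from an internal DTD through separate handlers. The DTD contents (`note`, `lang`, `course`, `gif`) are hypothetical. Marked sections are left out: XML only allows them (as conditional sections) in external DTD subsets, which this sketch does not load.

```python
from xml.parsers import expat

# Hypothetical internal DTD with one declaration of each kind.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- a comment declaration -->
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note lang CDATA "en">
  <!ENTITY course "IJP Unit IV">
  <!NOTATION gif SYSTEM "image/gif">
]>
<note>&course;</note>"""

seen = []
parser = expat.ParserCreate()
parser.CommentHandler = lambda data: seen.append("comment")
parser.ElementDeclHandler = lambda name, model: seen.append(f"element {name}")
parser.AttlistDeclHandler = (
    lambda elname, attname, att_type, default, required:
    seen.append(f"attlist {elname}.{attname}"))
parser.EntityDeclHandler = (
    lambda name, is_param, value, base, system_id, public_id, notation:
    seen.append(f"entity {name}"))
parser.NotationDeclHandler = (
    lambda name, base, system_id, public_id: seen.append(f"notation {name}"))
parser.Parse(DOC, True)
print(seen)
```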
Example: An Address Book
Homer Simpson
Dr. H. Simpson
1234 Springwater Road
Springfield USA, 98765
(321) 786 2543
(321) 786 2544
(321) 786 2544
homer@math.springfield.edu
Specifying the Structure
name to specify a name element
greet? to specify an optional (0 or 1) greet element
name, greet? to specify a name followed by an optional greet
Specifying the Structure (cont’d)
addr* to specify 0 or more address lines
tel | fax a tel or a fax element
(tel | fax)* 0 or more repeats of tel or fax
email* 0 or more email elements
Specifying the Structure (cont’d)
So the whole structure of a person entry is specified by
name, greet?, addr*, (tel | fax)*, email*
This is known as a regular expression
Element Type Definition
For each element type E, a declaration of the form:
<!ELEMENT E P>
where P is a regular expression, i.e.,
P ::= EMPTY | ANY | #PCDATA | E’ |
P1, P2 | P1 | P2 | P? | P+ | P*
E’: element type
P1 , P2: concatenation
P1 | P2: disjunction
P?: optional
P+: one or more occurrences
P*: 0 or more occurrences (the Kleene closure)
Summary of Regular Expressions
A The tag (i.e., element) A occurs
e1,e2 The expression e1 followed by e2
e* 0 or more occurrences of e
e? Optional: 0 or 1 occurrences
e+ 1 or more occurrences
e1 | e2 either e1 or e2
(e) grouping
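DTD content models use the same operators as ordinary regular expressions, so they can be translated almost symbol for symbol. The Python sketch below turns an element-only content model into a Python regex and matches it against a list of child tag names. Python is our choice of language, and `model_to_regex` and `conforms` are hypothetical helper names. EMPTY, ANY and mixed content are deliberately rejected.

```python
import re

# Element names, or one of the DTD operator characters.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")


def model_to_regex(model: str) -> str:
    """Translate an element-only DTD content model into a Python regex.

    Each child name becomes "(?:name )", so the regex is matched against
    the child tags joined with trailing spaces.
    """
    if "#PCDATA" in model or model.strip() in ("EMPTY", "ANY"):
        raise ValueError("only element-only content models are handled here")
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")      # grouping, without capturing
        elif token == ",":
            continue                 # e1, e2 is plain concatenation
        elif token in ")|?*+":
            parts.append(token)      # same meaning in both notations
        else:
            parts.append(f"(?:{re.escape(token)} )")
    return "".join(parts)


def conforms(model: str, children: list[str]) -> bool:
    sequence = "".join(tag + " " for tag in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None


print(model_to_regex("name, greet?, addr*, (tel | fax)*, email*"))
# (?:name )(?:greet )?(?:addr )*(?:(?:tel )|(?:fax ))*(?:email )*
```

The trailing space after each name keeps `tel` from matching a longer tag such as `telephone`.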
The Definition of an Element Consists of Exactly One of the Following
A regular expression (as defined earlier)
EMPTY means that the element has no content
ANY means that content can be any mixture of PCDATA and elements defined in the DTD
Mixed content, which is defined as described on the next slide
(#PCDATA), the simplest case of mixed content: text only, with no child elements
The Definition of Mixed Content
Mixed content is described by a repeatable OR group
(#PCDATA | element-name | …)*
Inside the group, no regular expressions – just element names
#PCDATA must come first, followed by 0 or more element names, separated by |
The group can be repeated 0 or more times
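A mixed-content declaration constrains only which child elements may appear, never their order or number, so checking it reduces to a set-membership test. Below is a minimal Python sketch, assuming a hypothetical declaration `<!ELEMENT para (#PCDATA | b | i)*>`. The names `para`, `b`, `i`, `ALLOWED` and `check_mixed` are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration:  <!ELEMENT para (#PCDATA | b | i)*>
ALLOWED = {"b", "i"}


def check_mixed(elem: ET.Element, allowed: set[str]) -> list[str]:
    """Return the child tags that (#PCDATA | ...)* does not permit.

    Text may appear anywhere, and allowed children may occur in any order
    and any number of times, so only membership needs checking. The
    children's own content is not checked here.
    """
    return [child.tag for child in elem if child.tag not in allowed]


ok = ET.fromstring("<para>A <b>bold</b> and <i>italic</i> word, <b>twice</b>.</para>")
bad = ET.fromstring("<para>An <u>underlined</u> word.</para>")
print(check_mixed(ok, ALLOWED), check_mixed(bad, ALLOWED))  # [] ['u']
```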
An Address-Book XML Document with an Internal DTD
The syntax of a DTD is not XML syntax
The Rest of the Address-Book XML Document
Jeff Cohen
Dr. Cohen
jc@penny.com
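Because most parsers are not validating, a DTD inside the document is read but not enforced. The Python sketch below (our choice of language) writes out an address book with an internal DTD. The declarations are our own reconstruction from the person content model `name, greet?, addr*, (tel | fax)*, email*` and may differ from the slides. The Cohen entry deliberately puts greet before name, so the document is well-formed but not valid. Even so, `xml.etree.ElementTree`, which is built on the non-validating expat parser, accepts it without complaint.

```python
import xml.etree.ElementTree as ET

# Assumed internal DTD, reconstructed from the person content model.
DOC = """<?xml version="1.0"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, greet?, addr*, (tel | fax)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT addr (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<addressbook>
  <person>
    <greet>Dr. Cohen</greet>
    <name>Jeff Cohen</name>
    <email>jc@penny.com</email>
  </person>
</addressbook>"""

# greet before name violates the person content model, but a
# non-validating parser only checks well-formedness.
root = ET.fromstring(DOC)
print(root.tag, [child.tag for child in root.find("person")])
# addressbook ['greet', 'name', 'email']
```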
Regular Expressions
Each regular expression determines a corresponding finite-state automaton
Let’s start with a simpler example:
name, addr*, email
This suggests a simple parsing program
A double circle denotes an accepting state
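The simple parsing program suggested by the automaton can be written as a transition table. Here is a Python sketch for `name, addr*, email`. Python is our choice of language, and the state numbering is ours, not the figure's.

```python
# Hand-built DFA for the content model  name, addr*, email
# State 0: start; state 1: name seen (addr loops here); state 2: email seen.
TRANSITIONS = {
    (0, "name"): 1,
    (1, "addr"): 1,
    (1, "email"): 2,
}
ACCEPTING = {2}  # the "double circle" state


def accepts(children: list[str]) -> bool:
    state = 0
    for tag in children:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition for this tag: reject at once
            return False
    return state in ACCEPTING


print(accepts(["name", "addr", "addr", "email"]))  # True
print(accepts(["name", "addr"]))                   # False
```

Each call walks the table once, so the check is linear in the number of child elements.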
Another Example
name,address*,(tel | fax)*,email*
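The same idea for `name,address*,(tel | fax)*,email*` needs four states, three of them accepting, because the element may end after the name or addresses, after the tel/fax run, or after the emails. The Python sketch below also returns the sequence of states visited. Python is our choice, and the state numbers are our own.

```python
# States: 0 start, 1 after name/address, 2 inside (tel | fax)*, 3 inside email*.
DFA = {
    0: {"name": 1},
    1: {"address": 1, "tel": 2, "fax": 2, "email": 3},
    2: {"tel": 2, "fax": 2, "email": 3},
    3: {"email": 3},
}
ACCEPTING = {1, 2, 3}  # every state reached after name may end the element


def run(children: list[str]) -> tuple[bool, list[int]]:
    state, trace = 0, [0]
    for tag in children:
        if tag not in DFA[state]:
            return False, trace  # stuck: reject
        state = DFA[state][tag]
        trace.append(state)
    return state in ACCEPTING, trace


print(run(["name", "address", "tel", "fax", "email"]))  # (True, [0, 1, 1, 2, 2, 3])
print(run(["name", "email", "tel"]))                     # (False, [0, 1, 3])
```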
Some Things are Hard to Specify
Each employee element should contain name, age and ssn elements in some order
Suppose that there were many more fields!
Some Things are Hard to Specify (cont’d)
Suppose there were many more fields! There are n! different orders of n elements
Listing every order makes the DTD grow factorially with n; it is not even polynomial
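The blow-up is easy to measure. The Python sketch below writes out a DTD content model that accepts n given elements in any order. Python is our choice, and `any_order` is a hypothetical helper. Each alternative starts with a different element, because XML 1.0 requires content models to be deterministic. Even in this factored form, the number of element names in the model grows roughly like e·n!: 15 names for 3 fields, 64 for 4, 325 for 5.

```python
import re


def any_order(fields: list[str]) -> str:
    """Content model accepting every field exactly once, in any order."""
    if len(fields) == 1:
        return fields[0]
    branches = []
    for i, first in enumerate(fields):
        rest = fields[:i] + fields[i + 1:]
        # Factor out the first element so every branch starts differently,
        # which keeps the model deterministic.
        branches.append(f"({first}, {any_order(rest)})")
    return "(" + " | ".join(branches) + ")"


print(any_order(["name", "age", "ssn"]))
for n in range(2, 7):
    model = any_order([f"f{i}" for i in range(n)])
    print(n, len(re.findall(r"\w+", model)))  # element names in the model
```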
Specifying Attributes in the DTD
The dimension attribute is required
The accuracy attribute is optional
CDATA is the “type” of the attribute – it means “character data,” and may take any literal string as a value.
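A parser sees these declarations before it reads any element. As a sketch in Python (our choice), `xml.parsers.expat` reports each declared attribute to an `AttlistDeclHandler`. The element name `height` is hypothetical. For #REQUIRED and #IMPLIED attributes expat passes no default value, and its `required` flag is true only for #REQUIRED (and #FIXED) attributes.

```python
from xml.parsers import expat

# The element name "height" is hypothetical; the attributes follow the slide.
DOC = """<?xml version="1.0"?>
<!DOCTYPE height [
  <!ELEMENT height (#PCDATA)>
  <!ATTLIST height
    dimension CDATA #REQUIRED
    accuracy  CDATA #IMPLIED>
]>
<height dimension="cm">180</height>"""

declared = []


def on_attlist(elname, attname, att_type, default, required):
    # One call per declared attribute, in declaration order.
    declared.append((elname, attname, att_type, default, bool(required)))


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.Parse(DOC, True)
for row in declared:
    print(row)
# ('height', 'dimension', 'CDATA', None, True)
# ('height', 'accuracy', 'CDATA', None, False)
```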
The Format of an Attribute Definition
<!ATTLIST element-name attribute-name attribute-type default-value>
The default value is given inside quotes
attribute types:
CDATA
ID, IDREF, IDREFS
…
Summary of Attribute Default Values
#REQUIRED means that the attribute must be included in the element
#IMPLIED means that the attribute is optional and has no default value
#FIXED “value”
The given value (inside quotes) is the only possible one
“value”
The default value of the attribute if none is given
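What a parser does with each kind of default fits in a few lines. This is a minimal Python sketch (our choice of language), not a real parser. The declarations in `DECLARATIONS`, including the `version` and `scale` attributes, are hypothetical. `effective_attributes` returns the attribute set an application would see.

```python
# Hypothetical declarations: attribute name -> (default keyword, value).
# The keyword "value" stands for a plain quoted default such as  scale CDATA "1".
DECLARATIONS = {
    "dimension": ("#REQUIRED", None),
    "accuracy": ("#IMPLIED", None),
    "version": ("#FIXED", "1.0"),
    "scale": ("value", "1"),
}


def effective_attributes(given: dict[str, str]) -> dict[str, str]:
    result = dict(given)
    for name, (keyword, value) in DECLARATIONS.items():
        if keyword == "#REQUIRED" and name not in given:
            raise ValueError(f"missing required attribute {name!r}")
        if keyword == "#FIXED":
            if name in given and given[name] != value:
                raise ValueError(f"{name!r} is fixed to {value!r}")
            result[name] = value            # supplied even when absent
        elif keyword == "value":
            result.setdefault(name, value)  # default used only when absent
        # "#IMPLIED": nothing is supplied when the attribute is absent
    return result


print(effective_attributes({"dimension": "cm"}))
# {'dimension': 'cm', 'version': '1.0', 'scale': '1'}
```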
Recursive DTDs
-- father
...
] >
What is the problem with this?
A parser does not notice it! Each person should have a father and a mother. This leads to either infinite data or a person that is a descendant of herself.
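Whether an element can have any finite instance at all is a fixpoint computation over the content models. It is the same "productive symbol" test used on context-free grammars. Below is a Python sketch (our choice of language). The simplified model format, the helper `finite_elements` and the element names are hypothetical.

```python
# Simplified content models: each element maps to (child, occurrence) pairs,
# where occurrence is "1", "?", "*" or "+". Element names are hypothetical.
def finite_elements(models: dict[str, list[tuple[str, str]]]) -> set[str]:
    """Return the elements that have at least one finite instance.

    An element is finite if every child it must contain ("1" or "+") is
    itself finite; repeat until nothing changes (a fixpoint).
    """
    finite: set[str] = set()
    changed = True
    while changed:
        changed = False
        for elem, children in models.items():
            if elem in finite:
                continue
            if all(child in finite for child, occ in children if occ in ("1", "+")):
                finite.add(elem)
                changed = True
    return finite


STRICT = {
    "genealogy": [("person", "*")],  # finite only because person* may be empty
    "person": [("name", "1"), ("person", "1"), ("person", "1")],  # father, mother
    "name": [],                      # (#PCDATA)
}
RELAXED = dict(STRICT, person=[("name", "1"), ("person", "?"), ("person", "?")])

print(sorted(finite_elements(STRICT)))   # ['genealogy', 'name']
print(sorted(finite_elements(RELAXED)))  # ['genealogy', 'name', 'person']
```

With mandatory parents, no finite person element exists. Making both parents optional fixes that, but it leads to the ambiguity discussed on the next slide.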
Recursive DTDs (cont’d)
-- father
...
] >
What is now the problem with this?
If a person only has a father, how can you tell that he has a father and does not have a mother?
Using ID and IDREF Attributes
] >
IDs and IDREFs
ID attribute: unique within the entire document.
An element can have at most one ID attribute.
No default or #FIXED value is allowed; the attribute must be declared as either
#REQUIRED: a value must be provided, or
#IMPLIED: a value is optional
IDREF attribute: its value must be some other element’s ID value in the document.
IDREFS attribute: its value is a whitespace-separated list, each item of which is the ID value of some other element in the document.
Some Conforming Data
Lisa Simpson
Bart Simpson
Marge Simpson
Homer Simpson
ID References do not Have Types
The attributes mother and father are references to IDs of other elements
However, those are not necessarily person elements!
The mother attribute is not necessarily a reference to a female person
An Alternative Specification
] >
The Revised Data
Marge
Simpson
Homer
Simpson
Bart Simpson
Lisa
Simpson
Consistency of ID and IDREF Attribute Values
If an attribute is declared as ID
The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion)
Even if the two elements have different element names
If an attribute is declared as IDREF
The associated value must exist as the value of some ID attribute (no dangling “pointers”)
Similarly for all the values of an IDREFS attribute
ID, IDREF and IDREFS attributes are not typed
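These consistency rules are simple enough to check directly. A minimal Python sketch follows (our choice of language). ElementTree does not expose attribute types, so the ID, IDREF and IDREFS roles are hard-coded here. The attribute names `id`, `mother`, `father` and `children` are hypothetical, and the family data is our own illustration. Lisa's entry swaps mother and father on purpose. The check still passes, because ID references do not have types.

```python
import xml.etree.ElementTree as ET

# Attribute roles would come from the DTD's ATTLIST declarations; they are
# hard-coded here (hypothetical names) because ElementTree does not expose them.
ID_ATTR = "id"
IDREF_ATTRS = ("mother", "father")
IDREFS_ATTRS = ("children",)


def check_ids(root: ET.Element) -> list[str]:
    errors, ids = [], set()
    # Pass 1: every ID value must be distinct, whatever the element name.
    for elem in root.iter():
        value = elem.get(ID_ATTR)
        if value is None:
            continue
        if value in ids:
            errors.append(f"duplicate ID {value!r}")
        ids.add(value)
    # Pass 2: every IDREF / IDREFS value must name an existing ID.
    for elem in root.iter():
        refs = [elem.get(a) for a in IDREF_ATTRS if elem.get(a) is not None]
        for a in IDREFS_ATTRS:
            refs.extend(elem.get(a, "").split())
        errors.extend(f"dangling reference {ref!r}" for ref in refs if ref not in ids)
    return errors


family = ET.fromstring("""<family>
  <person id="marge" children="bart lisa">Marge Simpson</person>
  <person id="homer" children="bart lisa">Homer Simpson</person>
  <person id="bart" mother="marge" father="homer">Bart Simpson</person>
  <person id="lisa" mother="homer" father="marge">Lisa Simpson</person>
</family>""")
print(check_ids(family))  # [] even though Lisa's mother points at Homer

broken = ET.fromstring('<family><person id="a"/><pet id="a"/>'
                       '<person id="b" father="zed"/></family>')
print(check_ids(broken))  # ["duplicate ID 'a'", "dangling reference 'zed'"]
```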
Limitations of DTDs
DTDs are a very weak specification language
You can’t put any restrictions on the text content of elements (for example, that it must be a number or a date)
It’s difficult to specify:
All the children must occur, but may be in any order
This element must occur a certain number of times
There are only ten data types for attribute values
But most of all: DTDs aren’t written in XML!
If you want to do any validation, you need one parser for the XML and another for the DTD
This makes XML parsing harder than it needs to be
There is a newer and more powerful technology: XML Schemas
However, DTDs are still very much in use
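The difficulty with occurrence counts noted above can be made concrete. A DTD has no "between m and n times" operator, so the count has to be spelled out by hand, whereas XML Schema offers minOccurs and maxOccurs. The Python sketch below generates such a DTD content model. Python is our choice, and `bounded` is a hypothetical helper. It nests the optional occurrences so that the result stays deterministic, as XML 1.0 requires.

```python
def bounded(name: str, low: int, high: int) -> str:
    """Spell out "between low and high occurrences of name" as a DTD model."""
    if not (0 <= low <= high and high >= 1):
        raise ValueError("need 0 <= low <= high and high >= 1")
    optional = ""
    for _ in range(high - low):
        # Nest each extra occurrence inside the previous one, e.g.
        # (chapter, chapter?)? rather than chapter?, chapter? (non-deterministic).
        optional = f"({name}, {optional})?" if optional else f"{name}?"
    parts = [name] * low + ([optional] if optional else [])
    return "(" + ", ".join(parts) + ")"


print(bounded("chapter", 2, 4))  # (chapter, chapter, (chapter, chapter?)?)
```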