Document Type Definitions

Description
Explains XML DTDs.

Presentation Transcript

Slide 1 : 5/13/2010 IJP Unit IV DTD with Example VV 1
Document Type Definition (DTD)
V. Vasantha M.E., Senior Lecturer, Dept. of Information Technology, National Engineering College, K. R. Nagar, Kovilpatti.

XML and DTDs :
A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes elements, attributes, and entities (we will discuss each of these in turn).
An XML document is well-formed if it follows certain simple syntactic rules.
An XML document is valid if it also specifies and conforms to a DTD.

Building blocks of XML :
XML documents (and HTML documents) are made up of building blocks such as elements, attributes, and entities.

What is DTD? :
A DTD is used to declare each of the building blocks (elements) used in an XML document.
A DTD defines:
    the structure of the XML document
    the list of legal elements of the XML document

Motivation :
A DTD adds syntactical requirements in addition to the well-formedness requirement.
It helps in eliminating errors when creating or editing XML documents.
It clarifies the intended semantics.
It simplifies the processing of XML documents.

Why DTDs? :
XML documents are designed to be processed by computer programs.
If you can put just any tags in an XML document, it’s very hard to write a program that knows how to process the tags.
A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have.
A DTD allows the XML document to be verified (shown to be legal).
A DTD that is shared across groups allows the groups to produce consistent XML documents.

Parsers :
An XML parser is a software component that reads the content of an XML document and makes it available to a program through an API.
Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML).
A validating parser is an XML parser that compares the XML document to a DTD and reports any errors.
Most browsers don’t use validating parsers.
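To make the DOM/SAX distinction concrete, here is a minimal sketch using the two parsers that ship with Python's standard library (xml.dom.minidom and xml.sax). The sample document and the helper names dom_paragraphs and sax_element_names are assumptions made for illustration; neither parser validates against a DTD.

```python
import xml.dom.minidom
import xml.sax

DOC = ("<novel><chapter number='1'>"
       "<paragraph>It was a dark and stormy night.</paragraph>"
       "</chapter></novel>")


def dom_paragraphs(text):
    """DOM: load the whole document into a tree, then navigate it."""
    dom = xml.dom.minidom.parseString(text)
    return [p.firstChild.data for p in dom.getElementsByTagName("paragraph")]


class ElementNames(xml.sax.ContentHandler):
    """SAX: the parser calls back once per event while streaming."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def sax_element_names(text):
    handler = ElementNames()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


print(dom_paragraphs(DOC))
print(sax_element_names(DOC))
```

DOM keeps the full tree in memory, which is convenient for navigation; SAX keeps nothing unless the handler stores it, which suits large documents.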

Well-Formed vs. Valid Document :
Well-formed document: a document that adheres to the XML syntax rules.
Valid document: a well-formed document that also adheres to the rules defined in the corresponding DTD.
Valid documents are the most valuable for sharing and retrieving information, because their structure is known in advance.

Well-Formed XML Documents :
An XML document (with or without a DTD) is well-formed if:
    Tags are syntactically correct
    Every start tag has a matching end tag
    Tags are properly nested
    There is a single root element
    A start tag does not have two occurrences of the same attribute
An XML document must be well-formed.
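A non-validating parser already enforces these rules. The sketch below, assuming Python's standard xml.etree.ElementTree module, reports whether a string is well-formed; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


GOOD = "<a><b x='1'>text</b></a>"
BAD_NESTING = "<a><b>text</a></b>"         # tags are not properly nested
NO_END_TAG = "<a><b>text</a>"              # <b> is never closed
DUPLICATE_ATTRIBUTE = "<a x='1' x='2'/>"   # same attribute twice
TWO_ROOTS = "<a/><b/>"                     # more than one root element

for sample in (GOOD, BAD_NESTING, NO_END_TAG, DUPLICATE_ATTRIBUTE, TWO_ROOTS):
    print(is_well_formed(sample), sample)
```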

Valid Documents :
A well-formed XML document is valid if it conforms to its DTD, that is:
    The document conforms to the regular-expression grammar,
    The types of attributes are correct, and
    The constraints on references are satisfied

Adding a DTD to the Document :
A DTD can be
    internal: the DTD is part of the document file, or
    external: the DTD and the document are in separate files
An external DTD may reside
    in the local file system (where the document is)
    in a remote file system

Connecting a Document with its DTD :
An internal DTD: … ]> ...
A DTD from the local file system:
A DTD from a remote file system:
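The three cases above could look like the DOCTYPE declarations in the following sketch. The root element db, the file name schema.dtd and the example.com URL are assumptions chosen for illustration, not the original slide's listings. The code uses Python's standard xml.parsers.expat module to read back what each declaration says; expat does not fetch or validate against the external DTD.

```python
import xml.parsers.expat

# Internal DTD: the declarations sit between [ and ] inside the document.
INTERNAL = """<!DOCTYPE db [
  <!ELEMENT db (#PCDATA)>
]>
<db>some text</db>"""

# External DTD in the local file system (hypothetical file name).
LOCAL = '<!DOCTYPE db SYSTEM "schema.dtd"><db>some text</db>'

# External DTD on a remote system (hypothetical URL).
REMOTE = ('<!DOCTYPE db SYSTEM "http://www.example.com/schema.dtd">'
          '<db>some text</db>')


def doctype_info(text):
    """Return (root name, system identifier, has internal subset)."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["info"] = (name, system_id, bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return found.get("info")


for doc in (INTERNAL, LOCAL, REMOTE):
    print(doctype_info(doc))
```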

Internal vs. External DTD :
What is wrong here?

Internal vs. External DTD :
External DTDs are better because of:
    the possibility of sharing definitions between XML documents
    the documents that share the same DTD are more uniform and easier to retrieve
Linking in the DTD document:
    Ken Anderson
    Lukasz Kurgan
OK! We can see some progress.

An XML example :
This is the great American novel.
It was a dark and stormy night.
Suddenly, a shot rang out!
An XML document contains (and the DTD describes):
    Attributes, such as number="1", consisting of a name and a value
    Elements, such as novel and paragraph, consisting of tags and content
    Entities (not used in this example)

A DTD example :
[DTD listing not preserved in the transcript]
A novel consists of a foreword and one or more chapters, in that order.
A foreword consists of one or more paragraphs.
A chapter also consists of one or more paragraphs.
A paragraph consists of parsed character data (text that cannot contain any other elements).
Each chapter must have a number attribute.
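The novel document and its DTD could be written as in the sketch below. The markup is a reconstruction from the prose description on these two slides (element names novel, foreword, chapter, paragraph and the number attribute), so treat it as an assumption rather than the original listing. Python's xml.etree.ElementTree is used only to parse the document and print its outline; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

NOVEL = """<?xml version="1.0"?>
<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (paragraph+)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST chapter number CDATA #REQUIRED>
]>
<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>"""

root = ET.fromstring(NOVEL)

# One tuple per child of <novel>: (tag, number attribute, paragraph count).
outline = [(child.tag, child.get("number"), len(child.findall("paragraph")))
           for child in root]
print(outline)
```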

Declarations :
A Document Type Declaration can include:
    Element Declaration
    Attribute Declaration
    Comment Declaration
    Entity Declaration
    Notation Declaration
    Marked Section Declaration

Example: An Address Book :
Homer Simpson
Dr. H. Simpson
1234 Springwater Road
Springfield USA, 98765
(321) 786 2543
(321) 786 2544
(321) 786 2544
homer@math.springfield.edu

Specifying the Structure :
name             to specify a name element
greet?           to specify an optional (0 or 1) greet element
name, greet?     to specify a name followed by an optional greet

Specifying the Structure (cont’d) :
addr*            to specify 0 or more address lines
tel | fax        a tel or a fax element
(tel | fax)*     0 or more repeats of tel or fax
email*           0 or more email elements

Specifying the Structure (cont’d) :
So the whole structure of a person entry is specified by
    name, greet?, addr*, (tel | fax)*, email*
This is known as a regular expression.
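Because the content model is a regular expression over element names, it can be checked with an ordinary regex engine. The sketch below rewrites the person model for Python's re module and applies it to the sequence of child tags. The XML markup for the address-book entry is an assumption (the transcript keeps only the text values), and this is an illustration of the idea, not how a validating parser is implemented.

```python
import re
import xml.etree.ElementTree as ET

# name, greet?, addr*, (tel | fax)*, email*
# Each child name is followed by one space so that names cannot run together.
PERSON_MODEL = re.compile(r"name (greet )?(addr )*((tel|fax) )*(email )*")


def matches_person_model(person):
    """Check the sequence of child element names against the content model."""
    children = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(children) is not None


ok = ET.fromstring(
    "<person><name>Homer Simpson</name><greet>Dr. H. Simpson</greet>"
    "<addr>1234 Springwater Road</addr><addr>Springfield USA, 98765</addr>"
    "<tel>(321) 786 2543</tel><fax>(321) 786 2544</fax>"
    "<tel>(321) 786 2544</tel>"
    "<email>homer@math.springfield.edu</email></person>")

# greet before name violates the required order.
bad = ET.fromstring(
    "<person><greet>Dr. H. Simpson</greet><name>Homer Simpson</name></person>")

print(matches_person_model(ok), matches_person_model(bad))
```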

Element Type Definition :
For each element type E, a declaration of the form <!ELEMENT E P>, where P is a regular expression, i.e.,
P ::= EMPTY | ANY | #PCDATA | E' | P1, P2 | P1 | P2 | P? | P+ | P*
    E': element type
    P1, P2: concatenation
    P1 | P2: disjunction
    P?: optional
    P+: one or more occurrences
    P*: zero or more occurrences (the Kleene closure)

Summary of Regular Expressions :
A           the tag (i.e., element) A occurs
e1,e2       the expression e1 followed by e2
e*          0 or more occurrences of e
e?          optional: 0 or 1 occurrences
e+          1 or more occurrences
e1 | e2     either e1 or e2
(e)         grouping

The Definition of an Element Consists of Exactly One of the Following :
A regular expression (as defined earlier)
EMPTY, which means that the element has no content
ANY, which means that the content can be any mixture of PCDATA and elements defined in the DTD
Mixed content, which is defined as described on the next slide
(#PCDATA)

The Definition of Mixed Content :
Mixed content is described by a repeatable OR group:
    (#PCDATA | element-name | …)*
Inside the group, no regular expressions are allowed, just element names.
#PCDATA must come first, followed by 0 or more element names, separated by |
The group can be repeated 0 or more times.

An Address-Book XML Document with an Internal DTD :
[DTD listing not preserved in the transcript]
The syntax of a DTD is not XML syntax.

The Rest of the Address-Book XML Document :
Jeff Cohen
Dr. Cohen
jc@penny.com

Regular Expressions :
Each regular expression determines a corresponding finite-state automaton.
Let’s start with a simpler example:
    name, addr*, email
This suggests a simple parsing program.
A double circle denotes an accepting state.
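The automaton for name, addr*, email can be sketched as a small transition table. The state numbering (0 = start, 1 = after name, 2 = after email) is an assumption made for this illustration, since the diagram is not part of the transcript; state 2 plays the role of the double circle.

```python
# (current state, element name) -> next state
TRANSITIONS = {
    (0, "name"): 1,    # the mandatory name comes first
    (1, "addr"): 1,    # any number of addr elements keeps us in state 1
    (1, "email"): 2,   # the mandatory email ends the sequence
}
ACCEPTING = {2}


def accepts(child_names):
    """Run the automaton for  name, addr*, email  over element names."""
    state = 0
    for name in child_names:
        state = TRANSITIONS.get((state, name))
        if state is None:      # no transition defined: reject immediately
            return False
    return state in ACCEPTING


print(accepts(["name", "addr", "addr", "email"]))
print(accepts(["name", "addr"]))
```

The parsing program reads one child element at a time, follows the matching edge, and accepts only if it ends in an accepting state.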

Another Example :
name,address*,(tel | fax)*,email*

Some Things are Hard to Specify :
Each employee element should contain name, age and ssn elements in some order.
Suppose that there were many more fields!

Some Things are Hard to Specify (cont’d) :
Suppose there were many more fields!
There are n! different orders of n elements.
The size of such a declaration is not even polynomial in n.
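The growth is easy to see by generating the declaration mechanically. The sketch below builds the "any order" content model by listing every permutation; the helper name any_order_model is hypothetical.

```python
from itertools import permutations
from math import factorial


def any_order_model(names):
    """Build a DTD content model that lists every ordering of `names`."""
    alternatives = ["(" + ", ".join(order) + ")"
                    for order in permutations(names)]
    return "(" + " | ".join(alternatives) + ")"


model = any_order_model(["name", "age", "ssn"])
print("<!ELEMENT employee " + model + ">")

# Three fields need 3! = 6 alternatives; ten fields would need 10!.
print(factorial(3), factorial(10))
```

Note that several alternatives begin with the same element name, which conflicts with the XML 1.0 recommendation that content models be deterministic. A strictly conforming DTD would have to factor out the common prefixes, which makes the declaration even harder to write by hand.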

Specifying Attributes in the DTD :
The dimension attribute is required.
The accuracy attribute is optional.
CDATA is the “type” of the attribute: it means “character data,” and the attribute may take any literal string as a value.

The Format of an Attribute Definition :
The default value is given inside quotes.
Attribute types: CDATA, ID, IDREF, IDREFS, …

Summary of Attribute Default Values :
#REQUIRED          the attribute must be included in the element
#IMPLIED           the attribute is optional
#FIXED “value”     the given value (inside quotes) is the only possible one
“value”            the default value of the attribute if none is given
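The four kinds of default can be observed with Python's standard xml.parsers.expat module, which reads the internal DTD and fills in declared defaults but does not validate. In the sketch below the height element with its dimension and accuracy attributes follows the slides; the source and version attributes, and the helper name check_attributes, are assumptions added to show a plain default and a #FIXED value. The check for missing #REQUIRED attributes is hand-written, because expat itself will not report it.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE height [
  <!ELEMENT height (#PCDATA)>
  <!ATTLIST height
    dimension CDATA #REQUIRED
    accuracy  CDATA #IMPLIED
    source    CDATA "measured"
    version   CDATA #FIXED "1.0">
]>
<height accuracy="0.5">175</height>"""


def check_attributes(text):
    """Return (attributes reported per element, missing #REQUIRED ones)."""
    required = {}   # element name -> set of #REQUIRED attribute names
    seen = {}
    missing = []

    def on_attlist(elname, attname, type_, default, is_required):
        # expat flags both #REQUIRED and #FIXED as "required";
        # only #REQUIRED comes without a default value.
        if is_required and default is None:
            required.setdefault(elname, set()).add(attname)

    def on_start(name, attrs):
        seen[name] = dict(attrs)
        for attname in sorted(required.get(name, ())):
            if attname not in attrs:
                missing.append((name, attname))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return seen, missing


seen, missing = check_attributes(DOC)
print(seen)
print(missing)
```

The defaulted source and the fixed version appear although the start tag does not mention them, while the missing dimension is only caught by the extra check.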

Recursive DTDs :
[DTD listing not preserved in the transcript]
What is the problem with this? A parser does not notice it!
Each person should have a father and a mother. This leads to either infinite data or a person that is a descendant of herself.

Recursive DTDs (cont’d) :
[DTD listing not preserved in the transcript]
What is now the problem with this?
If a person only has a father, how can you tell that he has a father and does not have a mother?

Using ID and IDREF Attributes :
[DTD listing not preserved in the transcript]

IDs and IDREFs :
ID attribute: its value is unique within the entire document.
    An element can have at most one ID attribute.
    No default value and no fixed value is allowed.
    #REQUIRED: a value must be provided
    #IMPLIED: a value is optional
IDREF attribute: its value must be the ID value of some element in the document.
IDREFS attribute: its value is a whitespace-separated list, each item of which is the ID value of some element in the document.

Some Conforming Data :
Lisa Simpson
Bart Simpson
Marge Simpson
Homer Simpson
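A family DTD with ID and IDREF attributes, together with conforming data, might look like the following sketch. The attribute names id, mother, father and children and the id values are assumptions (only the four names survive in the transcript). The code uses xml.etree.ElementTree, which does not know about ID types, so the references are resolved with an ordinary dictionary.

```python
import xml.etree.ElementTree as ET

FAMILY = """<!DOCTYPE family [
  <!ELEMENT family (person)*>
  <!ELEMENT person (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST person
    id       ID     #REQUIRED
    mother   IDREF  #IMPLIED
    father   IDREF  #IMPLIED
    children IDREFS #IMPLIED>
]>
<family>
  <person id="lisa" mother="marge" father="homer"><name>Lisa Simpson</name></person>
  <person id="bart" mother="marge" father="homer"><name>Bart Simpson</name></person>
  <person id="marge" children="bart lisa"><name>Marge Simpson</name></person>
  <person id="homer" children="bart lisa"><name>Homer Simpson</name></person>
</family>"""

root = ET.fromstring(FAMILY)
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(person_id):
    return by_id[person_id].findtext("name")


def parents_of(person_id):
    """Follow the mother and father IDREF attributes, if present."""
    person = by_id[person_id]
    return [name_of(person.get(attr))
            for attr in ("mother", "father") if person.get(attr)]


def children_of(person_id):
    """Follow the IDREFS attribute: a whitespace-separated list of IDs."""
    return [name_of(ref) for ref in by_id[person_id].get("children", "").split()]


print(parents_of("lisa"))
print(children_of("marge"))
```

Because parents and children are references rather than nested elements, the data stays finite and nobody has to be nested inside their own descendant.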

ID References do not Have Types :
The attributes mother and father are references to IDs of other elements.
However, those are not necessarily person elements!
The mother attribute is not necessarily a reference to a female person.

An Alternative Specification :
[DTD listing not preserved in the transcript]

The Revised Data :
Marge Simpson
Homer Simpson
Bart Simpson
Lisa Simpson

Consistency of ID and IDREF Attribute Values :
If an attribute is declared as ID:
    The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion)
    This holds even if the two elements have different element names
If an attribute is declared as IDREF:
    The associated value must exist as the value of some ID attribute (no dangling “pointers”)
Similarly for all the values of an IDREFS attribute.
ID, IDREF and IDREFS attributes are not typed.
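These two rules can be checked by hand with a non-validating parser. The sketch below uses Python's standard xml.parsers.expat module to read the attribute types from the internal DTD and then looks for duplicate IDs and dangling references. The family/person/pet DTD and the helper name id_errors are assumptions for illustration; a validating parser would perform these checks itself.

```python
import xml.parsers.expat

DTD = """<!DOCTYPE family [
  <!ELEMENT family (person | pet)*>
  <!ELEMENT person EMPTY>
  <!ELEMENT pet EMPTY>
  <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED>
  <!ATTLIST pet id ID #REQUIRED>
]>
"""
GOOD = DTD + '<family><person id="marge"/><person id="bart" mother="marge"/></family>'
BAD = DTD + '<family><person id="bart" mother="marge"/><pet id="bart"/></family>'
# Consistent but odd: the mother reference points at a pet.
UNTYPED = DTD + '<family><pet id="rex"/><person id="bart" mother="rex"/></family>'


def id_errors(text):
    """Report duplicate ID values and references with no matching ID."""
    attr_types = {}              # (element, attribute) -> declared type
    ids, refs, errors = set(), [], []

    def on_attlist(elname, attname, type_, default, is_required):
        attr_types[(elname, attname)] = type_

    def on_start(name, attrs):
        for attname, value in attrs.items():
            type_ = attr_types.get((name, attname))
            if type_ == "ID":
                if value in ids:
                    errors.append("duplicate ID: " + value)
                ids.add(value)
            elif type_ in ("IDREF", "IDREFS"):
                refs.extend(value.split())

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    # References may point forward, so resolve them after the whole parse.
    errors.extend("dangling reference: " + ref for ref in refs if ref not in ids)
    return errors


for doc in (GOOD, BAD, UNTYPED):
    print(id_errors(doc))
```

The UNTYPED document passes the check, which illustrates that ID references carry no information about the kind of element they point to.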

Limitations of DTDs :
DTDs are a very weak specification language.
    You can’t put any restrictions on the text content of elements.
    It’s difficult to specify:
        All the children must occur, but may be in any order
        This element must occur a certain number of times
    There are only ten data types for attribute values.
But most of all: DTDs aren’t written in XML!
    If you want to do any validation, you need one parser for the XML and another for the DTD.
    This makes XML parsing harder than it needs to be.
There is a newer and more powerful technology: XML Schemas.
However, DTDs are still very much in use.
