DATABASE


MODULE I

3 Marks Questions

List the situations in which we should not use a DBMS? (Model Qn 2012)
In spite of the advantages of using a DBMS, there are a few situations in which such a system may involve unnecessary overhead costs that would not be incurred in traditional file processing. The overhead costs of using a DBMS are due to the following:
• High initial investment in hardware, software, and training.
• Generality that a DBMS provides for defining and processing data.
• Overhead for providing security, concurrency control, recovery, and integrity functions.
Additional problems may arise if the database designers and DBA do not properly design the database or if the database system applications are not implemented properly. Hence, it may be more desirable to use regular files under the following circumstances:
• The database and applications are simple, well defined, and not expected to change.
• There are stringent real-time requirements for some programs that may not be met because of DBMS overhead.
• Multiple-user access to data is not required.

Why do we designate one of the candidate keys of a relation to be the primary key? (Model Qn 2012)
In general, a relation schema may have more than one key. In this case, each of the keys is called a candidate key. For example, the CAR relation has two candidate keys: LicenseNumber and EngineSerialNumber. It is common to designate one of the candidate keys as the primary key of the relation. This is the candidate key whose values are used to identify tuples in the relation. When a relation schema has several candidate keys, the choice of one to become the primary key is arbitrary; however, it is usually better to choose a primary key with a single attribute or a small number of attributes.

How does multilevel indexing improve the efficiency of searching an index file? (Model Qn 2012)
A multilevel index reduces the number of blocks accessed when searching for a record, given its indexing field value. The idea behind a multilevel index is to reduce the part of the index that we continue to search by bfri, the blocking factor for the index, which is larger than 2. Hence, the search space is reduced much faster. The value bfri is called the fan-out of the multilevel index, and we will refer to it by the symbol fo. Searching a multilevel index requires approximately log_fo(bi) block accesses, which is a smaller number than for binary search if the fan-out is larger than 2. To retain the benefits of using multilevel indexing while reducing index insertion and deletion problems, designers adopted a multilevel index that leaves some space in each of its blocks for inserting new entries. This is called a dynamic multilevel index and is often implemented by using data structures called B-trees and B+-trees.
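The claim that a multilevel index needs only about log_fo(bi) block accesses can be checked with a quick back-of-the-envelope calculation. The Python sketch below uses made-up figures (a first-level index of 4,096 blocks and a fan-out of 68) purely for illustration; it is not from the notes.

```python
import math

b1 = 4096   # blocks in the first-level index (illustrative figure)
fo = 68     # fan-out: index entries per index block (illustrative figure)

# Binary search over the first-level index alone.
binary_search_accesses = math.ceil(math.log2(b1))

# A multilevel index adds levels until the top level fits in one block,
# so a search reads roughly one block per level: about ceil(log_fo(b1)),
# plus one access for the data block holding the record.
levels = math.ceil(math.log(b1, fo))
multilevel_accesses = levels + 1

print(binary_search_accesses)   # 12 block accesses
print(multilevel_accesses)      # 3 block accesses
```

Even with these toy numbers, the fan-out larger than 2 cuts the search from 12 block accesses to 3.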
List two reasons why null values might be introduced into the database? (Model Qn 2012)
In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a College Degrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. Null can also be used if we do not know the value of an attribute for a particular entity—for example, if we do not know the home phone number of a person. The unknown category of null can be further classified into two cases. The first case arises when it is known that the attribute value exists but is missing—for example, if the Height attribute of a person is listed as null. The second case arises when it is not known whether the attribute value exists—for example, if the HomePhone attribute of a person is null.

What are the responsibilities of the DBA and the database designers?
The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and for acquiring software and hardware resources as needed. The DBA is accountable for problems such as breach of security or poor system response time. Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. It is the responsibility of database designers to communicate with all prospective database users, in order to understand their requirements, and to come up with a design that meets these requirements. [Also mention the functions of the DBA given in the notes.]

Define the following terms: data, database, DBMS, database system, database catalog, program-data independence, user view, DBA, end user, canned transaction, deductive database system, persistent object, meta-data, transaction processing application.

DATA:
By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know.

DATABASE:
A database is a collection of related data. [check notes] A database has the following implicit properties:
• A database represents some aspect of the real world, sometimes called the miniworld or the Universe of Discourse (UoD). Changes to the miniworld are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.
• A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.

DBMS:
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. Defining a database involves specifying the data types, structures, and constraints for the data to be stored in the database. Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data.

DATABASE SYSTEM:
A database is a collection of related data. A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The database and DBMS software together form a database system.
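The define/construct/manipulate cycle described above can be seen end to end with any DBMS. The sketch below is a minimal illustration (not part of the notes) using Python's built-in sqlite3 module and a made-up STUDENT table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a throwaway in-memory database

# Defining: specify data types, structure, and constraints (DDL).
conn.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        birthdate TEXT
    )
""")

# Constructing: store the data itself under DBMS control.
conn.execute("INSERT INTO student VALUES (1, 'Anita', '1994-03-02')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', NULL)")

# Manipulating: query the database and retrieve specific data (DML).
for row in conn.execute("SELECT name FROM student WHERE roll_no = 1"):
    print(row)        # ('Anita',)

conn.close()
```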
DATABASE CATALOG:
A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database structure and constraints. This definition is stored in the system catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalog is called meta-data, and it describes the structure of the primary database. The catalog is used by the DBMS software and also by database users who need information about the database structure. A general-purpose DBMS software package is not written for a specific database application, and hence it must refer to the catalog to know the structure of the files in a specific database, such as the type and format of data it will access.

PROGRAM-DATA INDEPENDENCE:
In traditional file processing, the structure of data files is embedded in the access programs, so any changes to the structure of a file may require changing all programs that access this file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence. For example, a file access program may be written in such a way that it can access only STUDENT records of one specific structure. If we want to add another piece of data to each STUDENT record, say the Birthdate, such a program will no longer work and must be changed.

USER VIEW:
A database typically has many users, each of whom may require a different perspective or view of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored. Some users may not need to be aware of whether the data they refer to is stored or derived. A multiuser DBMS whose users have a variety of applications must provide facilities for defining multiple views.

DBA:
In a database environment, the primary resource is the database itself and the secondary resource is the DBMS and related software. Administering these resources is the responsibility of the database administrator (DBA). The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and for acquiring software and hardware resources as needed. The DBA is accountable for problems such as breach of security or poor system response time. [Mention the responsibilities of the DBA.]

END USER:
End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use. There are several categories of end users. Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query language to specify their requests and are typically middle- or high-level managers or other occasional browsers. Naive or parametric end users make up a sizable portion of database end users. They are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. Their main job function revolves around constantly querying and updating the database, using standard types of queries and updates—called canned transactions—that have been carefully programmed and tested. The tasks that such users perform are varied.

CANNED TRANSACTIONS:
Naive or parametric end users make up a sizable portion of database end users. They are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously.
Their main job function revolves around constantly querying and updating the database, using standard types of queries and updates—called canned transactions—that have been carefully programmed and tested. The tasks that such users perform are varied:DEDUCTIVE DATABASE SYSTEM:A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database. (These are different from the active rules discussed in Chapter 23.) Because part of the theoretical foundation for some deductive database systems is mathematical logic, such rules are often referred to as logic databases. Other types of systems, referred to as expert database systems or knowledge-based systems.PERSISTENT OBJECT:Transient objects exist in the executing program and disappear once the program terminates. Persistent objects are stored in the database and persist after program termination. The typical mechanisms for making an object persistent are naming and reachability.METADATA:System catalog, which is at the heart of any general-purpose DBMS. It is a "minidatabase" itself, and one of its main functions is to store the schemas, or descriptions, of the databases that the DBMS maintains. Such information is often called metadata. It includes a description of the conceptual database schema, the internal schema, any external schemas, and the mappings between schemas at different levels. In addition, information needed by specific DBMS modules—for example, the query optimization module or the security and authorization module—is stored in the catalog.TRANSACTION PROCESSING APPLICATION:The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct. For example, when several reservation clerks try to assign a seat on an airline flight, the DBMS should ensure that each seat can be accessed by only one clerk at a time for assignment to a passenger. These types of applications are generally called on-line transaction processing (OLTP) applications.What are the different types of database end users? Discuss the main activities of each. End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use. There are several categories of end users:Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query language to specify their requests and are typically middle- or high-level managers or other occasional browsers. Naive or parametric end users make up a sizable portion of database end users or they are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. Their main job function revolves around constantly querying and updating the database, using standard types of queries and updates—called canned transactions—that have been carefully programmed and tested. The tasks that such users perform are varied.Sophisticated end users include engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS so as to implement their applications to meet their complex requirements. 
Stand-alone users maintain personal databases by using ready-made program packages that provide easy-to-use menu- or graphics-based interfaces.. Discuss the capabilities that should be provided by a DBMS. Controlling redundancy. Restricting unauthorized access. Providing persistent storage for program objects and data structures. Permitting inferencing and actions by using rules. Providing multiple user interfaces. Representing complex relationships among data. Enforcing integrity constraints. Providing backup and recovery.What is the difference between controlled and uncontrolled redundancy? Illustrate with examples. Redundancy in storing the same data multiple times leads to several problems. First, there is the need to perform a single logical update—such as entering data on a new student—multiple times: once for each file where student data is recorded. This leads to duplication of effort. Second, storage space is wasted when the same data is stored repeatedly, and this problem may be serious for large databases. Third, files that represent the same data may become inconsistent. This may happen because an update is applied to some of the files but not to others. Even if an update—such as adding a new student—is applied to all the appropriate files, the data concerning the student may still be inconsistent since the updates are applied independently by each user group. In the database approach, the views of different user groups are integrated during database design. For consistency, we should have a database design that stores each logical data item—such as a student’s name or birth date—in only one place in the database. This does not permit inconsistency, and it saves storage space. However, in some cases, controlled redundancy may be useful for improving the performance of queries. Discuss the naming conventions used for ER schema diagrams ?The choice of names for entity types, attributes, relationship types, and (particularly) roles is not always straightforward. One should choose names that convey, as much as possible, the meanings attached to the different constructs in the schema. We choose to use singular names for entity types, rather than plural ones, because the entity type name applies to each individual entity belonging to the entity type. In our ER diagrams, we will use the convention that entity type and relationship type names are in uppercase letters, attribute names are capitalized, and role names are in lowercase letters. As a general practice, given a narrative description of the database requirements, the nouns appearing in the narrative tend to give rise to entity type names, and the verbs tend to indicate names of relationship types. Attribute names generally arise from additional nouns that describe the nouns corresponding to entity types. 
Another naming consideration involves choosing relationship names to make the ER diagram of the schema readable from left to right and from top to bottom.

What are the different applications of Database Management System?
Banking: for keeping customer information, accounts, loans, and transactions.
Universities: for student information and course registration.
Airlines: for reservations and schedule information.
Telecommunications: for keeping records of calls, generating monthly bills, and storing information about communication networks.

Define the following terms: data model, database schema, database state, internal schema, conceptual schema, external schema, data independence, DDL, DML, SDL, VDL, query language, host language, data sublanguage, database utility, client-server architecture.

DATA MODEL:
A data model—a collection of concepts that can be used to describe the structure of a database—provides the necessary means to achieve this abstraction. Different types of data model include:
Relational data model
Entity-Relationship data model
Object-based data model
Semi-structured data model
Hierarchical data model
Network data model

DATABASE SCHEMA (Intension):
The description of a database, or the overall design of a database, is called the database schema, which is specified during database design and is not expected to change frequently. A displayed schema is called a schema diagram. A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints.

DATABASE STATE (Extension):
The data in the database at a particular moment in time—that is, the collection of information or data in the database at that moment—is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. The DBMS is partly responsible for ensuring that every state of the database is a valid state—that is, a state that satisfies the structure and constraints specified in the schema. The DBMS stores the descriptions of the schema constructs and constraints—also called the meta-data—in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state an extension of the schema.

INTERNAL SCHEMA:
The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.

CONCEPTUAL SCHEMA:
The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. A high-level data model or an implementation data model can be used at this level.

EXTERNAL SCHEMA:
The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. A high-level data model or an implementation data model can be used at this level.

DATA INDEPENDENCE:
The three-schema architecture can be used to explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
We can define two types of data independence: Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs.Physical data independence is the capacity to change the internal schema without having to change the conceptual (or external) schemas. DDLData definition language (DDL), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog. In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only.DMLOnce the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a data manipulation language (DML) for these purposes.SDL Storage definition language (SDL), is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages.VDLView definition language (VDL), to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas.DCLData Control Language eg:commitHOST LANGUAGEWhenever DML commands, whether high-level or low-level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage .DATA SUBLANGUAGEWhenever DML commands, whether high-level or low-level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage .QUERY LANGUAGEOn the other hand, a high-level DML used in a stand-alone interactive manner is called a query language.DATABASE UTILITYDBMSs have database utilities that help the DBA in managing the database system. Common utilities have the following types of functions: Loading: A loading utility is used to load existing data files—such as text files or sequential files—into the database. Backup: A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape. File reorganization: This utility can be used to reorganize a database file into a different file organization to improve performance. Performance monitoring: Such a utility monitors database usage and provides statistics to the DBA. CLIENT SERVER ARCHITECTUREIn a basic client-server architecture, the system functionality is distributed between two types of modules. A client module is typically designed so that it will run on a user workstation or personal computer. Typically, application programs and user interfaces that access the database run in the client module. Hence, the client module handles user interaction and provides the user-friendly interfaces such as forms or menu-based GUIs (graphical user interfaces). The other kind of module, called a server module, typically handles data storage, access, search, and other function.With what other computer system software does a DBMS interact? The DBMS also needs to interface with communications software, whose function is to allow users at locations remote from the database system site to access the database through computer terminals, workstations, or their local personal computers. 
These are connected to the database site through data communications hardware such as phone lines, long-haul networks, local-area networks, or satellite communication devices. Many commercial database systems have communication packages that work with the DBMS. The integrated DBMS and data communications system is called a DB/DC system. In addition, some distributed DBMSs are physically distributed over multiple machines. In this case, communications networks are needed to connect the machines. These are often local area networks (LANs) but they can also be other types of networks.

What is a relationship type? Explain the differences among a relationship instance, a relationship type, and a relationship set.
A relationship type R among n entity types E1, E2, . . ., En defines a set of associations—or a relationship set—among entities from these types. As for entity types and entity sets, a relationship type and its corresponding relationship set are customarily referred to by the same name R. Mathematically, the relationship set R is a set of relationship instances ri, where each ri associates n individual entities (e1, e2, . . ., en), and each entity ej in ri is a member of entity type Ej, 1 ≤ j ≤ n. Hence, a relationship type is a mathematical relation on E1, E2, . . ., En, or alternatively it can be defined as a subset of the Cartesian product E1 × E2 × . . . × En. Each of the entity types E1, E2, . . ., En is said to participate in the relationship type R, and similarly each of the individual entities e1, e2, . . ., en is said to participate in the relationship instance ri = (e1, e2, . . ., en). Informally, each relationship instance ri in R is an association of entities, where the association includes exactly one entity from each participating entity type. Each such relationship instance represents the fact that the entities participating in ri are related in some way in the corresponding miniworld situation.

When is the concept of a weak entity used in data modeling? Define the terms owner entity type, weak entity type, identifying relationship type, and partial key.
Entity types that do not have key attributes of their own are called weak entity types. In contrast, regular entity types that do have a key attribute are sometimes called strong entity types. Entities belonging to a weak entity type are identified by being related to specific entities from another entity type in combination with some of their attribute values. We call this other entity type the identifying or owner entity type, and we call the relationship type that relates a weak entity type to its owner the identifying relationship of the weak entity type. A weak entity type always has a total participation constraint (existence dependency) with respect to its identifying relationship, because a weak entity cannot be identified without an owner entity. However, not every existence dependency results in a weak entity type. For example, a DRIVER_LICENSE entity cannot exist unless it is related to a PERSON entity, even though it has its own key (LicenseNumber) and hence is not a weak entity. A weak entity type normally has a partial key, which is the set of attributes that can uniquely identify weak entities that are related to the same owner entity. In the worst case, a composite attribute of all the weak entity's attributes will be the partial key.

5 Marks Questions

Discuss the entity integrity and referential integrity constraints. Why is each considered important? (Model Qn 2012)
The entity integrity constraint states that no primary key value can be null.
This is because the primary key value is used to identify individual tuples in a relation; having null values for the primary key implies that we cannot identify some tuples. For example, if two or more tuples had null for their primary keys, we might not be able to distinguish them. Key constraints and entity integrity constraints are specified on individual relations. The referential integrity constraint is specified between two relations and is used to maintain the consistency among tuples of the two relations. Informally, the referential integrity constraint states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. To define referential integrity more formally, we first define the concept of a foreign key. The conditions for a foreign key, given below, specify a referential integrity constraint between the two relation schemas R1 and R2. A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following two rules: 1. The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to reference or refer to the relation R2. 2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the current state r2(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2. R1 is called the referencing relation and R2 is the referenced relation. In a database of many relations, there are usually many referential integrity constraints. To specify these constraints, we must first have a clear understanding of the meaning or role that each set of attributes plays in the various relation schemas of the database. Referential integrity constraints typically arise from the relationships among the entities represented by the relation schema. a foreign key can refer to its own relation.All integrity constraints should be specified on the relational database schema if we want to enforce these constraints on the database states. Hence, the DDL includes provisions for specifying the various types of constraints so that the DBMS can automatically enforce them. Most relational DBMSs support key and entity integrity constraints, and make provisions to support referential integrity. These constraints are specified as a part of data definition.Explain the 3 main categories of data model?We can categorize them according to the types of concepts they use to describe the database structure.High-level or conceptual data models provide concepts that are close to the way many users perceive data. Low-level or physical data models provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users. Between these two extremes is a class of Representational (or implementation) data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Representational data models hide some details of data storage but can be implemented on a computer system in a direct way. Conceptual data models use concepts such as entities, attributes, and relationships. An entity represents a real-world object or concept, such as an employee or a project, that is described in the database. 
An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary. A relationship among two or more entities represents an interaction among the entities.Representational or implementation data models are the models used most frequently in traditional commercial DBMSs, and they include the widely-used relational data model, as well as the so-called legacy data models—the network and hierarchical models—that have been widely used in the past. Representational data models represent data by using record structures and hence are sometimes called record-based data models. Physical data models describe how data is stored in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records efficient.What is the difference between logical data independence and physical data independence? The three-schema architecture can be used to explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), or to reduce the database (by removing a record type or data item). In the latter case, external schemas that refer only to the remaining data should not be affected. Only the view definition and the mappings need be changed in a DBMS that supports logical data independence. Application programs that reference the external schema constructs must work as before, after the conceptual schema undergoes a logical reorganization. Changes to constraints can be applied also to the conceptual schema without affecting the external schemas or application programs. Physical data independence is the capacity to change the internal schema without having to change the conceptual (or external) schemas. Changes to the internal schema may be needed because some physical files had to be reorganized—for example, by creating additional access structures—to improve the performance of retrieval or update. If the same data as before remains in the database, we should not have to change the conceptual schema. Data independence is accomplished because, when the schema is changed at some level, the schema at the next higher level remains unchanged; only the mapping between the two levels is changed. Hence, application programs referring to the higher-level schema need not be changed. The three-schema architecture can make it easier to achieve true data independence, both physical and logical. However, the two levels of mappings create an overhead during compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because of this, few DBMSs have implemented the full three-schema architecture.What is the difference between procedural and nonprocedural DMLs? There are two main types of DMLs. A high-level or nonprocedural DML can be used on its own to specify complex database operations in a concise manner. Many DBMSs allow high-level DML statements either to be entered interactively from a terminal (or monitor) or to be embedded in a general-purpose programming language. 
In the latter case, DML statements must be identified within the program so that they can be extracted by a pre-compiler and processed by the DBMS. A low-level or procedural DML must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records or objects from the database and processes each separately. Hence, it needs to use programming language constructs, such as looping, to retrieve and process each record from a set of records. Low-level DMLs are also called record-at-a-time DMLs because of this property. High-level DMLs, by contrast, can specify and retrieve many records in a single DML statement and are hence called set-at-a-time or set-oriented DMLs. A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it; hence, such languages are also called declarative.

What is the difference between a database schema and a database state?
In any data model it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying the schemas as diagrams. A displayed schema is called a schema diagram. The actual data in a database may change quite frequently. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances. The distinction between database schema and database state is very important. When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. The DBMS is partly responsible for ensuring that every state of the database is a valid state—that is, a state that satisfies the structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important, and the schema must be designed with the utmost care. The DBMS stores the descriptions of the schema constructs and constraints—also called the meta-data—in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state an extension of the schema.

List the various cases where use of a null value would be appropriate.
In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a College Degrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. Null can also be used if we do not know the value of an attribute for a particular entity—for example, if we do not know the home phone number of a person. The unknown category of null can be further classified into two cases. The first case arises when it is known that the attribute value exists but is missing—for example, if the Height attribute of a person is listed as null. The second case arises when it is not known whether the attribute value exists—for example, if the HomePhone attribute of a person is null.
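As a small illustration of these cases (not from the notes; the table and column names are made up), the sqlite3 snippet below stores null both for an inapplicable attribute and for an unknown one. Note that SQL itself records only the null; it cannot distinguish which of the meanings is intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name             TEXT,
        apartment_number TEXT,   -- inapplicable for single-family homes
        home_phone       TEXT    -- may be unknown, or may not exist at all
    )
""")

# Inapplicable: this person lives in a single-family home.
conn.execute("INSERT INTO person VALUES ('Maya', NULL, '555-0148')")
# Unknown: we simply do not know the home phone (or whether one exists).
conn.execute("INSERT INTO person VALUES ('Tom', '12B', NULL)")

print(conn.execute("SELECT * FROM person WHERE home_phone IS NULL").fetchall())
# [('Tom', '12B', None)]
```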
Define the following terms: entity, attribute, attribute value, relationship instance, composite attribute, multivalued attribute, derived attribute, complex attribute, key attribute, value set (domain).

ENTITY:
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence—a particular person, car, house, or employee—or it may be an object with a conceptual existence—a company, a job, or a university course.

ATTRIBUTE & ATTRIBUTE VALUE:
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence—a particular person, car, house, or employee—or it may be an object with a conceptual existence—a company, a job, or a university course. Each entity has attributes—the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, salary, and job. A particular entity will have a value for each of its attributes. The attribute values that describe each entity become a major part of the data stored in the database.

COMPOSITE ATTRIBUTE, MULTIVALUED ATTRIBUTE, DERIVED ATTRIBUTE, COMPLEX ATTRIBUTE:
[Refer notes for definitions and differences among each.]

KEY ATTRIBUTE:
An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the collection. Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely.

VALUE SET (DOMAIN):
Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. We refer to the value of attribute A for entity e as A(e). The previous definition covers both single-valued and multivalued attributes, as well as nulls. A null value is represented by the empty set. For single-valued attributes, A(e) is restricted to being a singleton for each entity e in E, whereas there is no restriction on multivalued attributes. For a composite attribute A, the value set V is the Cartesian product of P(V1), P(V2), . . ., P(Vn), where V1, V2, . . ., Vn are the value sets of the simple component attributes that form A: V = P(V1) × P(V2) × . . . × P(Vn).

What is an entity type? What is an entity set? Explain the differences among an entity, an entity type, and an entity set.
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence—a particular person, car, house, or employee—or it may be an object with a conceptual existence—a company, a job, or a university course. Each entity has attributes—the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, salary, and job. An entity type defines a collection (or set) of entities that have the same attributes.
Each entity type in the database is described by its name and attributes.The collection of all entities of a particular entity type in the database at any point in time is called an entity set; the entity set is usually referred to using the same name as the entity type.An entity type describes the schema or intension for a set of entities that share the same structure. The collection of entities of a particular entity type is grouped into an entity set, which is also called the extension of the entity type.Explain the difference between attribute and value set[Refer qn 18].12 Mark QuestionsDescribe the three-schema architecture. Why do we need mappings between schema levels? How do different schema definition languages support this architecture? (Model Qn)The goal of the three-schema architecture,is to separate the user applications and the physical database. In this architecture, schemas can be defined at the following three levels: 1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database. 2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. A high-level data model or an implementation data model can be used at this level. 3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. A high-level data model or an implementation data model can be used at this level. The three schemas are only descriptions of data; the only data that actually exists is at the physical level. In a DBMS based on the three-schema architecture, each user group refers only to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user’s external view. The processes of transforming requests and results between levels are called mappings. These mappings may be time-consuming, so some DBMSs—especially those that are meant to support small databases—do not support external views. Even in such systems, however, a certain amount of mapping is necessary to transform requests between the conceptual and internal levels.[REFER NOTES FOR 3 SCHEMA ARCHITECTURE FIGURE].DDLData definition language (DDL), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog. In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only.DMLOnce the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. 
The DBMS provides a data manipulation language (DML) for these purposes.

SDL:
Storage definition language (SDL) is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages.

VDL:
View definition language (VDL) is used to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas.

DCL:
Data control language, e.g. COMMIT.

HOST LANGUAGE:
Whenever DML commands, whether high-level or low-level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage.

DATA SUBLANGUAGE:
Whenever DML commands, whether high-level or low-level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage.

QUERY LANGUAGE:
On the other hand, a high-level DML used in a stand-alone interactive manner is called a query language.

REFER NOTES FOR THE FOLLOWING QUESTIONS
Explain the structure of DBMS with the help of a neat diagram and mention the functions of its components? [COMPONENTS OF DBMS]
Purpose of Database systems or Disadvantages of File processing system? Discuss the main characteristics of the database approach and how it differs from traditional file systems?
Explain ER Model with a neat diagram?
Explain data models (explain each with figure).

Module II

3 Marks Questions

Discuss the various reasons that lead to the occurrence of null values in relations? [Refer qn 4 (Module I).]

Discuss the entity integrity and referential integrity constraints. Why is each considered important? [Refer qn 16.]

Why do we designate one of the candidate keys of a relation to be the primary key? (Model Qn 2012)
In general, a relation schema may have more than one key. In this case, each of the keys is called a candidate key. For example, the CAR relation has two candidate keys: LicenseNumber and EngineSerialNumber. It is common to designate one of the candidate keys as the primary key of the relation. This is the candidate key whose values are used to identify tuples in the relation. When a relation schema has several candidate keys, the choice of one to become primary key is arbitrary; however, it is usually better to choose a primary key with a single attribute or a small number of attributes.

What is the difference between a key and a superkey?
A superkey SK specifies a uniqueness constraint that no two distinct tuples in a state r of R can have the same value for SK; that is, for any two distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]. Every relation has at least one default superkey—the set of all its attributes. A superkey can have redundant attributes, however, so a more useful concept is that of a key, which has no redundancy. A key K of a relation schema R is a superkey of R with the additional property that removing any attribute A from K leaves a set of attributes K' that is not a superkey of R. Hence, a key is a minimal superkey—that is, a superkey from which we cannot remove any attributes and still have the uniqueness constraint hold.
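To make the candidate key / primary key distinction and the uniqueness constraint concrete, the sketch below (not part of the notes; it reuses the CAR attributes mentioned above and assumes SQLite syntax via Python's sqlite3 module) declares one candidate key as the PRIMARY KEY and enforces the other with UNIQUE, so both remain keys while only one is used to identify tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# LicenseNumber is chosen as the primary key; EngineSerialNumber remains a
# candidate key and is still enforced as unique.
conn.execute("""
    CREATE TABLE car (
        license_number       TEXT PRIMARY KEY,
        engine_serial_number TEXT NOT NULL UNIQUE,
        make                 TEXT,
        model                TEXT
    )
""")

conn.execute("INSERT INTO car VALUES ('KL-07-1234', 'ES-9001', 'Maruti', 'Alto')")
try:
    # Violates the other candidate key, so the DBMS rejects the tuple.
    conn.execute("INSERT INTO car VALUES ('KL-07-9999', 'ES-9001', 'Tata', 'Nano')")
except sqlite3.IntegrityError as err:
    print(err)   # UNIQUE constraint failed: car.engine_serial_number
```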
Why are duplicate tuples not allowed in a relation?
SQL usually treats a table not as a set but rather as a multiset; duplicate tuples can appear more than once in a table, and in the result of a query. SQL does not automatically eliminate duplicate tuples in the results of queries, for the following reasons:
• Duplicate elimination is an expensive operation. One way to implement it is to sort the tuples first and then eliminate duplicates.
• The user may want to see duplicate tuples in the result of a query.
• When an aggregate function is applied to tuples, in most cases we do not want to eliminate duplicates.

Why are tuples in a relation not ordered?
According to the definition, a tuple can be considered as a set of (Ai, vi) pairs, where each pair gives the value of the mapping from an attribute Ai to a value vi from dom(Ai). The ordering of attributes is not important, because the attribute name appears with its value; therefore two tuples t1 and t2 that contain the same (Ai, vi) pairs listed in different orders are equal.

5 Marks Questions

Discuss the characteristics of relations that make them different from ordinary tables and files. [Refer notes and explain each one.]
Ordering of tuples in a relation
Ordering of values within a tuple, and an alternative definition of a relation
Values in the tuples
Interpretation of a relation

Define foreign key. What is this concept used for? How does it play a role in the JOIN operation?
The referential integrity constraint is specified between two relations and is used to maintain the consistency among tuples of the two relations. Informally, the referential integrity constraint states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. To define referential integrity more formally, we first define the concept of a foreign key. The conditions for a foreign key, given below, specify a referential integrity constraint between the two relation schemas R1 and R2. A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following two rules:
1. The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the current state r2(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.
R1 is called the referencing relation and R2 is the referenced relation. The JOIN operation is used to combine related tuples from two relations into single tuples. This operation is very important for any relational database with more than a single relation, because it allows us to process relationships. For this purpose the foreign key is used. [Refer Notes]

12 Mark Questions

List the operations of relational algebra and the purpose of each? [Model Qn 2012]
Discuss the various types of JOIN operations. Why is theta join required? How are the OUTER JOIN operations different from the (inner) JOIN operations? How is the OUTER UNION operation different from UNION?
Explain about different database constraints and data definition with SQL? [Model Qn 2012]
Explain in detail about different Relational Algebra and Tuple Relational Calculus? [Model Qn 2012]
What is a view in SQL, and how is it defined? Discuss the problems that may arise when one attempts to update a view. How are views typically implemented? [important]
List the data types that are allowed for SQL2 attributes?

MODULE III

[REFER NOTES FOR ORACLE PORTION + REMAINING NOTES INCLUDED IN THIS PORTION]

12 Mark Questions

Explain Hashing? Briefly discuss Static and Dynamic Hashing? [similar to hashing studied in Data Structures] {Model Qn 2012}
Another type of primary file organization is based on hashing, which provides very fast access to records on certain search conditions. This organization is usually called a hash file.
The search condition must be an equality condition on a single field, called the hash field of the file. In most cases, the hash field is also a key field of the file, in which case it is called the hash key. The idea behind hashing is to provide a function h, called a hash function or randomizing function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. A search for the record within the block can be carried out in a main memory buffer. For most records, we need only a single-block access to retrieve that record. Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field.

Hash Indices
A hash index organizes the search keys with their associated pointers into a hash file structure. We apply a hash function on a search key to identify a bucket, and store the key and its associated pointers in the bucket (or in overflow buckets). Strictly speaking, hash indices are only secondary index structures, since if a file itself is organized using hashing, there is no need for a separate hash index structure on it.

Internal Hashing
For internal files, hashing is typically implemented as a hash table through the use of an array of records. Suppose that the array index range is from 0 to M - 1; then we have M slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and M - 1. One common hash function is the h(K) = K mod M function, which returns the remainder of an integer hash field value K after division by M; this value is then used for the record address. Non-integer hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with characters can be used in the transformation. Other hashing functions can be used. One technique, called folding [the folding method studied in DS], involves applying an arithmetic function such as addition or a logical function such as exclusive or to different portions of the hash field value to calculate the hash address. Another technique involves picking some digits of the hash field value—for example, the third, fifth, and eighth digits—to form the hash address.

External Hashing
Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts the bucket number into the corresponding disk block address. We can use a variation of chaining in which a pointer is maintained in each bucket to a linked list of overflow records for the bucket. The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block. Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field.
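The bucket-and-overflow-chain idea above can be sketched in a few lines of Python. This is a toy in-memory model only; the bucket count, bucket capacity, record layout, and the mod hash function are illustrative choices, not the notes' implementation.

```python
M = 4                 # number of buckets (illustrative)
BUCKET_CAPACITY = 2   # records per bucket before overflow (illustrative)

buckets = [[] for _ in range(M)]    # primary buckets
overflow = [[] for _ in range(M)]   # simple per-bucket overflow chains

def h(key: int) -> int:
    """Map a hash-key value to a relative bucket number."""
    return key % M

def insert(key, record):
    b = h(key)
    target = buckets[b] if len(buckets[b]) < BUCKET_CAPACITY else overflow[b]
    target.append((key, record))

def search(key):
    b = h(key)
    for k, record in buckets[b] + overflow[b]:   # one bucket, then its chain
        if k == key:
            return record
    return None

for emp_id in (17, 21, 33, 45, 9):
    insert(emp_id, {"emp_id": emp_id})
print(search(33))   # {'emp_id': 33}
```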
Although most good hash functions do not maintain records in order of hash field values, some functions—called order preserving—do. A simple example of an order preserving hash function is to take the leftmost three digits of an invoice number field as the hash address and keep the records sorted by invoice number within each bucket. The hashing scheme described so far is called static hashing because a fixed number of buckets M is allocated.

DYNAMIC HASHING [Important, Model Qn 2012]
A major drawback of the static hashing scheme just discussed is that the hash address space is fixed. Hence, it is difficult to expand or shrink the file dynamically. Some hashing techniques allow the hash function to be modified dynamically to accommodate the growth or shrinking of the database. These are called dynamic hash functions. They include:
• Extendible hashing
• Linear hashing
The first scheme—extendible hashing—stores an access structure in addition to the file, and hence is somewhat similar to indexing. The main difference is that the access structure is based on the values that result after application of the hash function to the search field. In indexing, the access structure is based on the values of the search field itself. The second technique, called linear hashing, does not require additional access structures.

Extendible Hashing
Extendible hashing is one form of dynamic hashing. Extendible hashing splits and coalesces buckets as the database size changes. This imposes some performance overhead, but space efficiency is maintained. As reorganization is on one bucket at a time, the overhead is acceptably low.
How does it work? [Figure: general extendible hash structure]
We choose a hash function that is uniform and random and that generates values over a relatively large range. The range is b-bit binary integers (typically b = 32), i.e., 2^32—over 4 billion buckets—and we will not actually create that many. Instead we create buckets on demand, and do not use all b bits of the hash initially. At any point we use i bits of the hash value. The i bits are used as an offset into a table of bucket addresses. The value of i grows and shrinks with the database. Note that the i appearing over the bucket address table tells how many bits are required to determine the correct bucket. It may be the case that several entries point to the same bucket. All such entries will have a common hash prefix, but the length of this prefix may be less than i. So we give each bucket an integer giving the length of the common hash prefix.
To find the bucket containing search key value K:
• Compute h(K).
• Take the first i high-order bits of h(K).
• Look at the corresponding table entry for this i-bit string.
• Follow the bucket pointer in the table entry.
Advantages:
• Extendible hashing provides performance that does not degrade as the file grows.
• Minimal space overhead—no buckets need be reserved for future use. The bucket address table contains only one pointer for each hash value of the current prefix length.
Disadvantages:
• Extra level of indirection in the bucket address table.
• Added complexity.
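A minimal sketch of just the lookup path described above (no splitting or coalescing logic) is given below in Python. The 32-bit hash, the global depth of 2, and the directory contents are all made-up illustrative values.

```python
import hashlib

B = 32  # bits produced by the hash function (illustrative)

def h(key: str) -> int:
    """A stand-in 32-bit hash of the search key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

# Toy state: global depth i = 2, so the directory has 2**i = 4 entries.
# Several directory entries may share one bucket (shorter local prefix).
i = 2
bucket_a, bucket_b, bucket_c = {"name": "A"}, {"name": "B"}, {"name": "C"}
directory = [bucket_a, bucket_a, bucket_b, bucket_c]   # indexed by i-bit prefix

def find_bucket(key: str) -> dict:
    prefix = h(key) >> (B - i)       # first i high-order bits of h(key)
    return directory[prefix]         # follow the bucket pointer

print(find_bucket("invoice-4711")["name"])
```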
Linear Hashing
The idea behind linear hashing is to allow a hash file to expand and shrink its number of buckets dynamically without needing a directory. Suppose that the file starts with M buckets numbered 0, 1, ..., M - 1 and uses the mod hash function h(K) = K mod M; this hash function is called the initial hash function. Overflow because of collisions is still needed and can be handled by maintaining individual overflow chains for each bucket. However, when a collision leads to an overflow record in any file bucket, the first bucket in the file (bucket 0) is split into two buckets: the original bucket 0 and a new bucket M at the end of the file. The records originally in bucket 0 are distributed between the two buckets based on a different hashing function h'(K) = K mod 2M. A key property of the two hash functions h and h' is that any record that hashed to bucket 0 based on h will hash to either bucket 0 or bucket M based on h'; this is necessary for linear hashing to work. A small sketch of the bucket-address computation appears after the review questions below.

What is the difference between a clustering index and a secondary index?
Answer: A clustering index is built on the field that specifies the sequential order of the file. There can be only one clustering index, while there can be many secondary indices.

Explain the distinction between closed and open hashing. Discuss the relative merits of each technique in database applications.
Answer: Open hashing may place keys with the same hash function value in different buckets. Closed hashing always places such keys together in the same bucket. Thus in this case, different buckets can be of different sizes, though the implementation may be by linking together fixed-size buckets using overflow chains. Deletion is difficult with open hashing, as all the buckets may have to be inspected before we can ascertain that a key value has been deleted, whereas in closed hashing only the bucket whose address is obtained by hashing the key value need be inspected. Deletions are more common in databases, and hence closed hashing is more appropriate for them. For a small, static set of data, lookups may be more efficient using open hashing; the symbol table of a compiler would be a good example.

Why is a hash structure not the best choice for a search key on which range queries are likely?
Answer: A range query cannot be answered efficiently using a hash index; we would have to read all the buckets. This is because key values in the range do not occupy consecutive locations in the buckets; they are distributed uniformly and randomly throughout all the buckets.
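Returning to linear hashing: the following sketch shows how a record's bucket number is chosen once some buckets have already been split in the current round. The initial bucket count M, the variables round_no and splits, and the sample keys are assumptions made for the example.

    # A minimal sketch of bucket-address computation in linear hashing.
    # Assumptions: M initial buckets, `splits` buckets already split in the
    # current round, `round_no` counting completed doublings of the file.

    M = 4            # initial number of buckets
    round_no = 0     # completed doublings of the file
    splits = 2       # buckets 0 and 1 have already been split in this round

    def h(key, j):
        # the family of hash functions h_j(K) = K mod (2**j * M)
        return key % ((2 ** j) * M)

    def bucket_for(key):
        b = h(key, round_no)
        if b < splits:
            # this bucket was already split, so apply the next hash function,
            # which sends the record either to b or to its split image b + 2**round_no * M
            b = h(key, round_no + 1)
        return b

    print(bucket_for(9))    # 9 mod 4 = 1 < 2, so 9 mod 8 = 1  -> stays in bucket 1
    print(bucket_for(13))   # 13 mod 4 = 1 < 2, so 13 mod 8 = 5 -> goes to split image bucket 5
    print(bucket_for(11))   # 11 mod 4 = 3 >= 2, bucket 3 is not yet split, so it stays there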
Explain in detail about indexing.
Indexes are access structures used to speed up the retrieval of records in response to certain search conditions. Index structures typically provide secondary access paths, which offer alternative ways of accessing the records without affecting the physical placement of records on disk. They enable efficient access to records based on the indexing fields used to construct the index. Basically, any field of the file can be used to create an index, and multiple indexes on different fields can be constructed on the same file. The most prevalent types of indexes are based on ordered files (single-level indexes) and tree data structures (multilevel indexes, B+-trees). Indexes can also be constructed based on hashing or other search data structures. The different types of single-level ordered indexes are primary, secondary, and clustering indexes.

Ordered Indices
To allow fast random access, an index structure may be used. A file may have several indices on different search keys. If the file containing the records is sequentially ordered, the index whose search key specifies the sequential order of the file is the primary index, or clustering index. Note: the search key of a primary index is usually the primary key, but this is not necessarily so. Indices whose search key specifies an order different from the sequential order of the file are called secondary indices, or nonclustering indices.

Types of Single-Level Ordered Indexes
• Primary indexes
• Clustering indexes
• Secondary indexes

Primary Index
Index-sequential files are ordered sequentially on some search key, and a primary index is associated with them. A primary index is an ordered file whose records are of fixed length with two fields. The first field is of the same data type as the ordering key field—called the primary key—of the data file, and the second field is a pointer to a disk block (a block address). There is one index entry (or index record) in the index file for each block in the data file. Each index entry has the value of the primary key field for the first record in a block and a pointer to that block as its two field values. We will refer to the two field values of index entry i as <K(i), P(i)>. [Figure 11.1: Sequential file for deposit records]
Indexes can also be characterized as dense or sparse. A dense index has an index entry for every search key value (and hence every record) in the data file. A sparse (or nondense) index, on the other hand, has index entries for only some of the search values. A primary index is hence a nondense (sparse) index, since it includes an entry for each disk block of the data file rather than for every search value (or every record). The index file for a primary index needs substantially fewer blocks than the data file, for two reasons. First, there are fewer index entries than there are records in the data file. Second, each index entry is typically smaller than a data record because it has only two fields; consequently, more index entries than data records can fit in one block. A binary search on the index file hence requires fewer block accesses than a binary search on the data file. The major problem with a primary index—as with any ordered file—is insertion and deletion of records. With a primary index the problem is compounded because, if we attempt to insert a record in its correct position in the data file, we not only have to move records to make space for the new record but also have to change some index entries, since moving records changes the anchor records of some blocks.

Dense and Sparse Indices
There are two types of primary ordered indices:
• Dense index: an index record appears for every search key value in the file. The index record contains the search key value and a pointer to the actual record.
• Sparse index: index records are created only for some of the records. To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for, start at the record pointed to by that index entry, and proceed along the pointers in the file (that is, sequentially) until we find the desired record.
Figures 11.2 and 11.3 show dense and sparse indices for the deposit file. Dense indices are faster in general, but sparse indices require less space and impose less maintenance for insertions and deletions. (Why?) A good compromise is to have a sparse index with one entry per block. Why is this good? The biggest cost is in bringing a block into main memory; with this method we are guaranteed to have the correct block, unless the record is on an overflow block (actually it could be several blocks), and the index is still small.
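A lookup through a sparse index, as just described, can be sketched as follows. The block contents, the anchor keys, and the helper names are made up for illustration; "blocks" here are small Python lists standing in for disk blocks of an ordered data file.

    # A minimal sketch of a lookup through a sparse (primary) index.
    # Assumptions: the data file is ordered on the key; each index entry holds
    # the key of the first record of a block (its anchor) and the block number.

    import bisect

    # data "blocks", ordered on the key field
    blocks = [
        [(5, "A"), (12, "B"), (19, "C")],
        [(23, "D"), (31, "E"), (44, "F")],
        [(51, "G"), (60, "H"), (75, "I")],
    ]

    # sparse index: one (anchor key, block number) entry per block
    index = [(5, 0), (23, 1), (51, 2)]
    anchors = [k for k, _ in index]

    def lookup(key):
        # find the last anchor <= key, then scan only that block
        pos = bisect.bisect_right(anchors, key) - 1
        if pos < 0:
            return None
        for k, rec in blocks[index[pos][1]]:
            if k == key:
                return rec
        return None

    print(lookup(31))   # -> "E": anchor 23 selects block 1, which is then scanned
    print(lookup(4))    # -> None: smaller than every anchor, so no block can hold it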
Secondary Indices
If the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value, because the remaining records with the same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all the records.
A secondary index is also an ordered file with two fields. The first field is of the same data type as some nonordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer. There can be many secondary indexes (and hence indexing fields) for the same file. We first consider a secondary index access structure on a key field that has a distinct value for every record. Such a field is sometimes called a secondary key. In this case there is one index entry for each record in the data file, which contains the value of the secondary key for the record and a pointer either to the block in which the record is stored or to the record itself. Hence, such an index is dense. [Figure 11.5: Sparse secondary index on cname]
We can use an extra level of indirection to implement secondary indices on search keys that are not candidate keys: a pointer does not point directly to the file but to a bucket that contains pointers to the file. See Figure 11.5 on the secondary key cname. To perform a lookup on Peterson, we must read all three records pointed to by entries in bucket 2; only one entry points to a Peterson record, but all three records need to be read. As the file is not physically ordered by cname, this may take three block accesses.
Secondary indices must be dense, with an index entry for every search-key value and a pointer to every record in the file. Secondary indices improve the performance of queries on non-primary keys, but they also impose serious overhead on database modification: whenever the file is updated, every index must be updated, so the designer must decide whether to use secondary indices or not. A secondary index usually needs more storage space and a longer search time than a primary index, because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than for a primary index, since we would have to do a linear search on the data file if the secondary index did not exist.
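The bucket-of-pointers idea used for a secondary index on a nonordering, non-candidate-key field (like cname above) can be sketched as follows. The record layout, the names, and the use of list positions as stand-ins for record pointers are assumptions for illustration only.

    # A minimal sketch of a secondary index on a nonordering field (here a name),
    # using one bucket of record "pointers" per search-key value, in the spirit of
    # the cname example above. The pointers are simply list positions here.

    records = [
        (0, "Peterson", 500),   # (record id, cname, balance) - illustrative layout
        (1, "Hayes", 700),
        (2, "Peterson", 900),
        (3, "Peterson", 350),
        (4, "Turner", 250),
    ]

    # dense secondary index: every search-key value appears, pointing to a bucket
    # that holds pointers to all records with that value
    secondary = {}
    for rid, name, _ in records:
        secondary.setdefault(name, []).append(rid)

    def lookup(name):
        # without this index, an unordered scan of the whole file would be needed
        return [records[rid] for rid in secondary.get(name, [])]

    print(lookup("Peterson"))   # three records, likely in three different blocks on disk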
Clustering Indexes
If the records of a file are physically ordered on a nonkey field—one that does not have a distinct value for each record—that field is called the clustering field. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field. This differs from a primary index, which requires that the ordering field of the data file have a distinct value for each record. A clustering index is another example of a nondense index, because it has an entry for every distinct value of the indexing field rather than for every record in the file. A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer. There is one entry in the clustering index for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that has a record with that value for its clustering field.
The main difference between an index and a hashing directory is that an index search uses the values of the search field itself, whereas a hash directory search uses the hash value that is calculated by applying the hash function to the search field. [REFER FIG IN NOTES]

Multi-Level Indices
Even with a sparse index, the index may still grow too large. For 100,000 records at 10 per block, with one index record per block, that is 10,000 index records; even if we can fit 100 index records per block, this is 100 index blocks. If the index is too large to be kept in main memory, a search results in several disk reads. If there are no overflow blocks in the index, we can use binary search, which will read as many as ceil(log2(b)) blocks (as many as 7 for our 100 blocks). If the index has overflow blocks, a sequential search is typically used, reading all b index blocks. The solution is to construct a sparse index on the index (Figure 11.4).
A multilevel index considers the index file, which we will now refer to as the first (or base) level of a multilevel index, as an ordered file with a distinct value for each K(i). Hence we can create a primary index for the first level; this index to the first level is called the second level of the multilevel index. Because the second level is a primary index, we can use block anchors so that the second level has one entry for each block of the first level. [Figure 11.4: Two-level sparse index]
To search, use binary search on the outer index, scan the index block found until the correct index record is located, and then use that index record as before, scanning the block it points to for the desired record. For very large files, additional levels of indexing may be required. Indices must be updated at all levels when insertions or deletions require it. Frequently, each level of the index corresponds to a unit of physical storage (e.g. indices at the level of track, cylinder, and disk). [REFER NOTES]

Explain the structure of the Oracle system and how data are stored in Oracle. [Important, Model Qn 2012]
Differentiate between internal hashing and external hashing.

MODULE IV&V
Discuss attribute semantics as an informal measure of goodness for a relation schema. [refer note]
Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with examples. [refer note]
Why are many nulls in a relation considered bad? [refer note, design guidelines]
Discuss the problem of spurious tuples and how we may prevent it. [refer note]
State the informal guidelines for relation schema design that we discussed, and illustrate how violation of these guidelines may be harmful.
What is a functional dependency? Who specifies the functional dependencies that hold among the attributes of a relation schema?
Draw a state diagram, and discuss the typical states that a transaction goes through during execution. (refer note, states of a transaction)

What is a serial schedule? What is a serializable schedule? Why is a serial schedule considered correct?
The concept of serializability of schedules is used to identify which schedules are correct when transaction executions have interleaving of their operations in the schedules. Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule; otherwise, the schedule is called nonserial.
Hence, in a serial schedule, only one transaction at a time is active; the commit (or abort) of the active transaction initiates execution of the next transaction. No interleaving occurs in a serial schedule.
A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. We will define the concept of equivalence of schedules shortly. Notice that there are n! possible serial schedules of n transactions and many more possible nonserial schedules. We can form two disjoint groups of the nonserial schedules: those that are equivalent to one (or more) of the serial schedules, and hence are serializable; and those that are not equivalent to any serial schedule and hence are not serializable.
Every serial schedule is considered correct. We can assume this because every transaction is assumed to be correct if executed on its own (according to the consistency preservation property). Hence, it does not matter which transaction is executed first. As long as every transaction is executed from beginning to end without any interference from the operations of other transactions, we get a correct end result on the database.

What is a schedule (history)? Define the concepts of recoverable, cascadeless, and strict schedules, and compare them in terms of their recoverability.
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions, subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti.
Once a transaction T is committed, it should never be necessary to roll back T. The schedules that theoretically meet this criterion are called recoverable schedules; those that do not are called nonrecoverable and should not be permitted. A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed. Recoverable schedules require a complex recovery process but, if sufficient information is kept in the log, a recovery algorithm can be devised.
In a recoverable schedule, no committed transaction ever needs to be rolled back. However, it is possible for a phenomenon known as cascading rollback (or cascading abort) to occur, where an uncommitted transaction has to be rolled back because it read an item from a transaction that failed (for example, a transaction T2 that must be rolled back because it read item X from T1, and T1 then aborted). Because cascading rollback can be quite time-consuming—since numerous transactions can be rolled back—it is important to characterize the schedules where this phenomenon is guaranteed not to occur. A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions. In this case no items that have been read will later be discarded, so no cascading rollback will occur. To satisfy this criterion, the read_item(X) command of T2 in the example must be postponed until after T1 has committed (or aborted), thus delaying T2 but ensuring that no cascading rollback occurs if T1 aborts.
Finally, there is a third, more restrictive type of schedule, called a strict schedule, in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). Strict schedules simplify the recovery process: in a strict schedule, undoing a write_item(X) operation of an aborted transaction is simply a matter of restoring the before image (old_value or BFIM) of data item X. This simple procedure always works correctly for strict schedules, but it may not work for recoverable or cascadeless schedules.
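A common way to test the conflict serializability of a schedule discussed above is to build a precedence graph over the transactions and check it for cycles. The sketch below is only illustrative: the schedule encoding as (transaction, operation, item) triples and the sample schedule are assumptions, and commits/aborts are omitted for brevity.

    # A minimal sketch of a conflict-serializability test using a precedence graph.

    from itertools import combinations

    schedule = [
        ("T1", "r", "X"), ("T2", "w", "X"), ("T1", "w", "X"), ("T2", "r", "Y"),
    ]

    def precedence_edges(sched):
        edges = set()
        for (t1, op1, x1), (t2, op2, x2) in combinations(sched, 2):
            # conflicting operations: same item, different transactions, at least one write;
            # the first operation precedes the second, so the edge goes t1 -> t2
            if x1 == x2 and t1 != t2 and "w" in (op1, op2):
                edges.add((t1, t2))
        return edges

    def has_cycle(edges):
        # tiny depth-first cycle check over the transaction graph
        graph = {}
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
        def visit(node, stack):
            if node in stack:
                return True
            return any(visit(n, stack | {node}) for n in graph.get(node, ()))
        return any(visit(n, frozenset()) for n in graph)

    edges = precedence_edges(schedule)
    print(edges)                                   # edges in both directions between T1 and T2
    print("serializable:", not has_cycle(edges))   # False: this schedule is not conflict-serializable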
Discuss how serializability is used to enforce concurrency control in a database system. Why is serializability sometimes considered too restrictive as a measure of correctness for schedules?

Define the constrained write and unconstrained write assumptions.
The definitions of conflict serializability and view serializability are similar if a condition known as the constrained write assumption holds on all transactions in the schedule. This condition states that any write operation wi(X) in a transaction Ti is preceded by a read ri(X) in Ti, and that the value written by wi(X) depends only on the value of X read by ri(X). This assumes that computation of the new value of X is a function f(X) based on the old value of X read from the database. However, the definition of view serializability is less restrictive than that of conflict serializability under the unconstrained write assumption, where the value written by an operation wi(X) can be independent of its old value in the database. This is called a blind write.

Define the violations caused by each of the following: dirty read, nonrepeatable read, and phantoms. Discuss the actions taken by the read_item and write_item operations on a database. (refer note)

Discuss the different types of failures. What is meant by catastrophic failure? (5 mark and 3 mark qn)
Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) the transaction has no effect whatsoever on the database or on any other transactions. The DBMS must not permit some operations of a transaction T to be applied to the database while other operations of T are not. This may happen if a transaction fails after executing some of its operations but before executing all of them.
Types of failures: failures are generally classified as transaction, system, and media failures. There are several possible reasons for a transaction to fail in the middle of execution:
1. A computer failure (system crash): a hardware, software, or network error occurs in the computer system during transaction execution. Hardware crashes are usually media failures—for example, main memory failure.
2. A transaction or system error: some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction: during transaction execution, certain conditions may occur that necessitate cancellation of the transaction. For example, data for the transaction may not be found. Notice that an exception condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal, to be canceled. This exception should be programmed in the transaction itself, and hence would not be considered a failure.
4. Concurrency control enforcement: the concurrency control method (see Chapter 20) may decide to abort the transaction, to be restarted later, because it violates serializability (see Section 19.5) or because several transactions are in a state of deadlock.
5. Disk failure: some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction.
6. Physical problems and catastrophes: this refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.
Failures of types 1 through 4 are more common than those of types 5 and 6. A catastrophic failure is a failure of the latter kind, one that damages the database itself (for example, a disk crash); recovering from it requires restoring a past copy of the database from archival backup and redoing the operations of committed transactions from the backed-up log.

What is meant by the concurrent execution of database transactions in a multiuser system? Discuss why concurrency control is needed, and give informal examples.
A database system is classified according to the number of users who can use the system concurrently—that is, at the same time. A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users can use the system—and hence access the database—concurrently. Single-user DBMSs are mostly restricted to some microcomputer systems; most other DBMSs are multiuser. For example, an airline reservations system is used by hundreds of travel agents and reservation clerks concurrently. Systems in banks, insurance agencies, stock exchanges, supermarkets, and the like are also operated on by many users who submit transactions concurrently to the system.
Multiple users can access databases—and use computer systems—simultaneously because of the concept of multiprogramming, which allows the computer to execute multiple programs—or processes—at the same time. If only a single central processing unit (CPU) exists, it can actually execute at most one process at a time. However, multiprogramming operating systems execute some commands from one process, then suspend that process and execute some commands from the next process, and so on. A process is resumed at the point where it was suspended whenever it gets its turn to use the CPU again. Hence, concurrent execution of processes is actually interleaved.

Why are Armstrong’s inference rules—the three inference rules IR1 through IR3—important?
The set of dependencies F+, which we call the closure of F, can be determined from F by using only inference rules IR1 through IR3. Inference rules IR1 through IR3 are known as Armstrong’s inference rules.

Why can we not infer a functional dependency from a particular relation state?
A functional dependency is a property of the relation schema (intension) R, not of a particular legal relation state (extension) r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R.

What is meant by the completeness and soundness of Armstrong’s inference rules?
By sound, we mean that, given a set of functional dependencies F specified on a relation schema R, any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. By complete, we mean that using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F.
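In practice, the inference rules are applied through the standard attribute-closure algorithm, sketched below. The FDs and attribute names (SSN, PNUMBER, and so on) are illustrative only, and the (left-hand side, right-hand side) encoding is an assumption made for the example.

    # A minimal sketch of the attribute-closure computation, which in effect
    # applies Armstrong's rules IR1-IR3 repeatedly until nothing new is added.

    fds = [
        ({"SSN"}, {"ENAME"}),
        ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
        ({"SSN", "PNUMBER"}, {"HOURS"}),
    ]

    def closure(attrs, fds):
        # X+ : start with X itself (reflexivity) and keep adding the right-hand
        # sides of FDs whose left-hand side is already contained in the result.
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    print(closure({"SSN", "PNUMBER"}, fds))
    # all six attributes come out, so {SSN, PNUMBER} functionally determines the whole schema here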
What is meant by the closure of a set of functional dependencies?
Numerous other functional dependencies hold in all legal relation instances that satisfy the dependencies in F; those other dependencies can be inferred or deduced from the FDs in F. For real-life examples, it is practically impossible to specify all possible functional dependencies that may hold. The set of all such dependencies is called the closure of F and is denoted by F+.

Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger form of 3NF?
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF. The formal definition of BCNF differs slightly from the definition of 3NF. A relation schema R is in BCNF if, whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of R. The only difference between the definitions of BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime, is absent from BCNF. By ruling out such dependencies, BCNF avoids the redundancy of repeating the same information in thousands of tuples; it is therefore the stronger normal form, disallowing redundancy that 3NF may still permit.

12 Marks
Explain the design guidelines [or informal design guidelines] in database design.
Explain functional dependency and its inference rules.
Explain normalisation. Discuss the normal forms based on primary keys [i.e. 1NF, 2NF, 3NF, BCNF].
Explain the general definition of each normal form.
Explain multivalued dependency and 4NF.
Explain join dependency and 5NF.

Discuss the reasons for converting SQL queries into relational algebra queries before optimization is done.
An SQL query is first translated into an equivalent extended relational algebra expression—represented as a query tree data structure—that is then optimized. Typically, SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized.

What is a query execution plan?
Optimization techniques apply heuristic rules to modify the internal representation of a query—which is usually in the form of a query tree or a query graph data structure—to improve its expected performance. The parser of a high-level query first generates an initial internal representation, which is then optimized according to heuristic rules. Following that, a query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.

What is meant by cost-based query optimization?
A query optimizer should not depend solely on heuristic rules; it should also estimate and compare the costs of executing a query using different execution strategies and should choose the strategy with the lowest cost estimate. For this approach to work, accurate cost estimates are required so that different strategies are compared fairly and realistically. In addition, we must limit the number of execution strategies to be considered; otherwise, too much time will be spent making cost estimates for the many possible execution strategies. Hence, this approach is more suitable for compiled queries, where the optimization is done at compile time and the resulting execution strategy code is stored and executed directly at runtime. We call this approach cost-based query optimization.
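The kind of comparison a cost-based optimizer makes can be illustrated with block-access estimates for a simple selection. All the numbers below are made up; a real optimizer would take them from catalog statistics, and the three strategies shown are just a small sample of the access paths it would consider.

    # A minimal sketch of comparing estimated block accesses for one selection.

    import math

    b = 2000        # assumed number of data blocks in the file
    bfri = 68       # assumed fan-out of a multilevel index on the selection attribute

    def cost_full_scan():
        return b                              # read every block

    def cost_binary_search():
        return math.ceil(math.log2(b))        # only if the file is ordered on the attribute

    def cost_multilevel_index():
        levels = math.ceil(math.log(b, bfri)) # approximately log_fo(b) index levels
        return levels + 1                     # index blocks plus one data block

    strategies = {
        "full scan": cost_full_scan(),
        "binary search": cost_binary_search(),
        "multilevel index": cost_multilevel_index(),
    }
    print(min(strategies, key=strategies.get), strategies)
    # the optimizer picks the strategy with the lowest estimated cost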
What is meant by semantic query optimization? How does it differ from other query optimization techniques?
A different approach to query optimization, called semantic query optimization, has been suggested. This technique, which may be used in combination with the techniques discussed previously, uses constraints specified on the database schema—such as unique attributes and other more complex constraints—in order to modify one query into another query that is more efficient to execute.

OBJECT RELATIONAL DATABASES
A new technology has evolved in which relational and object-oriented concepts have been combined or merged. These systems are called object-relational database systems. Several major software companies, including IBM, Informix, Microsoft, Oracle, and Sybase, have released object-relational versions of their products. These companies are promoting a new, extended version of relational database technology called object-relational database management systems, also known as ORDBMSs. RDBMSs use Structured Query Language (SQL, currently SQL2) as the data definition language (DDL) and the data manipulation language (DML). SQL includes statements for data definition, modification, querying, and constraint specification. The types of queries vary from simple single-table queries to complicated multi-table queries involving joins, nesting, set union/difference, and others.
The concept of abstract data types (ADTs), in which the internal data structure is hidden and only the specified external operations can be applied to an object, led to the concept of encapsulation. The programming language SMALLTALK, developed by Xerox, was designed to be object-oriented. Other object-oriented programming languages include C++, Java, etc. The main features of OO programming languages are encapsulation, inheritance, and polymorphism. Encapsulation can be thought of as a protective layer that prevents the code and the data from being accessed by code defined outside the layer.
Object-Oriented Database Management Systems (OODBMSs)
The data in object-oriented database management systems (OODBMSs) is managed through two sets of relations, one describing the interrelations of data items and another describing the abstract relationships (inheritance). The strong connection between application and database results in less code, more natural data structures, and better maintainability and reusability of code. OO languages, such as C++ or Java, are able to reduce code size by not having to translate code into a database sublanguage such as SQL and ODBC or JDBC: an object-oriented language serves as the language for both the application and the database. OODBMSs have been integrated with C++, C, Java, and LISP. The primary interface in an OODBMS for creating and modifying objects is directly via the object language (C++, Java, etc.) using the native language syntax.
The lack of a defining standard was a drawback for OODBMSs. The Object Data Management Group (ODMG) proposed a standard known as ODMG-93 or the ODMG 1.0 standard, now revised into ODMG 2.0. The standard consists of the object model, the object definition language (ODL), the object query language (OQL), and the bindings to OO programming languages. The ODL and OQL are based on the ODMG data model.
Data Model
The data model consists of data types, type constructors, etc., and is similar to the SQL report that describes the standard model for relational databases. The ODL is designed to support the semantic constructs of the ODMG 2.0 object model and is independent of any programming language.
The ODL is used to create object specifications. The OQL is designed to work closely with the programming languages for which an ODMG binding is defined, such as C++, Java, and SMALLTALK. The syntax of OQL queries is similar to the syntax of SQL (the query language for relational databases), with some additional features such as object identity, complex objects, inheritance, polymorphism, and relationships.
Advantages of ORDBMSs
The main advantages of extending the relational data model come from reuse and sharing. Reuse comes from the ability to extend the DBMS server to perform standard functionality centrally, rather than have it coded in each application. If we can embed the functionality in the server, it saves having to define it in each application that needs it, and consequently allows the functionality to be shared by all applications. The main advantages of ORDBMSs are massive scalability and support for object-oriented features.
Disadvantages of ORDBMSs
The ORDBMS approach has the obvious disadvantage of complexity and its associated increased costs. There are proponents of the relational approach who believe the essential simplicity and purity of the relational model are lost with these types of extension. There are also those who believe that the RDBMS is being extended for what will be a minority of applications that do not achieve optimal performance with current relational technology.
Characteristics of ORDBMSs
• Base data type extension: ORDBMSs allow users to define data types, functions, and operators. As a result, the functionality of the ORDBMS increases along with its performance.
• Support for complex objects, inheritance, and rule systems.
