Database Theory and Advance

Add to Favourites
Post to:

IntroductionThe Database ApproachA database system is a computer-based system to record and maintain information. The information concerned can be anything of significance to the organization for whose use it is intended.Databases contents are divided up into two concepts:SchemaDataSchema is the structure of data, whereas the Data are the "facts". Schema can be complex to understand, but really indicates the rules which the Data must obey.Imagine a case where we want to store facts about employees in a company. Such facts could include their name, address, date of birth, and salary. In a database all the information on all employees would be held in a single storage "container", called a table. This table is a tabular object like a spreadsheet page, with different employees as the rows, and the facts (e.g. their names) as columns... Let's call this table EMP, and it could look something like: NameAddressDate of BirthSalaryJim Smith1 Apple Lane1/3/199111000Jon Greg5 Pear St7/9/199213000Bob Roberts2 Plumb Road3/2/199012000From this information the schema would define that EMP has four components, "NAME", "ADDRESS", "DOB", "SALARY". As designers we can call the columns what we like, but making them meaningful helps. In addition to the name, we want to try and make sure that people don’t accidentally store a name in the DOB column, or some other silly error. Protecting the database against rubbish data is one of the most important database design steps. From what we know about the facts, we can say things like: NAME is a string, and needs to hold at least 12 characters.ADDRESS is a string, and needs to hold at least 12 characters.DOB is a date... The company forbids people over 100 years old or younger than 18 years old working for them.SALARY is a number. It must be greater than zero.Such rules can be enforced by a database. During the design phase of a database schema these and more complex rules are identified and where possibly implemented. The more rules the harder it is to enter poor quality data.User TypesThree broad user types to consider:Application programmer, responsible for writing programs in some high-level language such as Java, C++, etc. End-user, who accesses the database via a query language Database administrator (DBA), who controls all operations on the database Database ArchitectureThe three-level architecture forms the basis of modern database architectures. External: concerned with the way individual users see the dataConceptual: can be regarded as a community user view a formal description of data of interest to the organization, independent of any storage considerations.Internal: concerned with the way in which the data is actually storedExternal ViewA user is anyone who needs to access some portion of the data. They may range from application programmers to casual users with adhoc queries. Each user has a language at his/her disposal.The application programmer may use a high level language while the casual user will probably use a query language.Regardless of the language used, it will include a data sub­language DSL which is that subset of the language which is concerned with storage and retrieval of information in the database and may or may not be apparent to the user.A DSL is a combination of two languages:a data definition language (DDL) - provides for the definition or description of database objectsa data manipulation language (DML) - supports the manipulation or processing of database objects.Each user sees the data in terms of an external view: Defined by an external schema, consisting basically of descriptions of each of the various types of external record in that external view, and also a definition of the mapping between the external schema and the underlying conceptual schema.Conceptual ViewAn abstract representation of the entire information content of the database.It is in general a view of the data as it actually is, that is, it is a “model” of the “real­world”.It consists of multiple occurrences of multiple types of conceptual record, defined in the conceptual schema.To achieve data independence, the definitions of conceptual records must involve information content only.storage structure is ignoredaccess strategy is ignoredIn addition to definitions, the conceptual schema contains authorisation and validation procedures.Internal ViewThe internal view is a low-level representation of the entire database consisting of multiple occurrences of multiple types of internal (stored) records.It is however at one remove from the physical level since it does not deal in terms of physical records or blocks or with any device specific constraints such as cylinder or track sizes. Detail mapping to physical storage is highly implementation specific and are not expressed in the three-level architecture.The internal view described by the internal schema:defines the various types of stored recordwhat indices existhow stored fields are representedwhat physical sequence the stored records are inIn effect the internal schema is the storage structure definition.MappingsThe conceptual/internal mapping:defines conceptual and internal view correspondencespecifies mapping from conceptual records to their stored counterpartsAn external/conceptual mapping:defines a particular external and conceptual view correspondenceA change to the storage structure definition means that the conceptual/internal mapping must be changed accordingly, so that the conceptual schema may remain invariant, achieving physical data independence.A change to the conceptual definition means that the conceptual/external mapping must be changed accordingly, so that the external schema may remain invariant, achieving logical data independence.DBMSThe database management system (DBMS) is the software that:handles all access to the databaseis responsible for applying the authorization checks and validation proceduresConceptually what happens is:A user issues an access request, using some particular DML.The DBMS intercepts the request and interprets it.The DBMS inspects in turn the external schema, the external/conceptual mapping, the conceptual schema, the conceptual internal mapping, and the storage structure definition.The DBMS performs the necessary operations on the stored database.Database AdministratorThe database administrator (DBA) is the person (or group of people) responsible for overall control of the database system. The DBA's responsibilities include the following:deciding the information content of the database, i.e. identifying the entities of interest to the enterprise and the information to be recorded about those entities. This is defined by writing the conceptual schema using the DDLdeciding the storage structure and access strategy, i.e. how the data is to be represented by writing the storage structure definition. The associated internal/conceptual schema must also be specified using the DDLliaising with users, i.e. to ensure that the data they require is available and to write the necessary external schemas and conceptual/external mapping (again using DDL)defining authorization checks and validation procedures. Authorization checks and validation procedures are extensions to the conceptual schema and can be specified using the DDLdefining a strategy for backup and recovery. For example periodic dumping of the database to a backup tape and procedures for reloading the database for backup. Use of a log file where each log record contains the values for database items before and after a change and can be used for recovery purposesmonitoring performance and responding to changes in requirements, i.e. changing details of storage and access thereby organizing the system so as to get the performance that is “best for the enterprise”Facilities and LimitationsThe facilities offered by DBMS vary a great deal, depending on their level of sophistication. In general, however, a good DBMS should provide the following advantages over a conventional system:Independence of data and program - This is a prime advantage of a database. Both the database and the user program can be altered independently of each other thus saving time and money which would be required to retain consistency.Data shareability and non­redundance of data - The ideal situation is to enable applications to share an integrated database containing all the data needed by the applications and thus eliminate as much as possible the need to store data redundantly.Integrity - With many different users sharing various portions of the database, it is impossible for each user to be responsible for the consistency of the values in the database and for maintaining the relationships of the user data items to all other data item, some of which may be unknown or even prohibited for the user to access.Centralized control - With central control of the database, the DBA can ensure that standards are followed in the representation of data.Security - Having control over the database the DBA can ensure that access to the database is through proper channels and can define the access rights of any user to any data items or defined subset of the database. The security system must prevent corruption of the existing data either accidently or maliciously.Performance and Efficiency - In view of the size of databases and of demanding database accessing requirements, good performance and efficiency are major requirements. Knowing the overall requirements of the organization, as opposed to the requirements of any individual user, the DBA can structure the database system to provide an overall service that is “best for the enterprise”.Data IndependenceThis is a prime advantage of a database. Both the database and the user program can be altered independently of each other.In a conventional system applications are data­dependent. This means that the way in which the data is organized in secondary storage and the way in which it is accessed are both dictated by the requirements of the application, and, moreover, that knowledge of the data organization and access technique is built into the application logic.For example, if a file is stored in indexed sequential form then an application must knowthat the index existsthe file sequence (as defined by the index)The internal structure of the application will be built around this knowledge. If, for example, the file was to be replaced by a hash-addressed file, major modifications would have to be made to the application.Such an application is data-dependent - it is impossible to change the storage structure (how the data is physically recorded) or the access strategy (how it is accessed) without affecting the application, probably drastically. The portions of the application requiring alteration are those that communicate with the file handling software - the difficulties involved are quite irrelevant to the problem the application was written to solve.it is undesirable to allow applications to be data-dependent - different applications will need different views of the same data. the DBA must have the freedom to change storage structure or access strategy in response to changing requirements without having to modify existing applications.Data independence can be defines as, ”The immunity of applications to change in storage structure and access strategy”.Data RedundancyIn non-database systems each application has its own private files. This can often lead to redundancy in stored data, with resultant waste in storage space. In a database the data is integrated.The database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partially or wholly eliminated.Data integration is generally regarded as an important characteristic of a database. The avoidance of redundancy should be an aim, however, the vigor with which this aim should be pursued is open to question.Redundancy isdirect if a value is a copy of anotherindirect if the value can be derived from other values:simplifies retrieval but complicates updateconversely integration makes retrieval slow and updates easierData redundancy can lead to inconsistency in the database unless controlled. the system should be aware of any data duplication - the system is responsible for ensuring updates are carried out correctly. a DB with uncontrolled redundancy can be in an inconsistent state - it can supply incorrect or conflicting informationa given fact represented by a single entry cannot result in inconsistency - few systems are capable of propagating updates i.e. most systems do not support controlled redundancy.Data IntegrityThis describes the problem of ensuring that the data in the database is accurate...Inconsistencies between two entries representing the same “fact” give an example of lack of integrity (caused by redundancy in the database).Integrity constraints can be viewed as a set of assertions to be obeyed when updating a DB to preserve an error-free state.Even if redundancy is eliminated, the DB may still contain incorrect data.Integrity checks which are important are checks on data items and record types.Integrity checks on data items can be divided into 4 groups:type checkse.g. ensuring a numeric field is numeric and not a character - this check should be performed automatically by the DBMS.redundancy checksdirect or indirect (see data redundancy) - this check is not automatic in most cases.range checkse.g. to ensure a data item value falls within a specified range of values, such as checking dates so that say (age > 0 AND age < 110).comparison checksin this check a function of a set of data item values is compared against a function of another set of data item values. For example, the max salary for a given set of employees must be less than the min salary for the set of employees on a higher salary scale.A record type may have constraints on the total number of occurrences, or on the insertions and deletions of records. For example in a patient database there may be a limit on the number of x-ray results for each patient or the details of a patients visit to hospital must be kept for a minimum of 5 years before it can be deletedCentralized control of the database helps maintain integrity, and permits the DBA to define validation procedures to be carried out whenever any update operation is attempted (update covers modification, creation and deletion).Integrity is important in a database system - an application run without validation procedures can produce erroneous data which can then affect other applications using that data.Database ModelsHierarchical Database ModelA data model in where the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationshipsEntity type mapping is 1:N. Each parent can have many children but each child only has one parent. Example:(Please Provide)Network Database ModelNetwork model allows each record to have multiple parent and child records, forming a generalized graph structure. It allowed a more natural modeling of relationships between entities.Its schema is viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.Example:(Please Provide)

Description
Database Management System
Concepts of DBMS
Types of Database Systems
Hierarchical
Network
Relational
Distributed
Multimedia
Entity Relationship Modelling
Basic Concept of Database Design and Development
Queries and Subqueries
Query Optimization
Query Execution Algorithms
Indexes
Transactions and Isolation Levels
Database Security
OLAP, Data Warehouse, Data Mining

Comments

Want to learn?

Sign up and browse through relevant courses.

Name:
Your Email:
Password:
Country:
Contact no:


Area code Number
Subjects you are interested in:
Word verification: (Enter the text as in image)


Sign Up Already a member? Sign In
I agree to WizIQ's User Agreement & Privacy Policy

Your Facebook Friends on WizIQ

Explore Similar Courses

Give live classes, create & sell online courses

Try it free Plans & Pricing

Connect