Slide 1 : SUSHIL
KULKARNI DISTRIBUTED DBMS
Slide 2 : DDBMS Concepts
Applications
Characteristics, Properties of DDBMS
Distributed Processing
Advantages & Disadvantages DDBMS
Types & Functions of DDBMS
Main Issues of DDBMS
Component Architecture for DDBMS
Data Allocation & Fragmentation
Transparencies AIM
Slide 3 : CONCEPTS
CONCEPTS : CONCEPTS So far, we assume a centralized database
Data are stored in one location (e.g. a single
hard disk)
A centralized database management system to
handle transaction
To handle multiple requests, a client-server
system is used
- Client send requests for data to server
- Server handle query, transaction management etc. SUSHIL KULKARNI
Slide 5 : This is not the only possibility
In many cases, it may be advantageous for data to be distributed
Branches of a bank
Different part of the government storing different kind of data about a person
Different organizations sharing part of their data
Thus, distributed databases CONCEPTS SUSHIL KULKARNI
Slide 6 : Data spread over multiple machines (also referred to as sites or nodes.
Network interconnects the machines
Data shared by users on multiple machines CONCEPTS SUSHIL KULKARNI
CONCEPTS : CONCEPTS Distributed database
Logical interrelated collection of shared data, along with description of data, physically distributed over a computer network. SUSHIL KULKARNI
CONCEPTS : CONCEPTS Distributed DBMS
The software system that permits the management of the distributed database and makes the distribution transparent to users SUSHIL KULKARNI
CONCEPTS : CONCEPTS Applications
User access distributed database via
applications SUSHIL KULKARNI
CONCEPTS : CONCEPTS TWO types of Applications
Local application : Application that do not
required data from other sites.
Global application : Application that required
data from other sites. SUSHIL KULKARNI
Slide 11 : In a homogeneous distributed database:
All sites have identical software.
Are aware of each other and agree to cooperate in processing user requests.
Each site surrenders part of its autonomy in terms of right to change schemas or software.
Appears to user as a single system. TYPES OF DDBMS SUSHIL KULKARNI
Slide 12 : In a heterogeneous distributed database:
Different sites may use different schemas and software.
Difference in schema is a major problem for query processing.
Difference in software is a major problem for transaction processing.
Sites may not be aware of each other and may provide only limited facilities for cooperation in
transaction processing. TYPES OF DDBMS SUSHIL KULKARNI
Slide 13 : TYPE: HOMOGENEOUS DBMS SUSHIL KULKARNI
Slide 14 : TYPE: HETROGENEOUS DBMS SUSHIL KULKARNI
Slide 15 : Location Transparency
User does not have to know the location of the data.
Data requests automatically forwarded to appropriate sites
Local Autonomy
Local site can operate with its database when network connections fail
Each site controls its own data, security,
logging, recovery OBJECTIVES : DISTRIBUTED ARCHITECTURE SUSHIL KULKARNI
Slide 16 : Synchronous Distributed Database
All copies of the same data are always identical
Data updates are immediately applied to all copies throughout network
Good for data integrity
High overhead ? slow response times Asynchronous Distributed Database
Some data inconsistency is tolerated
Data update propagation is delayed
Lower data integrity
Less overhead ? faster response time NOTE: all this assumes replicated data (to be discussed later) SIGNIFICANT TRADE -OFF
Advantages & Disadvantages : Advantages & Disadvantages Advantages
Increased reliability & availability
Local control
Modular growth
Lower communication costs
Faster response Disadvantages
Software cost & complexity
Processing overhead
Data integrity
Slow response
DISTRIBUTED PROCESSING : DISTRIBUTED PROCESSING A centralized database that can be accessed over a computer network. SUSHIL KULKARNI
DISTRIBUTED PROCESSING : DISTRIBUTED PROCESSING SUSHIL KULKARNI Communication
Network DB
FUNCTIONS OF DDBMS : FUNCTIONS OF DDBMS Functions of a centralized DBMS plus:
extended communication to allow the transfer of
queries and data among sites
extended system catalog to store data distribution
details
distributed query processing , including query
optimization SUSHIL KULKARNI
FUNCTIONS OF DDBMS : FUNCTIONS OF DDBMS extended concurrency control to maintain
consistency of replicated data.
extended recovery services to take account
of failures of individual sites and common
links SUSHIL KULKARNI
TWO MAIN ISSUES IN DDBMS : TWO MAIN ISSUES IN DDBMS Making query from one site to the same or
remote site.
Logical database is partitioned in to different
data streams and located at different sites. SUSHIL KULKARNI
COMPONENT ARCHITECTURE FOR DDBMS : COMPONENT ARCHITECTURE FOR DDBMS Local DBMS
Data Communication Component
Global System Catalog
Distributed DBMS component SUSHIL KULKARNI
Slide 24 : DATA ALLOCATION
DATA ALLOCATION : DATA ALLOCATION Centralized
Fragmented
Complete replication
Selective replication SUSHIL KULKARNI
Distributed Data Storage : Distributed Data Storage Assume relational data model.
Replication:
System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance.
Fragmentation:
Relation is partitioned into several fragments stored in distinct sites
Replication and fragmentation can be combined:
Relation is partitioned into several fragments: System maintains several identical replicas of each such fragment.
Data Replication : Data Replication A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites.
Full replication of a relation is the case where the relation is stored at all sites.
Fully redundant databases are those in which every site contains a copy of the entire database. SUSHIL KULKARNI
Data Replication (Cont.) : Data Replication (Cont.) Advantages of Replication:
Availability: failure of site containing relation r does not result in unavailability of r is replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site containing a replica of r. Data Replication SUSHIL KULKARNI
Data Replication (Cont.) : Data Replication (Cont.) Disadvantages of Replication
Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
One solution: choose one copy as primary copy and apply concurrency control operations on primary copy. Data Replication
Data Fragmentation : Data Fragmentation Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r.
Horizontal fragmentation: each tuple of r is assigned to one or more fragments.
Vertical fragmentation: the schema for relation r is split into several smaller schemas.
All schemas must contain a common candidate key (or superkey) to ensure lossless join property.
A special attribute, the tuple-id attribute may be added to each schema to serve as a candidate key.
Example : relation account with following schema.
Account-schema = (branch-name, account-number, balance).
HORIZONTAL FRAGMENTATION : Fragments contain subsets of complete tuples (all attributes at all sites)
How to reconstruct R= Rs1 Rs2 ……. Rsn HORIZONTAL FRAGMENTATION Original relation Site 2
Slide 32 : A1 A2 A3 A4 A1 A2 A3 A4 Original Relation (R) t1
t2
tn RS1 RS2 t1
t2
tn t1
t2
tn SITE1 SITE2 VERTICAL FRAGMENTATION
Slide 33 : A1 A2 A3 A4 A1 A2 A3 A4 Original Relation (R) t1
t2
tn RS1 RS2 t1
t2
tn t1
t2
tn SITE1 SITE2 VERTICAL FRAGMENTATION
Slide 34 : usa Europe A1 A2 A3 A1 A2 A3 A4 A5 A4 A5 A1 A2 A3 A4 A5 (Salary Attributes) (Benefit Attributes) Rs1 Rs2 Rs3 Rs4 R MIXED FRAGMENTATION
Slide 35 : A1 A2 A3 A4 A1 A2 A3 A4 Original Relation (R) t1
t2
tn RS1 RS2 t1
t2
tn t1
t2
tn SITE1 SITE2 MIXED FRAGMENTATION
Horizontal Fragmentation of account Relation : Horizontal Fragmentation of account Relation branch-name account-number balance Hillside
Hillside
Hillside A-305
A-226
A-155 500
336
62 account1=?branch-name=“Hillside”(account) branch-name account-number balance Valleyview
Valleyview
Valleyview
Valleyview A-177
A-402
A-408
A-639 205
10000
1123
750 account2=?branch-name=“Valleyview”(account) SUSHIL KULKARNI
Vertical Fragmentation of employee-info Relation : branch-name customer-name tuple-id Hillside
Hillside
Valleyview
Valleyview
Hillside
Valleyview
Valleyview Lowman
Camp
Camp
Kahn
Kahn
Kahn
Green deposit1=?branch-name, customer-name, tuple-id(employee-info) 1
2
3
4
5
6
7 account number balance tuple-id 500
336
205
10000
62
1123
750 1
2
3
4
5
6
7 A-305
A-226
A-177
A-402
A-155
A-408
A-639 deposit2=?account-number, balance, tuple-id(employee-info) Vertical Fragmentation of employee-info Relation
Advantages of Fragmentation : Advantages of Fragmentation Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located where they are most frequently accessed
Vertical:
allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed
tuple-id attribute allows efficient joining of vertical fragments
allows parallel processing on a relation
Vertical and horizontal fragmentation can be mixed.
Fragments may be successively fragmented to an arbitrary depth. SUSHIL KULKARNI
REPLICATION and FRAGMENTATION : Partition of Attributes/tuples need not be disjoint REPLICATION and FRAGMENTATION A1 A2 A3 A4 A5 A1 A2 A3 A4 A2 A3 A4 A5 Overlap
(replication of attributes)
Slide 40 : TRANSPARENCIES
TRANSPARENCIES IN DDBMS : TRANSPARENCIES IN DDBMS Transparencies hide implementation details from the user
Example in Centralized databases : Data independence
Main types of transparencies in DDBMS:
Distributed Transparency
Transaction Transparency SUSHIL KULKARNI
DISTRIBUTED TRANSPARENCY : DISTRIBUTED TRANSPARENCY Allows the user to see the database as a
single, logical entity.
If this transparency is exhibited then the
user does not need to know that
1. The data are partitioned.
2. Data can be replicated at several
sites.
3. Data location. SUSHIL KULKARNI
EXAMPLE : EXAMPLE Staff (staffNo, position, sex, dob, salary,
fName, lName, branchNo)
Vertical fragmentation: SUSHIL KULKARNI
EXAMPLE : EXAMPLE Fragment S 2 according to branch number.
Assume that there are only three branches.
Horizontal fragmentation: SUSHIL KULKARNI
EXAMPLE : EXAMPLE Assume that :
S 1 and S 2 are at site 5,
S 21 at site 3
S 22 at site 5
S 23 at site 7 SUSHIL KULKARNI
FRAGMENTATION TRANSPARENCY : FRAGMENTATION TRANSPARENCY If it is provided then the user does not need
to know the data is fragmented.
Example:
SELECT fName, lName
FROM Staff
WHERE position = ‘ Manager ’ SUSHIL KULKARNI
LOCATION TRANSPARENCY : LOCATION TRANSPARENCY If it is provided then the user must know
how the data has been fragmented but still
does not have know the location of the data. SUSHIL KULKARNI
LOCATION TRANSPARENCY : LOCATION TRANSPARENCY Example:
SELECT fName, lName
FROM S21
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’) SUSHIL KULKARNI
LOCATION TRANSPARENCY : LOCATION TRANSPARENCY Example:
UNION
SELECT fName, lName
FROM S23
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’ ) SUSHIL KULKARNI
LOCAL MAPPING TRANSPARENCY : LOCAL MAPPING TRANSPARENCY If it is provided then the user must know
how the data has been fragmented as well
as the location of the data. SUSHIL KULKARNI
LOCATION TRANSPARENCY : LOCATION TRANSPARENCY Example:
SELECT fName, lName
FROM S21 AT SITE 3
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 5
where position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22 AT SITE 5
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
where position = ‘ Manager ’) SUSHIL KULKARNI
LOCATION TRANSPARENCY : LOCATION TRANSPARENCY Example:
UNION
SELECT fName, lName
FROM S23 AT SITE 7
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
where position = ‘ Manager ’ ) SUSHIL KULKARNI
TRANSACTION TRANSPARENCY : TRANSACTION TRANSPARENCY It maintains distributed database’s integrity
and consistency. SUSHIL KULKARNI
QUERY PROCESSING IN DDMS : Issues 1:
Parallel Processing across Fragments
?LName(?salary>40,000(Employee))
? ?LName( ?salary>40,000(Emp1)) U ?LName( ?salary>40,000(Emp2)) QUERY PROCESSING IN DDMS 2 Fragments Site 1 Site 2 Execution in Parallel on fragments and union results together Horizontal fragmentations
: (A B) C
A (B C) Site1 Site2 Site3 Joins- symmetric and associative
Parallel Processing
(?xx(A)) (B C) QUERY PROCESSING IN DDMS
Slide 56 : R=? Fnames, Cnames, Dnames (Employee Department)
Strategies:
1)Ship both relations to the result site and join there
2)Ship employee to 2, join at 2, results to 3
3)Ship Department to 1, join at 1, results to 3
? minimize total communication cost of data transfer 1,003,000 bytes transfered 1,002,000 bytes transfered 5,000 bytes transfered Join Strategies Site 3
100 records, 2000 bytes Site 1
10,000 records, 1,000,000 bytes Site 2
100 records, 3000 bytes Mg rssn to ssn QUERY PROCESSING IN DDMS
Slide 57 : THANKS !