Physical Database Design

Database Development Process

Ming Wang, Russell K. Chan, in Encyclopedia of Information Systems, 2003

I.E. Physical Design

The aim of physical database design is to determine how the logical database design will be implemented. For a relational database, this involves:

Defining a set of table structures, data types for fields, and constraints on these tables such as primary key, foreign key, unique key, not null, and domain definitions to check whether data are out of range.

Identifying the specific storage structures and access methods to retrieve data efficiently. For example, adding a secondary index to a relation.

Designing security features for the database system including account creation, privilege granting/revocation, access protection, and security level assignment.

Physical design is DBMS-specific, whereas logical design, by contrast, is DBMS-independent. Logical design is concerned with the what; physical database design is concerned with the how. In short, physical design is the process of implementing a database on secondary storage with a specific DBMS.
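The hypothetical sketch below illustrates the points above in PostgreSQL-style SQL: table structures with primary key, foreign key, unique, not-null, and domain (CHECK) constraints; a secondary index as a storage/access-method decision; and account creation with privilege granting. All table, column, and user names are invented, and the exact syntax varies by DBMS.

```sql
-- Minimal sketch (PostgreSQL-style syntax; names are invented for illustration).
CREATE TABLE department (
    dept_id    INTEGER      PRIMARY KEY,       -- primary key constraint
    dept_name  VARCHAR(50)  NOT NULL UNIQUE    -- not-null and unique-key constraints
);

CREATE TABLE employee (
    emp_id     INTEGER      PRIMARY KEY,
    emp_name   VARCHAR(100) NOT NULL,
    salary     NUMERIC(9,2) CHECK (salary BETWEEN 0 AND 500000),     -- domain (range) check
    dept_id    INTEGER      NOT NULL REFERENCES department (dept_id) -- foreign key
);

-- Storage/access-method decision: a secondary index on a frequently searched column.
CREATE INDEX idx_employee_dept ON employee (dept_id);

-- Security features: account creation and privilege granting (syntax is DBMS-specific).
CREATE USER report_reader WITH PASSWORD 'change_me';
GRANT SELECT ON employee TO report_reader;
```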

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B0122272404000265

Physical Database Considerations

Charles D. Tupper, in Data Architecture, 2011

Queries, Reports, and Transactions

Part of the consideration for physical database design is the activity being passed against it. The transaction, query, or report creates a unit of work that threads its way through the database in a traversal route that can be mapped. Some of the process mapping has been covered in Chapters 9 and 10, but a small recap would not hurt here. Functional decomposition in those chapters was defined as the breakdown of activity requirements in terms of a hierarchical ordering and is the tool for analysis of activity. The function is at the top of the hierarchy and is defined as a continuously occurring activity within the corporation. Within each function are many processes. Processes have a beginning activity, a process activity, and a termination activity, which completes the process. Each process may or may not be broken down into subprocesses. Each subprocess or event also has an initiation, an action state, and a termination and differs from the process in that it represents activity at the lowest level.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123851260000152

Database Administration

Ming Wang, in Encyclopedia of Information Systems, 2003

II.C. Physical Design

Database administration is typically responsible for physical database design and much of database implementation. Physical design is the process of choosing specific structures and access paths for database files to achieve good performance for the various database applications. Each DBMS provides a variety of options for file organization and access paths. These include various types of indexing and clustering of related records on disk blocks. Once a specific DBMS is selected, the physical design process is restricted to choosing the most appropriate structure for the database files from the options offered by that DBMS. One of the advantages of the relational database is that users are able to access relations and rows without specifying where and how the rows are stored. The internal storage representation for relations should be transparent to users in a relational database.
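A small sketch of that transparency, assuming PostgreSQL-style syntax and an invented employee table: the application query never mentions storage structures, so the DBA can change indexing or clustering without touching it.

```sql
-- Assumed, invented table whose storage details the user never sees.
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name VARCHAR(100),
    salary   NUMERIC(9,2),
    dept_id  INTEGER
);

-- The application query refers only to relations and rows, never to storage structures.
SELECT emp_name, salary
FROM   employee
WHERE  dept_id = 42;

-- Later, the DBA can change the access path without changing the query above:
CREATE INDEX idx_employee_dept ON employee (dept_id);   -- add a secondary index
CLUSTER employee USING idx_employee_dept;               -- PostgreSQL: reorder rows on disk
```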

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B0122272404000253

Basic Requirements for Physical Design

Charles D. Tupper, in Data Architecture, 2011

Data Access

In order to do a proper physical database design, it is important to understand how and how frequently data will be accessed. Where does this information come from? Ideally, process models should contain references to business functions that will indicate how frequently a business process should be followed. This can be translated to pseudo-SQL (pseudo-code that does not need to parse but needs to incorporate access and ordering data). The criticality and concurrency of transactions are also important. This section will cover the following subparts of information vital to physical design of a high-performance database system.

Access implications: Data gathering and analysis must be done in the manner in which the user accesses the data. Additionally, the tools used for the access must be taken into consideration. For example, reporting tools often are broad spectrum—that is, they will work with many different DBMSs, and as such they use very generic methods for access. Unless they have a pass-through option, as WebFocus does for Microsoft Access and SQL Server, the passed-through query will have poor access performance. If the access method is through a GUI front end that invokes DBMS stored procedures, triggers, or functions, then it is far more tunable for performance.

Concurrent access: Concurrent access is of concern for two considerations: network load and locking contention. Network load is not discussed here. Locking implications are dependent on the required access. If the data are required to be held static—that is, unchanged—an exclusive lock must be secured by the program executing the action. This exclusive lock prevents others from accessing the data while it is in use. There is an option to allow a read of the data while it is locked, knowing it will be changed. This is known as a dirty read and is done when the data needed are not those being updated. When too many programs are trying to access the same data, locking contention develops and a lock protocol is invoked, depending on the DBMS involved. In some cases the lock is escalated to the next higher object level in order to prevent a buildup of processes waiting to execute.
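The sketch below illustrates the two locking options just described, using an invented account table and generic SQL. Lock granularity, escalation, and the effect of READ UNCOMMITTED vary by DBMS, so treat this as an approximation rather than a definitive recipe.

```sql
-- Assumed, invented schema for the example.
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    balance    NUMERIC(12,2) NOT NULL
);
INSERT INTO account VALUES (1001, 500.00);

-- Session A: hold the row static with an exclusive lock until the work completes.
BEGIN;
SELECT balance FROM account WHERE account_id = 1001
FOR UPDATE;   -- exclusive row lock: concurrent writers (and other FOR UPDATE readers) wait
UPDATE account SET balance = balance - 100 WHERE account_id = 1001;
COMMIT;

-- Session B: a dirty read; accept possibly uncommitted data instead of waiting.
-- (Behavior varies by DBMS; PostgreSQL, for example, never returns uncommitted data.)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM account WHERE account_id = 1001;
```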

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123851260000140

Data Warehousing and Caching

AnHai Doan, ... Zachary Ives, in Principles of Data Integration, 2012

10.1.1 Data Warehouse Design

Designing a data warehouse can be even more involved than designing a mediated schema in a data integration setting because the warehouse must support very demanding queries, possibly over data archived over time. Physical database design becomes critical — effective use of partitioning across multiple machines or multiple disk volumes, creation of indices, definition of materialized views that can be used by the query optimizer. Most data warehouse DBMSs are configured for query-only workloads, as opposed to transaction processing workloads, for performance: this disables most of the (expensive) consistency mechanisms used in a transactional database.
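A hedged sketch of those three physical design levers (partitioning, indices, and materialized views) follows, using PostgreSQL-style syntax and an invented sales schema; partitioning and materialized-view syntax differ considerably across warehouse DBMSs.

```sql
-- Hypothetical fact table, range-partitioned by date (partitions can sit on different volumes).
CREATE TABLE sales_fact (
    sale_date   DATE          NOT NULL,
    product_id  INTEGER       NOT NULL,
    store_id    INTEGER       NOT NULL,
    amount      NUMERIC(12,2)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_fact_2011 PARTITION OF sales_fact
    FOR VALUES FROM ('2011-01-01') TO ('2012-01-01');

-- Index chosen to support common restrictions in the query workload.
CREATE INDEX idx_sales_product ON sales_fact (product_id);

-- Materialized view that the optimizer (or the report writer) can use instead of the base table.
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT date_trunc('month', sale_date) AS sale_month,
       product_id,
       SUM(amount) AS total_amount
FROM   sales_fact
GROUP  BY 1, 2;
```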

Since the early 2000s, all of the major commercial DBMSs have attempted to simplify the tasks of physical database design for data warehouses. Most tools have "index selection wizards" and "view selection wizards" that take a log of a typical query workload and perform a (usually overnight) search over alternative indices or materialized views, seeking to find the best combination to improve performance. Such tools help, but still there is a need for expert database administrators and "tuners" to obtain the best data warehouse performance.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124160446000107

Foreword

John Zachman, in Information Modeling and Relational Databases (2nd Edition), 2008

There is one more interesting dimension of these rigorous, precise semantic models—they have to be transformed into databases for implementation. The authors describe in detail and by illustration the transformation to logical models, to physical database design, and to implementation. In this context, it is easy to evaluate and compare the various database implementation possibilities including relational databases, object-oriented databases, object-relational databases, and declarative databases; and they throw in star schemas and temporal databases for good measure! Once again, I cannot remember seeing so dispassionate and objective an evaluation and comparison of the various database structures. Within this context, it is straightforward to make a considered and realistic projection of database technology trends into the foreseeable future.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123735683500011

Designing a Warehouse

Lilian Hobbs, ... Pete Smith, in Oracle 10g Data Warehousing, 2005

2.1.1 Don't Use Entity Relationship (E-R) Modeling

The typical approach used to construct a transaction-processing system is to construct an entity-relationship (E-R) diagram of the business. It is then ultimately used as the basis for creating the physical database design, because many of the entities in our model become tables in the database. If you have never designed a data warehouse before but are experienced in designing transaction-processing systems, then you will probably think that a data warehouse is no different from any other database and that you can use the same approach.

Unfortunately, that is not the case, and warehouse designers will quickly discover that the entity-relationship model is not really suitable for designing a data warehouse. Leading authorities on the subject, such as Ralph Kimball, advocate using the dimensional model, and we have found this approach to be ideal for a data warehouse.

An entity-relationship diagram can show us, in considerable detail, the interaction between the numerous entities in our system, removing redundancy in the system whenever possible. The result is a very flat view of the enterprise, where hundreds of entities are described along with their relationships to other entities. While this approach is fine in the transaction-processing world, where we require this level of detail, it is far too complex for the data warehouse. If you ask a database administrator (DBA) if he or she has an entity-relationship diagram, the DBA will probably reply that he or she did once, when the system was first designed. But due to its size and the numerous changes that have occurred in the system during its lifetime, the entity-relationship diagram hasn't been updated, and it is now only partially accurate.

If we use a different approach for the data warehouse, one that results in a much simpler picture, then it should be very easy to keep it up-to-date and also to give it to end users, to help them understand the data warehouse. Another factor to consider is that entity-relationship diagrams tend to result in a normalized database design, whereas in a data warehouse, a denormalized design is often used.
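For contrast with an E-R-derived normalized design, here is a minimal, hypothetical dimensional (star schema) sketch in generic SQL: one fact table surrounded by deliberately denormalized dimension tables. Names and columns are invented for illustration only.

```sql
-- Dimension tables: flat, denormalized descriptions of the business.
CREATE TABLE date_dim (
    date_key      INTEGER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR(10),
    quarter       SMALLINT,
    calendar_year SMALLINT
);

CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50),   -- category held directly, not in a separate normalized table
    brand        VARCHAR(50)
);

-- Fact table: one row per sale, keyed by the surrounding dimensions.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim (date_key),
    product_key  INTEGER REFERENCES product_dim (product_key),
    quantity     INTEGER,
    sales_amount NUMERIC(12,2)
);
```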

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781555583224500047

CASE Tools for Logical Database Design

Toby Teorey, ... H.V. Jagadish, in Database Modeling and Design (5th Edition), 2011

Introduction to the CASE Tools

In this chapter we will introduce some of the most popular and powerful products available for helping with logical database design: IBM's Rational Data Architect, Computer Associates' AllFusion ERwin Data Modeler, and Sybase's PowerDesigner. These CASE tools help the designer develop a well-designed database by walking through a process of conceptual design, logical design, and physical creation, as shown in Figure 11.2.

Figure 11.2. Database design process.

Computer Associates' AllFusion ERwin Data Modeler has been around the longest. A stand-alone product, AllFusion ERwin's strengths stem from relatively strong support of physical database modeling, the broadest set of technology partners, and third-party training. What it does it does well, but in recent years it has lagged in some advanced features. Sybase's PowerDesigner has come on strong in the past few years, challenging AllFusion ERwin. It has some advantages in reporting, and advanced features that will be described later in this chapter. IBM's Rational Data Architect is a new product that supplants IBM's previous product Rational Rose Data Modeler. Its strength lies in strong design checking; rich integration with IBM's broad software development platform, including products from their Rational, Data Management, and Tivoli divisions; and advanced features that will be described below.

In previous chapters, we have discussed the aspects of logical database design that CASE tools help design, annotate, apply, and modify. These include, for example, entity–relationship (ER) and Unified Modeling Language (UML) modeling, and how this modeling can be used to develop a logical database design. Within the ER design, there are several types of entity definitions and relationship modeling (unrelated, one-to-many, and many-to-many). These relationships are combined and normalized into schema patterns known as normal forms (e.g., 3NF, snowflake schema). An effective design requires the clear definition of keys, such as the primary key, the foreign key, and unique keys within relationships. The addition of constraints to limit the usage (and abuses) of the system within reasonable bounds or business rules is also critical. The effective logical design of the database will have a profound impact on the performance of the system, as well as the ease with which the database system can be maintained and extended.

There are several other CASE products that we will not discuss in this book. A few additional products worth investigating include Datanamic's DeZign for Databases, QDesigner by Quest Software, Visible Analyst by Standard, and Embarcadero ER/Studio. The Visual Studio .NET Enterprise Architect edition includes a version of Visio with some database design stencils that can be used to create ER models. The cost and function of these tools varies wildly, from open-source products up through enterprise software that costs thousands of dollars per license.

The full development cycle includes an iterative cycle of understanding business requirements; defining product requirements; analysis and design; implementation; test (component, integration, and system); deployment; administration and optimization; and change management. No single product currently covers that entire scope. Instead, product vendors provide, to varying degrees, suites of products that focus on portions of that cycle. CASE tools for database design largely focus on the analysis and design portion, and to a lesser degree, the testing portion of this iterative cycle.

CASE tools provide software that simplifies or automates some of the steps described in Figure 11.2. Conceptual design includes steps such as describing the business entities and functional requirements of the database; logical design includes definition of entity relationships and normal forms; and physical database design helps transform the logical design into actual database objects, such as tables, indexes, and constraints. The software tools provide significant value to database designers by:

1.

Dramatically reducing the complexity of conceptual and logical design, both of which can be rather difficult to do well. This reduced complexity results in better database design in less time and with lower skill requirements for the user.

2.

Automating transformation of the logical design to the physical design (at least the basic physical design). This not only reduces time and skill requirements for the designer, but significantly removes the chance of manual error in performing the conversion from the logical model to the physical data definition language (DDL), which the database server will "consume" (i.e., as input) to create the physical database. A sketch of DDL such a tool might generate follows this list.

3.

Providing the reporting, round-trip engineering, and reverse engineering that make such tools invaluable in maintaining systems over a long period of time. System design can and does evolve over time due to changing and expanding business needs. Also, the people who design the system (sometimes teams of people) may not be the same as those charged with maintaining the system. The complexity of large systems combined with the need for continuous adaptability virtually necessitates the use of CASE tools to help visualize, reverse engineer, and track the system design over time.
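As one hedged illustration of point 2, a CASE tool might transform a logical many-to-many relationship (say, Student enrolls in Course) into physical DDL along the lines below, inserting the associative table and key constraints automatically. The exact DDL a given tool emits depends on the target DBMS; the names here are invented.

```sql
-- Physical DDL a design tool could generate from a logical many-to-many relationship.
CREATE TABLE student (
    student_id INTEGER      PRIMARY KEY,
    full_name  VARCHAR(100) NOT NULL
);

CREATE TABLE course (
    course_id  INTEGER      PRIMARY KEY,
    title      VARCHAR(100) NOT NULL
);

-- The many-to-many relationship becomes an associative (junction) table.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    PRIMARY KEY (student_id, course_id)
);
```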

You can find a broader list of available database design tools at the website Database Answers (www.databaseanswers.com/modelling_tools.htm), maintained by David Alex Lamb at Queen's University in Kingston, Canada.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123820204000136

Data Modeling: Entity-Relationship Data Model

Salvatore T. March, in Encyclopedia of Information Systems, 2003

I.B. Data Models and Database Implementations

A data model does not specify the physical storage of the data. It provides a precise representation of the data content, structure, and constraints required by an application. These must be supported by the database and software physically implemented for the application. The process of developing a database implementation (schema) from a data model is termed physical database design. In short, the data model defines what data must be represented in the application and the database schema defines how that data is stored. The goal of data modeling, also termed conceptual database design, is to accurately and completely represent the data requirements. The goal of physical database design is to implement a database that efficiently meets those requirements.

Clearly there must be a correspondence between a data model and the database schema developed to implement it. For example, a data model may specify that each employee must report to exactly one department at any point in time. This is represented as a relationship between employees and departments in the data model. This relationship must have a physical implementation in the database schema; however, how it is represented is not of concern to the data model. That is a concern for the physical database design process. In a relational DBMS (RDBMS), relationships are typically represented by primary key-foreign key pairs. That is, the department identifier (primary key) of the department to which an employee reports is stored as a column (foreign key) in the employee's record (i.e., row in the Employee table). In an object DBMS relationships can be represented in a number of ways, including complex objects and embedded object identifiers (OIDs).
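In generic SQL, the primary key-foreign key representation just described might look like the following sketch, in which the reports-to-exactly-one-department rule becomes a NOT NULL foreign key column. Column names are illustrative assumptions, not part of the source.

```sql
CREATE TABLE Department (
    dept_id   INTEGER     PRIMARY KEY,   -- department identifier (primary key)
    dept_name VARCHAR(60) NOT NULL
);

CREATE TABLE Employee (
    emp_id    INTEGER     PRIMARY KEY,
    emp_name  VARCHAR(80) NOT NULL,
    dept_id   INTEGER     NOT NULL       -- foreign key column stored in the employee's row;
              REFERENCES Department (dept_id)  -- NOT NULL plus a single column = exactly one department
);
```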

Numerous data modeling formalisms have been proposed; however, the entity-relationship (ER) model and variations loosely termed binary-relationship models are the most widely known and the most commonly used. Such formalisms have come to be known as semantic data models to differentiate them from the storage structures used by commercial DBMSs to define a database schema. Data modeling has become a common component of system development methodologies. A number of object-oriented system development approaches, such as the Unified Modeling Language, have extended data models into what has been termed class diagrams. These use the same basic constructs as data models to represent the semantic data structure of the system, but typically extend the representation to include operations, system dynamics, and complex constraints and assertions.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B0122272404000344

Data Virtualization, Data Management, and Information Governance

Rick F. van der Lans, in Data Virtualization for Business Intelligence Systems, 2012

11.2 Impact of Data Virtualization on Information Modeling and Database Design

Data virtualization has an impact on certain aspects of how databases are designed. To show clearly where and what the differences are, this book considers this design process to consist of three steps: information modeling, logical database design, and physical database design.

One of the tasks when developing a business intelligence system is to analyze the users' data needs. On which business objects do they need reports? What are the properties of those business objects? On which level of detail do they need the data? How do they define those business objects? This is information modeling, which is about getting a precise understanding of the business processes, the data these processes need, and the corresponding decision-making processes. It's an activity that requires little to no knowledge of database technology. What's needed is business knowledge. The more an analyst understands of the business and its needs, the better the results of information modeling. This step is sometimes referred to as data modeling, conceptual data modeling, or data analysis. The term information modeling is used in this book because it's the most commonly used term.

The result of information modeling, called the information model, is a nontechnical but formal description of the information needs of a group of users. Normally, it consists of a diagram describing all the core business objects, their properties, and their interrelationships. Diagramming techniques used are commonly based on entity-relationship diagramming (see, for example, [54]). Another diagramming technique used regularly in business intelligence environments is based on multidimensional modeling (see [55]).

In the second step—logical database design—the information model is transformed into tables consisting of columns and keys that are implemented in a staging area, data warehouse, or data mart. These tables will hold the users' information needs. This is a semitechnical step. Normally, the result is simply a description or model of all the tables with their columns and key structures.

The third step—physical database design—focuses on finding the most effective and efficient implementation of these tables for the database server in use. In this step, database specialists study aspects such as which columns need indexes, whether tables have to be partitioned, and how the physical parameters of table spaces should be set. They can even decide to restructure tables to improve performance. For example, data from two tables is joined to form a more denormalized structure, or derived and aggregated data is added to existing tables. The result of physical database design is a database model showing all the tables, their columns, and their keys. An example of such a database model is shown in Figure 11.1.

Figure 11.1. An example of a database model.

Reprinted with permission of Composite Software.

Compared to logical database design, physical database design is a very database server-specific step. This means that the best imaginable solution for an Oracle database server doesn't have to be the best solution for a Microsoft database server.
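A hedged sketch of the kinds of physical design decisions mentioned above, in PostgreSQL-style syntax with invented orders and customers tables and an assumed tablespace named fast_ssd_space; the equivalent Oracle or Microsoft SQL Server statements for table spaces, partitioning, and so on look quite different, which is exactly the server-specific nature of this step.

```sql
-- Add an index for a column the workload filters on frequently.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Physical placement: move the table to a specific table space (parameters are DBMS-specific).
ALTER TABLE orders SET TABLESPACE fast_ssd_space;

-- Denormalize: add a derived, aggregated column to avoid a join and aggregation at query time.
ALTER TABLE customers ADD COLUMN total_order_amount NUMERIC(14,2);
UPDATE customers c
SET    total_order_amount = (SELECT COALESCE(SUM(o.amount), 0)
                             FROM   orders o
                             WHERE  o.customer_id = c.customer_id);
```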

For business intelligence systems with a more classic architecture, early in the project designers determine which data stores are needed. Should the system be built around a data warehouse, is a staging area needed, and should data marts be developed? These decisions don't have to be made when data virtualization forms the heart of a business intelligence system. Initially, only a data warehouse is created, so no data marts or personal data stores are developed at the outset of the project. For performance reasons, they might be created later.

Using data virtualization has an impact on information modeling and database design:

Impact 1—Less Database Design Work: When a business intelligence system is developed, that three-step design process has to be applied to all the data stores needed. So information modeling and logical and physical database design have to be performed, for example, for the data warehouse, the staging area, and the data marts. An information model has to be created, and a database model has to be developed for each of these data stores. For a system based on data virtualization, information modeling is still necessary, but database design only applies to the data warehouse because there are no other data stores. Because there are fewer data stores, there is less database design work.

Impact 2—Normalization Is Applied to All Tables: In a classic system, different database design approaches are used: normalization is quite often applied to the data warehouse, whereas the data marts usually receive a star schema or snowflake schema (see Section 2.6). Compare this to all the tables of a data warehouse in a system based on data virtualization, where initially they receive normalized structures. The reason they are normalized is that this is still the most neutral form of a data structure—neutral in the sense that it can support the widest range of queries and reports. Next, virtual tables are designed (according to the rules in Chapter 7). But for these virtual tables, no physical database design is needed because there are no data stores.

Impact 3—Information Modeling and Database Design Become More Iterative: An iterative approach for information modeling and database design is easier to deploy when data virtualization is used. The initial design of a data warehouse doesn't have to include the data needs of all the users, and new information needs can be implemented step by step.

But why is this easier to deploy? When new information needs are implemented, new tables have to be added, columns may have to be added to existing tables, and existing table structures might have to be changed. In a system with a classic architecture, making these changes requires a lot of time. Not only do the tables in the data warehouse have to be changed, but the data marts and the ETL scripts that copy the data must be changed as well. And changing the tables in the data marts leads to changes in existing reports too. Reporting code has to be changed to show the same results.

This is not the case when data virtualization is used. If the information needs change, the tables in the data warehouse have to be changed, but this doesn't apply to data marts and ETL scripts. Those changes can be hidden in the mappings of the virtual tables accessed by the existing reports. The consequence is that the extra amount of work needed to keep the existing tables unchanged is considerably less. The changes to the real tables are hidden from the reports. This is why a more iterative approach is easier to use when data virtualization is deployed.

Impact 4—Logical Database Design Becomes More Interactive and Collaborative: Normally, logical database design is quite an abstract exercise. The designers come up with a set of table definitions. In the eyes of the business users, especially if they don't have a computing background, those definitions are quite abstract. It's sometimes difficult for them to see how those tables together represent their data needs. The main reason is that they don't always think in terms of data structures but in terms of the data itself. For example, a designer thinks in terms of customers and invoices, while a user thinks in terms of customer Jones based in London and invoice 6473 which was sent to customer Metheny Metals. Therefore, it can be hard for a user to determine whether the table structures resulting from logical database design are really what he needs.

It would be better if the data structures plus the real data are shown so the users can see what those tables represent. When data virtualization is used, a logical database model can be implemented as virtual tables. The advantage is that when a virtual table is defined, its (virtual) contents can be shown instantaneously—in other words, both the analyst and the user can scan the contents and the user can confirm that what he sees satisfies his information needs. Logical database design becomes a more collaborative and more interactive process. A sketch combining this impact with Impact 6 follows the list of impacts.

Impact 5—Physical Database Design Decisions Can Be Postponed: Physical database design changes in two ways. First, instead of having to make all the correct physical design decisions up front, many can be postponed. For example, if a report is too slow, a cache can be defined. That cache can be created instantaneously, and no existing reports have to be changed for that. A more drastic solution might be to create a data mart to which the virtual tables are redirected.

The assumption made here is that derived data stores are not needed initially and therefore require no physical database design. Second, there is less to design. If, indeed, because of data virtualization, fewer databases have to be designed, then there is less physical database design work to do. In a classic architecture where data warehouses and data marts have to be designed, only the first is designed. This makes it a simpler process.

Impact 6—Denormalization Is Less Negative: When designing real tables, denormalization leads to duplication of data, increases the size of a database (in bytes), slows down updates and inserts, and can lead to inconsistencies in the data. These have always been seen as the main disadvantages of denormalization. Every database designer knows this, and it's on page one of every book on database design. If denormalization is applied when designing virtual tables, these assumptions are not true, and these disadvantages don't apply anymore. The point is that a virtual table doesn't have physical content. So if a virtual table has a denormalized structure, no redundant data is stored, the database doesn't grow, it does not by definition slow down updates and inserts, and it does not lead to inconsistent data. However, if a cache is defined for a denormalized virtual table, then the cache does contain duplicated data.
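A minimal sketch of Impacts 4 and 6 together, using invented customers and invoices tables. A denormalized virtual table is approximated here with an ordinary SQL view (a data virtualization server would define it through its own mapping facility): no data is stored twice, and the contents can be inspected the moment the definition exists.

```sql
-- Invented base tables (in practice, source tables in the data warehouse).
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(80),
    city          VARCHAR(60)
);
CREATE TABLE invoices (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER REFERENCES customers (customer_id),
    invoice_date DATE,
    amount       NUMERIC(12,2)
);

-- The virtual table: a denormalized join defined as a view, so nothing is stored redundantly.
CREATE VIEW v_customer_invoices AS
SELECT c.customer_id, c.customer_name, c.city,
       i.invoice_id, i.invoice_date, i.amount
FROM   customers c
JOIN   invoices  i ON i.customer_id = c.customer_id;

-- Analyst and business user can inspect the (virtual) contents immediately (Impact 4).
SELECT * FROM v_customer_invoices WHERE customer_name = 'Jones';
```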

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123944252000113