1. RDBMS Concepts

What is Data and Information

Data is numbers, text or images. Data is raw and has no meaning. Data is processed by humans and machine to drive information. Data is processed, organized, structured to provide meaningful Information.

Data should be contextual and have meaning to the user.

In IT, Data and Information is used interchangeably.


What is a Database

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to just database.

In database data is stored in tables.

Explore MS Excel

An excel application can also be used to store and manipulate data.

  • Excel can accommodate limited amount of data.
  • In Excel Data is stored in a cell.
  • Data maintenance is difficult in excel
  • Accidental changes
  • Less security

What is an RDBMS?

(Youtube video)

RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

A Relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as developed by E. F. Codd.

https://ohiocomputeracademy.com/database/what-are-the12-codd-rules-for-rdbms/


Relational data model Characteristics

  • Data is stored in tables and is organized in terms of rows and columns. There can be multiple related tables.
  • The position of a row in a table is of no importance
  • The intersection of a row and column must give a single value
  • All values appearing in the columns are derived from the underlying domain
  • Doesn’t require users to understand its physical representation
  • Rows must be unique
  • Column names must be unique and should contain the same value.
  • Table and columns can have one or more constraints
  • Supports Null values

What is a table?

The data in a RDBMS is stored in database objects which are called as tables. A table is a collection of related data entries and it consists of columns and rows. These columns and rows contain fields.

A table is used for data storage in a relational database. The following is an example of an Employee table −

What is a field?

Every table is broken up into smaller entities called fields. The fields in the Employees table consist of EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, and PHONE_NUMER.

A field is a column in a table that is designed to maintain specific information about every record in the table.

What is a Record or a Row?

A record is also called as a row of data is made up of fields. For example, there are 6 records in the above EMPLOYEES table. Record or row is always horizontal and it contains a specific information regarding the table. Following is a single row of data or record in this table with 5 fields–

What is a column?

A column is a vertical entity in a table that contains all information associated with a specific field in a table.

For example, a column in the EMPLOYEES table is FIRST_NAME, which represents NAME of the specific employee. It is as shown below −

Constraints

Constraints are the rules enforced on data columns on a table. These are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Constraints can either be column level or table level. Column level constraints are applied only to one column whereas, table level constraints are applied to the entire table.

Following are some of the most commonly used constraints available in SQL −

  • NOT NULL Constraint − Ensures that a column cannot have a NULL value.
  • DEFAULT Constraint − Provides a default value for a column when none is specified.
  • UNIQUE Constraint − Ensures that all the values in a column are different.
  • PRIMARY Key − Uniquely identifies each row/record in a database table.
  • FOREIGN Key − Uniquely identifies a row/record in any another database table.
  • CHECK Constraint − The CHECK constraint ensures that all values in a column satisfy certain conditions.
  • INDEX − Used to create and retrieve data from the database very quickly.

Data Integrity

The following categories of data integrity exist with each RDBMS −

  • Entity Integrity − There are no duplicate rows in a table.
  • Domain Integrity − Enforces valid entries for a given column by restricting the type, the format, or the range of values.
  • Referential integrity − Rows cannot be deleted, which are used by other records.
  • User-Defined Integrity − Enforces some specific business rules that do not fall into entity, domain or referential integrity.

What is a KEY?

A KEY is a value of a column/field used to identify a record in a table uniquely. A KEY could be a single column or combination of multiple columns

Note: Columns in a table that are NOT used to identify a record uniquely are called non-key columns.

What is a Primary Key?

A primary key is a single column value used to identify a database record uniquely. It has following attributes

  • A primary key cannot be NULL
  • A primary key value must be unique
  • The primary key values should rarely be changed
  • The primary key must be given a value when a new record is inserted.  

Table1

In the Table1 Employee ID is a Primary key because it uniquely identifies a record. Notice the other key field in the above table, Department_ID, this key is called the Foreign key.


What is a Foreign Key?

Foreign Key references the primary key of another Table. It helps connect two Tables

  • A foreign key can have a different name from its primary key
  • It ensures rows in one table have corresponding rows in another Unlike the Primary key, they do not have to be unique. Most often they aren’t Foreign keys can be null even though primary keys can not

Foreign key can only have values present in the PrimaryKey. In the above Table1, Department_ID is the foreign key of Primary key present in the Departments Table/Table2.

The two tables are in relation with the help of Primary and Foreign keys and hence the name Relational database.

Database maintains data integrity with the help of keys. An employee cannot work in a department which does not exists or which is not present in the Departments table.


How to design a database

The design process of the database will have the following steps:

  • Analyze the requirements and purpose of the database. What are the user views of the data (present and future)?
  • Define the Problems and constraints.
  • Identify the business rules
  • Source of the data
  • Define the scope of the database system
  • Conceptional design of the database. The purpose of the conceptual design phase is to build a conceptual model based upon the previously identified requirements, but closer to the final physical model. A commonly-used conceptual model is called an entity-relationship model (diagram below).
    • Identify the entities and attributes.
      • An entity is any object, place, person, concept, activity about which an enterprise records data. E.g. Employee, Department, Salary etc.
      • Attributes are data elements that describe an entity. E.g. Employee – Name, email etc.  
    • Create relationships. Entities have association of relationship between them. It is represented by a diamond in the E-R diagram. E.g. there are two entities Student and Course. The relationship can be – A student <enrolls> in a course. 

   Entity Relationship Diagram

An Entity Relationship diagram, also called as ERD or ER-diagram.

  • Shows the relationships of entity sets stored in a database.
  • An entity in this context is an object, a component of data. An entity set is a collection of similar entities.
  • These entities can have attributes that define its properties.
  • By defining the entities, their attributes, and showing the relationships between them, an ER diagram illustrates the logical structure of databases.
  • ER diagrams are used to sketch out the design of a database.

Graphical notations for ER-diagram

Name Symbol Function 
Entity    Text BoxUnique Data object/table in the system 
Attribute  ShapeDescribes the entity 
Relationship Shape
  Relation   
Relates two entities  

Example of an ER-diagram

Cardinality/ Relationships

The relationship or degree of relationship indicates the link between two entities for a specified occurrence of each. It is also called cardinality.

There are 3 types of relationships or cardinality:

  • One to one (1:1)
  • One to Many (1:N)
  • Many to Many (M:N)

One-to-One cardinality

A user can have a single profile.

One-to-Many cardinality

A Department can have many students but many students can be in one department.

Many-to-Many cardinality

A student can enroll into many courses. A course can have many students. Note in the relationship, there is a table (Student_Course) in between which has one-many relationships with Student and Course tables. In Many-to-Many relationships there will always be a third table in between.

  • Logical/physical design. Once the conceptual design is complete, you will move on to Logical or physical design.
    • Previously defined Entities will become tables and attributes will become the fields.
    • Keys will be created
    • Other constraints will be applied
  • Implementation
    • Install database management system
    • Create database and tables
    • Load data
    • Setup users and security


What is a Schema

A database schema is a collection of logical data structures, or schema objects. A database schema is owned by a database user and has the same name as the user name. Schema objects are user-created structures that directly refer to the data in the database. The database supports many types of schema objects, the most important of which are tables and indexes.

SYS and SYSTEM Schemas

All Oracle databases include default administrative accounts. Administrative accounts are highly privileged and are intended only for DBAs authorized to perform tasks such as starting and stopping the database, managing memory and storage, creating and managing database users, and so on.

  • The administrative account SYS is automatically created when a database is created.
  • This account can perform all database administrative functions.
  • The SYS schema stores the base tables and views for the data dictionary.
  • These base tables and views are critical for the operation of Oracle Database. Tables in the SYS schema are manipulated only by the database and must never be modified by any user.

The SYSTEM account is also automatically created when a database is created. The SYSTEM schema stores additional tables and views that display administrative information, and internal tables and views used by various Oracle Database options and tools. Never use the SYSTEM schema to store tables of interest to non-administrative users.


Data warehouse

In a Data warehouse, data is extracted from one or more of the Operational databases for storage and analysis purposes.

Data warehouse contains the entire data of the enterprise and Datamart is business specific database such as Sales, HR etc.


Data Mining

Data mining is the process of analyzing data to find previously unknown trends, patterns, and associations in order to make decisions. Generally, data mining is accomplished through automated means against extremely large data sets, such as a data warehouse. Some examples of data mining include:

  • An analysis of sales from a large grocery chain might determine that milk is purchased more frequently the day after it rains in cities with a population of less than 50,000.
  • A bank may find that loan applicants whose bank accounts show particular deposit and withdrawal patterns are not good credit risks.
  • A baseball team may find that collegiate baseball players with specific statistics in hitting, pitching, and fielding make for more successful major league players.