github twitter email rss
Databases
0001 Jun 1
4 minutes read

Databases

#need to know

little

traditional wisdom

more

column stores
main memory db
advantages of graph, array dbms
nosql movement

https://en.wikipedia.org/wiki/Database_normalization
http://en.wikipedia.org/wiki/Database_transaction

Technicues

sharding
replication

rabbitmq
couchdb
riak

rethingdb
Extradb
Aria db

https://app.mongohq.com
https://mongolab.com

Relational

MySQL, PostgreSQL, MariaDB

SQL
queries 
triggers
stored procedures
advanced indexes (Hash, B-tree)
ACID
CRUD
normalized relational schema
design-first datastore
ERD(entity relationship diagram)
PostgreSQL
    packages:  tablefunc, dict_xsyn, fuzzystrmatch, pg_trgm, cube
    commands:
        createdb dbName
        psql dbName -c "SELECT '1'::cube;"
        psql dbName
        \h CREATE INDEX
        CREATE TABLE
        INSERT
        SELECT
        UPDATE
        SET
        FROM
        WHERE
        INNER JOIN
        FULL JOIN
        LEFT JOIN
        RIGHT JOIN
        GROUP BY

Key-value

Column

HBase(Google BigTable, Hadoop), Cassandra, Hypertable

Document

MongoDB, CouchDB

Graph

Neo4J

http://probcomp.csail.mit.edu/bayesdb/#Examples

Concepts

database schema
DDL data-definition language
DML data-manipulation language
Procedural DML
DeclarativeDML
query language
query
durability
isolation
atomicity
relation
SQL (Structured Query Language)
relational model
XML (eXtensible Modeling Language)
transaction
metadata
schema
constraints
query plan
request
records
tuples
query plan
data warehouse
relationships
relations
actors
triggers
instance
physical schema
logical schema
data model
fine tuning
Ad-hoc queries
Instance
NULL
Key

inforation

buffer, page sizes region
log records
data
statistics
indexes

components

index/file/record manager
transaction manager
query processor

query compiler
    query parser
    query preprocessor
    query optimizer
execution engine

buffer manager
storage manager
concurrency control manager, scheduler
logging and recovery manager
Lock table

DBMS by data models

  • hierarchical, tree based
  • network, graph based
  • relational data model
    collection of tables to repre- sent both data and the relationships among those data
  • object-oriented systems
  • object-relational systems
  • Entity-Relationship Model
  • Semistructured Data Model

DBMS by features

  • deductive database systems
    capabilities for defining deduction rules for inferencing new information from the stored database facts
  • active database systems
    provide active rules that can automatically initiate actions when certain events and conditions occur

metadata

  • structure of each file
  • the type and storage format of each data item
  • various constraints on the data

ACID http://en.wikipedia.org/wiki/ACID

  • Atomicity
    (the all-or-nothing execution of transactions)
  • Consistency
    (any transaction will bring the database from one valid state to anothe)
  • Isolation
    (concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially)
  • Durability
    (once a transaction has been committed, it will remain so)


CAP theorem http://en.wikipedia.org/wiki/CAP_theorem

  • Consistency
    (all nodes see the same data at the same time)
  • Availability
    (a guarantee that every request receives a response about whether it was successful or failed)
  • Partition tolerance
    (the system continues to operate despite arbitrary message loss or failure of part of the system)



characteristics of the database approach versus the file-processing approach

  • Self-describing nature of a database system
  • Insulation between programs and data, and data abstraction
    program-data independence
    program-operation independence
  • Support of multiple views of the data
  • Sharing of data and multiuser transaction processing

advantages of DBMS

  • Controlling Redundancy
  • Restricting Unauthorized Access
  • Persistent Storage for Program Objects
  • Storage Structures and Search Techniques for Efficient Query Processing
  • Backup and Recovery
  • Representing Complex Relationships among Data
  • Enforcing Integrity Constraints


transaction

  • isolation
  • atomicity


Actors

  • database administrator (DBA)
  • Database designers
  • End users
    Casual end users
    naive, parametric end users
    Sophisticated
    Standalone
  • System Analysts and Application Programmers
  • DBMS system designers and implementers
  • Tool developers
  • Operators and maintenance personnel


Approaches

data normalization

ensures consistency and saves storage space

denormalization (controlled redundancy)

to improve the performance of queries
capability to control this redundancy in order to prohibit inconsistencies

impedance mismatch problem

data structures provided by the DBMS were incompatible with the programming language’s data structures

integrity constraints

constraints are derived from the meaning or semantics of the data and of the miniworld it represents
  • Domain Constraints
    specifying a data type
  • restrict the value
  • referential integrity
  • key or uniqueness constraint
  • Assertions
  • Authorization


triggers

trigger is a form of a rule activated by updates to the table, which results in performing some additional operations

stored procedures

DDL data-definition language

create table department
    (dept name char (20),
    building char (15),
    budget numeric (12,2));         
       

DML data-manipulation language

RDBMS

set of named ‘relations’ (tables)
each relation has a set of named ‘attributes’ (columns)
each ‘tuple’ (rows) has a value for each attribute
each attribute has a ‘type’ (domain)

Schema
Instance
NULL - special value for ‘unknown’
Key - set of attributes(or single attribute) whose value is unique in each tuple

Queries return relations (compositionality, closed)


Back to posts


comments powered by Disqus