Introduction to databases
Concepts
flat-file database
Databases that can only contain one table
Multi-valued
http://www.attcanada.net/~kallal.msn/Articles/fog0000000006.html
ISAM
Indexed Sequential Access Method
key/value ("hashing")
""relational" is the key feature of relational databases:
they are interesting not because of a given element (i.e. row in a table) on
its own but because of the relations between rows and tables. If you don't need
relations, then there's no reason to use a relational DB. The whole point is
to use multiple tables ;) Otherwise you just have a flat file."
... or has engines like SleepyCat :-)
relational
object-oriented
XML
Keys
Primary, foreign, referential integrity
Indexes
Views
Normalization
First, second, third normal forms
Transactions
Ie. bunch of SQL instructions that sit between a BEGIN and COMMIT statements
Tuning
SQL
Commands
Triggers
Stored procedures
Atomic transactions, commit/rollback
Pessimistic/optimistic locking
Cursor
Joins
Inner (symetric, ie. returns all the rows common to the tables used; inner
join, cross join, equijoin), outer join (asymetric, ie. only some of the rows
are returned; outer join, left join, right join), self join (using only one
table, and creating temporary aliases)
APIs
MDAC
(From MS) The Microsoft Data Access Components are the key technologies
that enable Universal Data Access. They include ActiveX Data Objects (ADO),
Remote Data Service (RDS, formerly known as Advanced Data Connector or ADC),
OLE DB, and Open Database Connectivity (ODBC).
UDA
Universal Data Access. Here's the UDA architecture as seen on MS site:
OLE DB + ADO
OLE DB is a set of COM interfaces specific to access data sources; OLE DB
providers are usually accessed through ADO, which encapsulates OLE DB to
expose a simple interface through methods, properties and events.
RDS
(MS) Remote Data Service (RDS) is a feature of ADO. RDS delivers a new
Web data access technology that allows developers to create data-centric applications
within ActiveX®-enabled browsers such as Microsoft® Internet Explorer.
DAO/RDO
Here's a comparison of OLEDB-ADO and DAO-RDO as seen on MS site:
?????
ODBC
(MS) Open Database Connectivity (ODBC) is a widely accepted application
programming interface (API) for database access. It is based on the Call-Level
Interface (CLI) specifications from X/Open and ISO/IEC for database APIs and
uses Structured Query Language (SQL) as its database access language.
DSN = Data Source Name, ie. name given in ODBC to a connection profile. User
DSN = user-specific connections; System DSN = system-wide connections, ie. any
user who logs on to this computer can use those DSN's; File DSN = access to
DSN definition files (*.DSN)?; ODBC Drivers = list of drivers installed
ODBC configuration done through Control Panel.
JDBC
Implementing concurrency
If your database doesn't handle concurrent accesses, you'll have to implement
this in your application. Three possibilities are available:
- pessimist locking: A lock is created before any change is made, keeping
other hosts or processes from making any change
- optimist locking: use a variable to check whether data were changed
between the time you read data, and when you're about to record your changes.
If someone else made changes in the meantime, either discard your changes,
or let the user decide
- last in wins: You don't care if another host or process made changes
between the time you read data, and when you made your changes
From optimistic
locking:
- Anyway... back to the specific question... the way I've done it in the
past is to store the timestamp in the Value and View objects (you can store
the timestamp in a hidden field on the jsp). Then when you save the object
you do:
-
- update table foo set x=? where id=? and timestamp=?
-
- Check the return to see how many rows where affected. If non were affected
then you basically have a stale update (either the timestamp was changed
in the interim or someone deleted the row out from under you). I've used
triggers or explicit setting of the timestamp in the past. Note that it
does not even need to be a real timestamp... it could be a verion number
too. In one application we were keeping audit trails for all changes and
we decided to use a sequence instead of a timestamp.
From VFP
8.0 and Sql Server: implementing pessimistic locking:
- Because we're still not sure of how to best implement sql server locking
with a VFP front end, my boss has suggested a customzied locking design.
His idea is that VFP would write to a central "lock" table when
a user decides to modify a record. Then if another user comes in and tries
to modify the same record, his session would check the contents of the central
"lock" table to see if the first user is already modifying the
same record.
From Code
Charge:
- The first method you describe the most common way on implementing concurrency.
Grab the last update datetime when it come to perform the update check this
field and action the update if the datetime has not changed.
An alternative
to actually doing the check before the update statement(that's one extra
round trip remember), is to use the Username and Datetime as part of the
where clause, then to check the number of records modifed by the update
statement. SQL server for example provides @@ROWCOUNT to give the number
of rows affected by the last statement.
An alternative method that
we are using to write custom update that only updates the ACTUAL changes
the user has made. So if User A changes The phone number, and user B changes
city only one field is updated per user, the script that CCS generates out
of the box doesn't handle this and you will need to write code to implement.
This method still has shortcomming but it's an alternative.
- To provide for the memo-type (textarea) problem, one could actually
show a message to the user "memo being edited by user: XYZ" thereby
triggering the users to contact each other (in a closed group) and agree
on a change policy.
- In our support system we use somewhat of an emulated pessimistic locking.
We use "date_locked" and "locked_by" fields to track
who locked each record and when. The records aren't locked automatically
but the user can view a record and then click the "Lock" button
to indicate that he/she is making concious decision to lock and edit the
record. The record stays locked for X amount of time or until the record
is updated. However, other users can press the "Unlock" button
and start making their own changes, which in our situation isn't very dangerous
because the main record isn't actually edited, just new support responses
are added.
From Handling
Concurrency Issues in .NET by By Wayne Plourde
- The only way to prevent concurrency errors is to lock the records that
are being edited. There are two basic approaches towards locking - optimistic
and pessimistic. One would consider an optimistic locking scenario when
the likelihood of a concurrency condition is low. This is usually the case
in systems where the activity is primarily additive, like an order entry
system. On the other hand, one would consider pessimistic locking when the
likelihood of a concurrency condition is high. This is usually true of management
or workflow-oriented systems. [...]
-
- One of the drawbacks of using pessimistic locks is that locks can be
set then forgotten. Therefore, you may need to provide some way for the
application to recover so that the entity in question can be accessed again.
One approach is to allow locks to timeout. In this case, you would need
to provide a DateTime data field to track when the lock was set. Then if
the lock is older then some predetermined amount of time, you would ignore
it, allowing users to access the record for updates.
-
- Another approach is providing the ability to define permissions to allow
an administrator to override or clear previously set locks. This may be
especially important if you are not providing timeouts.
To read:
Embedded databases
Embedded means that the database component should ideally live in a single
DLL containing both the client and server parts and save its data in the filesystem.
Obviously, this means that the database server offers no data protection from
external access, either locally or through the network. In other words, those
lightweight solutions that offer no server process are meant for single-user
usage since the only protection comes from the OS.
SQL
Ocelot
http://www.ocelot.ca
MySQL
- Embedded
MySQL Server in PowerBasic
- As of June 2003, not usable directly from VB due to its lack of static
linking to DLLs (ie. at compile-time). Could shell to a PowerBasic EXE to
make the actual calls
- The DLL is located in C:\mysql\Embedded\DLL\release\libmysqld.dll
FireBird
Speedy
- http://www.geocities.com/wabhar/
- No DLL provided, must compile C source code
- speedy1 is the embedded version; speedy is the TCP-based
client-server version
- speedydb is the SQL engine
- speedysh is the command-line client
- speedyst is used to manually stop speedydb
- Use speedysp to automatically stop speedydb when Windows shuts
down
- DB-Viewer is a GUI application to view/change Speedy-database
contents, DB-Designer is a GUI application to design Speedy-databases
- DB-Preparator generates database access functions, and DB-Generator
transforms database description formats
- To check out performance: launch the server by opening a DOS box and
running speedydb, create a database with "speedysh CREATE dirdb",
and run the test application with "dbdir", whi inserts 30000 rows
(each a pair of integers) and then accesses the rows in various ways
- The Speedy.dat-directory contains all information about the database
- No network support?
hSQLDB
SQLite
A free C library that implements a DBMS. Comes with a standalone command-line
access program that can be used to administer the database. Available here.
Doesn't provide a server process to allow safe simultaneous write access, but
relies on the file system locking mechanisme instead. Very fast when used with
transactions, on par with MySQL when used in asynchronous mode, much slower
than MySQL and PostgreSQL when used in synchronous mode. A mailing list is available
at Yahoo : http://groups.yahoo.com/group/sqlite/
To import tab-delimited records (ie. CSV):
- sqlite.exe mydb
- create table test (i integer primary key, name);
- copy test from 'import.txt';
If the CVS data don't have a primary key, here's a trick to import those
into SQLite:
- sqlite.exe mydb
- create temporary table table1 (...);
- copy table1 from 'import.txt';
- create table table2 (keyfield INTEGER PRIMARY KEY, ...);
- begin;
- insert into table2 select null, * from table1;
- commit;
In case you need reports, ie. long lock on a DB, you could create a temporary
table, and read from this database instead (http://groups.yahoo.com/group/sqlite/message/3565).
CQL++
TurboPower FlashFiler
Maverick
- Technically, not SQL but rather an open-source clone of the MultiValue
Database (MVDB or MVDBMS, originally known as the Pick Operating System),
more flexible than RDMBS'
- Written in Java, so requires JVM
- http://www.maverick-dbms.org/
Hashing, non-SQL
db.*
ZODB
My little base
http://www.mylittlebase.org : Delphi
or C++
EZTree
http://ezgui.com/
It is up to you the programmer, to make sure that you always pass your record
data with the same length for each record and the same length for the Index
Key.
Cheetah
Light, closed-source DBMS from Planet
Squires.
XDB
http://linux.techass.com/projects/xdb/ :
C++
BerkeleyDB
Open-source. Generates hashed files, hence poor man's storage system. Available here.
Tsunami
Light (one DLL), closed-source solution available here.
DvBTree
Available here.
CDB
MetaKit
http://www.equi4.com/metakit/
: C++, Python, Tcl
DynDB
http://www.ohse.de/uwe/dyndb.html
Goods
PureDB
http://www.pureftpd.org/puredb/
Free Database Engines
TurboPower FlashFiler
- FlashFiler is a client-server database for Borland Delphi & C++Builder.
It features a component-based architecture & the server engine can be
embedded in your applications. FlashFiler is easy to configure, performs
well, & includes SQL support
- http://sourceforge.net/projects/tpflashfiler/
Speedy
GNU SQL server
Adabase PE
PostGreSQL
Open-source solution available here.
SAP DB
http://www.sapdb.org/
Borland Interbase
An open-source version is available here,
here
and here.
FireBird Interbase
Enhanced version of Interbase. Available here.
Because Borland's open source efforts regarding InterBase never really took
off beyond prime release of the source code and the company returned its focus
to closed commercial development, Firebird became THE Open Source version of
InterBase. Yet more information here.
Jet
A complete ISAM relational database engine that is 9/10ths of Microsoft Access,
for free. it's preinstalled on Windows 2000. Does not scale. General-purpose
interface.
MSDE
Scaled down version of the SQL 7.0 engine, royalty free, and optimized for
up to 5 concurrent users
MySQL
Open-source solution available here
MDBMS
OpenIsis
http://openisis.org
Commercial Database Engines
Besides the usual suspects (Oracle, DB2, Sybase SQL Server, MS SQL Server,
etc.)...
Advantage
mSQL
CodeBase
Available here.
Informix SE
http://www-3.ibm.com/software/data/informix/se/
APIs
DAO
Designed to access the Jet database. Can't retrieve non-relational data.
Being superseded by ADO, and ADO.Net.
RDO
ADO
Enhancement to DAO. Requires ADO drivers. Used to access SQL DBMS's. Unlike
ODBC, can access both relational and non-relational data.
OLEDB
Used by ADO to access actual data.
ODBC
Limited to relational data.
Q&A
What is MDAC used for?
Basic database-related stuff that you need to install in Windows before installing
such and such database-specific ODBC driver?
Temp stuff
- From: Jon Rista (jrista@hotmail.com)
- Subject: Re: Should ODBC be used ?
-
- ADO is faster if you use it properly. There are lots of little "things"
to be aware of when using ADO, such as lock type, cursor location and type,
etc. Proper combinations of different ADO settings can result in very good
performance.
-
- The best results for SQL Server come when you use the native OLEDB provider
for it. When you use the OLEDB provider for ODBC, you add at least one extra
layer of abstraction between your presentation tier and the data itself.
This decreases performance. Whenever possible, you want to use the native
OLEDB provider for SQL Server, as it provides direct access to the database,
no intermediate layers (other than ADO) are required. If you use a properly
configured connection with a database with proper indexes and constraints,
SQL Server 2000 can be faster than Oracle, and significantly faster than
IBM DB2.
Resources