Wednesday, 15 June 2011

Oracle - Surrogate Key, Natural Key, Primary Key and ROWID

Surrogate Key

Surrogate key is a substitution for the natural primary key.
It is just a unique identifier  or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouse typically use a surrogate key, also known as artificial or identity key. They can use sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer number) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business user), but not only can these change, indexing on a nemuric value is probably better and you could consider creating a surrogate key called - AIRPORT_ID, this would be internal to the system and as far as the client is concerned you may display on the AIRPORT_NAME.

The surrogate is internally generated by the system and may not be invisible to the user or application, but you can see it in the Database.

Approaches to generate surrogates include:
  • Universally Unique Identifier (UUID)
  • Globally Unique Identifier (GUID)
  • Sybase or SQL Server identity column IDENTITY or IDENTITY (n,n)
  • Oracle SEQUENCE
  • PostgreSQL serial
  • MySQL AUTO_INCREMENT
  • AS IDENTITY GENERATED BY DEFAULT in IBM DB2

Surrogate keys do not change while the row exists, so the application cannot lose their reference to a row in the database (since the identifier never changes)

To create SEQUENCE in Oracle:
-----------------------------------------
create sequence OraSeq
start with 1
increment by 1
nomaxvalue;
-----------------------------------------
CREATE SEQUENCE supplier_seq
    MINVALUE 1
    START WITH 1
    INCREMENT BY 1
    NOCACHE;
--------------------------------------------------------------
CREATE SEQUENCE supplier_seq
    MINVALUE 1
    START WITH 1
    INCREMENT BY 1
    CACHE 20;
--------------------------------------------------------------
the cache option specifies how many sequence values will be stored in memory for faster access

--------------------------------------------------------------
alter sequence seq_name
increment by 124;

select seq_name.nextval from dual;
alter sequence seq_name
increment by 1;

--------------------------------------------------------------

The SEQUENCE object you created will be listed under Sequences

To use it:

insert into MyTable values(OraSeq.nextval, 'val1', 'val2', 'val3', 'val4');

Note that sequences sometimes appear to skip numbers because Oracle caches them to be sure that they are always unique, so this may not be your ideal solution if you want it to be sequential, it's mostly sequential and always unique.

Oracle does not have IDENTITY Type as in SQL Server.

Natural Key

In relational model database design, a natural key has a logical relationship to the attributes within that row. 
The main advantage of a natural key over a surrogate key, which has no such logical relationship, is that it already exists; there is no need to add a new, artificial column to the schema. Using a natural key also simplifies data quality: It ensures that there can only be one row for a key; this "one version of the truth" can be verified, because the natural key is based on a real-world observation.
The main disadvantage of choosing a natural key is that its value may change and the relational database engine may not be able to propagate that change across the related foreign keys. For example, if person_name is used as the primary key for the person table, and a person gets married and changes name, then all of the 1-to-many related tables need to be updated also. The secondary disadvantage of choosing a natural key is identifying uniqueness. The primary key must consist of the attributes that uniquely identify a row. However, it may be difficult to create a natural key on a table. For example, if person_name is used as a primary key for the person table, many persons may share the same name and all but the first entry will be rejected as a duplication. The uniqueness constraint may be overcome by adding an additional column to the primary key.
Primary Key
 ....
ROWID
Oracle uses ROWID datatype to store the address of every row in database.
Each table in Oracle database internally has a pseudocolumn named ROWID. The pseudocolumn is not evident when listing the structure of a table by executing [select * from...] statement, nor does the pseudocolumn takes up space in the table. However, each row's address can be retrieved with a sql query using the reserved word ROWID as a column name:

select rowid, name from sequencetest;

In my case, I see the rowid is AAANdlAABAAA08iAAA

You can not se the value of the pseudocolumn ROWID in INSERT or UPDATE statement, and you cannot delete ROWID value. Oracle use the ROWID values internally for construction of indexes.