In any real project you should make both Conceptual Models and Normalized Models and express them using data dictionaries and ERDs. As a general rule, the conceptual model give you a simpler picture of the data needed in a project. Normalization provides a complete and detail design for a data base. You can do informal and mental normalization as you do conceptual modelling. Your conceptual models will proved names for tables that you later derive during normalization. And sometimes you will uncover a gigantic mismatch between what people think (their conceptual model) and what the data actually is. In this case you have an interesting problem to solve -- political, psychological, and sociological. Programming will probbly make the situation worse if you don't resolve the conflict first.
In this page I will tell a story of data retrieval, review the words and processes we use, Give some hints, give rules for mapping tables to ERDs, and a
UML profile for modelling data.
Story -- what do you mean by data item
I was once (1970s) analyzing a Social Work Department in a part of London.
One of the workers, Edna, had an effective manual
system for retrieving information
on clients. She used 3><5 cards with data written on
them and a large automated storage divice for them. A typical scenario:
By the way... I recommended not computerizing this process.
Data Tables and Normalization
Pay attention to identifiers and how they are assigned. A system can work or fail depending on how it handles them. For example, a university library (a long way from here) saved thousands of dollars by assigning identifiers to books that were meaningless serial numbers, and letting the computer store the relation between this identifier and the traditional Library of Congress card number.
In computerized system, identifiers are mirrored by keys. All data is stored as a set of records of the same type in a file or table. Each record is a set of data fields -- attributes of some real entity. In these records, there should be special field called a prime key. A prime key uniquely identifies one record in the table (= file). The value of the prime key is unique in the table. The rest of attributes in the record all depend on that key. Keys act as surrogates to records. We can then store the value of the key in another record were it links back to the record. Here it is a foreign key because it identifies a record in a foreign file/table. So a foreign key is an field in a table/file that is a [ prime key ] in a different table/file.
Candidate Key
A candidate key for a table can be any set of fields in the table that
might be used to uniquely identify the rows in the table.
Prime Key
A prime key is a set of fields in a table that uniquely identifies a row in the table.
Foreign Key
A foreign key is a field in a table that is a prime key in a different
table.
A foreign key defines a natural many-to-one relationship between one table
and another. There is one row identified by the prime key. But
the same value can appear in many records as a foreign key.
In a diagram you can connect the
foreign key to its prime key.
You will also hear about OIDs:
You will also find Secondary Keys in many data bases. A secondary key identifies a set of rows in a table. A database system is told that about these so that it can attempt to optimize performance by creating indexes.
Notation for keys
In word processed texts and writing people underline
In the UML keys are attributes.
Prime Keys are marked <<PK>> and foreign keys <<FK>>.
prime keys and put asterisks infront of foreign keys.
In ASCII the simplest technique is to indicate a foreign key by putting
an asterisk in fron of it, and a double asterisk in front of the
prime key.
Report(**number, author, date, title, ...)
. . . . . . . . . ( end of section Keys) <<Contents | End>>
Extracting data structure from given data
( **Identifier, data, (repeating group ) , *foreign_key )
For example, you look at some samples of "Customer Order" which describes some items that a customer wishes to get from us. When we look at our sample we notice that whenever a customer orders an item with product number "WI007B" it is described by "Widget size 7 type B". In fact the product number is all we need to know to predict the product description. We say to our selves: "Product description depends on Product number". This is an example of a dependency.
Looking further we notice that the price seems to depend on the product type in some of our sample, but not in others. Sometimes a "WI007B" costs $5.00 and sometimes $4.95. So we say that "Price does not depend on the Product Type".
If we go and interview the clerk who handles Customer Orders. We discover that different customers get different discounts, and that the price varies with time. So you can say "The price depends on the time, the customer, and the product type".
You'll find that experts tend to use the phrase "Functional Dependency".
Here is an explanation why.
Net
then they say that "sin is a function". They mean that each real number x has a real number called sin(x). Similarly, in C/Java/C++ when we declare
double distance(Point x, Point y);we are saying that the distances depends on two Points.
The rule is that if x R y is a function from x to y then
So, the relation between Product Number and Product Description (above) is a function!
. . . . . . . . . ( end of section Functional Dependencies) <<Contents | End>>
Normalization Process -- The key, the whole key, and nothing but the key
Memorize the mnemonic, and practice the following steps. Normalization
has numbered levels: Zero normalized form, 1st normal form, 2nd normal form, 3rd normal form,
... We abbreviate these: 0NF, 1NF, 2NF, 3NF, ...
Copy/paste each successive stage into the next (DEMO in class if time).
Show keys by typing an asterisk "*" in front. Use "**" for Prime Keys and "*" for Foreign ones. Show repeated groups in parentheses:
Name( **prime key, *foreign key, attribute, (repeating group ) )
First the sample of data:
W-Z0137 - SECURITY - YOU MAY NOT UPDATE ON THIS SCREEN
105 Course Section Scan
S
Screen: ___ SID: _________ Course: CSCI Term: 071 G S
tS SEATS INSTRUCTORS/ PRIM R T CALL
SECeS GT AVAIL DAYS TIMES SUBTITLE MAX ENR UNITS ACT P A NO.
CSCI-121 - COMPTR TECH&PEOPLE
01 76 T 1200-0150PM WATSON P 80 4 2.00 LEC O 10649
CSCI-122 - BIOINFORMATICS
01 113 M 0200-0350PM DEVLIN K 120 7 2.00 LEC O 10650
02 118 T 1200-0150PM DEVLIN K 120 2 2.00 LEC O 10652
CSCI-123 - USING COMPTR SOFTWARE
01 30 M 0900-0950AM PERLEPES T 30 2.00 LEC A O 10653
02 NG 30 M 1000-1150AM PERLEPES T 30 0.00 ACT A O 10654
Activity
CSCI-124 - EXPLORING INFO SUPERHIGHWAY
01 59 R 0200-0250PM WATSON P 60 1 2.00 LEC A O 10655
02 NG 29 R 0300-0450PM WATSON P 30 1 0.00 ACT A O 10656
Activity
03 NG 30 R 1200-0150PM WATSON P 30 0.00 ACT A O 10657
Activity
....
CSCI-365 - SYSTEMS ADMINISTRATION
01 30 TR 0600-0715PM HAN K 30 4.00 LEC O 10681
02 NG 30 T 0730-0920PM HAN K 30 0.00 ACT O 10682
Activity
CSCI-375 - REQUIREMENTS ANALYSIS AND DESIGN
01 30 TR 0400-0600PM BOTTING R 30 4.00 LEC O 10683
...
In this example there are two (nested) groups -- data determined by the class and those depending on the section.
In this example, we notice that the course determines it's title. And extract Course from Offering (inveneted name).
In this case the "mode" and the "call#" determine other data. This gives us two new table. In the Section table we have: the dept, term, course, and section depend on the call# and so the prime keys are demoted to foreign keys.
More: when I asked the head of department about how the maximum enrolment was chosen he explained it was a property of the room.
. . . . . . . . . ( end of section Examples of Normalizing Sample data) <<Contents | End>>
Hints
. . . . . . . . . ( end of section Mapping 3NF back to ERD) <<Contents | End>>
<<table>>These are often associated with a particular graphic notation.
Tags
provide extra information. They take the form: {tag_name=tag_value}
Constraints
describe additional rules that should hold in a model. They
are statements that must be true. There is a specialized language
(the OCL)
for writing them if needed. I've given you links to some data base
profiles below that I think are too complex for this course.
UML & Data bases
| stereotype | a special kind of | meaning |
|---|---|---|
| conceptual model | class diagram | picture of the world outside the computer in terms of entity types and relationships between entities. |
| logical data base | class diagram | A picture of an ideal data base independent of a particular technology or technological requirements. |
| physical data base | class diagram | A detailed solution, for a particular technology, to a set of data base requirements. |
| table | class | database table |
| PK | attribute | Prime key |
| FK | attribute | Foreign key |
| view | class | view of database(for a function) |
| index | class | index into a database(for a function) |
| Constraints |
|---|
| In conceptual models do not show operations, keys, tables, views, indexes, composition, aggregation, etc. |
| In logical and physical data bases unmarked classes are tables. |
| Use <<table>> only if you have a diagram that has both persistent (data base) classes and run time objects in the same diagram. |
. . . . . . . . . ( end of section UML & Data bases) <<Contents | End>>
. . . . . . . . . ( end of section ISSA Flyer) <<Contents | End>>
Info Service Schedule
. . . . . . . . . ( end of section Info Service Schedule) <<Contents | End>>
. . . . . . . . . ( end of section Online Resources and Examples) <<Contents | End>>
Review Questions and Exercises on Normalized Data
. . . . . . . . . ( end of section Template) <<Contents | End>>
. . . . . . . . . ( end of section Detailed Data Design) <<Contents | End>>
Abbreviations
Also see
[ glossary.html ]
for more special abbreviations and phrases.
End