|
VISN 20 Data Warehouse MetaData
Database Overview
The VISN Organizational Data Store (VODS) database is our implementation of a normalized relational database containing data extracted from the VistA database of each facility in VISN 20. The actual database name is vodsdb.
Typically the database schema is presented as an Entity Relationship (ER) diagram. To interpret our ER diagrams the following terms are defined.
Database Terms
A table is the collection of rows and columns that hold data about an article, e.g. Patient. A table corresponds to a file in VistA. Entity is also sometimes used interchangeably with table but usually refers to the logical (as opposed to the physical) correlate of a table.
Each record or row holds data about one instance of that entity, e.g. about Mary Jones.
Each column or field contains data describing an attribute of that entity, e.g. LastName. Attribute is also sometimes used interchangeably with field but usually refers to the logical correlate of field.
Database Keys
The understanding of primary keys and foreign keys is necessary if one is to query the VODS database directly and take greatest advantage of the resource.
Each table has a primary key, which is one or more fields that together make the row unique. For example, in VistA each patient is assigned a unique number (called DFN). The Data Warehouse uses that number (we call it PatientID). The PatientID, however, is not sufficient to uniquely identify a patient because eight VistA databases have been merged together. There may be a PatientID 999 from Puget Sound, for example and a different PatientID 999 from Walla Walla. Therefore we use PatientID together with a Site number to make the primary key for the Patient table. John Smith may have PatientID=123 and Site=663. The primary key for the Prescription table is PrescriptionID and Site, e.g. PrescriptionID=444 and Site=663. Null (empty) values are never allowed in primary key fields.
The relationships between entities can be of several types but are most often one-to-many. For example, there is a one-to-many relationship between the Patient entity and the Prescription entity. One record in the Patient table (for John Smith) can have several associated records in the Prescription table. The one side of the one-to-many relationship is the parent table (e.g. Patient) and the many side is the child table (e.g. Prescription).
Foreign keys are how relationships are implemented; a foreign key is set of field(s) in the child table that corresponds to the primary key of the parent. PatientID and Site make up a foreign key in the Prescription table to the Patient table. Records in the Prescription table that have PatientID=123 and Site=663 belong to John Smith because PatientID=123 and Site=663 defines John Smith in the Patient table. A table can have several foreign keys. The Prescription table has another foreign key consisting of the LocationID and Site fields that points to a LocationList table that has the description of the pharmacy location.
Entity Relationship (ER) Diagrams
The primary references describing the VODS database are the ER diagrams and the Data Dictionary; both are published on this web site. The ER diagrams show the table names, the field names and primary key fields of each table, which tables have relationships, and which fields are included in foreign keys.
Our ER diagrams are divided into subject areas to make it more manageable when viewing the diagrams. However, logically all subject areas are part of the same large model.
An ER diagram is a roadmap for the data model. Such a diagram can describe the relationships in the logical model or the physical model. Our published ER diagrams describe the physical implementation; that is they describe the actual tables of the vodsdb database. If you choose to print the diagrams rather than view them online, you should remember that the database schema of the vodsdb database can change as development continues. Check the web site frequently for the most recent version of the ER diagrams and other documents.
ER diagrams not only show the (one-to-many) relationships but also indicate on which fields to join tables when doing queries. The guiding principle is to use the fields of the primary key of the parent table. It is easily remembered in that both words begin with "p". The actual mechanics of joining tables is taught in the SQL class taught by the Data Management development team. The class emphasizes on the use of the ER diagrams.
Legend
The legend for the ER diagram refers to identifying and non-identifying relationships. An identifying relationship is one in which the primary key of the child table contains the foreign key to the parent. For example, Prescription – PrescriptionFill is an identifying relationship because the primary key of the child PrescriptionID/Site/FillID contains the foreign key PrescriptionID/Site. In practical terms this means that the child depends on the parent for its identity and cannot exist without it. It would make no sense to have a fill when there is no prescription. This is contrasted with the non-identifying relationship of LocationList – Prescription. There could be a Prescription record even if the LocationID for a particular Prescription record were left blank. Generally, look for identifying relationships between data tables and non-identifying relationships between a data table and a lookup table.
One to many relationship (identifying)

One to many relationship (non-identifying)

Primary Key (PK)

Foreign Key (FK)

Primary/Foreign Key (PK/FK)

Non Identifying (NI)

Data Dictionary
The Data Dictionary is a listing of each Data Warehouse field. As of the date of this writing only VODS data elements are listed in the DataDictionary; OLAPdatabase elements are described separately on this website. When a Data Warehouse field comes from a data source (e.g. Vista fields) without any alteration that source is published in the Data Dictionary in the columns labeled "Source…". The columns of the Data Dictionary include:
FieldID: Unique Data Dictionary identifier
DWTableName: Data Warehouse table to which field belongs
DWFieldName: Name of the Data Warehouse field
DWDataType: Datatype in the native platform (Microsoft SQL Server for VODS)
DWFieldDescription: Description of field authored by DW developers. This is often left blank if the field is imported from another source (e.g. VistA). In that case the description from the source is published in SourceFieldDescription.
SourceEntityName: File or table to which the source field belongs.
SourceFieldName: Fieldname used in the source system (e.g. VistA) for the source field.
SourceFieldDescription: Description assigned by source system for the source field.
SourceFileNum: Number used by source system for file or table to which source field belongs.
SourceFieldNum: Number used by source system for source field.
SourceDataType: DataType in source system for source field.
SourceSetOfCodes: Set of codes used as data domain in source system for source field.
SourceFieldPath: Chain of file number and field numbers to designate a multiple fields path to a VistA source field. For example, 200: 8932.1, 3 means file 200, field 8932.1 that is a subfile, field 3 designates the DATE field in the PERSON CLASS file.
SourceSystem: System from which data was collected. Many are "VISTA". Some list the VistA package also (e.g. VISTA:VISIT TRACKING)
The Data Dictionary is presented on the CHIPS web site as a spreadsheet. To search for a particular value, place the cursor in a cell in the column you want to search, pick Find from the Edit menu and type the value. Select Search By Columns for a search of the relevant column first.
Our Data Warehouse
VID
We have chosen to use the identifiers for patients and staff already assigned by the VistA systems as part of our primary keys. Since these identifiers are unique only to each facility, however, we also need to include the site identifier to make them unique in our database.
There are times when identifying corresponding records across facilities is required. For example, we may want to retrieve all records associated with staff Mary Jones but Mary Jones is identified in three different facilities with three different StaffID’s. We have generated VISN-wide identifiers (VID) for staff and patients. Ideally each of the three records in the Staff table for Mary Jones would have the same VID. Where SSN is valid and matches across facilities the same VID is assigned to each Mary Jones. Where SSN is not available or is invalid each Mary Jones is given a different VID. For this definition a "valid" SSN is one that has nine numeric characters and the first three characters are not "000". As of February 18, 2000 the VIDs are persistent, meaning that the same VID is applied to the same person in subsequent refreshes of the data warehouse. Although each VID is unique to an SSN, VIDs are not generated algorithmically from SSN. Note that if an SSN is corrected in the source database, the VID for that person will change from the previous refresh.
Currently there is no mechanism in place for matching staff or patients across sites other than by SSN. We make no attempt to look at names. The above discussion applies to patient VIDs as well as staff VIDs.
Lookup Tables
Tables that contain lists of values or relatively static data are named with a suffix "List". Examples include ClinicStopList, EligibilityCodeList. These tables typically have no patient data. Normally, joins to these list tables require joining on two fields, an identifier and Site. For example, ClinicStopList is joined to OptClinicStop on ClinicStopID and Site. The one table that is an exception to this naming convention is the ProblemList table, which is not a lookup table, but since the VistA contents are well known as the Problem List we chose to use that name.
When the records in a list table are exactly the same for all facilities in the VISN we use a suffix "VISNList". The redundant records have been eliminated and there is no Site field in these VISN list tables.
Naming Conventions
We have tried to give names to tables and fields that are meaningful. A few, PTF for example, reflect the names from the VistA source. All objects in the vodsdb are case sensitive. This means that tables, fields, stored procedures, everything must be spelled with the exact upper/lower case. Thi makes is more cumbersome but pays off in greater database speeds. Our convention is to begin each word in a name with upper case and use no separator characters, e.g. PrescriptionFill. Objects are named in the singular. Abbreviations are done similarly, e.g. OptDiagnosis. Acronyms, however use all upper case, e.g. PTFDiagnosis.
Field names that end in ID are usually key fields (Primary or Foreign), often coming from VistA IEN (Internal Entry Numbers). With only a couple of exceptions they have a datatype of integer.
Field names that end in Record are fields used in some primary keys. The values are arbitrarily assigned each time the Data Warehouse is refreshed and are non-persistent (meaning they will have different values in the next data refresh). These values are used only to join tables and have no inherent meaning.
Relational Model
The Relational Model was developed formally in the 1970’s as a set of rules for generating a logical data model. Since then, many database products have been developed that use the relational model. The language that has developed to manipulate and access a relational database is called Structured Query Language (SQL). Relational databases, sometimes referred to as SQL databases have become the standard for most general-purpose databases used by large organizations.
The Relational Model is characterized by the elimination of data redundancy. This helps in maintaining the integrity of the database. The process called normalization, applies the rules to generate a logical data model consisting of entities, their attributes, and the relationships between entities. A simplified version of the rules that prescribe the model, together with an example application are given in Appendix A.
The physical implementation of a logical model into a real database schema can be made in a number of different ways (including non-relational). Often, relational databases are constructed in a way where the physical tables closely correspond to the logical entities. In this documentation we concentrate on the physical database schema.
Miscellaneous
Transformations
Much of the data that is extracted from VistA is in a structure that needs to be transformed. We transform the data to a normalized schema that largely reflects the logical structure of the data. This also allows flexibility in creating other denormalized structures when they are required. Following are examples of transformations that we make to normalize the data and how queries would look if done on denormalized structures as they might come from VistA versus on normalized structures.
Example Queries
Users that want to take greatest advantage of the Data Warehouse are encouraged to take a course in the SQL language. CHIPS offers a two-day course. Below are some simple queries to illustrate the basic concepts.
-- Retrieve all the fields and all the rows from the Patient table
Select * from Patient
-- Retrieve a count of patients at Puget Sound
Select count(*) from Patient
where Site=663
-- Retrieve counts for each site
Select Site, count(*) from Patient
group by Site
--Retrieve Puget Sound patients prescribed DrugID=277 (ASPIRIN 325MG EC TAB ) sinceJan
Select distinct a.PatientID, a.Site from Patient a
Join Prescription b
on a.PatientID=b.PatientID
and a.Site=b.Site
where b.DrugID=277
and b.IssueDate>'1/1/1999'
and a.Site=663
-- Retrieve Puget Sound patients having LabTestID=2 (GLUCOSE)
Select distinct a.PatientID, a.Site from Patient a
join LabChemResult b
on a.PatientID=b.PatientID
and a.Site=b.Site
where b.TestID=2
and a.Site=663
-- Retrieve Patients having DrugID=277 and LabTestID=2
Select distinct a.PatientID, a.Site from Patient a
join Prescription b
on a.PatientID=b.PatientID
and a.Site=b.Site
join LabChemResult c
on a.PatientID=c.PatientID
and a.Site=c.Site
where b.IssueDate>'1/1/1999'
and b.DrugID=277
and c.LabChemTestID=2
and a.Site=663
-- Retrieve Patients prescribed drugs from NationalDrugFileID=964 after Jan 1, 1999
Select a.PatientID, a.Site from Patient a
join Prescription b
on a.PatientID=b.PatientID
and a.Site=b.Site
join DrugList c
on b.DrugID=c.DrugID
and b.Site=c.Site
where b.IssueDate>'1/1/1999'
and c.NationalDrugFileID=964
Tips for Queries
Most table joins must include the Site as one of the fields on which to join. For example to get the name of a patient’s eligibilities do the query:
Select a.*, b.Name from Patient a
join EligibilityCodeList b
on a.EligibilityCodeID=b.EligibilityCodeID
and a.Site=b.Site
A few tables (CPTVISNList, IcdDiagnosisVISNList, IcdOperationVISNList.) are exceptions where codes are the same for all the sites. For example:
Select a.*, b.CPTCode from OptCPT a
join CPTVISNList b
on a.CPTID=b.CPTID
When joining tables be aware that if any field on which you join is null the record will not be returned. For example, the following query will miss patient records if the Patient.RaceID field is null.
Select a.*, b.Name from Patient a
join RaceList b
on a.RaceID=b.RaceID
and a.Site=b.Site
Create temporary working tables in the researchdb or admindb database. You can join tables these tables with tables in the vodsdb database. For example, you have created a table of 300 patients in the researchdb database called MyPatients. You want to see the hospital admissions for these patients which is in the PTF table in the vodsdb database. The query might be:
Select * from vodsdb..PTF a
join researchdb..MyPatients b
on a.PatientID=b.PatientID
and a.Site=b.Site
When finished delete the table.
Drop table researchdb..MyPatients
For repeated queries on a large table consider creating a smaller table containing only the subset in which you are interested. For example, you have a small cohort (several hundred) patients for which you have created a table called Cohort. You will be querying about pharmacy data many times for these patients with various drugs an in combination with other search criteria. The following query may give the desired result but may be slow because it operates on the large table Prescription.
Select a.* from reseach..Cohort a
join Prescription b
on a.PatientID = b.PatientID
and a.Site = b.Site
where DrugID = 999
and . . .
A better approach might be to populate a table in researchdb called CohortPrescription from the following query.
Insert into research..CohortPrescription
Select b.* from Cohort a
join Prescription b
on a.PatientID = b.PatientID
and a.Site=b.Site
Subsequent queries can then use the much smaller CohortPrescription table.
Select a.* from reseach..Cohort a
join research..CohortPrescription b
on a.PatientID = b.PatientID
and a.Site=b.Site
where DrugID=999
and . . .
|