United States Department of Veterans Affairs
United States Department of Veterans Affairs

VA Northwest Health Network

Overview and History

History and Overview
Data Contents of Data Warehouse
Data Architecture
Data Quality
Resource Citizenship
Software Tools
 

Data Warehouse Navigation
MetaData
ER Diagrams
VISN 20 Cubes
Research Requirements
Home

 

 

 

 

 

 

 

 

 

 

History & Overview

In the 1990s VHA organized into 22 administrative regions called Veterans Integrated Service Networks (VISN).  VISN 20 services veterans in Alaska, Idaho, Oregon and Washington.  However each individual facility continued to maintain its own clinical database (VistA); the databases were not integrated as a VISN.  In 1997-98 VISN 20 began an initiative called CHIPS, to develop a VISN wide information system for decision support, performance measuring and population studies.

The result was the CHIPS Data Warehouse, a collection of databases containing (VistA) data extracted from all the facilities.  It went online in March 1999.  Since that time new elements have been added almost monthly.  In 2002 data streams were added from the National Data Extracts of DSS and the VERA costing model from ARC. 

Having VISN-wide data in a single repository created opportunities that were not present with the stand-alone operational VistA databases.  A centralized VISN database has helped identify and reconcile long standing problems arising from differences in coding practices and procedures between facilities.  Policy issues have also become clearer, such as the need to assign a patient to a single primary care provider across the VISN.  The Data Warehouse has been used for VISN-wide examination of utilizations and costs, disease management, research, measurement of quality, measurement of performance and the testing of clinical guidelines. 

Another important advantage that the Data Warehouse has over the clinical operational systems is it’s accessibility through a variety of modern software tools.  Software can be chosen on the basis of how appropriate it is for the task at hand or how familiar it is to the customer.  This is a departure from the existing model for operational systems where the primary access is through custom applications programs (e.g. CPRS) written to accomplish specific tasks. These operational systems use a closed architecture.  Extracting or querying data often requires a programmer’s time which can be limiting. 

The Data Warehouse uses a more open architecture.  We have created a Data Store (using Microsoft SQL Server) that employs industry standards (SQL, ODBC, OLE-DB, MDX) to allow diverse customers (and software) to connect to the databases.  Both developers and customers can connect directly to the Data Store with query tools, spreadsheet software, statistics packages, etc.  Developers and customers alike have created a myriad of specialized subset databases (data marts) and software applications from the Data Store.

In addition to the Data Store the Data Warehouse includes a suite of topic-specific databases based on the OLAP (Online Analytical Processing) technology.  These have been developed by collaboration between the Data Warehouse developers and content experts (e.g. pharmacists).  Customers get a view of summary (aggregate) information from OLAP databases (also called cubes) using a browser for that purpose.

In 2004 the VISN established a Knowledge Management Office to oversee all VISN data activities, including CHIPS.  The CHIPS name was changed to VISN 20 Data Warehouse.  The Development Team is known as the VISN 20 Data Management Team.

Other VISN's have independently developed data warehouses but VISN 20 remains the leader in the country, together with VISN 16.  Presently there are initiatives to implement national Data Warehouses.  The Financial and Clinical Data Marts (FCDM) that are available to every VISN in the country are one direct result of the work done by VISN Data Warehouses Programs.  The Health Data Repository will also have a Data Warehouse component.

 

Data Contents of Data Warehouse

The VISN 20 Data Warehouse extracts data from most of the clinical packages of VistA, from the National Data Extracts for DSS and from the VA Allocation Resource Center (ARC).  The Data Store contains all of the data collected in the Data Warehouse.  It includes the following subjects, which roughly correspond to specific VistA software packages.

  • Allergy
  • Appointments
  • Consults
  • CPRS Orders
  • DSS inpatient costs
  • DSS outpatient costs
  • Fee Basis (Inpatient)
  • Fee Basis (Outpatient)
  • Lab Autopsy
  • Lab Chemistry
  • Lab Cytology
  • Lab Microbiology
  • Lab Surgical Pathology
  • Inpatient records (PTF)
  • Outpatient Visit
  • Patient demographics
  • Pharmacy (BCMA)
  • Pharmacy (Inpatient IV)
  • Pharmacy (Inpatient Unit Dose)
  • Pharmacy (Non VA)
  • Pharmacy (Outpatient)
  • Prosthetics
  • Pulmonary
  • Radiology (No Nuclear Medicine or report text)
  • Surgery
  • TIU (Text documents e.g. progress notes, discharge summaries. Stats only, not actual text)
  • VERA costs (costs per patient assigned by ARC) and allocations
  • Vitals

In addition, there are specialized data marts that group subsets of data for specific purposes.  These take various forms such as multidimensional (OLAP) databases, a custom web-based reporting application called AHM (Advanced Health Management) and other relational databases.  Subjects include:

  • Congestive Heart Failure (CHF)
  • Credentialing & privileging
  • Data quality
  • Diabetes
  • Drug interactions
  • Outpatient encounters (counts)
  • Eye care
  • Fee basis
  • Hepatitis C
  • Hypertension
  • Ischemic Heart Disease (IHD)
  • Multiple Sclerosis disease registry
  • Nonvested patients
  • Narcotics
  • Pharmacy processing (inpatient)
  • Prescription costs
  • Prosthetics
  • Utilizations/costs for various disease cohorts
  • Utilizations/costs for primary care
  • Utilization/costs for high utilizers

Data Warehouse customers also create data marts.  These vary greatly in scope and complexity.  Examples include research cohorts, data sets for budget analysis and facility planning.

New data is added as the Data Warehouse evolves. The detailed contents are documented in the ER diagrams and in the MS Excel Document   Data Dictionary.  The Metadata Overview gives instruction on how to use those documents.

Data Architecture

Diagram 1 illustrates the architecture of the Data Warehouse. Data are extracted monthly from each medical center's VistA system.  As of this writing data are collected from about 180 files at each medical center; those data are used to generate about 216 tables in the Data Store.

DWArchitecture

Diagram 1

The files are collected onto a staging server.  They are cleaned and imported into the VISN Organizational Data Store (vodsdb) that is the heart of the Data Warehouse. The vodsdb is a "normalized" relational database that contains all the data of the Data Warehouse in all its detail.  "Normalization" refers to a set of rules specifying how the data are structured.  All the other databases, generated for particular uses, draw their data from the vodsdb.  One exception is that the Multiple Sclerosis Disease Registry accepts source data directly from the AAC (Austin Automation Center).

OLAP (Online Analytical Processing) is a technology in which subset databases (cubes) and predefined queries are used for faster performance than would be possible from the large normalized database.  Customers can make use of this technology for example, to display counts or costs when examining practice patterns or pharmacy utilization.  The software that VISN 20 supports for displaying OLAP cubes is from *ProClarity  ©.

The vodsdb database is the foundation for the application programs and other databases and data marts.  A read-only copy of this database is provided for customers that need the most flexibility and power in querying the data.  On the same powerful server there is also a database called userdb, intended as a place where customers can write intermediate or temporary data while using the vodsdb.

 

Data Quality

The VISN20 Data Warehouse provides new ways to view VistA data. As such, many data quality problems that were difficult to detect in the past have surfaced.  Data quality is an ongoing effort, one in which all levels of the organization are engaged.  The Data Warehouse makes it easier for "stewards" of the data to see the problems.

Customers are expected to report problems that they find in the data. Problems might include erroneous values or missing values that should not be missing. These problems should be documented and emailed to the VA Outlook email group VISN20 CIO DM Data Quality. When appropriate, the problems will be passed on to those responsible for the source system.

  

Resource Citizenship

It is expected that customers will educate themselves and act responsibly regarding their use of shared resources. The Data Management Team monitors these resources and only if it becomes necessary will we restrict customer activity.

Network traffic: The huge tables of the Data Warehouse make it very easy for customers to retrieve huge amounts of data (millions of rows) with a simple query. This kind of careless use could immediately clog the network with traffic. Please follow these guidelines.

  • Select only the rows and columns that you need, not entire tables.
  • In rare cases where large queries (>500,000 rows) or downloads are required, please check with the Data Management Team or the local network manager.

CPU: This resource is used for all processes run on the server. The same guidelines apply as for the network bandwidth resource.  Before large queries are run test them on small subsets of data.

Database disk space: Customers can create their own tables in the userdb database. This database is shared among many customers and will not be managed by the Data Management Team. It is intended for working space and not for permanent storage. Customers should delete their tables when they are finished with them. Customers should consider using temporary tables (designated with a #) which do not remain on the server after the customer disconnects.

 

Software Tools

A number of different software tools can be chosen to access the Data Warehouse. Listed below are some broad categories of tools and some typical software in each category.  No tool is appropriate for all uses, and no tool can simplify a task.  Much of the effort of translating a question into the "language" of the tool often involves refinement of the question (i.e. eliminating ambiguity) and understanding the complex data.  This understanding can often only come from people who “own” the data (i.e. content experts).  Investigations often require a collaboration of technical and content experts.

Command line query tools. Typing SQL queries from a command prompt represents the fastest-executing and most flexible environment. While the syntax of the SQL language is simple, its use can be complex if the data is complex. The Microsoft command line tool is called Query Analyzer.  There is also an interface for the operating system command line called ISQL.  

PC database front ends. Programs such as Microsoft Access are useful as general-purpose tools to manipulate databases. Even though these programs can create their own file-based databases, they can also be used to query and display SQL data directly from the Data Warehouse. They often have a drag-and-drop type of interface in addition to the command line option.  The connections are typically through ODBC drivers.

Report generators. These tools would be appropriate for routine reports that display the same information each time.  Likely uses include paper reports for operations or management. Crystal Reports and ReportSmith are examples. Microsoft Access also has a report generator built in.  While a programmer is not necessary for the development of these kinds of reports, some technical expertise is required.  Usually only when a report is required routinely is it justified to spend the resources in generating a nice looking report.

Custom programs. Custom programs are appropriate when there are routine tasks or calculations that are clearly defined. Any of the mainstream programming platforms used for PCs and web development can be used to write programs that access SQL data. The open architecture means that customers have a large pool of programmers to draw from, including experts in Visual Basic, Delphi, C++, JAVA and others.

Statistics Packages. Analysts that are accustomed to downloading data sets and importing into their package should not do so on the large Data Warehouse tables.  Fortunately, all mainstream packages (e.g. SPSS, SAS) now have an ODBC driver that allows for the direct connection to the SQL server, eliminating the need to download and import.

DTS.  Microsoft's Data Transformation Services that come with Microsoft SQL Server offers a wide range of tools to manage data, databases, and metadata.  There is also facility to create, automate, and schedule jobs.

OLAP. This technology is different in that it requires both a distinct type of database (cube) and uses a specialized front end tool (OLAP browser) to display values in that cube.  The Data Warehouse uses Microsoft Analysis Services to generate the cubes and uses ProClarity as the browser.  This technology is used to quickly display predefined measures (e.g. patient bed days of care) along any of several dimensions (e.g. location or time). Each dimension can have a hierarchy (e.g. ward/facility/VISN or daily/monthly/yearly). The capabilities for graphing and display are excellent. The development of cubes typically requires both technical expertise/resources and subject matter expertise. The development overhead should be weighed against the economies of scale; if many people need to view the data or if a few people need to view the data repeatedly it may be worth the cost.

Web Browsers. The strength of web browsers is their ubiquity. Data that can be displayed simply and that is frequently updated is a good candidate to be published to a web site for browsing. Using browsers eliminates having to create a mechanism (application) for customers to access the information. The Data Warehouse web site publishes a wide range of information and metadata.  There is also an extensive web application called Advanced Health Management (AHM) that displays utilization/cost for many different patient cohorts and individuals.  Cohorts are defined by high utilizers, disease conditions, primary care and other criteria.

* Links will take you outside of the Department of Veterans Affairs Website. VA does not endorse and is not responsible for the content of the linked websites. The link will open in a new window.