Data Warehouses
The basic reasons organizations implement data warehouses are: To
perform
server/disk bound tasks associated with querying and reporting on
servers/disks
not used by transaction processing systems most firms want to
set up transaction
processing systems so there is a high probability that
transactions will be
completed in what is judged to be an acceptable amount
of time. Reports and
queries, which can require a much greater range of
limited server/disk resources
than transaction processing, run on the
servers/disks used by transaction
processing systems can lower the
probability that transactions complete in an
acceptable amount of time. Or,
running queries and reports, with their variable
resource requirements, on
the servers/disks used by transaction processing
systems can make it quite
complex to manage servers/disks so there is a high
enough probability that
acceptable response time can be achieved. Firms
therefore may find that the
least expensive and/or most organizationally
expeditious way to obtain high
probability of acceptable transaction processing
response time is to
implement a data warehousing architecture that uses separate
servers/disks
for some querying and reporting. To use data models and/or
server
technologies that speed up querying and reporting and that are not
appropriate
for transaction processing There are ways of modeling data that
usually speed up
querying and reporting (e.g., a star schema) and may not be
appropriate for
transaction processing because the modeling technique will
slow down and
complicate transaction processing. Also, there are server
technologies that that
may speed up query and reporting processing but may
slow down transaction
processing (e.g., bit-mapped indexing) and server
technologies that may speed up
transaction processing but slow down query and
report processing (e.g.,
technology for transaction recovery.) - Do note that
whether and by how much a
modeling technique or server technology is a help
or hindrance to
querying/reporting and transaction processing varies across
vendors' products
and according to the situation in which the technique or
technology is used. To
provide an environment where a relatively small amount
of knowledge of the
technical aspects of database technology is required to
write and maintain
queries and reports and/or to provide a means to speed up
the writing and
maintaining of queries and reports by technical personnel
Often a data warehouse
can be set up so that simpler queries and reports can
be written by less
technically knowledgeable personnel. Nevertheless, less
technically
knowledgeable personnel often "hit a complexity wall" and need
IS
help. IS, however, may also be able to more quickly write and maintain
queries
and reports written against data warehouse data. It should be noted,
however,
that much of the improved IS productivity probably comes from the
lack of
bureaucracy usually associated with establishing reports and queries
in the data
warehouse. To provide a repository of "cleaned up"
transaction
processing systems data that can be reported against and that
does not
necessarily require fixing the transaction processing systems the
data warehouse
provides an opportunity to clean up the data without changing
the transaction
processing systems. Some data warehousing completions provide
a means to capture
corrections made to the data warehouse data and feed the
corrections back into
transaction processing systems. Sometimes it makes more
sense to handle
corrections this way than to apply changes directly to the
transaction
processing system. To make it easier, on a regular basis, to
query and report
data from multiple transaction processing systems and/or
from external data
sources an/or from data that must be stored for
query/report purposes only For a
long time firms that need reports with data
from multiple systems have been
writing data extracts and then running
sort/merge logic to combine the extracted
data and then running reports
against the sort/merged data. In many cases this
is a perfectly adequate
strategy. However, if a company has large amounts of
data that need to be
sort/merged frequently, if data purged from transaction
processing systems
needs to be reported upon, and most importantly, if the data
need to be
"cleaned", data warehousing may be appropriate. To provide
a repository of
transaction processing system data that contains data from a
longer span of
time than can efficiently be held in a transaction processing
system and/or
to be able to generate reports "as was" as of a previous
point in time Older
data are often purged from transaction processing systems so
the expected
response time can be better controlled. For querying and reporting,
this
purged data and the current data may be stored in the data warehouse
where
there presumably is less of a need to control expected response time or
the
expected response time is at a much higher level. - As for "as
was"
reporting, some times it is difficult, if not impossible, to generate a
report
based on some characteristic at a previous point in time. For example,
if you
want a report of the salaries of employees at grade Level 3 as of the
beginning
of each month in 1997, you may not be able to do this because you
only have a
record of current employee grade level. To be able to handle this
type of
reporting problem, firms may implement data warehouses that handle
what is
called the "slowly changing dimension" issue. To prevent persons
who
only need to query and report transaction processing system data from
having any
access whatsoever to transaction processing system databases and
logic used to
maintain those databases The concern here is security. For
example, data
warehousing may be interesting to firms that want to allow
report and querying
only over the Internet. Some firms implement data
warehousing for all the
reasons cited. Some firm implement data warehousing
for only one of the reasons
cited. I do believe that the achievement of a
"business" objective for
a data warehouse necessarily comes about because of
the achievement of one or
many of the above objectives. If you examine the
list you may be struck that
need for data warehousing is mainly caused by the
limitations of transaction
processing systems. These limitations of
transaction processing systems are not,
however, inherent. That is, the
limitations will not be in every implementation
of a transaction processing
system. Also, the limitations of transaction
processing systems will vary in
how crippling they are. Finally, to repeat the
point I made initially, a firm
that expects to get business intelligence, better
decision making, closeness
to its customers, and competitive advantage simply by
plopping down a data
warehouse is in for a surprise. Obtaining these next order
benefits requires
firms to figure out, usually by trial and error, how to change
business
practices to best use the data warehouse and then to change their
business
practices. And that can be harder than implementing a data warehouse.