DATA WAREHOUSING SUCCESS AND FAILURE

Senjie Bao                   Hanson Lee

Abstract

Data warehousing is widely used in organizations to support management decision making. Successful data warehousing projects can deliver valuable information to organizations; however, a large proportion of data warehousing projects fail. Previous research suggests that several factors are crucial to the success of a data warehousing project. This paper examines the most commonly discussed critical success factors affecting two key dimensions of data warehousing success: data quality and system quality. Building on previous discussion of data warehousing success factors, this paper proposes a model that maps the success factors to data warehousing success. Finally, this study finds that strong data governance, together with other technical factors, has a strong influence on data quality, while system quality is affected by several types of success factors.

Keywords

Data warehousing, success factors, data quality, system quality.

INTRODUCTION

A data warehouse is an integrated data repository specially designed and created to support the decision making process at all levels of management (March and Hevner 2005). Its data can be extracted from various existing internal or external systems in different formats, and is transformed, cleaned, and aggregated before being stored in the data warehouse (Wixom and Watson 2001). The data warehouse emerged as a platform to integrate data from different sources to support decision making (Shin 2003), and it helps to improve business performance, for example by discovering the most favoured products, identifying key customers for the organization’s business, and improving operational efficiency (Cooper et al. 2000). It has been used to deliver useful information to managers for decision making since the early 1990s (Shim et al. 2002) and has become more and more crucial to businesses as markets become more global and sophisticated and customers become more informed. Organizations in this increasingly competitive and volatile business context need to acquire more information to make the right decisions. Thus, they have used data warehouses to help conduct various tasks such as customer service and target marketing (Shin 2003). With the help of recent technologies such as cloud computing and the Hadoop platform, the analytical capabilities of data warehousing have improved significantly and can provide real-time analysis results for large volumes of data. The data warehouse is changing the way organizations conduct business by providing information drawn from data, especially in sales and marketing (Wixom and Watson 2001).

A data warehouse can bring large benefits to an organization through effective business intelligence; however, having a data warehouse in house does not guarantee success. The process of building a data warehouse is costly and risky (Jukic 2006). A survey reported that nearly two thirds of companies with a data warehouse rated it as successfully meeting business expectations (Stedman 1998). A more recent study puts the failure rate of data warehouse projects at around one half to three quarters (Hwang and Xu 2008). The reasons for this high failure rate vary from company to company, but previous research has found several common ones. Watson and Haley (1997) found that the data warehouses of that time were unable to provide users with easy access to timely, high quality data. Watson, Gerard, Gonzalez, Haywood, and Fenton (1999) also found that lack of top management support, sponsorship, and end user involvement are the most common reasons for data warehouse project failure.

As with other information system projects, a number of factors can affect the success of a data warehouse project. Wixom and Watson (2001) developed a model to evaluate and measure several success factors identified by empirical studies. They analysed three key dimensions of data warehouse success, namely data quality, system quality, and perceived net benefits. Their analysis shows that data quality and system quality are associated with perceived net benefits, and that several factors contribute to system quality (Wixom and Watson 2001). However, they found that the success factors in their survey have no significant association with data quality, which suggests that the factors affecting data quality are not among those they researched and that further research is needed to discover them.

Building on their research, this paper examines a wider range of factors affecting two key dimensions of data warehouse success (data quality and system quality) in order to identify critical success factors for both dimensions. As Wixom and Watson’s research has clearly established the association between data quality, system quality, and net benefits, this paper does not focus on the relationships among them. Instead, drawing on the literature, it broadens the range of success factors considered for data quality and system quality and identifies key factors for data quality, which Wixom and Watson’s paper did not identify.

Data Quality

Data Quality Definition

Data quality is also called information quality by some researchers (Nelson et al. 2005). It is frequently discussed in the academic data warehouse literature and is the foundation of a valuable data warehouse (Watson and Haley 1997). Mahanti (2014) defines data quality as the capability of the data available in the data warehouse to meet business requirements, which can be understood as the fit between the data and the given business context. Nelson, Todd, and Wixom (2005) summarised two views of data quality: an intrinsic view, which considers the quality of the data in its own right, and a context-based view, which considers the degree to which the data helps the end user complete a task. Data quality is therefore closely associated with end user net benefits and with the system’s ability to improve the user’s work performance (Shin 2003).

Data Quality Dimensions

Researchers have developed several metrics to measure data quality from both the intrinsic and the contextual view (Nelson et al. 2005). Mahanti (2014) proposed seven dimensions for measuring data quality: completeness, conformity, consistency, accuracy, duplication, integrity, and timeliness. However, based on Wang and Strong’s (1996) research, the dimensions can be reduced to a few key determinants: accuracy, completeness, timeliness, and format (Nelson et al. 2005). For each of them, the assessment should be tied to the specific context in which the end user will use the data to perform tasks.

  • Accuracy

Accuracy is the correctness with which real world information is mapped to the information stored in the data warehouse. The information provided by the data warehouse should be correct, meaningful, consistent, clear, objective, and believable.

  • Completeness

Completeness is the extent to which all possible states of real world objects are captured and stored in the data warehouse. This dimension should be assessed from the perspective of the information user: the data should be considered complete as long as it is sufficient to support the user’s decision making.

  • Timeliness

Timeliness means that the information stored in the data warehouse is up to date and accurately reflects the current state of the real world. Its assessment depends on the purpose for which the data is used, and therefore on the task and on user perceptions.

  • Format

Format means that the information is presented in a way that is understandable and interpretable. Research shows that the appropriate way of presenting information is highly contingent on the user’s mental model and the task (Vessey 1991). Its assessment should therefore be based on the user’s perception for each task.
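As a rough illustration of how two of these dimensions might be turned into measurable scores, the sketch below computes a completeness ratio and a timeliness ratio over a handful of records. The record layout, field names, and thresholds are hypothetical and are not taken from any of the cited studies; the required fields and the maximum acceptable age are passed in as parameters because, as noted above, the assessment depends on the user’s task. Accuracy and format are left out because they need reference data or user judgement that a simple script cannot supply.

```python
from datetime import date

# Hypothetical customer records as they might appear in a warehouse table.
RECORDS = [
    {"customer_id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"customer_id": 2, "email": None, "updated": date(2024, 1, 9)},
    {"customer_id": 3, "email": "c@example.com", "updated": date(2023, 6, 1)},
    {"customer_id": 4, "email": "d@example.com", "updated": date(2024, 1, 8)},
]


def completeness(records, required_fields):
    """Share of records in which every field the task requires is populated."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) is not None for f in required_fields) for r in records
    )
    return complete / len(records)


def timeliness(records, as_of, max_age_days):
    """Share of records updated no more than max_age_days before as_of."""
    if not records:
        return 0.0
    fresh = sum((as_of - r["updated"]).days <= max_age_days for r in records)
    return fresh / len(records)


if __name__ == "__main__":
    # A marketing task that needs an email address and tolerates month-old data.
    print(completeness(RECORDS, ["customer_id", "email"]))
    print(timeliness(RECORDS, as_of=date(2024, 1, 15), max_age_days=30))
```

The same records can score differently for a different task: tightening `max_age_days` or adding a required field lowers the ratios without any change to the stored data, which is the sense in which these dimensions are contextual.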

Success Factors for Data Quality

Wixom and Watson (2001) built a research model to examine the influence of seven critical success factors on data warehousing success, and how these factors affect it through organizational implementation success, project implementation success, and technical implementation success. The factors, identified by previous research, are management support, champion, resources, user participation, team skills, source systems, and development technology. However, in a regression analysis of data collected from 111 organizations, the R² value for the factors covered by their model was 0.016, which means that these factors explain very little of the variance in data quality and are therefore not crucial to it.
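For readers less familiar with the statistic, the coefficient of determination in its standard textbook form is given below. The notation is ours and is not taken from Wixom and Watson’s paper: $y_i$ is the observed data quality score for organization $i$, $\hat{y}_i$ is the value predicted by the model, and $\bar{y}$ is the mean observed score.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Read this way, a value of 0.016 indicates that the modelled factors account for roughly 1.6% of the variance in data quality.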

Chenoweth, Corral, and Demirkan (2006) provide a different insight into data warehouse success. In a case study of a large organization whose data warehouse implementation succeeded in some units and failed in others, they identified seven key interventions for data warehouse success based on adaptive structuration theory. Each of these interventions can be linked to a success factor in the wider success factor literature. The seven interventions are: top management support, user championship, data completeness, user accessibility, functional fit, training and education, and resource support.

Mahanti (2014) summarised the 10 most commonly discussed critical success factors for data quality, with 35 attributes of those factors, derived from both academic and practitioner sources. The 10 critical success factors are: leadership and top management commitment, organizational infrastructure, culture change, training and education, strong data governance, teamwork, business user involvement, use of data profiling tools, project prioritization and selection, and documentation. The survey results show that the top five factors are: strong data governance, teamwork, culture change, documentation, and organizational infrastructure (Mahanti 2014). This result is reasonable, as strong data governance can ensure that the data warehouse is aligned with the business context and business requirements.

Hwang and Xu (2008) group previously identified success factors into four categories: operational, technical, schedule, and economic. They found that higher quality data provides higher quality information and brings more benefits to users, and that users’ individual benefits in turn bring benefits to the organization. Their survey results show that technical factors have a significant influence on data quality but no association with system quality. This result is not consistent with Wixom and Watson’s (2001) research; Hwang and Xu (2008) attribute the difference to the actual factors included in this category.

System Quality

System Quality Definition

Besides data quality, system quality is another critical factor in achieving the net benefits of data warehousing (Wixom and Watson 2001). System quality is associated with the performance of the systems that produce information. How it should be defined and measured remains an open question. Rai et al. (2002) state that system quality can be defined and measured through operational measures of ease of use. Similarly, Davis (1989) argued that the factors of system quality are closely related to user perceptions of interaction with the system, and that high quality systems should therefore be perceived as easier to use.

System Quality Dimensions

Nelson, Todd, and Wixom (2005) studied critical success factors for data warehousing from a system perspective. They started from the assumption that there are distinct system dimensions that act as determinants of system quality. The study therefore focused on five key system dimensions that are assumed to indirectly influence user satisfaction with system quality across three key business intelligence tools (analysis, query, and predefined reports). The five key dimensions analysed in this study are:

  • Accessibility – The degree to which a system and the information can be accessed with relatively low effort.
  • Reliability – The dependability of a system over time.
  • Response time – The degree to which a system offers quick (or timely) responses to requests for information or action.
  • Flexibility – The degree to which a system can adapt to a variety of user needs and to changing conditions.
  • Integration – The degree to which a system facilitates the combination of information from various sources to support business decisions.

Their study found that reliability is the most influential determinant of system quality. Accessibility and flexibility are the next most influential determinants, with effects of similar magnitude. Integration also appears to have an influence across the three tools, as one of the main purposes of data warehousing is to integrate data from various source systems. Response time, however, was not significant for any of the three tools. One interpretation is that response time is not a critical determinant of user satisfaction with system quality because the data warehouses studied were not typically used for ongoing, real-time information processing.

Success Factors for System Quality

Wixom and Watson (2001) studied critical success factors of data warehousing by analysing several key factors known to be related to three aspects of implementation success that are assumed to influence the quality of a data warehouse. The three aspects of implementation success and the key factors related to each are:

  • Organisational implementation success – Management support, champion, resources, user participation
  • Project implementation success – Champion, resources, user participation, team skills
  • Technical implementation success – Team skills, source systems, development technology

Wixom and Watson (2001) found that not every aspect of implementation success has a significant effect on system quality. Their study suggested that only the organisational and project implementation aspects have significant effects on system quality, while the technical implementation aspect does not. One interpretation is that it is much easier for an organisation to build a flexible and integrated data warehouse when organisational issues are effectively removed and a well-managed team is responsible for the project.

Hwang and Xu’s (2008) study of critical success factors for data warehousing is similar to Wixom and Watson’s (2001) study discussed above in that it focused on similar factors, although the approach was slightly different: they analysed the factors by grouping them into four categories. The four categories and their factors are:

  • Operational factor – Clearly defined business needs, top management support, user participation
  • Technical factor – Source data quality, proper development technology, adequate IS staff and consultants, project management
  • Schedule factor – Practical implementation scheduling, proper planning and scoping of the project
  • Economic factor – Adequate funding, measurable business benefits

Although Hwang and Xu’s (2008) study focused on factors similar to those in Wixom and Watson’s (2001) study, the results were different. Hwang and Xu concluded that operational and economic factors have significant influences on system quality, as those factors are enablers for achieving quality data through the data warehouse. The influence of technical factors was not significant for system quality but, interestingly, was significant for data quality. Another key finding of Hwang and Xu’s (2008) study was that system quality has a positive influence on information quality, whereas in Wixom and Watson’s (2001) study such a cross effect between data quality and system quality was neither examined in depth nor significantly supported.

Key findings & Implications

Based on our investigation, we combined the results of the academic research papers we studied and created a critical success factor model, shown in Figure 1. We then identified four key findings about the key factors and their relationships with the quality of a data warehouse.

Figure 1: Data Warehousing Critical Success Factor Model

The first key finding is that, given that data quality and system quality are the primary dimensions of data warehousing success, the key factors that affect system quality and information quality are quite different. Wixom and Watson’s (2001) study suggested that the aspects of implementation success are significantly associated only with system quality, not with data quality. Other studies, such as Hwang and Xu (2008), also suggested that operational and economic factors are significantly associated only with system quality, while technical factors are associated only with information quality. Figure 1 maps those relationships to show how each of the key factors is related to the data quality and system quality of a data warehouse.

Secondly, we found that technical factors have a more significant influence on data quality than on system quality, while organisational and operational factors have a more significant influence on system quality. The papers we investigated hypothesised that technical factors are associated with system quality, since system quality should be determined by technical performance. However, their results suggested that technical factors do not have a strong relationship with system quality. Rather, organisational and operational factors such as management support and governance are significantly related to system quality. Technical factors are more associated with information quality than with system quality because the most significant measures of data quality, such as accuracy, completeness, timeliness, and format, can be significantly affected by technical issues, especially in the transformation of the source data.

The third finding is that the success of data warehousing is determined by users’ perception of, and satisfaction with, their interaction with the data warehouse, and therefore the critical success factors may vary with users’ circumstances and their purposes in using the data warehouse. The study of Nelson, Todd, and Wixom (2005) showed that the degree of association between critical success factors and the quality of a data warehouse varies with how the data warehouse is used, such as for analysis, query, or predefined reports.

Finally, we found that system quality has a strong influence on data quality, and that data quality is the most important factor in individual benefits, which lead to organisational benefits. The results of Hwang and Xu’s (2008) study suggested that system quality has a significant association with information quality. This makes sense because data is the main product of the data warehouse system. This is an important point, as it indicates that data warehousing can succeed only when its basic benefits are realised.
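The relationships described in these findings can also be written down as a small data structure. The sketch below is our own reading of the text, not a reproduction of Figure 1: the edge list, the group labels, and the helper name `factors_for` are illustrative assumptions, and the edges record only the direction of influence reported in the cited studies, not its strength.

```python
# Each edge is (factor group, quality dimension it influences, study cited
# in the text). The list is an illustrative reading of the findings above.
INFLUENCES = [
    ("organisational implementation", "system quality", "Wixom and Watson 2001"),
    ("project implementation", "system quality", "Wixom and Watson 2001"),
    ("operational factors", "system quality", "Hwang and Xu 2008"),
    ("economic factors", "system quality", "Hwang and Xu 2008"),
    ("technical factors", "data quality", "Hwang and Xu 2008"),
    ("strong data governance", "data quality", "Mahanti 2014"),
    ("system quality", "data quality", "Hwang and Xu 2008"),
]


def factors_for(dimension):
    """Return the factor groups that directly influence a quality dimension."""
    return sorted({factor for factor, target, _ in INFLUENCES if target == dimension})


if __name__ == "__main__":
    for dimension in ("data quality", "system quality"):
        print(dimension, "<-", ", ".join(factors_for(dimension)))
```

Keeping the source study on each edge makes the conflicts between studies visible, since an edge that one study supports and another rejects can be traced back to its origin.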

Limitations

The model we built in this study is based on the previous research literature. Some views in that literature are not consistent with each other. For example, Wixom and Watson (2001) found that technical factors do not affect data quality, while Hwang and Xu (2008) found that technical factors have a significant influence on data quality. These conflicting results may be due to the different research methodologies adopted; as Hwang and Xu (2008) note, the classification of technical factors differs between the studies. Another reason may be that the business context has changed over time. Our study reconciles those conflicting results in order to build the research model; however, due to limited resources and time, the model has not been verified by quantitative analysis. Further research should therefore be conducted to verify the success factors identified in our model.

Conclusion

This paper studied critical success factors of data warehousing by investigating factors whose association with data warehousing success has been identified and analysed in several academic research papers. The primary goal was to understand the relationships between the key factors and the success of data warehousing. As suggested in many research papers, data quality and system quality are the primary dimensions of data warehousing success. We therefore first focused on finding key factors that have strong influences on data quality and system quality, and then on the relationships between those factors and the quality of data and systems. Although the papers differ slightly in their discussions and conclusions, we analysed the relationships by combining their common aspects and arrived at the key findings discussed in the section above, which we believe will be useful for understanding how and why data warehousing can succeed or fail.

REFERENCES

Chenoweth, T., Corral, K., and Demirkan, H. 2006. “Seven Key Interventions for Data Warehouse Success,” Communications of the ACM (49:1), January, pp 114-119.

Cooper, B.L., Watson, H.J., Wixom, B.H., and Goodhue, D.L. 2000. “Data Warehousing Supports Corporate Strategy at First American Corporation,” MIS Quarterly (24:4), pp 547-567.

Davis, F.D. 1989. “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology,” MIS Quarterly (13:3), September, pp 319-340.

Hwang, M.I., and Xu, H. 2008. “A Structural Model of Data Warehousing Success,” Journal of Computer Information Systems, Fall, pp 48-56.

Jukic, N. 2006. “Modeling Strategies and Alternatives for Data Warehousing Projects,” Communications of the ACM (49:4), pp 83-99.

Mahanti, R. 2014. “Critical Success Factors for Implementing Data Profiling: The First Step Toward Data Quality,” Software Quality Professional (16:2), March, pp 13-26.

March, S.T., and Hevner, A.R. 2005. “Integrated Decision Support System: A Data Warehousing Perspective,” Decision Support Systems, vol. 43, pp 1031-1043.

Nelson, R.R., Todd, P.A., and Wixom, B.H. 2005. “Antecedents of Information and System Quality: An Empirical Examination Within the Context of Data Warehousing,” Journal of Management Information Systems (21:4), pp 199-235.

Rai, A., Lang, S.S., and Welker, R.B. 2002. “Assessing the Validity of IS Success Models: An Empirical Test and Theoretical Analysis,” Information Systems Research (13:1), pp 50-69.

Shim, J.P., Warkentin, M., Courtney, J.F., Power, D.J., Sharda, R., and Carlsson, C. 2002. “Past, Present, and Future of Decision Support Technology,” Decision Support Systems (33:2), pp 111-126.

Shin, B. 2003. “An Exploratory Investigation of System Success Factors in Data Warehousing,” Journal of the Association for Information Systems, Vol. 4, pp 141-170.

Stedman, C. 1998. “Warehousing Projects Hard to Finish,” Computerworld (32:12), p 29.

Vessey, I. 1991. “Cognitive Fit: A Theory-based Analysis of the Graphs Versus Tables Literature,” Decision Sciences (22:2), pp 219-240.

Wang, R.Y., and Strong, D.M. 1996. “Beyond Accuracy: What Data Quality Means to Data Consumers,” Journal of Management Information Systems (12:4), Spring, pp 5-34.

Watson, H.J., and Haley, B.J. 1997. “Data Warehousing: A Framework and Survey of Current Practices,” Journal of Data Warehousing (2:1), pp 10-17.

Watson, H.J., Gerard, J.G., Gonzalez, L.E., Haywood, M.E., and Fenton, D. 1999. “Data Warehousing Failures: Case Studies and Findings,” Journal of Data Warehousing (4:1), pp 44-55.

Wixom, B.H., and Watson, H.J. 2001. “An Empirical Investigation of the Factors Affecting Data Warehousing Success,” MIS Quarterly (25:1), March, pp 17-41.