'A vital book for all IS professionals (from business analysts to web developers) who need to understand the effective management of that critical resource, information.'
Tony Jenkins, BSc, MSc, PGCE, CITP, CEng, Past Chair, BCS Data Management Specialist Group
'I've used and recommended the first edition of Keith's comprehensive text for several years. I've found that both practitioners and students are able to easily make use of these important concepts. I'm very pleased with the expanded and updated treatments in the second edition.'
Peter Aiken PhD, Founding Director, Data Blueprint/VCU
'Keith Gordon has done an excellent job of laying out the full set of dimensions to be addressed for the effective management of an organization's information.'
David Hay, President, Essential Strategies, Inc.
Publisher: BCS Learning & Development Limited
Edition description: 2nd Revised ed.
Product dimensions: 6.75(w) x 9.62(h) x (d)
About the Author
Keith Gordon is an independent consultant and trainer specialising in data management issues. He has spent over 50 years in technical, education and training environments as an engineer, computer consultant, business analyst, education and training manager. He was also an associate lecturer with the Open University for 10 years.
Read an Excerpt
DATA AND THE ENTERPRISE
This chapter introduces the concepts of information and data and discusses why they are important business resources within the enterprise. We start to discuss some of the problems caused by data that is of poor quality, inconsistent, or both.
INFORMATION IS A KEY BUSINESS RESOURCE
When asked to identify the key resources in any business, most business people will readily name money, people, buildings and equipment. This is because these are the resources that senior business managers spend most time managing. This means that in most businesses there is a clear investment by the business in the management of these resources. The fact that these resources are easy to manage and that the management processes applied to these resources can be readily understood by the layman means that it is seen to be worthwhile investing in their management. It is usually easy to assess how much the business spends on managing these resources and the return that is expected from that investment.
But there is a key resource missing from that list. That missing resource is 'information'. Without information, the business cannot function. Indeed, it could be said that the only resource that is readily available to senior management is information. All important decisions made within an enterprise are based on the information that is available to the managers.
Despite its importance, most business people do not recognise information as a key business resource. Because of its association with technology (with 'information technology' having become in effect one word, generally with more emphasis on the 'technology' than on the 'information'), information is seen as something mystical that is managed on behalf of the business by the specialist information technology or information systems department. The management of information is seen, therefore, as something requiring special skills beyond the grasp of the layman. It is very difficult to determine how much the business spends on managing information or, indeed, the return it can expect from that expenditure.
Information is a business resource that is used in every aspect of a business: it supports the day-to-day operational tasks and activities; it enables the routine administration and management of the business; and it supports strategic decision making and future planning.
For a supermarket chain the operational tasks and activities include the processing of customers' purchases through the electronic point-of-sale system and the ordering of goods from suppliers; for a high street bank they include the handling of customers' cash and cheques by the cashiers, the processing of transactions through ATMs and the assessment of the credit status of a customer who is requesting a loan; for an online book 'store' they include the collection of customers' orders, the selection and dispatch of the books and the production of a customer profile enabling the 'store' to make recommendations to customers as they log on to the website.
For all types of business, information in various forms is routinely used by managers to monitor the efficiency and effectiveness of the business. Some of this information comes in the form of standard reports. Other information may come to the managers as a result of their ad hoc questions, perhaps directed to their subordinates but, increasingly, directed to the information systems that support the business.
All businesses need to plan for their future and take high-level strategic decisions. In some cases the consequence of making an incorrect strategic decision could be the ultimate collapse of the business. To carry out this future planning and strategic decision making, the senior management of the business relies on information about the historic performance of the business, the projected future performance of the business (and this, to a large extent, will be based on an extrapolation of the historic information into the future), its customers' present and future needs and the performance of its competitors. Information relating to the external environment, particularly the economy, is also important. For a supermarket chain these decisions may include whether to diversify into, say, clothing; for a high street bank they may include the closure of a large number of branches; and for an online book 'store' whether to open new operations overseas.
Information is important, therefore, at every level in the business. It is important that the information is managed and presented in a consistent, accurate, timely and easily understood way.
THE RELATIONSHIP BETWEEN INFORMATION AND DATA
Wisdom, knowledge, information and data are all closely related through being on the same continuum – from wisdom, to knowledge, then to information and, finally, to data. This book is about managing data to provide useful information so we will concentrate on the relationship between information and data.
An often-heard definition of information is that it is 'data placed in context'. This implies that some information is the result of the translation of some data using some processing activity, and some communication protocol, into an agreed format that is identifiable to the user. In other words, if data has some meaning attributed to it, it becomes information.
For example, what do the figures '190267' represent? Presented as '19/02/67' it would probably make sense to assume that they represent a date. Presented on a screen with other details of an employee of a company, such as name and address, in a field that is labelled 'Date of Birth' the meaning becomes obvious. Similarly, presented as '190267 metres', it immediately becomes obvious that this is a long distance between two places, but for this to really make sense the start point and the end point have to be specified as well as, perhaps, a number of intermediate points specifying the route.
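The way a label turns the same figures into different information can be sketched in a few lines of Python. This is only an illustration: the function names, the DDMMYY layout and the assumption that the two-digit year belongs to the 1900s are mine, not part of the example above.

```python
from datetime import date

RAW = "190267"  # the bare figures from the text; no meaning attached yet


def as_date_of_birth(raw: str, century: int = 1900) -> date:
    """Read the digits as DDMMYY. The layout and the century are assumptions."""
    day, month, year = int(raw[0:2]), int(raw[2:4]), int(raw[4:6])
    return date(century + year, month, day)


def as_distance_in_metres(raw: str) -> int:
    """Read the same digits as a whole number of metres."""
    return int(raw)


# Same data, two contexts, two different pieces of information.
print("Date of Birth:", as_date_of_birth(RAW).isoformat())
print("Distance:", as_distance_in_metres(RAW), "metres")
```

Note that even the date reading needs a further piece of context, the century, before it is unambiguous.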
While these examples demonstrate the relationship between data and information, they do not provide a clear definition of either data or information.
There are many definitions of data available in dictionaries and textbooks, but in essence most of them say that data is 'facts, events, transactions and similar that have been recorded'. Furthermore, as I pointed out earlier, the definition of information is usually based on this definition of data. Information is seen as data in context or data that has been processed and communicated so that it can be used by its recipient.
The idea that data is a set of recorded facts is found in many books on computing. However, this concept of data as recorded facts is used beyond the computing and information systems communities. It is, for example, also the concept used by statisticians. Indeed, the definition of data given in Webster's 1828 Dictionary – published well before the introduction of computers – is 'things given, or admitted; quantities, principles or facts given, known, or admitted, by which to find things or results unknown'.
However, developing our definitions by looking at data first appears to be starting at the wrong point. It is information that is important to the business and it is there that our definitions, and our discussion about the relationship between information and data, should really begin.
We start by considering the everyday usage of information – something communicated to a person – and with that we can find a definition of data that is relevant to the theme of this book. That definition is found in ISO/IEC 2382-1, 1993 (Information Technology – Vocabulary – Part 1: Fundamental terms) and it states that data is 'a re-interpretable representation of information in a formalised manner suitable for communication, interpretation or processing'. There is a note attached to this definition in the ISO/IEC standard which states that data can be processed by human or automatic means; so this definition covers all forms of data but, importantly, includes data held in information systems used to support the activities of an organisation at all levels: operational, managerial and strategic.
Figure 1.1 provides an overview of the relationship between data and information in the context of a computerised information system. The user of the system extracts the required information from their overall knowledge and inputs the information into the system. As it enters the system it is converted into data so that it can be stored and processed. When another system user requires that information, the data is interpreted – that is, it has meaning applied to it – so that it can be of use to the user.
For most of this book we consider data stored in a database. This is often called 'structured data'. However, it must be understood that a considerable proportion of an organisation's information may be held in information systems as 'unstructured data' – in word-processed documents, drawings and so on.
THE IMPORTANCE OF THE QUALITY OF DATA
Since information is an important resource for any organisation, information presented to users must be of high quality. The information must be up to date, complete, sufficiently accurate for the purpose for which it is required, unambiguously understood, consistent and available when it is required.
It is essential that information is up to date. When customers buy their shopping at the supermarket they need to be charged the current price for the items they have bought, not the price that was current yesterday before the start of today's cut-price promotion. Similarly, managers reordering stock need to be aware of the current, not last week's, stock levels in order to ensure that they are not over- or under-stocked.
Only when the information available is complete can appropriate decisions be made. When a bank is considering a request for a loan from a customer, it is important that full details of the customer's financial position are known to safeguard both the bank's and the customer's interests.
Information on which important decisions are made must be accurate; any errors in the potential loan customer's financial information could lead to losses for the bank, for example. While it is important that information is accurate, it is possible for the information to be 'too accurate' or 'too precise', leading to the information being misinterpreted. Earlier I quoted '190267 metres' as the distance between two points, say London and Birmingham. But the figure '190267' implies that this distance has been measured to the nearest metre. Is this realistic? Would it be more appropriate to quote this figure as '190 kilometres (to the nearest 10 kilometres)'? I cannot answer that question without knowing why I need to know the distance between London and Birmingham. Information should be accurate, but only sufficiently precise for the purpose for which it is required.
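The arithmetic behind quoting the distance to a coarser precision can be sketched as follows. The function name and the choice of integer rounding are assumptions made for illustration; which step is appropriate still depends on the purpose for which the figure is needed.

```python
def round_to_step(metres: int, step: int) -> int:
    """Round a non-negative distance to the nearest multiple of step (both in metres)."""
    return (metres + step // 2) // step * step


measured = 190267  # metres, the figure quoted in the text

nearest_10_km = round_to_step(measured, 10_000)
nearest_km = round_to_step(measured, 1_000)

print(nearest_10_km // 1000, "kilometres (to the nearest 10 kilometres)")
print(nearest_km // 1000, "kilometres (to the nearest kilometre)")
```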
To be accurate from a user perspective, information must also be unambiguously understood. There should be no doubt as to whether the distance the user is being given is the straight-line distance or the distance by road. The data should also be consistent. A query asking for the distance between London and Birmingham via a specified route should always come up with the same answer.
Information has to be readily available when and where it is required to be used. When it is time to reorder stock for the supermarket, the information required to decide the amount of replacement stock to be ordered has to be available on the desk of the manager making those decisions.
Information is derived from the processing of data. It is vital, therefore, that the data we process to provide the information is of good quality. Only with good-quality data can we guarantee the quality of the information. Good-quality data is data that is accurate, correct, consistent, complete and up to date. The meaning of the data must also be unambiguous.
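Two of these properties, completeness and being up to date, lend themselves to simple automated checks. The sketch below assumes a hypothetical loan-applicant record; the field names, the one-year threshold and the function itself are illustrative only. Accuracy and correctness cannot be established this way, because they depend on comparing the data with the real world.

```python
from datetime import date

REQUIRED_FIELDS = ("name", "address", "annual_income")  # hypothetical fields


def quality_problems(record: dict, today: date, max_age_days: int = 365) -> list[str]:
    """List completeness and currency problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat a missing key, None or an empty string as incomplete.
        if record.get(field) in (None, ""):
            problems.append(f"incomplete: {field} is missing")
    last_updated = record.get("last_updated")
    if last_updated is None or (today - last_updated).days > max_age_days:
        problems.append("out of date: record not confirmed recently")
    return problems


applicant = {
    "name": "A. Customer",
    "address": "",
    "annual_income": 32000,
    "last_updated": date(2010, 1, 1),
}
print(quality_problems(applicant, today=date(2013, 1, 1)))
```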
THE COMMON PROBLEMS WITH DATA
Unfortunately, in many organisations there are some major, yet unrecognised or misunderstood, data problems. These problems are generally caused by a combination of the proliferation of duplicate, and often inconsistent, occurrences of data and the misinterpretation and misunderstanding of the data caused by the lack of a cohesive, enterprise-wide regime of data definition.
Whenever it is possible for any item of information to be held as data more than once, there is a possibility of inconsistency. For example, if the addresses of customers are held in more than one place – or, more specifically, in more than one information system – and a customer informs the company that they have changed their address, there is always the danger that only one instance of the address is amended, leaving the other instances showing the old incorrect address for that customer. This is quite a common scenario. Another scenario is where the marketing department and the finance department may have separate information systems: the marketing department has a system to help it track customers and potential customers while the finance department has a completely separate system to support its invoicing and payments received accounting functions. With information systems independently designed and developed to support individual business areas or specific business processes, the duplication of data, and the consequent likelihood of inconsistency, is commonplace. Unfortunately, in most organisations, the potential for inconsistency through the duplication of data is getting worse because of the move away from centralised mainframe systems, the proliferation of separate departmental information systems and the availability of personal desktop computing power, including the provision of spreadsheet and database software.
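The change-of-address scenario can be made concrete with a small sketch. The two dictionaries stand in for a marketing system and a finance system; the customer identifiers, the addresses and the function name are all invented for illustration.

```python
def inconsistent_addresses(system_a: dict[str, str], system_b: dict[str, str]) -> list[str]:
    """Return the IDs of customers held in both systems with differing addresses."""
    shared = system_a.keys() & system_b.keys()
    return sorted(cid for cid in shared if system_a[cid] != system_b[cid])


# Customer C002 has moved, but only the finance system was told.
marketing = {"C001": "1 High Street", "C002": "7 Mill Lane"}
finance = {"C001": "1 High Street", "C002": "22 Station Road", "C003": "5 Park Road"}

print(inconsistent_addresses(marketing, finance))
```

A check like this can reveal that two copies disagree, but it cannot say which copy is correct. That can only be avoided by holding the address once or by defining which system is authoritative.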
Even where it is understood that it would be to the advantage of the organisation for information to be shared between these separate systems, this is often impossible without there being the possibility of misinterpretation or misunderstanding of the information that is shared.
In its 1994 publication Corporate Data Modelling, the Central Computer and Telecommunications Agency – now part of the Office of Government Commerce – recognised that there are a number of possible reasons for sharing information. These are:
when central reference data is used by independent operational units, such as product codes and product prices;
when public domain datatypes are used and exchanged, for example when publicly available statistical data sets are to be used;
when operational results need to be collated across several profit centres, for example to collate or compare the sales figures from stores within a supermarket chain;
when the output from one system forms the input to another, for example the output of a forecasting system is used by another system to determine resource and budget implications;
when application systems performing a similar function for distinct autonomous units are required to harmonise their data to permit close collaboration, for example the command and control systems for the police, fire and ambulance services need to 'work together' in the event of an emergency.
The sharing of information between independently designed and developed information systems is technically straightforward. It is a relatively simple matter to electronically connect two or more information systems together using a network and then to transfer data between them. The difficulties come after the data has been transferred and the receiving information system cannot interpret the data or, worse still, interprets the received data according to its understanding of the meaning of the data, but this interpretation differs from that used in the originating system. This possibility of the misinterpretation of transferred data is very common in organisations and the situation is getting worse.
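A minimal sketch of such a misinterpretation, assuming two hypothetical systems that exchange a date as text without agreeing on its format: the transfer succeeds and both systems parse the value without error, yet they disagree about what it means.

```python
from datetime import datetime

transferred = "03/04/1967"  # a date sent with no statement of its format

# The originating system (assumed here) writes day/month/year ...
meant = datetime.strptime(transferred, "%d/%m/%Y").date()
# ... while the receiving system (also assumed) reads month/day/year.
understood = datetime.strptime(transferred, "%m/%d/%Y").date()

print("Sent as:    ", meant.isoformat())
print("Received as:", understood.isoformat())
print("Same meaning?", meant == understood)
```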
This misinterpretation of transferred data is also a consequence of the proliferation of independently designed and developed departmental or single-function information systems. At the heart of an information system is a database whose purpose is to provide persistent storage of the data. Each of these databases is designed to ensure that the data is available when required by the applications supported by that information system and, possibly, to maintain the integrity of the data within that particular database. A database is designed to provide effective and efficient support to the business area or function that the information system is being designed to support by meeting the immediate data requirements for that business area or function as they are understood by the database designer. It is very rare for a wider view of current or future data requirements to be taken.
(Continues…)
Excerpted from "Principles of Data Management"
Copyright © 2013 Keith Gordon.
Excerpted by permission of BCS The Chartered Institute for IT.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of Contents
List of figures and tables
Foreword to the first edition
PART 1: PRELIMINARIES
1. DATA AND THE ENTERPRISE
2. DATABASE DEVELOPMENT
3. WHAT IS DATA MANAGEMENT?
PART 2: DATA ADMINISTRATION
4. CORPORATE DATA MODELLING
5. DATA DEFINITION AND NAMING
7. DATA QUALITY
8. DATA ACCESSIBILITY
9. MASTER DATA MANAGEMENT
PART 3: DATABASE AND REPOSITORY ADMINISTRATION
10. DATABASE ADMINISTRATION
11. REPOSITORY ADMINISTRATION
PART 4: THE DATA MANAGEMENT ENVIRONMENT
12. THE USE OF PACKAGED APPLICATION SOFTWARE
13. DISTRIBUTED DATA AND DATABASES
14. BUSINESS INTELLIGENCE
15. OBJECT ORIENTATION
17. WEB TECHNOLOGY