time. "Enterprise architecture may seem to be a tool for IT," but "in reality, it is a tool to be used by IT executives to support the business owners." For an enterprise architecture exercise to succeed, business constituents must be involved from the beginning, because their priorities must drive the underlying initiative.

\r\n

 

', 'Enterprise Architecture', 'Ram S', '-', '', 'ERP', 'Enterprise Architecture', '2004-08-12 00:00:00'); INSERT INTO `articles` VALUES (179, '

Linux has proved its mettle in relatively commoditized but increasingly important uses, such as Web infrastructure and file/print/network services. Now it''s moving toward prime-time status for mission-critical applications. For more and more CIOs, the question of whether to deploy Linux has been resolved, thanks to widespread agreement on its reliability, cost efficiency and solid support from a growing list of top vendors. Instead, CIOs are now turning their attention to the nitty-gritty question of how and where the software fits into their IT programs. "Particularly for early adopters who are comfortable with new technologies in general, it has definitely become a strategic asset." Of course, few if any companies see technology products as strategies unto themselves, and Linux is no exception.

\r\n

Linux does have some clear benefits, such as helping us meet the business objective of reducing overall IT expense, but Linux is only one vehicle for doing that.

\r\n

Certainly, Linux has its limitations: Most agree that its performance still doesn''t scale particularly well, especially compared to Unix, its chief competitor for large-scale infrastructure needs. And, of course, there''s the reality of Linux''s relative dearth of enterprise business applications, compared with Windows.

\r\n

But those concerns are diminishing. For one adopter, the benefit is clear: significant cost savings over the Unix-based architecture previously used for data-tabulation applications.

\r\n

Open-source products, such as Linux, MySQL, the Apache Web server and the Perl programming language, have long spawned passionate debates about their various technological pros and cons compared with their counterparts in Windows, Unix and proprietary software environments. But for CIOs, the inevitable weighing of technology advantages today centers primarily on two issues: security and scalability.

\r\n

Unquestionably, commercial Linux software has yet to suffer the same crippling impact of worms, viruses and denial-of-service attacks endured by the Windows camp; there have yet to be high-profile equivalents of viruses like Blaster and SoBig in the Linux world. But is that because Linux is, by virtue of its code or the open-source development process, inherently more secure than other operating systems? Or is it simply because Linux''s smaller market share makes it a less inviting target for attacks?

\r\n

But while the source of Linux''s supposed security advantage over other operating systems continues to be a subject of debate, there''s been little doubt about the historic scalability limitations of Linux. Most agree that Linux''s strength has been in low-end Intel-based server environments, often topping out at 4- or 8-processor server configurations used for Web servers and simple file/print servers. That has limited the software''s appeal to tactical situations, rather than the mission-critical data-center requirements needed to support up to 64-processor servers used for number-crunching applications such as energy exploration or sophisticated financial modeling.

', 'Linux: Moving towards Open Source', 'Kate Peale', '-', '', 'Application Development', 'linux, open source', '2004-08-12 00:00:00'); INSERT INTO `articles` VALUES (180, '

The transaction-oriented system structures data in a way that optimizes the processing of transactions. These systems typically deal with many users accessing a few records at a time. For reasons too numerous to discuss here, minimizing record and table size improves overall system performance. System architects normalize transaction databases to structure the database in an optimal way. Although a complete discussion of normalization is outside the scope of this book, it is sufficient to note that data pertaining to a specific subject is distributed across multiple tables within the database. For example, an employee works in a department that is part of a division. The employee, department, and division information is all related to the subject employee. Yet, that data will be stored in separate tables.

\r\n

Operational data is distributed across multiple applications as well as tables within an application. A particular subject may be involved in different types of transactions. A customer appearing in the accounts receivable system may also be a supplier appearing in the accounts payable system. Each system has only part of the customer data. We are back to the blind men and the elephant. Nowhere is there a single consolidated view of the one organization.

\r\n

Considering the way in which the decision maker uses the data, this structure is very cumbersome. First, the decision maker is interested in the behavior of business subjects. To get a complete picture of any one subject, the strategist would have to access many tables within many applications. The problem is even more complex. The strategist is not interested in one occurrence of a subject or an individual customer, but in all occurrences of a subject and all customers. As one can easily see, retrieving this data in real time from many disparate systems would be impractical.

\r\n

The warehouse, therefore, gathers all of this data into one place. The structure of the data is such that all the data for a particular subject is contained within one table. In this way, the strategist can retrieve all the data pertaining to a particular subject from one location within the data warehouse. This greatly facilitates the analysis process, as we shall see later. The task of associating subjects with actions to determine behaviors is much simpler.
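
\r\n

The consolidation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layouts, field names and sample values are hypothetical and do not come from any particular system. It takes the employee, department and division data that a normalized transaction database would keep in separate tables and gathers it into one row per employee, as a subject-oriented warehouse table would.

```python
# Hypothetical normalized transaction tables, keyed by id.
employees = [
    {"emp_id": 1, "name": "Avery", "dept_id": 10},
    {"emp_id": 2, "name": "Blake", "dept_id": 20},
]
departments = {
    10: {"dept_name": "Billing", "division_id": 100},
    20: {"dept_name": "Research", "division_id": 200},
}
divisions = {
    100: {"division_name": "Finance"},
    200: {"division_name": "Engineering"},
}


def build_subject_rows(employees, departments, divisions):
    """Join the three tables into one flat row per employee."""
    rows = []
    for emp in employees:
        dept = departments[emp["dept_id"]]
        division = divisions[dept["division_id"]]
        rows.append({
            "emp_id": emp["emp_id"],
            "name": emp["name"],
            "dept_name": dept["dept_name"],
            "division_name": division["division_name"],
        })
    return rows


# One table now holds everything known about the subject "employee".
warehouse_employee = build_subject_rows(employees, departments, divisions)
```

The strategist can then scan a single list for all occurrences of the subject instead of following keys across several tables.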

', 'Data Warehousing and Data Orientation', 'Samantha C', '-', '', 'Data Processing', 'data warehousing, data orientation', '2004-08-12 00:00:00'); INSERT INTO `articles` VALUES (181, '

Data cleansing is the process of removing errors from the input stream and is part of the integration process. It is perhaps one of the most critical steps in the data warehouse. If the cleansing process is faulty, the best thing that could happen is that the decision maker will not trust the data and the warehouse will fail. If that''s the best thing, what could be worse? The worst thing is that the warehouse could provide bad information and the strategist could trust it. This could mean the development of a corporate strategy that fails. The stakes are indeed high.

\r\n

A good cleansing process, however, can improve the quality of not only the data within the warehouse, but the operational environment as well. The extraction log records errors detected in the data cleansing process. The data administrator in turn examines this log to determine the source of the errors. At times, the data administrator will detect errors that originated in the operational environment. Some of these errors could be due to a problem with the application or something as simple as incorrect data entry. In either case, the data administrator should report these errors to those responsible for operational data quality. Some errors will be due to problems with the metadata. Perhaps the cleansing process did not receive a change to the metadata. Perhaps the metadata for the cleansing process was incorrect or incomplete. The data administrator must determine the source of this error and take corrective action. In this way, the data warehouse can be seen as improving the quality of the data throughout the entire organization.
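
\r\n

As a rough sketch of how an extraction log might be produced, the Python below validates incoming records and writes every rejected record, with its source system, to a log the data administrator can review. The record fields, the two validation rules and the list of valid regions are assumptions made for illustration, and the sketch rejects bad records rather than correcting them, which is only one possible policy.

```python
def cleanse(records, valid_regions):
    """Split records into clean rows and an extraction log of errors."""
    clean, extraction_log = [], []
    for rec in records:
        errors = []
        # Rule 1 (assumed): an amount must be present and non-negative.
        if rec.get("amount") is None or rec["amount"] < 0:
            errors.append("amount missing or negative")
        # Rule 2 (assumed): the region must be known to the metadata.
        if rec.get("region") not in valid_regions:
            errors.append("region not in metadata")
        if errors:
            extraction_log.append(
                {"source": rec["source"], "id": rec["id"], "errors": errors}
            )
        else:
            clean.append(rec)
    return clean, extraction_log


records = [
    {"id": 1, "source": "receivables", "amount": 120.0, "region": "EAST"},
    {"id": 2, "source": "receivables", "amount": -5.0, "region": "EAST"},
    {"id": 3, "source": "payables", "amount": 40.0, "region": "NORTH"},
]
clean, extraction_log = cleanse(records, valid_regions={"EAST", "WEST"})
```

Grouping the log entries by source and by error message is one way for the data administrator to tell an operational data-entry problem from stale or incomplete metadata.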

\r\n

There is some debate as to the appropriate action for the cleansing process to take when errors are detected in the input data stream. Some purists feel the warehouse should not incorporate records with errors. The errors in this case should be reported to the operational environment, where they will be corrected and then resubmitted to the warehouse. Others feel that the records should be corrected whenever possible and incorporated into the warehouse. Errors are still reported to the operational environment, but it is the responsibility of those maintaining the operational systems to take corrective action. The concern is making sure that the data in the warehouse reflects what is seen in the operational environment. A disagreement between the two environments could lead to a lack of confidence in the warehouse.

\r\n

The cleansing process cannot detect all errors. Some errors are simple and honest typographical mistakes. There are errors in the data that are more nefarious and will challenge the data administrator. For example, one system required the entry of the client''s SIC code for every transaction. The sales representatives did not really care and found two or three codes that would be acceptable to the system. They entered these standby codes into the transaction system whenever the correct code was not readily available. These codes were then loaded into the data warehouse during the extraction. While there are many tools available on the market to assist in cleansing the data as it comes into the warehouse, errors such as these make it clear that no software product can get them all.
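
\r\n

Standby codes of this kind pass every format check, but they sometimes show up as a statistical oddity. The Python sketch below flags any code that accounts for an unusually large share of transactions so that a person can investigate it. The sample codes and the 25 percent threshold are placeholders chosen for illustration, and a frequently used code may of course be perfectly legitimate.

```python
from collections import Counter


def suspicious_codes(sic_codes, max_share=0.25):
    """Return codes whose share of all transactions exceeds max_share."""
    counts = Counter(sic_codes)
    total = len(sic_codes)
    return sorted(code for code, n in counts.items() if n / total > max_share)


# Hypothetical SIC codes entered on eight transactions.
codes = ["7372", "9999", "9999", "5411", "9999", "2834", "9999", "9999"]
flagged = suspicious_codes(codes)
```

A check like this only narrows the search. Deciding whether a flagged code is a standby value or a genuine concentration of clients remains a job for the data administrator.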

\r\n

Data cleansing is the child of the data administrator. This is an essential position on the data warehouse team. The data administrator must take a proactive role in rooting out errors in the data. While there is no one component that will guarantee the success of a data warehouse, there are some that will ensure its failure. A poor cleansing process or a torpid data administrator is definitely a key to failure.

', 'Data Cleansing', 'Lothar B', '-', '', 'Database Management', 'data cleansing', '2004-08-12 00:00:00'); INSERT INTO `articles` VALUES (182, '

There are two basic types of data mining: classification and estimation. With classification, objects are segmented into different classes. In a marketing data warehouse, for example, we could look at our customers and prospects and categorize them into desirable and undesirable customers based on certain demographic parameters. The second type of data mining, estimation, attempts to predict or estimate some numerical value based on a subject''s characteristics. Perhaps the decision maker is interested in more than just desirable and undesirable customers. The strategist may be interested in predicting the potential revenue stream from prospects based on the customer demographics. Such a prediction might be that certain types of prospects and customers can be expected to spend x percentage of their income on a particular product. It is common to use both classification and estimation in conjunction with one another. Perhaps the strategist would perform some classification of customers and then perform estimations for each of the different categories.
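
\r\n

To make the combination of the two techniques concrete, here is a minimal Python sketch. It first classifies customers with a simple income cutoff and then estimates, for each class, the average share of income spent on a product. The cutoff, the field names and the figures are all invented for illustration. A real mining tool would derive its classes from the data rather than from a fixed rule.

```python
from statistics import mean

# Hypothetical customers with known income and product spend.
customers = [
    {"income": 90000, "spend": 4500},
    {"income": 80000, "spend": 3200},
    {"income": 30000, "spend": 300},
    {"income": 20000, "spend": 400},
]


def classify(customer, income_cutoff=50000):
    """Classification: assign a customer to one of two classes."""
    return "desirable" if customer["income"] >= income_cutoff else "undesirable"


def spend_share_by_class(customers):
    """Estimation: average share of income spent, per class."""
    shares = {}
    for c in customers:
        shares.setdefault(classify(c), []).append(c["spend"] / c["income"])
    return {label: mean(values) for label, values in shares.items()}


shares = spend_share_by_class(customers)

# Estimate the revenue potential of a prospect from its class average.
prospect = {"income": 60000}
estimate = prospect["income"] * shares[classify(prospect)]
```

With these sample figures the desirable class spends about 4.5 percent of income, so the prospect is estimated at roughly 2,700 in potential revenue.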

\r\n

Whether performing a classification or an estimation, the process of data mining is basically the same. We begin with the data, or more appropriately, a subset of the data. This is our test data. The size of the data set is dependent on the deviation of characteristics of the data. In other words, if there are relatively few variables whose values do not greatly deviate from one another, then we can test on a small number of records. If the data has many variables with many possible values, then the test data is much larger. As with the data warehouse, the data is cleansed and merged into one database. If we are working directly from a data warehouse, we would expect this process to have already been carried out. This does not mean that we assume the data is cleansed and transformed. The data quality must still be verified to ensure accurate results.

\r\n

We then define the questions that are to be posed of the data. Despite the common misconception of data mining, the strategist must be able to define some goal for the mining process. Perhaps we would like to segment our market by customer demographics or we would like to know the market potential of certain economic groups. In either case, we need to specify what we want to discover.

\r\n

Using the test data, we construct a model that defines the associations in which we are interested. We have known results in the test data set. We know that certain records in the data set represent desirable or undesirable customers, or we know the market potential of a set of clients. The model will look for similarities in the data for those objects with similar results. Once we have built the model, we validate it against subsequent test data sets. When we are confident in the model, we run it against the actual data we wish to mine. At times the model will not include some records that should be included, or it will include records that it should not. In either case, there will be some level of inaccuracy in the model. No model can predict with perfect accuracy, so we should expect some margin of error. The models come in a variety of types.
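
\r\n

The build-and-check cycle can be illustrated with a deliberately tiny model. In the Python sketch below, the model is a single income cutoff chosen to misclassify as few records with known results as possible, and it is then checked against a second data set that also has known results. The cutoff model, the field names and the sample records are assumptions made for illustration and stand in for whatever model type a real mining tool would use.

```python
def fit_cutoff(known_rows):
    """Pick the income cutoff that misclassifies the fewest known rows."""
    candidates = sorted(row["income"] for row in known_rows)

    def errors(cutoff):
        return sum(
            (row["income"] >= cutoff) != row["desirable"] for row in known_rows
        )

    return min(candidates, key=errors)


def error_rate(cutoff, rows):
    """Fraction of rows whose predicted class differs from the known one."""
    wrong = sum((row["income"] >= cutoff) != row["desirable"] for row in rows)
    return wrong / len(rows)


# First data set with known results, used to build the model.
known = [
    {"income": 20000, "desirable": False},
    {"income": 35000, "desirable": False},
    {"income": 60000, "desirable": True},
    {"income": 85000, "desirable": True},
]
# Subsequent data set with known results, used to check the model.
holdout = [
    {"income": 30000, "desirable": False},
    {"income": 65000, "desirable": False},
    {"income": 70000, "desirable": True},
    {"income": 90000, "desirable": True},
]

cutoff = fit_cutoff(known)
holdout_error = error_rate(cutoff, holdout)
```

The model fits the first data set perfectly yet wrongly includes one of the four records in the second, which is the kind of margin of error that has to be measured before the model is run against the data to be mined.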

', 'Data Mining - Classification and Estimation', 'JM', '-', '', 'Business Intelligence', 'data mining', '2004-08-12 00:00:00'); INSERT INTO `articles` VALUES (183, '

Years ago, your ERP options were straightforward: Either you brought in a large-scale service provider to manage your entire shop, or you did it yourself. But today''s range of offerings can be as confusing as a long restaurant menu written in a language you don''t understand. You''ll find it easier to analyze an outsourcer''s offerings by matching them up to the characteristics of your ERP infrastructure, as measured along five axes:

\r\n

\r\n

Legacy or newer models? How deep are your ties to your existing ERP systems? Are you still running custom enterprise applications built years ago from the ground up, such as process control manufacturing software or drug research management? Have you switched to client-server? Or are you running so-called "Web-native" applications that are fully Web-based in the server and on the desktop? Complex legacy systems are more difficult to offload to a service provi