What is the TPC Good For?
or, the Top Ten Reasons in Favor of TPC Benchmarks

by Gary Burgess, Ideas International

The TPC is presently in a transition phase. The General Implementation Guidelines have been implemented in all three current benchmarks in order to address concerns raised about benchmark specials. The TPC-A and TPC-B benchmarks are in the first phase of their ultimate obsolescence. It is hoped that the TPC-C benchmark will fill the breach left by TPC-A and TPC-B, but the initial take-up of results has been slower than expected. The TPC's next benchmark, TPC-D, is in its final review stages before being officially sanctioned, and is eagerly awaited by many. All of this change brings uncertainty, and in some cases people are questioning the worth of TPC testing in general.

Inevitably, one hears statements to the effect that TPC benchmarks are a waste of time and of no value. Such opinions often receive high-profile press coverage, promoting this perception throughout the general computing community.

It is easy to get carried away with the perceived negatives and, as it were, not see the forest for the trees. We at IDEAS International believe that while the TPC faces its fair share of challenges, the current TPC benchmarks do have value. We have compiled a list of ten good reasons why.

  1. Provides Cross Platform Performance Comparisons
    This point is where most of the criticism is aimed at TPC benchmarks. Claims of unrealistic workloads and benchmark specials have been hotly debated. So how could it be that we have listed this point at all, let alone at the number one position?

    It is all a matter of degree. If one wishes to use TPC results as a basis for predicting performance in a user's own environment, then we agree there are some issues. The configuration complexity of the systems being tested in TPC benchmarks makes such specific analysis virtually impossible.

    Performance can be influenced by so many factors: the basic configuration model, database, disk subsystem, memory subsystem, even the way the application is utilized. Thus the results of any benchmark, not just TPC, no matter how real world, could not be confidently translated to a user's environment.

    If one recognizes that TPC benchmarks, and other industry standard benchmarks for that matter, are unlikely to provide a view of relative performance within a very fine tolerance, and instead uses the data from a more global perspective, the influence of optimization becomes less significant. We therefore believe that, used in this way, TPC results are a useful approximate, rather than definitive, guide to relative cross-platform performance.

    Configuring systems in response to a tender from a potential customer is an inexact science. One only needs to look at the different responses from computer systems vendors to invitations, or requests, to tender. One vendor may propose a system of very different performance potential to that proposed by another. There may be very good reasons for this, but they may not always be apparent. TPC results can offer a relatively inexpensive and independent positioning fix. For example, if one of two systems proposed for a tender recorded double the TPC performance of the other, this would act as a catalyst to find out why two systems of possibly disparate performance have been proposed for the same workload. The TPC results would not provide the answer, but would act as a trigger to resolve the issue.

    Without TPC results it would not be so easy to gain even a rough feel for how systems of disparate architectures may compare under any workload.

  2. The Cost of Performance
    Performance, be it in a computer or a car, comes at a price. Across any given system's performance spectrum, at the lower end of the scale a low-cost change in configuration, like adding cache, may have a significant positive effect on performance. At the higher end of the scale, however, where a system is already highly tuned, achieving the same performance improvement is likely to cost a great deal more than adding the cache did earlier on.

    Thus two systems may offer similar levels of performance, but one may cost significantly more than the other, an important factor in any buying decision. The same warnings apply to analyzing pricing as to performance. Vendors can, and do, optimize configurations to minimize cost. So price/performance, like performance, should be treated as an approximate guide rather than a definitive figure for detailed comparisons.

    A good example of the importance of price is the concept of increasing performance by linking two, or more, systems together in a loosely coupled configuration. Several companies have used this technology in TPC testing to report performance above the level of their highest performing single system. In some cases, the price/performance of the loosely coupled configurations increases along with the performance when compared with the price/performance of their highest performing single system result. The TPC price/performance does not allow you to accurately predict the costs that would apply to your individual computing environment, but this information gives some trend information that, in the absence of TPC tests, would not be very easy to determine.
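    The arithmetic behind the price/performance metric is simple, and a short sketch (using hypothetical prices and throughputs, not published results) shows how $/tpmC can rise even as a clustered configuration's raw throughput goes up:

```python
# Hypothetical figures for illustration only; real numbers come from the
# Full Disclosure Reports of published TPC results.
def price_performance(total_price, tpmc):
    """TPC price/performance metric: total system price divided by throughput."""
    return total_price / tpmc

single = price_performance(total_price=1_100_000, tpmc=1_000)   # one system
cluster = price_performance(total_price=2_600_000, tpmc=1_800)  # two coupled systems

print(f"single:  ${single:,.0f}/tpmC")   # $1,100/tpmC
print(f"cluster: ${cluster:,.0f}/tpmC")  # $1,444/tpmC, higher than the single system
```

    Here the cluster delivers 80% more throughput but costs each transaction-per-minute more, which is exactly the trend described above.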

  3. Processor versus Real Performance
    The advent of TPC testing allowed vendors and users alike to abandon MIPS as a measure of relative system performance. MIPS has now been effectively replaced by the SPEC suite of tests, but like MIPS, SPEC tests are primarily a measure of processor complex performance. There is more to transaction processing than just the processor, because the I/O and disk subsystems can substantially influence performance.

    To illustrate this, let's compare an 8-way SPARCserver 1000 and HP 9000 Series 800 H70. Table 1 compares the tpmC result with that of SPEC's SPECrate_int92 metric.

    Table 1

    System                    SPECrate_int92    tpmC      $/tpmC
    SPARCserver 1000 8-way    10,113            1,079.4   $1,032
    HP 9000 H70               3,757             1,290.9   $961

    The processor-based SPEC test implies the Sun system has a significant performance superiority, nearly 2.7 times better. However, the TPC-C test shows that in multi-user transaction testing the systems are much more evenly matched.
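    The gap between the two metrics can be checked with a few lines of arithmetic, using the figures from Table 1:

```python
# Figures taken directly from Table 1 above.
specrate = {"SPARCserver 1000 8-way": 10_113, "HP 9000 H70": 3_757}
tpmc = {"SPARCserver 1000 8-way": 1_079.4, "HP 9000 H70": 1_290.9}

spec_ratio = specrate["SPARCserver 1000 8-way"] / specrate["HP 9000 H70"]
tpmc_ratio = tpmc["SPARCserver 1000 8-way"] / tpmc["HP 9000 H70"]

print(f"SPECrate_int92 ratio (Sun/HP): {spec_ratio:.2f}")  # ~2.69, Sun far ahead
print(f"tpmC ratio (Sun/HP):           {tpmc_ratio:.2f}")  # ~0.84, HP slightly ahead
```

    The processor metric flatters the Sun system by a factor of nearly 2.7, while under a full transaction workload the HP system actually comes out marginally ahead.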

    TPC testing provides a platform to look at the entire computing workload. Just using a processor benchmark can be very misleading.

  4. Balances Vendors' Marketing Claims
    When was the last time you read in a vendor's press release, or heard a vendor's sales or marketing representative say, that they have the second best performing system on the market? Instead everyone is a winner. TPC results help to act as a sanity check on such claims of performance leadership.

    We believe vendor technology fits into one of three categories of availability: intended, announced or currently supported. A company may have an intention to support a particular product, but at present no such product exists, except perhaps in the lab. A product may be announced, but not for general release for some time (perhaps because bugs have to be ironed out etc.). Or a product is announced and available today and will perform satisfactorily. Sometimes it is not clear into which category a particular product or technology fits.

    Symmetric Multi-Processing (SMP) is a very good example of how the lines can be blurred. Many vendors have announced, and in many cases released, hardware platforms which support a certain number of processors. For example, Hewlett-Packard's T500 supports a maximum of 12 CPUs, the Sun SPARCcenter supports 20, and the Data General AViiON supports 16. What they all have in common is that, at the time of writing, their best non-clustered TPC result consisted of a configuration with exactly half the stated maximum number of CPUs.

    One interpretation is that these systems may at present not scale well above the tested number of CPUs. It is possible that, at any given time, the stated maximum processor count and the practical processor limit differ. We are not saying this is the case with the above vendors; they may have very good reasons for the configurations they have used.

    However, this is a good example of how TPC results can highlight an issue that may be important to a buyer but otherwise may not be readily apparent without TPC testing. It is not just SMP implementations either. If a vendor is not using its latest and greatest products and technologies in TPC testing, there is a reason for it. TPC results once again provide a trigger for further inquiries if the new products not featured in testing are being offered for sale.

  5. The Myth of SMP Linear Scalability
    This next issue could arguably be linked to the previous point. Linear SMP scalability is the Holy Grail to which all aspire but which none achieve, and TPC results show this. Some vendors come closer than others. TPC results provide some real data from which one can draw one's own conclusions about the scalability of one vendor's SMP implementation over another's, something that would be impossible using vendors' own claims alone, and very costly to establish by conducting one's own testing.
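    One simple way to quantify this from a pair of results is to compare the measured speed-up against the ideal linear speed-up. A minimal sketch, using hypothetical throughput figures rather than any vendor's published numbers:

```python
# Hypothetical throughput figures; real ones come from published TPC results.
def scaling_efficiency(base_cpus, base_tpmc, big_cpus, big_tpmc):
    """Fraction of the ideal (linear) speed-up actually delivered."""
    ideal = base_tpmc * (big_cpus / base_cpus)  # what linear scaling would give
    return big_tpmc / ideal

# e.g. 4 CPUs -> 1,000 tpmC; doubling to 8 CPUs -> 1,600 tpmC, not 2,000
eff = scaling_efficiency(base_cpus=4, base_tpmc=1_000, big_cpus=8, big_tpmc=1_600)
print(f"SMP scaling efficiency: {eff:.0%}")  # 80%
```

    An efficiency of 100% would be the linear ideal; comparing this figure across vendors' result pairs gives a rough, independent view of how well each SMP implementation scales.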

  6. Compare Technologies
    With a sufficient pool of results one can step back from system-to-system comparisons and look at technology comparisons and trends. Parallel processing is one example. Interest in commercial applications for parallel processing has grown during 1994. We may be familiar with the theory of how it differs from SMP, but how does it actually compete? Tandem has released TPC-C results which demonstrate quite clearly the potential of parallel computing. Hopefully other companies will release results for parallel architecture machines. This will provide an insight not only into how various parallel processing platforms compare, but into how parallel computing in general compares, in both performance and price terms, with SMP implementations.

    Companies have also adapted their fault resilient products for performance enhancement, yet another alternative to SMP. AT&T has released a result using LifeKeeper/DLM, IBM with HACMP/6000, Sequent with ptx/CLUSTERS and Digital with VMSclusters. Generally these products are marketed as fault resilient offerings: two or more systems are linked in a loosely coupled configuration sharing a common disk subsystem. The systems monitor each other's health and, in the event of one system failing, depending on the configuration, a hot standby system may take over operation of the failed system. Alternatively, an existing working system may take on the extra workload of the failed system. Performance may be degraded in the second instance, but at least processing can still continue.

    This technology also lends itself to two or more systems sharing the same database in order to increase transaction throughput. With TPC results available for such implementations one can examine the performance and price of these and compare them with non-clustered offerings. As highlighted previously in this article, one obvious finding from these clustered TPC configurations is that in some cases the price/performance rises with a clustered configuration.

  7. Software and Performance
    It is very easy to get carried away with the performance of the hardware. However, often we forget the influence the software can have on OLTP performance, be it the operating system or the database. For example, Digital has tested a system with the same database, but with different operating systems, DEC OSF/1 in one case and OpenVMS in the other. The OpenVMS system offered 16.5% more performance than the DEC OSF/1 system. Apart from these two results we have never seen any publicly available information that compares the performance of the two operating systems for a given hardware platform. This information may be largely irrelevant to many people, but would be very enlightening for someone who is weighing up the pros and cons of DEC OSF/1 versus OpenVMS. Once again it does not provide a definitive answer, but provides a trigger for further inquiries.

    The database too can have a profound effect on performance. There are many examples of this. At the high end of the performance spectrum, of the two TPC-A tested IBM mainframes, the smaller of the two systems offers more than twice the performance of the larger. The reason: the higher performing system uses the specialized transaction processing TPF database, whilst the other uses IMS/DB.

    Hewlett-Packard recorded a now withdrawn result for their HP 9000 I50 c/s of 184.55 tpsA @ $9,137/tpsA. Also, HP has recorded a result of 303.10 tpsA @ $5,612/tpsA for a standalone I50 configuration. The major difference between the two was the use of ORACLE7 in the former result and ADABAS in the latter.

    This sort of insight into database performance is not readily available by other means. We are not saying that TPC results will provide all-encompassing comparisons of the various databases. Instead TPC results provide an inexpensive forum for users to examine the effect databases can have on performance, and thus provide some stimulus to investigate this issue further if deemed important.

  8. Cost of Software
    The performance of databases is one issue, pricing is another. Many different databases are featured in TPC testing. The Full Disclosure Reports or Executive Summaries can give some insight into the relative prices of the various databases. There can be large variations.

    Not only can comparisons be made about the pricing structure of competing database products, the pricing practices of the same database on different platforms may provide interesting reading. Despite the desire to move to user-based pricing, many databases still have as part of their pricing structure an element of processor, or platform tiering. That is, the same database may cost more on one system than another.

    Oracle's pricing of its ORACLE7 database is a case in point. The TPC-A results for the HP H70 c/s and Sun SPARCserver 1000 c/s show that the ORACLE7 database costs 50% more on the H70, despite both systems emulating roughly the same number of users. We understand Oracle charges per user up to an unlimited-user ceiling, and this ceiling can vary by system. In the case of the H70 there is a 192 user limit, whilst the Sun system enjoys a 128 user ceiling, before the licence becomes unlimited.
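    The effect of the different ceilings can be sketched with a hypothetical per-user price. The $300 figure and the 250-user count below are illustrative assumptions, not Oracle's actual prices; only the 192 and 128 user ceilings come from the results discussed above:

```python
# Hypothetical per-user price; the ceilings are those cited in the text.
def licence_cost(users, per_user_price, ceiling):
    """Per-user pricing up to a ceiling, beyond which the licence is unlimited."""
    return min(users, ceiling) * per_user_price

PER_USER = 300  # hypothetical per-user price, assumed identical on both platforms

hp_cost = licence_cost(users=250, per_user_price=PER_USER, ceiling=192)
sun_cost = licence_cost(users=250, per_user_price=PER_USER, ceiling=128)

print(f"H70 licence: ${hp_cost:,}")   # $57,600
print(f"Sun licence: ${sun_cost:,}")  # $38,400
print(f"H70/Sun ratio: {hp_cost / sun_cost:.2f}")  # 1.50
```

    With the same per-user price on both platforms, the different ceilings alone reproduce the 50% difference in database cost.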

    Not profound knowledge, we admit. But it certainly highlights an issue that should be canvassed, especially if the purchasing decision has come down to two hardware platforms. The cost of the software may not necessarily be the same on both systems, despite the workload being the same on both.

  9. Trademarks Provide Consistency
    The TPC's trademarking of TPC Benchmarks is very important. There are very strict rules governing the usage of TPC trademarks, which cover the benchmarks themselves and the metrics. Companies cannot claim TPC Benchmark results which do not adhere to the TPC's rules of disclosure. On occasion people may disagree with the rules that apply to TPC testing, but at least all play by the same rules and thus some consistency in testing is achieved. Look at the alternative.

    Most readers will be aware of vendors' "OLTP tps", or simply "tps", claims. These claims often accompany product releases and are the vendors' own measure of transaction performance. But what performance? What workload was used? What was the price/performance? Was more than one system used to achieve the desired performance? Can such a system be delivered today, or next year? Maybe! Was a configuration actually tested, or was it merely an estimate?

    TPC Benchmark results therefore provide a relatively consistent view of performance with a well defined set of rules and accountability. Merely offering a benchmark specification without requiring accountability opens the door for active use of benchmark specials with very little chance of their use being disclosed, and thus the level playing field quickly disappears.

  10. Competition Provides Fertile Ground for Engineering Improvements
    Finally, we come to a more obscure benefit of testing. A competitive environment, be it computer systems or cars, invariably promotes innovation. This is one reason why car companies are involved with motorsport, such as Indy Car or Formula 1 racing. It is not easy to see benefits of such involvement translated to the average motor vehicle, because the improvements and innovation are often quite subtle. An example may be using motor racing as a testing ground for parts utilizing new alloys or compounds to improve vehicle reliability. Over time, such a part may be quietly adapted for use in standard motor vehicles. The competition also promotes creative thought to solve problems. From this creativity may come future products for mainstream cars.

    The space race is another good example of competition in a very specialized field having an effect on the wider community. We all only have to look at our non-stick frypans for an example of this.

    Similarly, TPC testing provides a competitive testing ground for computers. The competition provides the impetus for innovation. Anecdotal evidence suggests that the companies involved in TPC testing are seeing technical, and not just marketing, benefits from their participation. Unlike the marketing advantages which tend to be quite high profile, the technical benefits are more subtle. Hopefully, over time these may be translated into the products and services the vendors provide. Who knows, this may have already happened.

In Review
We accept that the ten reasons outlined in this article address issues at an overview, rather than a detailed, level. As we have shown, individual results, or even groups of like results, provide trend data which may otherwise be impossible, or very expensive, to obtain from other sources. TPC results should not be used as a substitute for benchmarking one's own application if performance is a critical decision criterion. If used correctly, however, they can become an inexpensive tool for the user, providing independent preliminary information on current vendor positioning: not only on performance but, as we have shown, on other issues such as price and technology support as well.
 
