Cover Story
BASEBALL HAS BEEN VERY, VERY GOOD TO ARI

David Enders

10/01/99
Contract Professional

Independent contractor Ari Kaplan hopes to someday leverage his skills as a statistician and Oracle guru to land his dream job: general manager of a major league baseball team.
by David Enders

Consider the 1990 Upper Deck Series baseball card #17, the Slammin' Sammy Sosa rookie card. In the hands of a 10-year-old, the edges are already worn and dog-eared. On the front of the card is Sosa's youthful, smiling face in living color. On the back is the science of his game set forth in black and white in a simple table of statistics. The man and his numbers: therein lies our national pastime's inherent beauty. Like no other sport, it is as simple or as complex as you care to make it.

Baseball is both rich and lousy with statistics. Some fans love them...some fans hate them. Many would argue that our ability to create statistics has far outdistanced our ability to make any sense of them. To contract professional and baseball statistician Ari Kaplan, the art and science of baseball are but two sides of the same trading card. Both sides have their value, but the game is won and lost around those worn and dog-eared edges.

We all knew an Ari Kaplan growing up. We raced through Little League and collected our cards as fast as possible, throwing them in a box to sort them out later. But the cards of the Kaplans of the world were organized. Kaplan sold his first software program at age 11 to a magazine sponsoring a one-liner contest. Wielding the mighty 1977 Radio Shack TRS-80 like Sosa wields the lumber, he wrote the 255-character program, an adventure through a haunted house, in what he calls "Bill Gates Basic." Those programming skills and his penchant for organization, not his ability to play ball, eventually took him all the way to the big leagues.

A Major League Contractor

Kaplan has made a living finding order in chaos. The 29-year-old from Chicago has put up some amazing stats in his first 10 years in major league contracting. Kaplan is a certified Oracle DBA, the co-author of four books, three on Oracle and another on Windows 2000 (see "The Road to Oracle Stardom"). He maintains a Web site featuring hundreds of Oracle tips, was named Caltech's Alumni of the Decade for the 1990s (see "Caltech All-Star of the '90s"), and has compiled an impressive stat sheet of diverse contracts that have kept his skills fresh and innovative.

Growing up a Mets fan in New Jersey, he and brother Todd were two of the original Mets "Coneheads," fans of ace pitcher David Cone who wore conical domes on their heads (mimicking Saturday Night Live's alien Conehead family).

Contracting as a lifestyle allows him to take on rewarding pro-bono projects and pursue the sport he loves. Kaplan's long-term goal is to become general manager of a major league baseball team. He's taken the first step by writing several new chapters for the book on baseball statistics. In so doing, he has captured the attention of major league management and scouts.

While working toward his degree in engineering and applied sciences at Caltech, Kaplan took his first big cut at merging baseball and computer science and hit it out of the park. He proposed and directed an undergraduate research fellowship project on the history of baseball from a mathematical perspective, a field of study he calls SABREmetrics. It's named after the Society for American Baseball Research (SABR), which scrutinizes baseball stats. His research led to the development of new baseball statistics using mathematical "normalization" formulae, and to his break into the big leagues. For the past 10 years, Kaplan has served as a consultant to five major league baseball teams.

Kaplan's Caltech thesis provided the pinch hit that launched his baseball career. He presented "How Do You Spell Relief? An Analysis of Baseball Pitching, 1876 to Present" to the Caltech board of trustees. One trustee in particular was interested. Eli Jacobs, then owner of the Baltimore Orioles, hired Kaplan on the spot. Kaplan fondly recalls a first meeting with the legendary Frank Robinson and Orioles scouts to demonstrate a new scouting database. "The scouts came walking in all carrying buckets and I am wondering, 'Why in the world would anybody bring a bucket to a meeting?' The answer soon became clear...they simply needed somewhere to spit their tobacco chews."

The 20-year-old Kaplan spent his summer designing and implementing the Orioles' Computer System and the Earl Weaver Statistics Program to provide instant information on both major and minor league ball players. The Orioles system uses FoxBase, dBASE IV, and Pascal on a Novell network. Kaplan assisted in the design and implementation of two databases, developed and debugged several screens and reports, and customized a search feature from a "pick-and-choose" screen into one using complex program statements.

"It was just a case of being in the right place at the right time," Kaplan laughs, but the project was just the first of many in which he has dragged major league baseball, kicking and screaming, into the computer age.

His new way of making sense of baseball stats also grabbed the attention of the national media and gave Kaplan priceless exposure on television's Today Show and Sean Caleb's sports program on CNN, and a write-up in the L.A. Times. More importantly, it gave him a foot in the door of major league baseball.

"It is very hard to break into the major leagues in any capacity," Kaplan says, "but once you're in, you'll find it is a very close-knit community." He has since contracted with five ball clubs. Kaplan took a contract with the San Diego Padres to build a comprehensive scouting database. He met Dan Duquette, then general manager of the Expos, in the Padres owners' box during a game. "We talked about the Scouting and Player Development system that I had developed for the Padres, and he really wanted it implemented for the Expos," he recalls. In 1991, such systems were a rarity in baseball. "He [Duquette] saw the immense value that transforming a paper-trail system to an interactive real-time system database would add to the Expos organization."

Kaplan oversees the development of the Expos Scouting Project, including a relational database to provide front-end screens to retrieve major and minor league player data for scouts, managers, and other players. Since that project began, the Expos have twice been awarded Topps' "Major League Organization of the Year," in 1993 and 1996. The Scouting Project uses Paradox 3.5 on a Novell NetWare system.

Bringing Baseball Statistics into the Computer Age

The fellowship project that started it all zeroed in on pitching statistics that, even in high school, Kaplan realized were glaringly misleading. The earned run average, or ERA, is the yardstick against which all pitchers are traditionally measured. That single stat, one that could be worth millions to a pitcher, had a bug. The problem lies in "inherited runners"—runners left on base when one pitcher leaves the game to be replaced by another. If the relief pitcher lets those runners score, those runs are charged against the pitcher who allowed them to reach base, tarnishing his ERA—the ratio of earned runs scored (without the help of fielding errors) per nine innings pitched. The reliever, on the other hand, might have pitched poorly in allowing the inherited runners to score, but can still come out of the game with his own ERA unharmed.
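The bookkeeping quirk described above can be sketched in a few lines of Python. This is only an illustration: the outing, the run totals, and the function name `era` are hypothetical, and every run is assumed to be earned.

```python
def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched


# Hypothetical outing: the starter has allowed 2 earned runs through 6 innings,
# then leaves in the seventh with two runners on base and nobody out.
starter_runs, starter_innings = 2, 6.0

# The reliever lets both inherited runners score, then finishes the inning
# without allowing any runners of his own to score.
reliever_runs, reliever_innings = 0, 1.0
inherited_runners_scored = 2

# Inherited runners who score are charged to the pitcher who put them on base.
starter_runs += inherited_runners_scored

starter_era = era(starter_runs, starter_innings)
reliever_era = era(reliever_runs, reliever_innings)

print(f"starter ERA for the outing:  {starter_era:.2f}")   # 6.00
print(f"reliever ERA for the outing: {reliever_era:.2f}")  # 0.00
```

In this made-up outing the starter's ERA doubles from 3.00 to 6.00 on runs that scored after he left, while the reliever's line stays spotless.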

That's where the RE (reliever effectiveness) statistic comes in. It is a ratio of inherited runners that scored to the number "expected to score." The number "expected to score" is determined by a matrix of probabilities based on actual major league games played in the past five years. The matrix takes into account the variety of scenarios a relief pitcher encounters when called into a game (runners at first and third with two outs, bases loaded, no outs, and so on) and determines what scoring is "normal" given each scenario. If a reliever enters the game in a situation where two runs scoring is considered "normal" and he allows only one to score, his RE is .5. Like the ERA, lower scores are better.
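As a rough sketch of how such a ratio might be computed, the Python below sums a reliever's inherited runners who scored and divides by the number expected to score. The probability table is hypothetical (the values are invented for illustration, not drawn from real game data), and the names `EXPECTED_TO_SCORE` and `reliever_effectiveness` are assumptions, not Kaplan's own code.

```python
# Hypothetical table: (runners on base, outs) -> inherited runners "expected to score".
# These values are made up for illustration only.
EXPECTED_TO_SCORE = {
    ("first and third", 2): 0.5,
    ("bases loaded", 0): 2.0,
    ("second", 1): 0.4,
}


def reliever_effectiveness(appearances):
    """Return RE: inherited runners who scored / inherited runners expected to score.

    Each appearance is a tuple of (runners_on_base, outs, inherited_runners_scored).
    Lower is better; 1.0 means the reliever performed exactly as "normal".
    """
    scored = sum(runs for _, _, runs in appearances)
    expected = sum(EXPECTED_TO_SCORE[(bases, outs)] for bases, outs, _ in appearances)
    return scored / expected


# The article's example: two runs are "normal" for the situation, one scores.
single_outing = [("bases loaded", 0, 1)]
print(reliever_effectiveness(single_outing))  # 0.5
```

Summing over all appearances before dividing is one reasonable reading of "ratio"; averaging per-appearance ratios would be another, and the article does not say which was used.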

"It is surprising that no one had properly quantified how misleading many pitching statistics are," says Kaplan. Although the RE stat might at first seem nitpicky, his research suggests otherwise. To illustrate, Kaplan tracked the pitching effectiveness of the 1989 Dodgers and found that the club's most often used reliever, Tim Crews, had a decent ERA, but the worst RE in the National League. With an RE of 1.5, he allowed 50 percent more runs to score than what is "normal" considering his given relief situations. While Crews's ERA was unaffected by the inherited runners scoring, you can bet the starting pitchers grimaced when he took the mound in relief.

To make more sense of pitching stats, Kaplan developed the PERA (potential ERA) and the WERA (worst-case ERA). They show what a starting pitcher's ERA would be if none of the runners left on base score and if all the runners left on base score, respectively. "The difference is remarkable," he says. When comparing those best- and worst-case scenarios, starting pitchers' traditional ERA can vary by an average of more than 30 percent for an entire season. "A statistic with that much variance should not be used as it has been for determining trades and player salaries," Kaplan argues.

The ERAs of Dodger pitchers Orel Hershiser and Mike Morgan, for example, were relatively close in 1989 at 2.31 and 2.54, respectively. When viewed from the perspective of the PERA vs. the WERA, a very different picture emerges. If none of the runners these starters left on base scored when the reliever came in (PERA), Hershiser's ERA would have dropped to 2.24 and Morgan's to 2.35. But if all those inherited runners had scored (WERA), Hershiser's ERA would have increased only to 2.54 while Morgan's would have ballooned to 3.36. Either Hershiser didn't leave his relievers in quite as precarious predicaments as did Morgan, or Morgan's stats were positively influenced by excellent relief pitching.
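The best-case and worst-case figures can be sketched as follows. The season totals here are hypothetical (they are not Hershiser's or Morgan's), the function names are assumptions, and the sketch assumes the starter's earned-run total already includes the inherited runners who scored.

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched


def pera_wera(earned_runs, innings_pitched, runners_left, runners_left_scored):
    """Return (PERA, WERA) for a starter.

    PERA: ERA if none of the runners he left on base had scored.
    WERA: ERA if all of the runners he left on base had scored.
    """
    pera = era(earned_runs - runners_left_scored, innings_pitched)
    wera = era(earned_runs + (runners_left - runners_left_scored), innings_pitched)
    return pera, wera


# Hypothetical season: 60 earned runs in 180 innings; the starter left
# 20 runners on base for his relievers, and 8 of them came around to score.
actual = era(60, 180)
best, worst = pera_wera(60, 180, runners_left=20, runners_left_scored=8)

print(f"ERA {actual:.2f}, PERA {best:.2f}, WERA {worst:.2f}")
# ERA 3.00, PERA 2.60, WERA 3.60
```

With these invented totals the gap between best and worst case is a full run, about a third of the actual ERA, which is the kind of spread the article describes.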

Another favorite of Kaplan's is his Save Value statistic—a much-needed improvement, he says, over the traditional save. Under the save, a relief pitcher is given credit for winning a game after pitching three or fewer innings with an inherited lead of three or fewer runs. "What makes the save statistic misleading," he says, "is that a pitcher who comes in with two outs in the ninth and a three-run lead gets one save, the same credit as a pitcher who pitches for three full innings and just a razor-thin one-run lead." His save value statistic, again through normalization formulae, mathematically credits the reliever proportionally for the innings he worked and the smaller lead he inherited coming into the game.

Making the Case

Kaplan's baseball programs help teams make sense of the data. They store not only the cold, hard stats on a ballplayer, but subjective qualities as well. "A player's aggressiveness, emotional control, arm accuracy, and dozens of other factors not described by traditional statistics are kept in the baseball databases," he says. "The biggest challenge in getting teams to use the program is that many managers and scouts are unfamiliar with what new technology can accomplish."

Kaplan decided early on to use baseball's almost anti-technology bent to his advantage. "I developed screens that are extremely intuitive and look like the paper forms that the organizations are used to working with." He used data mining techniques to develop a "customized search" screen where scouts and management could interactively select both objective and subjective traits of a desired player and the program would produce a list of players that meet those criteria. "For example, a manager can ask for the strongest-armed catcher in the National League East who is under 28 years old." To his amusement, Kaplan discovered some female staff testing the flexibility of the program on their own while conducting searches for "all single ballplayers in my area code who make over a million dollars."
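A query like the catcher example might look something like this minimal Python sketch. It is not the scouting system itself: the `Player` fields, the numeric arm-strength grade, and the sample players are all hypothetical stand-ins for whatever the real databases stored.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    position: str
    division: str
    age: int
    arm_strength: int  # hypothetical scouting grade; higher is stronger


def strongest_armed(players, position, division, max_age):
    """Return the strongest-armed player at a position and in a division
    who is younger than max_age, or None if nobody qualifies."""
    candidates = [
        p for p in players
        if p.position == position and p.division == division and p.age < max_age
    ]
    return max(candidates, key=lambda p: p.arm_strength, default=None)


# Made-up sample data for illustration.
players = [
    Player("Catcher A", "C", "NL East", 26, 70),
    Player("Catcher B", "C", "NL East", 31, 80),  # strongest arm, but too old
    Player("Catcher C", "C", "NL West", 24, 75),  # wrong division
    Player("Catcher D", "C", "NL East", 27, 65),
]

best = strongest_armed(players, position="C", division="NL East", max_age=28)
print(best.name)  # Catcher A
```

The objective filters (position, division, age) narrow the field, and the subjective scouting grade picks the winner, which mirrors the mix of traits the search screen is described as supporting.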

"There is still a fear of the unknown with computers going on in major league baseball," says long-time major league baseball scout and Kaplan client Bill Harford. He thinks Kaplan's love of the game and technological knowledge go a long way in overcoming those fears. "I pay him next to nothing, so it's obvious he does it because he enjoys the game."

Working Toward a Dream

Kaplan's dream of becoming a GM is not as far-fetched as it once may have seemed. Many a baseball manager has spent seasons learning statistics and scouting before heading to the dugout, he points out. Making sense of players and their stats is probably the single most important job of the manager.

Harford agrees that Kaplan's dream is not out of reach. "In baseball, anything is possible."

Kaplan has come a long way in his first 10 seasons. He plans to pursue a business degree before attempting to break into the scouting department of a major league team, to better position himself for his long-term goal. His education in complexity theory, computer science, and engineering arms him with powerful tools to analyze the black-and-white side of the baseball card, using past statistics to predict future performance, knowing who to keep and who to trade. Baseball, IT contracting, the emerging Internet: all are self-organizing, complex, adaptive systems that operate on what complexity theory mystics would call "the edge of chaos and order."

"People tend to think that baseball statistics are accurate because they describe individual performances," Kaplan says. "What is really happening during a game is that a team of people is performing and affecting each other's statistics."

That's how our national pastime is played. Take another look at the stat side of Sosa's 1990 rookie card. The numbers say he is 22 years old and 165 pounds of potential. Turn the card over and look at the man himself. Turn it over again. Keep turning.

------------------------------------------------------------------------

David Enders is a freelance writer based in Austin, Texas.

THE ROAD TO ORACLE STARDOM

In the red-hot Oracle DBA market, Kaplan is an all-star, in many ways thanks to the Internet. At 13, he started his first electronic BBS, which grew to be one of the largest in New Jersey. Later, he discovered Usenet newsgroups. "The Oracle newsgroup, in particular, was full of people asking technical questions," he recalls, "and I would try to answer as often as I could." He noticed many questions were repeated often enough that it simply made sense to build his own Web site to address these "frequently asked questions."

"Although it was not my intention, the Web page has opened up many opportunities for me. Whenever I speak at conferences, people from all over the world come up to me and say that they visit my Web page. Things like that give me the greatest satisfaction," says Kaplan. "People feel that they can trust me to work with them, mainly because my work is displayed and used."

Kaplan paid his dues as a full-timer for the Oracle Corp. from 1992 to 1994. Like a journeyman ballplayer in the minors, he moved quickly during short assignments throughout the country.

Finally called up to the big leagues, he left Oracle and used the Booz-Allen & Hamilton agency to secure a contract with the Merck Corp. in West Point, Pa., setting up a mission-critical inventory system for the world's largest pharmaceutical company. As the Oracle DBA, Kaplan focused on tuning the database, designing and implementing the production architecture, preparing disaster-recovery plans, and installing and upgrading software. The project was one of the first Oracle 7.1.3 implementations on Sun Solaris. The Merck contract was both the first and last time Kaplan has used an employment agency.

He followed up with the U.S. Department of Veterans Affairs in Maywood, Ill., as DBA and developer for the VetsNet project to automate claims processing and benefits. With production architecture in more than 58 offices worldwide, the network runs on Sequent servers running Oracle 7 and keeps tabs on veterans from the Civil War onward. "As the only Oracle consultant on a very large project," says Kaplan, "I found myself training other professionals in Oracle." The project was also his first in Chicago, where he has remained ever since, becoming a die-hard Cubs fan (as if there were any other kind).

Kaplan served as lead Oracle consultant for more than 30 applications in a contract with 3Com/U.S. Robotics, working on a huge network of more than 50 Oracle 7.3 and 7.1 installations on HP servers running across T3 and T1 lines. That contract was followed with another important career breakthrough, a contract with InterAccess, one of the nation's largest ISPs. "That project propelled me straight to the top of the Internet world," Kaplan says, where the Oracle database dominates.

At InterAccess, he served as lead Oracle consultant on the company's new billing and authentication system running the Oracle 8 database engine on Sun Solaris servers. Thanks to InterAccess, he says, Wrigley Field—a throwback to the good old days of baseball—ironically became the first stadium where fans could e-mail television announcers and have their questions read and answered by the announcers during the broadcast.

Having studied complexity theory and artificial intelligence at Caltech, Kaplan then went to bat as the lead Oracle 8 consultant for the Emergent Solutions Group of PricewaterhouseCoopers in New York. The group used "breakthrough approaches" in applying complexity theory to uncover hidden patterns in how products are accepted by consumers. The project called for a database of more than 150,000 "synthetic consumers"—artificial agents endowed with both objective and subjective attributes. Algorithms were formulated to run simulations on the database and test products on these "consumers." ESG customers, like Macy's, can then better predict what fashions will be a hit next year and movie studios can guess when to release a film to better its chance of becoming a box-office smash.

The ESG work, innovatively applying complexity theory to the marketplace, resulted in a book, How Hits Happen, by the group's leader Winslow Farrell.

"Tuning played a large part of my role at ESG," Kaplan says. "For instance, the amount of time it would take to load in data weekly from one of its customers took about two days. Through tuning the database, this time was cut down to just a few hours."

Database tuning is becoming even more critical, Kaplan says, as the IT industry expands much faster than the workforce, resulting in what he sees as a lack of Oracle skills. Up-to-date skills are absolutely necessary, he says, "because the default database is really no longer a good choice. In most instances, it is much slower than it could be."

He doesn't see the problem righting itself soon with Oracle's exploding Internet growth. "Constant improvements to the core database and ever-emerging database-related tools require adaptive and self-motivating people," he explains. "Although there is a lot of programming involved, many factors still cannot always be learned in a training class. Project experience is key to many lead database roles, which slows the growth of Oracle professionals."

Kaplan keeps his Oracle skills cutting edge in an ongoing contract with TextWise (Syracuse, N.Y.). The company, he says, is doing groundbreaking work in linguistic modeling and artificial intelligence. Relationships of words, contexts of the words, synonyms, themes, and similar ideas are all quantified by a complex relational database. "For example, if you are looking for something dealing with the Middle East, conventional search engines just look blindly for the word 'Middle' and the word 'East.' The TextWise program searches for not only 'Middle East,' but breaks it down into 'Israel,' 'Syria,' 'Lebanon,' and so on. It determines if 'Middle East' is referred to as a geographical region, or as a political group, or something entirely different, based on how it is used in the sentence."

 

CALTECH ALL-STAR OF THE '90s

Film director Frank Capra, Nobel Prize winner Linus Pauling, former Israeli Defense Minister Moshe Arens, Cogit Corporation co-founder Lounette Dyer, and statistician Ari Kaplan share a common thread. All are recipients of the California Institute of Technology's Alumni of the Decade Award.

On June 13, 1997, Kaplan received the award for the 1990s along with Dyer during a formal ceremony including many past recipients. Rubbing shoulders with Apollo 17 astronaut Harrison Schmitt and Bill Pickering, founder of the Jet Propulsion Laboratory, Kaplan says, "was an evening of basically me in disbelief that it was actually happening."

Although Capra had passed away before the 1997 award program, Kaplan recalls an earlier visit to the Capra home where he toured the director's fallout shelter built in the 1960s to protect his films in the event of a nuclear war. "This was one example of failing to predict the future," Kaplan says, "but it was a good bet at the time." It seems even Frank "It's A Wonderful Life" Capra was a man who believed in keeping all his bases covered.

Past recipients of the Caltech Alumni of the Decade are:

1900—Franklin Jewett, founder of Bell Labs; Joseph Grinnell, noted ornithologist

1910—Frank Capra, noted film director; Earl Mendenhall, over 200 patents for electric motors

1920—Arnold Beckman, founder of Beckman Instruments; Linus Pauling, two-time Nobel Prize winner (Peace, Chemistry)

1930—Charles Townes, Nobel Prize winner (development of laser); Bill Pickering, founder of Jet Propulsion Laboratory

1940—Paul MacCready, inventor and automotive/aircraft developer; Gene Shoemaker, astrogeology pioneer

1950—Harrison Schmitt, Apollo 17 astronaut and U.S. Senator; Moshe Arens, Israeli defense minister

1960—York Liao, co-founder and executive director of Varitronix; Joseph Rhodes, PA Public Utility Commissioner

1970—David Ho, Time magazine's 1996 Man of the Year; Erik Sirri, SEC's chief economist

1980—Arati Prabhakar, director of NIST; Bill Gross, founder of Knowledge Adventure

1990—Lounette Dyer, co-founder of Cogit Corporation; Ari Kaplan, baseball statistician
