The evolution of statistics

During my lunch break, I was perusing the ESPN website and stumbled across this article. It contemplates whether or not a .300 hitter (in baseball, for those of you who are sports-challenged) is meaningful anymore. As a baseball fan, the article caught my attention. I didn’t read through the entire article (it ended up being a much longer read than I expected — too long for me to read while on a lunch break at work), but from what little I did glean from it, a couple of things struck me.

First, they talk about Mickey Mantle’s batting average and how important hitting .300 was to him. That struck me as a little funny, because (as far as I know; as I said, I didn’t get through the entire article) there was no mention of the fact that he actually finished with a batting average under .300. His career batting average was .298.

The second thing that struck me was a remark by (Yankees’ first baseman) Luke Voit: “feel like batting average isn’t a thing now.” Indeed, baseball is a much different game than it was, say, ten, twenty, or thirty years ago. Analytics are a big part of the game these days. A lot of stats that are prevalent now, such as WAR (wins above replacement), exit velocity, and OPS (on-base plus slugging), didn’t even exist when I was a kid growing up, closely following my Yankees. Back when I was eating and sleeping baseball, hitting was about the triple-crown statistics: batting average, home runs, and runs batted in (RBIs). But now, we have “slash lines,” on-base percentage, slugging percentage, and so on. Even as big of a baseball fan as I am, I haven’t a clue about many of these “new age” stats. I still have no idea what WAR represents, I’m not completely sure as to what the numbers in a slash line are, and I don’t know what constitutes a respectable OPS.
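For what it’s worth, the slash line is conventionally written as batting average / on-base percentage / slugging percentage, and OPS is simply the last two added together. Here is a minimal sketch of that arithmetic in Python, using the commonly published formulas. The `BattingLine` class, the function names, and the season totals are all made up for illustration; they are not any real player’s numbers.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical season totals for one hitter (made-up numbers)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int


def batting_average(b: BattingLine) -> float:
    # AVG = hits / at-bats
    return b.hits / b.at_bats


def on_base_percentage(b: BattingLine) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    reached = b.hits + b.walks + b.hit_by_pitch
    chances = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return reached / chances


def slugging_percentage(b: BattingLine) -> float:
    # SLG = total bases / at-bats
    singles = b.hits - b.doubles - b.triples - b.home_runs
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    return total_bases / b.at_bats


def ops(b: BattingLine) -> float:
    # OPS = on-base percentage plus slugging percentage
    return on_base_percentage(b) + slugging_percentage(b)


def slash_line(b: BattingLine) -> str:
    # Baseball convention drops the leading zero: .300, not 0.300
    def fmt(x: float) -> str:
        return f"{x:.3f}".lstrip("0")

    return "/".join(
        fmt(f(b))
        for f in (batting_average, on_base_percentage, slugging_percentage)
    )


# A made-up season: 150 hits in 500 at-bats, i.e. a ".300 hitter"
line = BattingLine(at_bats=500, hits=150, doubles=30, triples=5,
                   home_runs=20, walks=60, hit_by_pitch=5, sacrifice_flies=5)

print(slash_line(line))  # .300/.377/.500
```

Notice that the same hypothetical hitter looks different depending on which number you read: the .300 average says nothing about the walks or the extra-base hits, which is part of why the other two numbers get quoted alongside it.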

That got me thinking about how statistics have changed over the years, and whether or not that applies to statistics outside of baseball (or sports, for that matter). Maybe people who study data analytics for a living might know this better than I do, but what business statistics have a different meaning now than they did ten, twenty years ago? Are there any numbers from way back when that I should now take with a grain of salt?

I’m sure there are many examples of this outside of sports, but I struggled to come up with any. Off the top of my head, I remember how a company where I once worked made a big deal out of perfect attendance, to the point that they gave out perfect attendance awards at the end of the year. However, that policy encouraged people to come to work even when they were sick. Do you really want someone who’s sick coming into work? These days, workplaces do not want sick people in the office, and with the advent of work-at-home provisions, perfect attendance isn’t so meaningful anymore. (By the way, my understanding is that the company no longer recognizes or rewards “perfect” attendance.)

So I suppose the takeaway is, how well do statistics age? Can they be compared with the same statistics now? What needs to be considered when analyzing statistics from years ago? It’s true that numbers often tell a story, but in order to get the full picture, you also need to understand the full context.

Election day

“Can I tell you something; got to tell you one thing if you expect the freedom that you say is yours; prove that you deserve it; help us to preserve it, or being free will just be words and nothing more…”
— Kansas, “Can I Tell You”

I don’t think I can say it any better than the song lyric I quoted above.

Last night, I overheard a coworker say, “I don’t vote.  It doesn’t make any difference.”  And he continued to spew about his views on the world.

I kept silent, but I am not ashamed to say that I wanted to tear him a new a**hole.

People died so I can vote.  That is something I do not take lightly.  For someone to brush it off and disrespect that right like that absolutely incenses me.  I vote every year.  I make sure I vote every year.  And so should you.

The fact is, your vote does matter.  In 2016, a large share of eligible voters (roughly four in ten) did not vote, many of them because “it wouldn’t make a difference.”  Had at least half of these people gone to the polls, chances are that the current state of the union would be much different.

Yes, our system is far from perfect.  Yes, our system has flaws.  But the fact is, your vote matters.

Want to change the system?  Vote.

It’s okay to say “I don’t know”

If you ever have a chance, I recommend sitting in on Thomas Grohser’s presentation called “Why candidates fail the job interview in the first minute.”  (Tom is a great speaker, and I suggest you go hear him talk, anyway!)  In his presentation, he discusses a number of reasons why job candidates often blow the job interview.  The first time I sat in on his presentation, I asked him what I thought were some good questions — so good, in fact, that the next time I attended a SQL Saturday where he gave that presentation, he asked me to sit in just so I could ask those same questions and make some comments as a talking point for the audience.  (He even joked about utilizing me as a prop for his presentation!)

One of the points that he makes in his presentation is that a candidate is not expected to know everything.  We are human, and we are not perfect.  Nobody is all-knowing, and as well-versed as we try to be on a subject, we won’t know everything about it.  Even experts in a subject field won’t know every little thing about their subject.

Tom mentions that when he interviews a job candidate, he will ask at least one question that either does not have a correct answer, may have multiple correct answers, or is ambiguous.  (For those of you who are not DBAs, data professionals often joke that the standard answer is, “it depends.”)  He is not looking for a singular correct answer; rather, he is looking for how the candidate answers.

This brought to mind a memory of a class I took in grad school.  I missed a class because I was out sick, and it turned out that the material covered that day ended up as a question on the mid-term exam.  I don’t remember exactly how I answered that question, but I remember starting it something like this: “I don’t remember going over this subject, but based on the nature of this question, this is what I think it means…”  Not only did I end up answering the question correctly, I ended up getting a 97 (out of 100) on the exam.

So if you don’t know the answer, how would you go about getting it?  These days, technology makes it easy to look things up online.  “Google it” has become a part of our lexicon.  Trying to find answers is our basis for research; if we don’t have the answer, we try to figure out what it is.  That is how we learn.  I’ll go as far as to say that admitting you don’t know an answer is better than trying to fake your way through one.  Would you rather guess at an answer you don’t know and end up giving a wrong one, or would you rather take the time to do your homework and give a better answer?

Too many of us stress ourselves out because we try to be perfect.  Any time we are tested — whether it’s on an exam, a job interview, or any instance where we are expected to give testimony — we expect ourselves to be perfect.  We expect to have the answer to every question.  The reality is that this is impossible.  We won’t have every answer, and we shouldn’t expect to.  “I don’t know” is a perfectly acceptable answer, and too many people don’t realize that.  Just say that you don’t know, and explain how you’d go about finding out.  And the next time you’re asked the question, you’ll have a better answer.

A few words can make a difference

A couple of weeks ago, the Rensselaer Polytechnic (the RPI student newspaper) published a couple of op-eds in regard to the situation at RPI.  (My friend, Greg Moore, wrote a piece a while back related to this issue.)  In response to the op-eds, I decided to respond with my own letter to the editor.

This morning, a friend posted to my Facebook that my letter, to my surprise, was garnering some attention.  I won’t say that it’s gone viral, but apparently, it’s caught a number of eyes.

I should note that my donations haven’t been much.  I was only a graduate student at Rensselaer, not an undergrad, so the social impact on my life wasn’t quite the same, and other financial obligations have kept me from donating more of my money.  That said, I’ve donated in other ways; I’ve been a hockey season ticket holder for many years (going back to my days as a student), I’ve attended various events (sports, cultural, etc.) on campus, and I’ve donated some of my time to the Institute.

Although my donations have been relatively meager, more importantly, I wanted to spread the word that I was no longer supporting RPI, and exactly why I was discontinuing my support.  How much I was contributing isn’t the issue; the issue is that I am stopping.  For the first time in years, I have no intention of setting foot in the Field House for a hockey game this season.  A large number of alumni have announced that they are withholding donations, and I wanted to add my voice to that chorus, and hopefully encourage other students and alumni to take action against an administration that I deem to be oppressive.

One of RPI’s marketing catchphrases is, “why not change the world?”  It looks like I’m doing exactly that with my letter.  Don’t underestimate the power of words.  Indeed, with just a few words, you can change the world.

Instant decisions


A NY Times recap of a ballgame got me thinking about instant decisions.

I watched this game on a TV at a restaurant where I was having dinner with my wife.  I remember watching Brett Gardner getting thrown out as he was caught in a rundown between third and home.  I remember thinking, “now the man on third is erased.  What were you thinking, Brett?”

As the Times article points out, it ended up being a fateful decision by (Orioles pitcher) Dylan Bundy.  Had he thrown the ball to the shortstop instead of his catcher, he potentially could have turned a double play to get his team out of the inning.  Instead, the Yankees, with an extra life, rallied in the inning to go up by a score of 5-0 (highlighted by a Tyler Wade grand slam).  The Yankees ended up winning, 9-0 (making me, a Yankee fan, happy).

But this article isn’t about the game.  It’s about the instant decision.  In this case, a quick decision ended up affecting the outcome of a ballgame.

Think about all the times in your life when you’ve had to make an instant decision on your feet.  We’ve all had them.  How did they turn out?  Good?  Bad?  Did they end up changing the course of your life, or were they just blips on your lifetime radar screen?

I’m sure there’s some kind of psychology as to how your background — upbringing, education, etc. — might play a role regarding the kinds of split-second decisions you make, but this is a subject about which I know nothing.  Rather, it got me thinking about the idea that quick decisions can have consequences.  In the scheme of things, many of them might not have any effect.  But depending on the time, place, and circumstances, such decision-making could have disastrous consequences — or result in the opportunity of a lifetime.

#BI101: An introduction to BI using baseball

Edit: This is the first of a series of articles (I hope!) in which I’m trying to teach myself about BI.  Any articles I write that are related to this, starting with this one, will be preceded with “#BI101” in the title.

As I stated in a previous article, one topic about which I’m interested in learning more is business intelligence (BI).  For those of you who are new to BI, it is a broad topic.  In a nutshell, it can probably be described as “consuming and interpreting data so it can be used for business decisions and/or applications.”

I’ll admit that I don’t know a lot about BI (at least the fine details, anyway).  I did work a previous job where I touched upon it; I was tasked with performing some data analysis, and I was introduced to concepts such as OLAP cubes and pivot tables.  I’ve gotten better at creating pivot tables — I’ve done a few of them using MS Excel — but I’ll admit that I’m still not completely comfortable with building cubes.  I suppose that’ll come as I delve further into this.

A while back, my friend, Paresh Motiwala, suggested that I submit a presentation for Boston SQL Saturday BI edition.  At the time, I said to him, “the only thing I know about BI is how to spell it!”  He said to me (something like), “hey, you know how to spell SQL, don’t you?”  Looking back at the link, I might have been able to submit (I didn’t realize, at the time, that they were running a professional development track).  That said, Paresh did indeed have a point.  As I often tell people, I am not necessarily a SQL expert (I know enough SQL to be dangerous); nevertheless, that does not stop me from applying to speak at SQL Saturday.  Likewise, as I dive further into this topic, I’m finding that I probably know more about BI than I’ve led myself to believe.  Still, there is always room for improvement.

To tackle this endeavor, once again, I decided to jump into this using a subject that I enjoy immensely: baseball.  Baseball is my favorite sport, and it is a great source of data for stat-heads, mathematicians, and data geeks.  I’ve always been of the opinion that if I’m going to learn something new, I should make it fun!

Besides, the use of statistical analysis in baseball has exploded.  Baseball analytics has been a big deal ever since Bill James introduced sabermetrics (there is some debate as to whether James has enhanced or ruined baseball).  So what better way to introduce myself to BI concepts?

For starters, I came across some articles (listed below, for my own reference as much as anything else):

I also posted a related question in the SSC forums.  We’ll see what kind of responses (if any) I get to my query.

Let’s start with the basics — what is BI?

Since I’m using baseball to drive this concept, let’s use a baseball example to illustrate this.

Let’s say you’re (NY Yankees manager) Aaron Boone.  You’re down by a run with two outs in the bottom of the 9th.  You have Brett Gardner on first, Aaron Judge at bat, and you’re facing Craig Kimbrel on the mound.

What do you do?  How does BI come into play here?

Let’s talk a little about what BI is.  You have all these statistics available: Judge’s batting average, Kimbrel’s earned run average, Gardner’s stolen base percentage, and so on.  In years BS (“before sabermetrics”), a manager likely would have “gone with his gut,” decided that Judge was his best bet to hit the game-winning home run, and let him swing away.  But is this the best decision to make?

Let’s put this another way.  You have a plethora of data available at your fingertips.  BI represents the ability to analyze all this data and provide information that allows you to make a good decision.

If Aaron Boone (theoretically) had this data available at his fingertips (to my knowledge, Major League Baseball bans the use of electronic devices in the dugout during games), he could use the data to consider Kimbrel’s pitching tendencies, Judge’s career numbers against Kimbrel, and so on.  BI enables Boone to make the best possible decision based upon the information he has at hand.

I do want to make one important distinction.  In the above paragraphs, I used the words data and information.  These two words are not interchangeable.  Data refers to the raw numbers that are generated by the players.  Information refers to the interpretation of that data.  Therein lies the heart of what BI is — it is the process of generating information based upon data.
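To make the data-versus-information distinction a little more concrete, here is a small Python sketch.  The list of outcomes is the “data” (a raw, made-up log of plate appearances for one hypothetical batter against one hypothetical pitcher), and the summary that comes out of it is the “information.”  The outcome labels and the `summarize` function are my own inventions for illustration, and the at-bat rule is simplified (for example, it ignores sacrifice bunts).

```python
from collections import Counter

# Data: raw, made-up plate-appearance outcomes (hypothetical matchup)
raw_outcomes = [
    "strikeout", "single", "walk", "groundout", "home_run",
    "strikeout", "flyout", "double", "strikeout", "walk",
]

HITS = {"single", "double", "triple", "home_run"}
# Simplified: outcomes that count as a plate appearance but not an at-bat
NOT_AT_BATS = {"walk", "hit_by_pitch", "sacrifice_fly"}


def summarize(outcomes):
    """Turn a raw list of outcomes (data) into a few summary numbers (information)."""
    counts = Counter(outcomes)
    plate_appearances = len(outcomes)
    at_bats = sum(n for o, n in counts.items() if o not in NOT_AT_BATS)
    hits = sum(n for o, n in counts.items() if o in HITS)
    return {
        "plate_appearances": plate_appearances,
        "at_bats": at_bats,
        "hits": hits,
        "batting_average": hits / at_bats if at_bats else None,
        "strikeout_rate": (
            counts["strikeout"] / plate_appearances if plate_appearances else None
        ),
    }


# Information: an interpretation of the data that someone could act on
summary = summarize(raw_outcomes)
print(summary)
```

The raw list by itself doesn’t tell a manager much.  The summary (three hits in eight at-bats, but strikeouts in three of ten plate appearances) is the kind of thing he could actually weigh when making a decision.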

What’s there to know about BI?

I’ve already mentioned some buzzwords, including OLAP, cubes, and pivot tables.  That’s just scratching the surface.  There are also KPIs, reporting services, decision support systems, data mining, data warehousing, and a number of others that I haven’t thought of at this point (if you have any suggestions, please feel free to add them in the comments section below).  Other than including the Wikipedia definition links, I won’t delve too deeply into them now, especially when I’m trying to learn about these myself.
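Without delving too deeply, the idea behind a pivot table is simple enough to sketch: take a flat list of records, pick one field for the rows and another for the columns, and aggregate a value where they meet.  The Python below is only my rough illustration of that idea, not how Excel or any BI tool actually implements it; the players, months, and hit totals are made up.

```python
from collections import defaultdict

# Flat, made-up records: (player, month, hits)
records = [
    ("Player A", "April", 20),
    ("Player A", "May", 31),
    ("Player B", "April", 25),
    ("Player B", "May", 18),
    ("Player A", "April", 4),   # a second April entry for Player A
]


def pivot(rows):
    """Rows become players, columns become months, cells hold the sum of hits."""
    table = defaultdict(lambda: defaultdict(int))
    for player, month, hits in rows:
        table[player][month] += hits
    # Convert the nested defaultdicts to plain dicts for easier reading
    return {player: dict(months) for player, months in table.items()}


hits_by_player_and_month = pivot(records)
for player, months in hits_by_player_and_month.items():
    print(player, months)
```

As I understand it, an OLAP cube extends the same idea to more than two dimensions (say, player by month by ballpark), but I’ll leave that for when I know more about it.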

So why bother learning about BI?

I have my reasons for learning more about BI.  Among other things…

  • It is a way to keep myself technically relevant.  I’ve written before about how difficult it is to stay up-to-date with technology.  (For further reading regarding this, I highly recommend Eugene Meidinger’s article about keeping up with technology; he also has a related SQL Saturday presentation that is well worth attending.)  I feel that BI is a subject I’m able to grasp, learn about, and contribute to.  By learning about BI, I can continue making myself technically valuable, even as my other technical skills become increasingly obsolete.  Speaking of which…
  • It’s another adjustment.  Again, I’ve written before about making adjustments to keep myself professionally relevant.  If there’s one thing I’ve learned, it’s that if you want to survive professionally, you need to learn to adjust to your environment.
  • It is a subject that interests me.  I’m sure that many of you, as kids, had “imaginary friends.”  (I’ll bet some adults have, too; just look at Lieutenant Kije and Captain Tuttle.)  When I was a kid, I actually had an imaginary baseball team.  I went as far as to create an entire roster full of fictitious ballplayers, even coming up with full batting and pitching statistics for them.  My star player was a power-hitting second baseman who had won MVP awards in both the National and American leagues, winning several batting titles (including a Triple Crown) and leading my imaginary team to three World Series championships.  I figured, if my interest in statistics went that far back, there must be something behind it.  Granted, now that I’ve grown older, I’m not as passionate about baseball statistics as I was as a kid, but some level of interest still remains, nevertheless.
  • It is a baseline for learning new things.  I’ve seen an increasing number of SQL Saturday presentations related to BI, covering tools such as Power BI, reporting services, and R.  I’m recognizing that these potentially have value for my workplace.  But before I learn more about them, I also need to understand the fundamental baseline that they support.  I feel that I need to learn the “language” of BI before I can learn about the tools that support it.

So, hopefully, this article makes a good introduction (for both you and me) to talking about BI.  I’ll try to write more as I learn new things.  We’ll see where this journey goes, and I hope you enjoy coming along for the ride.