Instant decisions


(Source: New York Times)

A NY Times recap of a ballgame got me thinking about instant decisions.

I watched this game on a TV at a restaurant where I was having dinner with my wife.  I remember watching Brett Gardner getting thrown out as he was caught in a rundown between third and home.  I remember thinking, “now the man on third is erased.  What were you thinking, Brett?”

As the Times article points out, it ended up being a fateful decision by (Orioles pitcher) Dylan Bundy.  Had he thrown the ball to the shortstop instead of his catcher, he potentially could have turned a double play to get his team out of the inning.  Instead, the Yankees, with an extra life, rallied in the inning to go up by a score of 5-0 (highlighted by a Tyler Wade grand slam).  The Yankees ended up winning, 9-0 (making me, a Yankee fan, happy).

But this article isn’t about the game.  It’s about the instant decision.  In this case, a quick decision ended up affecting the outcome of a ballgame.

Think about all the times in your life when you’ve had to make an instant decision on your feet.  We’ve all had them.  How did they turn out?  Good?  Bad?  Did they end up changing the course of your life, or were they just blips on your lifetime radar screen?

I’m sure there’s some kind of psychology as to how your background — upbringing, education, etc. — might play a role regarding the kinds of split-second decisions you make, but this is a subject about which I know nothing.  Rather, it got me thinking about the idea that quick decisions can have consequences.  In the scheme of things, many of them might not have any effect.  But depending on the time, place, and circumstances, such decision-making could have disastrous consequences — or result in the opportunity of a lifetime.

Advertisements

#BI101: An introduction to BI using baseball

Edit: This is the first of a series of articles (I hope!) in which I’m trying to teach myself about BI.  Any articles I write that are related to this, starting with this one, will be preceded with “#BI101” in the title.

As I stated in a previous article, one topic about which I’m interested in learning more is business intelligence (BI).  For those of you who are new to BI, it is a broad topic.  In a nutshell, it can probably be described as “consuming and interpreting data so it can be used for business decisions and/or applications.”

I’ll admit that I don’t know a lot about BI (at least the fine details, anyway).  I did work a previous job where I touched upon it; I was tasked with performing some data analysis, and I was introduced to concepts such as OLAP cubes and pivot tables.  I’ve gotten better at creating pivot tables — I’ve done a few of them using MS Excel — but I’ll admit that I’m still not completely comfortable with building cubes.  I suppose that’ll come as I delve further into this.

A while back, my friend, Paresh Motiwala, suggested that I submit a presentation for Boston SQL Saturday BI edition.  At the time, I said to him, “the only thing I know about BI is how to spell it!”  He said to me (something like), “hey, you know how to spell SQL, don’t you?”  Looking back at the link, I might have been able to submit (I didn’t realize, at the time, that they were running a professional development track).  That said, Paresh did indeed had a point.  As I often tell people, I am not necessarily a SQL expert — I know enough SQL to be dangerous — nevertheless, that does not stop me from applying to speak at SQL Saturday.  Likewise, as I dive further into this topic, I’m finding that I probably know more about BI than I’ve led myself to believe.  Still, there is always room for improvement.

To tackle this endeavor, once again, I decided to jump into this using a subject that I enjoy profusely: baseball.  Baseball is my favorite sport, and it is a great source of data for stat-heads, mathematicians, and data geeks.  I’ve always been of the opinion that if I’m going to learn something new, I should make it fun!

Besides, the use of statistical analysis in baseball has exploded.  Baseball analytics is a big deal, ever since Bill James introduced sabermetrics (there is some debate as to whether James has enhanced or ruined baseball).  So what better way to introduce myself to BI concepts?

For starters, I came across some articles (listed below, for my own reference as much as anything else):

I also posted a related question in the SSC forums.  We’ll see what kind of responses (if any) I get to my query.

Let’s start with the basics — what is BI?

Since I’m using baseball to drive this concept, let’s use a baseball example to illustrate this.

Let’s say you’re (NY Yankees manager) Aaron Boone.  You’re down by a run with two outs in the bottom of the 9th.  You have Brett Gardner on first, Aaron Judge at bat, and you’re facing Craig Kimbrel on the mound.

What do you do?  How does BI come into play here?

Let’s talk a little about what BI is.  You have all these statistics available — Judge’s batting average, Kimbrel’s earned run average, Gardner’s stolen base percentage, and so on.  In years BS — “before sabermetrics” — a manager likely would have “gone with his gut,” decided that Judge is your best bet to hit the game-winning home run, and let him swing away.  But is this the best decision to make?

Let’s put this another way.  You have a plethora of data available at your fingertips.  BI represents the ability to analyze all this data and provide information that allows you to make a good decision.

If Aaron Boone (theoretically) had this data available at his fingertips (to my knowledge, Major League Baseball bans the use of electronic devices in the dugout during games), he could use the data to consider Kimbrel’s pitching tendencies, Judge’s career numbers against Kimbrel, and so on.  BI enables Boone to make the best possible decision based upon the information he has at hand.

I do want to make one important distinction.  In the above paragraphs, I used the words data and information.  These two words are not interchangeable.  Data refers to the raw numbers that are generated by the players.  Information refers to the interpretation of that data.  Therein lies the heart of what BI is — it is the process of generating information based upon data.

What’s there to know about BI?

I’ve already mentioned some buzzwords, including OLAP, cubes, and pivot tables.  That’s just scratching the surface.  There’s also KPIs, reporting services, decision support systems, data mining, data warehousing, and a number of others that I haven’t thought of at this point (if you have any suggestions, please feel free to add them in the comments section below).  Other than including the Wikipedia definition links, I won’t delve too deeply into them now, especially when I’m trying to learn about these myself.

So why bother learning about BI?

I have my reasons for learning more about BI.  Among other things…

  • It is a way to keep myself technically relevant.  I’ve written before about how difficult it is to stay up-to-date with technology.  (For further reading regarding this, I highly recommend Eugene Meidinger’s article about keeping up with technology; he also has a related SQL Saturday presentation that I also highly recommend.)  I feel that BI is a subject I’m able to grasp, learn about, and contribute.  By learning about BI, I can continue making myself technically valuable, even as my other technical skills become increasingly obsolete.  Speaking of which…
  • It’s another adjustment.  Again, I’ve written before about making adjustments to keep myself professionally relevant.  If there’s one thing I’ve learned, it’s that if you want to survive professionally, you need to learn to adjust to your environment.
  • It is a subject that interests me.  I’m sure that many of you, as kids, had “imaginary friends.”  (I’ll bet some adults have, too — just look at Lieutenant Kije and Captain Tuttle.)  When I was a kid, I actually had an imaginary baseball team.  I went as far as to create an entire roster full of fictitious ballplayers, even coming up with full batting and pitching statistics for them.  My star player was a power-hitting second baseman who had won MVP awards in both the National and American leagues, winning several batting titles (including a Triple Crown) and leading my imaginary team to three World Series championships.  I figured, if my interest in statistics went that far back, there must be something behind it.  Granted, now that I’ve grown up older, I’m not as passionate about baseball statistics as I was as a kid, but some level of interest still remains, nevertheless.
  • It is a baseline for learning new things.  I’ve seen an increasing number of SQL Saturday presentations related to BI, such as PowerBI, reporting services, and R.  I’m recognizing that these potentially have value for my workplace.  But before I learn more about them, I also need to understand the fundamental baseline that they support.  I feel that I need to learn the “language” of BI before I can learn about the tools that support it.

So, hopefully, this article makes a good introduction (for both you and myself) for talking about BI.  I’ll try to write more as I learn new things.  We’ll see where this journey goes, and I hope you enjoy coming along for the ride.

Installing a Baseball Database

Earlier, I wrote about how I installed SQL Server 2016 on my machine.  In this article, I talk about creating and populating a baseball database.

I am a huge baseball fan.  I started following the New York Yankees when I was about 12.  (In an act that is likely anathema to religious fanatics, I married a Boston Red Sox fan; during baseball season, one of us — metaphorically speaking — ends up sleeping on the couch!)  I have been known to schedule vacations around Major League Baseball schedules.  I believe that Cooperstown is Mecca.

Why, you might ask, is this relevant to installing SQL Server?  Because I believe that one of the best ways to learn something is to have fun while you’re doing it.  So in this implementation, instead of installing NorthWind Traders or AdventureWorks (the standard practice databases used by SQL Server enthusiasts everywhere), I will instead install a copy of Sean Lahman’s baseball database.  (At the time of this article, the most recent version goes to and includes the 2015 baseball season.)

I should note that this isn’t to say NorthWind or AdventureWorks isn’t fun; rather, baseball is something about which I’m passionate, and is more likely to keep my attention.  At some point, I’ll likely install NorthWind or AdventureWorks as well, as many SQL references and guides refer to them.

I downloaded the most recent SQL version from Sean Lahman’s website, which left me with a .ZIP file.  Inside the file, I found a MSSQLMASTER.SQL file (you’ll find it in \mysql core\core\) that I extracted from the .ZIP file.  I opened the file with SSMS.

The first thing I noticed was that it created a database called “stats.”  I did not want to use this as my database name, so I first replaced all instances of “stats’ with “baseball”.  Once this was done, I ran the database build query.

Unfortunately, it would not be that easy.  The query file is very large.  When I ran the query, I received the following message.

SSMSOutOfMemory

Okay.  It has a lot of rows to insert (the data includes over a hundred years of statistics, after all).  Instead, I opted for just creating the database and tables.  However, even that proved to be problematic.

SQLBaseballBuildErrors

“Okay,” I said to myself, “what’s going on now?”

First, I had a permission issue to deal with (this doesn’t appear in the above log; I neglected to include that message).  I looked for the C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA folder.  When I tried to open it, I got a message saying that I had insufficient permissions to open it.  “What?” I said, “this is my own machine, and I’m the administrator!”  However, it also prompted me asking whether I wanted to open it.  I answered yes.  Once I did, I tried running the query again, and the permission error seemed to disappear.

However, that’s when I got the errors shown above.

That’s when I noticed that the filename references MSSQL11.  My database path includes MSSQL13.  Okay.  I changed the path in the filename from MSSQL11 to MSSQL13 and reran the query.  This time, it ran successfully.

I did notice one thing about the table infrastructure.  All columns were defined as NULL, which indicated to me that there were no unique indices or primary keys defined.  All tables would be heaps.  That’s something that I’d need to fix, but that was likely another project for another time.  For now, I was more concerned about building the database infrastructure and inserting the data.

Once my database and tables were built, I commented out the CREATE DATABASE and CREATE TABLE lines from the code, leaving only the INSERT commands.  Just for grins, I tried running the query again, and once again got the “insufficient memory” message.  Okay, no dice.  How was I going to get around this?  I suppose I could’ve selected and run the INSERT queries piecemeal, but that was going to be a long and painful process.

I ran a Google search, and came across my old friend, StackOverflow, where I found this entry.  So I tried the command line, substituting my database name and SQL filename (including the path).

sqlcmd -d baseball -i Downloads\mssqlmaster.sql

Here’s what I got:

sqlcmd: Error: Connection failure. SQL Native Client is not 
installed correctly. To correct this, run SQL Server Setup.

Interesting.  I went to look up SQL Server 2016 Native Client, and found this.  So SQL Server 2016 does not include Native Client.  So, I tried to download and install Native Client.

Of course, nothing ever goes as planned.  I got this message.

NativeClientErrorMsg

So, I went into my Settings to check my Native Client.  It told me that 2012 Native Client was already installed.

After Google-searching my error messages, I realized that I had SQL Server 2008 Express installed, and my SQLCMD was likely getting confused between versions.  That was likely the culprit.  So I went ahead and uninstalled SQL Server 2008, and tried my SQLCMD command line again.  This time, the SQLCMD did not give me any errors, and my command prompt started scrolling with the familiar “(1 rows affected)” message that appears when SQL Server inserts rows into tables.  The script finished without any problem.

Once the script finished, I ran a query to see if my data had made it into the tables.

BattingQuery

So it appears that we are good to go!  Our baseball database is populated!

I now have a baseball database with which I can play to my heart’s content.  I’ll be using this to practice my SQL skills — and ‘blog about my adventures (or misadventures, as the case may be) as I do so.