Being the baseball nut that I am, of course, I had to import baseball statistics, so I decided to reimport the most recent data from Sean Lahman’s baseball database. The last time I did this exercise, I downloaded one of the database versions. I don’t remember which format I used (the links all say “Access,” which I don’t remember downloading), but the files I used had an .sql extension. This time, I used the comma-delimited version, which downloaded a zip file containing files with a .csv extension.
I wanted to import the files directly into my database and have the import create the tables along the way, so I opened SSMS (SQL Server Management Studio), created a new Baseball database, and looked into how to do this. After poking around a bit (and a little Googling), I found that you can import a flat file by right-clicking the database of your choice (in my case, “Baseball”), clicking Tasks, and selecting Import Flat File.
Selecting this opened an Import Flat File wizard. First, it prompted me to select the input file. (Note: if you are importing multiple files, as I did for this little exercise, the wizard is smart enough to remember your last folder when you click Browse.)
When it looks at the flat file, it gives you a preview of the data that you’re importing. Since, for this exercise, I’m importing comma-delimited flat files, it was able to put my data into nice, neat columns.
Clicking “Next” brought me to a screen where I could modify the columns. I like this option a lot, as it gives me an opportunity to set up my data schema the way I want. If you’re a SQL or database newbie, I strongly suggest that you learn about primary keys and data types and take the time to set them up at this point.
In this particular example, I set my yearID to char(4), stint to int (I will likely change this to tinyint), teamID to char(3), lgID to char(2), and pretty much everything after lgID to int. I also set my first five columns as a composite primary key and everything else to be nullable.
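Before you mark columns as a composite primary key in the wizard, it can help to confirm that the CSV actually satisfies one. That means no two rows share the same key and no key column is blank. The Python sketch below performs that check. It assumes the five key columns are playerID, yearID, stint, teamID, and lgID. That is my reading of the Lahman Batting.csv header, so verify it against your file. It also assumes the file is UTF-8. `find_key_problems` and `check_csv` are hypothetical helper names and are not part of SSMS or the Lahman download.

```python
import csv
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def find_key_problems(
    rows: Iterable[dict], key_cols: Sequence[str]
) -> Tuple[List[tuple], List[int]]:
    """Check whether key_cols could serve as a composite primary key.

    Returns (duplicate_keys, blank_key_rows). Both lists must be empty for
    the key to hold: every key unique, and no key column left blank.
    """
    counts: Counter = Counter()
    blank_key_rows: List[int] = []
    # Row numbers count the header as row 1 (assumes no field spans lines).
    for row_no, row in enumerate(rows, start=2):
        key = tuple((row.get(col) or "").strip() for col in key_cols)
        if any(value == "" for value in key):
            blank_key_rows.append(row_no)
        counts[key] += 1
    duplicate_keys = [key for key, n in counts.items() if n > 1]
    return duplicate_keys, blank_key_rows


def check_csv(path: str, key_cols: Sequence[str], encoding: str = "utf-8"):
    """Hypothetical wrapper: run find_key_problems on a CSV with a header row."""
    with open(path, newline="", encoding=encoding) as f:
        return find_key_problems(csv.DictReader(f), key_cols)
```

For example, if `check_csv("Batting.csv", ["playerID", "yearID", "stint", "teamID", "lgID"])` comes back with two empty lists, the key should import cleanly. Any entries point at rows to look at before running the wizard.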
I must have set these columns up correctly, because when I ran the import, it completed without complaining.
I wish I could say that I imported all of my flat files without a hitch, but I did run into a few that didn’t run successfully the first time. Here are some of the issues that I came across.
I had opened a file in Excel to check data types and forgotten to close it, so the import complained that it couldn’t read the file because it was still open.
I miscalculated a few field sizes and got messages saying that my columns were too short (for example, setting nvarchar(10) for a column that included data with 15 characters).
There were a few cases where I simply had the wrong data type.
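You can avoid field-size errors like the ones above by measuring the data before you choose column sizes. This Python sketch reports the longest value in each column. It assumes a UTF-8 CSV with a header row, and the helper names are hypothetical. You can compare each result against the n in nvarchar(n). For ordinary text, Python’s `len()` gives the same character count that nvarchar uses.

```python
import csv
from typing import Dict, Iterable, List


def max_lengths(rows: Iterable[List[str]]) -> Dict[str, int]:
    """Return the length of the longest value seen in each column.

    The first row is treated as the header. Short rows are tolerated
    (zip stops at the shorter list); extra trailing fields are ignored.
    """
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty file
        return {}
    longest = {name: 0 for name in header}
    for row in it:
        for name, value in zip(header, row):
            longest[name] = max(longest[name], len(value))
    return longest


def max_lengths_in_file(path: str, encoding: str = "utf-8") -> Dict[str, int]:
    """Hypothetical wrapper: measure every column of a CSV file."""
    with open(path, newline="", encoding=encoding) as f:
        return max_lengths(csv.reader(f))
```

A column that reports 15 here would need at least nvarchar(15), which is exactly the mismatch described above.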
My Pitching table included a column for ERA, which surprised me: ERA (Earned Run Average, for those of you who are baseball-challenged) is a calculated statistic, like batting average, and batting average was not included in the Batting table. I set the column’s data type to float, but when I tried to import the file, it failed. When I looked at the data, I found ERA entries that said “inf” (for “infinity”)*. In this case, I did some data cleansing: I got rid of these entries and saved the flat file. It then imported with no problem.
(*Some of you might be wondering: how do you get an ERA of infinity? Answer: you give up earned runs without getting anyone out! ERA is earned runs per nine innings pitched, so with zero innings pitched, the calculation divides by zero.)
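Here is the footnote’s arithmetic in concrete form. ERA is earned runs per nine innings, ERA = 9 × ER / IP. Assume the Pitching file stores innings as outs recorded, in an IPouts column with three outs per inning. ERA then becomes 27 × ER / IPouts, and zero outs is exactly where the “inf” values come from. As far as I know, SQL Server’s float type cannot store infinity, which is presumably why the import failed. The sketch below computes ERA that way and blanks out “inf” entries, mirroring the cleanup described above. The function names are hypothetical. A blank should only import as NULL if the ERA column is nullable, so treat that as an assumption too.

```python
import csv
import math
from typing import List, Optional


def era(earned_runs: int, ip_outs: int) -> Optional[float]:
    """Earned run average from outs recorded: ERA = 27 * ER / outs.

    Zero outs with earned runs allowed is infinity (the "inf" in the CSV);
    zero outs and zero earned runs is 0/0, reported here as None.
    """
    if ip_outs == 0:
        return math.inf if earned_runs > 0 else None
    return 27 * earned_runs / ip_outs


def blank_infinite_values(rows: List[List[str]], column: str) -> int:
    """Blank out infinite entries in one column, in place; rows[0] is the header.

    Returns how many cells were changed.
    """
    idx = rows[0].index(column)
    changed = 0
    for row in rows[1:]:
        if row[idx].strip().lower() in ("inf", "-inf", "infinity"):
            row[idx] = ""  # empty field, intended to land as NULL
            changed += 1
    return changed


def clean_csv(src: str, dst: str, column: str = "ERA") -> int:
    """Hypothetical wrapper: copy src to dst with infinite values blanked."""
    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    changed = blank_infinite_values(rows, column)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return changed
```

For example, 3 earned runs over 27 outs (nine innings) gives 27 × 3 / 27 = 3.00. Two earned runs with zero outs gives infinity.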
So hopefully, you now have an idea of how to import flat files into a SQL Server database (and maybe even got a small taste of data types and primary keys). Hopefully, this little utility saves you a lot of grief the next time you need to import flat files.
I enjoy attending sporting events. My previous post got me thinking about the sports venues that I’ve visited, and I thought it’d be fun to compile that list!
A few caveats: I only list venues (along with their home teams and/or events) in which I’ve actually seen a game. For example, I’ve set foot in Michigan Stadium in Ann Arbor, but I didn’t actually see a game there, so it’s not on my list.
I don’t list opposing teams. I’ve been to so many events that I don’t remember them all. Also, for “home” arenas where I’ve seen large numbers of games, there would be too many to list anyway.
I also denote any arenas that are homes to “my teams.” While I live two hours away from Syracuse, I still consider the Carrier Dome my “home” arena. Geographically, Siena and UAlbany are only minutes away from me, and I do root for the home team in those arenas, but they’re not necessarily “my” teams or home arenas.
I only consider organized professional (major or minor league) and NCAA (any division) teams or events. Other organized amateur events (e.g., the Little League World Series or the Olympic Games) count too, although I’ve never been to one. The pickup game of touch football in the public park doesn’t count.
These are listed in no particular order, although I try to list my “home” arenas, places I’ve visited more often, and places geographically close to me first.
I mark arenas that either no longer exist or are no longer used for that sport with an asterisk (*).
All games are regular season games, unless denoted.
I have never been to an NBA, NHL, or major soccer game, which is why you don’t see them listed.
So without further ado, here’s that list.
Arenas I’ve visited
Baseball
Yankee Stadium (new), Bronx, NY — NY Yankees (my home arena), ALDS
Yankee Stadium* (old), Bronx, NY — NY Yankees (former home arena)
Joseph Bruno Stadium, Troy, NY — Tri-City ValleyCats (another home arena), NCAA Div-III tournament regional
Heritage Park*, Colonie, NY — Albany-Colonie Yankees (former home arena), Albany-Colonie Diamond Dogs
Robison Field, Troy, NY — RPI Engineers (my home field)
Fenway Park, Boston, MA — Boston Red Sox
Shea Stadium*, Queens, NY — NY Mets
Citi Field, Queens, NY — NY Mets
Kingdome*, Seattle, WA — Seattle Mariners
Safeco Field (now T-Mobile Park), Seattle, WA — Seattle Mariners
Camden Yards, Baltimore, MD — Baltimore Orioles, All-Star Game
SkyDome (now Rogers Centre), Toronto, ON — Toronto Blue Jays
MacArthur Stadium*, Syracuse, NY — Syracuse Chiefs
Alliance Bank Stadium (now NBT Bank Stadium), Syracuse, NY — Syracuse Chiefs
Olympic Stadium*, Montreal, PQ — Montreal Expos
Veterans Stadium*, Philadelphia, PA — Philadelphia Phillies
Tiger Stadium*, Detroit, MI — Detroit Tigers
Coors Field, Denver, CO — Colorado Rockies
Tropicana Field, St. Petersburg, FL — Tampa Bay Rays
Damaschke Field*, Oneonta, NY — Oneonta Yankees
East Field*, Glens Falls, NY — Glens Falls Redbirds, Adirondack Lumberjacks
Stade Canac, Quebec City, PQ — Quebec Capitales
Dwyer Stadium, Batavia, NY — Batavia Trojans
Silver Stadium*, Rochester, NY — Rochester Red Wings
Places on my wish list where I’ve never seen a game: Wrigley Field, Chicago; Dodger Stadium, Los Angeles; Oracle Park, San Francisco; Kauffman Stadium, Kansas City; Petco Park, San Diego; Nationals Park, Washington, DC; PNC Park, Pittsburgh; any Nippon Professional Baseball game in Japan
College football
Carrier Dome, Syracuse, NY — Syracuse Orange (my home arena)
ECAV Stadium, Troy, NY — RPI Engineers (my other home arena)
’86 Field*, Troy, NY — RPI Engineers (another home “arena”)
Bob Ford Field, Albany, NY — UAlbany Great Danes
Alumni Stadium, Chestnut Hill, MA — Boston College Eagles
Navy-Marine Corps Memorial Stadium, Annapolis, MD — Navy Midshipmen
Michie Stadium, West Point, NY — Army Black Knights
Veterans Stadium*, Philadelphia, PA — Temple Owls
Yale Bowl, New Haven, CT — Yale Bulldogs
MetLife Stadium, East Rutherford, NJ — Syracuse Orange (NOT my home arena!)
Giants Stadium*, East Rutherford, NJ — Syracuse Orange (also not my home arena!)
Ohio Stadium, Columbus, OH — Ohio State Buckeyes
Louisiana Superdome, New Orleans, LA — Sugar Bowl
Pontiac Silverdome*, Pontiac, MI — Cherry Bowl
Tampa Stadium*, Tampa, FL — Hall of Fame Bowl
Sun Devil Stadium, Tempe, AZ — Fiesta Bowl
Yankee Stadium, Bronx, NY — Pinstripe Bowl
Camping World Stadium, Orlando, FL — Camping World Bowl
Places on my wish list where I’ve never seen a game: Harvard Stadium, Harvard; Memorial Stadium, Clemson; Beaver Stadium, Penn State; Rose Bowl, UCLA; Michigan Stadium, Michigan; Notre Dame Stadium, Notre Dame
College basketball
Carrier Dome, Syracuse, NY — Syracuse Orange (my home arena), NCAA tournament
Manley Field House*, Syracuse, NY — Syracuse Orange (women)
RPI Armory*, Troy, NY — RPI Engineers (my other home arena)
Times-Union Center, Albany, NY — Siena Saints, MAAC tournament
Alumni Recreation Center*, Loudonville, NY — Siena Saints
SEFCU Arena, Albany, NY — UAlbany Great Danes, America East tournament
Pittsburgh Civic Arena*, Pittsburgh, PA — Pitt Panthers
Lundholm Gymnasium, Durham, NH — UNH Wildcats
Case Gym, Boston, MA — Boston University Terriers
Hubert H. Humphrey Metrodome*, Minneapolis, MN — NCAA tournament
Reunion Arena*, Dallas, TX — NCAA tournament
Madison Square Garden, New York, NY — St. John’s Red Storm, Big East Tournament, NIT Preseason Tournament
Barclays Center, Brooklyn, NY — preseason tournament
Places on my wish list where I’ve never seen a game: The Palestra, Penn; Allen Field House, Kansas; Pauley Pavilion, UCLA; Cameron Indoor Stadium, Duke
RPI has a new arena: ECAV (East Campus Athletic Village) Arena. I have yet to see a game there.
College hockey
Houston Field House, Troy, NY — RPI Engineers (my home arena)
Messa Rink, Schenectady, NY — Union Dutchmen
Times-Union Center, Albany, NY — Mayor’s Cup/Capital Skate Classic, NCAA tournament
Glens Falls Civic Center*, Glens Falls, NY — Mayor’s Cup/Capital Skate Classic
Lynah Rink, Ithaca, NY — Cornell Big Red
Starr Rink, Hamilton, NY — Colgate Raiders
Tate Rink, West Point, NY — Army Black Knights
Bright Hockey Center, Cambridge, MA — Harvard Crimson
Yale Ice Arena, New Haven, CT — Yale Bulldogs
Thompson Arena, Hanover, NH — Dartmouth Big Green
Olympic Ice Arena, Lake Placid, NY — ECAC tournament
Walter Brown Arena*, Boston, MA — Boston University Terriers
Cumberland County Civic Center (now Cross Insurance Arena), Portland, ME — Maine Black Bears
Hartford Civic Center (now XL Center), Hartford, CT — I don’t remember the event, but it featured four teams: RPI, Maine, Colgate, and a fourth team I can’t recall.
Madison Square Garden, New York, NY — Rivalry On Ice (Yale vs. Harvard)
Places on my wish list where I’ve never seen a game: Alfond Arena, Maine; Hobey Baker Rink, Princeton; Matthews Arena, Northeastern
Professional hockey
Times-Union Center*, Albany, NY — Albany River Rats, Albany Devils
Professional football (NFL)
Giants Stadium*, East Rutherford, NJ — NY Giants (my home arena)
Rich Stadium (now New Era Field), Orchard Park, NY — Buffalo Bills
Sullivan Stadium*, Foxborough, MA — New England Patriots
Veterans Stadium*, Philadelphia, PA — Philadelphia Eagles
Although I’ve been to MetLife Stadium, it was for a Syracuse game. I have yet to see the Giants there.
Canadian football (CFL)
Lansdowne Stadium*, Ottawa, ON — Ottawa Rough Riders
Arena football
Times-Union Center*, Albany, NY — Albany Firebirds
Wow, I’ve attended a lot of sporting events!
Anyway, this was a fun exercise, and a neat list to put together. I’m hoping to add to it!
Being the trip planner that I am, I mapped out my plans for this trip a while back; they had actually been in the works for months.
Planning began back in May, when I submitted my presentations. For planning purposes, whenever I submit presentations to any event, I assume that I’ll be selected to speak, even before I find out whether my submissions are accepted. As soon as I submit, the event is pretty well written into my calendar, unless either (1) I end up not getting chosen, or (2) a conflict that I can’t get out of comes up on the same date.
Ordinarily, I don’t firm up my travel plans until I know for sure that I’m selected to speak, but this time around, there were a couple of twists. First of all, I saw Thomas Grohser, one of the event’s organizers, at SQL Saturday in Albany in July. He told me that I was going to be speaking in NYC. Thomas is a friend, but his word was still not an official selection, and I wanted the official selection email in hand before I started booking my train and my hotel room.
In early August — still before I received the official acceptance notification — I got an email from Amtrak (I’m a Guest Rewards member) that included fare specials. I discovered a round-trip fare from Albany to Penn Station that was too good to pass up. Unfortunately, the deal had an expiration date, so I had to act fast. I decided to pull the trigger on it. Okay. I had a train reservation. Now I was committed to the trip, regardless of whether I was chosen to speak or not. It wasn’t a big deal; I regularly attend SQL Saturday in New York, regardless of whether or not I’m speaking.
I selected an early afternoon train to New York. I wanted to leave myself time to make the speaker’s dinner, if they had one. As it turned out, that would not be the case, as I’ll explain later on.
Now that my train was reserved, I needed to find a place to stay. My two siblings both have places down in The City, and my sister has repeatedly told me that I can use her place in Brooklyn. While I appreciate the offer, I wanted to stay someplace closer to the Microsoft office in Manhattan where SQL Saturday takes place, preferably within walking distance. Of course, as anyone who has traveled to New York City can attest, inexpensive places to stay in midtown Manhattan are nearly non-existent. It also didn’t help that the office was located near one of the world’s biggest tourist traps. (I usually try to avoid it, but that was impossible for this trip.) I checked a variety of places, including a few on Airbnb and a few that were farther away but near subway lines. I found a few places that had potential, but I kept looking.
I hit the jackpot when I tried Hotwire. They advertised a deal where I could stay at an (unnamed) midtown hotel for $109. They promised that I would be booked at one of three hotels, which they listed; the actual hotel would be revealed after I booked. I looked at their locations, decided I could live with any of them, and took the chance. I ended up getting booked at the Sheraton New York Times Square. The final damage was $173 after taxes and fees. That was more than the advertised $109, granted, but still a steal for a Sheraton in midtown Manhattan near Times Square!
At some point (I’m not quite sure when), I looked at my own speaker profile and noticed that three of my submissions were now listed as “Regular Session,” not “Submitted Regular Session.” This is usually a pretty good indication that I’ve been selected to speak, although it isn’t official. I was surprised, however, that three of them were listed. I figured that either (1) it was a mistake, (2) they were still working on the schedule, or (3) I was going to be one very busy boy on October 5!
In August, I got an email from Thomas Grohser. It was no mistake. Indeed, I had been selected to give three presentations! Thomas asked me, “let me know if this is too much or not.”
I sent him a two-word reply: “challenge accepted!”
So things were in place. My travel plans were set, and I was definitely speaking. I went about my business, waiting for the first weekend in October to arrive.
A funny thing happened along the way. I’m a big Yankee fan. The Yankees ended up winning the American League Eastern Division. At some point, I looked at the dates for the Yankees’ first two playoff games: October 4 and 5 in New York.
Hey, I was going to be in New York on October 4 and 5!
I looked into getting tickets for ALDS Game 1. They definitely weren’t cheap, but they weren’t so expensive that they would break the bank, either. The only thing that made me hesitate was that no game time had been announced. If it was an early afternoon game, there was no way that I’d be able to make it. When they announced a 7 pm start, I pulled the trigger and bought myself a ticket! I’ve been going to ballgames for years, but I had never been to a playoff game, and attending a postseason game has been on my bucket list for a long time. A weekend that was already going to be fun had just become more exciting!
At this point, all the plans were set. I only had to wait for October 4 to arrive.
Friday, October 4 arrived. My wife dropped me off at Albany-Rensselaer train station around 12:30. Other than the fact that my train, which was supposed to depart at 1:05, was about twenty minutes late, the train ride to Penn Station was uneventful. I arrived in New York around 4:00.
I took the E train to my hotel. Upon exiting the subway, I had my first (pleasant) surprise of the trip. While I was at street level, looking for my hotel, someone said hi to me. I was surprised to see that it was Michelle Gutzait, one of the SQL Saturday speakers, and her boyfriend! We spoke briefly. She was scheduled to speak at our user group in November and said she was looking forward to it. They were looking for the theater for a show they were seeing that night, while I was looking for my hotel.
Randomly bumping into Michelle on the street turned out to be the first of numerous surprises on this trip.
I found my hotel, dropped off my bags, and proceeded up to the Bronx.
Now, I’ve been a baseball fan since I was around 12 or 13. I grew up rooting for the Yankees. I’ve attended numerous regular season games, more than I can remember. However, despite all those years going to regular season ballgames, I have never been to a postseason playoff game. It’s something that’s been on my bucket list for quite some time. When I saw that the Yankees’ first two playoff games were at home at the same time I was in the City for this trip, I jumped on the opportunity and bought myself a ticket for Friday night.
Friends told me that it was a different atmosphere from a regular season game, and it did not disappoint. The atmosphere was electric, and the crowd was much louder than at a regular season game. Fans hung on nearly every pitch during the first seven innings. By the seventh inning, the Yankees had scored ten runs and held a nearly insurmountable lead. I stuck it out until the end of the game and hopped the subway back to my hotel, stopping for a couple of slices of pizza on the way (I can’t pass up genuine New York-style pizza!). It was well after midnight by the time I got back to my room, and around 1 am by the time I went to bed.
My alarm went off at 6. After hitting my snooze button a couple of times, I got up around 6:20. I rolled out of bed, showered, dressed, checked out of the hotel, and proceeded to Ellen’s Stardust Diner for breakfast.
This was the second time that I had gone to breakfast at Stardust; the first was when I spoke at NYC SQL Saturday last year. Now, I’ll say that the food at Stardust is good, but not great. If I picked a place to eat based on the food alone, Stardust would not be my first choice. However, I love Ellen’s Stardust Diner. It isn’t about the food; it’s about the experience. Stardust is known for their singing wait staff, and they put on a good show!
Amusing note: my waiter was named Kansas. Kansas is my favorite band! I told him as much, and he told me he was so named because they were also his parents’ favorite band! I hoped that he (or someone else) could sing a Kansas song before I finished my breakfast, but it wasn’t to be.
I could’ve sat there all morning and listened to the wait staff sing (and I told Kansas this), but alas, my first presentation was at 9:00. I wanted to get to Microsoft as soon as I could so I could prepare. Upon finishing my breakfast, I proceeded to the Microsoft building and SQL Saturday.
I wrote earlier about my presentations, so I won’t rehash them here. I will say that three presentations, combined with waking up at 6 am after going to bed at 1 am, made for a long and tiring day! After lunch, for the sake of my own sanity, I decided not to attend any more sessions until I presented my own. There were some couches outside the speaker room, so I attempted to take a power nap. That plan was thwarted by a security guard who kicked me awake (literally: he kicked the couch I was on) and told me, “you can’t do that here.” Sheesh.
At one point during the day, Matt sent a hilarious tweet. I got a good laugh out of it!
The fun surprises continued at the end of the day, during the closing session and raffle drawings. I was sitting in the front row. James Phillips, one of the co-organizers, was running the raffle. Since I was in the front row, he had me pick one of the winners. I stuck my hand in the bowl with the tickets, mixed them up, pulled one out, and gave it to James.
Mind you, I did not look at the ticket. Upon seeing the ticket, James shook his head and said, “I don’t believe it.”
He showed me the ticket. It had my name on it. I had pulled my own ticket! I’d won a Bluetooth speaker!
After SQL Saturday was over, I proceeded to 32nd Street, where Koreatown is located. It’s one of my favorite neighborhoods in Manhattan. As a Korean-American, I feel somewhat obligated to visit this place now and then, but as one who was born in New York State, I also feel at home when I come to this place to visit. I picked out a Korean BBQ place — one where I’d never been before — and had myself an excellent meal.
While I was waiting to be seated, a gentleman who had seen my shirt came up to me and introduced himself as a fellow Syracuse University alum. Yet another example where my clothing became a conversation piece! We spent about ten minutes talking about our alma mater before we were finally seated.
I had purposely scheduled a late train back home so that I could enjoy dinner while I was in Manhattan. After dinner, I walked the block west to Penn Station so I could catch my train.
Upon boarding the train and finding myself a seat, I heard a familiar voice say, “boy, they’ll let anybody on this train!” I turned around and saw Greg Moore sitting a couple of seats back. Yet another surprise on this trip!
Although Greg is very active in the SQL Server community, he did not attend SQL Saturday. Instead, he attended ComicCon with his daughter. (Greg wrote a nice ‘blog article about their ComicCon experience; you can read it here.) I moved back to sit across from them, but we didn’t converse much (if at all) during the ride; we were all pretty tired, and we planned to sleep on the train ride home. No matter; I see Greg often enough, anyway. (I’ll see him next week at our next user group meeting.)
I didn’t sleep well on the train; no matter how much I tried, I couldn’t get comfortable. My wife picked me up at the station, and I arrived home sometime after midnight.
Despite getting very little sleep, I had an absolute blast on this trip!
Mind you, I always have fun every time I go to a SQL Saturday, but I especially have a blast whenever I travel to New York City. It was an opportunity to get together with #SQLFamily, it was an opportunity to network, I got to practice my presentation skills (again), and as an added bonus, I got to attend a postseason baseball game! I absolutely love taking this trip, and I hope to do it again for NYC SQL Saturday next year!
This was my last scheduled SQL Saturday for 2019. I applied to speak at Boston BI SQL Saturday, but I will likely withdraw because of a conflict. There are “save-the-dates” listed for Rochester, Philadelphia, and Boston (non-BI) for next year, and I intend to apply to them once they go live. (I might also apply to Virginia Beach; we’ll see.) And, of course, our Albany group usually holds our SQL Saturday at the end of July.
I also noticed something else that was going on in The City while I was there — and I decided to jump on the opportunity!
Did I mention that I’m a big Yankee fan? As it turns out, Friday and Saturday also happen to be Games 1 and 2 of the American League Division Series: the Minnesota Twins vs. the New York Yankees. And lo and behold, I’ll actually be in The City on those days! Can we say, opportunity knocks?
The only issue was the game time. My train arrives at Penn Station at 3:45. If it was an early afternoon game, there was no way I’d be able to go. When I found out the game was at 7 pm, I went online and splurged on a ticket for Game 1.
I’ve never attended a postseason baseball game before; it’s been on my bucket list for a long time. Granted, I’d prefer a World Series game, but opportunity was knocking. The first two playoff games are in New York, and I’ll be in town when they happen!
So I will be in attendance at Yankee Stadium on Friday evening!
As I’ve written before, every now and then, you need to say, what the heck! Your professional life is important, but so is taking the time to stop and smell the roses (or, for me, to catch a ballgame). Opportunities don’t come around very often. And if one comes around, and you have the wherewithal to make it happen, then jump on it. Make it happen, and enjoy yourself!
The article was actually a project for a Writing for Publication class that I took in grad school. It was later republished as a feature article in a baseball preview issue published by The Spotlight News.
However, when I looked at the Wikipedia reference link, I realized that the link was an old one that I’d forgotten about, and didn’t know was still there! I figured I should give the article a new home. So I took my article and created a new page for it. You can find the new article page here!
The article is a neat history piece that dates back to a period around the Industrial Revolution. If you’re a baseball enthusiast (like I am), I hope you enjoy it!
During my lunch break, I was perusing the ESPN website and stumbled across this article. It contemplates whether or not a .300 hitter (in baseball, for those of you who are sports-challenged) is meaningful anymore. As a baseball fan, the article caught my attention. I didn’t read through the entire article (it ended up being a much longer read than I expected — too long for me to read while on a lunch break at work), but from what little I did glean from it, a couple of things struck me.
First, they talk about Mickey Mantle’s batting average and how important hitting .300 was to him. That struck me as a little funny because, as far as I know (as I said, I didn’t get through the entire article), there was no mention of the fact that he actually finished his career under .300. His career batting average was .298.
The second thing that struck me was (Yankees first baseman) Luke Voit saying that he “feel[s] like batting average isn’t a thing now.” Indeed, baseball is a much different game than it was, say, ten, twenty, or thirty years ago. Analytics are a big part of statistics these days. A lot of stats that are prevalent now (WAR, or wins above replacement; exit velocity; OPS, or on-base plus slugging; etc.) didn’t even exist when I was a kid growing up, closely following my Yankees. Back when I was eating and sleeping baseball, hitting was about the triple-crown statistics: batting average, home runs, and runs batted in (RBIs). But now, we have “slash lines,” on-base percentage, slugging percentage, and so on. Even as big a baseball fan as I am, I haven’t a clue about many of these “new age” stats. I still have no idea what WAR represents, I’m not completely sure what the numbers in a slash line are, and I don’t know what constitutes a respectable OPS.
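A slash line is conventionally written as batting average / on-base percentage / slugging percentage, and OPS is simply OBP plus SLG. The Python sketch below applies those standard formulas to a hitter’s counting stats. The field names are my own, and the numbers in the usage note are made up for illustration. It says nothing about what counts as a “respectable” OPS.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BattingLine:
    """Counting stats for one hitter (field names are illustrative)."""
    ab: int        # at bats
    h: int         # hits of all types
    doubles: int
    triples: int
    hr: int        # home runs
    bb: int        # walks
    hbp: int       # hit by pitch
    sf: int        # sacrifice flies


def slash_line(b: BattingLine) -> Tuple[float, float, float, float]:
    """Return (AVG, OBP, SLG, OPS) using the standard definitions."""
    if b.ab == 0:
        raise ValueError("AVG and SLG are undefined with zero at bats")
    avg = b.h / b.ab
    # On-base percentage: times on base per plate appearance
    # (sacrifice bunts are left out of the denominator by convention).
    obp = (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)
    # Slugging percentage: total bases per at bat.
    singles = b.h - b.doubles - b.triples - b.hr
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.hr
    slg = total_bases / b.ab
    return avg, obp, slg, obp + slg
```

Take made-up totals of 3 hits (a single, a double, and a home run) and 2 walks in 10 at bats. `slash_line` gives roughly .300/.417/.700, for an OPS of about 1.117.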
That got me thinking about how statistics have changed over the years, and whether that applies to statistics outside of baseball (or sports, for that matter). People who study data analytics for a living might know this better than I do, but which business statistics have a different meaning now than they did ten or twenty years ago? Are there any numbers from way back when that I should now take with a grain of salt?
I’m sure there are many examples of this outside of sports, but I struggled to come up with any. Off the top of my head, I remember how a company where I once worked made a big deal out of perfect attendance, to the point that they gave out perfect attendance awards at the end of the year. However, that encouraged people to come to work even when they were sick. Do you really want someone who’s sick coming into work? These days, workplaces do not want sick people in the office, and with the advent of work-from-home arrangements, perfect attendance isn’t so meaningful anymore. (By the way, my understanding is that the company no longer recognizes or rewards “perfect” attendance.)
So I suppose the takeaway is, how well do statistics age? Can they be compared with the same statistics now? What needs to be considered when analyzing statistics from years ago? It’s true that numbers often tell a story, but in order to get the full picture, you also need to understand the full context.
I watched this game on a TV at a restaurant where I was having dinner with my wife. I remember watching Brett Gardner getting thrown out as he was caught in a rundown between third and home. I remember thinking, “now the man on third is erased. What were you thinking, Brett?”
As the Times article points out, it ended up being a fateful decision by (Orioles pitcher) Dylan Bundy. Had he thrown the ball to the shortstop instead of his catcher, he potentially could have turned a double play to get his team out of the inning. Instead, the Yankees, with an extra life, rallied in the inning to go up by a score of 5-0 (highlighted by a Tyler Wade grand slam). The Yankees ended up winning, 9-0 (making me, a Yankee fan, happy).
But this article isn’t about the game. It’s about the instant decision. In this case, a quick decision ended up affecting the outcome of a ballgame.
Think about all the times in your life when you’ve had to make an instant decision on your feet. We’ve all had them. How did they turn out? Good? Bad? Did they end up changing the course of your life, or were they just blips on your lifetime radar screen?
I’m sure psychology has something to say about how your background (upbringing, education, etc.) might shape the kinds of split-second decisions you make, but this is a subject about which I know nothing. Rather, it got me thinking about the idea that quick decisions can have consequences. In the grand scheme of things, many of them might not have any effect. But depending on the time, place, and circumstances, such decisions could have disastrous consequences, or result in the opportunity of a lifetime.
Edit: This is the first of a series of articles (I hope!) in which I’m trying to teach myself about BI. Any articles I write on this topic, starting with this one, will have “#BI101” at the start of the title.
As I stated in a previous article, one topic about which I’m interested in learning more is business intelligence (BI). For those of you who are new to BI, it is a broad topic. In a nutshell, it can probably be described as “consuming and interpreting data so it can be used for business decisions and/or applications.”
I’ll admit that I don’t know a lot about BI (at least the fine details, anyway). I worked at a previous job where I touched upon it: I was tasked with performing some data analysis, and I was introduced to concepts such as OLAP cubes and pivot tables. I’ve gotten better at creating pivot tables (I’ve done a few of them using MS Excel), but I’ll admit that I’m still not completely comfortable with building cubes. I suppose that’ll come as I delve further into this.
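At its core, a pivot table groups rows by one field, spreads another field across the columns, and aggregates a value in each cell. This minimal Python sketch does that with a sum. It only illustrates the idea and makes no claim about how Excel or an OLAP cube implements pivots. The function name is hypothetical, and the field names in the example (teamID, yearID, HR) are assumptions modeled on the Lahman files.

```python
from collections import defaultdict
from typing import Dict, Iterable


def pivot_sum(
    rows: Iterable[dict], row_field: str, col_field: str, value_field: str
) -> Dict[str, Dict[str, float]]:
    """Minimal pivot: sum value_field for every (row_field, col_field) pair."""
    table: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for r in rows:
        # Each input row adds its value to exactly one cell of the grid.
        table[r[row_field]][r[col_field]] += float(r[value_field])
    # Convert the nested defaultdicts to plain dicts for display.
    return {key: dict(cells) for key, cells in table.items()}
```

Rows that share a team and year are added together. For example, two made-up NYA rows for 2018 with 10 and 5 home runs land in a single cell holding 15.0.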
A while back, my friend Paresh Motiwala suggested that I submit a presentation for Boston SQL Saturday BI edition. At the time, I said to him, “the only thing I know about BI is how to spell it!” He said to me (something like), “hey, you know how to spell SQL, don’t you?” Looking back at the link, I might have been able to submit (I didn’t realize, at the time, that they were running a professional development track). That said, Paresh did indeed have a point. As I often tell people, I am not necessarily a SQL expert (I know enough SQL to be dangerous), but that does not stop me from applying to speak at SQL Saturday. Likewise, as I dive further into this topic, I’m finding that I probably know more about BI than I’ve led myself to believe. Still, there is always room for improvement.
To tackle this endeavor, once again, I decided to jump into this using a subject that I enjoy immensely: baseball. Baseball is my favorite sport, and it is a great source of data for stat-heads, mathematicians, and data geeks. I’ve always been of the opinion that if I’m going to learn something new, I should make it fun!
Besides, the use of statistical analysis in baseball has exploded. Baseball analytics has been a big deal ever since Bill James introduced sabermetrics (there is some debate as to whether James has enhanced or ruined baseball). So what better way to introduce myself to BI concepts?
For starters, I came across some articles (listed below, for my own reference as much as anything else):
Since I’m using baseball to drive this concept, let’s use a baseball example to illustrate this.
Let’s say you’re (NY Yankees manager) Aaron Boone. You’re down by a run with two outs in the bottom of the 9th. You have Brett Gardner on first, Aaron Judge at bat, and you’re facing Craig Kimbrel on the mound.
What do you do? How does BI come into play here?
Let’s talk a little about what BI is. You have all these statistics available: Judge’s batting average, Kimbrel’s earned run average, Gardner’s stolen base percentage, and so on. In the years BS (“before sabermetrics”), a manager likely would have “gone with his gut,” decided that Judge was his best bet to hit the game-winning home run, and let him swing away. But is this the best decision to make?
Let’s put this another way. You have a plethora of data available at your fingertips. BI represents the ability to analyze all this data and provide information that allows you to make a good decision.
If Aaron Boone (theoretically) had this data available at his fingertips (to my knowledge, Major League Baseball bans the use of electronic devices in the dugout during games), he could use the data to consider Kimbrel’s pitching tendencies, Judge’s career numbers against Kimbrel, and so on. BI enables Boone to make the best possible decision based upon the information he has at hand.
I do want to make one important distinction. In the above paragraphs, I used the words data and information. These two words are not interchangeable. Data refers to the raw numbers that are generated by the players. Information refers to the interpretation of that data. Therein lies the heart of what BI is — it is the process of generating information based upon data.
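To make the data-versus-information distinction concrete, here is a minimal Python sketch. The counting stats are made-up numbers (they are not Judge’s actual statistics), and `BattingLine`, `batting_average`, and `on_base_percentage` are hypothetical helper names of my own. The raw counts play the role of data; the rates computed from them are information.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Raw counting stats for one hitter: this is the *data*."""
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sac_flies: int


def batting_average(line: BattingLine) -> float:
    """Hits per at-bat: one piece of *information* derived from the data."""
    return line.hits / line.at_bats if line.at_bats else 0.0


def on_base_percentage(line: BattingLine) -> float:
    """Times on base divided by the plate appearances that count toward OBP."""
    times_on_base = line.hits + line.walks + line.hit_by_pitch
    opportunities = line.at_bats + line.walks + line.hit_by_pitch + line.sac_flies
    return times_on_base / opportunities if opportunities else 0.0


if __name__ == "__main__":
    # Hypothetical season line, invented purely for illustration
    hypothetical = BattingLine(at_bats=500, hits=150, walks=80, hit_by_pitch=5, sac_flies=5)
    print(f"{batting_average(hypothetical):.3f}")     # prints 0.300
    print(f"{on_base_percentage(hypothetical):.3f}")  # prints 0.398
```

Neither number appears anywhere in the raw counts; both have to be computed, and that interpretation step is what turns the data into something a manager can act on.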
What’s there to know about BI?
I’ve already mentioned some buzzwords, including OLAP, cubes, and pivot tables. That’s just scratching the surface. There are also KPIs, reporting services, decision support systems, data mining, data warehousing, and a number of others that I haven’t thought of at this point (if you have any suggestions, please feel free to add them in the comments section below). Other than including the Wikipedia definition links, I won’t delve too deeply into them now, especially since I’m still learning about them myself.
So why bother learning about BI?
I have my reasons for learning more about BI. Among other things…
It is a way to keep myself technically relevant. I’ve written before about how difficult it is to stay up-to-date with technology. (For further reading, I highly recommend Eugene Meidinger’s article about keeping up with technology; he also has a related SQL Saturday presentation that I recommend just as highly.) I feel that BI is a subject I’m able to grasp, learn about, and contribute to. By learning about BI, I can keep myself technically valuable, even as my other technical skills become increasingly obsolete. Speaking of which…
It is a subject that interests me. I’m sure that many of you, as kids, had “imaginary friends.” (I’ll bet some adults have, too; just look at Lieutenant Kije and Captain Tuttle.) When I was a kid, I actually had an imaginary baseball team. I went as far as to create an entire roster of fictitious ballplayers, even coming up with full batting and pitching statistics for them. My star player was a power-hitting second baseman who had won MVP awards in both the National and American leagues, won several batting titles (including a Triple Crown), and led my imaginary team to three World Series championships. I figured that if my interest in statistics went that far back, there must be something behind it. Granted, now that I’m grown up, I’m not as passionate about baseball statistics as I was as a kid, but some of that interest still remains.
It is a baseline for learning new things. I’ve seen an increasing number of SQL Saturday presentations related to BI, covering topics such as Power BI, reporting services, and R. I recognize that these potentially have value for my workplace. But before I learn more about them, I need to understand the fundamentals that they are built to support. I feel that I need to learn the “language” of BI before I can learn about the tools that support it.
So, hopefully, this article makes a good introduction (for both you and myself) for talking about BI. I’ll try to write more as I learn new things. We’ll see where this journey goes, and I hope you enjoy coming along for the ride.
I am a huge baseball fan. I started following the New York Yankees when I was about 12. (In an act that is likely anathema to religious fanatics, I married a Boston Red Sox fan; during baseball season, one of us — metaphorically speaking — ends up sleeping on the couch!) I have been known to schedule vacations around Major League Baseball schedules. I believe that Cooperstown is Mecca.
Why, you might ask, is this relevant to installing SQL Server? Because I believe that one of the best ways to learn something is to have fun while you’re doing it. So in this implementation, instead of installing NorthWind Traders or AdventureWorks (the standard practice databases used by SQL Server enthusiasts everywhere), I will install a copy of Sean Lahman’s baseball database. (At the time of this article, the most recent version runs through the 2015 baseball season.)
I should note that this isn’t to say NorthWind or AdventureWorks aren’t fun; rather, baseball is something I’m passionate about, so it is more likely to keep my attention. At some point, I’ll likely install NorthWind or AdventureWorks as well, since many SQL references and guides refer to them.
I downloaded the most recent SQL version from Sean Lahman’s website, which left me with a .ZIP file. Inside it, I found an MSSQLMASTER.SQL file (you’ll find it in \mysql core\core\), which I extracted and opened with SSMS.
The first thing I noticed was that the script created a database called “stats.” I did not want to use this as my database name, so I first replaced all instances of “stats” with “baseball.” Once this was done, I ran the database build query.
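A blind find-and-replace can also hit words that merely contain “stats.” Here is a minimal Python sketch of a safer rename, assuming the database name appears as a standalone token in the script (for example `[stats]`, `USE stats`, or a file name like `stats_log.ldf`). The function name `rename_database` and the sample script text are hypothetical; I haven’t checked them against the actual contents of MSSQLMASTER.SQL.

```python
import re


def rename_database(script: str, old: str, new: str) -> str:
    """Replace the database name `old` with `new` wherever it stands alone.

    The match is case-sensitive and skips occurrences embedded in a longer
    alphanumeric word (e.g. "playerstats" or "statsID"). Underscores and
    punctuation count as separators, so "stats_log" becomes "baseball_log".
    """
    pattern = re.compile(r"(?<![A-Za-z0-9])" + re.escape(old) + r"(?![A-Za-z0-9])")
    # A function replacement keeps any backslashes in `new` from being
    # interpreted as regex escape sequences.
    return pattern.sub(lambda _match: new, script)


if __name__ == "__main__":
    sample = "CREATE DATABASE [stats]\nGO\nUSE stats;\n-- playerstats is untouched\n"
    print(rename_database(sample, "stats", "baseball"))
```

To use it on the real file, you would read the script into a string (for example with `pathlib.Path.read_text`), pass it through `rename_database`, and write the result back out before opening it in SSMS.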
Unfortunately, it would not be that easy. The query file is very large. When I ran the query, I received the following message.
Okay. The script has a lot of rows to insert (the data includes over a hundred years of statistics, after all), so I opted to create just the database and tables instead. However, even that proved to be problematic.
“Okay,” I said to myself, “what’s going on now?”
First, I had a permission issue to deal with (this doesn’t appear in the above log; I neglected to include that message). I looked for the C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA folder. When I tried to open it, I got a message saying that I had insufficient permissions. “What?” I said. “This is my own machine, and I’m the administrator!” However, the message also asked whether I wanted to open the folder anyway. I answered yes. Once I did, I tried running the query again, and the permission error seemed to disappear.
However, that’s when I got the errors shown above.
That’s when I noticed that the file paths in the script referenced MSSQL11, while my database path includes MSSQL13. Okay. I changed MSSQL11 to MSSQL13 in those paths and reran the query. This time, it ran successfully.
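That path fix can be scripted too. The sketch below is a minimal Python version that only touches lines containing a `FILENAME` clause, so other mentions of MSSQL11 (in comments, say) are left alone. It assumes each `FILENAME = N'...'` clause sits on a single line of the script, and `retarget_data_paths` is a hypothetical helper name; the defaults are simply the two folder names from this post.

```python
def retarget_data_paths(script: str, old: str = "MSSQL11", new: str = "MSSQL13") -> str:
    """Swap the instance folder name inside FILENAME clauses only."""
    fixed_lines = []
    # keepends=True preserves the original line endings in the output
    for line in script.splitlines(keepends=True):
        # Only rewrite lines that declare where a database file lives
        if "FILENAME" in line.upper():
            line = line.replace(old, new)
        fixed_lines.append(line)
    return "".join(fixed_lines)


if __name__ == "__main__":
    sample = (
        "CREATE DATABASE [baseball] ON PRIMARY\n"
        "(NAME = N'baseball', FILENAME = N'C:\\Program Files\\Microsoft SQL Server"
        "\\MSSQL11.MSSQLSERVER\\MSSQL\\DATA\\baseball.mdf')\n"
    )
    print(retarget_data_paths(sample))
```

If you aren’t sure which folder your own instance uses, running `SELECT SERVERPROPERTY('InstanceDefaultDataPath');` in SSMS should return the default data directory on SQL Server 2012 and later.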
I did notice one thing about the table structure. All columns were defined as NULL, which suggested to me that no primary keys or unique indexes were defined (a primary key column can’t be nullable). That meant all of the tables would be heaps. That’s something I’d need to fix, but it was likely a project for another time. For now, I was more concerned with building the database infrastructure and inserting the data.
Once my database and tables were built, I commented out the CREATE DATABASE and CREATE TABLE lines from the code, leaving only the INSERT commands. Just for grins, I tried running the query again, and once again got the “insufficient memory” message. Okay, no dice. How was I going to get around this? I suppose I could’ve selected and run the INSERT queries piecemeal, but that was going to be a long and painful process.
I ran a Google search and came across my old friend, StackOverflow, where I found this entry. So I tried the command line, substituting my database name and SQL filename (including the path).
sqlcmd -d baseball -i Downloads\mssqlmaster.sql
Here’s what I got:
sqlcmd: Error: Connection failure. SQL Native Client is not installed correctly. To correct this, run SQL Server Setup.
Interesting. I went to look up SQL Server 2016 Native Client and found this, which indicated that SQL Server 2016 does not include Native Client. So I tried to download and install Native Client.
Of course, nothing ever goes as planned. I got this message.
So, I went into my Settings to check my Native Client. It told me that 2012 Native Client was already installed.
After Google-searching my error messages, I realized that I had SQL Server 2008 Express installed, and I suspected that SQLCMD was getting confused between versions. So I uninstalled SQL Server 2008 and tried my SQLCMD command line again. This time, SQLCMD did not give me any errors, and my command prompt started scrolling with the familiar “(1 rows affected)” message that appears when SQL Server inserts rows into tables. The script finished without any problem.
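I never confirmed exactly how the versions got mixed up, but one plausible mechanism (my assumption, not something I verified) is the PATH: when you type `sqlcmd` at a command prompt, Windows runs the first `sqlcmd.exe` it finds along the PATH directories, so an older SQL Server 2008 tools folder listed first would win. Windows’ own `where sqlcmd` command lists every match. The minimal Python sketch here does the same thing, and `find_all_on_path` is a hypothetical helper name of my own.

```python
import os
from typing import List, Optional


def find_all_on_path(program: str, path: Optional[str] = None) -> List[str]:
    """Return every file named `program` or `program.exe` on the search path.

    Results are listed in PATH order, so the first entry is the one a
    command prompt would most likely run.
    """
    search_path = os.environ.get("PATH", "") if path is None else path
    candidates = [program, program + ".exe"]
    matches: List[str] = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for name in candidates:
            full_path = os.path.join(directory, name)
            if os.path.isfile(full_path) and full_path not in matches:
                matches.append(full_path)
    return matches


if __name__ == "__main__":
    for hit in find_all_on_path("sqlcmd"):
        print(hit)
```

If this prints more than one location, the first one is the likely culprit; reordering PATH or removing the old tools (as I did by uninstalling SQL Server 2008) should leave only the version you want.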
Once the script finished, I ran a query to see if my data had made it into the tables.
So it appears that we are good to go! Our baseball database is populated!
I now have a baseball database with which I can play to my heart’s content. I’ll be using this to practice my SQL skills — and ‘blog about my adventures (or misadventures, as the case may be) as I do so.