It was the best of times, it was the worst of times

Steve Jones recently asked: what were your best and worst days at work?  He also issued a challenge to write about our typical days at work.  As for the typical workday, I have the first of four (per Steve’s challenge instructions) draft articles warming up in the bullpen; hopefully, I’ll crank that one out sometime within the next week or so.  But in the meantime, for this article, I want to take a moment to address the best and worst days.

In terms of the worst day, I don’t think there is any contest.  I think it’s pretty safe to say that 9/11 was my worst day at work.  (For the benefit of those of you who don’t feel like clicking my article link: I worked for a company that had an office in the World Trade Center when 9/11 happened.)

In terms of the best days, however, that requires a little more thought.  It’s not that I haven’t had any great days — I’ve had a number of them — it’s just that, having been a working professional for (cough! cough!!!) years, picking out a few that stand out is difficult for me to do.  So what I’ll do is pick out a few project victories from over the course of my career.  Granted, I’m picking these off the top of my head, and there very well might be others that were more significant that I’m not remembering right now, but for purposes of this exercise, I’ll write about some projects in which I was involved and in which I take a measure of pride.

I’ll start with a project related to the worst day that I mentioned above.  One of my tasks was to maintain an inventory of the servers in our data centers.  For a long time, this was a tediously manual task.  I went through our data centers with a clipboard, checking to see what servers were in each rack, noting any changes, and adding new servers and racks that I found.  I drew a map of each room in Visio, even going as far as to count the floor tiles so that I could draw them to scale.  I populated the maps with boxes representing server racks and came up with a naming convention tied directly to each rack’s row-and-column location, making it possible to identify and label each rack so it would be easy to find.  Included with the maps was a listing of the servers within each rack.  I maintained the Visio file on my PC, making sure it was backed up to a departmental file server and keeping hardcopies in each server room as a reference for various IT workers.

Because this was a manual process, the maps were never completely accurate — I have no doubt that new servers were continually added between map updates — and keeping them current was tedious.  All the while, I kept thinking, “there has to be a better way to do this.”  Sure enough, I found one!

I discovered that all our servers included a product called Insight Manager.  Among other things, it included the ability to collect server BIOS information and store it in a format suitable for importing into a database such as SQL Server.  Using the Insight Manager data structure as a template, I set up a SQL Server database on one of our departmental servers and created a system that enabled it to import data from any server on demand through Insight Manager.  I now had a central database with server data that could be updated at any given time!
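To give you a rough idea of what that looked like, here’s a minimal sketch of that kind of back-end; the table and column names are my own invention for illustration, not the actual Insight Manager data structure:

    -- Hypothetical sketch of a server inventory back-end (T-SQL).
    -- The real schema followed the Insight Manager export format;
    -- these names are illustrative only.
    CREATE TABLE dbo.ServerInventory (
        ServerName   varchar(64) NOT NULL PRIMARY KEY,
        SerialNumber varchar(32) NULL,
        BiosVersion  varchar(32) NULL,
        IpAddress    varchar(15) NULL,
        RackLabel    varchar(16) NULL,  -- ties into the row-and-column rack naming scheme
        LastUpdated  datetime    NOT NULL DEFAULT (GETDATE())
    );

    -- An on-demand import might land the collected BIOS data in a staging
    -- table (loaded from the Insight Manager export) and add what's new:
    INSERT INTO dbo.ServerInventory (ServerName, SerialNumber, BiosVersion, IpAddress, RackLabel)
    SELECT src.ServerName, src.SerialNumber, src.BiosVersion, src.IpAddress, src.RackLabel
    FROM   staging.InsightImport AS src
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.ServerInventory AS inv
                       WHERE inv.ServerName = src.ServerName);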

Of course, data isn’t information unless it can be interpreted and understood, so the next step was to create an interface for it.  I was responsible for maintaining a departmental intranet site; although it was internal, I treated it as though it were a full-blown web site.  I created a web site to display the server data stored in my back-end.  I took my Visio server room maps and created image files from them.  From the image files, I created image maps that enabled a user to click a server rack on the map, drilling down to a list of servers in that rack.  Clicking on a server displayed data for that server — serial numbers, IP addresses, applications, and so on.
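For the curious, the drill-down pages boiled down to simple lookups against that back-end.  These queries are a reconstruction from memory against the hypothetical table above, with made-up rack and server names:

    -- Clicking a rack on the image map would run something like this,
    -- with the rack label passed in from the link:
    SELECT ServerName, IpAddress
    FROM   dbo.ServerInventory
    WHERE  RackLabel = 'B07'
    ORDER BY ServerName;

    -- Clicking an individual server would then fetch its detail row:
    SELECT ServerName, SerialNumber, BiosVersion, IpAddress, RackLabel, LastUpdated
    FROM   dbo.ServerInventory
    WHERE  ServerName = 'WEB01';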

My server inventory system, which I previously had to update manually, was now automated!

This project was a major milestone for my career.  It was my first significant foray into SQL Server.  (At that time, I hadn’t yet learned about data normalization; had I known about it, I could’ve made the back-end even better.)  It gave me some experience with image maps, HTML, and classic ASP (the technology used on that intranet server at that time).  Most of all, it was my first taste of what it was like to be a web applications developer.

Memories of this project also reminded me of another good day I had on the job.  One particular day, I traveled to remote offices in Yorktown Heights and Middletown to survey their data centers for the server inventory system that I just described.  I hopped into my truck (I owned a small Toyota pickup at the time) and drove to Yorktown.  It was a gorgeous, picture-perfect day; the sun was out, and it was comfortably warm with low humidity.  It was comfortable enough that I left the air conditioner off and drove the entire trip with my windows rolled all the way down.  It was the kind of day that made me wish I owned a convertible!  My route between the Yorktown and Middletown offices took me through the Bear Mountain area, including crossing the Hudson River over the Bear Mountain Bridge.  If you’ve never driven through that area of New York State, it is an absolutely picturesque and beautiful drive.  The entire trip was so much fun and so relaxing that it did not feel like a business trip at all; I actually felt as though I were on vacation!

Finally, I want to talk about one last project in which I had a hand.  In one of my first jobs out of college (my second job out of school, actually), I was responsible for supporting my company’s document imaging system at a client site (a client that eventually ended up hiring me directly).  One of the system’s components was a WORM optical platter jukebox that was in constant use and occasionally needed operations maintenance, which ranged from simple tasks such as inserting an optical platter to complex ones such as restarting it when it locked up.  The device had to be operational 24/7, even at hours when we were not in the office.

I was tasked with putting together a small set of instructions — nothing big, just a few pages — that explained how to perform these tasks, including the correct way to insert a platter, what to do when (unfortunately, not if) the machine stopped working, and so on.  It needed to be written in such a way that the night maintenance staff could maintain the device.  So I sat down in front of my PC with a blank MS Word file and said to myself, “if I were a member of the night staff, what would I want to read that would enable me to perform these tasks?”

I had intended for the document to be a simple three- or four-page quick reference.  As I wrote and came up with more ideas that would help the readers who would be using it (including the innovative — for me — use of illustrations), the document kept getting bigger and bigger.  I don’t remember how big it eventually became, but I think it was somewhere in the ballpark of thirty pages.  I had absolutely zero technical writing experience at the time; I wrote the document completely by instinct.  I didn’t really know what I was doing; I just did things (illustrations, headers, subject organization, etc.) that made sense to me.

The final product was a huge success — so much so, in fact, that management developed a training program based around this document.

A couple of years later, I discovered that nearby Rensselaer Polytechnic Institute had a master’s degree program in technical communication (note: I am not entirely sure if the program still exists).  I had been interested in pursuing a graduate degree, and I thought the program sounded interesting, so I decided to apply.  During my application interview with the faculty, I brought along a copy of that jukebox operations document I’d written.  I explained that I’d written it completely by instinct, with no knowledge of or experience in technical writing whatsoever.  The faculty seemed to be impressed with my effort on that project.

I was accepted into the program.  I now have an MS from RPI hanging on my home office wall and listed on my resume.

This article ended up being a lot longer than I expected.  Looking back on this exercise that Steve assigned, I suppose my good days at work were more significant than I thought.  These projects were major events that ended up shaping my professional career.  I suppose the moral of the story is not to underestimate your job achievements.  You never know where they might lead!


Is Your DR Plan Complete?

Here’s another article reblog, this time from my friend, Andy Levy. Disaster recovery is a big deal, and you need to make sure that you’re prepared.

Don’t think a disaster can’t happen to you? Well, it happened to me.

The Rest is Just Code

Kevin Hill (b|t) posted a thought-provoking item on his blog last week about Disaster Recovery Plans. While I am in the 10% who perform DR tests for basic functionality on a regular basis, there’s a lot more to being prepared for disaster than just making sure you can get the databases back online.

You really need to have a full-company business continuity plan (BCP), of which your DR plan is an integral part. Here come the Boy Scouts chanting “Be Prepared!”

When disaster strikes:

  • How will you communicate it to your customers, including regular status updates?
  • How will you communicate within the company?
  • Do you have your systems prioritized so that you know what order things have to be brought online? Which systems can lag by a day or two while you get the most critical things online?
  • Do you have contingency plans for all of…


SQL Saturday #741 Albany — the debrief

Once again, we had a great SQL Saturday here in Albany this past weekend!  I don’t know how many people attended, but by my estimate, we had well over two hundred people.  I had a great time (as usual), and it serves to remind me why this is one of my favorite events.

My own presentation went very well!  I had nine people attend my session.  Two people (not including myself) were in the room when I started, but more people started filing in after I began my presentation.  Everyone I spoke with told me it was a great presentation, and I got nothing but positive feedback!

I did, however, get one piece of feedback that I did not expect.  I spoke with a number of people who did not attend my presentation, and one who almost skipped it (but ended up coming, anyway).  They all told me the same thing: they read my presentation title and automatically assumed that my talk was about computer networking, not business networking.  Had they known that, they all told me, they would have been there.  So I missed out on a potentially larger audience, simply because of my presentation title.

My initial reaction was frustration.  My first thought was, “read the presentation abstract, people!”  I didn’t want to change the title; I liked it and thought it was clever (I’d taken it from one of my ‘blog articles with the same name).  At the same time, I couldn’t ignore the feedback; it’s human nature to judge a book by its cover (or, in this case, its title), and first impressions are important.  So, reluctantly, I renamed my presentation.  I also made a slight tweak to my slides, removing one slide that did not serve much purpose during the presentation.

I’d be remiss if I talked about a great event without mentioning other great presentations.  I attended James Serra‘s session on presenting.  I’m always looking to improve my own presentation skills, and I got a lot out of James’ presentation (as always).  Not only did I pick up a few new tips, but he also reinforced some points that I use in my own presentations.  This is always good to hear; it legitimizes things that I discuss.  I also sat in on a less serious presentation by Thomas Grohser.  At the end of the day, a little humor is a good thing.  James and Thomas are both excellent speakers, and I always recommend them.  I’d also heard great things about a session I missed, presented by my friend Deborah Melkin, another wonderful speaker whom I highly recommend.

Unfortunately, part of the reason why I missed Deborah’s session (and others) was an issue that needed my attention.  When I arrived at the event site that morning, one of my tires went flat.  I didn’t have the time to address it, and I didn’t feel much like dealing with it, especially on a warm and humid day.  Fortunately, I have a AAA membership.  Let me tell you, that membership came in handy that afternoon.

Of course, there were also the events around SQL Saturday.  Friday night was the speakers’ dinner, held at a Mediterranean restaurant about ten minutes away from my office.  I’ve driven past the place many times and had no idea it was there.  It was good, and I’m going to keep it in mind!  The closing ceremony and accompanying raffles are always fun.  My wife’s name was actually drawn for a prize!  Unfortunately, she couldn’t claim it, simply because she was not there, and the rules stipulate that you need to be present to win!  (Some people said that I should claim it by proxy, but rules are rules!)  The post-event party was held at a place across the street, a great opportunity to mix and mingle with fellow speakers and event volunteers in a loose and casual atmosphere!

All in all, it was a great event!  This is one of my favorite events, and I look forward to doing this again next year!

SQL Saturday #741, Albany, NY — Come to upstate New York!

On Saturday, July 28 (a week from tomorrow as I write this), our local Albany-area SQL user group will host our fifth SQL Saturday!  I have participated in all five; I worked as a volunteer at the first one, and I presented at the other four (including next Saturday).  This is one of my favorite events, and I look forward to it every single year!

This year, I am debuting a brand-new presentation: “Networking: it isn’t just for breakfast anymore,” based upon my ‘blog article of the same name.  (An alternate name for it could be “Networking 101: networking for beginners.”)  This presentation is primarily for people who want to get better at networking but don’t know how to do it, although seasoned veterans might be able to get something out of it as well.  It’s one of the first sessions of the day (8:30 am!), so come early!

As much as I promote my own presentations, mine is not the only one on the docket.  Many wonderful speakers are presenting at this event, and I encourage you to come out and attend as many sessions that interest you as you can!

SQL Saturday is always a great time, a great opportunity for free learning, and a great opportunity to network with data professionals.  The Capital District region here in upstate New York has been my home for many years.  I hope to see you on my home turf!

#BI101: Some SQL Saturday speakers to check out

This is part of a series of articles in which I’m trying to teach myself about BI.  Any related articles I write are preceded with “#BI101” in the title.

As a speaker on the SQL Saturday circuit, I’ve had the honor and privilege of having met, connected with, and even befriended a number of experts in SQL, data, and BI.  If you can get to a SQL Saturday, you can have that opportunity as well.

In a couple of weeks (July 28), we will be hosting SQL Saturday here in Albany, NY.  I was going through the schedule, and noticed a number of speakers on the docket who will be talking about various BI topics.  I’ve attended a lot of their sessions, and I recommend these speakers highly!

(Note: for purposes of this article, I am limiting this list to BI topics, although these speakers may be giving other presentations as well.)

SQL Saturday is a great free learning resource, a great opportunity to network, and is always a good time!  If you’re looking to learn about BI or other data-related or professional topics, go check out a SQL Saturday event near you!

#BI101: Introduction to data warehousing

This is part of a series of articles in which I’m trying to teach myself about BI.  Any related articles I write are preceded with “#BI101” in the title.

Because this is a new (to me) topic, it’s possible that what I write might be inaccurate.  I invite you to correct me in the comments, and I will make it a point to edit this article for accuracy.

As part of my personal education about business intelligence, I looked into the SkillSoft training made available to me through my employer, and it turned out that BI was one of the options.  My strategy is to take the course, perform some supportive research, and write about what I learn.  So, I kicked off the course and began my training.

The initial topic discussed the concept of data warehousing.  The course began with a pre-test — a proverbial “how much do I really know?”  There were a few subtopics that were familiar to me — normalization and denormalization, for one — so my initial thought was, how much do I have to learn?  As it turned out, the answer was, a lot.

What is a data warehouse, and what does it do?

The short and simple answer is that a data warehouse is, as the name implies, a storage repository for data.

That’s the short answer.  The longer answer gets a little more complicated.

Data warehouse architecture (source: Wikipedia)

I learned that many of the concepts behind a data warehouse break a lot of what I thought I knew about relational databases.  For starters, I’ve always been under the impression that all relational databases needed to be normalized.  However, I learned that, in the case of a data warehouse, that isn’t necessarily true.  While a normalized database is designed to minimize data redundancy, a denormalized data table stores redundant data deliberately.  Although it occupies more space, the redundant data reduces the number of table joins, thus reducing query time (as the SkillSoft lesson put it, sacrificing storage space for speed).  When a data warehouse stores millions of rows of data, reducing the number of joins in a query can be significant.
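To make the trade-off concrete, here’s a hedged example with hypothetical tables.  The two queries return the same report; the first pays for its lean storage with joins, while the second pays for its join-free read with redundant storage:

    -- Normalized: context is assembled at query time through joins.
    SELECT c.CustomerName, p.ProductName, s.Quantity
    FROM   dbo.Sales     AS s
    JOIN   dbo.Customers AS c ON c.CustomerID = s.CustomerID
    JOIN   dbo.Products  AS p ON p.ProductID  = s.ProductID;

    -- Denormalized: the same context is stored redundantly on every row,
    -- sacrificing storage space for speed.
    SELECT CustomerName, ProductName, Quantity
    FROM   dbo.SalesFlat;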

Populating a data warehouse

How does data get into a data warehouse?  It turns out that a data warehouse can have multiple data sources — SQL Server, Oracle, Access, Excel, flat files, and so on.  The trick is getting data from all of those sources into one place.  This is where the integration layer comes in.

Because the SkillSoft course I took was SQL Server-specific, it focused on SSIS.  SSIS provides tools for connecting to multiple data sources, as well as tools for ETL (Extract, Transform, Load).  ETL is the process of obtaining data from the various sources and processing it for data warehouse storage.  A big part of ETL is data cleansing: formatting data so it is usable and consistent.  For example, imagine that several data sources use different formats for the same Boolean data field.  One uses “1” and “0”, another uses “T” and “F”, another uses “Yes” and “No”, and so on.  Data cleansing converts these fields into a single, consistent format that the data warehouse can use.  As there are many such fields, ETL is often a very long and involved process.
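As a minimal sketch of that Boolean example, the cleansing step might look like the following in T-SQL.  The staging table and column names are hypothetical, and in SSIS this kind of mapping would more likely live in a Derived Column transform; I’m writing it as a query because it reads clearly:

    -- Map every source system's Boolean convention onto a single format.
    SELECT SourceSystem,
           CASE
               WHEN IsActiveRaw IN ('1', 'T', 'Yes') THEN 1
               WHEN IsActiveRaw IN ('0', 'F', 'No')  THEN 0
               ELSE NULL  -- anything unrecognized gets flagged for review
           END AS IsActive
    FROM   staging.SourceFeed;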

Just the facts, ma’am…

I had heard of fact and dimension tables before, but I wasn’t entirely sure about their application until I started diving into BI.  In a nutshell, facts are raw data, while dimensions provide context to the facts.

To illustrate facts and dimensions, let me go back to one of my favorite subjects: baseball.  How about Derek Jeter’s hitting statistics?  Clicking “Game Log” displays a list of game-by-game statistics for a given season (the link provided defaults to the 2014 season, Jeter’s last).  Looking at his last home game on Sept. 25, Jeter accumulated two hits, including a double, driving in three runs, scoring once, and striking out once in five at-bats.  These raw single-game numbers are the facts, and a fact table stores each game’s statistics in a single table.  Dimensions provide context to those numbers: information about Jeter himself (name, hometown, birth date, etc.), the game date, the opponent, and so on, each stored in its own dimension table.  Derived statistics, such as total accumulated statistics for a season or batting average, are then calculated by aggregating the facts.
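Here’s a hedged sketch of how that Jeter example might be modeled; all of the table and column names are my own invention, not taken from any real warehouse:

    -- Dimension: context about the player (a fuller model would add
    -- dimensions for the game date, opponent, ballpark, and so on).
    CREATE TABLE dbo.DimPlayer (
        PlayerKey int         NOT NULL PRIMARY KEY,
        FullName  varchar(64) NOT NULL,
        Hometown  varchar(64) NULL,
        BirthDate date        NULL
    );

    -- Fact: the raw single-game numbers.
    CREATE TABLE dbo.FactBattingGame (
        PlayerKey    int     NOT NULL REFERENCES dbo.DimPlayer (PlayerKey),
        GameDate     date    NOT NULL,
        AtBats       tinyint NOT NULL,  -- 5 in the Sept. 25 game
        Hits         tinyint NOT NULL,  -- 2
        Doubles      tinyint NOT NULL,  -- 1
        RunsBattedIn tinyint NOT NULL,  -- 3
        Runs         tinyint NOT NULL,  -- 1
        Strikeouts   tinyint NOT NULL   -- 1
    );

    -- Derived statistics (season totals, batting average) come from
    -- aggregating the facts:
    SELECT p.FullName,
           SUM(f.Hits) AS SeasonHits,
           CAST(SUM(f.Hits) AS decimal(6,3)) / NULLIF(SUM(f.AtBats), 0) AS BattingAvg
    FROM   dbo.FactBattingGame AS f
    JOIN   dbo.DimPlayer       AS p ON p.PlayerKey = f.PlayerKey
    GROUP BY p.FullName;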

If you’re still confused about fact and dimension tables, I found this question on StackOverflow that does a pretty good job of answering it.  I also came across this article, which describes fact and dimension tables well.

Stars and snowflakes

Star schema (source: Wikipedia)

A fact table usually maintains relationships with a number of dimension tables, connecting to each of them through a foreign key.  The visual table design usually resembles a star, in which the fact table sits at the center and the dimension tables branch out from it.  For this reason, this structure is called a star schema.
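Extending the hypothetical baseball tables from above, the “star” is simply the fact table carrying a foreign key out to each dimension:

    -- Two more hypothetical dimensions; the fact table at the center
    -- points outward to each of them, forming the star.
    CREATE TABLE dbo.DimDate (
        DateKey  int  NOT NULL PRIMARY KEY,
        GameDate date NOT NULL
    );

    CREATE TABLE dbo.DimOpponent (
        OpponentKey int         NOT NULL PRIMARY KEY,
        TeamName    varchar(64) NOT NULL
    );

    CREATE TABLE dbo.FactBatting (
        PlayerKey   int     NOT NULL REFERENCES dbo.DimPlayer   (PlayerKey),
        DateKey     int     NOT NULL REFERENCES dbo.DimDate     (DateKey),
        OpponentKey int     NOT NULL REFERENCES dbo.DimOpponent (OpponentKey),
        AtBats      tinyint NOT NULL,
        Hits        tinyint NOT NULL
    );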

A snowflake schema is related to the star schema.  The main difference is that the dimension tables are further normalized.  The additional foreign key relationships produce a schema that visually resembles a snowflake.

Snowflake schema (source: Wikipedia)
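Continuing the hypothetical sketch, snowflaking might pull the opponent’s league out of DimOpponent into its own table, so the dimension itself now joins outward:

    -- Normalizing a dimension one level further (names still hypothetical):
    CREATE TABLE dbo.DimLeague (
        LeagueKey  int         NOT NULL PRIMARY KEY,
        LeagueName varchar(32) NOT NULL  -- e.g., 'American', 'National'
    );

    ALTER TABLE dbo.DimOpponent
        ADD LeagueKey int NULL REFERENCES dbo.DimLeague (LeagueKey);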

Attention, data-mart shoppers…

A data mart can probably be considered either a smaller version of a data warehouse, or a subset of a data warehouse (a “dependent” data mart, according to Wikipedia).  As I understand it, a data warehouse stores data for an entire corporation, organization, or enterprise, whereas a data mart stores data for a specific business unit, division, or department.  Each business unit utilizes data marts for various information purposes, such as reporting, forecasting, and data mining.  (I’ll likely talk about these functions in a separate article; for our purposes, they go outside the scope of this article.)
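As a rough illustration under the same hypothetical schema, a dependent data mart might be carved out of the warehouse as a department-specific subset:

    -- A scouting department's mart: only the slice of the warehouse this
    -- business unit needs (assumes a 'mart' schema has been created).
    SELECT f.DateKey, f.PlayerKey, f.AtBats, f.Hits
    INTO   mart.ScoutingBatting
    FROM   dbo.FactBatting AS f
    JOIN   dbo.DimDate     AS d ON d.DateKey = f.DateKey
    WHERE  d.GameDate >= '2014-01-01';  -- only recent seasons, for example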

So, this wraps up what I’ve learned about data warehousing (so far).  Hopefully, you’ve learned as much reading this article as I have writing it.  And hopefully, you’ll keep reading along as I continue my own education in BI.  Enjoy the ride.


#BI101: An introduction to BI using baseball

Edit: This is the first of a series of articles (I hope!) in which I’m trying to teach myself about BI.  Any articles I write that are related to this, starting with this one, will be preceded with “#BI101” in the title.

As I stated in a previous article, one topic about which I’m interested in learning more is business intelligence (BI).  For those of you who are new to BI, it is a broad topic.  In a nutshell, it can probably be described as “consuming and interpreting data so it can be used for business decisions and/or applications.”

I’ll admit that I don’t know a lot about BI (at least the fine details, anyway).  I did work a previous job where I touched upon it; I was tasked with performing some data analysis, and I was introduced to concepts such as OLAP cubes and pivot tables.  I’ve gotten better at creating pivot tables — I’ve done a few of them using MS Excel — but I’ll admit that I’m still not completely comfortable with building cubes.  I suppose that’ll come as I delve further into this.

A while back, my friend, Paresh Motiwala, suggested that I submit a presentation for Boston SQL Saturday BI edition.  At the time, I said to him, “the only thing I know about BI is how to spell it!”  He said to me (something like), “hey, you know how to spell SQL, don’t you?”  Looking back at the link, I might have been able to submit (I didn’t realize, at the time, that they were running a professional development track).  That said, Paresh did indeed have a point.  As I often tell people, I am not necessarily a SQL expert — I know enough SQL to be dangerous — but that does not stop me from applying to speak at SQL Saturday.  Likewise, as I dive further into this topic, I’m finding that I probably know more about BI than I’ve led myself to believe.  Still, there is always room for improvement.

To tackle this endeavor, I once again decided to jump in using a subject that I enjoy immensely: baseball.  Baseball is my favorite sport, and it is a great source of data for stat-heads, mathematicians, and data geeks.  I’ve always been of the opinion that if I’m going to learn something new, I should make it fun!

Besides, the use of statistical analysis in baseball has exploded.  Baseball analytics has been a big deal ever since Bill James introduced sabermetrics (there is some debate as to whether James has enhanced or ruined baseball).  So what better way to introduce myself to BI concepts?

For starters, I came across some articles (listed below, for my own reference as much as anything else):

I also posted a related question in the SSC forums.  We’ll see what kind of responses (if any) I get to my query.

Let’s start with the basics — what is BI?

Since I’m using baseball to drive this concept, let’s use a baseball example to illustrate this.

Let’s say you’re (NY Yankees manager) Aaron Boone.  You’re down by a run with two outs in the bottom of the 9th.  You have Brett Gardner on first, Aaron Judge at bat, and you’re facing Craig Kimbrel on the mound.

What do you do?  How does BI come into play here?

Let’s talk a little about what BI is.  You have all these statistics available — Judge’s batting average, Kimbrel’s earned run average, Gardner’s stolen base percentage, and so on.  In years BS — “before sabermetrics” — a manager likely would have “gone with his gut,” decided that Judge was his best bet to hit the game-winning home run, and let him swing away.  But is that the best decision to make?

Let’s put this another way.  You have a plethora of data available at your fingertips.  BI represents the ability to analyze all this data and provide information that allows you to make a good decision.

If Aaron Boone (theoretically) had this data available at his fingertips (to my knowledge, Major League Baseball bans the use of electronic devices in the dugout during games), he could use the data to consider Kimbrel’s pitching tendencies, Judge’s career numbers against Kimbrel, and so on.  BI enables Boone to make the best possible decision based upon the information he has at hand.
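To make that concrete, the “information” Boone needs might come from a query like the following.  The table and its columns are hypothetical; the point is turning raw at-bat data into information a manager can act on:

    -- Career numbers for Judge vs. Kimbrel, summarized from raw at-bats
    -- (dbo.AtBats is a hypothetical table of plate-appearance outcomes).
    SELECT COUNT(*) AS PlateAppearances,
           SUM(CASE WHEN Outcome = 'HR' THEN 1 ELSE 0 END) AS HomeRuns,
           SUM(CASE WHEN Outcome = 'K'  THEN 1 ELSE 0 END) AS Strikeouts
    FROM   dbo.AtBats
    WHERE  Batter  = 'Aaron Judge'
    AND    Pitcher = 'Craig Kimbrel';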

I do want to make one important distinction.  In the above paragraphs, I used the words data and information.  These two words are not interchangeable.  Data refers to the raw numbers that are generated by the players.  Information refers to the interpretation of that data.  Therein lies the heart of what BI is — it is the process of generating information based upon data.

What’s there to know about BI?

I’ve already mentioned some buzzwords, including OLAP, cubes, and pivot tables.  That’s just scratching the surface.  There are also KPIs, reporting services, decision support systems, data mining, data warehousing, and a number of others that I haven’t thought of at this point (if you have any suggestions, please feel free to add them in the comments section below).  Other than including the Wikipedia definition links, I won’t delve too deeply into them now, especially since I’m still learning about them myself.

So why bother learning about BI?

I have my reasons for learning more about BI.  Among other things…

  • It is a way to keep myself technically relevant.  I’ve written before about how difficult it is to stay up-to-date with technology.  (For further reading, I highly recommend Eugene Meidinger’s article about keeping up with technology; he also has a related SQL Saturday presentation that I highly recommend.)  I feel that BI is a subject I’m able to grasp, learn about, and contribute to.  By learning about BI, I can continue making myself technically valuable, even as my other technical skills become increasingly obsolete.  Speaking of which…
  • It’s another adjustment.  Again, I’ve written before about making adjustments to keep myself professionally relevant.  If there’s one thing I’ve learned, it’s that if you want to survive professionally, you need to learn to adjust to your environment.
  • It is a subject that interests me.  I’m sure that many of you, as kids, had “imaginary friends.”  (I’ll bet some adults have them, too — just look at Lieutenant Kije and Captain Tuttle.)  When I was a kid, I actually had an imaginary baseball team.  I went as far as to create an entire roster full of fictitious ballplayers, even coming up with full batting and pitching statistics for them.  My star player was a power-hitting second baseman who had won MVP awards in both the National and American leagues, winning several batting titles (including a Triple Crown) and leading my imaginary team to three World Series championships.  I figured, if my interest in statistics went that far back, there must be something behind it.  Granted, now that I’ve grown older, I’m not as passionate about baseball statistics as I was as a kid, but some level of interest remains, nevertheless.
  • It is a baseline for learning new things.  I’ve seen an increasing number of SQL Saturday presentations related to BI, such as PowerBI, reporting services, and R.  I’m recognizing that these potentially have value for my workplace.  But before I learn more about them, I also need to understand the fundamental baseline that they support.  I feel that I need to learn the “language” of BI before I can learn about the tools that support it.

So, hopefully, this article makes a good introduction (for both you and myself) for talking about BI.  I’ll try to write more as I learn new things.  We’ll see where this journey goes, and I hope you enjoy coming along for the ride.