Setting up a primary key

When I set up my baseball database, I noticed something in the SQL code used to build the tables.  In the CREATE TABLE T-SQL code that was supplied by the Sean Lahman’s baseball database, I saw code that looked like this:

CREATE TABLE [dbo].[Batting](
	[playerID] [varchar](50) NULL,
	[yearID] [varchar](50) NULL,
	[stint] [varchar](50) NULL,
	[teamID] [varchar](50) NULL,
	[lgID] [varchar](50) NULL,
	[G] [varchar](50) NULL,
	[AB] [varchar](50) NULL,
	[R] [varchar](50) NULL,
	[H] [varchar](50) NULL,
	[2B] [varchar](50) NULL,
	[3B] [varchar](50) NULL,
	[HR] [varchar](50) NULL,
	[RBI] [varchar](50) NULL,
	[SB] [varchar](50) NULL,
	[CS] [varchar](50) NULL,
	[BB] [varchar](50) NULL,
	[SO] [varchar](50) NULL,
	[IBB] [varchar](50) NULL,
	[HBP] [varchar](50) NULL,
	[SH] [varchar](50) NULL,
	[SF] [varchar](50) NULL,
	[GIDP] [varchar](50) NULL
) ON [PRIMARY]

Granted, there are a number of things about this code that strikes me, but one thing hit me immediately: all the columns are nullable.  That tells me that there are no primary keys defined.  I checked the other CREATE TABLE statements in the code as well; likewise, all of them had all their columns set to be nullable.

(There are some other things about the CREATE TABLE statements that struck me as well, but I’ll save those topics for another time.  For the purposes of this article, I’ll concentrate on primary keys.)

First, if you’re getting your feet wet with relational databases, it behooves the question…

What is a primary key?

First, I’ll start with a few links that help explain this:

primary key uniquely identifies a row of data.  To illustrate this, here’s an example.

Let’s say we have a person named John Smith taking night classes at State University.  In his database file (which, for the purposes of this example, has his name and address), he has the following information:

Name Address
John Smith 123 Main Street

Now, let’s say his son, John Smith Jr. (who lives at the same address) enrolls at State University (for the purposes of this example, let’s assume the database doesn’t allow suffixes like “Jr.”).  He is entered in the database.  So now, you have this.

Name Address
John Smith 123 Main Street
John Smith 123 Main Street

How do you know which record is which?  Answer: you don’t.  So, let’s add an ID column to our table.

ID Name Address
1 John Smith 123 Main Street
2 John Smith 123 Main Street

Now we know that we have two different records for two different people who are both named John Smith who live at 123 Main Street.

Granted, this is an overly-simplistic example, but the point is that the ID column — which is always unique and is never NULL — uniquely identifies each row.

So now that we know what a primary key is, let’s go back to our baseball database.

Creating primary keys for our baseball tables

We could create an ID column for our baseball tables, but I don’t want to do that here.  Instead, I want to make use of existing columns for our primary keys.  I’ll start with the MASTER table, which keeps track of ballplayers.

I ran a simple SELECT * FROM MASTER to get an idea of what’s in the table.  Running the query gave me this.

BaseballMasterQuery

I noted a couple of things.  First, I see a column called “playerID” which could be unique and used as a primary key.  Second, I see that the table contains 18,846 rows.  So can I use “playerID” for my primary key?  Let’s find out.

I ran two query statements.  The first gives me a count of unique entries I have for “playerID.”  The second tells me if any NULL values for “playerID” exist.

Here’s what I got.

BaseballMasterPlayerID.png

The first statement returns 18,846, which matches the number of rows in my earlier query, telling me that all values are unique.  The second statement doesn’t return anything, which tells me that there are no NULL values for “playerID.”  Therefore, the “playerID” column can be used as the primary key for this table.

So how do we make this column the primary key?  There are a couple of ways to do it in SQL Server.  For this, I decided to use T-SQL.

alter table dbo.Master
add constraint PK_playerID primary key clustered (playerID)

When I ran this, it returned an error.

Msg 8111, Level 16, State 1, Line 1
Cannot define PRIMARY KEY constraint on nullable column in table 'Master'.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint or index. See previous errors.

Okay.  Why did this appear?  Keep in mind that a primary key cannot be null.  When our code created our tables, it made them all nullable.  So we need to add a NOT NULL constraint before we create the primary key.

alter table dbo.Master
alter column playerID varchar(50) not null

Running this statement sets the column to not allow NULL values.  Once I ran this, I re-ran the command to create the primary key, and it completed successfully.  So I now have a primary key on my Master table.

What if I don’t have a single unique column?

Let’s take a look at the Batting table.

BaseballBatting

So, what should we use for our primary key?  A batter can play multiple seasons, and can play for multiple teams in a single season.  Chances are that you won’t be able to make use of a single column.  However, we can make use of multiple columns for our primary key (this is called a “composite” primary key).

We find that the Batting table contains 101,332 rows, so we need a combination of columns that will give us 101,332 unique results.  I figured that playerID, yearID, and stint should give us those results.

I ran the following query.

select distinct playerID, yearID, stint from Batting

This returned 101,332 results.  Indeed, I seem to have identified my composite primary key.

My first task is to set these three columns to not accept NULL values (otherwise, we’d get the same error from earlier).

alter table dbo.Batting alter column playerID varchar(50) not null
alter table dbo.Batting alter column yearID varchar(50) not null
alter table dbo.Batting alter column stint varchar(50) not null

Once the NOT NULL constraint is in place, we can create our composite primary key.  Note that in my T-SQL statement below, I provide three columns (playerID, yearID, stint), not just one.

alter table dbo.Batting 
add constraint PK_playerID_yearID_stint 
primary key clustered (playerID, yearID, stint)

The query ran successfully.  We now have a composite primary key on the Batting table that uses three columns.

Hopefully, this gives you a good introduction to primary keys on database tables.  In upcoming articles, I’ll cover other aspects such as foreign keys, indexes, and data types as we continue to further refine and improve our baseball database.

Advertisements

Installing SQL Server 2016

In an earlier article, I discussed how to get started with SQL Server.  I mentioned downloading a copy of SQL Server Developer Edition and installing a sample database.

It had been quite some time since I performed a SQL Server installation (my current job only requires that I know T-SQL, not how to install or implement it; after all, I am a developer, not a DBA).  Additionally, I figured that having SQL Server installed on my own local machine gave me a platform on which I could practice my skills in a safe environment.  With that, I decided that I would shake off the rust and perform an installation on my own machine.  This article documents my efforts.

I’ll start by describing the platform on which I’m installing SQL Server.  I am installing this on my own laptop PC, not a server.  It might or might not be an ideal environment for running SQL Server, but it’s what I have.

  • HP ProBook 4430s laptop
  • Intel Core i5-2450M processor @ 2.50 GHz
  • Windows 10 Pro 64 bit
  • 8GB RAM
  • 325GB available disk space

I decided that I wanted to keep up with the latest version (which, at the time of this article, is SQL Server 2016), so I needed to find a copy to install on my machine.  To do that, I need to go into Visual Studio Dev Essentials.  Visual Studio Dev Essentials is free, but it does require you to create an account.  If you don’t have an account, you will be prompted to create one when you click “Join Now.”

Once your account is established (or if you already have one), you are redirected to the Visual Studio Dev Essentials page.

VSDevEss

SQL Server can be downloaded from the Downloads menu at the top of the page, as well as the Microsoft SQL Server icon under Tools.

I went to the Dev Essentials page, and saw a note that SQL Server 2016 had a critical prerequisite, so I started with downloading and installing the patch.  Select the download that matches your system (for me, it was English-United States x64.exe).  Note that this requires a restart after setup completes.  Once the patch was installed, I went ahead and downloaded SQL Server 2016.  It’s a 2.1GB download.

Once the download finished, I ran the setup.exe program, which brought me to the SQL Server Installation Center.SQLServerInstallationCenter

Clicking the Hardware and Software Requirements brought me to the corresponding page.  Reading through the requirements, I didn’t see anything that would immediately be an issue.  The only thing I saw was that it required .NET Framework 4.6, which supposedly is installed by the program.

I decided to go ahead with the installation.  After clicking the Installation tab, I clicked New SQL Server stand-alone installation or add features to an existing installation.  For the Product Key, I specified the free Developer edition, clicked Next, and followed the subsequent prompts.  I received a warning about Windows Firewall being enabled.  I opened my Settings, went into Network & Internet, and checked my Windows Firewall.  Interestingly, my Windows Firewall would not let me turn it off; it said it was controlled by Norton 360, which is installed on my computer.  This was interesting; I had no idea that Norton 360 was controlling my Windows Firewall.

For the moment, my notification regarding Windows Firewall was a warning, not a failure.  I decided to continue to see what would happen.  I went ahead and clicked Next.  This brought me to the Feature Selection window.

SQLServerFeatureSelection

I went ahead and clicked Select All.  I had no idea what features I would use, I figured that I was installing SQL Server 2016 so I could practice and learn about its features, and it would only take 8GB of disk space — considering that I have 325GB available, it’s only a drop in the bucket.

However, when I clicked Next this time, I came across an error.

PolybaseInstallFailOracle

Interesting.  Should I install the Oracle SE Java Runtime Environment or not?  To answer that question, I asked myself whether or not I needed the PolyBase Query Service for External Data.  For that matter, what, exactly, is the PolyBase Query Service?  I found my answer here.  (Note that this feature is new to SQL Server 2016.)

For the moment, I decided that I didn’t need to work with data in either Hadoop or Azure blob storage, nor did I foresee an immediate need to interface with BI tools.  If I ever come across a need for them, I figure that I could always add it later.  So I unchecked the PolyBase Query Service (for now) and decided to proceed.  (Unchecking this, by the way, brought my disk requirement down to 6.5GB.)

I went with the default named instance (MSSQLSERVER).  In fact, I pretty much went with the default configurations (the one exception being selecting Mixed Mode instead of Windows authentication mode; I did add myself as a SQL Server administrator, as well as other services configurations).

I didn’t think to keep track of how long it took to install SQL Server 2016, but I would guess it took about an hour.  SQL Server installed with no further issues.

SQL Server Management Studio 2016

For as long as I can remember, SQL Server Management Studio (SSMS) was included as a standard part of installing SQL Server.  That is not the case with SQL Server 2016; SSMS must be installed separately.  Clicking Install SQL Server Management Tools in the SQL Server Installation Center takes you to this page.  Because SSMS tends to be my go-to tool for using SQL Server, I wanted to make sure it was included with my installation.

I downloaded the SSMS .EXE file, which is 825MB.  Installing SSMS is straightforward; click Accept to accept the license agreement, and the package installs.

Other SQL Server 2016 Tools

The SQL Server Installation Center also includes support for SQL Server data tools (SSDT) and for a new R server installation.  For now, I am bypassing these tools; I will likely install them at a later date.

After all was said and done…

Once everything was installed, I opened SSMS, which ran successfully.  (One thing I’ll add is that SSMS now looks like Visual Studio, which makes sense since it is powered by the VS engine.)

I now have SQL Server 2016 installed on my machine.

What’s next?

I will be implementing Sean Lahman’s baseball database into my SQL Server.  My exploits will be documented in a future ‘blog article called “Installing a Baseball Database.”  Stay tuned!

SQL Server: How to get started

A few weeks ago, my friend Jim called me.  His company was ending its relationship with a major client, and as a result, his position was likely to be reduced (or eliminated — I don’t remember which one he said) over the long term.  He knew that I’d done a lot of work with SQL Server (mind you, I don’t know enough to be an expert, but I know enough to get by), and had some questions for me — mainly to the effect of, “I want to learn SQL Server.  How do I get started?”

I see this question come up a lot in various forums, especially on SQLServerCentral.com (which, by the way, was one the first references that I gave Jim).  I suggested that he create a free account on the site so he could access and ask questions on the forum.

I mentioned he should look into attending SQL Saturday.  And on hindsight, I’d forgotten to mention that he should look into joining a PASS chapter.

I suggested that he obtain and install a copy of Microsoft SQL Server Developer Edition, which is a free download.  I also mentioned downloading the famous (infamous?) Northwind/AdventureWorks sample databases.  Practice makes perfect, after all, and if he could get hands-on experience with SQL Server in a sandbox environment, all the better.

Side note: Jim (a Red Sox fan) and I (a Yankee fan) are both big baseball fans (yes, we argue frequently!).  When I think of databases on which to practice, my preference is to download a copy of Sean Lahman’s baseball database.  I have nothing against either the Northwind or AdventureWorks sample databases; my thinking is that if you’re going to learn a new technology, you might as well make it fun!

And, of course, no introduction to SQL Server would be complete without Books Online (frequently abbreviated “BOL”).

Those are my suggestions.  Do you have any more?  If so, please feel free to leave them in the comments below!