#SQL101: Raising awareness of SQL injection

(Image credit: XKCD.com)

I don’t think there’s an experienced web developer or DBA who isn’t familiar with the classic “Bobby Tables” XKCD cartoon above. Just about any time you mention “Bobby Tables” to most experienced IT people, (s)he will immediately know to whom you are referring. Most experienced web developers and DBAs are aware of SQL injection and will take steps to ensure that it’s addressed. Grant Fritchey has a presentation about SQL injection (you can view and download his slide deck here) in which he’s not shy about his desire to “kill Bobby Tables.” I’ve seen him present it at SQL Saturday, and I highly recommend it.

Of course, the keyword here is “experienced.” For people who don’t have that experience, and who build websites that connect to databases, I think it should be lesson #1. Today, I had an experience that reminded me of that.

Earlier today, my sister texted me, asking for help with editing SQL code. She asked me what I use to edit SQL. I told her I generally use SSMS, although you can edit SQL code with a straight-up text editor, if necessary (she is not a DBA, so I felt somewhat comfortable telling her this). She told me she had to clean up spam comments in her data.

That last comment immediately grabbed my attention. I then asked her, how are your security settings, and do you have data backups.

She told me: that IS her data backup.

If her earlier comment had gotten my attention, this one immediately set off alarm klaxons in my head.

I started thinking about what could have corrupted her data to this extent. I started asking questions about her admin setup (I should’ve asked her to make sure she wasn’t using “sa” or “admin” as her admin login — Sis, if you’re reading this, make sure you check this!), including her passwords. Her admin password was pretty secure (thankfully).

She then mentioned her website. I asked if her website was accessing her data. She said yes.

I asked her about Bobby Tables (admittedly, in my advancing age, the term “SQL injection” didn’t immediately come to my mind). Her response: “who?”

At this point, I was convinced that I had my answer. Her database had been corrupted through SQL injection attacks. I told her to make sure you address your SQL injection issue before you even think about your data backups. Worrying about your data backups before addressing your SQL injection issue is like trying to rebuild your house before you’ve put out the fire.

I’ve been talking about SQL injection all throughout this article. For a brand-new web or database developer who has no idea what SQL injection is, here’s a quick primer: it’s a data security attack in which a hacker breaches your database by sending SQL commands through your web interface. I won’t get too much into how it works; instead, here are a few links that explain what it is.

And make no mistake: SQL injection attacks can cause major damage.

So consider this a warning to any fledgling developers who are interested in web or data development: data security issues, such as SQL injection (and there are many others) are a big deal and need to be considered when building your setup; it’s not as simple as just setting up your website and connecting it to a database. By not considering this when you first assemble your system, you might be setting yourself up for major issues down the road.

When it’s appropriate to use fake data (no, really!!!)

It isn’t uncommon for me to include data examples whenever I’m writing documentation. I’ve written before about how good examples will enhance documentation.

Let me make one thing clear. I am not talking about using data or statistics in and of itself to back up any assertions that I make. Rather, I am talking about illustrating a concept that just happens to include data as part of the picture. In this scenario, the illustration is the important part; the data itself is irrelevant. In other words, the information within the data isn’t the example; the data is the example. Take a moment to let that sink in, then read on. Once you grasp that, you’ll understand the point of this article.

I need to strike a type of balance as to what kind of examples I use. Since I work in a multi-client data application environment, I need to take extra steps to ensure that any data examples I use are client agnostic. Clients should not see — nor is it appropriate to use — data examples that are specific to, or identifies, a client.

There’s also a matter of data security. I needn’t explain how big of a deal data security is these days. We are governed by laws such as HIPAA, GDPR, and a number of other data protection laws. Lawsuits and criminal charges have come about because of the unauthorized release of data.

For me, being mindful of what data I use for examples is a part of my daily professional life. Whenever I need data examples, I’ll go through the data to make sure that I’m not using any live or customer data. If I don’t have any other source, I’ll make sure I alter the data to make it appear generic and untraceable, sometimes even going as far as to alter screen captures pixel by pixel to change the data. I’ll often go through great pains to ensure the data I display is agnostic.

I remember a situation years ago when a person asking a question on the SQLServerCentral forums posted live data as an example. I called him out on it, telling him, “you just broke the law.” He insisted that it wasn’t a big deal because he “mixed up names so they didn’t match the data.” I, along with other forum posters, kept trying to tell him that what he was doing was illegal and unethical, and to cease and desist, but he just didn’t get it. Eventually, one of the system moderators removed his post. I don’t know what happened to the original poster, but it wouldn’t surprise me if, at a minimum, he lost his job over that post.

Whenever I need to display data examples, there are a number of sources I’ll employ to generate the data that I need.

  • Data sources that are public domain or publically available — Data that is considered public domain is pretty much fair game. Baseball (and other sports) statistics come to mind off the top of my head.
  • Roll your own — I’ll often make up names (e.g. John Doe, Wile E. Coyote, etc.) and data wherever I need to do so. As an added bonus, I often have fun while I’m doing it!

Are there any other examples I missed? If you have any others, feel free to comment below.

So if you’re writing documentation in which you’re using an illustration that includes data, be mindful of the data in your illustration. Don’t be the person who is inadvertently responsible for a data breach in your organization because you exposed live data in your illustration.

Agnostic document references

Many of you who are technical writers — or data professionals — understand how important it is to not expose real data.  There is a huge emphasis on data security, as there should be.  But a lot of people never think about smaller pieces of data within a document — something as simple as, say, an email address.  In a lot of documentation I write, I make it a point to make much of it as anonymous and agnostic as possible.  There are a number of reasons for this, as I outline below.

I’ll start with a point of emphasis that I make in documentation time and again: know your audience.  Earlier this year, I was responsible for writing a user guide for an application that was to be used by multiple clients.  When I first started working on the document, the application was only being used by one client.  The application — and the document — contained many references to that one client.  In order to update the application to serve multiple clients, all references to the one client had to be removed, not just in the application itself but also in the documentation.  If a document is to be used by multiple audiences, specific client references should be made generic.

Likewise, I’ve accessed applications that display my name on the screen: “Welcome, Ray Kim,” for example.  If I screen-capture that image, I often alter the image so that my name is removed.  I don’t know about you, but I wouldn’t want to be contacted incessantly about an application that I probably know little about.

Another reason is data security, which becomes a larger issue by the day.  Sensitive corporate data risks being exposed within documentation.  For example, for my screen captures and illustrations, I request a “dummy” account that displays “bogus” corporate data for document illustration purposes.  I’ve gotten into the habit where I request this for every document I write.  Corporate data is sensitive; indeed, there is an increasing awareness of keeping such data private (the European Union’s GDPR and laws such as HIPAA and various other data protection laws come to mind).  Live data should never be used in documentation (unless authorized by the client in question); you can never tell what can be revealed in that data.

Here’s another reason I have for keeping documentation generic.  Let’s say, for example, an instruction for additional help directs you to “contact john.smith@yourcompany.com” in your document.  However, there’s one problem: John Smith no longer works for the company.  The correct contact is john.public@yourcompany.com.  But nobody made you aware of this change, nor is anyone aware that this contact is in the documentation.  And who is to say that John Public won’t leave the company or change roles in the next couple of weeks?  Additionally, listing a specific person runs the risk of that person getting spammed by peope who decide to use the contact for reasons other than the one listed in the document.  These risks come up any time a document references a specific person.  A better approach is to reference something more generic — say, a generic mailbox (e.g. HelpDesk@yourcompany.com) or a departmental site.

Here’s another argument for using generic references: how often does the document need to be updated?  Let’s say, for example, you write a document that references “Acme Corporation.”  Over the course of time, “Acme Corporation” changes — perhaps the company is bought or merged, the name changes, and so on.  How often do you need to go through your document looking for “Acme Corporation” and all its variations (shortened to “Acme,” abbreviated as “AC,” misspellings, etc.)?  And suppose your document is hundreds of pages long, with company references interspersed all throughout the document?  Making those edits does not make for a good working day.

Documentation runs the risk of exposing large amounts of data.  For data security, privacy, or even simple editing reasons (and maybe a bunch of others that I haven’t thought about yet), unless there’s a specific reason for not doing so, illustrations, examples, and references should often be made generic.  It’s a simple but critical step that can make your job better as a technical writer, and save your organization from a number of headaches — possibly ranging from complaints from audience members to a lawsuit.

Is Your DR Plan Complete?

Here’s another article reblog, this time from my friend, Andy Levy. Disaster recovery is a big deal, and you need to make sure that you’re prepared.

Don’t think a disaster can’t happen to you? Well, it happened to me.

The Rest is Just Code

Kevin Hill (b|t) posted a thought-provoking item on his last week about Disaster Recovery Plans. While I am in the 10% who perform DR tests for basic functionality on a regular basis, there’s a lot more to being prepared for disaster than just making sure you can get the databases back online.

You really need to have a full-company business continuity plan (BCP), which your DR plan is an integral portion of. Here come the Boy Scouts chanting “Be Prepared!”

When disaster strikes:

  • How will you communicate it to your customers, including regular status updates?
  • How will you communicate within the company?
  • Do you have your systems prioritized so that you know what order things have to be brought online? Which systems can lag by a day or two while you get the most critical things online?
  • Do you have contingency plans for all of…

View original post 543 more words

Maintain Your Trustworthiness

This is a reblog of an article written by my friend, Steve Jones. I would hope that this is something that goes without saying among data professionals like myself, but I think that it’s important enough that it’s worth repeating (and reblogging).

Voice of the DBA

Many of us that are DBAs and/or sysadmins find ourselves with privileged access to many systems. We can often read the data that’s stored in these systems, whether that’s a relational database, a NoSQL store, or even a mail system. As a result, it is incumbent upon us to be trustworthy and maintain confidentiality with privileged information.

Overall I think most of us do this, but there are always some rogue administrators out there, some of which might take malicious actions. There have been a few people that were arrested or sued for hacking into systems, trashing backups, or causing other issues. Often those are emotional outbursts that disrupt operations, and many people are aware there is an issue. However, what if people weren’t aware they were being hacked in some way?

I ran across this story about some “admin” software being sold on a hacker forum site, which was…

View original post 309 more words