Answering Simple Questions in SQL Coding Interviews

Requirements

Any database server will do. For this instance, MySQL is used.

SELECT @@version_comment AS `Version`, 
  @@version_compile_machine AS `Version Compile Machine`, 
  @@innodb_version AS `InnoDB Version`, 
  @@version_compile_os AS `Version Compile OS`;
Table 1: Server Info
Version Version Compile Machine InnoDB Version Version Compile OS
MySQL Community Server (GPL) x86_64 5.7.21 Linux

Introduction

In Interviewing ‘SQL Developers’ (and failling!), author pankswork frustratingly writes about having interviewers fail what seems to be very simple SQL questions at a rate of about 50%. “Is my test too hard,” they wonder.

When I read this post I immeidately wondered if they were trick questions I wasn’t picking up on. They all seemed very easy to answer. They were a little tricky, but certainly not unreasonably complicated. Indeed, most commenters to the Reddit post agreed.

I wanted to use the opportunity to demonstrate how to go about solving the interview problems, which are:

  1. Select all data from both tables.

  2. What team has the most wins?

  3. What is the combined salary for each team?

  4. What player does not belong to a team?

I slightly reworded some questions to avoid confusion but the results will still be the same, along with the methods of obtaining them.

Database Setup

First, I’ll create the database and build the Players and Teams tables.

DROP DATABASE IF EXISTS `PlayersTeams`;
CREATE DATABASE `PlayersTeams`;
USE `PlayersTeams`;
CREATE TABLE `Players` (
  `PlayerID` TINYINT(1)  UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 
  `Salary`   SMALLINT(4) UNSIGNED NOT NULL, 
  `TeamID`   TINYINT(1)  UNSIGNED NOT NULL
  );
INSERT INTO `Players`
  (`Salary`, `TeamID`)
VALUES
  (1500, 1), 
  (1359, 1),
  (1070, 1), 
  (1165, 3), 
  (1474, 2), 
  (1411, 1), 
  (1211, 2), 
  (1334, 1), 
  (1486, 4), 
  (1223, 2);
CREATE TABLE `Teams` (
  `TeamID`   TINYINT(1)  UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 
  `TeamName` VARCHAR(10)          NOT NULL, 
  `Wins`     TINYINT(2)  UNSIGNED NOT NULL, 
  `Losses`   TINYINT(2)  UNSIGNED NOT NULL
);
INSERT INTO `Teams`
  (`TeamName`, `Wins`, `Losses`)
VALUES
  ("Jets",   10,  4), 
  ("Giants",  4, 10), 
  ("Eagles",  7,  7);

Select all data from both tables

What we are being asked to do here is effectly do a SELECT * from both tables which we join on some type of JOIN statement; but, which one? The dataset is small enough for us to easily see we should have 10 rows in final result set.

If we use a LEFT JOIN with Players as the left table, Teams as the right table and TeamID as the foreign key, then every record within Players will be returned along with associated Teams. If we have an instance of a player that does not have an associated team then that record will be NULL.

SELECT 
  * 
FROM 
  `Players`
LEFT JOIN 
  `Teams`
ON 
  `Teams`.`TeamID` = `Players`.`TeamID`
ORDER BY 
  `Players`.`PlayerID`;
Table 2: 10 records
PlayerID Salary TeamID TeamID TeamName Wins Losses
1 1500 1 1 Jets 10 4
2 1359 1 1 Jets 10 4
3 1070 1 1 Jets 10 4
4 1165 3 3 Eagles 7 7
5 1474 2 2 Giants 4 10
6 1411 1 1 Jets 10 4
7 1211 2 2 Giants 4 10
8 1334 1 1 Jets 10 4
9 1486 4 NA NA NA NA
10 1223 2 2 Giants 4 10

If we were to use a RIGHT JOIN then our server would instead take all Teams.TeamID and find corresponding Players per id. However, we have a Player.TeamID value of 4 which does not exist in Teams.TeamID. Therefore, our result set would only contain nine records.

SELECT 
  * 
FROM 
  `Players`
RIGHT JOIN 
  `Teams`
ON 
  `Teams`.`TeamID` = `Players`.`TeamID`
ORDER BY 
  `Players`.`PlayerID`;
Table 3: 9 records
PlayerID Salary TeamID TeamID TeamName Wins Losses
1 1500 1 1 Jets 10 4
2 1359 1 1 Jets 10 4
3 1070 1 1 Jets 10 4
4 1165 3 3 Eagles 7 7
5 1474 2 2 Giants 4 10
6 1411 1 1 Jets 10 4
7 1211 2 2 Giants 4 10
8 1334 1 1 Jets 10 4
10 1223 2 2 Giants 4 10

However, if we used RIGHT JOIN and reversed the order of our tables (make Teams our primary table, joining Players) then we would get the correct results.

SELECT 
  * 
FROM 
  `Teams`
RIGHT JOIN
  `Players`
ON 
  `Players`.`TeamID` = `Teams`.`TeamID`
ORDER BY
  `Players`.`PlayerID`;
Table 4: 10 records
TeamID TeamName Wins Losses PlayerID Salary TeamID
1 Jets 10 4 1 1500 1
1 Jets 10 4 2 1359 1
1 Jets 10 4 3 1070 1
3 Eagles 7 7 4 1165 3
2 Giants 4 10 5 1474 2
1 Jets 10 4 6 1411 1
2 Giants 4 10 7 1211 2
1 Jets 10 4 8 1334 1
NA NA NA NA 9 1486 4
2 Giants 4 10 10 1223 2

Notice also how the column order of our result set changes. In the previous RIGHT JOIN, the associated columns from Teams were on the right side. Now, they are on the left. This is our visual cue of the query we created.

I suspect what’s happening here is those doing the interview get twisted up on the differences not recognizing they’re missing some data.

What team has the most wins?

This question does not require us to do fancy joins. All of the data we need is contained within the Teams table. One way some may be going about this problem is simply using MAX().

SELECT
  MAX(`Wins`)
FROM
  `Teams`;
Table 5: 1 records
MAX(Wins)
10

Notice though our question asked “What team”, not “what is the maximum number of Wins”. We need to return the TeamName (or TeamID). But, you cannot just add this column into the SELECT statement without issue.

SELECT
  `TeamName`, 
  MAX(`Wins`)
FROM
  `Teams`;

This query will generate an error on most all newer database servers. In this case, what MySQL is telling us is that we either must remove TeamName from the query or move it to a GROUP BY statement. See MySQL Handling of GROUP BY for more info.

For our question, all of this is unnecessary anyway. We want to find the maximum Wins and return the corresponding TeamName. Just sort the dataset by Wins, largest to smallest, and return the first record.

SELECT
  `TeamName`
FROM
  `Teams`
ORDER BY 
  `Wins` DESC
LIMIT
  1;
Table 6: 1 records
TeamName
Jets

What is the combined salary for each team?

In the original post, this question was phrased as “How much does each team make?” The author acknowledged this was intended to be a trick question; the teams do not have a “profit” or “revenue”-type column.

However, every player does have a Salary and it is reasonable to assume the question is most likely looking to find the total salary of all players per team.

We can do the first part of this query very simply by summing Salary with a grouping on TeamID.

SELECT
  SUM(`Salary`), 
  `TeamID`
FROM 
  `Players`
GROUP BY
  `TeamID`;
Table 7: 4 records
SUM(Salary) TeamID
6674 1
3908 2
1165 3
1486 4

Realize we would have recevied the group-by-handling error again had I not used the GROUP BY statement; this applies for SUM as well.

At this point I now have the total Salary per team. And, yes, I can probably just leave it at that - question answered. But, what fun is that?

Let’s get the TeamName. To do this I LEFT JOIN Teams again by TeamID. I also modify my original SELECT statement to return TeamName rather than TeamID. Finally, I modify the GROUP BY clause to append the table name. Without this last correction my query will return an Error 1052 that TeamID is ambiguous.

SELECT
  SUM(`Salary`), 
  `Teams`.`TeamName`
FROM 
  `Players`
LEFT JOIN
  `Teams`
ON 
  `Teams`.`TeamID` = `Players`.`TeamID`
GROUP BY
  `Players`.`TeamID`;
Table 8: 4 records
SUM(Salary) TeamName
6674 Jets
3908 Giants
1165 Eagles
1486 NA

What player does not belong to a team?

This question, I think, is the easiest of all. Actually, we already answered it when solving our first question (2). Recall we had one record of NULL values; this was for a player in our Players table that did not have a corresponding team in the Teams table.

This was our query for those results:

SELECT 
  * 
FROM 
  `Players`
LEFT JOIN 
  `Teams`
ON 
  `Teams`.`TeamID` = `Players`.`TeamID`
ORDER BY 
  `Players`.`PlayerID`;

What I need to do here is return only the records where Teams.TeamID IS NULL. And, since the question is asking for “player” and not any additional information, we should only return Players.PlayerID. Our final query is as follows:

SELECT 
  `Players`.`PlayerID`
FROM 
  `Players`
LEFT JOIN 
  `Teams`
ON 
  `Teams`.`TeamID` = `Players`.`TeamID`
WHERE
  `Teams`.`TeamID` IS NULL
ORDER BY 
  `Players`.`PlayerID`;
Table 9: 1 records
PlayerID
9

Summary

I think if 50% of people are failing this first step then they have seriously manipulated their resume or perhaps other screening methods need to be improved. I could understand if occasionally someone tripped up on one of these questions. But, not half of an interview class.

Most importantly for those doing code interviews, in my opinion, is to not just study code and how to perform certain tasks but also understand the errors generated and how to resolve them. When working on this article I made an error and it took me 30 minutes to figure out what it was; I had written “UNISIGNED”, rather than “UNSIGNED”.

People will make mistakes and there’s nothing wrong with that. The best we can do on these job interviews is to just try to prepare the best we can.

Related

Previous
comments powered by Disqus