The following discussion provides several examples of the ways in which MySQL detects functional dependencies. The examples use this notation:
{X} -> {Y}
Understand this as “X
uniquely determines Y
,” which also means that Y
is functionally dependent on X
.
The examples use the world
database, which can be downloaded from https://dev.mysql.com/doc/index-other.html. You can find details on how to install the database on the same page.
The following query selects, for each country, a count of spoken languages:
SELECT co.Name, COUNT(*)
FROM countrylanguage cl, country co
WHERE cl.CountryCode = co.Code
GROUP BY co.Code;
co.Code
is a primary key of co
, so all columns of co
are functionally dependent on it, as expressed using this notation:
{co.Code} -> {co.*}
Thus, co.name
is functionally dependent on GROUP BY
columns and the query is valid.
A UNIQUE
index over a NOT NULL
column could be used instead of a primary key and the same functional dependence would apply. (This is not true for a UNIQUE
index that permits NULL
values because it permits multiple NULL
values and in that case uniqueness is lost.)
This query selects, for each country, a list of all spoken languages and how many people speak them:
SELECT co.Name, cl.Language,
cl.Percentage * co.Population / 100.0 AS SpokenBy
FROM countrylanguage cl, country co
WHERE cl.CountryCode = co.Code
GROUP BY cl.CountryCode, cl.Language;
The pair (cl.CountryCode
, cl.Language
) is a two-column composite primary key of cl
, so that column pair uniquely determines all columns of cl
:
{cl.CountryCode, cl.Language} -> {cl.*}
Moreover, because of the equality in the WHERE
clause:
{cl.CountryCode} -> {co.Code}
And, because co.Code
is primary key of co
:
{co.Code} -> {co.*}
“Uniquely determines” relationships are transitive, therefore:
{cl.CountryCode, cl.Language} -> {cl.*,co.*}
As a result, the query is valid.
As with the previous example, a UNIQUE
key over NOT NULL
columns could be used instead of a primary key.
An INNER JOIN
condition can be used instead of WHERE
. The same functional dependencies apply:
SELECT co.Name, cl.Language,
cl.Percentage * co.Population/100.0 AS SpokenBy
FROM countrylanguage cl INNER JOIN country co
ON cl.CountryCode = co.Code
GROUP BY cl.CountryCode, cl.Language;
Whereas an equality test in a WHERE
condition or INNER JOIN
condition is symmetric, an equality test in an outer join condition is not, because tables play different roles.
Assume that referential integrity has been accidentally broken and there exists a row of countrylanguage
without a corresponding row in country
. Consider the same query as in the previous example, but with a LEFT JOIN
:
SELECT co.Name, cl.Language,
cl.Percentage * co.Population/100.0 AS SpokenBy
FROM countrylanguage cl LEFT JOIN country co
ON cl.CountryCode = co.Code
GROUP BY cl.CountryCode, cl.Language;
For a given value of cl.CountryCode
, the value of co.Code
in the join result is either found in a matching row (determined by cl.CountryCode
) or is NULL
-complemented if there is no match (also determined by cl.CountryCode
). In each case, this relationship applies:
{cl.CountryCode} -> {co.Code}
cl.CountryCode
is itself functionally dependent on {cl.CountryCode
, cl.Language
} which is a primary key.
If in the join result co.Code
is NULL
-complemented, co.Name
is as well. If co.Code
is not NULL
-complemented, then because co.Code
is a primary key, it determines co.Name
. Therefore, in all cases:
{co.Code} -> {co.Name}
Which yields:
{cl.CountryCode, cl.Language} -> {cl.*,co.*}
As a result, the query is valid.
However, suppose that the tables are swapped, as in this query:
SELECT co.Name, cl.Language,
cl.Percentage * co.Population/100.0 AS SpokenBy
FROM country co LEFT JOIN countrylanguage cl
ON cl.CountryCode = co.Code
GROUP BY cl.CountryCode, cl.Language;
Now this relationship does not apply:
{cl.CountryCode, cl.Language} -> {cl.*,co.*}
Indeed, all NULL
-complemented rows made for cl
is put into a single group (they have both GROUP BY
columns equal to NULL
), and inside this group the value of co.Name
can vary. The query is invalid and MySQL rejects it.
Functional dependence in outer joins is thus linked to whether determinant columns belong to the left or right side of the LEFT JOIN
. Determination of functional dependence becomes more complex if there are nested outer joins or the join condition does not consist entirely of equality comparisons.
Suppose that a view on countries produces their code, their name in uppercase, and how many different official languages they have:
CREATE VIEW country2 AS
SELECT co.Code, UPPER(co.Name) AS UpperName,
COUNT(cl.Language) AS OfficialLanguages
FROM country AS co JOIN countrylanguage AS cl
ON cl.CountryCode = co.Code
WHERE cl.isOfficial = 'T'
GROUP BY co.Code;
This definition is valid because:
{co.Code} -> {co.*}
In the view result, the first selected column is co.Code
, which is also the group column and thus determines all other selected expressions:
{country2.Code} -> {country2.*}
MySQL understands this and uses this information, as described following.
This query displays countries, how many different official languages they have, and how many cities they have, by joining the view with the city
table:
SELECT co2.Code, co2.UpperName, co2.OfficialLanguages,
COUNT(*) AS Cities
FROM country2 AS co2 JOIN city ci
ON ci.CountryCode = co2.Code
GROUP BY co2.Code;
This query is valid because, as seen previously:
{co2.Code} -> {co2.*}
MySQL is able to discover a functional dependency in the result of a view and use that to validate a query which uses the view. The same would be true if country2
were a derived table (or common table expression), as in:
SELECT co2.Code, co2.UpperName, co2.OfficialLanguages,
COUNT(*) AS Cities
FROM
(
SELECT co.Code, UPPER(co.Name) AS UpperName,
COUNT(cl.Language) AS OfficialLanguages
FROM country AS co JOIN countrylanguage AS cl
ON cl.CountryCode=co.Code
WHERE cl.isOfficial='T'
GROUP BY co.Code
) AS co2
JOIN city ci ON ci.CountryCode = co2.Code
GROUP BY co2.Code;