<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Normalisation on ENKR's Blog | Jing Hui PANG</title><link>https://blog.enkr1.com/tags/normalisation/</link><description>Recent content in Normalisation on ENKR's Blog | Jing Hui PANG</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>ENKR</copyright><lastBuildDate>Tue, 26 May 2026 18:06:31 +0800</lastBuildDate><atom:link href="https://blog.enkr1.com/tags/normalisation/index.xml" rel="self" type="application/rss+xml"/><item><title>Database Normalisation: From 1NF to BCNF</title><link>https://blog.enkr1.com/database-normalisation-from-1nf-to-bcnf/</link><pubDate>Sat, 20 Apr 2024 03:09:54 +0800</pubDate><guid>https://blog.enkr1.com/database-normalisation-from-1nf-to-bcnf/</guid><description>&lt;p&gt;Database normalisation is about one thing: &lt;strong&gt;don&amp;rsquo;t store the same fact in more than one place.&lt;/strong&gt; Every normal form is a stricter version of that rule, backed by a formal tool called functional dependencies.&lt;/p&gt;
&lt;!-- more --&gt;
&lt;h1 id="functional-dependencies-the-tool-behind-it-all"&gt;Functional Dependencies: The Tool Behind It All
&lt;/h1&gt;&lt;p&gt;A functional dependency (FD) &lt;code&gt;X → Y&lt;/code&gt; means: &lt;strong&gt;if two rows have the same X value, they must have the same Y value.&lt;/strong&gt; Same input, guaranteed same output.&lt;/p&gt;
&lt;p&gt;Example from an NUS student table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;email&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;th&gt;faculty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:tikki@gmail.com" &gt;tikki@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TIKKI TAVI&lt;/td&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;td&gt;School of Computing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:rikki@gmail.com" &gt;rikki@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;RIKKI TAVI&lt;/td&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;td&gt;School of Computing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:bob@gmail.com" &gt;bob@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;BOB&lt;/td&gt;
&lt;td&gt;Chemistry&lt;/td&gt;
&lt;td&gt;Faculty of Science&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;email → name&lt;/code&gt; (same email = same name, always)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;email → department&lt;/code&gt; (same email = same department)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;department → faculty&lt;/code&gt; (CS is always School of Computing)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;FDs tell you &lt;strong&gt;why&lt;/strong&gt; to split tables. Foreign keys are &lt;strong&gt;how&lt;/strong&gt; you reconnect them after splitting.&lt;/p&gt;
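&lt;p&gt;To make the definition concrete, here is a minimal Python sketch that tests whether an FD holds in the sample rows above. The helper name &lt;code&gt;fd_holds&lt;/code&gt; is made up for illustration. Keep in mind that an FD is a rule about every possible row, so sample data can only refute an FD, never prove it.&lt;/p&gt;

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        # setdefault stores y the first time x appears, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


students = [
    {"email": "tikki@gmail.com", "name": "TIKKI TAVI",
     "department": "CS", "faculty": "School of Computing"},
    {"email": "rikki@gmail.com", "name": "RIKKI TAVI",
     "department": "CS", "faculty": "School of Computing"},
    {"email": "bob@gmail.com", "name": "BOB",
     "department": "Chemistry", "faculty": "Faculty of Science"},
]

print(fd_holds(students, ["email"], ["name"]))          # True
print(fd_holds(students, ["department"], ["faculty"]))  # True
print(fd_holds(students, ["department"], ["email"]))    # False: CS has two emails
```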
&lt;p&gt;&lt;strong&gt;Key vocabulary:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Candidate key:&lt;/strong&gt; a minimal set of columns that uniquely identifies every row. A table can have several; you pick one as the primary key.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prime attribute:&lt;/strong&gt; any column that belongs to at least one candidate key.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-prime attribute:&lt;/strong&gt; a column that isn&amp;rsquo;t part of any candidate key.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Superkey:&lt;/strong&gt; a candidate key, or any superset of one (candidate key + extra columns).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Partial dependency:&lt;/strong&gt; a non-prime attribute depends on only &lt;em&gt;part&lt;/em&gt; of a composite candidate key.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transitive dependency:&lt;/strong&gt; a non-prime attribute depends on a candidate key only indirectly, through another attribute that is not itself a key (A → B → C, where A is a candidate key and B is not a superkey).&lt;/li&gt;
&lt;/ul&gt;
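&lt;p&gt;The vocabulary above can be computed mechanically from a set of FDs. The sketch below uses the standard attribute-closure idea on the student table. The function names &lt;code&gt;closure&lt;/code&gt; and &lt;code&gt;candidate_keys&lt;/code&gt; are illustrative, and the brute-force search is only sensible for small schemas.&lt;/p&gt;

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every column determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result.update(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal column sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # A superset of a known key is a superkey, but not minimal.
            if any(key.issubset(candidate) for key in keys):
                continue
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


schema = {"email", "name", "department", "faculty"}
fds = [
    (("email",), ("name",)),
    (("email",), ("department",)),
    (("department",), ("faculty",)),
]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)
print(keys)                    # [{'email'}]
print(sorted(schema - prime))  # ['department', 'faculty', 'name']
```

&lt;p&gt;Here &lt;code&gt;email&lt;/code&gt; is the only candidate key, so it is the only prime attribute and every other column is non-prime.&lt;/p&gt;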
&lt;h1 id="1nf-atomic-values"&gt;1NF: Atomic Values
&lt;/h1&gt;&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt; every column holds a single, indivisible value. No lists, no comma-separated strings, no nested structures.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bad (not 1NF):&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer&lt;/th&gt;
&lt;th&gt;items&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;001&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Apple 2, Banana 5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You can&amp;rsquo;t query &amp;ldquo;all orders containing Apple&amp;rdquo; without string parsing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Good (1NF):&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer&lt;/th&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;product&lt;/th&gt;
&lt;th&gt;qty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;001&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;A1&lt;/td&gt;
&lt;td&gt;Apple&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;001&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;B2&lt;/td&gt;
&lt;td&gt;Banana&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Every value is now atomic, and the candidate key is &lt;code&gt;(order_id, product_id)&lt;/code&gt;. But Alice&amp;rsquo;s name is repeated on every row of her order, and any further customer column such as &lt;code&gt;address&lt;/code&gt; would repeat too. That&amp;rsquo;s the next problem.&lt;/p&gt;
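&lt;p&gt;As a rough illustration of the flattening step, the sketch below splits the comma-separated &lt;code&gt;items&lt;/code&gt; column into one row per product. It assumes each item is written as a name followed by a quantity, and it leaves out &lt;code&gt;product_id&lt;/code&gt; because the ID cannot be derived from the string. The helper name &lt;code&gt;to_1nf&lt;/code&gt; is made up.&lt;/p&gt;

```python
raw_orders = [("001", "Alice", "Apple 2, Banana 5")]


def to_1nf(rows):
    """Split the multi-valued items column into one row per (order, product)."""
    flat = []
    for order_id, customer, items in rows:
        for item in items.split(","):
            # Assumed item format: product name, a space, then the quantity.
            product, qty = item.strip().rsplit(" ", 1)
            flat.append((order_id, customer, product, int(qty)))
    return flat


flat = to_1nf(raw_orders)
print(flat)
# [('001', 'Alice', 'Apple', 2), ('001', 'Alice', 'Banana', 5)]

# "All orders containing Apple" is now a plain equality filter.
print([order_id for order_id, _, product, _ in flat if product == "Apple"])
# ['001']
```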
&lt;h1 id="2nf-no-partial-dependencies"&gt;2NF: No Partial Dependencies
&lt;/h1&gt;&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt; already 1NF, and every non-prime attribute depends on the &lt;strong&gt;entire&lt;/strong&gt; candidate key, not just part of it.&lt;/p&gt;
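&lt;p&gt;In the 1NF table above, &lt;code&gt;customer&lt;/code&gt; depends only on &lt;code&gt;order_id&lt;/code&gt;, and &lt;code&gt;product&lt;/code&gt; depends only on &lt;code&gt;product_id&lt;/code&gt;. Neither needs the full key &lt;code&gt;(order_id, product_id)&lt;/code&gt;, so both are partial dependencies. Only &lt;code&gt;qty&lt;/code&gt; depends on the whole key.&lt;/p&gt;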
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; pull partial dependencies into their own tables.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;001&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;product&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A1&lt;/td&gt;
&lt;td&gt;Apple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B2&lt;/td&gt;
&lt;td&gt;Banana&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;qty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;001&lt;/td&gt;
&lt;td&gt;A1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;001&lt;/td&gt;
&lt;td&gt;B2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Now every non-prime attribute depends on the full key in its table.&lt;/p&gt;
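&lt;p&gt;One way to express the three tables, sketched with Python&amp;rsquo;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The table names &lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, and &lt;code&gt;order_items&lt;/code&gt; are assumptions, since the tables above are unnamed. Foreign keys reconnect the pieces, and a join rebuilds the original 1NF rows.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES clauses when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE orders (
        order_id TEXT PRIMARY KEY,
        customer TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id TEXT PRIMARY KEY,
        product    TEXT NOT NULL
    );
    CREATE TABLE order_items (
        order_id   TEXT NOT NULL REFERENCES orders(order_id),
        product_id TEXT NOT NULL REFERENCES products(product_id),
        qty        INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO orders VALUES ('001', 'Alice');
    INSERT INTO products VALUES ('A1', 'Apple'), ('B2', 'Banana');
    INSERT INTO order_items VALUES ('001', 'A1', 2), ('001', 'B2', 5);
""")

# Joining on the foreign keys reconstructs the 1NF table.
rows = conn.execute("""
    SELECT oi.order_id, o.customer, oi.product_id, p.product, oi.qty
    FROM order_items AS oi
    JOIN orders AS o ON o.order_id = oi.order_id
    JOIN products AS p ON p.product_id = oi.product_id
    ORDER BY oi.product_id
""").fetchall()
print(rows)
# [('001', 'Alice', 'A1', 'Apple', 2), ('001', 'Alice', 'B2', 'Banana', 5)]
```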
&lt;h1 id="3nf-no-transitive-dependencies"&gt;3NF: No Transitive Dependencies
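&lt;/h1&gt;&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt; already 2NF, and no non-prime attribute depends on another non-prime attribute. More precisely, for every non-trivial FD &lt;code&gt;X → A&lt;/code&gt;, either X is a superkey or A is prime.&lt;/p&gt;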
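&lt;p&gt;Back to the NUS student table. Its only candidate key is the single column &lt;code&gt;email&lt;/code&gt;, so no partial dependency is possible and it is already in 2NF:&lt;/p&gt;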
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;email&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;th&gt;faculty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:tikki@gmail.com" &gt;tikki@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TIKKI TAVI&lt;/td&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;td&gt;School of Computing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:rikki@gmail.com" &gt;rikki@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;RIKKI TAVI&lt;/td&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;td&gt;School of Computing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;email → department → faculty&lt;/code&gt;. The chain &lt;code&gt;email → faculty&lt;/code&gt; is transitive through &lt;code&gt;department&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The anomaly:&lt;/strong&gt; NUS renames &amp;ldquo;School of Computing&amp;rdquo; to &amp;ldquo;NUS Computing.&amp;rdquo; You must update every row where &lt;code&gt;department = CS&lt;/code&gt;. Miss one row and your DB says CS belongs to two different faculties.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; decompose the transitive FD into its own table.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;email&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:tikki@gmail.com" &gt;tikki@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TIKKI TAVI&lt;/td&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a class="link" href="mailto:rikki@gmail.com" &gt;rikki@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;RIKKI TAVI&lt;/td&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;th&gt;faculty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CS&lt;/td&gt;
&lt;td&gt;School of Computing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chemistry&lt;/td&gt;
&lt;td&gt;Faculty of Science&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
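&lt;p&gt;Now a rename touches exactly one row. The department table is the single source of truth (SSOT) for which faculty a department belongs to.&lt;/p&gt;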
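&lt;p&gt;A small &lt;code&gt;sqlite3&lt;/code&gt; sketch of the rename scenario against the decomposed tables. The table names &lt;code&gt;students&lt;/code&gt; and &lt;code&gt;departments&lt;/code&gt; are assumptions. The update changes a single row, and every student picks up the new faculty name through the join.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES clauses when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE departments (
        department TEXT PRIMARY KEY,
        faculty    TEXT NOT NULL
    );
    CREATE TABLE students (
        email      TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        department TEXT NOT NULL REFERENCES departments(department)
    );
    INSERT INTO departments VALUES
        ('CS', 'School of Computing'),
        ('Chemistry', 'Faculty of Science');
    INSERT INTO students VALUES
        ('tikki@gmail.com', 'TIKKI TAVI', 'CS'),
        ('rikki@gmail.com', 'RIKKI TAVI', 'CS');
""")

# The rename is one UPDATE on one row, however many CS students exist.
cur = conn.execute(
    "UPDATE departments SET faculty = ? WHERE department = ?",
    ("NUS Computing", "CS"),
)
updated = cur.rowcount
print(updated)  # 1

faculties = conn.execute("""
    SELECT s.email, d.faculty
    FROM students AS s
    JOIN departments AS d ON d.department = s.department
    ORDER BY s.email
""").fetchall()
print(faculties)
# [('rikki@gmail.com', 'NUS Computing'), ('tikki@gmail.com', 'NUS Computing')]
```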
&lt;p&gt;&lt;strong&gt;Memory trick:&lt;/strong&gt; &amp;ldquo;every non-key column describes the key, the whole key, and nothing but the key.&amp;rdquo;&lt;/p&gt;
&lt;h1 id="bcnf-every-determinant-must-be-a-superkey"&gt;BCNF: Every Determinant Must Be a Superkey
&lt;/h1&gt;&lt;p&gt;3NF has a loophole: it only restricts non-prime attributes. If &lt;em&gt;all&lt;/em&gt; attributes are prime (part of some candidate key), 3NF can&amp;rsquo;t complain, even when a problematic FD exists.&lt;/p&gt;
&lt;p&gt;BCNF closes that loophole with a simpler, stricter rule: &lt;strong&gt;for every non-trivial FD &lt;code&gt;X → Y&lt;/code&gt;, X must be a superkey.&lt;/strong&gt; No exceptions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example: Gym form-review system&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Say you&amp;rsquo;re building an app where coaches review users&amp;rsquo; exercise form via video clips. Each coach specialises in one exercise (e.g., a deadlift coach only reviews deadlift clips).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;form_review(user_id, exercise_id, coach_id)&lt;/code&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;user_id&lt;/th&gt;
&lt;th&gt;exercise_id&lt;/th&gt;
&lt;th&gt;coach_id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Jing&lt;/td&gt;
&lt;td&gt;deadlift&lt;/td&gt;
&lt;td&gt;Coach_A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edmund&lt;/td&gt;
&lt;td&gt;deadlift&lt;/td&gt;
&lt;td&gt;Coach_A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jing&lt;/td&gt;
&lt;td&gt;squat&lt;/td&gt;
&lt;td&gt;Coach_B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;FDs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;{user_id, exercise_id} → coach_id&lt;/code&gt; (each user-exercise pair gets one coach)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;coach_id → exercise_id&lt;/code&gt; (each coach specialises in one exercise)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Candidate keys: &lt;code&gt;{user_id, exercise_id}&lt;/code&gt; and &lt;code&gt;{user_id, coach_id}&lt;/code&gt;. All three columns are prime, since each belongs to at least one candidate key, so 3NF has nothing to complain about.&lt;/p&gt;
&lt;p&gt;But &lt;code&gt;coach_id → exercise_id&lt;/code&gt; violates BCNF. &lt;code&gt;coach_id&lt;/code&gt; alone is not a superkey because it doesn&amp;rsquo;t determine &lt;code&gt;user_id&lt;/code&gt; (one coach reviews many users).&lt;/p&gt;
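&lt;p&gt;The BCNF test itself is mechanical: compute the closure of each FD&amp;rsquo;s left-hand side and see whether it covers the whole table. A minimal sketch, with illustrative helper names &lt;code&gt;closure&lt;/code&gt; and &lt;code&gt;bcnf_violations&lt;/code&gt;:&lt;/p&gt;

```python
def closure(attrs, fds):
    """Attribute closure: every column determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result.update(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs).issubset(lhs)
        is_superkey = closure(lhs, fds) == set(schema)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


schema = {"user_id", "exercise_id", "coach_id"}
fds = [
    (("user_id", "exercise_id"), ("coach_id",)),
    (("coach_id",), ("exercise_id",)),
]

print(bcnf_violations(schema, fds))
# [(('coach_id',), ('exercise_id',))]
```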
&lt;p&gt;&lt;strong&gt;The anomaly:&lt;/strong&gt; Coach_A switches specialisation from deadlift to squat. You have to update every row where &lt;code&gt;coach_id = Coach_A&lt;/code&gt;. Miss one and your DB says Coach_A specialises in both deadlift and squat, contradicting the business rule.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; decompose into two tables.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;user_id&lt;/th&gt;
&lt;th&gt;coach_id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Jing&lt;/td&gt;
&lt;td&gt;Coach_A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edmund&lt;/td&gt;
&lt;td&gt;Coach_A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jing&lt;/td&gt;
&lt;td&gt;Coach_B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;coach_id&lt;/th&gt;
&lt;th&gt;exercise_id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coach_A&lt;/td&gt;
&lt;td&gt;deadlift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coach_B&lt;/td&gt;
&lt;td&gt;squat&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
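&lt;p&gt;Coach_A switches to squat? One row updated in the second table. Every determinant is now a superkey in its own table. The trade-off: &lt;code&gt;{user_id, exercise_id} → coach_id&lt;/code&gt; no longer lives in a single table, so a key constraint alone cannot enforce it. If that rule still matters, the application or a trigger has to check it.&lt;/p&gt;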
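&lt;p&gt;A quick way to check that this split loses no information is to project the original rows onto the two tables and join them back on &lt;code&gt;coach_id&lt;/code&gt;. The sketch below does that with plain Python sets, and the variable names are illustrative. The join is lossless because &lt;code&gt;coach_id&lt;/code&gt; is the key of the second table.&lt;/p&gt;

```python
original = {
    ("Jing", "deadlift", "Coach_A"),
    ("Edmund", "deadlift", "Coach_A"),
    ("Jing", "squat", "Coach_B"),
}

# Project onto the two decomposed tables.
user_coach = {(user, coach) for user, _, coach in original}
coach_exercise = {(coach, exercise) for _, exercise, coach in original}

# Natural join on coach_id.
rejoined = {
    (user, exercise, coach)
    for user, coach in user_coach
    for other_coach, exercise in coach_exercise
    if other_coach == coach
}

print(rejoined == original)  # True: no rows lost, no spurious rows added
```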
&lt;h1 id="summary"&gt;Summary
&lt;/h1&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Normal Form&lt;/th&gt;
&lt;th&gt;Eliminates&lt;/th&gt;
&lt;th&gt;Core Rule&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1NF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-atomic values&lt;/td&gt;
&lt;td&gt;Every column holds one value&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2NF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Partial dependencies&lt;/td&gt;
&lt;td&gt;Non-prime attributes depend on the &lt;strong&gt;whole&lt;/strong&gt; candidate key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3NF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transitive dependencies&lt;/td&gt;
&lt;td&gt;Non-prime attributes depend on candidate keys directly, never through another non-key attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BCNF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-superkey determinants (including prime-on-prime)&lt;/td&gt;
&lt;td&gt;Every determinant is a superkey&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For most systems, 3NF or BCNF is the practical target. Normalise to eliminate redundancy. Denormalise deliberately (with awareness of the trade-off) when query performance demands it.&lt;/p&gt;</description></item></channel></rss>